<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Testing on Bits, Trades &amp; Systems</title>
    <link>https://blog.turboawesome.win/tags/testing/</link>
    <description>Recent content in Testing on Bits, Trades &amp; Systems</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 07 May 2025 11:00:00 +0000</lastBuildDate>
    <atom:link href="https://blog.turboawesome.win/tags/testing/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Evaluating LLM-Integrated Systems: What Works and What Doesn&#39;t</title>
      <link>https://blog.turboawesome.win/2025/05/evaluating-llm-integrated-systems-what-works-and-what-doesnt/</link>
      <pubDate>Wed, 07 May 2025 11:00:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2025/05/evaluating-llm-integrated-systems-what-works-and-what-doesnt/</guid>
      <description>LLM outputs are probabilistic and context-dependent. The testing and evaluation approaches from deterministic software don&#39;t transfer directly. What does work: eval datasets, LLM-as-judge, regression suites, and the practices that separate teams with confidence from teams flying blind.</description>
    </item>
    <item>
      <title>Evaluating LLM Applications: Why &#39;It Looks Good&#39; Is Not Enough</title>
      <link>https://blog.turboawesome.win/2024/05/evaluating-llm-applications-why-it-looks-good-is-not-enough/</link>
      <pubDate>Tue, 14 May 2024 14:22:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2024/05/evaluating-llm-applications-why-it-looks-good-is-not-enough/</guid>
      <description>LLM applications fail in ways that traditional software testing doesn&#39;t catch. Building evaluation frameworks that give you real signal about quality, both before and after deployment, is the engineering challenge that separates serious AI products from demos.</description>
    </item>
    <item>
      <title>Go&#39;s Race Detector in CI: Catching Data Races Before They Catch You</title>
      <link>https://blog.turboawesome.win/2023/10/gos-race-detector-in-ci-catching-data-races-before-they-catch-you/</link>
      <pubDate>Wed, 04 Oct 2023 09:35:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2023/10/gos-race-detector-in-ci-catching-data-races-before-they-catch-you/</guid>
      <description>Data races are among the hardest bugs to find and reproduce. Go&#39;s built-in race detector finds them automatically, but only if you run it. Here&#39;s how to integrate it into CI effectively and what to do when it fires.</description>
    </item>
    <item>
      <title>Go Benchmarks: Writing Ones That Actually Tell You Something</title>
      <link>https://blog.turboawesome.win/2020/03/go-benchmarks-writing-ones-that-actually-tell-you-something/</link>
      <pubDate>Tue, 17 Mar 2020 09:29:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2020/03/go-benchmarks-writing-ones-that-actually-tell-you-something/</guid>
      <description>Go&#39;s built-in benchmark framework is excellent, but it&#39;s easy to write benchmarks that measure the wrong thing: compiler optimisations, cache-warming artifacts, or benchmark overhead rather than the code under test.</description>
    </item>
    <item>
      <title>Spec-Driven Development in Clojure: Validating Financial Data at the Edge</title>
      <link>https://blog.turboawesome.win/2017/02/spec-driven-development-in-clojure-validating-financial-data-at-the-edge/</link>
      <pubDate>Wed, 22 Feb 2017 10:41:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2017/02/spec-driven-development-in-clojure-validating-financial-data-at-the-edge/</guid>
      <description>clojure.spec was released in 2016 and changed how we thought about data validation in Clojure. For financial data where a bad field can cause a trade to execute incorrectly, spec&#39;s generative testing caught bugs that normal unit tests never would.</description>
    </item>
  </channel>
</rss>
