Testing on Bits, Trades & Systems

Recent content in Testing on Bits, Trades & Systems
https://blog.turboawesome.win/tags/testing/

Evaluating LLM-Integrated Systems: What Works and What Doesn't
https://blog.turboawesome.win/2025/05/evaluating-llm-integrated-systems-what-works-and-what-doesnt/
Wed, 07 May 2025 11:00:00 +0000
LLM outputs are probabilistic and context-dependent. The testing and evaluation approaches from deterministic software don't transfer directly. What does work: eval datasets, LLM-as-judge, regression suites, and the practices that separate teams with confidence from teams flying blind.

Evaluating LLM Applications: Why 'It Looks Good' Is Not Enough
https://blog.turboawesome.win/2024/05/evaluating-llm-applications-why-it-looks-good-is-not-enough/
Tue, 14 May 2024 14:22:00 +0000
LLM applications fail in ways that traditional software testing doesn't catch. Building evaluation frameworks that give you real signal about quality, before and after deployment, is the engineering challenge that separates serious AI products from demos.

Go's Race Detector in CI: Catching Data Races Before They Catch You
https://blog.turboawesome.win/2023/10/gos-race-detector-in-ci-catching-data-races-before-they-catch-you/
Wed, 04 Oct 2023 09:35:00 +0000
Data races are among the hardest bugs to find and reproduce. Go's built-in race detector finds them automatically, if you run it. Here's how to integrate it into CI effectively and what to do when it fires.

Go Benchmarks: Writing Ones That Actually Tell You Something
https://blog.turboawesome.win/2020/03/go-benchmarks-writing-ones-that-actually-tell-you-something/
Tue, 17 Mar 2020 09:29:00 +0000
Go's built-in benchmark framework is excellent, but it's easy to write benchmarks that measure the wrong thing: compiler optimisations, cache-warming artifacts, or benchmark overhead rather than the code under test.

Spec-Driven Development in Clojure: Validating Financial Data at the Edge
https://blog.turboawesome.win/2017/02/spec-driven-development-in-clojure-validating-financial-data-at-the-edge/
Wed, 22 Feb 2017 10:41:00 +0000
clojure.spec was released in 2016 and changed how we thought about data validation in Clojure. For financial data where a bad field can cause a trade to execute incorrectly, spec's generative testing caught bugs that normal unit tests never would.