<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Data-at-Scale on Bits, Trades &amp; Systems</title>
    <link>https://blog.turboawesome.win/series/data-at-scale/</link>
    <description>Recent content in Data-at-Scale on Bits, Trades &amp; Systems</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 16 Apr 2025 13:12:00 +0000</lastBuildDate>
    <atom:link href="https://blog.turboawesome.win/series/data-at-scale/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Distributed Consistency Models: What Your Service Actually Guarantees</title>
      <link>https://blog.turboawesome.win/2025/04/distributed-consistency-models-what-your-service-actually-guarantees/</link>
      <pubDate>Wed, 16 Apr 2025 13:12:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2025/04/distributed-consistency-models-what-your-service-actually-guarantees/</guid>
      <description>Linearisability, serialisability, eventual consistency, causal consistency: these terms are used loosely and understood imprecisely. Knowing what your data store actually guarantees determines whether your distributed system is correct.</description>
    </item>
    <item>
      <title>Tail-Based Trace Sampling: Why Head Sampling Is Usually Wrong</title>
      <link>https://blog.turboawesome.win/2024/10/tail-based-trace-sampling-why-head-sampling-is-usually-wrong/</link>
      <pubDate>Wed, 09 Oct 2024 13:00:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2024/10/tail-based-trace-sampling-why-head-sampling-is-usually-wrong/</guid>
      <description>Head-based sampling decides whether to trace a request at the start. Tail-based sampling decides after the request completes. For finding latency outliers and errors, tail-based sampling is almost always what you want — and almost never what gets implemented.</description>
    </item>
    <item>
      <title>Observability at Scale: What &#39;Good&#39; Looks Like When You Have Too Much Data</title>
      <link>https://blog.turboawesome.win/2024/05/observability-at-scale-what-good-looks-like-when-you-have-too-much-data/</link>
      <pubDate>Wed, 29 May 2024 09:47:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2024/05/observability-at-scale-what-good-looks-like-when-you-have-too-much-data/</guid>
      <description>Observability problems at large scale are different from small-scale ones. Too little signal is replaced by too much signal, and the engineering challenge inverts.</description>
    </item>
    <item>
      <title>Cache Design as a Reliability Practice, Not an Optimisation</title>
      <link>https://blog.turboawesome.win/2024/03/cache-design-as-a-reliability-practice-not-an-optimisation/</link>
      <pubDate>Wed, 27 Mar 2024 11:47:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2024/03/cache-design-as-a-reliability-practice-not-an-optimisation/</guid>
      <description>Most engineers add caches to make things faster. At scale, the more important reason to design caches carefully is reliability — a cache failure should not cascade into a system failure. The patterns that prevent that are different from the patterns that optimise for speed.</description>
    </item>
    <item>
      <title>ClickHouse for Application Analytics: Fast Aggregations Without Spark</title>
      <link>https://blog.turboawesome.win/2023/05/clickhouse-for-application-analytics-fast-aggregations-without-spark/</link>
      <pubDate>Wed, 17 May 2023 14:08:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2023/05/clickhouse-for-application-analytics-fast-aggregations-without-spark/</guid>
      <description>When you need sub-second aggregations over billions of rows and don&#39;t want to run a Spark cluster, ClickHouse is often the answer. Notes from a year in production.</description>
    </item>
    <item>
      <title>Kafka at Startup Scale</title>
      <link>https://blog.turboawesome.win/2022/05/kafka-at-startup-scale/</link>
      <pubDate>Wed, 18 May 2022 14:00:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2022/05/kafka-at-startup-scale/</guid>
      <description>Running Kafka at a startup is different from running it at enterprise scale. The operational complexity is real, the defaults are wrong for small clusters, and the failure modes are different from what the documentation implies.</description>
    </item>
    <item>
      <title>Choosing a Time-Series Database in 2020</title>
      <link>https://blog.turboawesome.win/2020/11/choosing-a-time-series-database-in-2020/</link>
      <pubDate>Wed, 18 Nov 2020 11:00:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2020/11/choosing-a-time-series-database-in-2020/</guid>
      <description>In 2020 the time-series database landscape had fragmented into specialised options with very different tradeoffs. InfluxDB, TimescaleDB, ClickHouse, Prometheus — each suited to different access patterns, retention requirements, and operational models.</description>
    </item>
    <item>
      <title>Schema Evolution in Avro: The Hard Lessons from Production</title>
      <link>https://blog.turboawesome.win/2018/10/schema-evolution-in-avro-the-hard-lessons-from-production/</link>
      <pubDate>Thu, 04 Oct 2018 11:29:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2018/10/schema-evolution-in-avro-the-hard-lessons-from-production/</guid>
      <description>Avro&#39;s schema evolution rules sound simple. In production with multiple services and a regulated data retention requirement, the edges are sharp. Here are the cases that burned us and the practices that prevented future ones.</description>
    </item>
    <item>
      <title>Event Sourcing in Financial Systems: Real Benefits, Real Costs</title>
      <link>https://blog.turboawesome.win/2018/07/event-sourcing-in-financial-systems-real-benefits-real-costs/</link>
      <pubDate>Wed, 11 Jul 2018 11:03:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2018/07/event-sourcing-in-financial-systems-real-benefits-real-costs/</guid>
      <description>Event sourcing is a natural fit for financial systems that require audit trails and point-in-time reconstruction. The costs are real too — projections, eventual consistency, and the event schema evolution problem.</description>
    </item>
    <item>
      <title>Backpressure in Practice: Keeping Fast Producers from Killing Slow Consumers</title>
      <link>https://blog.turboawesome.win/2018/06/backpressure-in-practice-keeping-fast-producers-from-killing-slow-consumers/</link>
      <pubDate>Thu, 14 Jun 2018 10:33:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2018/06/backpressure-in-practice-keeping-fast-producers-from-killing-slow-consumers/</guid>
      <description>Every system has components that produce faster than consumers can handle under some conditions. Backpressure is the mechanism by which fast producers are slowed rather than dropping data or consuming unbounded memory. Here&#39;s what the options look like in practice.</description>
    </item>
    <item>
      <title>Distributed Transactions Are a Lie (And What to Do Instead)</title>
      <link>https://blog.turboawesome.win/2018/01/distributed-transactions-are-a-lie-and-what-to-do-instead/</link>
      <pubDate>Wed, 17 Jan 2018 10:55:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2018/01/distributed-transactions-are-a-lie-and-what-to-do-instead/</guid>
      <description>Two-phase commit promises ACID semantics across distributed systems. In practice it&#39;s slow, fragile, and blocks under failure. The patterns that actually work (sagas, idempotency, and compensating transactions) are more complex but more reliable.</description>
    </item>
    <item>
      <title>Building MiFID II Trade Reporting Infrastructure: An Engineer&#39;s View</title>
      <link>https://blog.turboawesome.win/2017/10/building-mifid-ii-trade-reporting-infrastructure-an-engineers-view/</link>
      <pubDate>Tue, 03 Oct 2017 11:45:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2017/10/building-mifid-ii-trade-reporting-infrastructure-an-engineers-view/</guid>
      <description>MiFID II required every trade to be reported within 15 minutes of execution. Building the infrastructure to meet that requirement across a large, heterogeneous estate taught us about the gap between regulatory requirements and production reality.</description>
    </item>
    <item>
      <title>Stream Processing with Kafka Streams vs Flink: A Real Comparison</title>
      <link>https://blog.turboawesome.win/2017/09/stream-processing-with-kafka-streams-vs-flink-a-real-comparison/</link>
      <pubDate>Wed, 27 Sep 2017 14:02:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2017/09/stream-processing-with-kafka-streams-vs-flink-a-real-comparison/</guid>
      <description>We evaluated Kafka Streams and Apache Flink for a real-time trade enrichment and aggregation pipeline. The technical comparison produced a clear result; the operational comparison was more nuanced.</description>
    </item>
    <item>
      <title>Column Stores for Analytics: Why Row-Based Is Wrong for This Problem</title>
      <link>https://blog.turboawesome.win/2017/04/column-stores-for-analytics-why-row-based-is-wrong-for-this-problem/</link>
      <pubDate>Wed, 05 Apr 2017 14:33:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2017/04/column-stores-for-analytics-why-row-based-is-wrong-for-this-problem/</guid>
      <description>When the data team started complaining about multi-hour Postgres queries on trade history, we rewrote the analytics layer around columnar storage. The why is interesting.</description>
    </item>
    <item>
      <title>Kafka in Finance: What &#39;Exactly Once&#39; Actually Costs You</title>
      <link>https://blog.turboawesome.win/2017/01/kafka-in-finance-what-exactly-once-actually-costs-you/</link>
      <pubDate>Tue, 10 Jan 2017 11:22:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2017/01/kafka-in-finance-what-exactly-once-actually-costs-you/</guid>
      <description>Kafka&#39;s exactly-once semantics arrived in 0.11 with significant caveats. Using it in a regulated financial context forced a clear-eyed view of what the guarantee actually covers and what it doesn&#39;t.</description>
    </item>
    <item>
      <title>KDB&#43;/Q for Java Developers: Reading the Matrix</title>
      <link>https://blog.turboawesome.win/2016/10/kdb-/q-for-java-developers-reading-the-matrix/</link>
      <pubDate>Tue, 11 Oct 2016 14:17:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2016/10/kdb-/q-for-java-developers-reading-the-matrix/</guid>
      <description>KDB&#43; is the database of choice for time-series analytics in investment banks. It&#39;s fast, alien, and worth understanding. A Java developer&#39;s field guide.</description>
    </item>
    <item>
      <title>Time-Series Data at a Bank: Why Relational Databases Break and What Comes Next</title>
      <link>https://blog.turboawesome.win/2016/07/time-series-data-at-a-bank-why-relational-databases-break-and-what-comes-next/</link>
      <pubDate>Wed, 06 Jul 2016 11:34:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2016/07/time-series-data-at-a-bank-why-relational-databases-break-and-what-comes-next/</guid>
      <description>Financial institutions generate millions of time-stamped data points every day. The relational database model, designed for transactional workloads, breaks down spectacularly for this use case. Here&#39;s why, and what replaces it.</description>
    </item>
  </channel>
</rss>
