<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Data on Bits, Trades &amp; Systems</title>
    <link>https://blog.turboawesome.win/tags/data/</link>
    <description>Recent content in Data on Bits, Trades &amp; Systems</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 17 May 2023 14:08:00 +0000</lastBuildDate>
    <atom:link href="https://blog.turboawesome.win/tags/data/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>ClickHouse for Application Analytics: Fast Aggregations Without Spark</title>
      <link>https://blog.turboawesome.win/2023/05/clickhouse-for-application-analytics-fast-aggregations-without-spark/</link>
      <pubDate>Wed, 17 May 2023 14:08:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2023/05/clickhouse-for-application-analytics-fast-aggregations-without-spark/</guid>
      <description>When you need sub-second aggregations over billions of rows and don't want to run a Spark cluster, ClickHouse is often the answer. Notes from a year in production.</description>
    </item>
    <item>
      <title>Kafka at Startup Scale</title>
      <link>https://blog.turboawesome.win/2022/05/kafka-at-startup-scale/</link>
      <pubDate>Wed, 18 May 2022 14:00:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2022/05/kafka-at-startup-scale/</guid>
      <description>Running Kafka at a startup is different from running it at enterprise scale. The operational complexity is real, the defaults are wrong for small clusters, and the failure modes are different from what the documentation implies.</description>
    </item>
    <item>
      <title>Choosing a Time-Series Database in 2020</title>
      <link>https://blog.turboawesome.win/2020/11/choosing-a-time-series-database-in-2020/</link>
      <pubDate>Wed, 18 Nov 2020 11:00:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2020/11/choosing-a-time-series-database-in-2020/</guid>
      <description>In 2020 the time-series database landscape had fragmented into specialised options with very different tradeoffs. InfluxDB, TimescaleDB, ClickHouse, Prometheus — each suited to different access patterns, retention requirements, and operational models.</description>
    </item>
    <item>
      <title>Schema Evolution in Avro: The Hard Lessons from Production</title>
      <link>https://blog.turboawesome.win/2018/10/schema-evolution-in-avro-the-hard-lessons-from-production/</link>
      <pubDate>Thu, 04 Oct 2018 11:29:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2018/10/schema-evolution-in-avro-the-hard-lessons-from-production/</guid>
      <description>Avro's schema evolution rules sound simple. In production with multiple services and a regulated data retention requirement, the edges are sharp. Here are the cases that burned us and the practices that prevented future ones.</description>
    </item>
    <item>
      <title>Stream Processing with Kafka Streams vs Flink: A Real Comparison</title>
      <link>https://blog.turboawesome.win/2017/09/stream-processing-with-kafka-streams-vs-flink-a-real-comparison/</link>
      <pubDate>Wed, 27 Sep 2017 14:02:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2017/09/stream-processing-with-kafka-streams-vs-flink-a-real-comparison/</guid>
      <description>We evaluated Kafka Streams and Apache Flink for a real-time trade enrichment and aggregation pipeline. The technical comparison produced a clear result; the operational comparison was more nuanced.</description>
    </item>
    <item>
      <title>Column Stores for Analytics: Why Row-Based Is Wrong for This Problem</title>
      <link>https://blog.turboawesome.win/2017/04/column-stores-for-analytics-why-row-based-is-wrong-for-this-problem/</link>
      <pubDate>Wed, 05 Apr 2017 14:33:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2017/04/column-stores-for-analytics-why-row-based-is-wrong-for-this-problem/</guid>
      <description>When the data team started complaining about multi-hour Postgres queries on trade history, we rewrote the analytics layer around columnar storage. The why is interesting.</description>
    </item>
    <item>
      <title>Kafka in Finance: What &#39;Exactly Once&#39; Actually Costs You</title>
      <link>https://blog.turboawesome.win/2017/01/kafka-in-finance-what-exactly-once-actually-costs-you/</link>
      <pubDate>Tue, 10 Jan 2017 11:22:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2017/01/kafka-in-finance-what-exactly-once-actually-costs-you/</guid>
      <description>Kafka's exactly-once semantics arrived in 0.11 with significant caveats. Using it in a regulated financial context forced a clear-eyed view of what the guarantee actually covers and what it doesn't.</description>
    </item>
    <item>
      <title>Clojure Data Pipelines: Transducers in Production Risk Processing</title>
      <link>https://blog.turboawesome.win/2016/11/clojure-data-pipelines-transducers-in-production-risk-processing/</link>
      <pubDate>Wed, 23 Nov 2016 13:55:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2016/11/clojure-data-pipelines-transducers-in-production-risk-processing/</guid>
      <description>Transducers are Clojure's answer to pipeline composition that works without creating intermediate collections. For production data processing where allocation matters, they're not a theoretical nicety; they're genuinely useful.</description>
    </item>
    <item>
      <title>KDB&#43;/Q for Java Developers: Reading the Matrix</title>
      <link>https://blog.turboawesome.win/2016/10/kdb-/q-for-java-developers-reading-the-matrix/</link>
      <pubDate>Tue, 11 Oct 2016 14:17:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2016/10/kdb-/q-for-java-developers-reading-the-matrix/</guid>
      <description>KDB+ is the database of choice for time-series analytics in investment banks. It's fast, alien, and worth understanding. A Java developer's field guide.</description>
    </item>
    <item>
      <title>Time-Series Data at a Bank: Why Relational Databases Break and What Comes Next</title>
      <link>https://blog.turboawesome.win/2016/07/time-series-data-at-a-bank-why-relational-databases-break-and-what-comes-next/</link>
      <pubDate>Wed, 06 Jul 2016 11:34:00 +0000</pubDate>
      <guid>https://blog.turboawesome.win/2016/07/time-series-data-at-a-bank-why-relational-databases-break-and-what-comes-next/</guid>
      <description>Financial institutions generate millions of time-stamped data points every day. The relational database model, designed for transactional workloads, breaks down spectacularly for this use case. Here's why, and what replaces it.</description>
    </item>
  </channel>
</rss>
