Data on Bits, Trades & Systems

Recent content in Data on Bits, Trades & Systems
https://blog.turboawesome.win/tags/data/
Last updated: Wed, 17 May 2023 14:08:00 +0000

ClickHouse for Application Analytics: Fast Aggregations Without Spark
https://blog.turboawesome.win/2023/05/clickhouse-for-application-analytics-fast-aggregations-without-spark/
Wed, 17 May 2023 14:08:00 +0000
When you need sub-second aggregations over billions of rows and don't want to run a Spark cluster, ClickHouse is often the answer. Notes from a year in production.

Kafka at Startup Scale
https://blog.turboawesome.win/2022/05/kafka-at-startup-scale/
Wed, 18 May 2022 14:00:00 +0000
Running Kafka at a startup is different from running it at enterprise scale. The operational complexity is real, the defaults are wrong for small clusters, and the failure modes are different from what the documentation implies.

Choosing a Time-Series Database in 2020
https://blog.turboawesome.win/2020/11/choosing-a-time-series-database-in-2020/
Wed, 18 Nov 2020 11:00:00 +0000
In 2020 the time-series database landscape had fragmented into specialised options with very different tradeoffs. InfluxDB, TimescaleDB, ClickHouse, Prometheus: each suited to different access patterns, retention requirements, and operational models.

Schema Evolution in Avro: The Hard Lessons from Production
https://blog.turboawesome.win/2018/10/schema-evolution-in-avro-the-hard-lessons-from-production/
Thu, 04 Oct 2018 11:29:00 +0000
Avro's schema evolution rules sound simple. In production with multiple services and a regulated data retention requirement, the edges are sharp. Here are the cases that burned us and the practices that prevented future ones.

Stream Processing with Kafka Streams vs Flink: A Real Comparison
https://blog.turboawesome.win/2017/09/stream-processing-with-kafka-streams-vs-flink-a-real-comparison/
Wed, 27 Sep 2017 14:02:00 +0000
We evaluated Kafka Streams and Apache Flink for a real-time trade enrichment and aggregation pipeline. The technical comparison produced a clear result; the operational comparison was more nuanced.

Column Stores for Analytics: Why Row-Based Is Wrong for This Problem
https://blog.turboawesome.win/2017/04/column-stores-for-analytics-why-row-based-is-wrong-for-this-problem/
Wed, 05 Apr 2017 14:33:00 +0000
When the data team started complaining about multi-hour Postgres queries on trade history, we rewrote the analytics layer around columnar storage. The why is interesting.

Kafka in Finance: What 'Exactly Once' Actually Costs You
https://blog.turboawesome.win/2017/01/kafka-in-finance-what-exactly-once-actually-costs-you/
Tue, 10 Jan 2017 11:22:00 +0000
Kafka's exactly-once semantics arrived in 0.11 with significant caveats. Using it in a regulated financial context forced a clear-eyed view of what the guarantee actually covers and what it doesn't.

Clojure Data Pipelines: Transducers in Production Risk Processing
https://blog.turboawesome.win/2016/11/clojure-data-pipelines-transducers-in-production-risk-processing/
Wed, 23 Nov 2016 13:55:00 +0000
Transducers are Clojure's answer to pipeline composition that works without creating intermediate collections. For production data processing where allocation matters, they're not a theoretical nicety; they're genuinely useful.

KDB+/Q for Java Developers: Reading the Matrix
https://blog.turboawesome.win/2016/10/kdb-/q-for-java-developers-reading-the-matrix/
Tue, 11 Oct 2016 14:17:00 +0000
KDB+ is the database of choice for time-series analytics in investment banks. It's fast, alien, and worth understanding. A Java developer's field guide.

Time-Series Data at a Bank: Why Relational Databases Break and What Comes Next
https://blog.turboawesome.win/2016/07/time-series-data-at-a-bank-why-relational-databases-break-and-what-comes-next/
Wed, 06 Jul 2016 11:34:00 +0000
Financial institutions generate millions of time-stamped data points every day. The relational database model, designed for transactional workloads, breaks down spectacularly for this use case. Here's why, and what replaces it.