Bits, Trades & Systems

When the Scale Changes: Moving into Institutional Finance

The job spec said “trading systems.” I assumed it would be similar to what I’d been doing — latency-focused, technically aggressive, small team, fast decisions. I was wrong on most counts, and right for reasons I didn’t expect. ...

Five Years in High-Frequency Trading: What I Actually Learned

Five years ago I joined the electronic trading firm not knowing what a cache line was. I thought garbage collection was something that happened to other people’s code. I had never looked at assembly output from a Java program. I’d heard of the LMAX Disruptor but had no idea why it existed. By the time I left, I had opinions about CPU prefetchers. I had read the Intel 64 and IA-32 Architectures Software Developer’s Manual for fun. I could look at a flame graph and immediately see the GC pressure. I had shipped components processing a million messages per second with sub-millisecond p99 guarantees. Here’s what that environment actually teaches you. ...

ZGC and Shenandoah: What Low-Pause GC Means for Trading Systems

In 2015, the state of GC for latency-sensitive Java was: use G1GC, tune it carefully, accept occasional 50–200ms pauses on large heaps, and work around them with off-heap storage and careful allocation management. The conventional wisdom was that sub-10ms GC pauses required small heaps (< 4GB) or near-zero allocation on the hot path. For trading systems with large position caches, this meant either expensive off-heap engineering or living with GC latency spikes. Then the early previews of ZGC (Oracle/Sun) and Shenandoah (Red Hat) started circulating. Both claimed sub-millisecond pause times regardless of heap size. The mechanisms were different, the implications were significant. ...

Scala Akka Actors for Trading Workflows: Promises and Pitfalls

The case for Akka actors in a trading system sounds compelling: isolated mutable state (no shared memory, no locks), message-driven concurrency, built-in supervision hierarchies for fault tolerance, and location transparency for distributed deployments. We used Akka for the order lifecycle workflow layer — the component that orchestrated the state machine from order received to fill confirmed. Here’s what we learned. ...

Risk Aggregation in Real Time: Design Constraints from the Dealing Desk

Risk in a batch system is a solved problem: collect all positions, apply valuation models, sum the results, write a report. The dealing desk doesn’t want a report. They want a number that’s correct right now, updates in under a second when a trade comes in, and doesn’t go stale when a price moves. That’s a different problem. ...

Understanding Safepoints: The JVM Pauses Nobody Talks About

We’d tuned GC to near-perfection. Pause times were sub-millisecond. The p99.9 latency was still spiking to 8ms several times a day, with no GC events anywhere near those spikes. It took three weeks to find the cause: safepoints, specifically a revoke-bias operation triggered by lock patterns in a third-party library. ...

Memory-Mapped Files in Java: Chronicle and the Art of Zero-Copy I/O

Disk I/O and latency-sensitive systems don’t mix well. Or so the conventional wisdom goes. Memory-mapped files challenge that assumption: when the OS maps a file into virtual memory, writes and reads go through the OS page cache. If the working set fits in RAM (which it usually does for recent data), access times are in the hundreds of nanoseconds — comparable to normal memory access, not disk I/O. Chronicle Queue is built entirely on this foundation. Understanding what’s happening underneath explains both why it’s fast and what its failure modes are. ...

Building a Trade Blotter That Doesn't Lie Under Load

The trade blotter is the dashboard every trader stares at during the day: live positions, recent fills, P&L, risk utilisation. It’s the interface between the trading system and the humans who run it. The blotter has an interesting engineering property: it’s not in the critical path (orders execute without waiting for the blotter to update), but it has to be consistently correct and fast-updating, because traders make decisions based on what it shows. A blotter that shows stale positions leads to over-trading; one that misses fills leads to missed hedges. ...

Chronicle Queue vs Kafka: Choosing a Persistent Journal at Nanosecond Scale

By early 2015 we had both Chronicle Queue and Kafka in production — Chronicle for intra-day trade journaling, Kafka for end-of-day data pipelines. The question came up repeatedly: why not use one for both? The answer is that they solve different problems with incompatible design priorities. ...

End-of-Year Architecture Review: What Held, What Failed, What Changed

Three years into building trading systems, the end of 2014 felt like a good moment to stop and audit what we’d built. Not a full rewrite assessment — more a structured reflection on which bets paid off, which didn’t, and what the data was telling us about where the gaps were. This kind of review is undervalued in fast-moving engineering organisations. You learn a lot from production behaviour over years that you can’t learn from design docs. ...