A toy project that shows the flow, not the money. This is for learning about event-driven design and latency. Not trading advice!
What this is
YoctoTrader is a compact, readable pipeline:
- Feed: Gaussian random-walk prices (no I/O).
- Disruptor: single-producer ring buffer; reuses event objects to avoid GC churn.
- Strategy: fast/slow moving-average crossover.
- Risk Gate: simple position cap and order-rate throttle.
- Order Publisher: prints orders (placeholder for an OMS/gateway).
- Latency: end-to-end nanoseconds, summarized with HdrHistogram.
The code is intentionally minimal so you can profile, tweak, and see cause↔effect quickly.
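The Strategy stage is a fast/slow moving-average crossover. A minimal sketch of that idea, with O(1) work per tick (class, method, and window names here are my assumptions, not the repo's actual code):

```java
// Hypothetical sketch of a fast/slow moving-average crossover strategy.
// Names and windows are assumptions, not the repo's actual code.
final class Crossover {
    private final double[] fastBuf, slowBuf;  // circular sample windows
    private double fastSum, slowSum;          // running sums, O(1) per tick
    private int n;                            // ticks seen so far

    Crossover(int fastWindow, int slowWindow) {
        fastBuf = new double[fastWindow];
        slowBuf = new double[slowWindow];
    }

    /** Returns +1 (BUY), -1 (SELL), or 0 (warm-up / no edge) for a new price. */
    int onPrice(double px) {
        fastSum += px - fastBuf[n % fastBuf.length];
        fastBuf[n % fastBuf.length] = px;
        slowSum += px - slowBuf[n % slowBuf.length];
        slowBuf[n % slowBuf.length] = px;
        n++;
        if (n < slowBuf.length) return 0;     // not enough samples yet
        double fast = fastSum / fastBuf.length;
        double slow = slowSum / slowBuf.length;
        return fast > slow ? 1 : fast < slow ? -1 : 0;
    }
}
```

The running-sum trick is why the strategy stays cheap on the hot path: each tick costs two subtractions and two additions, regardless of window size.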
The code
Get the code on GitHub: https://github.com/simocoder/yoctotrader
Docker Hub: `docker run --rm -it simocoder/yoctotrader:latest`
Or GHCR: `docker run --rm -it ghcr.io/simocoder/yoctotrader:latest`
Quick start
Build with Maven, then run the shaded jar with `java -jar` (no external dependencies needed; see Build details below).
You’ll see frequent `ORDER BUY/SELL` lines and occasional latency summaries from HdrHistogram.
If you prefer Java-21 bytecode, build with the `-Pjava21` Maven profile.
Reading the logs
Each order line carries these fields:
- `yoctotrader-20` — thread name from the consumer.
- `ORDER SELL` — strategy signal (fast MA below slow MA ⇒ SELL).
- `qty=1` — fixed size in this toy.
- `px=...` — current simulated mid price.
- `ts=...` — producer nanotime when the tick entered the ring (used for latency).
The stream is chatty because the random walk wiggles constantly and we process at a high tick rate. The RiskGate holds position within ±5 and limits order frequency (default 1 ms gap).
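Those two guards, the position cap and the order-rate throttle, can be sketched as a tiny gate (a sketch under assumed names; the repo's actual RiskGate may differ):

```java
// Hypothetical sketch of the risk gate described above: a net-position cap
// and a minimum nanosecond gap between orders. Names are assumptions.
final class RiskGate {
    private final int maxPos;        // e.g. 5 for the +/-5 cap
    private final long minGapNanos;  // e.g. 1_000_000 for a 1 ms gap
    private int position;
    private long lastOrderNanos;
    private boolean hasTraded;

    RiskGate(int maxPos, long minGapNanos) {
        this.maxPos = maxPos;
        this.minGapNanos = minGapNanos;
    }

    /** Returns true and books the trade if a signed qty passes both checks. */
    boolean allow(int qty, long nowNanos) {
        if (Math.abs(position + qty) > maxPos) return false;  // position cap
        if (hasTraded && nowNanos - lastOrderNanos < minGapNanos)
            return false;                                     // rate throttle
        position += qty;
        lastOrderNanos = nowNanos;
        hasTraded = true;
        return true;
    }

    int position() { return position; }
}
```

Note the gate only mutates state when an order passes, so rejected orders don't reset the throttle clock.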
Design notes (short)
- Single-writer ring: Disruptor excels when one producer hands off to one (or a few) consumers, minimizing locks and cache misses.
- Allocation discipline: `MarketDataEvent` instances are reused; the hot path avoids new allocations.
- Time base: we stamp with `System.nanoTime()` at publish; the consumer computes end-to-end latency against "now".
- Separation of concerns: Strategy → Risk → Publisher are cleanly split so you can swap any piece without touching the others.
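The time-base note can be made concrete with a toy recorder (a stand-in sketch for illustration; the project itself summarizes latencies with HdrHistogram):

```java
import java.util.Arrays;

// Toy latency recorder illustrating the time-base idea: the producer stamps
// System.nanoTime() at publish, the consumer records "now minus stamp".
// This naive version is a stand-in; the project uses HdrHistogram.
final class LatencyRecorder {
    private final long[] samples;
    private int count;

    LatencyRecorder(int capacity) { samples = new long[capacity]; }

    /** Consumer side: latency = now minus the producer's publish stamp. */
    void record(long publishNanos) {
        if (count < samples.length)
            samples[count++] = System.nanoTime() - publishNanos;
    }

    /** p in [0,1]; sort-based percentile, fine for a toy. */
    long percentile(double p) {
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        return sorted[(int) Math.min(count - 1L, Math.round(p * (count - 1)))];
    }
}
```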
Repo layout
Build artifacts land in `target/`.
Parameters you’ll probably tweak
Open `Main.java`:
- Less noise: set `targetRatePerSec` to `10_000`, increase the gap to `10_000_000` (10 ms), or widen the MAs (e.g., 16 vs 64).
- More pressure: shrink the gap, increase the feed rate, try `BusySpin` vs other wait strategies.
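As a rough picture, the knobs above might look like this in code (field names and defaults are purely hypothetical, echoing only the values this section mentions):

```java
// Hypothetical parameter holder; names and defaults are illustrative only,
// not the repo's actual Main.java.
final class Params {
    static final int RING_SIZE = 1 << 14;              // must be a power of two
    static final long TARGET_RATE_PER_SEC = 100_000;   // synthetic feed rate
    static final long MIN_ORDER_GAP_NANOS = 1_000_000; // 1 ms RiskGate throttle
    static final int FAST_WINDOW = 8, SLOW_WINDOW = 32; // moving-average lengths
}
```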
Experiments (suggested)
- Latency vs ring/wait strategy: compare `BusySpinWaitStrategy` to `YieldingWaitStrategy` and `BlockingWaitStrategy`; watch p50/p99 change under the same feed rate.
- Tick pressure: sweep `targetRatePerSec` (1k → 200k) and find the knee where p99 jumps.
- Smoother signal: use bigger moving averages to reduce flips; track net position.
- Position/PnL printouts: add `public double fastMA()` / `slowMA()` getters and log them with each order.
- Thread pinning (Linux): run with `taskset -c 2 java ...` and pin producer/consumer to isolated cores.
- Profiling: record with `-XX:StartFlightRecording=filename=jfr.jfr,dumponexit=true`, or try async-profiler to verify no accidental allocations in the hot path.
Why Disruptor here?
- Single-producer → single-consumer is common in feed/strategy handoffs.
- Lock-free sequence claims + cache-friendly ring beats queues with locks in this pattern.
- It makes back-pressure explicit: when the consumer lags, the producer sees the tail.
This is an educational fit; production systems may add batching, fan-out, IPC, kernel-bypass NICs, and binary encodings.
Disruptor, in plain terms
Disruptor (from LMAX Exchange) is a high-performance in-process messaging pattern built around a pre-allocated ring buffer and sequence counters instead of conventional blocking queues. It aims for very low latency and high throughput with minimal GC.
Why it’s fast
- Pre-allocated events: the ring buffer is filled once; handlers reuse event objects → near-zero allocations on the hot path.
- Sequences, not locks: producers/consumers coordinate with monotonic sequence numbers and memory fences (CAS/volatile), avoiding OS locks.
- Single-writer principle: one producer per sequence stream removes contention on writes.
- Pluggable waiting: consumers use a wait strategy (spin, yield, park, block) to trade CPU for latency.
Core pieces
- `RingBuffer<T>` — fixed-size circular buffer (size must be a power of two).
- `Sequencer` / `Sequence` — coordinate claimed and published positions.
- `EventHandler<T>` — consumer callback (`onEvent`).
- `WorkHandler<T>` / `WorkerPool` — work-queue mode (each event goes to exactly one worker).
- `ProducerType.SINGLE|MULTI` — optimize for one or many producers.
- `WaitStrategy` — how consumers wait for new data.
Publish/consume flow
- Producer claims the next slot (sequence) → writes into the pre-allocated event.
- Producer publishes the sequence; the ring’s cursor advances.
- Consumer(s) wait until their next needed sequence is available → process the event → advance their own sequence.
- Back-pressure: if consumers lag, the producer cannot claim beyond the slowest consumer plus ring capacity.
Tiny example
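A minimal single-producer flow against the LMAX Disruptor API might look like this (a sketch: the event type and its field names are my assumptions, not the repo's code):

```java
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class TinyExample {
    // Mutable event; instances are pre-allocated once and reused in place.
    static final class TickEvent {
        double price;
        long publishNanos;
    }

    public static void main(String[] args) throws Exception {
        Disruptor<TickEvent> disruptor = new Disruptor<>(
                TickEvent::new,              // event factory fills the ring up front
                1024,                        // ring size must be a power of two
                DaemonThreadFactory.INSTANCE,
                ProducerType.SINGLE,         // single-writer optimization
                new BusySpinWaitStrategy()); // lowest latency, burns a core

        // Consumer: runs on its own thread, sees events in sequence order.
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.printf("seq=%d px=%.2f e2e=%dns%n",
                        sequence, event.price,
                        System.nanoTime() - event.publishNanos));

        disruptor.start();

        // Producer: claim a slot -> write into the reused event -> publish.
        RingBuffer<TickEvent> ring = disruptor.getRingBuffer();
        for (int i = 0; i < 5; i++) {
            long seq = ring.next();  // stalls if the ring is full (back-pressure)
            try {
                TickEvent e = ring.get(seq);
                e.price = 100.0 + i;
                e.publishNanos = System.nanoTime();
            } finally {
                ring.publish(seq);   // makes the slot visible to consumers
            }
        }

        disruptor.shutdown();        // drains outstanding events, then halts
    }
}
```

The `try/finally` around `publish` is the canonical claim/write/publish pattern: the slot must be published even if the write throws, or the consumer would stall forever waiting for that sequence.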
Wait strategies (rule-of-thumb)
| Strategy | Latency | CPU use | Notes |
|---|---|---|---|
| `BusySpinWaitStrategy` | lowest | highest | Great for dedicated cores. |
| `YieldingWaitStrategy` | very low | high | Good compromise on busy boxes. |
| Sleeping/Blocking | higher | low | Better for shared servers, less jitter-sensitive. |
When to use it
- Hot, latency-sensitive pipelines (market-data processing, in-process telemetry/log fans, trading signal stages).
- You need predictable GC and millions of events/sec in a single process.
When not to bother
- Typical web APIs/microservices: standard executors/queues are simpler.
- Cross-process distribution: prefer IPC transports (Aeron, shared memory, sockets); Disruptor is in-process.
In this project (YoctoTrader)
- We run single producer → single consumer with `ProducerType.SINGLE` and `BusySpinWaitStrategy`.
- `MarketDataEvent` objects are reused in the ring (no per-tick allocation).
- The pipeline is Feed → Disruptor → Strategy → Risk → Publisher, with latency measured from producer stamp to consumer handling.
Gotchas
- Ring size must be a power of two (bit-masking indexes is part of the speed).
- Back-pressure is real: a slow consumer will stall producers once the ring fills.
- Multiple producers need `ProducerType.MULTI` and careful tuning.
- Treat the Disruptor as a mechanical-sympathy tool: pin threads, keep handlers lean, avoid allocations/synchronization in `onEvent`.
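The first gotcha is worth one line of code: with a power-of-two size, wrapping a sequence onto a ring index is a single bit-mask rather than a modulo (an illustration of the idea, not Disruptor's actual source):

```java
// Why power-of-two ring sizes: index wrap becomes a single AND, no division.
final class RingIndex {
    static int indexOf(long sequence, int ringSize) {
        // Equivalent to sequence % ringSize, but only when ringSize is 2^k.
        return (int) (sequence & (ringSize - 1));
    }
}
```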
Build details
- Shaded jar via `maven-shade-plugin`, so `java -jar` works without external deps.
- Java 17 bytecode by default for wider compatibility; use `-Pjava21` if you want 21.
- Deps: `com.lmax:disruptor`, `org.hdrhistogram:HdrHistogram`, `org.slf4j:slf4j-api` (+ `slf4j-simple` at runtime).
Not a trading system
This is a lab. The feed is synthetic. There’s no exchange connectivity, FIX/SBE, venue-specific throttles, or full risk controls. Treat the outputs like signals in a scope, not anything to trade on.
Next steps
- Swap the feed for Aeron IPC or UDP multicast.
- Replace POJOs with SBE or Chronicle Wire encodings.
- Split producer/consumer into separate processes and measure IPC budgets.
- Add a tiny stateful OMS (acks, timeouts, cancel/reject paths).
- Introduce a backtest that replays a recorded feed deterministically.
Appendix: JVM flags I sometimes use
Remove any flag your JDK complains about; they’re optional.
Copyleft 🄯 YoctoTrader. Educational code; adapt freely. If you improve the labs, I’d love to hear what you measured.