A toy project that shows the flow, not the money. This is for learning about event-driven design and latency. Not trading advice!
What this is
YoctoTrader is a compact, readable pipeline:
- Feed: Gaussian random-walk prices (no I/O).
- Disruptor: single-producer ring buffer; reuses event objects to avoid GC churn.
- Strategy: fast/slow moving-average crossover.
- Risk Gate: simple position cap and order-rate throttle.
- Order Publisher: prints orders (placeholder for an OMS/gateway).
- Latency: end-to-end nanoseconds, summarized with HdrHistogram.
The code is intentionally minimal so you can profile, tweak, and see cause↔effect quickly.
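The Strategy stage is a fast/slow moving-average crossover. A minimal sketch of that idea, with O(1) work per tick (class, method, and window names here are my assumptions, not the repo's actual code):

```java
// Hypothetical sketch of a fast/slow moving-average crossover strategy.
// Names and windows are assumptions, not the repo's actual code.
final class Crossover {
    private final double[] fastBuf, slowBuf;  // circular sample windows
    private double fastSum, slowSum;          // running sums, O(1) per tick
    private int n;                            // ticks seen so far

    Crossover(int fastWindow, int slowWindow) {
        fastBuf = new double[fastWindow];
        slowBuf = new double[slowWindow];
    }

    /** Returns +1 (BUY), -1 (SELL), or 0 (warm-up / no edge) for a new price. */
    int onPrice(double px) {
        fastSum += px - fastBuf[n % fastBuf.length];
        fastBuf[n % fastBuf.length] = px;
        slowSum += px - slowBuf[n % slowBuf.length];
        slowBuf[n % slowBuf.length] = px;
        n++;
        if (n < slowBuf.length) return 0;     // not enough samples yet
        double fast = fastSum / fastBuf.length;
        double slow = slowSum / slowBuf.length;
        return fast > slow ? 1 : fast < slow ? -1 : 0;
    }
}
```

The running-sum trick is why the strategy stays cheap on the hot path: each tick costs two subtractions and two additions, regardless of window size.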
The code
Get the code on GitHub: https://github.com/simocoder/yoctotrader
Docker Hub: `docker run --rm -it simocoder/yoctotrader:latest`
Or GHCR: `docker run --rm -it ghcr.io/simocoder/yoctotrader:latest`
Quick start
Build with Maven, then run the shaded jar with `java -jar` (no external dependencies needed; see Build details below).
You’ll see frequent `ORDER BUY/SELL` lines and occasional latency summaries from HdrHistogram.
If you prefer Java-21 bytecode, build with the `-Pjava21` Maven profile.
Reading the logs
Each order line carries these fields:
- `yoctotrader-20` — thread name from the consumer.
- `ORDER SELL` — strategy signal (fast MA below slow MA ⇒ SELL).
- `qty=1` — fixed size in this toy.
- `px=...` — current simulated mid price.
- `ts=...` — producer nanotime when the tick entered the ring (used for latency).
The stream is chatty because the random walk wiggles constantly and we process at a high tick rate. The RiskGate holds position within ±5 and limits order frequency (default 1 ms gap).
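Those two guards, the position cap and the order-rate throttle, can be sketched as a tiny gate (a sketch under assumed names; the repo's actual RiskGate may differ):

```java
// Hypothetical sketch of the risk gate described above: a net-position cap
// and a minimum nanosecond gap between orders. Names are assumptions.
final class RiskGate {
    private final int maxPos;        // e.g. 5 for the +/-5 cap
    private final long minGapNanos;  // e.g. 1_000_000 for a 1 ms gap
    private int position;
    private long lastOrderNanos;
    private boolean hasTraded;

    RiskGate(int maxPos, long minGapNanos) {
        this.maxPos = maxPos;
        this.minGapNanos = minGapNanos;
    }

    /** Returns true and books the trade if a signed qty passes both checks. */
    boolean allow(int qty, long nowNanos) {
        if (Math.abs(position + qty) > maxPos) return false;  // position cap
        if (hasTraded && nowNanos - lastOrderNanos < minGapNanos)
            return false;                                     // rate throttle
        position += qty;
        lastOrderNanos = nowNanos;
        hasTraded = true;
        return true;
    }

    int position() { return position; }
}
```

Note the gate only mutates state when an order passes, so rejected orders don't reset the throttle clock.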
Design notes (short)
- Single-writer ring: Disruptor excels when one producer hands off to one (or a few) consumers, minimizing locks and cache misses.
- Allocation discipline: `MarketDataEvent` instances are reused; the hot path avoids new allocations.
- Time base: we stamp with `System.nanoTime()` at publish; the consumer computes end-to-end latency against "now".
- Separation of concerns: Strategy → Risk → Publisher are cleanly split so you can swap any piece without touching the others.
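The time-base note can be made concrete with a toy recorder (a stand-in sketch for illustration; the project itself summarizes latencies with HdrHistogram):

```java
import java.util.Arrays;

// Toy latency recorder illustrating the time-base idea: the producer stamps
// System.nanoTime() at publish, the consumer records "now minus stamp".
// This naive version is a stand-in; the project uses HdrHistogram.
final class LatencyRecorder {
    private final long[] samples;
    private int count;

    LatencyRecorder(int capacity) { samples = new long[capacity]; }

    /** Consumer side: latency = now minus the producer's publish stamp. */
    void record(long publishNanos) {
        if (count < samples.length)
            samples[count++] = System.nanoTime() - publishNanos;
    }

    /** p in [0,1]; sort-based percentile, fine for a toy. */
    long percentile(double p) {
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        return sorted[(int) Math.min(count - 1L, Math.round(p * (count - 1)))];
    }
}
```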
Repo layout
Build artifacts land in `target/`.
Parameters you’ll probably tweak
Open `Main.java`:
- Less noise: set `targetRatePerSec` to `10_000`, increase the gap to `10_000_000` (10 ms), or widen the MAs (e.g., 16 vs 64).
- More pressure: shrink the gap, increase the feed rate, try `BusySpin` vs other wait strategies.
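As a rough picture, the knobs above might look like this in code (field names and defaults are purely hypothetical, echoing only the values this section mentions):

```java
// Hypothetical parameter holder; names and defaults are illustrative only,
// not the repo's actual Main.java.
final class Params {
    static final int RING_SIZE = 1 << 14;              // must be a power of two
    static final long TARGET_RATE_PER_SEC = 100_000;   // synthetic feed rate
    static final long MIN_ORDER_GAP_NANOS = 1_000_000; // 1 ms RiskGate throttle
    static final int FAST_WINDOW = 8, SLOW_WINDOW = 32; // moving-average lengths
}
```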
Experiments (suggested)
- Latency vs ring/wait strategy: compare `BusySpinWaitStrategy` to `YieldingWaitStrategy` and `BlockingWaitStrategy`; watch p50/p99 change under the same feed rate.
- Tick pressure: sweep `targetRatePerSec` (1k → 200k) and find the knee where p99 jumps.
- Smoother signal: use bigger moving averages to reduce flips; track net position.
- Position/PnL printouts: add `public double fastMA()` / `slowMA()` getters and log them with each order.
- Thread pinning (Linux): run with `taskset -c 2 java ...` and pin producer/consumer to isolated cores.
- Profiling: record with `-XX:StartFlightRecording=filename=jfr.jfr,dumponexit=true`, or try async-profiler to verify no accidental allocations in the hot path.
Why Disruptor here?
- Single-producer → single-consumer is common in feed/strategy handoffs.
- Lock-free sequence claims + cache-friendly ring beats queues with locks in this pattern.
- It makes back-pressure explicit: when the consumer lags, the producer sees the tail.
This is an educational fit; production systems may add batching, fan-out, IPC, kernel-bypass NICs, and binary encodings.
Disruptor, in plain terms
Disruptor (from LMAX Exchange) is a high-performance in-process messaging pattern built around a pre-allocated ring buffer and sequence counters instead of conventional blocking queues. It aims for very low latency and high throughput with minimal GC.
Why it’s fast
- Pre-allocated events: the ring buffer is filled once; handlers reuse event objects → near-zero allocations on the hot path.
- Sequences, not locks: producers/consumers coordinate with monotonic sequence numbers and memory fences (CAS/volatile), avoiding OS locks.
- Single-writer principle: one producer per sequence stream removes contention on writes.
- Pluggable waiting: consumers use a wait strategy (spin, yield, park, block) to trade CPU for latency.
Core pieces
- `RingBuffer<T>` — fixed-size circular buffer (size must be a power of two).
- `Sequencer` / `Sequence` — coordinate claimed and published positions.
- `EventHandler<T>` — consumer callback (`onEvent`).
- `WorkHandler<T>` / `WorkerPool` — work-queue mode (each event goes to exactly one worker).
- `ProducerType.SINGLE|MULTI` — optimize for one or many producers.
- `WaitStrategy` — how consumers wait for new data.
Publish/consume flow
- Producer claims the next slot (sequence) → writes into the pre-allocated event.
- Producer publishes the sequence; the ring’s cursor advances.
- Consumer(s) wait until their next needed sequence is available → process the event → advance their own sequence.
- Back-pressure: if consumers lag, the producer cannot claim beyond the slowest consumer plus ring capacity.
Tiny example
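A minimal single-producer flow against the LMAX Disruptor API might look like this (a sketch: the event type and its field names are my assumptions, not the repo's code):

```java
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class TinyExample {
    // Mutable event; instances are pre-allocated once and reused in place.
    static final class TickEvent {
        double price;
        long publishNanos;
    }

    public static void main(String[] args) throws Exception {
        Disruptor<TickEvent> disruptor = new Disruptor<>(
                TickEvent::new,              // event factory fills the ring up front
                1024,                        // ring size must be a power of two
                DaemonThreadFactory.INSTANCE,
                ProducerType.SINGLE,         // single-writer optimization
                new BusySpinWaitStrategy()); // lowest latency, burns a core

        // Consumer: runs on its own thread, sees events in sequence order.
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.printf("seq=%d px=%.2f e2e=%dns%n",
                        sequence, event.price,
                        System.nanoTime() - event.publishNanos));

        disruptor.start();

        // Producer: claim a slot -> write into the reused event -> publish.
        RingBuffer<TickEvent> ring = disruptor.getRingBuffer();
        for (int i = 0; i < 5; i++) {
            long seq = ring.next();  // stalls if the ring is full (back-pressure)
            try {
                TickEvent e = ring.get(seq);
                e.price = 100.0 + i;
                e.publishNanos = System.nanoTime();
            } finally {
                ring.publish(seq);   // makes the slot visible to consumers
            }
        }

        disruptor.shutdown();        // drains outstanding events, then halts
    }
}
```

The `try/finally` around `publish` is the canonical claim/write/publish pattern: the slot must be published even if the write throws, or the consumer would stall forever waiting for that sequence.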
Wait strategies (rule-of-thumb)
| Strategy | Latency | CPU use | Notes |
|---|---|---|---|
| `BusySpinWaitStrategy` | lowest | highest | Great for dedicated cores. |
| `YieldingWaitStrategy` | very low | high | Good compromise on busy boxes. |
| Sleeping/Blocking | higher | low | Better for shared servers, less jitter-sensitive. |
When to use it
- Hot, latency-sensitive pipelines (market-data processing, in-process telemetry/log fans, trading signal stages).
- You need predictable GC and millions of events/sec in a single process.
When not to bother
- Typical web APIs/microservices: standard executors/queues are simpler.
- Cross-process distribution: prefer IPC transports (Aeron, shared memory, sockets); Disruptor is in-process.
In this project (YoctoTrader)
- We run single producer → single consumer with `ProducerType.SINGLE` and `BusySpinWaitStrategy`.
- `MarketDataEvent` objects are reused in the ring (no per-tick allocation).
- The pipeline is Feed → Disruptor → Strategy → Risk → Publisher, with latency measured from producer stamp to consumer handling.
Gotchas
- Ring size must be a power of two (bit-masking indexes is part of the speed).
- Back-pressure is real: a slow consumer will stall producers once the ring fills.
- Multiple producers need `ProducerType.MULTI` and careful tuning.
- Treat the Disruptor as a mechanical-sympathy tool: pin threads, keep handlers lean, avoid allocations/synchronization in `onEvent`.
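The first gotcha is worth one line of code: with a power-of-two size, wrapping a sequence onto a ring index is a single bit-mask rather than a modulo (an illustration of the idea, not Disruptor's actual source):

```java
// Why power-of-two ring sizes: index wrap becomes a single AND, no division.
final class RingIndex {
    static int indexOf(long sequence, int ringSize) {
        // Equivalent to sequence % ringSize, but only when ringSize is 2^k.
        return (int) (sequence & (ringSize - 1));
    }
}
```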
Build details
- Shaded jar via `maven-shade-plugin`, so `java -jar` works without external deps.
- Java 17 bytecode by default for wider compatibility; use `-Pjava21` if you want 21.
- Deps: `com.lmax:disruptor`, `org.hdrhistogram:HdrHistogram`, `org.slf4j:slf4j-api` (+ `slf4j-simple` at runtime).
Not a trading system
This is a lab. The feed is synthetic. There’s no exchange connectivity, FIX/SBE, venue-specific throttles, or full risk controls. Treat the outputs like signals in a scope, not anything to trade on.
Next steps
- Swap the feed for Aeron IPC or UDP multicast.
- Replace POJOs with SBE or Chronicle Wire encodings.
- Split producer/consumer into separate processes and measure IPC budgets.
- Add a tiny stateful OMS (acks, timeouts, cancel/reject paths).
- Introduce a backtest that replays a recorded feed deterministically.
Appendix: JVM flags I sometimes use
Remove any flag your JDK complains about; they’re optional.
Copyleft 🄯 YoctoTrader. Educational code; adapt freely. If you improve the labs, I’d love to hear what you measured.