An experimental low latency, high throughput order book in Go.
Because it's fun!
The initial goal of the project was to see how far I could take a practical low-latency workload in a GC'd language like Go, which seems to have been (mostly) achieved.
Add/Cancel latency is pretty good (see below) and the jitter is not too bad. The impact of GC has been reduced to nonexistence. However, there is still room for improvement in jitter caused by the Go scheduler.
- Simple API
- Standard price-time priority
- Market and limit orders
- Order cancellation. (No in-book updates. Updates will have to be handled with Cancel+Create, and all that entails)
- Stop loss / take profit orders (limit and market)
- AoN, IoC, FoK, etc. Probably not trailing stops. They're probably better handled outside the order book.
- Snapshot the order book state for recovery
- Handle any GC latency shenanigans
- Extensive tests and benchmarks
- Add metrics counters
- Improve latency consistency
- Extremely high throughput (see below)
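To make the "standard price-time priority" item above concrete, here is a minimal sketch of the ordering rule for resting bids. The Order type, its fields, and SortBids are hypothetical names chosen for illustration and are not this project's API. A real order book would keep per-price queues rather than sort a slice; the sort is only there to show the comparison.

```go
package main

import (
	"fmt"
	"sort"
)

// Order is a hypothetical resting limit order; the field names are illustrative only.
type Order struct {
	ID    int
	Price int64  // price in integer ticks
	Seq   uint64 // arrival sequence number; lower means earlier
}

// SortBids arranges resting bids by price-time priority:
// highest price first, and earliest arrival first among equal prices.
func SortBids(bids []Order) {
	sort.SliceStable(bids, func(i, j int) bool {
		if bids[i].Price != bids[j].Price {
			return bids[i].Price > bids[j].Price
		}
		return bids[i].Seq < bids[j].Seq
	})
}

func main() {
	bids := []Order{
		{ID: 1, Price: 100, Seq: 1},
		{ID: 2, Price: 101, Seq: 2},
		{ID: 3, Price: 101, Seq: 3},
		{ID: 4, Price: 99, Seq: 4},
	}
	SortBids(bids)
	for _, o := range bids {
		fmt.Println(o.ID, o.Price, o.Seq)
	}
}
```

For asks the price comparison flips (lowest price first), while the time tiebreak stays the same.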
On a 2.1 GHz base frequency 12th Gen i7 with 5200 MHz LPDDR5, Hyperthreading off, Turbo off, and GOMAXPROCS=2.
Results are generally similar whether the test runs without OS thread locking or in a goroutine locked to an OS thread that in turn runs on an isolated CPU core, on an unmodified -generic Ubuntu kernel. Results are also similar with GOMAXPROCS=1.
Other threads run by the Go scheduler don't seem to make a notable difference whether run on isolated cores or not, as long as they're not all pinned to the same core.
The nohz_full column represents tests run with GOMAXPROCS=1 on a nohz_full isolated core on a -generic Ubuntu kernel compiled with nohz_full, with all possible IRQs moved off that core. The primary thread was pinned to this isolated core and all other threads were left on non-isolated cores.
The sched_manual column represents the nohz_full settings described above along with calls to runtime.Gosched() before and after the calls being benchmarked. This is because the majority of the large latency spikes seem to be introduced by Go scheduler pauses between runs. I have not found a way to get rid of these. Running Gosched() seems to mitigate them to the point where pinning the goroutine to a thread significantly stabilizes the jitter. However, this comes at about a 10% cost to throughput and increased latencies at p50/p90 due to increased context switching, which in matching engines might be an acceptable tradeoff for reduced jitter.
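The sched_manual approach can be sketched roughly as follows. This is an assumption about the shape of such a harness, not the project's actual benchmark code: MeasureYielding is a hypothetical helper, and the timed operation here is a stand-in closure rather than a real AddOrder or CancelOrder call. CPU pinning and nohz_full isolation happen outside the process and are not shown.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// MeasureYielding times n calls of op, yielding to the Go scheduler before and
// after each timed call so that scheduler work is less likely to land inside
// the measured window. It returns one latency sample per call.
func MeasureYielding(n int, op func()) []time.Duration {
	// Keep this goroutine on its current OS thread for the whole run.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	samples := make([]time.Duration, n)
	for i := 0; i < n; i++ {
		runtime.Gosched() // yield before the timed call
		start := time.Now()
		op()
		samples[i] = time.Since(start)
		runtime.Gosched() // yield after the timed call
	}
	return samples
}

func main() {
	counter := 0
	// Stand-in workload; a real run would call into the order book here.
	samples := MeasureYielding(1000, func() { counter++ })
	fmt.Println(len(samples), counter)
}
```

Note that time.Now() has its own overhead, so for operations in the tens of nanoseconds a real harness would likely need to account for timer cost or time batches of calls.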
AddOrder | Latency (approx.) | nohz_full | sched_manual |
---|---|---|---|
p50 | 170ns | 164ns | 546ns |
p99 | 229ns | 256ns | 735ns |
p99.99 | 2.4us | 2.3us | 2.4us |
p99.9999 | 12us | 6.5us | 7.9us |
Max | 36us | 15us | 8.5us |
CancelOrder | Latency (approx.) | nohz_full | sched_manual |
---|---|---|---|
p50 | 35ns | 33ns | 50ns |
p99 | 50ns | 50ns | 59ns |
p99.99 | 86ns | 60ns | 165ns |
p99.9999 | 3.9us | 3.1us | 1.9us |
Max | 25us | 6.8us | 1.9us |
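For readers who want to reproduce percentile rows like the ones in the tables above from raw samples, here is one possible way to compute them. The nearest-rank method is an assumption, since the method used for the published numbers is not stated, and Percentile and SyntheticSamples are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// Percentile returns the nearest-rank percentile p (0 < p <= 100) of samples.
// It sorts a copy, so the caller's slice is left untouched.
func Percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	// Nearest rank: the smallest rank covering at least p percent of samples.
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// SyntheticSamples returns n fake latencies of n ns down to 1 ns, for demonstration.
func SyntheticSamples(n int) []time.Duration {
	out := make([]time.Duration, 0, n)
	for i := n; i >= 1; i-- {
		out = append(out, time.Duration(i)*time.Nanosecond)
	}
	return out
}

func main() {
	samples := SyntheticSamples(100)
	fmt.Println(Percentile(samples, 50), Percentile(samples, 99), Percentile(samples, 100))
}
```

Resolving p99.9999 meaningfully needs at least a million samples; with fewer, the nearest-rank value simply collapses to the maximum.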
- 12.5 million Order Add/Cancel per second:
  - 2.1 GHz Base Frequency 12th Gen i7 with 5200 MHz LPDDR5
  - Turbo Boost disabled
  - Hyperthreading disabled
- 21 million Order Add/Cancel per second:
  - 2.1 GHz Base Frequency 12th Gen i7 with 5200 MHz LPDDR5
  - Turbo Boost enabled, 4.7 GHz
  - Hyperthreading disabled
- 8 decimal places due to decimal library used. This should be fine for most use cases.
- Consider LMAX Disruptor to maintain the throughput while post-processing with a matching engine, although this level of throughput is probably not necessary for most use cases.
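To illustrate what an 8-decimal-place limit means in practice, the sketch below parses a non-negative decimal string into an integer scaled by 10^8 and rejects anything finer. This is not how the decimal library used by this project represents values; ParseFixed8 and the scaled-int64 representation are assumptions made purely for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// scale is 10^8: one unit of the integer representation equals 0.00000001.
const scale uint64 = 100_000_000

// ParseFixed8 converts a non-negative decimal string such as "1.5" into an
// int64 scaled by 10^8. Inputs with more than 8 fractional digits are rejected.
func ParseFixed8(s string) (int64, error) {
	whole, frac, _ := strings.Cut(s, ".")
	if len(frac) > 8 {
		return 0, errors.New("more than 8 decimal places")
	}
	frac += strings.Repeat("0", 8-len(frac)) // right-pad to exactly 8 digits

	w, err := strconv.ParseUint(whole, 10, 64)
	if err != nil {
		return 0, err
	}
	f, err := strconv.ParseUint(frac, 10, 64)
	if err != nil {
		return 0, err
	}
	// Guard against overflow of the int64 result.
	if w > (math.MaxInt64-f)/scale {
		return 0, errors.New("value out of range")
	}
	return int64(w*scale + f), nil
}

func main() {
	for _, s := range []string{"1.5", "0.00000001", "0.123456789"} {
		v, err := ParseFixed8(s)
		fmt.Println(s, v, err)
	}
}
```

With this particular representation, the smallest expressible price or quantity step is 0.00000001, and the largest value is a little over 92 billion.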
You probably shouldn't use this as-is. At a minimum, it needs more thorough test coverage.
That said, it should be extremely simple to use. Create an OrderBook object as a starting point.
Please create an issue. Also PRs are welcome!