High Performance Order Book for matching engines

An experimental low latency, high throughput order book in Go.

Why?

Because it's fun!

Why Go?

The initial goal of the project was to see how far I could take a practical low-latency workload in a GC'd language like Go, which seems to have been (mostly) achieved.

Add/Cancel latency is pretty good (see below) and the jitter is not too bad. The impact of GC has been reduced to nonexistence. However, there is still room for improvement in jitter caused by the Go scheduler.

Features

  • Simple API
  • Standard price-time priority
  • Market and limit orders
  • Order cancellation (no in-book updates; updates have to be handled as Cancel+Create, and all that entails)
  • Stop loss / take profit orders (limit and market)
  • AoN, IoC, FoK, etc. Probably not trailing stops. They're probably better handled outside the order book.
  • Snapshot the orderbook state for recovery
  • Handle any GC latency shenanigans
  • Extensive tests and benchmarks
  • Add metrics counters
  • Improve latency consistency
  • Extremely high throughput (see below)

Latency

On a 2.1 GHz base frequency 12th Gen i7 with 5200 MHz LPDDR5, Hyperthreading off, Turbo off, and GOMAXPROCS=2:

Results are generally similar whether the test is run without OS thread locking or in a goroutine locked to an OS thread that is itself run on an isolated CPU core on an unmodified -generic Ubuntu kernel. Results are also similar with GOMAXPROCS=1.

Other threads created by the Go runtime don't seem to make a notable difference whether they run on isolated cores or not, as long as they're not all pinned to the same core.

The nohz_full column represents tests run with GOMAXPROCS=1 on a nohz_full-isolated core, on a -generic Ubuntu kernel built with nohz_full and with all possible IRQs moved off that core. The primary thread was pinned to this isolated core and all other threads were left on non-isolated cores.

The sched_manual column represents the nohz_full settings above plus calls to runtime.Gosched() before and after the calls being benchmarked. The majority of the large latency spikes appear to be introduced by Go scheduler pauses between runs, and I have not found a way to eliminate them. Calling Gosched() mitigates them to the point where pinning the goroutine to an OS thread significantly stabilizes the jitter. However, this comes at roughly a 10% cost to throughput and increased p50/p90 latencies due to added context switching, which in a matching engine might be an acceptable tradeoff for reduced jitter.
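
As a rough illustration (not the actual benchmark code), the sched_manual pattern looks something like the sketch below; `addOrder` is a hypothetical stand-in for the AddOrder/CancelOrder call being measured:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// measureWithYields sketches the sched_manual approach: yield to the Go
// scheduler immediately before and after the measured call so that scheduler
// pauses are less likely to land inside the timed window. addOrder is a
// hypothetical placeholder for the order book call being benchmarked.
func measureWithYields(addOrder func()) time.Duration {
	runtime.Gosched() // yield before the measured call
	start := time.Now()
	addOrder()
	elapsed := time.Since(start)
	runtime.Gosched() // yield after the measured call
	return elapsed
}

func main() {
	d := measureWithYields(func() { /* stand-in for an AddOrder call */ })
	fmt.Println(d)
}
```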

| AddOrder Latency | (approx.) | nohz_full | sched_manual |
| --- | --- | --- | --- |
| p50 | 170ns | 164ns | 546ns |
| p99 | 229ns | 256ns | 735ns |
| p99.99 | 2.4us | 2.3us | 2.4us |
| p99.9999 | 12us | 6.5us | 7.9us |
| Max | 36us | 15us | 8.5us |

| CancelOrder Latency | (approx.) | nohz_full | sched_manual |
| --- | --- | --- | --- |
| p50 | 35ns | 33ns | 50ns |
| p99 | 50ns | 50ns | 59ns |
| p99.99 | 86ns | 60ns | 165ns |
| p99.9999 | 3.9us | 3.1us | 1.9us |
| Max | 25us | 6.8us | 1.9us |

Throughput

  • 12.5 million Order Add/Cancel per second:

    • 2.1 GHz base frequency 12th Gen i7 with 5200 MHz LPDDR5
    • Turbo Boost disabled
    • Hyperthreading disabled
  • 21 million Order Add/Cancel per second:

    • 2.1 GHz base frequency 12th Gen i7 with 5200 MHz LPDDR5
    • Turbo Boost enabled (4.7 GHz)
    • Hyperthreading disabled

Limitations

  • 8 decimal places due to the decimal library used. This should be fine for most use cases (see the sketch after this list for what that precision means in practice).
  • Consider an LMAX Disruptor to maintain the throughput while post-processing with a matching engine, although this level of throughput is probably not necessary for most use cases.
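
For illustration only, here is a hypothetical sketch of what an 8-decimal-place fixed-point representation implies; it is not the decimal library's actual API:

```go
package main

import "fmt"

// scale is the fixed-point factor implied by 8 decimal places (1e8).
// This is an illustrative representation, not the decimal library's API.
const scale = 100_000_000

func main() {
	// 100.25 stored as a scaled integer: 100.25 * 1e8 = 10_025_000_000.
	price := int64(10_025_000_000)
	fmt.Printf("%d.%08d\n", price/scale, price%scale) // prints 100.25000000

	// Anything finer than 1e-8 (e.g. 0.000000001) cannot be represented
	// exactly and would have to be rounded or rejected.
}
```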

How do I use this?

You probably shouldn't use this as-is. At a minimum, it would ideally need more thorough test coverage first.

That said, it should be extremely simple to use. Create an OrderBook object as a starting point; a hypothetical sketch of what that might look like follows.
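
The sketch below is for orientation only: the constructor, method signatures, and types are assumptions defined locally so the example compiles, not the package's actual API (the real OrderBook type lives in github.com/geseq/orderbook; check its GoDoc for the true signatures):

```go
package main

import "fmt"

// Side, OrderBook, and the method signatures below are hypothetical stand-ins
// mirroring the general shape of usage (create a book, add orders, cancel by
// id). The real API in github.com/geseq/orderbook may differ.
type Side int

const (
	Buy Side = iota
	Sell
)

type OrderBook struct{}

func NewOrderBook() *OrderBook { return &OrderBook{} }

// AddOrder submits a limit order; qty and price are integers scaled by 1e8,
// matching the 8-decimal-place precision noted under Limitations.
func (b *OrderBook) AddOrder(id uint64, side Side, qty, price uint64) {
	fmt.Printf("add %d: side=%v qty=%d price=%d\n", id, side, qty, price)
}

// CancelOrder removes a resting order by id (updates are Cancel+Create).
func (b *OrderBook) CancelOrder(id uint64) {
	fmt.Printf("cancel %d\n", id)
}

func main() {
	book := NewOrderBook()
	book.AddOrder(1, Buy, 10_00000000, 100_25000000) // buy 10 @ 100.25
	book.CancelOrder(1)
}
```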

There's a bug

Please create an issue. PRs are also welcome!
