This software allows the user to reconstruct the state of the limit order-book from low-level tick-data provided by the London Stock Exchange (LSE). The tick-data can be hosted in either MySQL or Apache HBase, and tools are provided for loading the data into either of these back-ends from the compressed raw files provided by the LSE. Once the data has been loaded, events corresponding to a particular asset and a particular date-range can be replayed through an order-book simulator in order to reconstruct the state of the book. Variables such as the mid-price can then be recorded as a time-series in CSV format. Alternatively, the simulator can be run directly from a Python client using an Apache Thrift API.
The software is written in Scala and Java, along with various Unix shell scripts which automate the import process.
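The idea of replaying events to rebuild the book can be illustrated with a toy sketch in Python. It is not the project's simulator: the event tuples ("add" and "delete"), the class ToyOrderBook and the function replay are hypothetical names chosen for illustration, and real LSE tick-data contains further event types (for example trades and order modifications) that are ignored here. The mid-price is taken to be the average of the best bid and the best ask.

```python
class ToyOrderBook:
    """Toy limit order-book keyed by order id (illustration only)."""

    def __init__(self):
        # order_id -> (side, price, size); side is "buy" or "sell"
        self.orders = {}

    def add(self, order_id, side, price, size):
        self.orders[order_id] = (side, price, size)

    def remove(self, order_id):
        # Deleting an unknown order is silently ignored in this sketch.
        self.orders.pop(order_id, None)

    def best_bid(self):
        bids = [price for side, price, _ in self.orders.values() if side == "buy"]
        return max(bids) if bids else None

    def best_ask(self):
        asks = [price for side, price, _ in self.orders.values() if side == "sell"]
        return min(asks) if asks else None

    def mid_price(self):
        bid, ask = self.best_bid(), self.best_ask()
        if bid is None or ask is None:
            return None  # undefined while one side of the book is empty
        return (bid + ask) / 2.0


def replay(events):
    """Apply hypothetical events in order and record the mid-price after each."""
    book = ToyOrderBook()
    mids = []
    for event in events:
        if event[0] == "add":
            _, order_id, side, price, size = event
            book.add(order_id, side, price, size)
        elif event[0] == "delete":
            book.remove(event[1])
        mids.append(book.mid_price())
    return mids


if __name__ == "__main__":
    demo = [
        ("add", 1, "buy", 99.0, 10),
        ("add", 2, "sell", 101.0, 5),
        ("add", 3, "buy", 100.0, 1),
        ("delete", 2),
    ]
    print(replay(demo))
```

The resulting list has one entry per event, which is the shape of the time-series that a replay produces; entries are None whenever either side of the toy book is empty.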
- Oracle Java JVM 1.7.0 or higher. Note that the default JVM installed on macOS or Linux needs to be replaced by the Oracle version in order for the software to work correctly.
- If running on Windows, you will need to install Cygwin in order to execute the shell scripts.
- (Optional) In order to build the software from source, you will need the Scala build tool (sbt); see the sbt documentation.
- (Optional) In order to host the data, you will need to install Apache HBase version 1.1.2. Alternatively, the software can connect to an existing server which already hosts the data.
- (Optional) The recommended Integrated Development Environment (IDE) for working on the project is IntelliJ IDEA with the Scala plugin installed.
Open the file hbase-site.xml in the directory etc/ using a text-editor and
check that the hbase.master and hbase.zookeeper.quorum properties point to the
machine running Apache HBase. For example, the configuration below can be
used to connect to the machine with hostname cseesp1.essex.ac.uk.
Alternatively, to connect to your own laptop running HBase in stand-alone mode,
replace cseesp1.essex.ac.uk with localhost.
<configuration>
<property>
<name>hbase.master</name>
<value>cseesp1.essex.ac.uk</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>cseesp1.essex.ac.uk</value>
</property>
</configuration>
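As a quick sanity check, the two properties can be read back with a few lines of Python using only the standard library. This is a minimal sketch and not part of the distribution: the function name read_properties is hypothetical, and the path etc/hbase-site.xml assumes the script is run from the top-level directory.

```python
import xml.etree.ElementTree as ET


def read_properties(root):
    """Return a dict of name -> value for each <property> under <configuration>."""
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


if __name__ == "__main__":
    # Assumed location of the configuration file, relative to the current directory.
    root = ET.parse("etc/hbase-site.xml").getroot()
    props = read_properties(root)
    for key in ("hbase.master", "hbase.zookeeper.quorum"):
        print(key, "=", props.get(key, "<not set>"))
```

Both values should name the machine running HBase, or localhost for a stand-alone installation.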
To compile the source-code to separate .class files, execute the following command:
sbt compile
To create jar files and the script files:
sbt pack
Execute the following commands in the shell to install the scripts into the directory ~/local/bin:
cd target/pack/bin
make install
If ~/local/bin is not already in your PATH environment variable, add a command
similar to the following to the file ~/.profile:
export PATH=$PATH:~/local/bin
The script replay-orders can then be used to retrieve a univariate time-series of prices.
The following example will replay all recorded events for the asset with given ISIN and provide a GUI visualisation of the order-book.
replay-orders -t GB0009252882 --with-gui
The following will replay a subset of events over a given date-range:
replay-orders -t GB0009252882 --with-gui \
--start-date 5/6/2007 --end-date 6/6/2007
The following command will log the mid-price to a CSV file called hf.csv, but
will not provide a GUI:
replay-orders -t GB0009252882 --property midPrice \
--start-date 5/6/2007 --end-date 6/6/2007 -o hf.csv
The following command will log transaction prices to a CSV file called hf.csv:
replay-orders -t GB0009252882 --property lastTransactionPrice -o hf.csv
To get the full list of options use the built-in help:
replay-orders --help
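Once a file such as hf.csv has been written, it can be loaded for further analysis. The sketch below is hedged: the column layout of the CSV output is not described here, so it assumes that each row holds a timestamp in the first column and the recorded value in the second, and it skips any row whose value is not numeric (for example a header row). The function names load_series and simple_returns are hypothetical.

```python
import csv


def load_series(lines, time_col=0, value_col=1):
    """Parse CSV rows into (timestamps, values); the column positions are assumptions."""
    times, values = [], []
    for row in csv.reader(lines):
        if len(row) <= max(time_col, value_col):
            continue  # row too short to contain both columns
        try:
            value = float(row[value_col])
        except ValueError:
            continue  # non-numeric value, e.g. a header row
        times.append(row[time_col])
        values.append(value)
    return times, values


def simple_returns(values):
    """Relative change between consecutive values; pairs starting at zero are skipped."""
    return [(b - a) / a for a, b in zip(values, values[1:]) if a != 0]


if __name__ == "__main__":
    with open("hf.csv", newline="") as handle:
        times, prices = load_series(handle)
    print(len(prices), "observations")
    print(simple_returns(prices)[:5])
```

If the actual layout differs, adjust time_col and value_col accordingly.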
The simulator provides an Apache Thrift API which allows clients written in non-JVM languages to call the reconstructor. To start the server, run the following script:
order-replay-service
By default, the server will listen on TCP port 9090. To see the configuration options, run:
start-replay-server.sh --help
For an example of using the API from Python, see the script tickdata.py.
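Before running a client, it can be useful to confirm that the server is reachable. The following minimal sketch only tests whether a TCP connection can be opened on the default port 9090; it does not speak the Thrift protocol or call the API. The function name port_is_open and the default host localhost are assumptions made for illustration.

```python
import socket


def port_is_open(host="localhost", port=9090, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Replay server is accepting connections on port 9090")
    else:
        print("Nothing is listening on port 9090; is the server running?")
```

A successful connection shows only that something is listening on the port, not that it is the replay server.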
- The data description provided by the LSE
- The API documentation
To import the project as an IntelliJ IDEA project, first install the Scala
plugin, and then directly import the build.sbt file as a new project.
- Install Apache HBase 1.1.2 in standalone mode.
- Modify the file base-config.xml in the etc/ directory of the folder where you unpacked the lse-data distribution as follows:
<configuration>
<property>
<name>hbase.master</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
</configuration>
- Create an empty table called events with column family data using the HBase shell:
cd /opt/hbase/bin
./hbase shell
create 'events', 'data'
- Run the shell script hbase-import.sh, specifying the raw files to import:
cd ./scripts
./import-data-lse.sh ../data/lse/*.CSV.gz
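Before starting a long import, it can help to check that a raw file decompresses cleanly. The sketch below prints the first few lines of one compressed file using Python's standard gzip module. It makes no assumption about the meaning of the columns; the function name peek, the example file name and the UTF-8 encoding are assumptions (undecodable bytes are replaced rather than raising an error).

```python
import gzip
from itertools import islice


def peek(path, n=5, encoding="utf-8"):
    """Return the first n lines of a gzip-compressed text file, without line endings."""
    with gzip.open(path, "rt", encoding=encoding, errors="replace") as handle:
        return [line.rstrip("\r\n") for line in islice(handle, n)]


if __name__ == "__main__":
    # Hypothetical file name; substitute one of the raw files supplied by the LSE.
    for line in peek("../data/lse/example.CSV.gz"):
        print(line)
```

Only the requested number of lines is read, so the check is quick even for large files.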
(C) Steve Phelps 2016