A Node.js data pipeline to scrape Serum DEX events and push them to an S3 bucket for use in a Snowflake cluster.
Event schema derived from SerumTaxTime, node pipeline architecture inspired by 0x Data Pipeline, Serum event scraper code from Mango Markets' Serum History.
The basic logic of the scraper works like this (sketched in code after the list):
- Pull the Serum markets from Serum's repo
- Iterate over the markets, scrape the new trades, and append them to a CSV
- Check whether the CSV is over 250 MB (Snowflake's recommended batch size)
- If the CSV is over 250 MB, upload it to S3 (Snowpipe takes over from there) and start a new CSV for the scrapers to append to.
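
A minimal sketch of that loop in TypeScript, assuming hypothetical helpers `scrapeNewTrades` and `uploadToS3` (stand-ins for illustration, not the pipeline's actual function names):

```ts
import * as fs from "fs";

// Snowflake's recommended batch size of roughly 250 MB per file.
const BATCH_SIZE_BYTES = 250 * 1024 * 1024;

// Hypothetical stand-ins for the real scraper and uploader.
async function scrapeNewTrades(market: string): Promise<string[]> {
  return []; // the real scraper decodes new fill events from the market's event queue
}
async function uploadToS3(localPath: string): Promise<void> {
  // the real pipeline pushes the file to the S3 bucket watched by Snowpipe
}

async function runOnce(markets: string[], csvPath: string): Promise<void> {
  // Iterate over the markets and append any new trades to the current CSV.
  for (const market of markets) {
    const rows = await scrapeNewTrades(market);
    if (rows.length > 0) {
      fs.appendFileSync(csvPath, rows.join("\n") + "\n");
    }
  }

  // Once the CSV passes the batch threshold, hand it off to S3 and start fresh.
  if (fs.existsSync(csvPath) && fs.statSync(csvPath).size > BATCH_SIZE_BYTES) {
    await uploadToS3(csvPath);
    fs.writeFileSync(csvPath, ""); // new CSV for the scrapers to append to
  }
}
```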
The Serum DEX scraper depends on the structure of the Serum DEX contracts. The contracts store all filled orders in a rotating buffer, and each order pushed onto the queue ratchets the Sequence Number up by one. You can therefore take the difference between the Sequence Number in the event queue's header (fetched over JSON RPC) and the last one you saw, and pull the slice of the response that corresponds to that number of new events.
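
As an illustration, a hedged sketch of that sequence-number bookkeeping (the `EventQueue`/`FillEvent` shapes below are assumptions; the real decoding uses the Serum event-queue layouts):

```ts
// Assumed shape of a decoded event queue: the header carries the running
// Sequence Number, and `events` is the rotating buffer of fill events
// (assumed ordered oldest to newest here).
interface FillEvent {
  // price, size, side, etc. from the decoded Serum layout
  [field: string]: unknown;
}
interface EventQueue {
  headerSeqNum: number;
  events: FillEvent[];
}

// Only the newest (headerSeqNum - lastSeqNum) entries are unseen since the
// last poll; clamp in case more events landed than the buffer can hold.
function takeNewEvents(queue: EventQueue, lastSeqNum: number): FillEvent[] {
  const newCount = queue.headerSeqNum - lastSeqNum;
  if (newCount <= 0) return [];
  return queue.events.slice(-Math.min(newCount, queue.events.length));
}
```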
Each scraper is responsible for a single market: it keeps track of the last Sequence Number it saw in a file in pipeline/, pulls the new events based on that watermark, and then writes the new watermark back to the file.
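
A small sketch of that watermark handling, assuming one file per market under pipeline/ (the file-naming scheme here is hypothetical):

```ts
import * as fs from "fs";
import * as path from "path";

// Hypothetical naming: one watermark file per market under pipeline/.
function watermarkPath(marketName: string): string {
  return path.join("pipeline", `${marketName}.seq`);
}

// A missing file means this market has never been scraped before.
function readLastSeqNum(marketName: string): number {
  const file = watermarkPath(marketName);
  return fs.existsSync(file) ? Number(fs.readFileSync(file, "utf8")) : 0;
}

// After appending the new events to the CSV, persist the new watermark.
function writeLastSeqNum(marketName: string, seqNum: number): void {
  fs.writeFileSync(watermarkPath(marketName), String(seqNum));
}
```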
Use `yarn` to install.

Set up `.env` in the same folder as `sample.env`.

Use `yarn dev` to run.