4.) Internal Dataflow

Andrew Plaza edited this page Jan 26, 2022 · 18 revisions

Substrate Archive uses the Actor Model to model the flow of data from a Substrate chain into a PostgreSQL database. This makes it conceptually and programmatically easier to process the data concurrently. Rustaceans may know the concept from actix-web, which builds a web framework on top of it. Although substrate-archive does not use the underlying actix library (it uses xtra, a lightweight actix-inspired actor library), the concepts are the same.

This document describes how Substrate Archive's actors work together to deliver data into PostgreSQL. My goal is to make it easier for new contributors to become familiar with how the code works.

Why?

  • Actors make it easy to model dataflow in a consistent way.
  • Actors are easy to add or remove when requirements call for transforming the data differently.
    • For example, adding a GraphQL overlay or decoding for types only means creating a new actor and modifying some references in the workers. You do not need to care what the other actors do internally, only which message types they require in order to function.
  • Concurrency: data gathering can be made concurrent while adhering to a very structured strategy.
    • For example, the Actor Model abstraction allows for 'Remote Actors', which could speed up data gathering by having actors on another machine on the network insert into the same database.
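The point about adding or removing actors can be illustrated with a minimal sketch. It does not use xtra or any substrate-archive code: plain `std::sync::mpsc` channels and threads stand in for actors, and every name here (`spawn_stage`, `to_hex`, `run`, the `decoded:` prefix) is hypothetical. Each stage only knows the type it receives and the type it sends, so an extra "decoding" stage can be slotted in without touching the others.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical helper: spawns an actor-like stage on its own thread.
/// The stage knows nothing about its neighbours except the message
/// types `I` (what it receives) and `O` (what it sends on).
fn spawn_stage<I, O, F>(input: Receiver<I>, mut handle: F) -> Receiver<O>
where
    I: Send + 'static,
    O: Send + 'static,
    F: FnMut(I) -> O + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Runs until every sender feeding `input` has been dropped.
        for msg in input {
            if tx.send(handle(msg)).is_err() {
                break; // the next stage has gone away
            }
        }
    });
    rx
}

/// Hypothetical transformation: hex-encode raw bytes.
fn to_hex(bytes: Vec<u8>) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Builds the pipeline with or without an extra stage. The extra stage
/// is a stand-in for real decoding; it only tags the string.
fn run(raw: Vec<Vec<u8>>, decode: bool) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let hex = spawn_stage(rx, to_hex);
    let out: Receiver<String> = if decode {
        // Adding an actor: only the wiring changes, not the other stages.
        spawn_stage(hex, |h: String| format!("decoded:{}", h))
    } else {
        hex
    };
    for item in raw {
        tx.send(item).expect("first stage stopped early");
    }
    drop(tx); // closing the input lets every stage finish in turn
    out.into_iter().collect()
}
```

Calling `run(data, false)` and `run(data, true)` exercises the same first stage. The only difference is one extra `spawn_stage` call in the wiring, which is the property the list above describes.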

Directional Dataflow

The actor model means that data in substrate-archive will only ever flow one way, and each actor only knows about the data through the messages it receives. This keeps state and state changes inside an actor to a minimum. Generally, each actor receives some form of data, transforms it into another form, and sends it to the next actor. Each actor manages its own state. Shared state takes the form of the external connections substrate-archive needs in order to function: the Postgres database, the Substrate client, or the RocksDB database. Generally, an actor uses only one of these connections.
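The receive, transform, send pattern described above can be sketched as follows. This is an illustration under stated assumptions, not substrate-archive's implementation: it uses standard-library channels and threads in place of xtra actors, and the types `RawBlock` and `BlockRow`, the function `run_pipeline`, and the in-memory `Vec` that stands in for the database are all hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical message: a block as it might arrive from the chain.
#[derive(Debug)]
struct RawBlock {
    number: u32,
    payload: Vec<u8>,
}

/// Hypothetical message: a row ready to be stored.
#[derive(Debug, PartialEq)]
struct BlockRow {
    number: u32,
    size: usize,
}

/// Runs a one-way pipeline: source -> transformer -> sink.
/// Returns how many messages the transformer saw and what the sink stored.
fn run_pipeline(blocks: Vec<RawBlock>) -> (usize, Vec<BlockRow>) {
    let (raw_tx, raw_rx) = mpsc::channel::<RawBlock>();
    let (row_tx, row_rx) = mpsc::channel::<BlockRow>();

    // Transformer actor: receives one form of data, sends another.
    // Its only state is a private counter that no other actor can touch.
    let transformer = thread::spawn(move || {
        let mut seen = 0usize;
        for raw in raw_rx {
            seen += 1;
            let row = BlockRow { number: raw.number, size: raw.payload.len() };
            if row_tx.send(row).is_err() {
                break; // the sink has gone away
            }
        }
        seen
    });

    // Sink actor: the only stage holding the "database" connection,
    // modelled here as a plain Vec.
    let sink = thread::spawn(move || {
        let mut stored = Vec::new();
        for row in row_rx {
            stored.push(row);
        }
        stored
    });

    // Source: pushes data in one direction only and never reads back.
    for block in blocks {
        raw_tx.send(block).expect("transformer stopped early");
    }
    drop(raw_tx); // closing the channel shuts the pipeline down in order

    let seen = transformer.join().expect("transformer panicked");
    let stored = sink.join().expect("sink panicked");
    (seen, stored)
}
```

Because each stage owns its receiver and its state, the only way to influence a stage is to send it a message. Dropping the first sender is enough to shut down the whole chain in order.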

Data Diagram

Diagram