
HASH Engine

HASH Engine (hEngine) is the computational simulation engine at the heart of HASH. It is publicly and freely available here under the Elastic License.

This public version of hEngine is our 'alpha' engine, whose architecture and performance characteristics differ significantly from those of the stable engine powering hCore, our in-browser simulation environment, and hCloud, our cloud simulation compute service. It is not yet stable.

Our ultimate intention is to migrate both hCore and hCloud to use the engine under development here (upon its first stable release).

Issue Tracking

We use GitHub Issues to help prioritize and track bugs and feature requests. Please use the HASH Engine Issue form to help us deal with your report most effectively.

Additional Documentation

Our user guide for simulation contains a beginner's introduction as well as in-depth tutorials for hCore today.

The HASH glossary contains helpful explainers around key modeling, simulation and AI-related terms and concepts.

The State of Development

As outlined above, this project is the next generation of our simulation engine and differs from the one currently powering hCore and hCloud. It's published here as a pre-release technology preview, so the feature set and codebase should be considered unstable until a stable release. As a result, a number of features you may use on the HASH platform may not yet be supported by this project, notably:

  • Rust runners, and therefore Rust behaviors (which are generally a subset of the @hash behaviors found within hIndex), are currently disabled. This is a high-priority item for us and will be one of our main development focuses in the near future. A large part of the implementation is finished and can be found in this repository if you are interested in exploring it (although, as it is not completely finished, expect to find bugs).

A number of other features of the HASH platform may still be under development or unstable within the current repository. Feel free to try things out, but don't be dissuaded if they don't work yet. We don't want to make any guarantees until we've had time to properly test features, and for now we're prioritising development to get those features out!

  • For now, simulations are easiest to run as 'single runs' (more in-depth usage documentation can be found below in Running for development). Various Experiment types have not been fully tested at the moment, and documentation and support may be lacking.
  • Analysis views are also untested at the moment and thus presently are not considered stable or supported.

Building and Testing

The following section assumes that commands are executed from within this directory (i.e. /apps/engine relative to the repository root). As such, paths are relative to the folder this README is in.

Depending on your needs, different dependencies are required. Building this project requires the following.

Required dependencies

Optional dependencies

  • Flatbuffers [2.0.0] is required to generate structs in JavaScript, Python, or Rust for messaging between processes in hCloud. Unless the schema files in ./format are changed (and thus require generation to be rerun), flatc is not needed.

    • Flatbuffers installation guidance from their website

      • It's necessary to match the version (2.0.0) with the Rust crate, so build (or otherwise acquire a compiled flatc binary of) the commit associated with the 2.0.0 release

        • One way of checking out the right commit is running the following from within the flatbuffers repository (this checks out the 2.0.0 release tag explicitly, rather than whichever tag happens to be the latest):

          git fetch --tags
          git checkout v2.0.0
  • Python [3.x >= 3.8] is required for Python initialization or Python behaviors. We strongly recommend using Python 3.10, as this is the version which we presently test hEngine against.

macOS Developer Specific Instructions

Unfortunately, Apple currently doesn't provide a way to resize shared-memory allocations. To work around this, allocations need to be big enough that they never need to be resized. This is done by setting the OS_MEMORY_ALLOC_OVERRIDE environment variable. A reasonable starting value might be 250000000, but the right value depends heavily on the memory requirements of your simulation. For example, from the command line:

export OS_MEMORY_ALLOC_OVERRIDE=250000000

If you want to run Python behaviors, you will need a copy of OpenBLAS, an open implementation of the basic linear algebra subroutines (brew install openblas; if you do not have Homebrew installed, this is easily done from its website). This is currently necessary to install scipy. You may also need to install a Fortran compiler (brew install gcc, which provides gfortran).

Possible Dependencies and Debugging

Depending on how lightweight your OS install is, you may be missing some low level dependencies, so try the following (examples given for Ubuntu/Debian-based Unix systems):

  • apt-get install build-essential - Includes the GCC/g++ compilers, libraries, and some other utilities
  • apt-get install pkg-config - A helper tool used when compiling applications and libraries
  • apt-get install libssl-dev - A development package of OpenSSL
  • apt-get install python3-dev - A collection of developer utilities for Python such as header files (e.g. needed when seeing fatal error: Python.h: No such file or directory)

Project Setup / Building

  • Run cargo build
  • Optional: If Python initialization or Python behaviors are used, set up a Python environment by running ./lib/execution/src/runner/python/setup.sh and following the instructions from its help.

Running for development

WIP - This section is a work-in-progress. However, slightly more detailed documentation of the CLI is provided below in CLI Arguments and Options.

The CLI binary handles parsing a HASH project and manages the lifetime of the engine for an experiment. Using it requires a HASH project to be accessible on the local disk. Follow the instructions in the Run a simulation section to learn how to download and create one.

Then, run the CLI using:

cargo run --bin cli -- $CLI_ARGS

The CLI args are described below in the Usage section. An example of a run command during development would be:

cargo run --bin cli -- $CLI_ARGS -p "<PATH TO HASH PROJECT DIR>" single-run --num-steps $NUM_STEPS

Quick Start Guide

This guide will walk you through downloading a demo simulation, running it, and then finding and verifying its output.

In order to run the demo:

  1. Build the engine as described above

  2. Open the demo simulation and optionally read the overview

  3. Press Open at the upper right to view the simulation in hCore.

  4. Download it by pressing File -> Export Project

  5. Unzip it either with your file browser or by e.g. unzip ageing-agents.zip -d path/to/ageing-agents

  6. Run the simulation from the apps/engine directory and pass the path to the downloaded project as a parameter:

    cargo run --bin cli -- --project 'path/to/ageing-agents' single-run --num-steps 5

After a short time, the simulation should complete and the process will end. Once this is done, an ./output folder should have been created. Within that, a directory is created for each combination of [project/experiment_name/experiment_uuid/simulation_run]. For a deeper explanation of the output, please take a look at Simulation Outputs.

The ageing simulation increases the age of each agent by one every step. Looking in the json_state.json file, one can see the outputted JSON state has an "age" field for each agent. It should be apparent that the age is increasing with each snapshot of state.

Congratulations! 🎉 You just ran your first simulation with hEngine!
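The check described in the guide above can also be scripted. The following is a minimal sketch, not part of hEngine: it assumes that json_state.json contains a JSON array with one entry per step, and that each entry is an array of agent objects with "agent_id" and "age" fields. That layout is an assumption (the file structure is not documented yet), and the function names are hypothetical, so adjust both to what your output actually contains.

```python
import json
from pathlib import Path


def ages_increase(steps):
    """Return True if each agent's "age" grows from one snapshot to the next.

    `steps` is assumed to be a list of snapshots, each a list of agent dicts.
    """
    last_seen = {}
    for agents in steps:
        for agent in agents:
            agent_id = agent["agent_id"]
            age = agent["age"]
            # An age that stays flat or drops means the behavior did not run.
            if agent_id in last_seen and age <= last_seen[agent_id]:
                return False
            last_seen[agent_id] = age
    return True


def check_file(path):
    """Load a json_state.json file and run the check on its contents."""
    # Reads the whole file into memory, which is fine for small demo runs.
    return ages_increase(json.loads(Path(path).read_text()))
```

Calling check_file with the path to the json_state.json produced by the run should return True if the ages behave as described.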

Usage

WIP - This section is a work-in-progress. Guidance on production usage will appear here.

CLI Arguments and Options

WIP - CLI arguments are unstable; their presence does not guarantee their usability. It's recommended to stick to single runs while the project stabilises.

The CLI comes with a short help page: cli help or cli -h. A more detailed explanation of the parameters is available on the long help page with cli --help. To show the help page for a subcommand, either append it to the help command (cli help single-run) or use -h/--help after the subcommand (cli single-run --help).

If one of the environment variables shown on the help page is set, it overrides the corresponding default value. Parameters passed on the command line take precedence over environment variables.
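The precedence described above (command-line parameter, then environment variable, then default) can be illustrated with a short sketch. This is not the CLI's actual implementation; the function name resolve_setting and the variable name EXAMPLE_SETTING are hypothetical and only serve to show the ordering.

```python
import os


def resolve_setting(cli_value, env_var, default, environ=None):
    """Pick a value: CLI parameter first, then environment variable, then default."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value  # an explicit parameter always wins
    if env_var in environ:
        return environ[env_var]  # otherwise fall back to the environment
    return default  # finally, use the built-in default


# EXAMPLE_SETTING is a made-up variable name used purely for illustration.
fake_env = {"EXAMPLE_SETTING": "from-env"}
print(resolve_setting("from-cli", "EXAMPLE_SETTING", "default", fake_env))  # from-cli
print(resolve_setting(None, "EXAMPLE_SETTING", "default", fake_env))  # from-env
print(resolve_setting(None, "EXAMPLE_SETTING", "default", {}))  # default
```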

Run a simulation

Warning - Rust runners are currently not supported. Within your simulation project, you should only see .js files within dependencies (for example, dependencies/@hash/age/src/behaviors/age.js). Files ending in .rs will be ignored and the run will possibly fail in unclear ways.

Currently, the easiest way to create a project is by using HASH Core. In the future, an in-depth description of the expected project structure will be given here instead.

In order to download and run a simulation from HASH Core, use File -> Export Project (this is available in the toolbar at the top of the page). For help with finding, creating, and editing simulations in HASH Core, take a look at our online documentation. Then save and unzip the downloaded project to a location of your choice, for example with:

unzip my-project.zip -d my-hash-project

To run the simulation, build the binaries and pass the project location as a CLI argument:

cargo run --bin cli -- --project /path/to/my-hash-project single-run --num-steps $NUM_STEPS

In order to see more logging information while the simulation is running, you can modify the Rust logging level by exporting RUST_LOG before running, e.g.:

export RUST_LOG=debug
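If you drive runs from a script, the command and logging level above can be assembled programmatically. The sketch below only builds the argument list and environment; it uses the flags shown in this README (--project, single-run, --num-steps) and the RUST_LOG variable, while the helper name build_single_run is hypothetical.

```python
import os


def build_single_run(project_dir, num_steps, rust_log=None, base_env=None):
    """Return (argv, env) for a single-run invocation of the CLI via cargo."""
    argv = [
        "cargo", "run", "--bin", "cli", "--",
        "--project", str(project_dir),
        "single-run", "--num-steps", str(num_steps),
    ]
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    if rust_log is not None:
        env["RUST_LOG"] = rust_log
    return argv, env


argv, env = build_single_run("/path/to/my-hash-project", 5, rust_log="debug")
print(" ".join(argv))
```

The result can be passed to subprocess.run(argv, env=env, check=True), executed from the apps/engine directory.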

If your simulation requires a lot of memory and uses JavaScript behaviors, the JavaScript runner may run out of memory. As a first step, you can provide a larger heap size to the runner:

cargo run --bin cli -- --js-runner-max-heap-size $NEW_SIZE_IN_MB $CLI_ARGS

This will increase the heap size, but you may still run into limitations beyond 4GB. The next step is to recompile V8, the underlying JavaScript engine, and set flags for it:

export V8_FROM_SOURCE="1"
export GN_ARGS="v8_enable_pointer_compression=false v8_enable_shared_ro_heap=true"
  • V8_FROM_SOURCE will force the V8 engine to be compiled from source and not use a pre-compiled version. This will take quite a long time (expect at least 15 minutes). This can be mitigated in subsequent compiles by using sccache or ccache. Our build scripts will detect and use them. Set the environment variable $SCCACHE or $CCACHE if the binary is not in your $PATH.
  • v8_enable_pointer_compression is an optimization that reduces RAM usage but limits the heap size to 4 gigabytes.
  • v8_enable_shared_ro_heap enables read-only memory sharing by V8 isolates. This means that read-only memory may be shared across different workers for JavaScript. Enabling this is required to compile V8 without pointer compression.

Simulation Inputs

WIP - This section is a work-in-progress. More in-depth documentation is in the works for describing all input formats and options, and expected project structure. For now, we recommend that you create your simulations within hCore and use the "Export Project" functionality.

Behavior keys

Behavior keys define the fields, and their respective data type, that a behavior accesses on an agent's state. See the docs for an explanation of behavior keys in general.

If you haven't created and exported a project from hCore, it's also possible to manually create the file that specifies the behavior keys. Generally, every user-defined variable on state (i.e. a behavior key) must be specified within the accompanying .json file. The top-level JSON object has up to three members: "keys", "built_in_key_use", and "dynamic_access". The latter two are currently neither required nor used:

{
  "keys": {},
  "built_in_key_use": null,
  "dynamic_access": true
}

Note: JSON Objects have fields/members specified by [key]/[value] pairs. To avoid confusion between behavior keys and the keys in JSON Objects, we will be referring to the pairs as members or fields.

"keys" is a JSON object where each [key]/[value] pair is [behavior key name]/[behavior key specification]. The behavior key specification is a JSON object with at least two required members, "type" and "nullable". Depending on "type" other members may be required. We support the following "type" values:

  • "any": Can be any datatype (when performance becomes a concern, a specific data-type should be preferred)

  • "number": A 64 bit floating point number

  • "string": The encoding depends on the language used for the behavior

  • "boolean": Either true or false

  • "struct": A nested object, which then additionally requires adding a new member called "fields" with the same schema as the top-level "keys". Example:

    {
      "keys": {
        "struct_field": {
          "nullable": true,
          "type": "struct",
          "fields": {
            "field1": {
              "type": "number",
              "nullable": true
            },
            "field2": {
              "type": "number",
              "nullable": true
            }
          }
        }
      }
    }
  • "list": an array with an arbitrary number of sub-elements of the same type, which then have to be specified with the addition of another member called "child":

    {
      "keys": {
        "list_field": {
          "nullable": true,
          "type": "list",
          "child": {
            "type": "string",
            "nullable": true
          }
        }
      }
    }
  • "fixed_size_list": an array with exactly "length" number of sub-elements of the same type, which then have to be specified with the addition of another member called "child":

    {
      "keys": {
        "fixed_size_list_field": {
          "nullable": true,
          "type": "fixed_size_list",
          "length": 4,
          "child": {
            "type": "boolean",
            "nullable": true
          }
        }
      }
    }
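When writing a behavior keys file by hand, a quick structural check can catch missing members before a run. The following is a minimal sketch based only on the rules listed above; it is not the engine's own validation, the function names are hypothetical, and it assumes that "length" must be an integer for "fixed_size_list".

```python
import json

SIMPLE_TYPES = {"any", "number", "string", "boolean"}
LIST_TYPES = {"list", "fixed_size_list"}


def check_spec(name, spec, errors):
    """Append a message to `errors` for each problem found in one specification."""
    if not isinstance(spec, dict):
        errors.append(f"{name}: specification must be a JSON object")
        return
    for member in ("type", "nullable"):
        if member not in spec:
            errors.append(f"{name}: missing required member '{member}'")
    key_type = spec.get("type")
    if key_type is None or key_type in SIMPLE_TYPES:
        return
    if key_type == "struct":
        fields = spec.get("fields")
        if not isinstance(fields, dict):
            errors.append(f"{name}: 'struct' requires a 'fields' object")
            return
        # "fields" follows the same schema as the top-level "keys".
        for field_name, field_spec in fields.items():
            check_spec(f"{name}.{field_name}", field_spec, errors)
    elif key_type in LIST_TYPES:
        length = spec.get("length")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if key_type == "fixed_size_list" and (
            not isinstance(length, int) or isinstance(length, bool)
        ):
            errors.append(f"{name}: 'fixed_size_list' requires an integer 'length'")
        if "child" not in spec:
            errors.append(f"{name}: '{key_type}' requires a 'child' specification")
        else:
            check_spec(f"{name}[]", spec["child"], errors)
    else:
        errors.append(f"{name}: unknown type {key_type!r}")


def validate_behavior_keys(document):
    """Return a list of problems; an empty list means no problems were found."""
    keys = document.get("keys") if isinstance(document, dict) else None
    if not isinstance(keys, dict):
        return ["top-level 'keys' member must be a JSON object"]
    errors = []
    for name, spec in keys.items():
        check_spec(name, spec, errors)
    return errors


example = json.loads('{"keys": {"age": {"type": "number", "nullable": true}}}')
print(validate_behavior_keys(example))  # prints []
```

An empty list only means the file matches the rules described in this section; the engine may apply further checks of its own.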

Simulation Outputs

WIP - This section is a work-in-progress. More in-depth documentation is in the works for describing all output formats and options. As such, some functionality may not be mentioned here, and some functionality alluded to here might not be complete at present. Currently, the engine has two main forms of output, one coming from the json_state package and the other from the analysis package.

At the end of each simulation run, various outputs appear within the ./<OUTPUT FOLDER>/<PROJECT NAME>/<EXPERIMENT NAME>/<EXPERIMENT ID>/<SIMULATION ID> directories.

Where:

  • <PROJECT NAME> is the name of the folder containing your experiments
  • <EXPERIMENT ID> and <SIMULATION ID> are unique identifiers created for each run of an experiment or simulation.

There is an override (CLI Arguments and Options) for the default of the output folder.
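To collect results across several runs, the directory layout described above can be walked with a short script. This sketch assumes the default output folder name (output, as seen in the Quick Start Guide) and that json_state.json sits directly inside each simulation directory; the function name find_state_files is hypothetical.

```python
from pathlib import Path


def find_state_files(output_dir="output"):
    """Yield (project, experiment_name, experiment_id, simulation_id, path) per run."""
    root = Path(output_dir)
    # Four directory levels sit between the output folder and the state file.
    for path in sorted(root.glob("*/*/*/*/json_state.json")):
        project, experiment, experiment_id, simulation_id = path.relative_to(root).parts[:4]
        yield project, experiment, experiment_id, simulation_id, path


for project, experiment, experiment_id, simulation_id, path in find_state_files():
    print(f"{project} / {experiment} / {experiment_id} / {simulation_id}: {path}")
```

If the output folder has been overridden, pass its location as output_dir.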

JSON-State [json_state.json]

Better documentation describing the structure of the file is planned

By default, the engine outputs a serialized snapshot of Agent state every step.

During the run, the output may be buffered into the ./parts folder in multiple files. These files are not necessarily valid JSON as the resultant state blob that appears within json_state.json is split up (hence part) for buffering purposes.

Analysis [analysis_outputs.json]

WIP - This feature is currently unstable

hCore currently provides functionality for applying custom analysis to user-defined metrics. This functionality has been ported across to this codebase in the analysis package, but further development is planned to stabilise it. As such, it is neither tested nor considered supported.

Logging

The engine (and CLI) currently logs to both stderr, and to the ./log directory. The latter is machine-parseable JSON-formatted structured logging, while the stderr logs are configurable through the command-line arguments of both binaries (see CLI Arguments and Options).
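Because the files in ./log are structured JSON, they can be post-processed with a few lines of code. The sketch below assumes one JSON object per line, which is an assumption about the log layout rather than something documented here; field names inside each record are not assumed, and the function name parse_log_lines is hypothetical.

```python
import json


def parse_log_lines(lines):
    """Parse an iterable of text lines into a list of JSON objects (dicts)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a complete JSON value
        if isinstance(record, dict):
            records.append(record)
    return records
```

A log file can be passed in directly, for example with open(path) as handle: records = parse_log_lines(handle), where path points at a file in the ./log directory.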

Main Concepts

Being familiar with running experiments and simulations on the HASH platform will help a lot with understanding the Engine. The docs are also a good place to search for clarification on some terms used below when unclear.

The Project Layout

Currently the hEngine consists of two binaries located within the ./bin folder. To read the documentation for the various components, run:

cargo doc --workspace --no-deps --open

and explore the documentation for the relevant crates, starting with the following two:

The CLI

Located within ./bin/cli, the CLI binary is responsible for the orchestration of a HASH simulation project, handling the management of engine processes for its experiments.

The Engine Process(es)

Located within ./bin/hash_engine, the HASH Engine binary implements all of the logic required for running a single experiment and its one or more simulations.