Data recordings

Sam Carlberg edited this page Nov 10, 2017 · 1 revision

Data recordings are stored in a custom binary format, which reduces the CPU cost of parsing and produces smaller files than text-based formats like CSV, JSON, or XML.

Structure

Header and source ID constant pool

| Bytes | Description |
| --- | --- |
| 0-3 | A magic number (0xFEEDBAC4) header to help identify valid recording files |
| 4-7 | The number of recorded data points, stored as a 32-bit signed int |
| 8-x | A string array of all the recorded source IDs |

Source IDs are stored in a constant pool to drastically reduce file size: each data entry stores a 2-byte index into the pool instead of duplicating the full source ID.
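
As a rough illustration of the fixed-size part of the header, the sketch below validates the magic number and reads the data point count. It assumes big-endian byte order, which this page does not state, and `read_header_prefix` is a hypothetical helper name. The source ID string array that follows is left undecoded because its encoding is not described here.

```python
import struct

MAGIC_NUMBER = 0xFEEDBAC4  # bytes 0-3 of a valid recording


def read_header_prefix(data: bytes) -> int:
    """Validate the magic number and return the number of recorded data points.

    Assumption: multi-byte integers are big-endian; the page does not specify.
    """
    if len(data) < 8:
        raise ValueError("recording is too short to contain a header")
    # ">Ii" = big-endian unsigned 32-bit magic, then signed 32-bit count
    magic, count = struct.unpack_from(">Ii", data, 0)
    if magic != MAGIC_NUMBER:
        raise ValueError(f"not a recording file: magic is {magic:#010x}")
    return count
```

A caller would pass the raw file contents, for example `read_header_prefix(open(path, "rb").read())`, and continue parsing the source IDs from byte 8 onward.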

Advantages of a constant pool

For a source ID of length k with n recorded data points, where k and n are both positive integers:

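  • Constant pool takes k+2n bytes (the ID stored once, plus a 2-byte index per data point)
  • Naive approach takes n*k bytes (the full ID repeated in every data point)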
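
| n | Size w/ constant pool | Size w/ naive approach | When is the constant pool smaller? |
| --- | --- | --- | --- |
| 1 | k+2 | k | Never |
| 2 | k+4 | 2k | k >= 5 |
| 3 | k+6 | 3k | k >= 4 |
| 4 | k+8 | 4k | k >= 3 |
| 5 | k+10 | 5k | k >= 3 |
| 6 | k+12 | 6k | k >= 3 |
| 7 | k+14 | 7k | k >= 3 |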

Since source IDs are always in the format {protocol}://{name}, k is always at least 5 (and often much larger). The constant pool approach is therefore always more efficient when a source has at least two recorded data points, and it is only 2 bytes worse in the worst case of a single entry.
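
The comparison can be checked numerically. The sketch below uses the two size formulas from this section; the function names are hypothetical and exist only for illustration. The constant pool is strictly smaller when k + 2n < n*k, which for n > 1 rearranges to k > 2n / (n - 1).

```python
def pool_size(k: int, n: int) -> int:
    """Bytes used when the ID is stored once plus a 2-byte index per data point."""
    return k + 2 * n


def naive_size(k: int, n: int) -> int:
    """Bytes used when the full ID is repeated in every data point."""
    return n * k


def min_k_for_pool_win(n: int, max_k: int = 64):
    """Smallest ID length k (up to max_k) where the pool is strictly smaller, else None."""
    for k in range(1, max_k + 1):
        if pool_size(k, n) < naive_size(k, n):
            return k
    return None


# Shortest possible source ID (k = 5) with one, two, and three data points.
for n in (1, 2, 3):
    print(n, pool_size(5, n), naive_size(5, n))
```

For k = 5 this prints 7 vs 5 bytes for one data point, 9 vs 10 for two, and 11 vs 15 for three, matching the claim that the pool costs 2 extra bytes for a single entry and wins from the second entry onward.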

Data

Data is recorded as:

| Bytes | Description |
| --- | --- |
| 0-7 | Timestamp of when the data was recorded. 64-bit signed int |
| 8-9 | Index in the source ID constant pool for the source of the data |
| 10-x | Name of the data type (variable-length String) |
| x+1-y | The encoded data. The format is data-type specific |
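
A minimal sketch of reading the fixed 10-byte prefix of one data entry and resolving its source through the constant pool. It assumes big-endian byte order and an unsigned 2-byte index, neither of which is stated on this page. `read_entry_prefix` and `source_ids` are hypothetical names. The data type name and the encoded data are not decoded, since their encodings are not described here.

```python
import struct


def read_entry_prefix(data: bytes, source_ids: list, offset: int = 0):
    """Return (timestamp, source_id, next_offset) for the entry starting at offset.

    Assumptions: big-endian byte order and an unsigned 2-byte pool index.
    """
    # ">qH" = signed 64-bit timestamp, then unsigned 16-bit pool index (10 bytes)
    timestamp, source_index = struct.unpack_from(">qH", data, offset)
    # The index refers back to the source ID array read from the header.
    return timestamp, source_ids[source_index], offset + 10
```

The returned `next_offset` points at byte 10 of the entry, where the variable-length data type name begins.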