Data recordings

Data recordings are stored in a custom binary format. It takes less CPU time to parse and produces smaller files than text-based formats such as CSV, JSON, or XML.

Structure

Header and source ID constant pool

Bytes	Description
0-3	A magic number (`0xFEEDBAC4`) header to help identify valid recording files
4-7	The number of recorded data points, stored as a 32-bit signed int
8-x	A string array of all the recorded source IDs
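The sketch below writes and reads this header. The table does not say how the string array is encoded, so the encoding here is a **hypothetical assumption**: big-endian byte order (Java's default for `DataOutputStream` and `ByteBuffer`), a 4-byte signed count of strings, and each string stored as a 4-byte signed length followed by its UTF-8 bytes. The helper names (`write_header`, `read_header`) are illustrative and are not part of any real API.

```python
import struct
from typing import List, Tuple

MAGIC = 0xFEEDBAC4  # bytes 0-3, used to identify valid recording files


def _write_string(s: str) -> bytes:
    # ASSUMPTION: 4-byte big-endian length prefix, then UTF-8 bytes
    data = s.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def _read_string(buf: bytes, offset: int) -> Tuple[str, int]:
    (length,) = struct.unpack_from(">i", buf, offset)
    offset += 4
    return buf[offset:offset + length].decode("utf-8"), offset + length


def write_header(num_points: int, source_ids: List[str]) -> bytes:
    # Bytes 0-7: magic number (unsigned view) and point count (32-bit signed)
    out = struct.pack(">Ii", MAGIC, num_points)
    # Bytes 8-x: the source ID constant pool (ASSUMPTION: count-prefixed array)
    out += struct.pack(">i", len(source_ids))
    for sid in source_ids:
        out += _write_string(sid)
    return out


def read_header(buf: bytes) -> Tuple[int, List[str], int]:
    """Return (number of data points, source IDs, offset of first data entry)."""
    magic, num_points = struct.unpack_from(">Ii", buf, 0)
    if magic != MAGIC:
        raise ValueError(f"not a recording file (magic {magic:#010x})")
    (count,) = struct.unpack_from(">i", buf, 8)
    offset = 12
    source_ids = []
    for _ in range(count):
        sid, offset = _read_string(buf, offset)
        source_ids.append(sid)
    return num_points, source_ids, offset
```

Checking the magic number first lets a reader reject a non-recording file before trying to interpret the rest of the bytes.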

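Source IDs are stored in a constant pool to drastically reduce file size. Duplicating the full source ID in every data entry would repeat the same string once per data point; with the pool, each entry stores only a 2-byte index into it.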
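A minimal sketch of building the pool while recording: each new source ID is appended once, and every data point keeps only its index. The function name and the `(source_id, payload)` tuple shape are hypothetical. The 32767 limit assumes the 2-byte index is a signed 16-bit value (as a Java `short` would be); the page does not state whether it is signed.

```python
from typing import Any, Dict, Iterable, List, Tuple

MAX_POOL_INDEX = 0x7FFF  # ASSUMPTION: 2-byte index is a signed 16-bit int


def build_constant_pool(
    entries: Iterable[Tuple[str, Any]],
) -> Tuple[List[str], List[Tuple[int, Any]]]:
    """Replace each entry's source ID with its index in a deduplicated pool."""
    pool: List[str] = []
    index_of: Dict[str, int] = {}
    indexed: List[Tuple[int, Any]] = []
    for source_id, payload in entries:
        idx = index_of.get(source_id)
        if idx is None:
            # First time this source appears: store the full ID once
            idx = len(pool)
            if idx > MAX_POOL_INDEX:
                raise OverflowError("too many distinct source IDs for a 2-byte index")
            index_of[source_id] = idx
            pool.append(source_id)
        indexed.append((idx, payload))
    return pool, indexed
```

For example, three data points from two sources produce a two-entry pool, and the repeated source reuses index `0`.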

Advantages of a constant pool

For a source ID of length k with n recorded data points, where k and n are both positive integers:

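Constant pool takes `k + 2n` bytes: the ID is stored once, and each data point stores a 2-byte index into the pool
Naive approach takes `n * k` bytes: the full ID is repeated in every data point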
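The conditions in the table below follow from comparing the two sizes directly:

$$
\begin{aligned}
k + 2n < nk \;&\iff\; 2n < k(n - 1) \\
&\iff\; k > \frac{2n}{n - 1} = 2 + \frac{2}{n - 1} \qquad (n \ge 2).
\end{aligned}
$$

For `n = 1` the condition is `k + 2 < k`, which never holds. Because `2 + 2/(n - 1) > 2` for every `n`, a source ID of length 1 or 2 never benefits from the pool, and because `2 + 2/(n - 1) < 3` once `n >= 4`, any `k >= 3` benefits from that point on.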

n	Size w/ constant pool	Size w/ naive approach	When is the constant pool smaller?
1	`k+2`	`k`	Never
2	`k+4`	`2k`	`k >= 5`
3	`k+6`	`3k`	`k >= 4`
4	`k+8`	`4k`	`k >= 3`
5	`k+10`	`5k`	`k >= 3`
6	`k+12`	`6k`	`k >= 3`
7	`k+14`	`7k`	`k >= 3`
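The last column can be checked by brute force: for each `n`, search for the smallest `k` where the pooled size is strictly smaller than the naive size. This uses only the two size formulas above; the function names are illustrative.

```python
from typing import Optional


def pooled_size(k: int, n: int) -> int:
    # ID stored once (k bytes) plus a 2-byte pool index per data point
    return k + 2 * n


def naive_size(k: int, n: int) -> int:
    # Full ID repeated in each of the n data points
    return n * k


def min_k_for_pool(n: int, k_limit: int = 1000) -> Optional[int]:
    """Smallest ID length k for which the pool is smaller, or None if none up to k_limit."""
    for k in range(1, k_limit + 1):
        if pooled_size(k, n) < naive_size(k, n):
            return k
    return None


for n in range(1, 8):
    print(n, min_k_for_pool(n))
```

Running the loop prints `None` for `n = 1`, then `5, 4, 3, 3, 3, 3` for `n = 2` through `7`, matching the table.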

Since source IDs always have the format {protocol}://{name}, k is always at least 5 (a one-character protocol, the three characters of `://`, and a one-character name) and is often much larger. The constant pool is therefore always smaller when a source has at least two recorded data points, and only 2 bytes larger in the worst case of a single entry.

Data

Data is recorded as:

Bytes	Description
0-7	Timestamp of when the data was recorded, stored as a 64-bit signed int
8-9	Index in the source ID constant pool for the source of the data
10-x	Name of the data type (variable-length String)
x+1-y	The encoded data. The format is data-type specific
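The sketch below encodes and decodes one data entry with the layout from the table. Several details are **hypothetical assumptions**, not taken from this page: big-endian byte order, a signed 16-bit pool index, the type name stored as a 4-byte length plus UTF-8 bytes, and a single illustrative type `"Number"` whose payload is an 8-byte IEEE 754 double. Because the payload format is data-type specific, the decoder looks up a per-type codec (the `CODECS` table is hypothetical) to learn how many bytes to consume.

```python
import struct
from typing import Any, Callable, Dict, List, Tuple

Decoder = Callable[[bytes, int], Tuple[Any, int]]  # returns (value, bytes consumed)


def _encode_number(value: float) -> bytes:
    return struct.pack(">d", value)


def _decode_number(buf: bytes, offset: int) -> Tuple[Any, int]:
    return struct.unpack_from(">d", buf, offset)[0], 8


# HYPOTHETICAL codec registry: type name -> (encoder, decoder)
CODECS: Dict[str, Tuple[Callable[[Any], bytes], Decoder]] = {
    "Number": (_encode_number, _decode_number),
}


def encode_entry(timestamp: int, pool_index: int, type_name: str, value: Any) -> bytes:
    encode, _ = CODECS[type_name]
    name = type_name.encode("utf-8")
    return (
        struct.pack(">qh", timestamp, pool_index)  # bytes 0-7 and 8-9
        + struct.pack(">i", len(name)) + name      # bytes 10-x (ASSUMED encoding)
        + encode(value)                            # bytes x+1-y
    )


def decode_entry(buf: bytes, offset: int, source_ids: List[str]) -> Tuple[tuple, int]:
    """Return ((timestamp, source ID, type name, value), offset of next entry)."""
    timestamp, pool_index = struct.unpack_from(">qh", buf, offset)
    offset += 10
    (name_len,) = struct.unpack_from(">i", buf, offset)
    offset += 4
    type_name = buf[offset:offset + name_len].decode("utf-8")
    offset += name_len
    _, decode = CODECS[type_name]
    value, used = decode(buf, offset)
    # Resolve the 2-byte index back to the full ID via the constant pool
    return (timestamp, source_ids[pool_index], type_name, value), offset + used
```

Returning the offset of the next entry lets a reader walk the file entry by entry after the header, calling `decode_entry` once per recorded data point.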
