Data recordings
Sam Carlberg edited this page Nov 10, 2017
Data recordings are stored in a custom binary format, which is cheaper to parse (lower CPU load) and produces smaller files than text-based formats such as CSV, JSON, or XML.
Bytes | Description |
---|---|
0-3 | A magic number (0xFEEDBAC4) header to help identify valid recording files |
4-7 | The number of recorded data points, stored as a 32-bit signed int |
8-x | The source ID constant pool: a string array of all recorded source IDs |
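A minimal Python sketch of validating the fixed-size part of the header. It assumes big-endian byte order (the default for Java's `DataOutputStream` and `ByteBuffer`), which this page does not state, and it stops at byte 8 because the encoding of the string array is not described here. `read_header` is a hypothetical name.

```python
import struct

MAGIC = 0xFEEDBAC4


def read_header(data: bytes) -> tuple[int, int]:
    """Return (data_point_count, offset_of_constant_pool).

    Assumes big-endian integers; the byte order is not specified on this page.
    """
    if len(data) < 8:
        raise ValueError("file too short for a recording header")
    # Bytes 0-3: magic number (read unsigned); bytes 4-7: signed 32-bit count.
    magic, count = struct.unpack_from(">Ii", data, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic number: {magic:#010x}")
    if count < 0:
        raise ValueError("negative data point count")
    # The source ID string array begins immediately after the header.
    return count, 8
```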
Source IDs are stored in a constant pool to drastically reduce file size. Duplicating the full source ID in every data entry would waste space, whereas with the pool each entry stores only a 2-byte index into it.
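The deduplication can be sketched as a first-seen interning table. This is an illustrative Python sketch, not the project's code: `build_constant_pool` is hypothetical, and treating the 2-byte index (bytes 8-9 of each data entry) as unsigned is an assumption.

```python
def build_constant_pool(source_ids: list[str]) -> tuple[list[str], list[int]]:
    """Deduplicate source IDs into a pool and map each entry to a pool index.

    Hypothetical helper: the pool keeps first-seen order, and indices are
    assumed to be unsigned 16-bit values, so at most 65536 distinct IDs fit.
    """
    pool: list[str] = []
    index_of: dict[str, int] = {}
    indices: list[int] = []
    for source_id in source_ids:
        if source_id not in index_of:
            # The next index would be len(pool); it must fit in 2 bytes.
            if len(pool) > 0xFFFF:
                raise ValueError("too many distinct sources for a 2-byte index")
            index_of[source_id] = len(pool)
            pool.append(source_id)
        indices.append(index_of[source_id])
    return pool, indices
```

Each ID string is written once in the pool, and every data entry carries only its small integer index.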
For a source ID of length $k$ bytes with $n$ recorded data points, where $k$ and $n$ are both positive integers:

- The constant pool takes $k + 2n$ bytes
- The naive approach takes $nk$ bytes
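Solving the inequality between the two sizes gives the break-even point for any $n \ge 2$ (dividing by $n - 1$ is valid because it is positive):

$$
\begin{aligned}
k + 2n &< nk \\
2n &< k(n - 1) \\
k &> \frac{2n}{n - 1} = 2 + \frac{2}{n - 1}
\end{aligned}
$$

For $n = 1$ the condition $k + 2 < k$ never holds. The bound $2 + 2/(n - 1)$ shrinks toward 2 as $n$ grows, so IDs of 1 or 2 bytes never benefit, while any $k \ge 3$ benefits once $n \ge 4$.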
n | Size with constant pool (bytes) | Size with naive approach (bytes) | Constant pool is smaller when |
---|---|---|---|
1 | $k + 2$ | $k$ | Never |
2 | $k + 4$ | $2k$ | $k \ge 5$ |
3 | $k + 6$ | $3k$ | $k \ge 4$ |
4 | $k + 8$ | $4k$ | $k \ge 3$ |
5 | $k + 10$ | $5k$ | $k \ge 3$ |
6 | $k + 12$ | $6k$ | $k \ge 3$ |
7 | $k + 14$ | $7k$ | $k \ge 3$ |
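The threshold column can be recomputed by brute force from the two size formulas. This Python sketch uses the same simplified model as the page, counting only the $k$ bytes of the ID and the 2-byte index per entry; the function names are hypothetical.

```python
from typing import Optional


def constant_pool_size(k: int, n: int) -> int:
    # One copy of the k-byte ID plus a 2-byte pool index per data point.
    return k + 2 * n


def naive_size(k: int, n: int) -> int:
    # The k-byte ID repeated in each of the n data points.
    return n * k


def min_k_for_pool_win(n: int, max_k: int = 1000) -> Optional[int]:
    """Smallest ID length k (up to max_k) for which the pool is strictly smaller."""
    for k in range(1, max_k + 1):
        if constant_pool_size(k, n) < naive_size(k, n):
            return k
    return None  # the pool never wins within the searched range


for n in range(1, 8):
    print(n, min_k_for_pool_win(n))
```

For `n = 1` the search returns `None`, since $k + 2 < k$ never holds.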
Since source IDs are always in the format `{protocol}://{name}`, $k$ is always at least 5 (and often much larger). This makes the constant pool approach more efficient whenever a source has at least two recorded data points, and only 2 bytes worse in the worst case of a single entry.
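The lower bound on $k$ comes from the shortest string that fits the pattern. A hypothetical Python check, assuming both the protocol and the name must be non-empty and that IDs are encoded as UTF-8 (so $k$ counts bytes, which is never fewer than characters):

```python
def source_id_length(source_id: str) -> int:
    """Return k, the byte length of a source ID of the form {protocol}://{name}.

    Assumptions: both parts must be non-empty, and IDs are stored as UTF-8.
    """
    protocol, sep, name = source_id.partition("://")
    if not sep or not protocol or not name:
        raise ValueError(f"not a {{protocol}}://{{name}} source ID: {source_id!r}")
    return len(source_id.encode("utf-8"))


# The shortest possible ID: one protocol character, "://", one name character.
shortest = source_id_length("a://b")
```

With at least one character on each side of the 3-character `://` separator, every valid ID has $k \ge 5$.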
Each data point is recorded as:
Bytes | Description |
---|---|
0-7 | Timestamp of when the data was recorded, stored as a 64-bit signed int |
8-9 | Index in the source ID constant pool for the source of the data |
10-x | Name of the data type (variable-length string) |
x+1-y | The encoded data, in a data-type-specific format |
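A sketch of packing and unpacking one entry, under assumptions the page leaves open: big-endian integers, an unsigned 16-bit pool index, and the type name written as a 4-byte length followed by UTF-8 bytes. `encode_entry` and `decode_entry_header` are hypothetical helpers, and the payload is treated as opaque bytes because its format depends on the data type.

```python
import struct

# Assumed fixed-size prefix: 64-bit timestamp (bytes 0-7), 16-bit pool index
# (bytes 8-9), then a 32-bit length for the type name that follows.
_PREFIX = struct.Struct(">qHi")


def encode_entry(timestamp: int, source_index: int, type_name: str, payload: bytes) -> bytes:
    """Pack one data entry; payload must already be encoded for its type."""
    name = type_name.encode("utf-8")
    return _PREFIX.pack(timestamp, source_index, len(name)) + name + payload


def decode_entry_header(data: bytes, offset: int = 0) -> tuple[int, int, str, int]:
    """Return (timestamp, source_index, type_name, payload_offset)."""
    timestamp, source_index, name_len = _PREFIX.unpack_from(data, offset)
    start = offset + _PREFIX.size
    end = start + name_len
    if name_len < 0 or end > len(data):
        raise ValueError("truncated or corrupt data type name")
    return timestamp, source_index, data[start:end].decode("utf-8"), end
```

Because this layout has no payload length field, a reader presumably needs the data type's own decoder to find where the next entry begins.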