Using the LibraryTool to look at a library's internal state
Alex Seaton edited this page Apr 21, 2024 · 3 revisions
The LibraryTool is a tool for exploring what ArcticDB stores on disk.
It is very useful when you want/need to:
- get a better understanding of how ArcticDB works under the hood
- debug the state of ArcticDB on disk
The LibraryTool can be used both with ArcticDB and Arcticc.
For the most part, it is used in the same way in both; differences in the interface are noted where they exist.
You can use LibraryTool with any ArcticDB library. Simply pass the library's underlying native library object to the LibraryTool constructor, like so:
from arcticdb import Arctic
from arcticdb.toolbox.library_tool import LibraryTool, KeyType

ac = Arctic(...)
lib = ac[...]
# Wrap the library's underlying native library object
lib_tool = LibraryTool(lib._nvs._library)
If you are using the old Arcticc bindings, the only difference is in the imports:
from arcticcxx.tools import LibraryTool
from arcticc.toolbox.storage import KeyType
In [215]: lib_tool.find_keys(KeyType.VERSION_REF)
Out[215]: [r:my_symbol, r:test, r:test2]
In [216]: lib_tool.find_keys(KeyType.SYMBOL_LIST)
Out[216]:
[l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2],
l:__symbols__:0:0x493c712e5bc5e669@1692777423948255193[0,0]]
In [220]: lib_tool.find_keys_for_symbol(KeyType.VERSION_REF, "test2")
Out[220]: [r:test2]
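The printed form of the symbol list keys above packs several fields into one string. The sketch below splits that string into its parts using only the standard library. The field names (`key_type`, `stream_id`, `version_id`, `content_hash`, `creation_ts_ns`, `start_index`, `end_index`) are an interpretation of the output shown on this page, not an official ArcticDB API, and reading the number after `@` as nanoseconds since the Unix epoch is an assumption. Short forms such as `r:test2` carry none of these extra fields, so the helper returns `None` for them.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class ParsedKey(NamedTuple):
    # Hypothetical field names, inferred from the printed key format.
    key_type: str
    stream_id: str
    version_id: int
    content_hash: int
    creation_ts_ns: int
    start_index: str
    end_index: str


# Matches strings such as
# l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2]
_KEY_RE = re.compile(
    r"^(?P<key_type>[^:]+):(?P<stream_id>.+):(?P<version_id>\d+)"
    r":0x(?P<content_hash>[0-9a-fA-F]+)@(?P<creation_ts>\d+)"
    r"\[(?P<start>[^,\]]*),(?P<end>[^\]]*)\]$"
)


def parse_key(text: str) -> Optional[ParsedKey]:
    """Split a printed key into fields; return None if the long form does not match."""
    match = _KEY_RE.match(text.strip())
    if match is None:
        return None
    return ParsedKey(
        key_type=match["key_type"],
        stream_id=match["stream_id"],
        version_id=int(match["version_id"]),
        content_hash=int(match["content_hash"], 16),
        creation_ts_ns=int(match["creation_ts"]),
        start_index=match["start"],
        end_index=match["end"],
    )


def created_at(key: ParsedKey) -> datetime:
    """Interpret the timestamp as nanoseconds since the Unix epoch (an assumption)."""
    return datetime.fromtimestamp(key.creation_ts_ns // 1_000_000_000, tz=timezone.utc)


parsed = parse_key("l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2]")
print(parsed)
print(created_at(parsed).isoformat())
```

With real keys you would pass `str(key)` for each entry returned by `find_keys`. This is convenient when comparing creation times across many keys in a debugging session.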
In [221]: keys = lib_tool.find_keys(KeyType.SYMBOL_LIST)
In [222]: keys
Out[222]:
[l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2],
l:__symbols__:0:0x493c712e5bc5e669@1692777423948255193[0,0]]
In [223]: lib_tool.read_to_dataframe(keys[1])
If you are using Arcticc, you will need to recreate the DataFrame from the underlying segments yourself.
You can use this snippet to do so:
from arcticcxx_toolbox.codec import Buffer, decode_segment
from arcticc.version_store._normalization import FrameData
from arcticcxx.version_store import PythonOutputFrame
import pandas as pd
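def read_to_df(lib_tool, key):
    # Read the raw segment stored under the given key
    segment = lib_tool.read(key).segment
    # Column names come from the segment's stream descriptor
    field_names = [f.name for f in segment.header.stream_descriptor.fields]
    # Decode the segment and convert it into Python-side column data
    frame_data = FrameData.from_cpp(PythonOutputFrame(decode_segment(segment)))
    # Pair each field name with the column at the same position
    cols = {}
    for idx, field_name in enumerate(field_names):
        cols[field_name] = frame_data.data[idx]
    return pd.DataFrame(cols, columns=field_names)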
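The last step of the helper above pairs each field name with the column at the same position and hands the result to pandas. The sketch below shows that step on its own with stand-in data. The field names and values are made up for illustration, and plain Python lists take the place of the decoded segment columns.

```python
import pandas as pd

# Hypothetical stand-ins for the segment's field names and decoded columns.
field_names = ["symbol", "version", "count"]
column_data = [["test", "test2"], [0, 1], [10, 20]]

# Same pairing loop as in read_to_df: the column at position idx belongs to field idx.
cols = {}
for idx, field_name in enumerate(field_names):
    cols[field_name] = column_data[idx]

# Passing columns=field_names keeps the columns in descriptor order.
df = pd.DataFrame(cols, columns=field_names)
print(df)
```

Each inner list becomes one column, so every list must have the same length.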