Using the LibraryTool to look at a library's internal state

Alex Seaton edited this page Apr 21, 2024 · 3 revisions

What is the LibraryTool?

The LibraryTool is a tool for exploring what ArcticDB stores on disk.
It is very useful when you want/need to:

  • get a better understanding of how ArcticDB works under the hood
  • debug the state of ArcticDB on disk

How do I use it?

The LibraryTool can be used with both ArcticDB and Arcticc.
Usage is largely the same; any differences in the interface are noted where they apply.

Initialize the LibraryTool

You can use the LibraryTool with any ArcticDB library. Pass the library's internal `_nvs._library` object to the LibraryTool constructor like so:

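from arcticdb import Arctic
from arcticdb.toolbox.library_tool import LibraryTool, KeyType

ac = Arctic(...)
lib = ac[...]

# The LibraryTool wraps the library's internal handle, not the Library object itself
lib_tool = LibraryTool(lib._nvs._library)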

If you are using the old Arcticc bindings, the only difference is in the imports:

from arcticcxx.tools import LibraryTool
from arcticc.toolbox.storage import KeyType

Finding all VREF keys

In [215]: lib_tool.find_keys(KeyType.VERSION_REF)
Out[215]: [r:my_symbol, r:test, r:test2]

Finding all Symbol List keys

In [216]: lib_tool.find_keys(KeyType.SYMBOL_LIST)
Out[216]:
[l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2],
 l:__symbols__:0:0x493c712e5bc5e669@1692777423948255193[0,0]]

Finding all VREF keys for a symbol

In [220]: lib_tool.find_keys_for_symbol(KeyType.VERSION_REF, "test2")
Out[220]: [r:test2]
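
To get a quick overview of a library, it can help to count how many keys of each type it holds. The helper below is a minimal sketch, not part of the LibraryTool API: `count_keys_by_type` is a hypothetical name, and it relies only on the `find_keys` call shown above, assuming it returns a list.

```python
def count_keys_by_type(lib_tool, key_types):
    """Return {key type name: number of keys} for the given key types.

    Hypothetical helper: only lib_tool.find_keys comes from the LibraryTool.
    """
    counts = {}
    for key_type in key_types:
        # find_keys returns every key of this type in the library
        keys = lib_tool.find_keys(key_type)
        # Enum members expose .name; fall back to str() for anything else
        label = getattr(key_type, "name", str(key_type))
        counts[label] = len(keys)
    return counts
```

For example, `count_keys_by_type(lib_tool, [KeyType.VERSION_REF, KeyType.SYMBOL_LIST])` reports the number of VREF and Symbol List keys in one call.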

Reading a specific key

In [221]: keys = lib_tool.find_keys(KeyType.SYMBOL_LIST)
 
In [222]: keys
Out[222]:
[l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2],
 l:__symbols__:0:0x493c712e5bc5e669@1692777423948255193[0,0]]
 
In [223]: lib_tool.read_to_dataframe(keys[1])
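
When debugging, you may want to inspect every key of a given type rather than a single one. The following is a minimal sketch under the assumption that `read_to_dataframe` is available (as with the ArcticDB bindings); `read_keys_to_dataframes` is a hypothetical helper name, and keys are indexed by their string representation purely for convenience.

```python
def read_keys_to_dataframes(lib_tool, key_type):
    """Read every key of one type and return {str(key): DataFrame}.

    Hypothetical helper built on find_keys and read_to_dataframe.
    """
    frames = {}
    for key in lib_tool.find_keys(key_type):
        # str(key) gives the printable form seen in the find_keys output
        frames[str(key)] = lib_tool.read_to_dataframe(key)
    return frames
```

Calling `read_keys_to_dataframes(lib_tool, KeyType.SYMBOL_LIST)` returns one DataFrame per Symbol List key, which you can then inspect one at a time.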

If you are using Arcticc, you will need to rebuild the DataFrame from the underlying segments yourself.
You can use this snippet to do so:

from arcticcxx_toolbox.codec import Buffer, decode_segment
from arcticc.version_store._normalization import FrameData
from arcticcxx.version_store import PythonOutputFrame
import pandas as pd
 
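def read_to_df(lib_tool, key):
    """Rebuild a pandas DataFrame from the segment stored under `key`."""
    # Read the raw segment for this key
    segment = lib_tool.read(key).segment
    # Column names come from the segment's stream descriptor
    field_names = [f.name for f in segment.header.stream_descriptor.fields]
    # Decode the segment and convert it into Python-side frame data
    frame_data = FrameData.from_cpp(PythonOutputFrame(decode_segment(segment)))
    # Pair each field name with the column data at the same position
    cols = {}
    for idx, field_name in enumerate(field_names):
        cols[field_name] = frame_data.data[idx]
    return pd.DataFrame(cols, columns=field_names)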