Documentation

Index

Main class used to interface with an index/database in the local filesystem. An index is a directory containing a pair of data.mdb and lock.mdb files.

Constructor

Index(path, map_size=...)

Opens or creates an index at the specified directory, limiting the maximum size of the underlying database.

| Parameter | Required | Type | Description |
|---|---|---|---|
| path | Yes | Path-like object | Index directory. Directory must exist. |
| map_size | No | int | Maximum size in bytes of data.mdb. The default is set by Milli. |

Example:

>>> index = Index('path/to/index', map_size=2**30)  # Open/create an index of up to 1 GiB
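Since the directory must exist before the index is opened, one option is to create it first. The following is a minimal sketch; `ensure_index_dir` and `GIB` are hypothetical helper names and are not part of the library.

```python
from pathlib import Path

GIB = 2**30  # bytes in one gibibyte, matching the map_size example above


def ensure_index_dir(path):
    """Create the index directory (and any parents) if missing; return it as a Path."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

The returned path could then be passed to the constructor, for example `Index(ensure_index_dir('path/to/index'), map_size=2 * GIB)`.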

Methods

Index.add_documents

Index.add_documents(documents)

Adds documents to the index.

| Parameter | Required | Type | Description |
|---|---|---|---|
| documents | Yes | List[Dict[str,Any]] | List of JSON-convertible dictionaries, i.e. dictionaries with string keys mapping to integers, floats, booleans, strings, arrays, and other dictionaries with string keys (potentially nested). |

Returns: TODO.

Example:

>>> index.add_documents([
    { 'id': 0, 'title': 'Hello earth', 'tags': ['greeting', 'planet'], 'orbit': 3 },
    { 'id': 1, 'title': 'Hello mars', 'tags': ['greeting', 'planet'], 'orbit': 4 },
    { 'id': 2, 'title': 'Hello sun', 'tags': ['greeting', 'star'] },
])
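To check a batch against the "JSON-convertible" description before calling `Index.add_documents`, a recursive validator along these lines may help. This is a sketch only: `is_json_convertible` and `invalid_documents` are hypothetical names, and the check rejects `None` simply because the description above does not list it. The library itself may be more permissive.

```python
def is_json_convertible(value):
    """Return True if value matches the shapes listed for add_documents."""
    # bool is a subclass of int, so it is covered by the int check as well.
    if isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, list):
        return all(is_json_convertible(item) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_json_convertible(item)
            for key, item in value.items()
        )
    # Assumption: anything else (None, sets, bytes, custom objects) is rejected.
    return False


def invalid_documents(documents):
    """Return the positions of entries that are not valid document dictionaries."""
    return [
        position
        for position, document in enumerate(documents)
        if not (isinstance(document, dict) and is_json_convertible(document))
    ]
```

An empty result from `invalid_documents(documents)` suggests the batch fits the documented shape.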

Index.all_documents

Index.all_documents()

Returns an iterator over all documents in the index alongside their internal IDs.

Returns: Iterator[Tuple[int,Dict]].
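Because some methods take internal IDs and others take external IDs, it can be handy to build a lookup between the two from this iterator. The sketch below assumes the external ID is stored under the `'id'` field, as in the examples on this page; the helper name and the `key` parameter are hypothetical.

```python
def internal_ids_by_external_id(index, key='id'):
    """Map each document's value under `key` to its internal ID."""
    mapping = {}
    # all_documents() yields (internal_id, document) pairs.
    for internal_id, document in index.all_documents():
        if key in document:
            mapping[document[key]] = internal_id
    return mapping
```

Documents that lack the chosen field are skipped rather than raising an error.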

Index.clear_documents

Index.clear_documents()

Removes all documents from the index.

Returns: Number of documents removed.

Index.delete_documents

Index.delete_documents(ids)

Removes documents from the index given their external IDs.

| Parameter | Required | Type | Description |
|---|---|---|---|
| ids | Yes | List[str] | List of strings, each corresponding to an external ID. |

Returns: TODO.
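Since `ids` is typed as a list of strings while the examples on this page use integer `id` values, a thin wrapper that converts each ID with `str` is one way to call the method. This is a sketch under the assumption that an integer external ID such as `0` is addressed by the string `'0'`; the source does not confirm this. `delete_by_external_ids` is a hypothetical name.

```python
def delete_by_external_ids(index, external_ids):
    """Call index.delete_documents with every ID converted to a string."""
    # Assumption: str(0) == '0' is the string form the index expects.
    as_strings = [str(external_id) for external_id in external_ids]
    # The return value is passed through untouched, since it is not documented yet.
    return index.delete_documents(as_strings)
```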

Index.get_document

Index.get_document(id)

Obtains a document from the index given its internal ID.

| Parameter | Required | Type | Description |
|---|---|---|---|
| id | Yes | int | Internal document ID. |

Returns: Dict[str,Any]. Document contents.

Example:

>>> index.get_document(0)
{ 'id': 0, 'title': 'Hello earth', 'tags': ['greeting', 'planet'], 'orbit': 3 }

Index.get_documents

Index.get_documents(ids)

Obtains a list of documents from the index given their internal IDs.

| Parameter | Required | Type | Description |
|---|---|---|---|
| ids | Yes | List[int] | List of internal document IDs. |

Returns: List[Dict[str,Any]]. List of document contents.

Example (formatted):

>>> index.get_documents([1,2])
[
    { 'id': 1, 'title': 'Hello mars', 'tags': ['greeting', 'planet'], 'orbit': 4 },
    { 'id': 2, 'title': 'Hello sun', 'tags': ['greeting', 'star'] }
]

Index.search

Index.search(query)

Searches the index for the given input string.

| Parameter | Required | Type | Description |
|---|---|---|---|
| query | Yes | str | Text to query the index with. |

Returns: List[int]. List of internal IDs of matching documents, sorted by decreasing match score. You can retrieve the full documents by passing this list to Index.get_documents.

Example:

>>> index.search('earht')
[0]
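Combining `Index.search` with `Index.get_documents`, as described above, can be wrapped in a small helper. This is a sketch, not part of the library: `search_documents` and its `limit` parameter are hypothetical, and it assumes `get_documents` returns documents in the same order as the IDs it is given.

```python
def search_documents(index, query, limit=None):
    """Return the full documents matching query, best match first."""
    internal_ids = index.search(query)
    if limit is not None:
        # Keep only the top-scoring IDs; search results are already sorted.
        internal_ids = internal_ids[:limit]
    if not internal_ids:
        return []
    return index.get_documents(internal_ids)
```

With the documents from the earlier examples, `search_documents(index, 'earht')` would look up the IDs returned by `index.search('earht')` and fetch the corresponding documents in one call.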