Main class used to interface with an index/database in the local filesystem. An index is a directory containing a pair of files: `data.mdb` and `lock.mdb`.
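As a rough illustration of that layout, the hypothetical helper below (`looks_like_index` is not part of the library) checks whether a directory holds both files. It only inspects the file layout and does not open or validate the database.

```python
from os import PathLike
from pathlib import Path
from typing import Union

# The two files an index directory is described as containing.
INDEX_FILES = ("data.mdb", "lock.mdb")


def looks_like_index(path: Union[str, PathLike]) -> bool:
    """Return True if `path` is a directory containing both data.mdb and lock.mdb.

    Hypothetical helper: checks the directory layout only, never the database contents.
    """
    directory = Path(path)
    return directory.is_dir() and all((directory / name).is_file() for name in INDEX_FILES)
```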
Index(path, map_size=...)
Opens or creates an index at the specified directory, limiting the maximum size of the underlying database.
Parameter | Required | Type | Description |
---|---|---|---|
path | Yes | Path-like object | Index directory. The directory must already exist. |
map_size | No | `int` | Maximum size in bytes of `data.mdb`. Defaults to the value set by Milli. |
Example:
>>> index = Index('path/to/index', map_size=2**30)  # Open or create an index of up to 1 GiB
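Because `path` must already exist, callers may want to create it first. The sketch below uses two hypothetical helpers, `prepare_index_dir` and `map_size_gib`, which are not part of the library; it assumes only what the table states, namely that `Index` accepts a path-like object and a byte count for `map_size`.

```python
from os import PathLike
from pathlib import Path
from typing import Union

GIB = 2**30  # 1 GiB in bytes, the same value used in the example call


def map_size_gib(gib: int) -> int:
    """Convert a size in GiB to the byte count expected by `map_size`."""
    return gib * GIB


def prepare_index_dir(path: Union[str, PathLike]) -> Path:
    """Create the index directory (and any parents) if missing, then return it."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

With these helpers, the example call could be written as `Index(prepare_index_dir('path/to/index'), map_size=map_size_gib(1))`.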
Index.add_documents(documents)
Adds documents to the index.
Parameter | Required | Type | Description |
---|---|---|---|
documents | Yes | `List[Dict[str, Any]]` | List of JSON-convertible dictionaries, i.e. dictionaries with string keys mapping to integers, floats, booleans, strings, arrays, and other dictionaries with string keys (potentially nested). |
Returns: TODO.
Example:
>>> index.add_documents([
...     { 'id': 0, 'title': 'Hello earth', 'tags': ['greeting', 'planet'], 'orbit': 3 },
...     { 'id': 1, 'title': 'Hello mars', 'tags': ['greeting', 'planet'], 'orbit': 4 },
...     { 'id': 2, 'title': 'Hello sun', 'tags': ['greeting', 'star'] },
... ])
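To catch malformed input before calling `add_documents`, one could check each document against the types listed in the `documents` description. The sketch below is a hypothetical pre-check (`is_json_convertible` and `validate_documents` are not library functions) that accepts exactly those types. It rejects `None`, tuples, and any other value the description does not mention, so it may be stricter than what the library actually accepts.

```python
from typing import Any, List

# Scalar types named in the `documents` description.
# bool is a subclass of int in Python, so listing it is only for readability.
_SCALARS = (int, float, bool, str)


def is_json_convertible(value: Any) -> bool:
    """Recursively check `value` against the types listed for `documents`."""
    if isinstance(value, _SCALARS):
        return True
    if isinstance(value, list):
        return all(is_json_convertible(item) for item in value)
    if isinstance(value, dict):
        # Keys must be strings; values are checked recursively (nesting allowed).
        return all(isinstance(key, str) and is_json_convertible(item)
                   for key, item in value.items())
    return False


def validate_documents(documents: List[Any]) -> None:
    """Raise TypeError for the first entry that is not a JSON-convertible dict."""
    for position, doc in enumerate(documents):
        if not isinstance(doc, dict) or not is_json_convertible(doc):
            raise TypeError(f"document at position {position} is not JSON-convertible")
```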
Index.all_documents()
Iterates over all documents in the index, yielding each one alongside its internal ID.
Returns: `Iterator[Tuple[int, Dict]]`.
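`all_documents()` pairs each document with its internal ID, while `delete_documents` takes external IDs and `get_document` takes internal ones. A lookup table from external to internal IDs bridges the two. This sketch assumes the external ID is stored under a field named `id`, as in the `add_documents` example; the reference does not say which field Milli treats as the external ID, so `key` is an assumption. The helper name `external_to_internal` is hypothetical.

```python
from typing import Any, Dict, Iterable, Tuple


def external_to_internal(pairs: Iterable[Tuple[int, Dict[str, Any]]],
                         key: str = "id") -> Dict[str, int]:
    """Map each document's external ID (as a string) to its internal ID.

    `pairs` has the (internal_id, document) shape yielded by all_documents();
    the `key` field name is an assumption. Documents lacking it are skipped.
    """
    mapping: Dict[str, int] = {}
    for internal_id, document in pairs:
        if key in document:
            mapping[str(document[key])] = internal_id
    return mapping
```

Under that assumption, `index.get_document(external_to_internal(index.all_documents())['1'])` would fetch the document whose `id` is `1`.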
Index.clear_documents()
Removes all documents from the index.
Returns: Number of documents removed.
Index.delete_documents(ids)
Removes documents from the index given their external IDs.
Parameter | Required | Type | Description |
---|---|---|---|
ids | Yes | `List[str]` | List of strings, each corresponding to an external ID. |
Returns: TODO.
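`delete_documents` expects external IDs as strings (`List[str]`), whereas the `add_documents` example uses integer `id` values. The hypothetical helper below (`ids_to_delete` is not a library function) collects the external IDs of the documents matching a predicate and converts them to strings. It assumes the external ID lives in an `id` field, as in the `add_documents` example, and that the library matches the stringified form of integer IDs.

```python
from typing import Any, Callable, Dict, List


def ids_to_delete(documents: List[Dict[str, Any]],
                  predicate: Callable[[Dict[str, Any]], bool],
                  key: str = "id") -> List[str]:
    """Return the external IDs, as strings, of the documents matching `predicate`.

    Documents without the (assumed) `key` field are ignored.
    """
    return [str(doc[key]) for doc in documents if key in doc and predicate(doc)]
```

For example, `index.delete_documents(ids_to_delete([doc for _, doc in index.all_documents()], lambda d: 'star' in d.get('tags', [])))` would remove every document tagged `star`.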
Index.get_document(id)
Obtains a document from the index given its internal ID.
Parameter | Required | Type | Description |
---|---|---|---|
id | Yes | `int` | Internal document ID. |
Returns: `Dict[str, Any]`. Document contents.
Example:
>>> index.get_document(0)
{ 'id': 0, 'title': 'Hello earth', 'tags': ['greeting', 'planet'], 'orbit': 3 }
Index.get_documents(ids)
Obtains a list of documents from the index given their internal IDs.
Parameter | Required | Type | Description |
---|---|---|---|
ids | Yes | `List[int]` | List of internal document IDs. |
Returns: `List[Dict[str, Any]]`. List of document contents.
Example (formatted):
>>> index.get_documents([1,2])
[
{ 'id': 1, 'title': 'Hello mars', 'tags': ['greeting', 'planet'], 'orbit': 4 },
{ 'id': 2, 'title': 'Hello sun', 'tags': ['greeting', 'star'] }
]
Index.search(query)
Searches the index for the given input string.
Parameter | Required | Type | Description |
---|---|---|---|
query | Yes | `str` | Text to query the index with. |
Returns: `List[int]`. List of internal IDs of matching documents, sorted by decreasing match score. You can retrieve the full documents by passing this list to `Index.get_documents`.
Example:
>>> index.search('earht')
[0]
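Searching and fetching are two separate calls, so a small wrapper can return full documents directly. The sketch below (`search_documents` is a hypothetical helper, not a library method) relies only on `search` and `get_documents` as documented in this reference, and can optionally keep just the `limit` best-scoring matches.

```python
from typing import Any, Dict, List, Optional, Protocol


class SearchableIndex(Protocol):
    """The two documented Index methods this sketch depends on."""

    def search(self, query: str) -> List[int]: ...

    def get_documents(self, ids: List[int]) -> List[Dict[str, Any]]: ...


def search_documents(index: SearchableIndex, query: str,
                     limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """Return full documents matching `query`, best match first."""
    ids = index.search(query)  # internal IDs, sorted by decreasing match score
    if limit is not None:
        ids = ids[:limit]  # keep only the top `limit` matches
    return index.get_documents(ids)
```

With a real index, `search_documents(index, 'earht')` would return the documents whose IDs `index.search('earht')` reports, in the same order.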