Update docs
islamaliev committed Dec 16, 2024
1 parent 7a49015 commit 137c634
Showing 2 changed files with 16 additions and 16 deletions.
2 changes: 1 addition & 1 deletion client/README.md
@@ -2,4 +2,4 @@ The `client` package is the primary access point for interacting with an embedde

[Data definition overview](./data_definition.md) - How the shape of documents is defined and grouped.

-[Secondary indexes](./secondary_indexes.md) - How secondary indexes work in DefraDB and how to use them.
+[Secondary indexes](./secondary_indexes.md) - Using secondary indexes in DefraDB.
30 changes: 15 additions & 15 deletions client/secondary_indexes.md
@@ -1,14 +1,14 @@
-# Secondary Indexing in DefraDB
+# Secondary indexing in DefraDB

DefraDB provides a powerful and flexible secondary indexing system that enables efficient document lookups and queries. This document explains the architecture, implementation details, and usage patterns of the indexing system.

## Overview

The indexing system consists of two main components. The first is index storage, which handles storing and maintaining index information. The second is index-based document fetching, which manages retrieving documents using these indexes. Together, these components provide a robust foundation for efficient data access patterns.

-## Index Storage
+## Index storage

-### Core Types
+### Core types

The indexing system is built around several key types that define how indexes are structured and managed. At its heart is the IndexedFieldDescription, which describes a single field being indexed, including its name and whether it should be ordered in descending order. These field descriptions are combined into an IndexDescription, which provides a complete picture of an index including its name, ID, fields, and whether it enforces uniqueness.

@@ -38,7 +38,7 @@ type CollectionIndex interface {
}
```

-### Key Structure
+### Key structure

Index keys in DefraDB follow a carefully designed format that enables efficient lookups and range scans. For regular indexes, the key format is:
```
@@ -49,15 +49,15 @@ Unique indexes follow a similar pattern but store the document ID as the value i
<collection_id>/<index_id>(/<field_value>)+ -> <doc_id>
```
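The unique-index layout above can be sketched in Go as follows. This is a minimal illustration rather than DefraDB's implementation: the `entry` type and the functions `joinKey`, `uniqueIndexEntry`, and `regularIndexEntry` are hypothetical names, and field values are joined as plain strings for readability. The regular-index layout is not visible in this diff, so the sketch assumes that a regular index appends the document ID to the key and stores an empty value.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical key/value pair as it might be written to the datastore.
type entry struct {
	Key   string
	Value string
}

// joinKey builds "<collection_id>/<index_id>(/<field_value>)+".
// Field values are plain strings here for readability only.
func joinKey(collectionID, indexID uint32, fieldValues []string) string {
	parts := []string{fmt.Sprint(collectionID), fmt.Sprint(indexID)}
	parts = append(parts, fieldValues...)
	return strings.Join(parts, "/")
}

// uniqueIndexEntry follows the unique layout shown above:
// the key ends with the field values and the document ID is the value.
func uniqueIndexEntry(collectionID, indexID uint32, fieldValues []string, docID string) entry {
	return entry{Key: joinKey(collectionID, indexID, fieldValues), Value: docID}
}

// regularIndexEntry ASSUMES the regular layout: the document ID is appended
// to the key and the value is left empty.
func regularIndexEntry(collectionID, indexID uint32, fieldValues []string, docID string) entry {
	return entry{Key: joinKey(collectionID, indexID, fieldValues) + "/" + docID, Value: ""}
}

func main() {
	fields := []string{"Montreal"}
	r := regularIndexEntry(1, 2, fields, "doc-1")
	u := uniqueIndexEntry(1, 2, fields, "doc-1")
	fmt.Printf("regular: key=%q value=%q\n", r.Key, r.Value)
	fmt.Printf("unique:  key=%q value=%q\n", u.Key, u.Value)
}
```

Under that assumption, keeping the document ID in the key lets many documents share one field value in a regular index, while a unique index has at most one entry per field value.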

-### Value Encoding
+### Value encoding

While DefraDB primarily uses CBOR for encoding, the indexing system employs a custom encoding/decoding solution inspired by CockroachDB. This decision was made because CBOR doesn't guarantee ordering preservation, which is crucial for index functionality. Our custom encoding ensures that numeric values maintain their natural ordering, strings are properly collated, and complex types like arrays and objects have deterministic ordering.
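To illustrate what "ordering preservation" means, here is a minimal Go sketch of one common order-preserving technique for signed integers: big-endian bytes with the sign bit flipped. It is not DefraDB's or CockroachDB's actual format, and the names `encodeInt64Ordered`, `decodeInt64Ordered`, and `sortByEncoding` are made up for this example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// encodeInt64Ordered maps an int64 to 8 bytes whose lexicographic order
// matches the numeric order of the inputs.
func encodeInt64Ordered(v int64) []byte {
	buf := make([]byte, 8)
	// Flipping the sign bit places negative numbers below non-negative ones
	// when the bytes are compared as an unsigned big-endian integer.
	binary.BigEndian.PutUint64(buf, uint64(v)^(1<<63))
	return buf
}

// decodeInt64Ordered reverses encodeInt64Ordered.
func decodeInt64Ordered(b []byte) int64 {
	return int64(binary.BigEndian.Uint64(b) ^ (1 << 63))
}

// sortByEncoding sorts values purely by comparing their encoded bytes,
// the way a key-value store would order index keys.
func sortByEncoding(values []int64) []int64 {
	keys := make([][]byte, len(values))
	for i, v := range values {
		keys[i] = encodeInt64Ordered(v)
	}
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })
	out := make([]int64, len(keys))
	for i, k := range keys {
		out[i] = decodeInt64Ordered(k)
	}
	return out
}

func main() {
	fmt.Println(sortByEncoding([]int64{42, -7, 0, -1000, 3}))
}
```

Because the store only compares raw bytes, any encoding used for index keys needs this property for range scans to return values in their natural order.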

-### Index Maintenance
+### Index maintenance

Index maintenance happens through three primary operations: document creation, updates, and deletion. When a new document is saved, the system indexes all configured fields, generating entries according to the key format and validating any unique constraints. During updates, the system carefully manages both the removal of old index entries and the creation of new ones, ensuring consistency through atomic transactions. For deletions, all associated index entries are cleaned up along with related metadata.
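The three operations can be sketched with a small in-memory index over a single string field. This is an illustrative model only: `memIndex` and its methods are hypothetical, and restoring the old entry on a failed update stands in for the transaction rollback a real datastore would perform.

```go
package main

import (
	"errors"
	"fmt"
)

var errUniqueViolation = errors.New("unique index violation")

// memIndex is a hypothetical in-memory index over one string field.
type memIndex struct {
	unique  bool
	entries map[string]map[string]struct{} // field value -> set of doc IDs
}

func newMemIndex(unique bool) *memIndex {
	return &memIndex{unique: unique, entries: map[string]map[string]struct{}{}}
}

// Save indexes a new document and validates the unique constraint.
func (ix *memIndex) Save(docID, value string) error {
	if ix.unique && len(ix.entries[value]) > 0 {
		return errUniqueViolation
	}
	if ix.entries[value] == nil {
		ix.entries[value] = map[string]struct{}{}
	}
	ix.entries[value][docID] = struct{}{}
	return nil
}

// Delete removes the index entry of a document.
func (ix *memIndex) Delete(docID, value string) {
	delete(ix.entries[value], docID)
	if len(ix.entries[value]) == 0 {
		delete(ix.entries, value)
	}
}

// Update removes the old entry and creates the new one. If the new entry is
// rejected, the old entry is restored so the index stays consistent.
func (ix *memIndex) Update(docID, oldValue, newValue string) error {
	if oldValue == newValue {
		return nil
	}
	ix.Delete(docID, oldValue)
	if err := ix.Save(docID, newValue); err != nil {
		_ = ix.Save(docID, oldValue) // stand-in for a transaction rollback
		return err
	}
	return nil
}

// Count reports how many documents are indexed under a value.
func (ix *memIndex) Count(value string) int {
	return len(ix.entries[value])
}

func main() {
	ix := newMemIndex(true)
	fmt.Println(ix.Save("doc-1", "a@example.com"))
	fmt.Println(ix.Save("doc-2", "a@example.com"))
	fmt.Println(ix.Update("doc-1", "a@example.com", "b@example.com"))
	ix.Delete("doc-1", "b@example.com")
	fmt.Println(ix.Count("b@example.com"))
}
```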

-## Index-Based Document Fetching
+## Index-based document fetching

The IndexFetcher is the cornerstone of document retrieval, orchestrating the process of fetching documents using indexes. It operates in two phases: first retrieving indexed fields (including document IDs), then using a standard fetcher to get any additional requested fields.

@@ -67,13 +67,13 @@ The performance characteristics of these operations vary. Direct match operation

Note: the index fetcher cannot currently benefit from ordered indexes, because the underlying storage does not yet support such range queries.

-## Performance Considerations
+## Performance considerations

When working with indexes, it's important to understand their impact on system performance. Each index increases write amplification as every document modification must update all relevant indexes. However, this cost is often outweighed by the dramatic improvement in read performance for indexed queries.

Index selection should be driven by your query patterns and data distribution. Indexing fields that are frequently used in query filters can significantly improve performance, but indexing rarely-queried fields only adds overhead. For unique indexes, the additional validation requirements make this trade-off even more important to consider.

-## Indexing Related Objects
+## Indexing related objects

DefraDB's indexing system provides powerful capabilities for handling relationships between documents. Let's explore how this works with a practical example.

@@ -108,7 +108,7 @@ query {
For requests on non-indexed relations, the normal approach is top-down: all `User` documents are fetched first, and then the corresponding `Address` document is fetched for each `User`. This can be very inefficient for large collections.
With indexing, we use so-called inverted fetching: we first fetch the `Address` documents with the matching `city` value and then fetch the corresponding `User` document for each `Address`. This is much more efficient because the index lets us fetch only the matching documents.
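The difference between the two strategies can be sketched with an in-memory model that counts how many documents each one reads. Everything here is hypothetical and simplified: the `store` type, its maps, and the sample data are made up for illustration, and the sketch assumes both an index on the city value and a way to get from an `Address` back to its `User`.

```go
package main

import "fmt"

type user struct {
	Name      string
	AddressID string
}

type address struct {
	ID   string
	City string
}

// store is a hypothetical in-memory stand-in for two collections and their indexes.
type store struct {
	users          []user
	addresses      map[string]address  // address ID -> document
	cityIndex      map[string][]string // city -> address IDs
	usersByAddress map[string][]int    // address ID -> positions in users
}

func newStore(users []user, addrs []address) *store {
	s := &store{
		users:          users,
		addresses:      map[string]address{},
		cityIndex:      map[string][]string{},
		usersByAddress: map[string][]int{},
	}
	for _, a := range addrs {
		s.addresses[a.ID] = a
		s.cityIndex[a.City] = append(s.cityIndex[a.City], a.ID)
	}
	for i, u := range users {
		s.usersByAddress[u.AddressID] = append(s.usersByAddress[u.AddressID], i)
	}
	return s
}

// topDown reads every User and its Address, then applies the filter.
func (s *store) topDown(city string) (names []string, reads int) {
	for _, u := range s.users {
		reads++ // read the User document
		a := s.addresses[u.AddressID]
		reads++ // read its Address document
		if a.City == city {
			names = append(names, u.Name)
		}
	}
	return names, reads
}

// inverted starts from the city index and reads only the matching Users.
func (s *store) inverted(city string) (names []string, reads int) {
	for _, addrID := range s.cityIndex[city] {
		for _, i := range s.usersByAddress[addrID] {
			reads++ // read the matching User document
			names = append(names, s.users[i].Name)
		}
	}
	return names, reads
}

func sampleStore() *store {
	return newStore(
		[]user{{"Alice", "a1"}, {"Bob", "a2"}, {"Carol", "a3"}},
		[]address{{"a1", "Montreal"}, {"a2", "Paris"}, {"a3", "Montreal"}},
	)
}

func main() {
	s := sampleStore()
	names, reads := s.topDown("Montreal")
	fmt.Println("top-down:", names, "documents read:", reads)
	names, reads = s.inverted("Montreal")
	fmt.Println("inverted:", names, "documents read:", reads)
}
```

In this model the top-down cost grows with the size of the whole `User` collection, while the inverted cost grows only with the number of matches.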

-### Relationship Cardinality Through Indexes
+### Relationship cardinality using indexes

The indexing system also plays a crucial role in enforcing relationship cardinality. By marking an index as unique, you can enforce one-to-one relationships between documents. Here's how you would modify the schema to ensure each User has exactly one Address:

@@ -128,11 +128,11 @@ type Address {

The unique index constraint ensures that no two Users can reference the same Address document. Without the unique constraint, the relationship would be one-to-many by default, allowing multiple Users to reference the same Address.

-## JSON Field Indexing
+## JSON field indexing

DefraDB implements a specialized indexing system for JSON fields that differs from how other field types are handled. While a document in DefraDB can contain various field types (Int, String, Bool, JSON, etc.), JSON fields require special treatment due to their hierarchical nature.

-#### The JSON Interface
+#### JSON interface

The indexing system relies on the `JSON` interface defined in `client/json.go`. This interface is crucial for handling JSON fields as it enables traversal of all leaf nodes within a JSON document. A `JSON` value in DefraDB can represent either an entire JSON document or a single node within it. Each `JSON` value maintains its path information, which is essential for indexing.

@@ -149,7 +149,7 @@ For example, given this JSON document:

The system can represent the "iPhone" value as a `JSON` type with its complete path `[]string{"user", "device", "model"}`. This path-aware representation is fundamental to how the indexing system works.
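The idea of path-aware leaf nodes can be sketched with the standard library alone. This is not the `JSON` interface from `client/json.go`: the `leaf` type and the functions `collectLeaves` and `leavesOf` are hypothetical, the sample document is made up, and arrays are treated as leaves purely to keep the example short.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// leaf is a hypothetical stand-in for a path-aware JSON node.
type leaf struct {
	Path  []string
	Value any
}

// collectLeaves walks nested objects and records every non-object value
// together with the path that leads to it. Keys are visited in sorted
// order so the result is deterministic.
func collectLeaves(node any, path []string, out *[]leaf) {
	obj, ok := node.(map[string]any)
	if !ok {
		// Scalars (and, in this sketch, arrays) are recorded as leaves.
		*out = append(*out, leaf{Path: path, Value: node})
		return
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		// Copy the path so sibling branches do not share a backing array.
		childPath := append(append([]string{}, path...), k)
		collectLeaves(obj[k], childPath, out)
	}
}

// leavesOf parses a JSON document and returns all of its leaves.
func leavesOf(doc string) ([]leaf, error) {
	var root any
	if err := json.Unmarshal([]byte(doc), &root); err != nil {
		return nil, err
	}
	var out []leaf
	collectLeaves(root, nil, &out)
	return out, nil
}

func main() {
	leaves, err := leavesOf(`{"user": {"device": {"model": "iPhone"}, "name": "John"}}`)
	if err != nil {
		panic(err)
	}
	for _, l := range leaves {
		fmt.Printf("%s = %v\n", strings.Join(l.Path, "/"), l.Value)
	}
}
```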

-#### Inverted Indexes for JSON
+#### Inverted indexes for JSON

For JSON fields, DefraDB uses inverted indexes with the following key format:
```
@@ -160,13 +160,13 @@ The term "inverted" comes from how these indexes reverse the typical document-to

This approach differs from traditional secondary indexes in DefraDB. While a regular field maps to a single index entry, a JSON field generates multiple index entries, one for each leaf node in its structure. The system traverses the entire JSON structure during indexing, creating entries that combine the path and value information.

-#### Value Normalization and JSON
+#### Value normalization and JSON

The indexing system integrates with DefraDB's value normalization through `client.NormalValue`. While the encoding/decoding package handles scalar types directly, JSON values maintain additional path information. Each JSON node is encoded with both its normalized value and its path information, allowing the system to reconstruct the exact location of any value within the JSON structure.

Similar to how other field types are normalized (e.g., integers to int64), JSON leaf values are normalized based on their type before being included in the index. This ensures consistent ordering and comparison operations.
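As a rough illustration of why normalization matters for comparisons, the Go sketch below widens a few numeric types to a single representation. It does not use `client.NormalValue`; the `normalize` function is hypothetical, covers only a handful of types, and the choice of `float64` as the target for floats is an assumption.

```go
package main

import "fmt"

// normalize is a hypothetical helper that widens numeric values to one
// canonical type per kind, so equal numbers compare as equal.
func normalize(v any) any {
	switch n := v.(type) {
	case int:
		return int64(n)
	case int32:
		return int64(n)
	case uint32:
		return int64(n)
	case float32:
		return float64(n) // assumed canonical float type
	default:
		return v // strings, bools, and already-normalized values pass through
	}
}

func main() {
	a, b := any(int32(7)), any(int64(7))
	// Interface values with different dynamic types never compare equal.
	fmt.Println("before:", a == b)
	fmt.Println("after: ", normalize(a) == normalize(b))
}
```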

-#### Integration with Index Infrastructure
+#### Integration with index infrastructure

When a document with a JSON field is indexed, the system:
1. Uses the JSON interface to traverse the document structure