rewrite it in rust
Signed-off-by: Haile Lagi <[email protected]>
hailelagi committed Oct 24, 2024
1 parent 59732da commit 4a7033d
Showing 1 changed file with 36 additions and 40 deletions.
76 changes: 36 additions & 40 deletions content/notes/how-do-databases-count.md
@@ -2,7 +2,7 @@
title: "How do databases count?"
date: 2024-10-16T19:05:55+01:00
draft: true
-tags: go, rust, sql, probability
+tags: rust, sql, probability
---

Given the simple query below, how does a database count?
@@ -104,52 +104,44 @@ A selection is:
// 1. constant
// 2. equality selection
// SQL: SELECT * FROM R WHERE a_id = 'a2'
+pub fn select(&self, idx: usize, expr: &str) -> Relation {
+    let result: Vec<Vec<String>> = self.rows
+        .iter()
+        .filter(|row| row[idx] == expr)
+        .cloned()
+        .collect();
+
+    Relation {
+        col_names: self.col_names.clone(), // Clone the column names
+        rows: result,
+    }
+}

-func ConstantSelect(relation Relation, idx int, expr string) Relation {
-	result := make([]Row, 0)
-
-	for _, r := range relation.rows {
-		if r[idx] == expr {
-			result = append(result, r)
-		}
-	}
-
-	return Relation{
-		colNames: relation.colNames,
-		rows:     result,
-	}
-}
```

A projection is:
-```go
+```rust
// Projection: modification(r/w/order) over columns, changes the shape of output/attributes
// π(a1, a2, ..., an)(R)
// SQL: SELECT b_id-100, a_id FROM R WHERE a_id = 'a2'
-func Projection(relation Relation, columns []int) Relation {
-	result := make([]Row, 0)
-
-	for _, row := range relation.rows {
-		newRow := make(Row, len(columns))
-
-		for i, colIdx := range columns {
-			newRow[i] = row[colIdx]
-		}
-
-		result = append(result, newRow)
-	}
-
-	colNames := make([]string, len(columns))
-
-	for i, colIdx := range columns {
-		colNames[i] = relation.colNames[colIdx]
-	}
-
-	return Relation{
-		colNames: colNames,
-		rows:     result,
-	}
-}
+pub fn projection(&self, columns: &[usize]) -> Relation {
+    let result: Vec<Vec<String>> = self.rows
+        .iter()
+        .map(|row| {
+            columns.iter().map(|&col_idx| row[col_idx].clone()).collect()
+        })
+        .collect();
+
+    let col_names: Vec<String> = columns
+        .iter()
+        .map(|&col_idx| self.col_names[col_idx].clone())
+        .collect();
+
+    Relation {
+        col_names,
+        rows: result,
+    }
+}
```
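
To see how the two operators compose into a count, here is a minimal, self-contained sketch. The `Relation` struct layout, the `count` helper, and the `sample` data are assumptions made for illustration and are not taken from the post; only `select` and `projection` follow the signatures shown in the diff.

```rust
// Assumed minimal in-memory relation (the struct definition is not shown in the diff).
#[derive(Debug, Clone, PartialEq)]
pub struct Relation {
    pub col_names: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

impl Relation {
    // Selection: keep rows whose column `idx` equals the constant `expr`.
    pub fn select(&self, idx: usize, expr: &str) -> Relation {
        let rows: Vec<Vec<String>> = self
            .rows
            .iter()
            .filter(|row| row[idx] == expr)
            .cloned()
            .collect();
        Relation { col_names: self.col_names.clone(), rows }
    }

    // Projection: keep only the listed columns, in the given order.
    pub fn projection(&self, columns: &[usize]) -> Relation {
        let rows: Vec<Vec<String>> = self
            .rows
            .iter()
            .map(|row| columns.iter().map(|&c| row[c].clone()).collect())
            .collect();
        let col_names: Vec<String> =
            columns.iter().map(|&c| self.col_names[c].clone()).collect();
        Relation { col_names, rows }
    }

    // Hypothetical helper: an exact COUNT is just the number of surviving rows.
    pub fn count(&self) -> usize {
        self.rows.len()
    }
}

// Hypothetical sample data, for illustration only.
pub fn sample() -> Relation {
    let row = |a: &str, b: &str| vec![a.to_string(), b.to_string()];
    Relation {
        col_names: vec!["a_id".to_string(), "b_id".to_string()],
        rows: vec![row("a1", "101"), row("a2", "102"), row("a2", "103")],
    }
}

fn main() {
    let r = sample();
    // Roughly: SELECT COUNT(b_id) FROM R WHERE a_id = 'a2'
    let n = r.select(0, "a2").projection(&[1]).count();
    println!("count = {n}");
}
```

Each operator returns a fresh `Relation`, so the calls chain in the same order a logical plan would apply them: filter rows first, then narrow the columns, then count.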

Now we have a **logical plan** of operations and transformations for this query, but the query itself is written in a _syntax_ for these operations: re-enter SQL, or was it SEQUEL? Of note is the observation that the **logical operations are independent of the syntax** used to describe them. We first need to parse the SQL and build a simple abstract syntax tree whose nodes are the logical operators (selection and projection), while preserving the semantics of applying the `count`. Parsing such a simple query doesn't require a [sophisticated scheme that's generalized over a grammar](https://en.wikipedia.org/wiki/Recursive_descent_parser); all we need is:
@@ -192,6 +184,8 @@ todo: approach? minimal execution layer?

### Probabilistic Counting

+assumptions: hashed values are pseudo-uniform.
+
The intuition:

{{% callout %}}
@@ -218,6 +212,8 @@ Space Complexity: **O(log log N)**

Parallel: (✅)

+assumptions: hashed values are assumed to be uniformly distributed
+
This algorithm can estimate the cardinality of datasets with well over a billion distinct elements using only ~1.5 kilobytes of memory, with a typical error of roughly 2%. Those are incredible numbers.
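
To make those numbers concrete, here is a rough sketch of such an estimator, assuming the algorithm described here is HyperLogLog. The type name `Hll`, the constants `P` and `M`, and the use of the standard library's `DefaultHasher` as the "uniform" hash are choices made for illustration, not the post's implementation. With `m` registers the standard error is about `1.04 / sqrt(m)`, so `m = 2048` gives roughly 2.3%.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Assumed parameters: 2^11 = 2048 registers.
const P: u32 = 11;
const M: usize = 1 << P;

// Hypothetical sketch type. One byte per register keeps the code simple
// (about 2 KB); packing registers into fewer bits would shrink it further.
pub struct Hll {
    registers: Vec<u8>,
}

impl Hll {
    pub fn new() -> Self {
        Hll { registers: vec![0; M] }
    }

    pub fn insert<T: Hash>(&mut self, item: &T) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let x = hasher.finish();
        // The first P bits choose a register.
        let idx = (x >> (64 - P)) as usize;
        // The remaining bits give the rank: position of the first 1-bit.
        let rest = x << P;
        let rank = (rest.leading_zeros() + 1).min(64 - P + 1) as u8;
        if rank > self.registers[idx] {
            self.registers[idx] = rank;
        }
    }

    pub fn estimate(&self) -> f64 {
        let m = M as f64;
        // Bias-correction constant for m >= 128.
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        // Normalised harmonic mean of 2^register across all registers.
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            // Small-range correction: fall back to linear counting.
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}

fn main() {
    let mut hll = Hll::new();
    for i in 0..1_000_000u64 {
        hll.insert(&i);
    }
    println!("estimated distinct values: {:.0}", hll.estimate());
}
```

Because each register only keeps a maximum, two sketches built on separate partitions can be merged by taking the register-wise maximum, which is what makes the approach easy to parallelise.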

