From 8a850de583865d01eb4f4b86aafa3b152f0e65d5 Mon Sep 17 00:00:00 2001
From: Haile Lagi <52631736+hailelagi@users.noreply.github.com>
Date: Fri, 25 Oct 2024 13:29:08 +0100
Subject: [PATCH] more stuff

Signed-off-by: Haile Lagi <52631736+hailelagi@users.noreply.github.com>
---
 content/bookshelf.md                             |  1 +
 .../{notes => writing}/how-do-databases-count.md | 15 ++++++++++-----
 2 files changed, 11 insertions(+), 5 deletions(-)
 rename content/{notes => writing}/how-do-databases-count.md (91%)

diff --git a/content/bookshelf.md b/content/bookshelf.md
index 01c0b2b..6f71c3f 100644
--- a/content/bookshelf.md
+++ b/content/bookshelf.md
@@ -9,6 +9,7 @@ Stuff I've read or re-read in the past few months:
 
 Previously Explored:
 
+- [A Tour of C++](https://www.amazon.co.uk/Tour-C-Depth/dp/0321958314)
 - [The Beam Book](https://github.com/happi/theBeamBook)
 - [The Go Programming Language](https://www.gopl.io/)
 - [Metaprogramming Elixir](https://pragprog.com/titles/cmelixir/metaprogramming-elixir/)
diff --git a/content/notes/how-do-databases-count.md b/content/writing/how-do-databases-count.md
similarity index 91%
rename from content/notes/how-do-databases-count.md
rename to content/writing/how-do-databases-count.md
index a2b5c32..6dd00f3 100644
--- a/content/notes/how-do-databases-count.md
+++ b/content/writing/how-do-databases-count.md
@@ -28,7 +28,7 @@ postgres=# explain analyze select 1 + 1;
 (3 rows)
 ```
 
-This is not the only way to represent a query plan, sqlite on the other hand does a curious thing, instead of holding a tree as an internal representation, it compiles [down to bytecode](https://www.sqlite.org/opcode.html), why it makes this decision is a plenty interesting design space[^2]:
+This is not the only reperesentation, sqlite on the other hand does a curious thing, instead of holding a tree as an internal representation, it compiles [down to bytecode](https://www.sqlite.org/opcode.html), why it makes this decision is a plenty interesting design space[^2]:
 
 ```
 sqlite> explain select 1 + 1;
@@ -58,7 +58,7 @@ postgres=# select 1 + 1;
 (1 row)
 ```
 
-Let's make a small query engine, to answer our question:
+To answer our question, a query engine in [less than 500 lines]() of rust:
 
 ```
 select count(distinct col) from table;
@@ -145,9 +145,14 @@ A projection is:
 }
 ```
 
-Now we have a **logical plan** of operations and transformations on this query, but it's defined in a _syntax_ for these operations, re-enter SQL, or was it SEQUEL? Of note is the observation, the **logical operations are independent of the syntax** used to describe them. We need to first parse the sql, and build a simple abstract syntax tree where the nodes are the logical operators: selection, projection and preserving the semantics of applying the `count`, parsing such a simple query doesn't require a [sophisticated scheme that's generalized over a grammar](https://en.wikipedia.org/wiki/Recursive_descent_parser) all we need is:
+Now we have a **logical plan** of operations and transformations on this query, but it's defined in a _syntax_ for these operations,
+re-enter SQL, or was it SEQUEL? Of note is the observation, the **logical operations are independent of the syntax** used to describe them.
+We need to first parse the sql, and build a simplified abstract syntax tree where the nodes are the logical operators: selection, projection
+and preserving the semantics of applying the `count`, luckily this query engine doesn't need to support the SQL standard or dialects! and we can get
+away with not [using a pretty cool generalization over a grammar](https://en.wikipedia.org/wiki/Recursive_descent_parser)
+all we need is:
 
 ```go
-// parser.go
+// parser.rs
 
 ```
 
@@ -284,7 +289,7 @@ todo: port over the c++ to rust
 ```rust
 ```
 
-HyperLogLog is now a fairly standard data structure in analytics databases, despite being invented relatively not that long ago, a few examples of adoption in the postgres ecosystem are: [citus](https://docs.citusdata.com/en/stable/articles/hll_count_distinct.html), [crunchydata](https://www.crunchydata.com/blog/high-compression-metrics-storage-with-postgres-hyperloglog) and [timescaleDB](https://docs.timescale.com/use-timescale/latest/hyperfunctions/approx-count-distincts/hyperloglog/), broadly at [meta(presto)](https://engineering.fb.com/2018/12/13/data-infrastructure/hyperloglog/), in [google](http://research.google/pubs/hyperloglog-in-practice-algorithmic-engineering-of-a-state-of-the-art-cardinality-estimation-algorithm/) at [Big Query](https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions) and much more. Thanks for reading!
+HyperLogLog is now a fairly standard data structure in analytics databases, despite being invented relatively not that long ago, a few examples of adoption in the postgres ecosystem are: [citus](https://docs.citusdata.com/en/stable/articles/hll_count_distinct.html), [crunchydata](https://www.crunchydata.com/blog/high-compression-metrics-storage-with-postgres-hyperloglog) and [timescaleDB](https://docs.timescale.com/use-timescale/latest/hyperfunctions/approx-count-distincts/hyperloglog/), broadly at [meta(presto)](https://engineering.fb.com/2018/12/13/data-infrastructure/hyperloglog/), in [google](http://research.google/pubs/hyperloglog-in-practice-algorithmic-engineering-of-a-state-of-the-art-cardinality-estimation-algorithm/) at [Big Query](https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions), [Redis](https://antirez.com/news/75) and much more. Thanks for reading!
 
 [^1]: [System R](https://www.seas.upenn.edu/~zives/cis650/papers/System-R.PDF)