From e70b62c61468dbcd049b88ce5f9d0c005aed5291 Mon Sep 17 00:00:00 2001
From: Christine Rose
Date: Wed, 28 Aug 2024 01:37:14 -0700
Subject: [PATCH 1/5] Line Editing for Irmin README

---
 README.md | 134 +++++++++++++++++++++++++++--------------------------
 1 file changed, 68 insertions(+), 66 deletions(-)

diff --git a/README.md b/README.md
index 5025937aa1..ad9dcd253d 100644
--- a/README.md
+++ b/README.md
@@ -28,18 +28,19 @@
 Irmin is based on distributed version-control systems (DVCs),
-extensively used in software development to enable developers to keep
-track of change provenance and expose modifications in the source
+extensively used in software development to track data
+provenance and show modifications in the source
 code. Irmin applies DVC's principles to large-scale distributed data
-and exposes similar functions to Git (clone, push, pull, branch,
-rebase). It is highly customizable: users can define their types to
-store application-specific values and define custom storage layers (in
-memory, on disk, in a remote Redis database, in the browser,
-etc.). The Git workflow was initially designed for humans to manage
+and includes similar functions to Git (clone, push, pull, branch,
+rebase). The Git workflow was initially designed for humans to manage
 changes within source code. Irmin scales this to handle automatic
 programs performing a very high number of operations per second, with
-a fully automated handling of update conflicts. Finally, Irmin exposes
-an event-driven API to define programmable dynamic behaviours and to
+fully-automated conflixt handling.
+
+Irmin is highly customisable. Users can define their types to
+store application-specific values. They can also define custom storage layers (in
+memory, on disk, in a remote Redis database, in the browser,
+etc.). Finally, Irmin contains an event-driven API to define programmable dynamic behaviours and to
 program distributed dataflow pipelines.

 Irmin was created at the University of Cambridge in 2013 to be the
@@ -52,29 +53,29 @@ challenges raised by the [CAP Theorem][]. Each application can select
 the right combination of libraries to solve its particular distributed
 problem.

-Irmin consists of a core of well-defined low-level data structures
-that specify how data should be persisted and be shared across
-nodes. It defines algorithms for efficient synchronization of those
+Irmin is built on a core of well-defined, low-level data structures that
+dictate how data should be persisted and shared across nodes.
+It defines algorithms for efficient synchronisation of those
 distributed low-level constructs. It also builds a collection of
 higher-level data structures that developers can use without knowing
 precisely how Irmin works underneath. Some of these components even
-have a [formal semantics][], including [Conflict-free Replicated
+have [formal semantics][], including [Conflict-Free Replicated
 Data-Types (CRDT)][]. Since it's a part of MirageOS, Irmin does not
-make strong assumptions about the OS environment that it runs in. This
-makes the system very portable: it works well for in-memory databases
-and slower persistent serialization such as SSDs, hard drives, web
+make strong assumptions about the OS environment, which makes the system
+very portable. It works well for in-memory databases
+and slower persistent serialisation, such as SSDs, hard drives, web
 browser local storage, or even the Git file format.

 Irmin is primarily developed and maintained by [Tarides][], with
-contributions from many [contributors][] from various
-organizations. External maintainers and contributors are welcome.
+involvement by [contributors][] from various
+organisations. External maintainers and contributors are welcome.

 [MirageOS]: https://mirage.io
 [CAP Theorem]: http://en.wikipedia.org/wiki/CAP_theorem
-[formal semantics]: https://kcsrk.info/papers/banyan_aplas20.pdf
-[Conflict-free Replicated Data-Types (CRDT)]: https://arxiv.org/abs/2203.14518
+[Formal Semantics]: https://kcsrk.info/papers/banyan_aplas20.pdf
+[Conflict-Free Replicated Data-Types (CRDT)]: https://arxiv.org/abs/2203.14518
 [Tarides]: https://tarides.com
-[contributors]: https://github.com/mirage/irmin/graphs/contributors
+[Contributors]: https://github.com/mirage/irmin/graphs/contributors
@@ -85,7 +86,7 @@ organizations. External maintainers and contributors are welcome.
 * [Development Version](#Development-Version)
 * [Usage](#Usage)
 * [Example](#Example)
- * [Command-line](#Commandline)
+ * [Command Line](#Commandline)
 * [Context](#Context)
 * * [Irmin as a portable and efficient structured key-value store](#Irmin-as-a-portable-and-efficient-structured-keyvalue-store)
 * [Irmin as a distributed store](#Irmin-as-a-distributed-store)
@@ -98,15 +99,15 @@ organizations. External maintainers and contributors are welcome.

 ## Features

-- **Built-in Snapshotting** - backup and restore
-- **Storage Agnostic** - you can use Irmin on top of your own storage layer
-- **Custom Datatypes** - (de)serialization for custom data types, derivable via
+- **Built-In Snapshotting** - backup and restore
+- **Storage Agnostic** - use Irmin on top of your own storage layer
+- **Custom Datatypes** - (de)serialisation for custom data types, derivable via
 [`ppx_irmin`][ppx_irmin-readme]
 - **Highly Portable** - runs anywhere from Linux to web browsers and Xen unikernels
 - **Git Compatibility** - `irmin-git` uses an on-disk format that can be
 inspected and modified using Git
 - **Dynamic Behavior** - allows the users to define custom merge functions,
- use in-memory transactions (to keep track of reads as well as writes) and
+ use in-memory transactions (to keep track of reads as well as writes), and
 to define event-driven workflows using a notification mechanism

 ## Documentation
@@ -120,7 +121,7 @@ API documentation can be found online at [https://mirage.github.io/irmin](https:
 Please ensure to install the minimum `opam` and `ocaml` versions. Find the latest
 version and install instructions on [ocaml.org](https://ocaml.org/docs/install.html).
-To install Irmin with the command-line tool and all unix backends using `opam`:
+To install Irmin with the command-line tool and all Unix backends using `opam`:

 ```bash
@@ -135,7 +136,7 @@ installed by running:
 opam install irmin
 ```

-The following packages have are available on `opam`:
+The following packages are available on `opam`:

 - `irmin` - the base package, plus an in-memory storage implementation
 - `irmin-chunk` - chunked storage
@@ -143,10 +144,10 @@ The following packages have are available on `opam`:
 - `irmin-fs` - filesystem-based storage using `bin_prot`
 - `irmin-git` - Git compatible storage
 - `irmin-graphql` - GraphQL server
-- `irmin-mirage` - mirage compatibility
-- `irmin-mirage-git` - Git compatible storage for mirage
-- `irmin-mirage-graphql` - mirage compatible GraphQL server
-- `irmin-pack` - compressed, on-disk, posix backend
+- `irmin-mirage` - MirageOS compatibility
+- `irmin-mirage-git` - Git compatible storage for MirageOS
+- `irmin-mirage-graphql` - MirageOS compatible GraphQL server
+- `irmin-pack` - compressed, on-disk, POSIX backend
 - `ppx_irmin` - PPX deriver for Irmin content types (see [README_PPX.md][ppx_irmin-readme])
 - `irmin-containers` - collection of simple, ready-to-use mergeable data structures

@@ -214,7 +215,7 @@ let () = Lwt_main.run main
 ```

 The example is contained in [examples/readme.ml](./examples/readme.ml) It can
-be compiled and executed with dune:
+be compiled and executed with Dune:

 ```bash
@@ -226,7 +227,7 @@ foo/bar => 'testing 123'
 The [examples](./examples/) directory also contains more advanced examples,
 which can be executed in the same way.

-### Command-line
+### Command Line

 The same thing can also be accomplished using `irmin`, the command-line
 application installed with `irmin-cli`, by running:
@@ -243,17 +244,17 @@ testing 123
 can also set flags globally using `$HOME/.irmin/config.yml`. Run
 `irmin help irmin.yml` for further details.
-Also see `irmin --help` for list of all commands and either
+Also see `irmin --help` for a list of all commands and either
 `irmin --help` or `irmin help ` for more help with a specific
 command.

 ## Context

-Irmin's initial desing is directly inspired from
+Irmin's initial design is directly inspired from
 [XenStore](https://dl.acm.org/doi/10.1145/1631687.1596581), with:

-- the need for efficient optimistic concurrency control features to be
- able to let thousands of virtual machine concurrently access and
+- the need for efficient optimistic concurrency control features to
+ let thousands of virtual machine concurrently access and
 modify a central configuration database (the Xen stack uses XenStore
 as an RPC mechanism to setup VM configuration on boot). Very early
 on, the initial focus was to specify and handle [potential
@@ -267,25 +268,26 @@ Irmin's initial desing is directly inspired from
 after a crash), while making system debugging easy and go really
 fast, thanks to efficient merging strategy.

-In 2014, the first release of Irmin was announced part of the MirageOS
-2.0 release [here](https://mirage.io/blog/introducing-irmin). Since
+In 2014, the first release of [Irmin was announced](https://mirage.io/blog/introducing-irmin)
+as part of the MirageOS 2.0 release. Since
 then, several projects started using and improving Irmin. These can
-roughly be split into 3 categories: (i) use Irmin as a portable,
-structured key-value store (with expressive, mergeable types); (ii)
-use Irmin as distributed database (with a customizable consistency
-semantics) and (iii) an event-driven dataflow engine.
+roughly be split into three categories:
+1. Use Irmin as a portable,
+structured key-value store (with expressive, mergeable types)
+2. Use Irmin as distributed database (with a customisable consistency
+semantics)
+3. Use Irmin as an event-driven dataflow engine.
 #### Irmin as a portable and efficient structured key-value store

 - [XenStored](https://github.com/xen-project/xen/tree/master/tools/ocaml/xenstored)
 is an information storage space shared between all the Xen virtual
- machines running in the same host. Each virtual machines gets its
- own path in the store. When values are changed in the store, the
+ machines running in the same host. Each virtual machine gets its
+ own path in the store. When values are changed, the
 appropriate drivers are notified. The initial OCaml implementation
- was later extended to use Irmin
- [here](https://github.com/mirage/ocaml-xenstore-server). More
- details
+ was later [extended to use Irmin](https://github.com/mirage/ocaml-xenstore-server).
+ More details
 [here](https://mirage.io/blog/introducing-irmin-in-xenstore).
 - [Jitsu](https://github.com/mirage/jitsu) is an experimental
 orchestrator for unikernels. It uses Irmin to store the unikernel
@@ -293,24 +295,24 @@ semantics) and (iii) an event-driven dataflow engine.
 [here](https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-madhavapeddy.pdf).
 - [Cuekeeper](https://github.com/talex5/cuekeeper) is a web-based GTD
 (a fancy TODO list) that runs entirely in the browser. It uses Irmin
- in the browser to store data locally, with support for structured
+ to store data locally with support for structured
 concurrent editing and snapshot export and import. More details
 [here](https://roscidus.com/blog/blog/2015/04/28/cuekeeper-gitting-things-done-in-the-browser/).
 - [Canopy](https://github.com/Engil/Canopy) and
 [Unipi](https://github.com/roburio/unipi) both use Irmin to serve
- static websites pull from Git repositories and deployed as
+ static websites pulled from Git repositories and deployed as
 unikernels.
-- [Caldav](https://github.com/roburio/caldav) is using Irmin to store
+- [Caldav](https://github.com/roburio/caldav) uses Irmin to store
 calendar entries and back them into a Git repository. More information
 [here](https://robur.io/Our%20Work/Projects).
 - [Datakit](https://github.com/moby/datakit) was developed at Docker
 and provided a 9p interface to the Irmin API. It was used to manage
- the configuration of Docker for Desktop, with merge policies on
+ the configuration of Docker for Desktop with merge policies on
 upgrade, full auditing, and snapshot/rollback capabilites.
 - [Tezos](https://gitlab.com/tezos/tezos/) started using Irmin in 2017 to store the
- ledger state. The first prototype used irmin-git before switching to
- irmin-lmdb and irmin-leveldb (and now irmin-pack). More details
+ ledger state. The first prototype used `irmin-git` before switching to
+ `irmin-lmdb` and `irmin-leveldb` (and now `irmin-pack`). More details
 [here](https://tarides.com/blog/2019-11-21-irmin-v2#tezos-and-irmin-pack).

 #### Irmin as a distributed store
@@ -322,7 +324,7 @@ semantics) and (iii) an event-driven dataflow engine.
 Irmin as a local key-value store) but also to experiment with
 replacing the IMAP on-wire protocol by an explicit Git push/pull
 mechanism.
-- [irmin-ARP](https://github.com/yomimono/irmin-arp) uses Irmin to
+- [`irmin-ARP`](https://github.com/yomimono/irmin-arp) uses Irmin to
 store and audit ARP configuration. It's using Irmin as a local
 key-value store for very low-level information (which are normally
 stored very deep in the kernel layers), but the main goal was really
@@ -335,7 +337,7 @@ semantics) and (iii) an event-driven dataflow engine.
 using [Cassandra](https://cassandra.apache.org/_/index.html) as a
 storage backend. More information
 [here](https://kcsrk.info/papers/banyan_aplas20.pdf).
-- [irmin-fdb](https://github.com/andreas/irmin-fdb) implements an
+- [`irmin-fdb`](https://github.com/andreas/irmin-fdb) implements an
 Irmin store backed by [FoundationDB](https://www.foundationdb.org/). More details
 [here](https://www.youtube.com/watch?v=NArvw-9axeg&ab_channel=TheLinuxFoundation).
@@ -343,24 +345,24 @@ semantics) and (iii) an event-driven dataflow engine.
 #### Irmin as a dataflow scheduler

 - [Datakit CI](https://github.com/moby/datakit/tree/master/ci) is a
- continuous integration service that monitors GitHub project and
- tests each branch, tag and pull request. It displays the test
+ continuous integration service that monitors GitHub projects and
+ tests each branch, tag, and pull request. It displays the test
 results as status indicators in the GitHub UI. It keeps all of its
- state and logs in DataKit, rather than a traditional relational
+ state and logs in DataKit rather than a traditional relational
 database, allowing review with the usual Git tools. The core of the
- project is a scheduler that manage dataflow pipelines across Git
- repositories. It was used for a few years as the CI system test
- Docker for Desktop on bare-metal and virtual machines, as well as
- all the new opam package submissions to ocaml/opam-repository. More
+ project is a scheduler that manages dataflow pipelines across Git
+ repositories. For a few years, it was used as Docker for Desktop's CI system test
+ on bare-metal and virtual machines, as well as
+ all the new opam package submissions to `ocaml/opam-repository`. More
 details
 [here](https://www.docker.com/blog/docker-unikernels-open-source/).
 - [Causal RPC](https://github.com/CraigFe/causal-rpc) implements an
 RPC framework using Irmin as a network substrate. More details
 [here](https://www.craigfe.io/causalrpc.pdf).
 - [CISO](https://github.com/samoht/ciso) is an experimental
- (distributed) Continuous Integration engine for OPAM. It was
+ (distributed) Continuous Integration engine for opam. It was
 designed as a replacement of Datakit-CI and finally turned into
- [ocurrent](https://github.com/ocurrent/ocurrent).
+ [OCurrent](https://github.com/ocurrent/ocurrent).
 ## Issues

From ec1982e46054ca89f530de2f7ccff860a42f2561 Mon Sep 17 00:00:00 2001
From: Christine Rose
Date: Thu, 29 Aug 2024 04:21:53 -0700
Subject: [PATCH 2/5] fix typo

Co-authored-by: art-w
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ad9dcd253d..d371805fed 100644
--- a/README.md
+++ b/README.md
@@ -35,7 +35,7 @@ and includes similar functions to Git (clone, push, pull, branch,
 rebase). The Git workflow was initially designed for humans to manage
 changes within source code. Irmin scales this to handle automatic
 programs performing a very high number of operations per second, with
-fully-automated conflixt handling.
+fully-automated conflict handling.

 Irmin is highly customisable. Users can define their types to
 store application-specific values. They can also define custom storage layers (in

From ec42283c4893ec558eafd127ab1201eb70231b46 Mon Sep 17 00:00:00 2001
From: Christine Rose
Date: Thu, 29 Aug 2024 04:22:20 -0700
Subject: [PATCH 3/5] Fix captialization

Co-authored-by: art-w
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d371805fed..426bfe8d1f 100644
--- a/README.md
+++ b/README.md
@@ -59,7 +59,7 @@ It defines algorithms for efficient synchronisation of those
 distributed low-level constructs. It also builds a collection of
 higher-level data structures that developers can use without knowing
 precisely how Irmin works underneath. Some of these components even
-have [formal semantics][], including [Conflict-Free Replicated
+have [formal semantics][], including [Conflict-free Replicated
 Data-Types (CRDT)][]. Since it's a part of MirageOS, Irmin does not
 make strong assumptions about the OS environment, which makes the system
 very portable. It works well for in-memory databases

From 3419ccc9bc1c62e645917cfcc8b8a0065b3d1bd9 Mon Sep 17 00:00:00 2001
From: Christine Rose
Date: Thu, 29 Aug 2024 04:22:35 -0700
Subject: [PATCH 4/5] Fix captialization

Co-authored-by: art-w
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 426bfe8d1f..5a3b005268 100644
--- a/README.md
+++ b/README.md
@@ -72,8 +72,8 @@ organisations. External maintainers and contributors are welcome.

 [MirageOS]: https://mirage.io
 [CAP Theorem]: http://en.wikipedia.org/wiki/CAP_theorem
-[Formal Semantics]: https://kcsrk.info/papers/banyan_aplas20.pdf
-[Conflict-Free Replicated Data-Types (CRDT)]: https://arxiv.org/abs/2203.14518
+[formal semantics]: https://kcsrk.info/papers/banyan_aplas20.pdf
+[Conflict-free Replicated Data-Types (CRDT)]: https://arxiv.org/abs/2203.14518
 [Tarides]: https://tarides.com
 [Contributors]: https://github.com/mirage/irmin/graphs/contributors

From 0db48289421bfac1b4d7ff3c0a26d973d50b3a63 Mon Sep 17 00:00:00 2001
From: Christine Rose
Date: Thu, 29 Aug 2024 04:22:53 -0700
Subject: [PATCH 5/5] Update README.md

Co-authored-by: art-w
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5a3b005268..1499fa02fa 100644
--- a/README.md
+++ b/README.md
@@ -75,7 +75,7 @@ organisations. External maintainers and contributors are welcome.
 [formal semantics]: https://kcsrk.info/papers/banyan_aplas20.pdf
 [Conflict-free Replicated Data-Types (CRDT)]: https://arxiv.org/abs/2203.14518
 [Tarides]: https://tarides.com
-[Contributors]: https://github.com/mirage/irmin/graphs/contributors
+[contributors]: https://github.com/mirage/irmin/graphs/contributors