Line Editing for Irmin README #2325

Merged
merged 5 commits into from
Sep 6, 2024
128 changes: 65 additions & 63 deletions README.md
@@ -28,18 +28,19 @@
<hr />

Irmin is based on distributed version-control systems (DVCs),
extensively used in software development to enable developers to keep
track of change provenance and expose modifications in the source
extensively used in software development to track data
provenance and show modifications in the source
code. Irmin applies DVC's principles to large-scale distributed data
and exposes similar functions to Git (clone, push, pull, branch,
rebase). It is highly customizable: users can define their types to
store application-specific values and define custom storage layers (in
memory, on disk, in a remote Redis database, in the browser,
etc.). The Git workflow was initially designed for humans to manage
and includes similar functions to Git (clone, push, pull, branch,
rebase). The Git workflow was initially designed for humans to manage
changes within source code. Irmin scales this to handle automatic
programs performing a very high number of operations per second, with
a fully automated handling of update conflicts. Finally, Irmin exposes
an event-driven API to define programmable dynamic behaviours and to
fully-automated conflict handling.

Irmin is highly customisable. Users can define their types to
store application-specific values. They can also define custom storage layers (in
memory, on disk, in a remote Redis database, in the browser,
etc.). Finally, Irmin contains an event-driven API to define programmable dynamic behaviours and to
program distributed dataflow pipelines.

Irmin was created at the University of Cambridge in 2013 to be the
@@ -52,22 +53,22 @@ challenges raised by the [CAP Theorem][]. Each application
can select the right combination of libraries to solve its particular
distributed problem.

Irmin consists of a core of well-defined low-level data structures
that specify how data should be persisted and be shared across
nodes. It defines algorithms for efficient synchronization of those
Irmin is built on a core of well-defined, low-level data structures that
dictate how data should be persisted and shared across nodes.
It defines algorithms for efficient synchronisation of those
distributed low-level constructs. It also builds a collection of
higher-level data structures that developers can use without knowing
precisely how Irmin works underneath. Some of these components even
have a [formal semantics][], including [Conflict-free Replicated
have [formal semantics][], including [Conflict-free Replicated
Data-Types (CRDT)][]. Since it's a part of MirageOS, Irmin does not
make strong assumptions about the OS environment that it runs in. This
makes the system very portable: it works well for in-memory databases
and slower persistent serialization such as SSDs, hard drives, web
make strong assumptions about the OS environment, which makes the system
very portable. It works well for in-memory databases
and slower persistent serialisation, such as SSDs, hard drives, web
browser local storage, or even the Git file format.

Irmin is primarily developed and maintained by [Tarides][], with
contributions from many [contributors][] from various
organizations. External maintainers and contributors are welcome.
involvement by [contributors][] from various
organisations. External maintainers and contributors are welcome.

[MirageOS]: https://mirage.io
[CAP Theorem]: http://en.wikipedia.org/wiki/CAP_theorem
@@ -85,7 +86,7 @@ organizations. External maintainers and contributors are welcome.
* [Development Version](#Development-Version)
* [Usage](#Usage)
* [Example](#Example)
* [Command-line](#Commandline)
* [Command Line](#Commandline)
* [Context](#Context)
* * [Irmin as a portable and efficient structured key-value store](#Irmin-as-a-portable-and-efficient-structured-keyvalue-store)
* [Irmin as a distributed store](#Irmin-as-a-distributed-store)
@@ -98,15 +99,15 @@ organizations. External maintainers and contributors are welcome.

## Features

- **Built-in Snapshotting** - backup and restore
- **Storage Agnostic** - you can use Irmin on top of your own storage layer
- **Custom Datatypes** - (de)serialization for custom data types, derivable via
- **Built-In Snapshotting** - backup and restore
- **Storage Agnostic** - use Irmin on top of your own storage layer
- **Custom Datatypes** - (de)serialisation for custom data types, derivable via
[`ppx_irmin`][ppx_irmin-readme]
- **Highly Portable** - runs anywhere from Linux to web browsers and Xen unikernels
- **Git Compatibility** - `irmin-git` uses an on-disk format that can be
inspected and modified using Git
- **Dynamic Behavior** - allows the users to define custom merge functions,
use in-memory transactions (to keep track of reads as well as writes) and
use in-memory transactions (to keep track of reads as well as writes), and
to define event-driven workflows using a notification mechanism
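
The custom merge functions mentioned in the list above can be pictured as three-way merges: given a common ancestor and two concurrently updated values, produce a combined value or report a conflict. The following is a minimal sketch in plain OCaml, not Irmin's actual merge API; the names `merge_counter` and `merge_value` are hypothetical and chosen only for illustration.

```ocaml
(* Hypothetical sketch of three-way merging in plain OCaml.
   It does not use Irmin's API; all names are illustrative. *)

(* [merge_counter ~old a b] combines two concurrent updates [a] and [b]
   that both started from the common ancestor [old], by applying the
   delta of each side to the ancestor. *)
let merge_counter ~old a b = a + b - old

(* [merge_value ~old a b] is a generic three-way merge: if only one side
   changed the value, that change wins; if both sides changed it in
   different ways, a conflict is reported. *)
let merge_value ~old a b =
  if a = b then Ok a
  else if a = old then Ok b
  else if b = old then Ok a
  else Error `Conflict

let () =
  (* Both branches start from 10; one adds 2, the other adds 5. *)
  assert (merge_counter ~old:10 12 15 = 17);
  (* Only the second branch changed the value. *)
  assert (merge_value ~old:"x" "x" "y" = Ok "y");
  (* Both branches changed the value differently. *)
  assert (merge_value ~old:"x" "y" "z" = Error `Conflict)
```

A counter can always be merged automatically because the two deltas commute, whereas an opaque value needs a conflict case when both sides diverge from the ancestor.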

## Documentation
@@ -120,7 +121,7 @@ API documentation can be found online at [https://mirage.github.io/irmin](https:
Please ensure to install the minimum `opam` and `ocaml` versions. Find the latest
version and install instructions on [ocaml.org](https://ocaml.org/docs/install.html).

To install Irmin with the command-line tool and all unix backends using `opam`:
To install Irmin with the command-line tool and all Unix backends using `opam`:

<!-- $MDX skip -->
```bash
@@ -135,18 +136,18 @@ installed by running:
opam install irmin
```

The following packages have are available on `opam`:
The following packages are available on `opam`:

- `irmin` - the base package, plus an in-memory storage implementation
- `irmin-chunk` - chunked storage
- `irmin-cli` - a simple command-line tool
- `irmin-fs` - filesystem-based storage using `bin_prot`
- `irmin-git` - Git compatible storage
- `irmin-graphql` - GraphQL server
- `irmin-mirage` - mirage compatibility
- `irmin-mirage-git` - Git compatible storage for mirage
- `irmin-mirage-graphql` - mirage compatible GraphQL server
- `irmin-pack` - compressed, on-disk, posix backend
- `irmin-mirage` - MirageOS compatibility
- `irmin-mirage-git` - Git compatible storage for MirageOS
- `irmin-mirage-graphql` - MirageOS compatible GraphQL server
- `irmin-pack` - compressed, on-disk, POSIX backend
- `ppx_irmin` - PPX deriver for Irmin content types (see [README_PPX.md][ppx_irmin-readme])
- `irmin-containers` - collection of simple, ready-to-use mergeable data structures

@@ -214,7 +215,7 @@ let () = Lwt_main.run main
```

The example is contained in [examples/readme.ml](./examples/readme.ml) It can
be compiled and executed with dune:
be compiled and executed with Dune:

<!-- $MDX skip -->
```bash
@@ -226,7 +227,7 @@ foo/bar => 'testing 123'
The [examples](./examples/) directory also contains more advanced examples,
which can be executed in the same way.

### Command-line
### Command Line

The same thing can also be accomplished using `irmin`, the command-line
application installed with `irmin-cli`, by running:
@@ -243,17 +244,17 @@ testing 123
can also set flags globally using `$HOME/.irmin/config.yml`. Run
`irmin help irmin.yml` for further details.

Also see `irmin --help` for list of all commands and either
Also see `irmin --help` for a list of all commands and either
`irmin <command> --help` or `irmin help <command>` for more help with a
specific command.

## Context

Irmin's initial desing is directly inspired from
Irmin's initial design is directly inspired from
[XenStore](https://dl.acm.org/doi/10.1145/1631687.1596581), with:

- the need for efficient optimistic concurrency control features to be
able to let thousands of virtual machine concurrently access and
- the need for efficient optimistic concurrency control features to
let thousands of virtual machine concurrently access and
modify a central configuration database (the Xen stack uses XenStore
as an RPC mechanism to setup VM configuration on boot). Very early
on, the initial focus was to specify and handle [potential
@@ -267,50 +268,51 @@ Irmin's initial desing is directly inspired from
after a crash), while making system debugging easy and go really
fast, thanks to efficient merging strategy.

In 2014, the first release of Irmin was announced part of the MirageOS
2.0 release [here](https://mirage.io/blog/introducing-irmin). Since
In 2014, the first release of [Irmin was announced](https://mirage.io/blog/introducing-irmin)
as part of the MirageOS 2.0 release. Since
then, several projects started using and improving Irmin. These can
roughly be split into 3 categories: (i) use Irmin as a portable,
structured key-value store (with expressive, mergeable types); (ii)
use Irmin as distributed database (with a customizable consistency
semantics) and (iii) an event-driven dataflow engine.
roughly be split into three categories:
1. Use Irmin as a portable,
structured key-value store (with expressive, mergeable types)
2. Use Irmin as distributed database (with a customisable consistency
semantics)
3. Use Irmin as an event-driven dataflow engine.


#### Irmin as a portable and efficient structured key-value store

- [XenStored](https://github.com/xen-project/xen/tree/master/tools/ocaml/xenstored)
is an information storage space shared between all the Xen virtual
machines running in the same host. Each virtual machines gets its
own path in the store. When values are changed in the store, the
machines running in the same host. Each virtual machine gets its
own path in the store. When values are changed, the
appropriate drivers are notified. The initial OCaml implementation
was later extended to use Irmin
[here](https://github.com/mirage/ocaml-xenstore-server). More
details
was later [extended to use Irmin](https://github.com/mirage/ocaml-xenstore-server).
More details
[here](https://mirage.io/blog/introducing-irmin-in-xenstore).
- [Jitsu](https://github.com/mirage/jitsu) is an experimental
orchestrator for unikernels. It uses Irmin to store the unikernel
configuration (and manage dynamic DNS entries). See more details
[here](https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-madhavapeddy.pdf).
- [Cuekeeper](https://github.com/talex5/cuekeeper) is a web-based GTD
(a fancy TODO list) that runs entirely in the browser. It uses Irmin
in the browser to store data locally, with support for structured
to store data locally with support for structured
concurrent editing and snapshot export and import. More details
[here](https://roscidus.com/blog/blog/2015/04/28/cuekeeper-gitting-things-done-in-the-browser/).
- [Canopy](https://github.com/Engil/Canopy) and
[Unipi](https://github.com/roburio/unipi) both use Irmin to serve
static websites pull from Git repositories and deployed as
static websites pulled from Git repositories and deployed as
unikernels.
- [Caldav](https://github.com/roburio/caldav) is using Irmin to store
- [Caldav](https://github.com/roburio/caldav) uses Irmin to store
calendar entries and back them into a Git repository. More
information [here](https://robur.io/Our%20Work/Projects).
- [Datakit](https://github.com/moby/datakit) was developed at Docker
and provided a 9p interface to the Irmin API. It was used to manage
the configuration of Docker for Desktop, with merge policies on
the configuration of Docker for Desktop with merge policies on
upgrade, full auditing, and snapshot/rollback capabilites.
- [Tezos](https://gitlab.com/tezos/tezos/) started using Irmin in 2017
to store the
ledger state. The first prototype used irmin-git before switching to
irmin-lmdb and irmin-leveldb (and now irmin-pack). More details
ledger state. The first prototype used `irmin-git` before switching to
`irmin-lmdb` and `irmin-leveldb` (and now `irmin-pack`). More details
[here](https://tarides.com/blog/2019-11-21-irmin-v2#tezos-and-irmin-pack).

#### Irmin as a distributed store
@@ -322,7 +324,7 @@ semantics) and (iii) an event-driven dataflow engine.
Irmin as a local key-value store) but also to experiment with
replacing the IMAP on-wire protocol by an explicit Git push/pull
mechanism.
- [irmin-ARP](https://github.com/yomimono/irmin-arp) uses Irmin to
- [`irmin-ARP`](https://github.com/yomimono/irmin-arp) uses Irmin to
store and audit ARP configuration. It's using Irmin as a local
key-value store for very low-level information (which are normally
stored very deep in the kernel layers), but the main goal was really
@@ -335,32 +337,32 @@ semantics) and (iii) an event-driven dataflow engine.
using [Cassandra](https://cassandra.apache.org/_/index.html) as a
storage backend. More information
[here](https://kcsrk.info/papers/banyan_aplas20.pdf).
- [irmin-fdb](https://github.com/andreas/irmin-fdb) implements an
- [`irmin-fdb`](https://github.com/andreas/irmin-fdb) implements an
Irmin store backed by
[FoundationDB](https://www.foundationdb.org/). More details
[here](https://www.youtube.com/watch?v=NArvw-9axeg&ab_channel=TheLinuxFoundation).

#### Irmin as a dataflow scheduler

- [Datakit CI](https://github.com/moby/datakit/tree/master/ci) is a
continuous integration service that monitors GitHub project and
tests each branch, tag and pull request. It displays the test
continuous integration service that monitors GitHub projects and
tests each branch, tag, and pull request. It displays the test
results as status indicators in the GitHub UI. It keeps all of its
state and logs in DataKit, rather than a traditional relational
state and logs in DataKit rather than a traditional relational
database, allowing review with the usual Git tools. The core of the
project is a scheduler that manage dataflow pipelines across Git
repositories. It was used for a few years as the CI system test
Docker for Desktop on bare-metal and virtual machines, as well as
all the new opam package submissions to ocaml/opam-repository. More
project is a scheduler that manages dataflow pipelines across Git
repositories. For a few years, it was used as Docker for Desktop's CI system test
on bare-metal and virtual machines, as well as
all the new opam package submissions to `ocaml/opam-repository`. More
details
[here](https://www.docker.com/blog/docker-unikernels-open-source/).
- [Causal RPC](https://github.com/CraigFe/causal-rpc) implements an
RPC framework using Irmin as a network substrate. More details
[here](https://www.craigfe.io/causalrpc.pdf).
- [CISO](https://github.com/samoht/ciso) is an experimental
(distributed) Continuous Integration engine for OPAM. It was
(distributed) Continuous Integration engine for opam. It was
designed as a replacement of Datakit-CI and finally turned into
[ocurrent](https://github.com/ocurrent/ocurrent).
[OCurrent](https://github.com/ocurrent/ocurrent).

## Issues
