diff --git a/data/planet/cwn/ocaml-weekly-news-02-jul-2024.md b/data/planet/cwn/ocaml-weekly-news-02-jul-2024.md new file mode 100644 index 0000000000..9909a2ec80 --- /dev/null +++ b/data/planet/cwn/ocaml-weekly-news-02-jul-2024.md @@ -0,0 +1,12 @@ +--- +title: OCaml Weekly News, 02 Jul 2024 +description: +url: https://alan.petitepomme.net/cwn/2024.07.02.html +date: 2024-07-02T12:00:00-00:00 +preview_image: +authors: +- Caml Weekly News +source: +--- + +
I made a tree-sitter plugin for
+dune
files. It is available on
+GitHub.
Tree-sitter is a parsing system that can be used in text editors.
+Dune is a build system for OCaml projects.
+Its configuration language lives in dune
 files which use an s-expression
+syntax.
This makes highlighting challenging: the lexing part of the language is very
+simple (atoms, strings, parentheses), but it is not enough to make a good
+highlighter.
+In the following example, with-stdout-to
and echo
are “actions” that we
+could highlight in a special way, but these names can also appear in places
+where they are not interpreted as actions, and doing so would be confusing (for
+example, we could write to a file named echo
instead of foo.txt
).
+(rule
+ (action
+  (with-stdout-to
+   foo.txt
+   (echo "testing"))))
Tree-sitter solves this, because it creates an actual parser that goes beyond
+lexing.
+In this example, I created grammar rules that parse the contents of (action ...)
as an action, recognizing the various constructs of this DSL.
The output of the parser is this syntax tree with location information (for
+some reason, line numbers start at 0, which is normal and unusual at the same
+time).
+(source_file [0, 0] - [5, 0]
+ (stanza [0, 0] - [4, 22]
+ (stanza_name [0, 1] - [0, 5])
+ (field_name [1, 2] - [1, 8])
+ (action [2, 2] - [4, 20]
+ (action_name [2, 3] - [2, 17])
+ (file_name_target [3, 3] - [3, 10]
+ (file_name [3, 3] - [3, 10]))
+ (action [4, 3] - [4, 19]
+ (action_name [4, 4] - [4, 8])
+ (quoted_string [4, 9] - [4, 18])))))
+The various strings are annotated with their type: we have stanza names
+(rule
), field names (action
), action names (with-stdout-to
, echo
), file
+names (foo.txt
), and plain strings ("testing"
).
By itself, that is not useful, but it’s possible to write queries to make
+this syntax tree do interesting stuff.
+The first one is highlighting: we can set styles for various “patterns” (in
+practice, I only used node names) by defining queries:
+
+ (stanza_name) @function
+ (field_name) @property
+ (quoted_string) @string
+ (multiline_string) @string
+ (action_name) @keyword
The parts with @
map to “highlight groups” used in text editors.
Another type of query is called “injections”. It is used to link different
+types of grammars together. For example, dune
files can start with a special
+comment that indicates that the rest of the file is an OCaml program. In that
+case, the parser emits a single ocaml_syntax
node and the following injection
+indicates that this file should be parsed using an OCaml parser:
+ ((ocaml_syntax) @injection.content
+  (#set! injection.language "ocaml"))
Another use case for this is system
actions: these strings in dune
files
+could be interpreted using a shell parser.
In the other direction, it is possible to inject dune
files into another
+document. For example, a markdown parser can use injections to highlight code
+blocks.
I’m happy to have explored this technology. The toolchain seemed complex at
+first: there’s a compiler which seems to be a mix of node and rust, which
+generates C, which is compiled into a dynamically loaded library; but this is
+actually pretty well integrated in nix and neovim, so the details are made
+invisible.
+The testing mechanism is similar to the cram tests we use in Dune, but I was a
+bit confused with the colors at first: when the output of a test changes, Dune
+considers that the new output is a +
in the diff, and highlights it in green;
+while tree-sitter considers that the “expected output” is green.
There are many ways to improve this prototype: either by adding queries (it’s
+possible to define text objects, folding expressions, etc), or by improving
+coverage for dune
 files (in most cases, the parser uses an s-expression
+fallback). I’m also curious to see if it’s possible to use this parser to
+provide a completion source. Since the strings are tagged with their type (are
+we expecting a library name, a module name, etc), I think we could use that to
+provide context-specific completions, but that’s probably difficult to do.
Thanks teej for the initial idea and the useful +resources.
diff --git a/data/planet/emilpriver/why-i-like-ocaml.md b/data/planet/emilpriver/why-i-like-ocaml.md new file mode 100644 index 0000000000..a92cddd8f1 --- /dev/null +++ b/data/planet/emilpriver/why-i-like-ocaml.md @@ -0,0 +1,374 @@ +--- +title: Why I Like Ocaml +description: I like OCaml and this is why +url: https://priver.dev/blog/ocaml/why-i-like-ocaml/ +date: 2024-07-21T12:10:55-00:00 +preview_image: https://priver.dev/images/ocaml/ocaml-cover.png +authors: +- "Emil Priv\xE9r" +source: +--- + +According to my Linkedin profile, I have been writing code for a company for almost 6 years. During this time, I have worked on PHP and Wordpress projects, built e-commerce websites using NextJS and JavaScript, written small backends in Python with Django/Flask/Fastapi, and developed fintech systems in GO, among other things. I have come to realize that I value a good type system and prefer writing code in a more functional way rather than using object-oriented programming. For example, in GO, I prefer passing in arguments rather than creating a struct
method. This is why I will be discussing OCaml in this article.
If you are not familiar with the language OCaml or need a brief overview of it, I recommend reading my post OCaml introduction before continuing with this post. It will help you better understand the topic I am discussing.
+Almost every time I ask someone what they like about OCaml, they often say “oh, the type system is really nice” or “I really like the Hindley-Milner type system.” When I ask new OCaml developers what they like about the language, they often say “This type system is really nice, Typescript’s type system is actually quite garbage.” I am not surprised that these people say this, as I agree 100%. I really enjoy the Hindley-Milner type system and I think this is also the biggest reason why I write in this language. A good type system can make a huge difference for your developer experience.
+For those who may not be familiar with the Hindley-Milner type system, it can be described as a system where you write a piece of program with strict types, but you are not required to explicitly state the types. Instead, the type is inferred based on how the variable is used.
+Let’s look at some code to demonstrate what I mean. In GO, you would be required to define the type of the arguments:
However, in OCaml, you don’t need to specify the type:
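A sketch of the OCaml counterpart (illustrative, not the original block), with no annotation on the argument:

```ocaml
(* No type annotation on [name]: OCaml infers it from how it is used. *)
let hello name = print_endline name

let () = hello "Emil"
```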
Since print_endline
expects to receive a string, the signature for hello
will be:
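Given the description, the inferred signature reads:

```ocaml
val hello : string -> unit
```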
But it’s not just for arguments, it’s also used when returning a value.
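A sketch of such a function (names illustrative), which the compiler rejects because its branches disagree on the return type:

```ocaml
(* Rejected: the [then] branch has type string but the [else] branch has type int. *)
let broken flag =
  if flag then "hello"
  else 1
```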
This function will not compile because we are trying to return a string as the first value and later an integer.
+I also want to provide a larger example of the Hindley-Milner type system:
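A sketch of the module being described (module and field names are guesses from the prose): it exposes `make`, `print_car_age` and `print_car_name`, and defines a type `car`:

```ocaml
module Car = struct
  type car = { name : string; age : int }

  let make name age = { name; age }

  (* No annotations needed: [car] is the record type in scope,
     so [car.age] and [car.name] fix the argument's type. *)
  let print_car_age car = print_endline (string_of_int car.age)
  let print_car_name car = print_endline car.name
end
```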
The signature for this piece of code will be:
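For a module like the sketch above, the inferred signature would read roughly:

```ocaml
module Car : sig
  type car = { name : string; age : int }
  val make : string -> int -> car
  val print_car_age : car -> unit
  val print_car_name : car -> unit
end
```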
In this example, we create a new module where we expose 3 functions: make, print_car_age, and print_car_name. We also define a type called car
. One thing to note in the code is that the type is only defined once, as OCaml infers the type within the functions since car
is a type within this scope.
OCaml playground for this code
+Something important to note before concluding this section is that you can define both the argument types and return types for your function.
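For instance (an illustrative sketch, not the original example):

```ocaml
(* Both the argument type and the return type are given explicitly. *)
let hello (name : string) : string = "Hello " ^ name
```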
The next topic is pattern matching. I really enjoy pattern matching in programming languages. I have written a lot of Rust, and pattern matching is something I use when I write Rust. Rich pattern matching is beneficial as it eliminates the need for many if statements. Additionally, in OCaml, you are required to handle every case of the match statement.
+For example, in the code below:
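A sketch of the kind of match being described (the specific strings are illustrative):

```ocaml
let greet name =
  match name with
  | "Emil" -> "Hey Emil!"
  | _ -> "Hello, stranger"  (* required: the match must cover every case *)
```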
In the code above, I am required to include the last match case because we have not handled every case. For example, what should the compiler do if the name
is Adam? The example above is very simple. We can also match on an integer and perform different actions based on the number value. For instance, we can determine if someone is allowed to enter the party using pattern matching.
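A sketch of such an integer match (the threshold and messages are illustrative):

```ocaml
let can_enter_party age =
  match age with
  | a when a >= 18 -> "Welcome to the party!"
  | _ -> "Sorry, you are too young"
```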
But the reason I mention variants in this section is that variants and pattern matching go quite nicely hand in hand. A variant is like an enumeration with more features, and I will show you what I mean. We can use them as a basic enumeration, which could look like this:
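A basic enumeration could be sketched like this (the constructor names are guesses, chosen to line up with the `HavePets` example later on):

```ocaml
type user_info =
  | Name
  | Age
```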
This now means that we can do different things depending on this type:
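For example (an illustrative sketch, repeating the type so the block is self-contained):

```ocaml
type user_info = Name | Age

let describe info =
  match info with
  | Name -> "this is a name"
  | Age -> "this is an age"
```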
But I did mention that variants are similar to enumeration with additional features, allowing for the assignment of a type to the variant.
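A sketch of the variant with payloads attached (constructor payload types are guesses from the prose):

```ocaml
type user_info =
  | Name of string  (* carries a string *)
  | Age of int      (* carries an int *)
  | HavePets        (* no payload needed *)
```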
Now that we have added types to our variants and included HavePets
, we are able to adjust our pattern matching as follows:
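The adjusted match could be sketched like this (messages illustrative; the type is repeated so the block is self-contained):

```ocaml
type user_info = Name of string | Age of int | HavePets

let describe info =
  match info with
  | Name n -> "Name: " ^ n
  | Age a -> "Age: " ^ string_of_int a
  | HavePets -> "Has pets"  (* no payload to destructure *)
```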
We can now assign a value to the variant and use it in pattern matching to print different values. As you can see, I am not forced to add a value to every variant. For instance, I do not need a type on HavePets
so I simply don’t add it.
+I often use variants, such as in DBCaml where I use variants to retrieve responses from a database. For example, I return NoRows
if I did not receive any rows back, but no error.
OCaml also comes with Exhaustiveness Checking, meaning that if we don’t check each case in a pattern matching, we will get an error. For instance, if we forget to add HavePets
to the pattern matching, OCaml will throw an error at compile time.
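A sketch of the incomplete match (strictly speaking, plain `ocaml` emits warning 8 here; under dune's default flags the warning is fatal, so the build fails):

```ocaml
type user_info = Name of string | Age of int | HavePets

(* Warning 8 [partial-match]: this pattern-matching is not exhaustive.
   Here is an example of a case that is not matched: HavePets *)
let describe info =
  match info with
  | Name n -> "Name: " ^ n
  | Age a -> "Age: " ^ string_of_int a
```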
The next topic is operators and specific binding operators. OCaml has more types of operators, but binding operators are something I use in every project.
+A binding could be described as something that extends how let
works in OCaml by adding extra logic before storing the value in memory with let
.
+I’ll show you:
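The plain `let` being described is simply:

```ocaml
(* [let] evaluates "Emil" and binds the result to [hello]. *)
let hello = "Emil"
```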
This code simply takes the value “Emil” and stores it in memory, then assigns the memory reference to the variable hello. However, we can extend this functionality with a binding operator. For instance, if we don’t want to use a lot of match statements on the return value of a function, we can bind let
so it checks the value and if the value is an error, it bubbles up the error.
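A sketch of such a binding operator (the helper functions are illustrative stand-ins for operations that may fail):

```ocaml
(* [let*] runs [Result.bind] before binding, so an [Error]
   short-circuits the rest of the computation. *)
let ( let* ) = Result.bind

let get_first_name () = Ok "Emil"
let get_last_name () = Error "no last name found"

let full_name () =
  let* first = get_first_name () in
  let* last = get_last_name () in
  Ok (first ^ " " ^ last)
```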
This allows me to reduce the amount of code I write while maintaining the same functionality.
+In the code above, one of the variables is an Error
, which means that the binding will return the error instead of returning the first name and last name.
I really like the concept of functional programming, such as immutability and avoiding side-effects as much as possible. However, I believe that a purely functional programming language could force us to write code in a way that becomes too complex. This is where I think OCaml does a good job. OCaml is clearly designed to be a functional language, but it allows for updating existing values rather than always returning new values.
+++Immutability means that you cannot change an already existing value and must create a new value instead. I have written about the Concepts of Functional Programming and recommend reading it if you want to learn more.
+
One example where functional programming might make the code more complex is when creating a reader to read some bytes. If we strictly follow the rule of immutability, we would need to return new bytes instead of updating existing ones. This could lead to inefficiencies in terms of memory usage.
+Just to give an example of how to mutate an existing value in OCaml, I have created an example. In the code below, I am updating the age by 1 as it is the user’s birthday:
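A sketch of such an in-place update, using a `mutable` record field (field names are guesses from the prose):

```ocaml
type user = { name : string; mutable age : int }

(* It's the user's birthday: update [age] in place instead of
   building a new record. *)
let birthday user = user.age <- user.age + 1

let () =
  let emil = { name = "Emil"; age = 25 } in
  birthday emil;
  Printf.printf "%s is now %d\n" emil.name emil.age
```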
What I mean by “it’s functional on easy mode” is simply that the language is designed to be a functional language, but you are not forced to strictly adhere to functional programming rules.
+It is clear to me that a good type system can greatly improve the developer experience. I particularly appreciate OCaml’s type system, as well as its option
and result
types, which I use frequently. In languages like Haskell, you can extend the type system significantly, to the point where you can write an entire application using only types. However, I believe that this can lead to overly complex code. This is another aspect of OCaml that I appreciate - it has a strong type system, but there are limitations on how far you can extend it.
I hope you enjoyed this article. If you are interested in joining a community of people who also enjoy functional programming, I recommend joining this Discord server.
+ diff --git a/data/planet/janestreet/visualizing-piecewise-linear-neural-networks.md b/data/planet/janestreet/visualizing-piecewise-linear-neural-networks.md new file mode 100644 index 0000000000..acc96718c5 --- /dev/null +++ b/data/planet/janestreet/visualizing-piecewise-linear-neural-networks.md @@ -0,0 +1,15 @@ +--- +title: Visualizing piecewise linear neural networks +description: Neural networks are often thought of as opaque, black-box function approximators, + but theoretical tools let us describe and visualize their behavior. In part... +url: https://blog.janestreet.com/visualizing-piecewise-linear-neural-networks/ +date: 2024-07-22T00:00:00-00:00 +preview_image: https://blog.janestreet.com/visualizing-piecewise-linear-neural-networks/./6_1.png +authors: +- Jane Street Tech Blog +source: +--- + +Neural networks are often thought of as opaque, black-box function approximators, but theoretical tools let us describe and visualize their behavior. In particular, let’s study piecewise-linearity, a property many neural networks share. This property has been studied before, but we’ll try to visualize it in more detail than has been previously done.
+ + diff --git a/data/planet/ocamlpro/opam-220-release.md b/data/planet/ocamlpro/opam-220-release.md new file mode 100644 index 0000000000..deed6db367 --- /dev/null +++ b/data/planet/ocamlpro/opam-220-release.md @@ -0,0 +1,246 @@ +--- +title: opam 2.2.0 release! +description: 'Feedback on this post is welcomed on Discuss! We are very pleased to + announce the release of opam 2.2.0, and encourage all users to upgrade. Please read + on for installation and upgrade instructions. NOTE: this article is cross-posted + on opam.ocaml.org and ocamlpro.com, and published in discuss.ocaml...' +url: https://ocamlpro.com/blog/2024_07_01_opam_2_2_0_releases +date: 2024-07-01T13:31:53-00:00 +preview_image: https://ocamlpro.com/assets/img/og_image_ocp_the_art_of_prog.png +authors: +- "\n Raja Boujbel - OCamlPro\n " +source: +--- + +Feedback on this post is welcomed on Discuss!
+We are very pleased to announce the release of opam 2.2.0, and encourage all users to upgrade. Please read on for installation and upgrade instructions.
+++NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com, and published in discuss.ocaml.org.
+
In case you plan a possible rollback, you may want to first backup your
+~/.opam
or $env:LOCALAPPDATAopam
directory.
The upgrade instructions are unchanged:
+For Unix systems
+bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --version 2.2.0"
+
+or from PowerShell for Windows systems
+Invoke-Expression "& { $(Invoke-RestMethod https://raw.githubusercontent.com/ocaml/opam/master/shell/install.ps1) }"
+
+or download manually from the Github "Releases" page to your PATH.
+You should then run:
+opam init --reinit -ni
+
+After 8 years' effort, opam and opam-repository now have official native Windows +support! A big thank you is due to Andreas Hauptmann (@fdopen), +whose WODI and OCaml for Windows +projects were for many years the principal downstream way to obtain OCaml on +Windows, Jun Furuse (@camlspotter) whose +initial experimentation with OPAM from Cygwin +formed the basis of opam-repository-mingw, and, most recently, +Jonah Beckford (@jonahbeckford) whose +DkML distribution kept - and keeps - a full +development experience for OCaml available on Windows.
+OCaml when used on native Windows requires certain tools from the Unix world
+which are provided by either Cygwin or MSYS2.
+We have engineered opam init
so that it is possible for a user not to need to
+worry about this, with opam
managing the Unix world, and the user being able
+to use OCaml from either the Command Prompt or PowerShell. However, for the Unix
+user coming over to Windows to test their software, it is also possible to have
+your own Cygwin/MSYS2 installation and use native Windows opam from that. Please
+see the previous blog post
+for more information.
There are two "ports" of OCaml on native Windows, referred to by the name of +provider of the C compiler. The mingw-w64 port is GCC-based. +opam's external dependency (depext) system works for this port (including +providing GCC itself), and many packages are already well-supported in +opam-repository, thanks to the previous efforts in opam-repository-mingw. +The MSVC port is Visual Studio-based. At +present, there is less support in this ecosystem for external dependencies, +though this is something we expect to work on both in opam-repository and in +subsequent opam releases. In particular, it is necessary to install +Visual Studio or Visual Studio BuildTools separately, but opam will then +automatically find and use the C compiler from Visual Studio.
+opam tree
is a new command showing packages and their dependencies with a tree view.
+It is very helpful to determine which packages bring which dependencies in your installed switch.
$ opam tree cppo
+cppo.1.6.9
+├── base-unix.base
+├── dune.3.8.2 (>= 1.10)
+│ ├── base-threads.base
+│ ├── base-unix.base [*]
+│ └── ocaml.4.14.1 (>= 4.08)
+│ ├── ocaml-base-compiler.4.14.1 (>= 4.14.1~ & < 4.14.2~)
+│ └── ocaml-config.2 (>= 2)
+│ └── ocaml-base-compiler.4.14.1 (>= 4.12.0~) [*]
+└── ocaml.4.14.1 (>= 4.02.3) [*]
+
+Reverse-dependencies can also be displayed using the new opam why
command.
+This is useful to examine how dependency versions get constrained.
$ opam why cmdliner
+cmdliner.1.2.0
+├── (>= 1.1.0) b0.0.0.5
+│ └── (= 0.0.5) odig.0.0.9
+├── (>= 1.1.0) ocp-browser.1.3.4
+├── (>= 1.0.0) ocp-indent.1.8.1
+│ └── (>= 1.4.2) ocp-index.1.3.4
+│ └── (= version) ocp-browser.1.3.4 [*]
+├── (>= 1.1.0) ocp-index.1.3.4 [*]
+├── (>= 1.1.0) odig.0.0.9 [*]
+├── (>= 1.0.0) odoc.2.2.0
+│ └── (>= 2.0.0) odig.0.0.9 [*]
+├── (>= 1.1.0) opam-client.2.2.0~alpha
+│ ├── (= version) opam.2.2.0~alpha
+│ └── (= version) opam-devel.2.2.0~alpha
+├── (>= 1.1.0) opam-devel.2.2.0~alpha [*]
+├── (>= 0.9.8) opam-installer.2.2.0~alpha
+└── user-setup.0.7
+
+++Special thanks to @cannorin for contributing this feature.
+
There is now a way for a project maintainer to share their project development
+tools: the with-dev-setup
dependency flag. It is used in the same way as
+with-doc
and with-test
: by adding a {with-dev-setup}
filter after a
+dependency. It will be ignored when installing normally, but it's pulled in when the
+package is explicitly installed with the --with-dev-setup
flag specified on
+the command line.
For example
+opam-version: "2.0"
+depends: [
+ "ocaml"
+ "ocp-indent" {with-dev-setup}
+]
+build: [make]
+install: [make "install"]
+post-messages:
+[ "Thanks for installing the package"
+ "as well as its development setup. It will help with your future contributions" {with-dev-setup} ]
+
+When pinning a package using opam pin
, opam looks for opam files in the root directory only.
+With recursive pinning, you can now instruct opam to look for .opam
files in
+subdirectories as well, while maintaining the correct relationship between the .opam
+files and the package root for versioning and build purposes.
Recursive pinning is enabled by the following options to opam pin
and opam install
:
--recursive
, opam will look for .opam
files recursively in all subdirectories.
+--subpath <path>
, opam will only look for .opam
files in the subdirectory <path>
.
+The two options can be combined: for instance, if your opam packages are stored
+as a deep hierarchy in the mylib
subdirectory of your project you can try
+opam pin . --recursive --subpath mylib
.
These options are useful when dealing with a large monorepo-type repository with many +opam libraries spread about.
+opam switch -
, inspired by git switch -
, makes opam switch back to the previously
+selected global switch.
opam pin --current
fixes a package to its current state (disabling pending
+reinstallations or removals from the repository). The installed package will
+be pinned to its current installed state, i.e. the pinned opam file is the
+one installed.
opam pin remove --all
removes all the pinned packages from a switch.
opam exec --no-switch
removes the opam environment when running a command.
+It is useful when you want to launch a command without opam environment changes.
opam clean --untracked
removes untracked files interactively remaining
+from previous packages removal.
opam admin add-constraint <cst> --packages pkg1,pkg2,pkg3
applies the given constraint
+to a given set of packages
opam list --base
has been renamed into --invariant
, reflecting the fact that since opam 2.1 the "base" packages of a switch are instead expressed using a switch invariant.
opam install --formula <formula>
installs a formula instead of a list of packages. This can be useful if you would like to install one package or another one. For example opam install --formula '"extlib" |"extlib-compat"'
will install either extlib
or extlib-compat
depending on what's best for the current switch.
opam env
, fixing many corner cases for environment updates and making the reverting of package environment variables precise. As a result, using setenv
in an opam file no longer triggers a lint warning.
+sys-ocaml-system
default global eval variable
+"%{var?string-if-true:string-if-false-or-undefined}%"
syntax to
+support extending the variables of packages with +
in their name
+(conf-c++
and conf-g++
already exist) using "%{?pgkname:var:}%"
+getconf DARWIN_USER_TEMP_DIR
) as writable when TMPDIR
+is not defined on macOS
+/tmp
is now writable again, restoring POSIX compliance
+opam admin: new add-extrafiles
command to add/check/update the extra-files:
field according to the files present in the files/
directory
+opam lint -W @1..9
syntax to allow marking a set of warnings as errors
+OPAMCURL
, OPAMFETCH
and OPAMVERBOSE
environment variables
+--assume-built
argument
+And many other general and performance improvements were made and bugs were fixed. +You can take a look to previous blog posts. +API changes and a more detailed description of the changes are listed in:
+This release also includes PRs improving the documentation and improving +and extending the tests.
+Please report any issues to the bug-tracker.
+We hope you will enjoy the new features of opam 2.2! 📯
+ diff --git a/data/planet/signalsandthreads/from-the-lab-to-the-trading-floor-with-erin-murphy.md b/data/planet/signalsandthreads/from-the-lab-to-the-trading-floor-with-erin-murphy.md new file mode 100644 index 0000000000..d86765879d --- /dev/null +++ b/data/planet/signalsandthreads/from-the-lab-to-the-trading-floor-with-erin-murphy.md @@ -0,0 +1,13 @@ +--- +title: From the Lab to the Trading Floor with Erin Murphy +description: +url: https://signals-threads.simplecast.com/episodes/from-the-lab-to-the-trading-floor-with-erin-murphy-hD6GHMhc +date: 2024-07-12T19:15:09-00:00 +preview_image: +authors: +- Signals and Threads +source: +--- + +Erin Murphy is Jane Street’s first UX designer, and before that, she worked at NASA’s Jet Propulsion Laboratory building user interfaces for space missions. She’s also an illustrator with her own quarterly journal. In this episode, Erin and Ron discuss the challenge of doing user-centered design in an organization where experts are used to building tools for themselves. How do you bring a command-line interface to the web without making it worse for power users? They also discuss how beauty in design is more about utility than aesthetics; what Jane Street looks for in UX candidates; and how to help engineers discover what their users really want.
You can find the transcript for this episode on our website.
Some links to topics that came up in the discussion:
The last post looked at using various tools to understand why an OCaml 5 program was waiting a long time for IO. +In this post, I'll be trying out some tools to investigate a compute-intensive program that uses multiple CPUs.
+ +Table of Contents
+Further discussion about this post can be found on discuss.ocaml.org.
+OCaml 4 allowed running multiple "system threads", but only one can have the OCaml runtime lock, +so only one can be running OCaml code at a time. +OCaml 5 allows running multiple "domains", all of which can be running OCaml code at the same time +(each domain can also have multiple system threads; only one system thread can be running OCaml code per domain).
+The ocaml-ci service provides CI for many OCaml programs, +and its first step when testing a commit is to run a solver to select compatible versions for its dependencies. +Running a solve typically only takes about a second, but it has to do it for each possible test platform, +which includes versions of the OCaml compiler from 4.02 to 4.14 and 5.0 to 5.2, +multiple architectures (32-bit and 64-bit x86, 32-bit and 64-bit ARM, PPC64 and s390x), +operating systems (Alpine, Debian, Fedora, FreeBSD, macos, OpenSUSE and Ubuntu, in multiple versions), etc. +In total, this currently does 132 solver runs per commit being tested +(which seems too high to me, but let's ignore that for now).
+The solves are done by the solver-service, +which runs on a couple of ARM machines with 160 cores each. +The old OCaml 4 version used to work by spawning lots of sub-processes, +but when OCaml 5 came out, I ported it to use a single process with multiple domains. +That removed the need for lots of communication logic, +and allowed sharing common data such as the package definitions. +The code got a lot shorter and simpler, and I'm told it's been much more reliable too.
+But the performance was surprisingly bad. +Here's a graph showing how the number of solves per second scales with the number of CPUs (workers) being used:
+ +The "Processes" line shows performance when forking multiple processes to do the work, which looks pretty good. +The "Domains" line shows what happens if you instead spawn domains inside a single process.
+Note: The original service used many libraries (a mix of Eio and Lwt ones), +but to make investigation easier I simplified it by removing most of them. +The simplified version doesn't use Eio or Lwt; +it just spawns some domains/processes and has each of them do the same solve in a loop a fixed number of times.
+When converting a single-domain OCaml 4 program to use multiple cores it's easy to introduce races.
+OCaml has ThreadSanitizer (TSan) support which can detect these.
+To use it, install an OCaml compiler with the tsan
option:
$ opam switch create 5.2.0-tsan ocaml-variants.5.2.0+options ocaml-option-tsan
+
+Things run a lot slower and require more memory with this compiler, but it's good to check:
+$ ./_build/default/stress/stress.exe --internal-workers=2
+[...]
+WARNING: ThreadSanitizer: data race (pid=133127)
+ Write of size 8 at 0x7ff2b7814d38 by thread T4 (mutexes: write M88):
+ #0 camlOpam_0install__Model.group_ors_1288 lib/model.ml:70 (stress.exe+0x1d2bba)
+ #1 camlOpam_0install__Model.group_ors_1288 lib/model.ml:120 (stress.exe+0x1d2b47)
+ ...
+
+ Previous write of size 8 at 0x7ff2b7814d38 by thread T1 (mutexes: write M83):
+ #0 camlOpam_0install__Model.group_ors_1288 lib/model.ml:70 (stress.exe+0x1d2bba)
+ #1 camlOpam_0install__Model.group_ors_1288 lib/model.ml:120 (stress.exe+0x1d2b47)
+ ...
+
+ Mutex M88 (0x558368b95358) created at:
+ #0 pthread_mutex_init ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1295 (libtsan.so.2+0x50468)
+ #1 caml_plat_mutex_init runtime/platform.c:57 (stress.exe+0x4763b2)
+ #2 caml_init_domains runtime/domain.c:943 (stress.exe+0x44ebfe)
+ ...
+
+ Mutex M83 (0x558368b95240) created at:
+ #0 pthread_mutex_init ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1295 (libtsan.so.2+0x50468)
+ #1 caml_plat_mutex_init runtime/platform.c:57 (stress.exe+0x4763b2)
+ #2 caml_init_domains runtime/domain.c:943 (stress.exe+0x44ebfe)
+ ...
+
+ Thread T4 (tid=133132, running) created by main thread at:
+ #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1001 (libtsan.so.2+0x5e686)
+ #1 caml_domain_spawn runtime/domain.c:1265 (stress.exe+0x4504c4)
+ ...
+
+ Thread T1 (tid=133129, running) created by main thread at:
+ #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1001 (libtsan.so.2+0x5e686)
+ #1 caml_domain_spawn runtime/domain.c:1265 (stress.exe+0x4504c4)
+ ...
+
+SUMMARY: ThreadSanitizer: data race lib/model.ml:70 in camlOpam_0install__Model.group_ors_1288
+
+The two mutexes mentioned in the output, M83 and M88, are the domain_lock
,
+used to ensure only one sys-thread runs at a time in each domain.
+In this program we only have one sys-thread per domain and so can ignore them.
The output reveals that the solver used a global variable to generate unique IDs:
+1 +2 +3 +4 +5 + |
|
With that fixed, TSan finds no further problems (in this simplified version). +This gives us good confidence that there isn't any shared state: +TSan would report use of shared state not protected by a mutex, +and since the program was written for OCaml 4 it won't be using any mutexes.
+That's good, because if one thread writes to a location that another reads then that requires coordination between CPUs, +which is relatively slow +(though we could still experience slow-downs due to false sharing, +where two separate mutable items end up in the same cache line). +However, while important for correctness, it didn't make any noticeable difference to the benchmark results.
+perf is the obvious tool to use when facing CPU performance problems.
+perf record -g PROG
takes samples of the program's stack regularly,
+so that functions that run a lot or for a long time will appear often.
+perf report
provides a UI to explore the results:
$ perf report
+ Children Self Command Shared Object Symbol
++ 59.81% 0.00% stress.exe stress.exe [.] Zeroinstall_solver.Solver_core.do_solve_2283
++ 59.44% 0.00% stress.exe stress.exe [.] Opam_0install.Solver.solve_1428
++ 59.25% 0.00% stress.exe stress.exe [.] Dune.exe.Domain_worker.solve_951
++ 58.88% 0.00% stress.exe stress.exe [.] Dune.exe.Stress.run_worker_332
++ 58.18% 0.00% stress.exe stress.exe [.] Stdlib.Domain.body_735
++ 57.91% 0.00% stress.exe stress.exe [.] caml_start_program
++ 34.39% 0.69% stress.exe stress.exe [.] Stdlib.List.iter_366
++ 34.39% 0.03% stress.exe stress.exe [.] Zeroinstall_solver.Solver_core.lookup_845
++ 34.39% 0.09% stress.exe stress.exe [.] Zeroinstall_solver.Solver_core.process_dep_2024
++ 33.14% 0.03% stress.exe stress.exe [.] Zeroinstall_solver.Sat.run_solver_1446
++ 27.28% 0.00% stress.exe stress.exe [.] Zeroinstall_solver.Solver_core.build_problem_2092
++ 26.27% 0.02% stress.exe stress.exe [.] caml_call_gc
+
+Looks like we're spending most of our time solving, as expected. +But this can be misleading. +Because perf only records stack traces when the code is running, it doesn't report any time the process spent sleeping.
+$ /usr/bin/time ./_build/default/stress/stress.exe --count=10 --internal-workers=7
+73.08user 0.61system 0:12.65elapsed 582%CPU (0avgtext+0avgdata 596608maxresident)k
+
+With 7 workers, we'd expect to see 700%CPU, but we only see 582%.
mpstat can show a per-CPU breakdown. +Here are a couple of one second intervals on my machine while the solver was running:
+$ mpstat --dec=0 -P ALL 1
+16:24:39 CPU %usr %sys %iowait %irq %soft %steal %idle
+16:24:40 all 78 1 2 1 0 0 18
+16:24:40 0 19 1 0 1 0 1 78
+16:24:40 1 88 1 0 1 0 0 10
+16:24:40 2 88 1 0 1 0 0 10
+16:24:40 3 88 0 0 0 0 1 11
+16:24:40 4 89 1 0 0 0 0 10
+16:24:40 5 90 0 0 1 0 0 9
+16:24:40 6 79 1 0 1 1 1 17
+16:24:40 7 86 0 12 1 1 0 0
+
+16:24:40 CPU %usr %sys %iowait %irq %soft %steal %idle
+16:24:41 all 80 1 2 1 0 0 17
+16:24:41 0 85 0 12 1 0 1 1
+16:24:41 1 91 1 0 1 0 0 7
+16:24:41 2 90 0 0 1 1 0 8
+16:24:41 3 89 1 0 1 0 0 9
+16:24:41 4 67 1 0 1 0 0 31
+16:24:41 5 52 1 0 0 0 1 46
+16:24:41 6 76 1 0 1 0 0 22
+16:24:41 7 90 1 0 0 0 0 9
+
+Note: I removed some columns with all zero values to save space.
+We might expect to see 7 CPUs running at 100% and one idle CPU, +but in fact they're all moderately busy. +On the other hand, none of them spent more than 91% of its time running the solver code.
+offcputime will show why a process wasn't using a CPU
+(it's like offwaketime
, which we saw earlier, but doesn't record the waker).
+Here I'm using pidstat to see all running threads and then examining one of the workers,
+to avoid the problem we saw last time where the diagram included multiple threads:
$ pidstat 1 -t
+...
+^C
+Average: UID TGID TID %usr %system %guest %wait %CPU CPU Command
+Average: 1000 78304 - 550.50 9.41 0.00 0.00 559.90 - stress.exe
+Average: 1000 - 78305 91.09 1.49 0.00 0.00 92.57 - |__stress.exe
+Average: 1000 - 78307 8.42 0.99 0.00 0.00 9.41 - |__stress.exe
+Average: 1000 - 78308 90.59 1.49 0.00 0.00 92.08 - |__stress.exe
+Average: 1000 - 78310 90.59 1.49 0.00 0.00 92.08 - |__stress.exe
+Average: 1000 - 78312 91.09 1.49 0.00 0.00 92.57 - |__stress.exe
+Average: 1000 - 78314 89.11 1.49 0.00 0.00 90.59 - |__stress.exe
+Average: 1000 - 78316 89.60 1.98 0.00 0.00 91.58 - |__stress.exe
+
+$ sudo offcputime-bpfcc -f -t 78310 > off-cpu
+
+Note: The ARM machine's kernel was too old to run offcputime
, so I ran this on my machine instead,
+with one main domain and six workers.
+As I needed good stacks for C functions too, I ran stress.exe in an Ubuntu 24.04 docker container,
+as recent versions of Ubuntu compile with frame pointers by default.
The raw output was very noisy, showing it waiting in many different places.
+Looking at a few, it was clear it was mostly the GC (which can run from almost anywhere).
+The output is just a text-file with one line per stack-trace, and a bit of sed cleaned it up:
$ sed -E 's/stress.exe;.*;(caml_call_gc|caml_handle_gc_interrupt|caml_poll_gc_work|asm_sysvec_apic_timer_interrupt|asm_sysvec_reschedule_ipi);/stress.exe;\\1;/' off-cpu > off-cpu-gc
+$ flamegraph.pl --colors=blue off-cpu-gc > off-cpu-gc.svg
+
+That removes the part of the stack-trace before any of the various interrupt-type functions that can be called from anywhere. +The graph is blue to indicate that it shows time when the process wasn't running.
+ +There are rather a lot of traces where we missed the user stack.
+However, the results seem clear enough: when our worker is waiting, it's in the garbage collector,
+calling caml_plat_spin_wait
.
+This is used to sleep when a spin-lock has been spinning for too long (after 1000 iterations).
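+As an aside, the spin-then-sleep pattern described here is easy to sketch. This is an illustration of the idea only; the function and constant names are mine, not the runtime's actual caml_plat_spin_wait:

```c
#include <sched.h>
#include <stdatomic.h>

#define SPIN_LIMIT 1000  /* matches the iteration count mentioned above */

/* Spin briefly in the hope the lock is released soon; past the limit,
   yield the CPU instead of burning it (the real code sleeps). */
static void spin_lock(atomic_flag *lock) {
    int spins = 0;
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire)) {
        if (++spins >= SPIN_LIMIT) {
            sched_yield();
            spins = 0;
        }
    }
}

static void spin_unlock(atomic_flag *lock) {
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

The trade-off is that once a waiter has yielded, it may not wake promptly when the lock is freed, which is exactly the kind of latency the traces show.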
OCaml has a major heap for long-lived values, plus one fixed-size minor heap for each domain. +New allocations are made sequentially on the allocating domain's minor heap +(which is very fast, just adjusting a pointer by the size required).
+When the minor heap is full the program performs a minor GC, +moving any values that are still reachable to the major heap +and leaving the minor heap empty.
+Garbage collection of the major heap is done in small slices so that the application doesn't pause for long, +and domains can do marking and sweeping work without needing to coordinate +(except at the very end of a major cycle, when they briefly synchronise to agree a new cycle is starting).
+However, as minor GCs move values that other domains may be using, they do require all domains to stop.
+Although the simplified test program doesn't use Eio, we can still use eio-trace to record GC events +(we just don't see any fibers). +Here's a screenshot of the solver running with 24 domains on the ARM machine, +showing it performing GC work (not all domains are visible in the picture):
+ + +The orange/red parts show when the GC is running and the yellow regions show when the domain is waiting for other domains. +The thick columns with yellow edges are minor GCs, +while the thin (almost invisible) red columns without any yellow between them are major slices. +The second minor GC from the left took longer than usual because the third domain from the top took a while to respond. +It also didn't do a major slice before that; perhaps it was busy doing something, or maybe Linux scheduled a different process to run then.
+Traces recorded by eio-trace can also be viewed in Perfetto, which shows the nesting better: +Here's a close-up of a single minor GC, corresponding to the bottom two domains from the second column from the left:
+ +We can now see why the solver spends so much time sleeping; +when a domain performs a minor GC, it spends most of the time waiting for other domains.
+(the above is a slight simplification; domains may do some work on the major GC while waiting)
+One obvious solution to GC slowness is to produce less garbage in the first place. +To do that, we need to find out where the most costly allocations are coming from. +Tracing every memory allocation tends to make programs unusably slow, +so OCaml instead provides a statistical memory profiler.
+It was temporarily removed in OCaml 5 because it needed updating for the new multicore GC, +but has recently been brought back and will be in OCaml 5.3. +There's a backport to 5.2, but I couldn't get it to work, +so I just removed the domains stuff from the test and did a single-domain run on OCaml 4.14. +You need the memtrace library to collect samples and memtrace_viewer to view them:
+$ opam install memtrace memtrace_viewer
+
+Put this at the start of the program to enable it:
+let () = Memtrace.trace_if_requested ()
Then running with MEMTRACE
set records a trace:
$ MEMTRACE=solver.ctf ./stress.exe --count=10
+Solved warm-up request in: 1.99s
+Running another 10 * 1 solves...
+
+$ memtrace-viewer solver.ctf
+Processing solver.ctf...
+Serving http://localhost:8080/
+
+
+The flame graph in the middle shows functions scaled by the amount of memory they allocated.
+Initially it showed two groups, one for the warm-up request and one for the 10 runs.
+To simplify the display, I used the filter panel (on the left) to show only allocations after the 2 second warm-up.
+We can immediately see that OpamVersionCompare.compare
is the source of most memory use.
Focusing on that function shows that it performed 54.1% of all allocations. +The display now shows allocations performed within it above it (in green), +and all the places it's called from in blue below:
+ +The bulk of the allocations are coming from this loop:
+(** [skip_while_from i f w m] yields the index of the leftmost character
+    in the string [w], starting from [i] and ending at [m], that does
+    not satisfy the predicate [f], or [m] if no such index exists. *)
+let skip_while_from i f w m =
+  let rec loop i =
+    if i = m then i
+    else if f w.[i] then loop (i + 1)
+    else i
+  in
+  loop i
It's used when processing a version like 1.2.3
to skip any leading "0" characters
+(so that would compare equal to 1.02.3
).
+The loop
function refers to other variables (such as f
) from its context,
+and so OCaml allocates a closure on the heap to hold these variables.
+Even though these allocations are small, we have to do it for every component of every version.
+And we compare versions a lot:
+for every version of a package that says it requires e.g. libfoo { >= "1.2" }
,
+we have to check the formula against every version of libfoo.
The solution is rather simple (and shorter than the original!):
+let rec skip_while_from i f w m =
+  if i = m then i
+  else if f w.[i] then skip_while_from (i + 1) f w m else i
Removing the other allocations from compare
too reduces total memory allocations
+from 21.8G to 9.6G!
+The processes benchmark got about 14% faster, while the domains one was 23% faster:
A nice optimisation, +but using domains is still nowhere close to even the original version with separate processes.
+The traces above show the solver taking a long time for all domains to enter the stw_api_barrier
phase.
+What was the slow domain doing to cause that?
+magic-trace lets us tell it when to save the ring buffer, and we can use this to get detailed information.
+Tracing multiple threads with magic-trace doesn't seem to work well
+(each thread gets a very small buffer, they don't stop at quite the same time, and triggers don't work)
+so I find it's better to trace just one thread.
I modified the OCaml runtime so that the leader (the domain requesting the GC) records the time.
+As each domain enters stw_api_barrier
it checks how late it is and calls a function to print a warning if it's above a threshold.
+Then I attached magic-trace to one of the worker threads and told it to save a sample when that function got called:
In the example above,
+magic-trace saved about 7ms of the history of a domain up to the point where it entered stw_api_barrier
.
+The first few ms show the solver working normally.
+Then it needs to do a minor GC and tries to become the leader.
+But another domain has the lock and so it spins, calling handle_incoming
293,711 times in a loop for 2.5ms.
I had a look at the code in the OCaml runtime. +When a domain wants to perform a minor GC, the steps are:
+1. Take the all_domains_lock mutex.
+2. Store the request in the stw_request global.
+3. Release all_domains_lock.
+4. After the GC work is done, the last domain to finish signals all_domains_cond and everyone resumes.
+I added some extra event reporting to the GC, showing when a domain is trying to perform a GC (try
),
+when the leader is signalling other domains (signal
), and when a domain is sleeping waiting for something (sleep
).
+Here's what that looks like (in some places):
These try
events seem useful;
+the program is spending much more time stuck in GC than the original traces indicated!
One obvious improvement here would be for idle domains to opt out of GC. +Another would be to tell the kernel when to wake instead of using sleeps — +and I see there's a PR already: +OS-based Synchronisation for Stop-the-World Sections.
+Another possibility would be to let domains perform minor GCs independently. +The OCaml developers did make a version that worked that way, +but it requires changes to all C code that uses the OCaml APIs, +since a value in another domain's minor heap might move while it's running.
+Finally, I wonder if the code could be simplified a bit using a compare-and-set instead of taking a lock to become leader.
+That would eliminate the try
state, where a domain knows another domain is the leader, but doesn't know what it wants to do.
+It's also strange that there's a state where
+the top domain has finished its critical section and allowed the other domains to resume,
+but is not quite finished enough to let a new GC start.
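+For illustration only, the compare-and-set idea could look something like this sketch (all the names here are hypothetical, not the actual runtime's):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct stw_request { int reason; };

/* NULL means "no leader". A domain becomes leader only if it can swap
   the slot from NULL to its own request; a failed swap tells it both
   that someone else is leader and what that leader wants, removing the
   intermediate "try" state. */
static struct stw_request *_Atomic stw_leader = NULL;

static bool try_become_leader(struct stw_request *req) {
    struct stw_request *expected = NULL;
    return atomic_compare_exchange_strong(&stw_leader, &expected, req);
}

static void leader_finished(void) {
    atomic_store(&stw_leader, NULL);  /* let the next GC start */
}
```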
We can work around this problem by having the main domain do work too. +That could be a problem for interactive applications (where the main domain is running the UI and needs to respond fast), +but it should be OK for the solver service. +This was about 15% faster on my machine, but appeared to have no effect on the ARM server. +Lesson: get traces on the target machine!
+Another way to reduce the synchronisation overhead of minor GCs is to make them less frequent.
+We can do that by increasing the size of the minor heap,
+doing a few long GCs rather than many short ones.
+The size is controlled by the s runtime parameter, e.g. OCAMLRUNPARAM=s=8192k.
+On my machine, this actually makes things slower, but it's about 18% faster on the ARM server with 80 domains.
Here are the first few domains (from a total of 24) on the ARM server with different minor heap sizes +(both are showing 1s of execution):
+Note that the major slices also get fewer and larger, as they happen halfway between minor GCs.
+Also, there's still a lot of variation between the time each domain spends doing GC +(despite the fact that they're all running exactly the same task), so they still end up waiting a lot.
+This is all still pretty odd, though. +We're getting small performance increases, but still nothing like when forking. +Can the test-case be simplified further? +Yes, it turns out! +This simple function takes much longer to run when using domains, compared to forking!
+let alloc_loop n =
+  for _ = 1 to n do
+    ignore (Sys.opaque_identity (ref ()))
+  done
ref ()
allocates a small block (2 words, including the header) on the minor heap.
+opaque_identity
is to make sure the compiler doesn't optimise this pointless allocation away.
Here's what I would expect here:
+So ideally the lines would be flat. +In practice, we may hit physical limits due to memory bandwidth, CPU temperature or kernel limitations; +I assume this is why the "Processes" time starts to rise eventually. +But it looks like this minor slow-down causes knock-on effects in the "Domains" case.
+If I remove the allocation, then the domains and processes versions take the same amount of time.
+perf sched record
records kernel scheduling events, allowing it to show what is running on each CPU at all times.
+perf sched timehist
displays a report:
$ sudo perf sched record -k CLOCK_MONOTONIC
+^C
+
+$ sudo perf sched timehist
+ time cpu task name wait time sch delay run time
+ [tid/pid] (msec) (msec) (msec)
+--------------- ------ ------------------------------ --------- --------- ---------
+ 185296.715345 [0000] sway[175042] 1.694 0.025 0.775
+ 185296.716024 [0002] crosvm_vcpu2[178276/178217] 0.012 0.000 2.957
+ 185296.717031 [0003] main.exe[196519] 0.006 0.000 4.004
+ 185296.717044 [0003] rcu_preempt[18] 4.004 0.015 0.012
+ 185296.717260 [0001] main.exe[196526] 1.760 0.000 2.633
+ 185296.717455 [0001] crosvm_vcpu1[193502/193445] 63.809 0.015 0.194
+ ...
+
+The first line here shows that sway
needed to wait for 1.694 ms for some reason (possibly a sleep),
+and then once it was due to resume, had to wait a further 0.025 ms for CPU 0 to be free. It then ran for 0.775 ms.
+I decided to use perf sched
to find out what the system was doing when a domain failed to respond quickly.
To make the output easier to read, I hacked eio-trace to display it on the traces.
+perf script -g python
will generate a skeleton Python script that can format all the events found in the perf.data
file,
+and I used that to convert the output to CSV.
+To correlate OCaml domains with Linux threads, I also modified OCaml to report the thread ID (TID) for each new domain
+(it was previously reporting the PID instead for some reason).
Here's a trace of the simple allocator from the previous section:
+ + +Note: the colour of stw_api_barrier
has changed: previously eio-trace coloured it yellow to indicate sleeping,
+but now we have the individual sleep
events we can see exactly which part of it was sleeping.
The horizontal green bars show when each domain was running on the CPU.
+Here, we see that most of the domains ran until they called sleep
.
+When the sleep timeout expires, the thread is ready to run again and goes on the run-queue.
+Time spent waiting on the queue is shown with a black bar.
When switching to or from another process, the process name is shown.
+Here we can see that crosvm_vcpu6
interrupted one of our domains, making it late to respond to the GC request.
Here we see another odd feature of the protocol: even though the late domain was the last to be ready, +it wasn't able to start its GC even then, because only the leader is allowed to say when everyone is ready. +Several domains wake after the late one is ready and have to go back to sleep again.
+The diagram also shows when Linux migrated our OCaml domains between CPUs. +For example:
+Here's another example:
+ +I tried using the processor package to pin each domain to a different CPU. +That cleaned up the traces a fair bit, but didn't make much difference to the runtime on my machine.
+I also tried using chrt to run the program as a high-priority "real-time" task,
+which also didn't seem to help.
+I wrote a bpftrace
script to report if one of our domains was ready to resume and the scheduler instead ran something else.
+That showed various things.
+Often Linux was migrating something else out of the way and we had to wait for that,
+but there were also some kernel tasks that seemed to be even higher priority, such as GPU drivers or uring workers.
+I suspect to make this work you'd need to set the affinity of all the other processes to keep them away from the cores being used
+(but that wouldn't work in this example because I'm using all of them!).
+Come to think of it, running a CPU intensive task on every CPU at realtime priority was a dumb idea;
+had it worked I wouldn't have been able to do anything else with the computer!
Exploring the scheduler behaviour was interesting, and might be needed for latency-sensitive tasks, +but how often do migrations and delays really cause trouble? +The slow GCs are interesting, but there are also sections like this where everything is going smoothly, +and minor GCs take less than 4 microseconds:
+olly can be used to get summary statistics:
+$ olly gc-stats './_build/default/stress/stress.exe --count=6 --internal-workers=24'
+...
+Solved 144 requests in 25.44s (0.18s/iter) (5.66 solves/s)
+
+Execution times:
+Wall time (s): 28.17
+CPU time (s): 1.66
+GC time (s): 169.88
+GC overhead (% of CPU time): 10223.84%
+
+GC time per domain (s):
+Domain0: 0.47
+Domain1: 9.34
+Domain2: 6.90
+Domain3: 6.97
+Domain4: 6.68
+Domain5: 6.85
+Domain6: 6.59
+...
+
+10223.84% GC overhead sounds like a lot, but I think this is misleading, for a few reasons. For one, the reported CPU time of 1.66 s can't be right: time reports about 6 minutes, which sounds more likely.
+To double-check, I modified eio-trace to report GC statistics for a saved trace:
+Solved 144 requests in 26.84s (0.19s/iter) (5.36 solves/s)
+...
+
+$ eio-trace gc-stats trace.fxt
+./trace.fxt:
+
+Ring GC/s App/s Total/s %GC
+ 0 10.255 19.376 29.631 34.61
+ 1 7.986 10.201 18.186 43.91
+ 2 8.195 10.648 18.843 43.49
+ 3 9.521 14.398 23.919 39.81
+ 4 9.775 16.537 26.311 37.15
+ 5 8.084 10.635 18.719 43.19
+ 6 7.977 10.356 18.333 43.51
+...
+ 24 7.920 10.802 18.722 42.30
+
+All 213.332 308.578 521.910 40.88
+
+Note: all times are wall-clock and so include time spent blocking.
+
+It ran slightly slower under eio-trace, perhaps because recording a trace file is more work than maintaining some counters, +but it's similar. +So this indicates that with 24 domains GC is taking about 40% of the total time (including time spent sleeping).
+But something doesn't add up, on my machine at least:
+Even if that 20% were removed completely, it should only save 20% of the 8.2s. +So with domains, the code must be running more slowly even when it's not in the GC.
+I tried running magic-trace to see what it was doing outside of the GC. +Since it wasn't calling any functions, it didn't show anything, but we can fix that:
+let[@inline never] foo () =
+  for _ = 1 to 100 do
+    ignore (Sys.opaque_identity (ref ()))
+  done
+
+let alloc_loop n =
+  for _ = 1 to n / 100 do
+    foo ()
+  done
+Here we do blocks of 100 allocations in a function called foo.
+The annotations are to ensure the compiler doesn't inline it.
+The trace was surprisingly variable!
I see times for foo ranging from 50ns to around 750ns!
+Note: the extra foo call above was probably due to a missed end event somewhere.
I ran perf record
on the simplified version:
+let[@inline never] foo () =
+  for _ = 1 to 100 do
+    ignore (Sys.opaque_identity (ref ()))
+  done
Here the code is simple enough that we don't need stack-traces (so no -g
):
$ sudo perf record ./_build/default/main.exe
+$ sudo perf annotate
+
+ │ camlDune__exe__Main.foo_273():
+ │ mov $0x3,%eax
+ 0.04 │ cmp $0xc9,%rax
+ │ ↓ jg 39
+ 7.34 │ d: sub $0x10,%r15
+ 13.37 │ cmp (%r14),%r15
+ 0.09 │ ↓ jb 3f
+ 0.21 │16: lea 0x8(%r15),%rbx
+ 70.26 │ movq $0x400,-0x8(%rbx)
+ 6.66 │ movq $0x1,(%rbx)
+ 0.73 │ mov %rax,%rbx
+ 0.00 │ add $0x2,%rax
+ 0.01 │ cmp $0xc9,%rbx
+ 0.66 │ ↑ jne d
+ 0.28 │39: mov $0x1,%eax
+ 0.34 │ ← ret
+ 0.00 │3f: → call caml_call_gc
+ │ ↑ jmp 16
+
+The code starts by (pointlessly) checking if 1 > 100 in case it can skip the whole loop. +After being disappointed, it:
+1. Decrements %r15 (young_ptr) by 0x10 (two words).
+2. Compares it with young_limit, calling caml_call_gc if so to clear the minor heap.
+3. Writes the block's header and its contents, ().
+4. When the loop finishes, returns ().
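+The allocation fast path described above can be sketched in C. This is a simplification for illustration: the real runtime keeps these fields in the domain state, and instead of returning NULL it calls caml_call_gc and retries the allocation:

```c
#include <stddef.h>

struct minor_heap {
    char *young_ptr;    /* next allocation; the heap grows downwards */
    char *young_limit;  /* when young_ptr would pass this, run the GC */
};

/* Decrement young_ptr and compare against young_limit, mirroring the
   two hot instructions in the perf annotate output above. */
static void *alloc_small(struct minor_heap *h, size_t bytes) {
    if ((size_t)(h->young_ptr - h->young_limit) < bytes)
        return NULL;              /* minor heap full: GC needed */
    h->young_ptr -= bytes;
    return h->young_ptr;
}
```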
+Looks like we spent most of the time (77%) writing the block, which makes sense.
+Reading young_limit
took 13% of the time, which seems reasonable too.
+If there was contention between domains, we'd expect to see it here.
The output looked similar whether using domains or processes.
+To double-check, I also tried perf c2c
.
+This reports on cache-to-cache transfers, where two CPUs are accessing the same memory,
+which requires the processors to communicate and is therefore relatively slow.
$ sudo perf c2c record
+^C
+
+$ sudo perf c2c report
+ Load Operations : 11898
+ Load L1D hit : 4140
+ Load L2D hit : 93
+ Load LLC hit : 3750
+ Load Local HITM : 251
+ Store Operations : 116386
+ Store L1D Hit : 104763
+ Store L1D Miss : 11622
+...
+# ----- HITM ----- ------- Store Refs ------ ------- CL -------- ---------- cycles ---------- Total cpu Shared
+# RmtHitm LclHitm L1 Hit L1 Miss N/A Off Node PA cnt Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
+...
+ 7 0 7 4 0 0 0x7f90b4002b80
+ ----------------------------------------------------------------------
+ 0.00% 100.00% 0.00% 0.00% 0.00% 0x0 0 1 0x44a704 0 144 107 8 1 [.] Dune.exe.Main.foo_273 main.exe main.ml:7 0
+ 0.00% 0.00% 25.00% 0.00% 0.00% 0x0 0 1 0x4ba7b9 0 0 0 1 1 [.] caml_interrupt_all_signal_ main.exe domain.c:318 0
+ 0.00% 0.00% 25.00% 0.00% 0.00% 0x0 0 1 0x4ba7e2 0 0 323 49 1 [.] caml_reset_young_limit main.exe domain.c:1658 0
+ 0.00% 0.00% 25.00% 0.00% 0.00% 0x8 0 1 0x4ce94d 0 0 0 1 1 [.] caml_empty_minor_heap_prom main.exe minor_gc.c:622 0
+ 0.00% 0.00% 25.00% 0.00% 0.00% 0x8 0 1 0x4ceed2 0 0 0 1 1 [.] caml_alloc_small_dispatch main.exe minor_gc.c:874 0
+
+This shows a list of cache lines (memory addresses) and how often we loaded from a modified address.
+There's a lot of information here and I don't understand most of it.
+But I think the above is saying that address 0x7f90b4002b80 (young_limit, at offset 0) was accessed by these places across domains:
+- main.ml:7 (the ref ()) checks against young_limit to see if we need to call into the GC.
+- domain.c:318 sets the limit to UINTNAT_MAX to signal that another domain wants a GC.
+- domain.c:1658 sets it back to young_trigger after being signalled.
+The same cacheline was also accessed at offset 8, which contains young_ptr (the address of the last allocation):
+- minor_gc.c:622 sets young_ptr to young_end after a GC.
+- minor_gc.c:874 adjusts young_ptr to re-do the allocation that triggered the GC.
+This indicates false sharing: young_ptr only gets accessed from one domain, but it's in the same cache line as young_limit.
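+The usual cure for false sharing is to pad or align the fields onto separate cache lines. A hypothetical sketch, assuming a 64-byte cache line (the real domain-state layout is more involved):

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache-line size */

/* Align each field to its own cache line so that writes to young_limit
   (done by other domains to signal a GC) don't invalidate the line
   holding young_ptr (touched on every allocation). */
struct domain_state {
    alignas(CACHE_LINE) char *young_limit;
    alignas(CACHE_LINE) char *young_ptr;
};
```

The cost is memory: each padded field now occupies a full cache line, which is why this trick is reserved for genuinely hot data.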
The main thing is that the counts are all very low, indicating that this doesn't happen often.
+I tried adding an incr x
on a global variable in the loop, and got some more operations reported.
+But using Atomic.incr
massively increased the number of records:
| | Original | incr | Atomic.incr |
|---|---|---|---|
| Load Operations | 11,898 | 25,860 | 2,658,364 |
| Load L1D hit | 4,140 | 15,181 | 326,236 |
| Load L2D hit | 93 | 163 | 295 |
| Load LLC hit | 3,750 | 3,173 | 2,321,704 |
| Load Local HITM | 251 | 299 | 2,317,885 |
| Store Operations | 116,386 | 462,162 | 3,909,500 |
| Store L1D Hit | 104,763 | 389,492 | 3,908,947 |
| Store L1D Miss | 11,622 | 72,667 | 550 |
See C2C - False Sharing Detection in Linux Perf for more information about all this.
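+The jump in the Atomic.incr column makes sense: a plain increment is a load and a store that can stay in one core's cache, while an atomic read-modify-write must take exclusive ownership of the cache line each time, which other cores then re-fetch (the HITM events counted above). A tiny sketch of the two operations (illustrative names):

```c
#include <stdatomic.h>

static long plain_counter;           /* ordinary memory; may race across threads */
static atomic_long shared_counter;   /* atomic read-modify-write */

/* Compiles to a load and a store; cheap, but unsafe if shared. */
static void bump_plain(void)  { plain_counter++; }

/* Compiles to a locked RMW; safe across domains, but forces
   cache-line ownership to bounce between cores. */
static void bump_atomic(void) { atomic_fetch_add(&shared_counter, 1); }
```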
+perf stat
shows statistics about a process.
+I ran it with -I 1000
to collect one-second samples.
+Here are two samples from the test case on my machine,
+one when it was running processes and one while it was using domains:
$ perf stat -I 1000
+
+# Processes
+ 8,032.71 msec cpu-clock # 8.033 CPUs utilized
+ 2,475 context-switches # 308.115 /sec
+ 51 cpu-migrations # 6.349 /sec
+ 44 page-faults # 5.478 /sec
+35,268,665,452 cycles # 4.391 GHz
+48,673,075,188 instructions # 1.38 insn per cycle
+ 9,815,905,270 branches # 1.222 G/sec
+ 48,986,037 branch-misses # 0.50% of all branches
+
+# Domains
+ 8,008.11 msec cpu-clock # 8.008 CPUs utilized
+ 10,970 context-switches # 1.370 K/sec
+ 133 cpu-migrations # 16.608 /sec
+ 232 page-faults # 28.971 /sec
+34,606,498,021 cycles # 4.321 GHz
+25,120,741,129 instructions # 0.73 insn per cycle
+ 5,028,578,807 branches # 627.936 M/sec
+ 24,402,161 branch-misses # 0.49% of all branches
+
+We're doing a lot more context switches with domains, as expected due to the sleeps, +and we're executing many fewer instructions, which isn't surprising. +Reporting the counts for individual CPUs gets more interesting though:
+$ sudo perf stat -I 1000 -e instructions -Aa
+# Processes
+ 1.000409485 CPU0 5,106,261,160 instructions
+ 1.000409485 CPU1 2,746,012,554 instructions
+ 1.000409485 CPU2 14,235,084,764 instructions
+ 1.000409485 CPU3 7,545,940,906 instructions
+ 1.000409485 CPU4 2,605,655,333 instructions
+ 1.000409485 CPU5 6,023,131,238 instructions
+ 1.000409485 CPU6 2,860,656,865 instructions
+ 1.000409485 CPU7 8,195,416,048 instructions
+ 2.001406580 CPU0 5,674,686,033 instructions
+ 2.001406580 CPU1 2,774,756,912 instructions
+ 2.001406580 CPU2 12,231,014,682 instructions
+ 2.001406580 CPU3 8,292,824,909 instructions
+ 2.001406580 CPU4 2,592,461,540 instructions
+ 2.001406580 CPU5 7,182,922,668 instructions
+ 2.001406580 CPU6 2,742,731,223 instructions
+ 2.001406580 CPU7 7,219,186,119 instructions
+ 3.002394302 CPU0 4,676,179,731 instructions
+ 3.002394302 CPU1 2,773,345,921 instructions
+ 3.002394302 CPU2 13,236,080,365 instructions
+ 3.002394302 CPU3 5,142,640,767 instructions
+ 3.002394302 CPU4 2,580,401,766 instructions
+ 3.002394302 CPU5 13,600,129,246 instructions
+ 3.002394302 CPU6 2,667,830,277 instructions
+ 3.002394302 CPU7 4,908,168,984 instructions
+
+$ sudo perf stat -I 1000 -e instructions -Aa
+# Domains
+ 1.002680009 CPU0 3,134,933,139 instructions
+ 1.002680009 CPU1 3,140,191,650 instructions
+ 1.002680009 CPU2 3,155,579,241 instructions
+ 1.002680009 CPU3 3,059,035,269 instructions
+ 1.002680009 CPU4 3,102,718,089 instructions
+ 1.002680009 CPU5 3,027,660,263 instructions
+ 1.002680009 CPU6 3,167,151,483 instructions
+ 1.002680009 CPU7 3,214,267,081 instructions
+ 2.003692744 CPU0 3,009,806,420 instructions
+ 2.003692744 CPU1 3,015,194,636 instructions
+ 2.003692744 CPU2 3,093,562,866 instructions
+ 2.003692744 CPU3 3,005,546,617 instructions
+ 2.003692744 CPU4 3,067,126,726 instructions
+ 2.003692744 CPU5 3,042,259,123 instructions
+ 2.003692744 CPU6 3,073,514,980 instructions
+ 2.003692744 CPU7 3,158,786,841 instructions
+ 3.004694851 CPU0 3,069,604,047 instructions
+ 3.004694851 CPU1 3,063,976,761 instructions
+ 3.004694851 CPU2 3,116,761,158 instructions
+ 3.004694851 CPU3 3,045,677,304 instructions
+ 3.004694851 CPU4 3,101,053,228 instructions
+ 3.004694851 CPU5 2,973,005,489 instructions
+ 3.004694851 CPU6 3,109,177,113 instructions
+ 3.004694851 CPU7 3,158,349,130 instructions
+
+In the domains case all CPUs are doing roughly the same amount of work. +But when running separate processes the CPUs differ wildly! +Over the last 1-second interval, for example, CPU5 executed 5.3 times as many instructions as CPU4. +And indeed, some of the test processes are finishing much sooner than the others, +even though they all do the same work.
+Setting /sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference
to performance
didn't make it faster,
+but setting it to power
(power-saving mode) did make the processes benchmark much slower,
+while having little effect on the domains case!
So I think what's happening here with separate processes is that +the CPU is boosting the performance of one or two cores at a time, +allowing them to make lots of progress.
+But with domains this doesn't happen, either because no domain runs long enough before sleeping to trigger the boost, +or because as soon as it does it needs to stop and wait for the other domains for a GC and loses it.
+The main profiling and tracing tools used were:
+- perf to take samples of CPU use, find hot functions and hot instructions within them, record process scheduling, look at hardware counters, and find sources of cache contention.
+- statmemprof to find the source of allocations.
+- eio-trace to visualise GC events and as a generic canvas for custom visualisations.
+- magic-trace to see very detailed traces of recent activity when something goes wrong.
+- olly to report on GC statistics.
+- bpftrace for quick experiments about kernel behaviour.
+- offcputime to see why a process is sleeping.
+I think OCaml 5's runtime events tracing was the star of the show here, making it much easier to see what was going on with GC,
+especially in combination with perf sched
.
+statmemprof
is also an essential tool for OCaml, and I'll be very glad to get it back with OCaml 5.3.
+I think I need to investigate perf
more; I'd never used many of these features before.
+Though it is important to use it with offcputime etc. to check you're not missing samples due to sleeping.
Unlike the previous post's example, where the cause was pretty obvious and led to a massive easy speed-up, +this one took a lot of investigation and revealed several problems, none of which seem very easy to fix. +I'm also a lot less confident that I really understand what's happening here, but here is a summary of my current guess:
+Since the sleeping mechanism will be changing in OCaml 5.3, +it would probably be worthwhile checking how that performs too. +I think there are some opportunities to improve the GC, such as letting idle domains opt out of GC after one collection, +and it looks like there are opportunities to reduce the amount of synchronisation done +(e.g. by letting late arrivers start the GC without having to wait for the leader, +or using a lock-free algorithm for becoming leader).
+For the solver, it would be good to try experimenting with CPU affinity to keep a subset of the 160 cores reserved for the solver. +Increasing the minor heap size and doing work in the main domain should also reduce the overhead of GC, +and improving the version compare function in the opam library would greatly reduce the need for it. +And if my goal was really to make it fast (rather than to improve multicore OCaml and its tooling) +then I'd probably switch it back to using processes.
+Finally, it was really useful that both of these blog posts examined performance regressions, +so I knew it must be possible to go faster. +Without a good idea of how fast something should be, it's easy to give up too early.
+Anyway, I hope you found some useful new tool in these posts!
+ diff --git a/data/planet/talex5/ocaml-5-performance-problems.md b/data/planet/talex5/ocaml-5-performance-problems.md new file mode 100644 index 0000000000..bbc36aa4a6 --- /dev/null +++ b/data/planet/talex5/ocaml-5-performance-problems.md @@ -0,0 +1,524 @@ +--- +title: OCaml 5 performance problems +description: +url: https://roscidus.com/blog/blog/2024/07/22/performance/ +date: 2024-07-22T10:00:00-00:00 +preview_image: +authors: +- Thomas Leonard +source: +--- + +Linux and OCaml provide a huge range of tools for investigating performance problems. +In this post I try using some of them to understand a network performance problem. +In part 2, I'll investigate a problem in a CPU-intensive multicore program.
+While porting capnp-rpc from Lwt to Eio, +to take advantage of OCaml 5's new effects system, +I tried running the benchmark to see if it got any faster:
+$ ./echo_bench.exe
+echo_bench.exe: [INFO] rate = 44933.359573 # The old Lwt version
+echo_bench.exe: [INFO] rate = 511.963565 # The (buggy) Eio version
+
+The benchmark records the number of echo RPCs per second. +Clearly, something is very wrong here! +In fact, the new version was so slow I had to reduce the number of iterations so it would finish.
+The old time
command can immediately give us a hint:
$ /usr/bin/time ./echo_bench.exe
+1.85user 0.42system 0:02.31elapsed 98%CPU # Lwt
+0.16user 0.05system 0:01.95elapsed 11%CPU # Eio (buggy)
+
+(many shells provide their own time
built-in with different output formats; I'm using /usr/bin/time
here)
time
's output shows time spent in user-mode (running the application's code on the CPU),
+time spent in the kernel, and the total wall-clock time.
+Both versions ran for around 2 seconds (doing a different number of iterations),
+but the Lwt version was using the CPU 98% of the time, while the Eio version was mostly sleeping.
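The user/system/wall split that time reports can be reproduced in miniature from inside a program. Here is a small Python sketch of the idea (illustrative only, not from the post):

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) for running fn, like a tiny /usr/bin/time."""
    wall0 = time.perf_counter()   # wall-clock time
    cpu0 = time.process_time()    # user + system CPU time for this process
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0

# A mostly-sleeping program (like the buggy Eio version) has high wall time
# but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.2))
```

A CPU-bound loop measured the same way would instead show cpu close to wall, which is the Lwt version's 98% profile.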
eio-trace can be used to see what an Eio program is doing. +Tracing is always available (you don't need to recompile the program to get it).
+$ eio-trace run -- ./echo_bench.exe
+
+eio-trace run
runs the command and displays the trace in a window.
+You can also use eio-trace record
to save a trace and examine it later.
The benchmark runs 12 test clients at once, making it a bit noisy. +To simplify things, I set it to run only one client:
+ +I've zoomed the image to show the first four iterations. +The first is so quick it's not really visible, but the next three take about 40ms each. +The yellow regions labelled "suspend-domain" show when the program is sleeping, waiting for an event from Linux. +Each horizontal bar is a fiber (a light-weight thread). From top to bottom they are:
+This trace immediately raises a couple of questions:
+Why is there a 40ms delay in each iteration of the test loop?
+Why does the program briefly wake up in the middle of the first delay, do nothing, and return to sleep? +(notice the extra "suspend-domain" at the top)
+Zooming in on a section between the delays, let's see what it's doing when it's not sleeping:
+ +After a 40ms delay, the server's read fiber receives the next request (the running fiber is shown in green). +The read fiber spawns a fiber to handle the request, which finishes quickly, starts the next read, +and then the write fiber transmits the reply.
+The client's read fiber gets the reply, the write fiber outputs a message, then the application fiber runs +and another message is sent. +The server reads something (presumably the first message, though it happens after the client had sent both), +then there is another long 40ms delay, then (far off the right of the image) the pattern repeats.
+To get more context in the trace, +I configured +the logging library to write the (existing) debug-level log messages to the trace buffer too:
+ +Log messages tend to be a bit long for the trace display, so they overlap and you have to zoom right in to read them, +but they do help navigate. +With this, I can see that the first client write is "Send finish" and the second is "Calling Echo.ping".
+Looks like we're not buffering the output, so it's doing two separate writes rather than combining them. +That's a little inefficient, and if you've done much network programming, +you also probably already know why this might cause a 40ms delay, +but let's pretend we don't know so we can play with a few more tools...
+strace can be used to trace interactions between applications and the Linux kernel
+(-tt -T
shows when each call was started and how long it took):
$ strace -tt -T ./echo_bench.exe
+...
+11:38:58.079200 write(2, "echo_bench.exe: [INFO] Accepting"..., 73) = 73 <0.000008>
+11:38:58.079253 io_uring_enter(4, 4, 0, 0, NULL, 8) = 4 <0.000032>
+11:38:58.079341 io_uring_enter(4, 2, 0, 0, NULL, 8) = 2 <0.000020>
+11:38:58.079408 io_uring_enter(4, 2, 0, 0, NULL, 8) = 2 <0.000021>
+11:38:58.079471 io_uring_enter(4, 2, 0, 0, NULL, 8) = 2 <0.000018>
+11:38:58.079525 io_uring_enter(4, 2, 0, 0, NULL, 8) = 2 <0.000019>
+11:38:58.079580 io_uring_enter(4, 2, 0, 0, NULL, 8) = 2 <0.000013>
+11:38:58.079611 io_uring_enter(4, 1, 0, 0, NULL, 8) = 1 <0.000009>
+11:38:58.079637 io_uring_enter(4, 0, 1, IORING_ENTER_GETEVENTS|IORING_ENTER_EXT_ARG, 0x7ffc1661a480, 24) = -1 ETIME (Timer expired) <0.018913>
+11:38:58.098669 futex(0x5584542b767c, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000105>
+11:38:58.098889 futex(0x5584542b7690, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000059>
+11:38:58.098976 io_uring_enter(4, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8) = 0 <0.021355>
+
+On Linux, Eio defaults to using the io_uring mechanism for submitting work to the kernel.
+io_uring_enter(4, 2, 0, 0, NULL, 8) = 2
means we asked to submit 2 new operations to the ring on FD 4,
+and the kernel accepted them.
The call at 11:38:58.079637
timed out after 19ms.
+It then woke up some futexes and then waited again, getting woken up after a further 21ms (for a total of 40ms).
Futexes are used to coordinate between system threads.
+strace -f
will follow all spawned threads (and processes), not just the main one:
$ strace -T -f ./echo_bench.exe
+...
+[pid 48451] newfstatat(AT_FDCWD, "/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=40, ...}, 0) = 0 <0.000011>
+...
+[pid 48451] futex(0x561def43296c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
+...
+[pid 48449] io_uring_enter(4, 0, 1, IORING_ENTER_GETEVENTS|IORING_ENTER_EXT_ARG, 0x7ffe1d5d1c90, 24) = -1 ETIME (Timer expired) <0.018899>
+[pid 48449] futex(0x561def43296c, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000106>
+[pid 48451] <... futex resumed>) = 0 <0.019981>
+[pid 48449] io_uring_enter(4, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <unfinished ...>
+...
+[pid 48451] exit(0) = ?
+[pid 48451] +++ exited with 0 +++
+[pid 48449] <... io_uring_enter resumed>) = 0 <0.021205>
+...
+
+The benchmark connects to "127.0.0.1"
and Eio uses getaddrinfo
to look up addresses (we can't use uring for this).
+Since getaddrinfo
can block for a long time, Eio creates a new system thread (pid 48451) to handle it
+(we can guess this thread is doing name resolution because we see it read resolv.conf
).
As creating system threads is a little slow, Eio keeps the thread around for a bit after it finishes in case it's needed again. +The timeout is when Eio decides that the thread isn't needed any longer and asks it to exit. +So this isn't relevant to our problem (and only happens on the first 40ms delay, since we don't look up any further addresses).
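That keep-a-worker-around-until-idle pattern can be modelled in a few lines of Python (an illustrative sketch, not Eio's actual code): a thread serves jobs from a queue and exits once it has been idle longer than a timeout.

```python
import queue
import threading

def spawn_worker(jobs, idle_timeout):
    """Run jobs from the queue; exit after idle_timeout seconds with no work."""
    def run():
        while True:
            try:
                fn, results = jobs.get(timeout=idle_timeout)
            except queue.Empty:
                return  # idle too long: let the thread die
            results.append(fn())
            jobs.task_done()
    t = threading.Thread(target=run)
    t.start()
    return t

jobs = queue.Queue()
worker = spawn_worker(jobs, idle_timeout=0.05)
results = []
jobs.put((lambda: "resolved", results))  # stands in for the getaddrinfo call
jobs.join()      # wait for the lookup to finish
worker.join()    # ~50ms later the idle worker exits on its own
```

The timeout expiring with no new work is exactly the extra wake-up seen in the trace: nothing application-visible happens, the thread just goes away.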
+However, strace doesn't tell us what the uring operations were, or their return values.
+One option is to switch to the posix
backend (which is the default on Unix systems).
+In fact, it's a good idea with any performance problem to check if it still happens with a different backend:
$ EIO_BACKEND=posix strace -T -tt ./echo_bench.exe
+...
+11:53:52.935976 writev(7, [{iov_base="\0\0\0\0\4\0\0\0\0\0\0\0\1\0\1\0\4\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0"..., iov_len=40}], 1) = 40 <0.000170>
+11:53:52.936308 ppoll([{fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=4, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 9, {tv_sec=0, tv_nsec=0}, NULL, 8) = 1 ([{fd=8, revents=POLLIN}], left {tv_sec=0, tv_nsec=0}) <0.000044>
+11:53:52.936500 writev(7, [{iov_base="\0\0\0\0\20\0\0\0\0\0\0\0\1\0\1\0\2\0\0\0\0\0\0\0\0\0\0\0\3\0\3\0"..., iov_len=136}], 1) = 136 <0.000055>
+11:53:52.936831 readv(8, [{iov_base="\0\0\0\0\4\0\0\0\0\0\0\0\1\0\1\0\4\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0"..., iov_len=4096}], 1) = 40 <0.000056>
+11:53:52.937516 ppoll([{fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}, {fd=4, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 9, NULL, NULL, 8) = 1 ([{fd=8, revents=POLLIN}]) <0.038972>
+11:53:52.977751 readv(8, [{iov_base="\0\0\0\0\20\0\0\0\0\0\0\0\1\0\1\0\2\0\0\0\0\0\0\0\0\0\0\0\3\0\3\0"..., iov_len=4096}], 1) = 136 <0.000398>
+
+(to reduce clutter, I removed calls that returned EAGAIN
and ppoll
calls that returned 0 ready descriptors)
The problem still occurs, and now we can see the two writes:
+The client writes the first 40 bytes, and ppoll reports the server's socket (FD 8) as readable (revents=POLLIN).
+The client then writes the remaining 136 bytes, while the server reads the first 40 and calls ppoll to await further data.
+About 39ms later, ppoll says FD 8 is now ready, and the server reads the other 136 bytes.
+Alternatively, we can trace uring operations using bpftrace. +bpftrace is a little scripting language similar to awk, +except that instead of editing a stream of characters, +it live-patches the running Linux kernel. +Apparently this is safe to run in production +(and I haven't managed to crash my kernel with it yet).
+Here is a list of uring tracepoints we can probe:
+$ sudo bpftrace -l 'tracepoint:io_uring:*'
+tracepoint:io_uring:io_uring_complete
+tracepoint:io_uring:io_uring_cqe_overflow
+tracepoint:io_uring:io_uring_cqring_wait
+tracepoint:io_uring:io_uring_create
+tracepoint:io_uring:io_uring_defer
+tracepoint:io_uring:io_uring_fail_link
+tracepoint:io_uring:io_uring_file_get
+tracepoint:io_uring:io_uring_link
+tracepoint:io_uring:io_uring_local_work_run
+tracepoint:io_uring:io_uring_poll_arm
+tracepoint:io_uring:io_uring_queue_async_work
+tracepoint:io_uring:io_uring_register
+tracepoint:io_uring:io_uring_req_failed
+tracepoint:io_uring:io_uring_short_write
+tracepoint:io_uring:io_uring_submit_req
+tracepoint:io_uring:io_uring_task_add
+tracepoint:io_uring:io_uring_task_work_run
+
+io_uring_complete
looks promising:
$ sudo bpftrace -vl tracepoint:io_uring:io_uring_complete
+tracepoint:io_uring:io_uring_complete
+ void * ctx
+ void * req
+ u64 user_data
+ int res
+ unsigned cflags
+ u64 extra1
+ u64 extra2
+
+Here's a script to print out the time, process, operation name and result for each completion:
$ sudo bpftrace uringtrace.bt
+Attaching 3 probes...
+...
+1743ms: echo_bench.exe: WRITE_FIXED 40
+1743ms: echo_bench.exe: READV 40
+1743ms: echo_bench.exe: WRITE_FIXED 136
+1783ms: echo_bench.exe: READV 136
+
+In this output, the order is slightly different: +we see the server's read get the 40 bytes before the client sends the rest, +but we still see the 40ms delay between the completion of the second write and the corresponding read. +The change in order is because we're seeing when the kernel knew the read was complete, +not when the application found out about it.
+An obvious step with any networking problem is to look at the packets going over the network. +tcpdump can be used to capture packets, which can be displayed on the console or in a GUI with wireshark.
+$ sudo tcpdump -n -ttttt -i lo
+...
+...041330 IP ...37640 > ...7000: Flags [P.], ..., length 40
+...081975 IP ...7000 > ...37640: Flags [.], ..., length 0
+...082005 IP ...37640 > ...7000: Flags [P.], ..., length 136
+...082071 IP ...7000 > ...37640: Flags [.], ..., length 0
+
+Here we see the client (on port 37640) sending 40 bytes to the server (port 7000), +and the server replying with an ACK (with no payload) 40ms later. +After getting the ACK, the client socket sends the remaining 136 bytes.
+Here we can see that while the application made the two writes in quick succession, +TCP waited before sending the second one. +Searching for "delayed ack 40ms" will turn up an explanation.
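One standard mitigation that search will also turn up is disabling Nagle's algorithm with the TCP_NODELAY socket option, so small writes go out without waiting for the previous packet's ACK. A minimal Python sketch of that option (illustrative; it is not the fix adopted later in this post):

```python
import socket

def connect_no_nagle(host, port):
    """Connect and disable Nagle's algorithm so small writes are sent immediately."""
    s = socket.create_connection((host, port))
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s

# Demo against a throwaway listener on an ephemeral localhost port.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
client = connect_no_nagle("127.0.0.1", server.getsockname()[1])
```

TCP_NODELAY trades the delay for more small packets on the wire; coalescing writes in the application (as done below) avoids generating the small packets in the first place.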
+ss displays socket statistics.
+ss -tin
shows all TCP sockets (-t
) with internals (-i
):
$ ss -tin 'sport = 7000 or dport = 7000'
+State Recv-Q Send-Q Local Address:Port Peer Address:Port
+ESTAB 0 0 127.0.0.1:7000 127.0.0.1:56224
+ ato:40 lastsnd:34 lastrcv:34 lastack:34
+ESTAB 0 176 127.0.0.1:56224 127.0.0.1:7000
+ ato:40 lastsnd:34 lastrcv:34 lastack:34 unacked:1 notsent:136
+
+There's a lot of output here; I've removed the irrelevant bits.
+ato:40
says there's a 40ms timeout for "delay ack mode".
+lastsnd
, etc, say that nothing had happened for 34ms when this information was collected.
+unacked
and notsent
aren't documented in the man-page,
+but I guess they mean that the client (now port 56224) is waiting for 1 packet to be ack'd and has 136 bytes waiting until then.
The client socket still has both messages (176 bytes total) in its queue; +it can't forget about the first message until the server confirms receiving it, +since the client might need to send it again if it got lost.
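The numbers ss reported can be modelled very simply (a toy sketch, nothing like the kernel's real data structures): sent-but-unacknowledged bytes must be retained for possible retransmission, and only an incoming ACK lets the sender forget them.

```python
class SendQueue:
    """Toy model of a TCP send queue: data stays queued until acknowledged."""
    def __init__(self):
        self.buf = bytearray()

    def send(self, data):
        self.buf += data     # keep a copy in case it must be retransmitted

    def ack(self, n):
        del self.buf[:n]     # peer confirmed n bytes; safe to forget them

q = SendQueue()
q.send(b"x" * 40)    # the first small message (already on the wire)
q.send(b"y" * 136)   # the second message (held back by Nagle)
# like ss showed: 176 bytes queued until the 40-byte message is ACKed
```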
+This doesn't quite lead us to the solution, though.
+offwaketime records why a program stopped using the CPU, and what caused it to resume:
+$ sudo offwaketime-bpfcc -f -p (pgrep echo_bench.exe) > wakes
+$ flamegraph.pl --colors=chain wakes > wakes.svg
+
+
+offwaketime
records a stack-trace when a process is suspended (shown at the bottom and going up)
+and pairs it with the stack-trace of the thread that caused it to be resumed (shown above it and going down).
The taller column on the right shows Eio being woken up due to TCP data being received from the network, +confirming that it was the TCP ACK that got things going again.
+The shorter column on the left was unexpected, and the [UNKNOWN]
in the stack is annoying
+(probably C code compiled without frame pointers).
+gdb
gets a better stack trace.
+It turned out to be OCaml's tick thread, which wakes every 50ms to prevent one sys-thread from hogging the CPU:
$ strace -T -e pselect6 -p (pgrep echo_bench.exe) -f
+strace: Process 20162 attached with 2 threads
+...
+[pid 20173] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=50000000}, NULL) = 0 (Timeout) <0.050441>
+[pid 20173] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=50000000}, NULL) = 0 (Timeout) <0.050318>
+
+Having multiple threads shown on the same diagram is a bit confusing.
+I should probably have used -t
to focus only on the main one.
Also, note that when using profiling tools that record the OCaml stack, +it's useful to compile with frame pointers enabled. +To install e.g. OCaml 5.2.0 with frame pointers enabled, use the ocaml-option-fp package, along the lines of:
+$ opam switch create 5.2.0-fp ocaml-variants.5.2.0+options ocaml-option-fp
magic-trace allows capturing a short trace of everything the CPUs were doing just before some event. +It uses Intel Processor Trace to have the CPU record all control flow changes (calls, branches, etc) to a ring-buffer, +with fairly low overhead (2% to 10%, due to extra memory bandwidth needed). +When something interesting happens, we save the buffer and use it to reconstruct the recent history.
+Normally we'd need to set up a trigger to grab the buffer at the right moment, +but since this program is mostly idle it doesn't record much +and I just attached at a random point (running magic-trace attach against the benchmark's pid) and immediately pressed Ctrl-C to grab a snapshot and detach.
+1 +2 +3 +4 + |
|
As before, we see 40ms periods of waiting, with bursts of activity between them:
+ +The output is a bit messed up because magic-trace doesn't understand that there are multiple OCaml fibers here, +each with their own stack. It also doesn't seem to know that exceptions unwind the stack.
+In each 40ms column, Eio_posix.Flow.single_read
(3rd line from top) tried to do a read
+with readv
, which got EAGAIN
and called Sched.next
to switch to the next fiber.
+Since there was nothing left to run, the Eio scheduler called ppoll
.
+Linux didn't have anything ready for this process,
+and called the schedule
kernel function to switch to another process.
I recorded an eio-trace at the same time, to see the bigger picture. +Here's the eio-trace zoomed in to show the two client writes (just before the 40ms wait), +with the relevant bits of the magic-trace stack pasted below them:
+ +We can see the OCaml code calling writev
, entering the kernel, tcp_write_xmit
being called to handle it,
+writing the IP packet to the network and then, because this is the loopback interface, the network receive logic
+handling the packet too.
+The second call is much shorter; tcp_write_xmit
returns quickly without sending anything.
Note: I used the eio_posix
backend here so it's easier to correlate the kernel operations to the application calls
+(uring queues them up and runs them later).
+The uring-trace project should make this easier in future, but doesn't integrate with eio-trace yet.
Zooming in further, it's easy to see the difference between the two calls to tcp_write_xmit
:
+Looking at the source for tcp_write_xmit
,
+we finally find the magic word "nagle"!
Having identified a load of interesting events
+I wrote summary-posix.bt, a bpftrace script to summarise them.
+This includes log messages written by the application (by tracing write
calls to stderr),
+reads and writes on the sockets,
+and various probed kernel functions seen in the magic-trace output and when reading the kernel source.
The output is specialised to this application (for example, TCP segments sent to port 7000 +are displayed as "to server", while others are "to client"). +I think this is a useful way to double-check my understanding, and any fix:
+$ sudo bpftrace summary-posix.bt
+[...]
+844ms: server: got ping request; sending reply
+844ms: server reads from socket (EAGAIN)
+844ms: server: writev(96 bytes)
+844ms: tcp_write_xmit (to client, nagle-on, packets_out=0)
+844ms: tcp_v4_send_check: sending 96 bytes to client
+844ms: tcp_v4_rcv: got 96 bytes
+844ms: timer_start (tcp_delack_timer, 40 ms)
+844ms: client reads 96 bytes from socket
+844ms: client: enqueue finish message
+844ms: client: enqueue ping call
+844ms: client reads from socket (EAGAIN)
+844ms: client: writev(40 bytes)
+844ms: tcp_write_xmit (to server, nagle-on, packets_out=0)
+844ms: tcp_v4_send_check: sending 40 bytes to server
+845ms: tcp_v4_rcv: got 40 bytes
+845ms: timer_start (tcp_delack_timer, 40 ms)
+845ms: client: writev(136 bytes)
+845ms: tcp_write_xmit (to server, nagle-on, packets_out=1)
+845ms: server reads 40 bytes from socket
+845ms: server reads from socket (EAGAIN)
+885ms: tcp_delack_timer_handler (ACK to client)
+885ms: tcp_v4_send_check: sending 0 bytes to client
+885ms: tcp_delack_timer_handler (ACK to server)
+885ms: tcp_v4_rcv: got 0 bytes
+885ms: tcp_write_xmit (to server, nagle-on, packets_out=0)
+885ms: tcp_v4_send_check: sending 136 bytes to server
+
+When the client sends its first small message, there is no unacknowledged data in flight (packets_out=0) so it gets sent immediately.
+The second message finds a packet still awaiting an ACK (packets_out=1), so Nagle's algorithm holds it back.
+When the delayed ACK finally arrives, nothing is left unacknowledged (packets_out=0) so the kernel sends it immediately.
+The problem seemed clear: while porting from Lwt to Eio I'd lost the output buffering. +So I looked at the Lwt code to see how it did it and... it doesn't! So how was it working?
+As I did with Eio, I set the Lwt benchmark's concurrency to 1 to simplify it for tracing, +and discovered that Lwt with 1 client thread has exactly the same problem as the Eio version. +Well, that's embarrassing! +But why is Lwt fast with 12 client threads?
+With only minor changes (e.g. write
vs writev
), the summary script above also worked for tracing the Lwt version.
+With 1 or 2 client threads, Lwt is slow, but with 3 it's fairly fast.
+The delay only happens if the client sends a "finish" message when the server has no replies queued up
+(otherwise the finish message unblocks the replies, which carry the ACK to the client immediately).
+So, it works mostly by fluke!
+Lwt just happens to schedule the threads in such a way that Nagle's algorithm mostly doesn't trigger with 12 concurrent requests.
Anyway, adding buffering to the Eio version fixed the problem:
+ +An interesting thing to notice here is that not only did the long delay go away,
+but the CPU operations while it was active were faster too!
+I think the reason is that the CPU goes into power-saving mode during the long delays.
+cpupower monitor
shows my CPUs running at around 1 GHz with the old code and
+around 4.7 GHz when running the new version.
Here are the results for the fixed version:
+$ ./echo_bench.exe
+echo_bench.exe: [INFO] rate = 44425.962625 # The old Lwt version
+echo_bench.exe: [INFO] rate = 59653.451934 # The fixed Eio version
+
+60k RPC requests per second doesn't seem that impressive, but at least it's faster than the old version, +which is good enough for now! There's clearly scope for improvement here (for example, the buffering I +added is quite inefficient, making two extra copies of every message, as the framing library copies it from +a cstruct to a string, and then I have to copy the string back to a cstruct for the kernel).
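The fix itself is just write coalescing. A minimal Python sketch of the idea (the real fix was in the OCaml code; a zero-copy version would also avoid the extra copies mentioned above):

```python
class BufferedWriter:
    """Collect small writes and flush them as one syscall-sized chunk."""
    def __init__(self, write_all):
        self.write_all = write_all   # e.g. socket.sendall
        self.buf = bytearray()

    def write(self, data):
        self.buf += data             # no syscall yet; just queue the bytes

    def flush(self):
        if self.buf:
            self.write_all(bytes(self.buf))  # one write, one TCP segment
            self.buf.clear()

sent = []
w = BufferedWriter(sent.append)
w.write(b"finish-message")   # the two messages that previously went out
w.write(b"ping-call")        # as two separate small packets
w.flush()
```

Because both messages now leave in a single segment, the first packet is never left waiting alone for an ACK, and Nagle's algorithm has nothing to hold back.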
+There are lots of great tools available to help understand why something is running slowly (or misbehaving), +and since programmers usually don't have much time for profiling, +a little investigation will often turn up something interesting! +Even when things are working correctly, these tools are a good way to learn more about how things work.
+time
will quickly tell you if the program is taking lots of time in application code, in the kernel, or just sleeping.
+If the problem is sleeping, offcputime
and offwaketime
can tell you why it was waiting and what woke it in the end.
+My own eio-trace
tool will give a quick visual overview of what an Eio application is doing.
+strace
is great for tracing interactions between applications and the kernel,
+but it doesn't help much when the application is using uring.
+To fix that, you can either switch to the eio_posix
backend or use bpftrace
with the uring tracepoints.
+tcpdump
, wireshark
and ss
are all useful to examine network problems specifically.
I've found bpftrace
to be really useful for all kinds of tasks.
+Being able to write quick one-liners or short scripts gives it great flexibility.
+Since the scripts run in the kernel you can also filter and aggregate data efficiently
+without having to pass it all to userspace, and you can examine any kernel data structures.
+We didn't need that here because the program was running so slowly, but it's great for many problems.
+In addition to using well-defined tracepoints,
+it can also probe any (non-inlined) function in the kernel or the application.
+I also think using it to create a "summary script" to confirm a problem and its solution seems useful,
+though this is the first time I've tried doing that.
magic-trace
is great for getting really detailed function-by-function tracing through the application and kernel.
+Its ability to report the last few ms of activity after you notice a problem is extremely useful
+(though not needed in this example).
+It would be really useful if you could trigger magic-trace from a bpftrace script, but I didn't see a way to do that.
However, it was surprisingly difficult to get any of the tools to point directly +at the combination of Nagle's algorithm with delayed ACKs as the cause of this common problem!
+This post was mainly focused on what was happening in the kernel. +In part 2, I'll investigate a CPU-intensive problem instead.
+ diff --git a/data/planet/tarides/creating-the-syntaxdocumentation-command---part-3-vscode-platform-extension.md b/data/planet/tarides/creating-the-syntaxdocumentation-command---part-3-vscode-platform-extension.md new file mode 100644 index 0000000000..7c315744e4 --- /dev/null +++ b/data/planet/tarides/creating-the-syntaxdocumentation-command---part-3-vscode-platform-extension.md @@ -0,0 +1,201 @@ +--- +title: 'Creating the SyntaxDocumentation Command - Part 3: VSCode Platform Extension' +description: "In the final installment of our series on the SyntaxDocumentation command, + we delve into its integration within the OCaml VSCode Platform\u2026" +url: https://tarides.com/blog/2024-07-24-creating-the-syntaxdocumentation-command-part-3-vscode-platform-extension +date: 2024-07-24T00:00:00-00:00 +preview_image: https://tarides.com/static/aa5bd16e724bfc18f6e436399a4dda66/e49a8/vscode_toggle.jpg +authors: +- Tarides +source: +--- + +In the final installment of our series on the SyntaxDocumentation
command, we delve into its integration within the OCaml VSCode Platform extension. Building on our previous discussions about Merlin and OCaml LSP, this article explores how to make SyntaxDocumentation
an opt-in feature in the popular VSCode editor.
In the first part of this series, Creating the SyntaxDocumentation Command - Part 1: Merlin, we explored how to create a new command in Merlin, particularly the SyntaxDocumentation
command. In the second part, Creating the SyntaxDocumentation Command - Part 2: OCaml LSP, we looked at how to implement this feature in OCaml LSP in order to enable visual editors to trigger the command with actions such as hovering. In this third and final installment, you will learn how SyntaxDocumentation
integrates into the OCaml VSCode Platform extension as an opt-in feature, enabling users to toggle it on/off depending on their preference.
Visual Studio Code is a free open-source, cross-platform code editor from Microsoft that is very popular among developers. +Some of its features include:
+The VSCode OCaml Platform extension enhances the development experience for OCaml programmers. It is itself written in the OCaml programming language using bindings to the VSCode API and then compiled into Javascript with js_of_ocaml
. It provides language support features such as syntax-highlighting
, go-to-definition
, auto-completion
, and type-on-hover
. These key functionalities are powered by the OCaml Language Server (ocamllsp
), which can be installed using popular package managers like opam and esy. Users can easily configure the extension to work with different sandbox environments, ensuring a tailored setup for various project needs. Additionally, the extension includes comprehensive settings and command options, making it very versatile for both beginner and advanced OCaml developers.
The OCaml Platform Extension for VSCode gives us a nice UI for interacting with OCaml-LSP. We can configure settings for the server as well as interact with switches, browse the AST, and many more features. Our main focus is on adding a checkbox
that allows users to activate or deactivate SyntaxDocumentation
in OCaml LSP's hover
response. I limited this article's scope to just the files relevant in implementing this, while giving a brief tour of how the extension is built.
Every VSCode extension has a manifest file, package.json, at the root of the extension directory. The package.json
contains a mix of Node.js fields, such as scripts and devDependencies
, and VS Code specific fields, like publisher
, activationEvents
, and contributes
.
+Our manifest file contains general information such as:
We also have commands that act as action events for our extension. These commands are used to perform a wide range of things, like navigating the AST, upgrading packages, deleting a switch, etc. +An example of a command to open the AST explorer is written as:
+{
+ "command": "ocaml.open-ast-explorer-to-the-side",
+ "category": "OCaml",
+ "title": "Open AST explorer"
+}
For our case, enabling/disabling SyntaxDocumentation
is a configuration setting for our language server, so we indicate this in the configurations section:
"ocaml.server.syntaxDocumentation": {
+ "type": "boolean",
+ "default": false,
+ "markdownDescription": "Enable/Disable syntax documentation"
+}
The file extension_instance.ml
handles the setup and configuration of various components of the OCaml VSCode extension and ensures that features like the language server and documentation are properly initialised. Its key functionalities are:
type t = {
+ mutable sandbox : Sandbox.t;
+ mutable repl : Terminal_sandbox.t option;
+ mutable ocaml_version : Ocaml_version.t option;
+ mutable lsp_client : (LanguageClient.t * Ocaml_lsp.t) option;
+ mutable documentation_server : Documentation_server.t option;
+ documentation_server_info : StatusBarItem.t;
+ sandbox_info : StatusBarItem.t;
+ ast_editor_state : Ast_editor_state.t;
+ mutable codelens : bool option;
+ mutable extended_hover : bool option;
+ mutable dune_diagnostics : bool option;
+ mutable syntax_documentation : bool option;
+}
Interacting With the Language Server: This extension needs to interact with the OCaml language server (ocamllsp
) to provide features like code completion, diagnostics, and other language-specific functionalities.
Documentation Server Management: The file includes functionality to start, stop, and manage the documentation server, which provides documentation lookup for installed OCaml packages.
+Handling Configuration: This extension allows users to configure settings such as code lens, extended hover, diagnostics, and syntax documentation. These settings are sent to the language server to adjust its behaviour accordingly. For SyntaxDocumentation
, whenever the user toggles the checkbox, the server should set the correct configuration parameters. This is done mainly using two functions set_configuration
and send_configuration
.
...
+
+(* Set configuration *)
+let set_configuration t ~syntax_documentation =
+ t.syntax_documentation <- syntax_documentation;
+ match t.lsp_client with
+ | None -> ()
+ | Some (client, (_ : Ocaml_lsp.t)) ->
+ send_configuration ~syntax_documentation client
+...
...
+
+(* Send configuration *)
+let send_configuration ~syntax_documentation client =
+ let syntaxDocumentation =
+ Option.map syntax_documentation ~f:(fun enable ->
+ Ocaml_lsp.OcamllspSettingEnable.create ~enable)
+ in
+ let settings =
+ Ocaml_lsp.OcamllspSettings.create
+ ~syntaxDocumentation
+ in
+ let payload =
+ let settings =
+ LanguageClient.DidChangeConfiguration.create
+ ~settings:(Ocaml_lsp.OcamllspSettings.t_to_js settings)
+ ()
+ in
+ LanguageClient.DidChangeConfiguration.t_to_js settings
+ in
+ LanguageClient.sendNotification
+ client
+ "workspace/didChangeConfiguration"
+ payload
+
+...
The ocaml_lsp.ml
file ensures that ocamllsp
is set up correctly and up to date. For SyntaxDocumentation
, two important modules used from this file are: OcamllspSettingEnable
and OcamllspSettings
.
OcamllspSettingEnable
defines an interface for enabling/disabling specific settings in ocamllsp
.
...
+
+module OcamllspSettingEnable = struct
+ include Interface.Make ()
+ include
+ [%js:
+ val enable : t -> bool or_undefined [@@js.get]
+ val create : enable:bool -> t [@@js.builder]]
+end
+
+...
The annotation [@@js.get]
is a PPX used to bind OCaml functions to JavaScript property accessors. This allows OCaml code to interact seamlessly with JavaScript objects, accessing properties directly as if they were native OCaml fields, while [@@js.builder]
facilitates the creation of JavaScript objects from OCaml functions. They both come from the LexFi/gen_js_api
library.
OcamllspSettings
aggregrates multiple OcamllspSettingEnable
settings into a comprehensive settings interface for ocamllsp
.
...
+module OcamllspSettings = struct
+ include Interface.Make ()
+ include
+ [%js:
+ val syntaxDocumentation : t ->
+ OcamllspSettingEnable.t or_undefined [@@js.get]
+
+ val create : ?syntaxDocumentation:OcamllspSettingEnable.t ->
+ unit -> t [@@js.builder]]
+
+ let create ~syntaxDocumentation = create ?syntaxDocumentation ()
+end
+...
The file settings.ml
provides a flexible way to manage workspace-specific settings, including:
codelens
and SyntaxDocumentation
...
+let create_setting ~scope ~key ~of_json ~to_json =
+ { scope; key; to_json; of_json }
+
+let server_syntaxDocumentation_setting =
+ create_setting
+ ~scope:ConfigurationTarget.Workspace
+ ~key:"ocaml.server.syntaxDocumentation"
+ ~of_json:Jsonoo.Decode.bool
+ ~to_json:Jsonoo.Encode.bool
+...
The vscode_ocaml_platform.ml
file initialises and activates the OCaml Platform extension for VSCode. The key tasks include:
In the context of SyntaxDocumentation
, this code ensures that the extension is correctly configured to handle SyntaxDocumentation
settings. The notify_configuration_changes
function listens for changes to the server_syntaxDocumentation_setting
and updates the extension instance accordingly. This means that any changes the user makes to the SyntaxDocumentation
settings in the VSCode workspace configuration will be reflected in the extension's behaviour, ensuring that SyntaxDocumentation
is enabled or disabled as per the user's preference.
let notify_configuration_changes instance =
+ Workspace.onDidChangeConfiguration
+ ~listener:(fun _event ->
+ let syntax_documentation =
+ Settings.(get server_syntaxDocumentation_setting)
+ in
+ Extension_instance.set_configuration instance ~syntax_documentation)
+ ()
+ + + + +
+In this final article, we explored how to integrate SyntaxDocumentation
into OCaml VSCode Platform extension as a configurable option for OCaml LSP's hover
command. We covered key components such as configuring the extension manifest, managing the extension state, interacting with the OCaml language server, and handling workspace configurations. By enabling users to toggle the SyntaxDocumentation
feature on or off, we can ensure a flexible and customisable development experience for all users.
Feel free to contribute to this extension on the GitHub repository: vscode-ocaml-platform
. Thank you for following along in this series, and happy coding with OCaml and VSCode!
+diff --git a/data/planet/tarides/deep-dive-optimising-multicore-ocaml-for-windows.md b/data/planet/tarides/deep-dive-optimising-multicore-ocaml-for-windows.md new file mode 100644 index 0000000000..53b117e858 --- /dev/null +++ b/data/planet/tarides/deep-dive-optimising-multicore-ocaml-for-windows.md @@ -0,0 +1,60 @@ +--- +title: 'Deep Dive: Optimising Multicore OCaml for Windows' +description: "We love hosting internships. It is rewarding to potentially facilitate + someone\u2019s first foray into the OCaml ecosystem, helping them\u2026" +url: https://tarides.com/blog/2024-07-10-deep-dive-optimising-multicore-ocaml-for-windows +date: 2024-07-10T00:00:00-00:00 +preview_image: https://tarides.com/static/18316ffead39c18231bc3dd3899eed4f/6b50e/racecar.jpg +authors: +- Tarides +source: +--- + +Tarides is an open-source company first. Our top priorities are to establish and tend to the OCaml community. Similarly, we’re dedicated to the development of the OCaml language and enjoy collaborating with industry partners and individual engineers to continue improving the performance and features of OCaml.
+We want you to join the OCaml community, test the languages and tools, and actively be part of the language’s evolution.
+Contact Tarides to see how OCaml can benefit your business and/or for support while learning OCaml. Follow us on Twitter and LinkedIn to ensure you never miss a post, and join the OCaml discussion on Discuss!
+
We love hosting internships. It is rewarding to potentially facilitate someone’s first foray into the OCaml ecosystem, helping them establish a hopefully life-long foothold in the world of open-source programming. It is also a great opportunity to get new perspectives on existing issues. Fresh eyes can reinvigorate topics, highlighting different pain points and new solutions which benefit the entire community.
+Sometimes, we also find ourselves just trying to keep up with our interns as they take off like rocket ships! Recently, we mentored a student who did just that. The initial goal of the internship was to investigate strange performance drops in the OCaml runtime that arose after the introduction of multicore support. These performance drops were most keenly felt on Windows machines, and the initial internship specification emphasised the need to improve the developer experience on that operating system.
+Our intern @eutro went above and beyond anything we could have expected and tackled the project thoroughly and ambitiously. In this post, I will attempt to give you a comprehensive overview of this intricate project and the problems it tackled.
+Before OCaml 5, only one thread would run at any given time. Users never had to worry about multiple threads trying to use a shared resource like the Garbage Collector (GC). In OCaml 5, however, the process is divided into several 'threads'1, and multiple threads regularly try to run parts of the GC simultaneously. The minor GC uses a Stop The World (STW) function to run in parallel on all threads, whereas the major GC’s work is split into slices. These may happen in parallel between threads and while the user’s program (also called the ‘mutator’) is making changes. This is one example of when a mechanism is needed to protect multiple threads from making changes that contradict each other and result in unexpected behaviours.
+Locks are the traditional way of doing this, whereby other activity is halted (or locked) while one activity finishes. However, in multicore programming, this method would be incredibly inefficient since there can be many activities in progress simultaneously. In this case, we would need to introduce so many locks for the different parts of memory that doing so would cause memory and OS resource problems!
+The approach we use for OCaml 5 combines a Compare And Swap (CAS) operation with Busy-Wait loops. A CAS operation ensures that if two threads try to modify the same area of memory, only one will succeed. The one that fails will know it has failed and can then enter a period of Busy-Waiting (called SPIN_WAIT
in the code). Busy-wait loops (also referred to as spins) describe a process that repeatedly ('busily') checks whether a condition is true. The process or task is only resumed once that condition is met.
Busy-wait loops are used successfully in OCaml for many purposes, but they have to be optimised. They are mostly appropriate in cases where we think that the required condition will be met quickly or in a reasonable period of time. If that’s not the case, then theoretically, the thread that is waiting will just keep spinning. If one allows busy-wait loops to spin indefinitely, they waste a lot of power and CPU and can actually prevent the condition they are waiting for from being met. To avoid that happening, we can use a sleep
function.
In order to implement spinning without wasting power, the loop checks the condition repeatedly, but after a while, it starts 'sleeping' between checks. Suppose a thread is waiting for condition C
to come true, and it uses a Busy-Wait loop to check for this. The program spins a number of times, checking the condition, and then waits or goes to ‘sleep’ for a set amount of time – then it ‘wakes up’ and checks once more before (if it has to) going back to ‘sleep‘ again. The period of ‘sleep’ increases each time. This cycle repeats itself until the condition C
finally comes true.
This was how the process was supposed to work, yet, for some unknown reason, certain processes would occasionally take much longer than expected. The performance drop was worst on Windows machines.
+The first order of business was to conduct a series of tests on the runtime. Not only to discover the possible cause of the performance drops but also to establish a baseline of performance against which to measure any changes (and hopefully improvements!).
+We knew that there was a performance problem and that it was particularly painful on Windows, but we didn’t know why. Even if we had a hunch as to what might be causing it, it was crucial to build up a realistic image of what was happening before we attempted a fix.
+@eutro began this process by identifying where Busy-Wait loops were spinning in the runtime and for how long. She also wanted to know if there were places in the runtime where processes would get ‘stuck’ in Busy-Wait loops and not move on, and if so, where and why.
+She used the OCaml testsuite and measured how many SPIN_WAIT
macros resolved successfully without needing to sleep and which ones did not. She discovered that in most cases, the spinning had the desired effect, and the process could continue after a reasonable amount of time when the condition it was waiting for was met. The garbage collector was also not experiencing any significant performance drops, so it could not be the cause of the problems on Windows. Instead, what she realised was that on Windows, sleeps
cannot be shorter than one millisecond, and so the first sleeps that occur end up being much too long. This causes extended and unnecessary delays for processes running on Windows. Equipped with this realisation, @eutro got started on a solution. One that would be most helpful on Windows but still benefit users on other operating systems.
There are a few ways a thread in OCaml can wait for a condition:
+So what has changed? As things stood, only steps one and two were available: a series of increasingly long sleeps interleaved with checks. You would spin n times, then sleep for 10µs (‘µs’ is short for microseconds), check the condition once more, perhaps sleep for 20µs, then 35µs, and so on. The point is that the time spent sleeping kept gradually increasing.
+However, as @eutro discovered, in many cases, the process took far too long to resume, even after the condition had come true. By the time they woke up from sleeping, they could have already proceeded if they had just ‘taken a ticket’ earlier and waited until they were notified. To improve performance, instead of repeatedly sleeping for longer increments, we use specialised ‘barriers’ to wait until we can proceed.
+To solve the Windows problem, we now use the SPIN_WAIT
function only in cases where we don’t expect to ever need to sleep. In cases where that first sleep would cause significant delay, we introduce a new SPIN_WAIT_NTIMES
function, which lets the process spin for a set number of times before being combined with a barrier. @eutro used her previous benchmarks to determine which occasions could keep the SPIN_WAIT
cycle as-is and which occasions required the new SPIN_WAIT_NTIMES
combined with a barrier.
But things didn’t stop there! @eutro could also optimise the type of barrier. Traditionally, we use condition variables to wake up threads waiting on a condition. However, they are unnecessarily resource-intensive: they require extra memory, and woken threads must acquire (and release) a lock before they can continue. A futex is a lower-level synchronisation primitive that can similarly be used to wake up threads but without the added complexity of a condition variable.
+@eutro added the use of futexes on the operating systems that allow it. Notably, macOS does not allow non-OS programs to use futexes, so we fall back to condition variables there.
+By introducing the use of SPIN_WAIT_NTIMES
, barriers, and futexes, @eutro implemented a number of optimisations that were applicable not only on Windows but on several operating systems. These optimisations save users time and processing power.
During the course of implementing these changes, @eutro ran a lot of tests. It was important to be thorough in order to ensure that her changes didn’t have unintended consequences. It is incredibly difficult to reason about how programs will react to a specific alteration, as there are many things happening in the runtime and several ways that programs can interact.
+She used the OCaml test suite again, this time to help her check that the OCaml runtime and other OCaml programs functioned correctly. Having verified that they were, @eutro also ran several benchmarks to check that she hadn’t actually made anything slower. For this, she used the Sandmark test suite.
+I recommend checking out the tests and benchmarks for yourself in the Pull Request. The PR also gives a more in-depth technical overview of the changes to the Busy-Waiting loops.
+It is great to see what someone with a passion for OCaml can bring to the system as a whole. I think it illustrates the benefits of open-source software development: when we invite fresh observations and suggestions, we create a community that supports innovation and collaboration. We are impressed with the hard work @eutro put into solving the complicated problem before her. She went above and beyond what we thought possible in such a short amount of time!
+Would you like to complete an internship with us? We welcome people of varying experience levels – some interns have made open-source contributions before and are familiar with functional programming, and some of our interns have no functional programming experience at all! If you’re interested in developing your skills in a supportive environment, keep an eye on our careers page, where we post about any available internships. We also regularly communicate about available internships on X (formerly known as Twitter). We hope to hear from you!
+++Tarides champions open-source development. We create and maintain key features of the OCaml language in collaboration with the OCaml community. To learn more about how you can support our open-source work, discover our page on GitHub.
+
++We are always happy to discuss commercial opportunities around OCaml. We provide core services, including training, tailor-made tools, and secure solutions. Contact us today to learn more about how Tarides can help your teams realise their vision.
+
It might be tempting to think that we can write code that works perfectly the first time around, but in reality optimisation and troubleshooting forms a big part of programming. However, there are more and less productive (and frustrating!) ways of problem solving. Having the right tools to guide you, ones that show you where to look and what is going wrong, can make a huge difference.
+We recently introduced you to the monitoring system runtime_events
, which allows users to monitor their runtime for, among other things, how programs are affecting performance. Alongside runtime_events
, sits the observability tool olly
, which provides users with a number of helpful formatting options for their runtime tracing data.
This is all part of how we’re making developing in OCaml easier by bringing new features and tools to the community. Olly is just one such tool, and it makes the monitoring system for OCaml significantly more accessible. With Olly, you don’t have to be an expert or spend time combing through the data that runtime_events
extracts for you. Rather, Olly can generate the information you need in a way that makes it easy to understand, store, and query.
Olly, as an observability tool for OCaml 5, has the ability to extract runtime tracing data from runtime_events
. This data can then be visualised with a variety of graphical options available.
How does Olly do this? Olly uses the Runtime API to provide you with monitoring metric information and associated data. The tool comes with several subcommands, each with its own function.
+The command olly trace
can generate runtime traces for programs compiled in OCaml 5. The tracing data is generated in one of two formats, the Fuchsia trace format or the Chrome tracing format, with the former being the default. Both formats can be viewed in Perfetto, but the Chrome format trace can also be viewed in chrome://tracing
for Chromium-based browsers. Another example of a subcommand is olly gc-stats
, which can report the running time of the garbage collector (GC) and the GC tail latency of an OCaml executable.
The motivation behind introducing an observability tool like Olly is to make data extracted using runtime_events
more useful, since few developers will want to use the event tracing system directly. Olly makes it easy for users to troubleshoot their own programs, but it also makes it easy for a developer to diagnose why someone else’s program is slow. A client can send their runtime_events
data, a feature that comes built in with every OCaml 5 switch, to a developer who can then use Olly to find the problem and suggest a solution. This makes working in OCaml easier, as optimisation and problem-solving become more efficient and streamlined.
It doesn’t end there! One of our future goals for Olly is that it should be able to provide automatic reports and diagnosis of some problems. Look out for that exciting update in the future!
+One of the latest updates to Olly is its modularisation by Eutro, splitting the bin/olly.ml
file into smaller discrete libraries including olly_common
, olly_trace
, and olly_gc_stats
. By splitting up the large file, the user can exercise some control over which dependencies they want their library to have. They can create a minimal build with minimal dependencies, or stick with a fuller build relying on all the dependencies. For example, to build olly_bare
on the trunk you now only require two dependencies: Dune and cmdliner
. Both can be installed without using Opam. Since some developers will prefer this setup, it’s good to support a variety of configurations.
The split also potentially makes the code easier to maintain, since the smaller files have well-defined purposes and provide a clearer overview than just having one large file covering a multitude of functions. If something breaks, this segmentation can make it easier for a maintainer to triage and amend the problem. The same modularisation may also help newcomers get an overview of all the different components of the library. Sadiq Jaffer merged Eutro’s PR #43 into main,
and it will form part of a future Olly release pending further testing.
Let's wrap up by looking at an example of when you might use Olly. When we want to visualise the performance of the OCaml runtime alongside any custom events we may have, the first step is to generate a trace. To generate a trace, we run the command olly trace tracefile.trace
in combination with the name of the program we want to enable tracing for. If we wanted to generate a trace for the solver-service
, the command would be olly trace tracefile.trace 'solver-service'
.
For our example, we chose to generate the tracing data in the Fuchsia trace format. Once we had the trace, we loaded it into Perfetto to get a helpful visual representation of what our code is doing, and we ended up with the following image:
+*(Image: Perfetto visualisation of the generated trace.)*
+The UI in this image displays the processes down the side, each corresponding to a domain. Our program ended up using four cores, and therefore, the image shows four processes. Each process, in turn, shows the tracing for the OCaml runtime build plus the custom events generated by Eio. Let's zoom in on one process now:
+*(Image: expanded view of a single process, showing GC activity and Eio suspensions.)*
+This expanded view shows both the Garbage Collector's (GC) activity and times when Eio is suspended.
+We want to create tools that make the developer experience in OCaml easier and more intuitive. Olly makes it possible to visualise your code's performance, helping you understand when your programs are slowing down and why. If you have suggestions or improvements to share, you are welcome to participate in the Runtime Events Tools repo on GitHub.
+We want to hear from you! Connect with us on social media by following us on X (formerly known as Twitter) and LinkedIn. You can also join in with the rest of the community on the forum Discuss to share your thoughts on everything OCaml!
+++Tarides champions open-source development. We create and maintain key features of the OCaml language in collaboration with the OCaml community. To learn more about how you can support our open-source work, discover our page on GitHub.
+
+diff --git a/data/planet/tarides/ocaml-compiler-manual-html-generation.md b/data/planet/tarides/ocaml-compiler-manual-html-generation.md new file mode 100644 index 0000000000..c60716a67e --- /dev/null +++ b/data/planet/tarides/ocaml-compiler-manual-html-generation.md @@ -0,0 +1,109 @@ +--- +title: OCaml Compiler Manual HTML Generation +description: "In order to avoid long, confusing URLs on the OCaml Manual pages, we + set out to create a solution that shortens these URLs, including\u2026" +url: https://tarides.com/blog/2024-07-17-ocaml-compiler-manual-html-generation +date: 2024-07-17T00:00:00-00:00 +preview_image: https://tarides.com/static/71ebe0c7b3ff03df0f8cfbf681e8dad8/0132d/compiler-manual.jpg +authors: +- Tarides +source: +--- + +We are always happy to discuss commercial opportunities around OCaml. We provide core services, including training, tailor-made tools, and secure solutions. Contact us today to learn more about how Tarides can help your teams realise their vision.
+
In order to avoid long, confusing URLs on the OCaml Manual pages, we set out to create a solution that shortens these URLs, including section references, and contains the specific version. The result improves readability and user experience. This article outlines the motivation behind these changes and how we implemented them.
+The OCaml HTML manuals have URL references such as https://v2.ocaml.org/manual/types.html#sss:typexpr-sharp-types, and they do not refer to any specific compiler version. We needed a way to easily share a link with the version number included. The OCaml.org page already mentions the compiler version, but it links to specific releases (https://ocaml.org/releases).
+We wanted a canonical naming convention that is consistent with current and future manual releases. It would also be beneficial to have only one place to store all the manuals, and the users of OCaml.org should never see redirecting URLs in the browser. This will greatly help increase the overall backlink quality when people share the links in conversations, tutorials, blogs, and on the Web. A preferred naming scheme should be something like:
+https://v2.ocaml.org/releases/latest/manual/attributes.html
+https://v2.ocaml.org/releases/4.12/manual/attributes.html
+Using this, we redirected v2.ocaml.org to OCaml.org for the production deployment. The changes also give shorter URLs that can be easily remembered and shared. The rel="canonical" tag is a perfectly good way to make sure only https://ocaml.org/manual/latest gets indexed.
+After a detailed discussion, the following UI mockup to switch manuals was provided via GitHub issue, and Option A was selected.
+*(Image: UI mockup options for switching between manual versions.)*
+Our proposed changes to the URL are shown below:
+Current: https://v2.ocaml.org/releases/5.1/htmlman/index.html
+Suggested: https://ocaml.org/manual/5.3.0/index.html
Current: https://v2.ocaml.org/releases/5.1/api/Atomic.html
+Suggested: https://ocaml.org/manual/5.3.0/api/Atomic.html
The HTML manual files are hosted in a separate GitHub repository at https://github.com/ocaml-web/html-compiler-manuals/. It contains a folder for each compiler version, holding that version's manual HTML files.
+A script to automate the process of generating the HTML manuals is also available at https://github.com/ocaml-web/html-compiler-manuals/blob/main/scripts/build-compiler-html-manuals.sh. The script defines two variables, DIR and OCAML_VERSION, where you can specify the location to build the manual and the compiler version to use. It then clones the ocaml/ocaml
repository, switches to the specific compiler branch, builds the compiler, and then generates the manuals. The actual commands are listed below for reference:
echo "Clone ocaml repository ..."
+git clone git@github.com:ocaml/ocaml.git
+
+# Switch to ocaml branch
+echo "Checkout $OCAML_VERSION branch in ocaml ..."
+cd ocaml
+git checkout $OCAML_VERSION
+
+# Remove any stale files
+echo "Running make clean"
+make clean
+git clean -f -x
+
+# Configure and build
+echo "Running configure and make ..."
+./configure
+make
+
+# Build web
+echo "Generating manuals ..."
+cd manual
+make web
As per the new API requirements, the manual/src/html_processing/Makefile
variables are updated as follows:
WEBDIRMAN = $(WEBDIR)/$(VERSION)
+WEBDIRAPI = $(WEBDIRMAN)/API
Accordingly, we have also updated the OCaml variables in the manual/src/html_processing/src/common.ml.in
file to reflect the required changes:
+let web_dir = Filename.parent_dir_name // "webman" // ocaml_version
+
+let docs_maindir = web_dir
+
+let api_page_url = "api"
+
+let manual_page_url = ".."
We also include the https://plausible.ci.dev/js/script.js script to collect view metrics for the HTML pages. The manuals from 3.12 through 5.2 are now available in the https://github.com/ocaml-web/html-compiler-manuals/tree/main GitHub repository.
+The OCaml.org Dockerfile has a step included to clone the HTML manuals and perform an automated production deployment as shown below:
+RUN git clone https://github.com/ocaml-web/html-compiler-manuals /manual
+
+ENV OCAMLORG_MANUAL_PATH /manual
The path to the new GitHub repository has been updated in the configuration file, along with the explicit URL paths to the respective manuals. The v2 URLs in the data/releases/*.md
files have been replaced with the new non-v2 URLs, and the manual /releases/
redirects have been removed from redirection.ml.
The /releases/
redirects are now handled in middleware.ml
. The Caddy configuration to redirect v2.ocaml.org can be implemented as follows:
v2.ocaml.org {
+ redir https://ocaml.org{uri} permanent
+}
You are encouraged to check out the latest OCaml compiler from trunk and use the build-compiler-html-manuals.sh
script to generate the HTML documentation.
Please do report any errors or issues that you face at the following GitHub repository: https://github.com/ocaml-web/html-compiler-manuals/issues
+If you are interested in working on OCaml.org, please message us on the OCaml Discord server or reach out to the contributors in GitHub.
+(cross-ref) Online OCaml Manual: there should be an easy way to get a fixed-version URL. https://github.com/ocaml/ocaml.org/issues/534
+Use webman/*.html
and webman/api
for OCaml.org manuals. https://github.com/ocaml/ocaml/pull/12976
Serve OCaml Compiler Manuals. https://github.com/ocaml/ocaml.org/pull/2150
+Simplify and extend /releases/
redirects from legacy v2.ocaml.org URLs. https://github.com/ocaml/ocaml.org/pull/2448
++Tarides is an open-source company first. Our top priorities are to establish and tend to the OCaml community. Similarly, we’re dedicated to the development of the OCaml language and enjoy collaborating with industry partners and individual engineers to continue improving the performance and features of OCaml. We want you to join the OCaml community, test the languages and tools, and actively be part of the language’s evolution.
+
+Tarides is also always happy to discuss commercial opportunities around OCaml. There are many areas where we can help industrial users to adopt OCaml 5 more quickly, including training, support, custom developments, etc. Please contact us if you are interested in discussing your needs.
+