Built site for gh-pages

broadinstitute · Sep 10, 2024 · c6f9c93
1 parent 026ae51
commit c6f9c93
Showing 23 changed files with 378 additions and 362 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-b4391914
+ec0b8683
diff --git a/explanations/FAQ.ipynb b/explanations/FAQ.ipynb
@@ -152,7 +152,7 @@
         "        of these replicates’ value was in turn the mean of all the sites\n",
         "        and cells in a given well."
       ],
-      "id": "6e5ec436-fb8d-4925-9e70-1f45bc9268ef"
+      "id": "25df5f6a-b7c0-4f2d-8c26-a8ac29fc882d"
     }
   ],
   "nbformat": 4,

diff --git a/explanations/Resources.ipynb b/explanations/Resources.ipynb
@@ -28,7 +28,7 @@
         "    [website](https://www.springscience.com/jump-cp) for data\n",
         "    exploration (account needed)."
       ],
-      "id": "a6b0bdcd-bd2f-4aac-b20b-e17ce1756ae9"
+      "id": "645a9c0f-e9d7-485c-ac99-4fb2d40c679f"
     }
   ],
   "nbformat": 4,

diff --git a/howto/1_tutorial_basic.html b/howto/1_tutorial_basic.html
@@ -258,7 +258,7 @@ <h1 class="title">Access JUMP profiles</h1>
 
 
 <p>This is a tutorial on how to access profiles from the <a href="https://github.com/jump-cellpainting/datasets">JUMP Cell Painting datasets</a>. We will use polars to fetch the data frames lazily, with the help of <code>s3fs</code> and <code>pyarrow</code>. We prefer lazy loading because the data can be too big to be handled in memory.</p>
-<div id="5b6e9960" class="cell" title="Imports" data-execution_count="1">
+<div id="469e074e" class="cell" title="Imports" data-execution_count="1">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -271,14 +271,14 @@ <h1 class="title">Access JUMP profiles</h1>
 <li><code>cpg0016-jump[compound]</code>: Chemical perturbations.</li>
 </ol>
 <p>Their explicit location is determined by the transformations that produce the datasets. The aws paths of the dataframes are built from a prefix below:</p>
-<div id="af4b78fe" class="cell" title="Paths" data-execution_count="2">
+<div id="088064e1" class="cell" title="Paths" data-execution_count="2">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </details>
 </div>
 <p>We use a version-controlled csv to release the latest corrected profiles</p>
-<div id="74943e98" class="cell" data-execution_count="3">
+<div id="3a1f14df" class="cell" data-execution_count="3">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>profile_index <span class="op">=</span> pl.read_csv(INDEX_FILE)</span>
@@ -340,7 +340,7 @@ <h1 class="title">Access JUMP profiles</h1>
 </div>
 </div>
 <p>We need neither the ‘etag’ column (used to check file integrity) nor the ‘interpretable’ profiles (i.e., those from before major modifications).</p>
-<div id="8354db6f" class="cell" data-execution_count="4">
+<div id="7d56aaee" class="cell" data-execution_count="4">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>selected_profiles <span class="op">=</span> profile_index.<span class="bu">filter</span>(</span>
@@ -354,7 +354,7 @@ <h1 class="title">Access JUMP profiles</h1>
 </div>
 </div>
 <p>We will lazy-load the dataframes and print the number of rows and columns</p>
-<div id="26b23e92" class="cell" data-execution_count="5">
+<div id="ac704319" class="cell" data-execution_count="5">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>info <span class="op">=</span> {k: [] <span class="cf">for</span> k <span class="kw">in</span> (<span class="st">"dataset"</span>, <span class="st">"#rows"</span>, <span class="st">"#cols"</span>, <span class="st">"#Metadata cols"</span>, <span class="st">"Size (MB)"</span>)}</span>
@@ -427,7 +427,7 @@ <h1 class="title">Access JUMP profiles</h1>
 </div>
 </div>
 <p>Let us now focus on the <code>crispr</code> dataset and use a regex to select the metadata columns. We will then sample rows and display the overview. Note that the collect() method forces some data to be loaded into memory.</p>
-<div id="ddbf4de0" class="cell" data-execution_count="6">
+<div id="4d5c424c" class="cell" data-execution_count="6">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>data <span class="op">=</span> pl.scan_parquet(filepaths[<span class="st">"crispr"</span>])</span>
@@ -496,7 +496,7 @@ <h1 class="title">Access JUMP profiles</h1>
 </div>
 </div>
 <p>The following line excludes the metadata columns:</p>
-<div id="84c89566" class="cell" data-execution_count="7">
+<div id="74fa3a50" class="cell" data-execution_count="7">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>data_only <span class="op">=</span> data.select(pl.<span class="bu">all</span>().exclude(<span class="st">"^Metadata.*$"</span>).sample(n<span class="op">=</span><span class="dv">5</span>, seed<span class="op">=</span><span class="dv">1</span>)).collect()</span>
@@ -1062,7 +1062,7 @@ <h1 class="title">Access JUMP profiles</h1>
 </div>
 </div>
 <p>Finally, we can convert this to <code>pandas</code> if we want to perform analyses with that tool. Keep in mind that this loads the entire dataframe into memory.</p>
-<div id="8e7145b8" class="cell" data-execution_count="8">
+<div id="f1aec322" class="cell" data-execution_count="8">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>data_only.to_pandas()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
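
Note on the `^Metadata.*$` pattern used in the tutorial above: it splits the columns into metadata and features. The sketch below reproduces that split with the standard library only, so it runs without polars or network access. The column names are hypothetical stand-ins chosen for illustration; they are not read from the real profiles.

```python
import re

# Hypothetical column names for illustration; real profiles have many more columns.
columns = [
    "Metadata_Source",
    "Metadata_Plate",
    "Metadata_Well",
    "Metadata_JCP2022",
    "X_1",
    "X_2",
]

# Same regular expression the tutorial passes to pl.col(...) and .exclude(...).
METADATA_PATTERN = re.compile(r"^Metadata.*$")

# Columns matching the pattern are metadata; everything else is a feature.
metadata_cols = [c for c in columns if METADATA_PATTERN.match(c)]
feature_cols = [c for c in columns if not METADATA_PATTERN.match(c)]

print(metadata_cols)
print(feature_cols)
```

In the tutorial itself the same pattern is handed to polars, which applies it to the column names lazily; this sketch only shows which names the pattern keeps and which it drops.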

diff --git a/howto/1_tutorial_basic.ipynb b/howto/1_tutorial_basic.ipynb
@@ -12,7 +12,7 @@
         "and `pyarrow`. We prefer lazy loading because the data can be too big to\n",
         "be handled in memory."
       ],
-      "id": "ff68103d-6af8-4fd5-b0ee-70fcabe329df"
+      "id": "62f58375-d451-47a9-8644-7dd976c66a2a"
     },
     {
       "cell_type": "code",
@@ -24,7 +24,7 @@
       "source": [
         "import polars as pl"
       ],
-      "id": "e13b4ef9"
+      "id": "f0e42d05"
     },
     {
       "cell_type": "markdown",
@@ -40,7 +40,7 @@
         "produce the datasets. The aws paths of the dataframes are built from a\n",
         "prefix below:"
       ],
-      "id": "61085dc7-9269-46e8-941a-097269fbf12b"
+      "id": "aa8d4f4e-4be9-4520-b84a-f74b779b7b41"
     },
     {
       "cell_type": "code",
@@ -52,15 +52,15 @@
       "source": [
         "INDEX_FILE = \"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv\""
       ],
-      "id": "7786fc79"
+      "id": "ee8aa7a8"
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "We use a version-controlled csv to release the latest corrected profiles"
       ],
-      "id": "a9567c2a-a5ac-4d42-885d-c6026e063681"
+      "id": "3ab1925f-e25d-4223-a550-0c74819389e0"
     },
     {
       "cell_type": "code",
@@ -81,7 +81,7 @@
         "profile_index = pl.read_csv(INDEX_FILE)\n",
         "profile_index.head()"
       ],
-      "id": "5cd94b24"
+      "id": "432bedaa"
     },
     {
       "cell_type": "markdown",
@@ -90,7 +90,7 @@
         "We need neither the ‘etag’ column (used to check file integrity) nor the\n",
         "‘interpretable’ profiles (i.e., those from before major modifications)."
       ],
-      "id": "75f132b9-eb38-46df-b1d6-7e8780c2cd4b"
+      "id": "b659ada1-c038-45bd-a546-0119daf651ab"
     },
     {
       "cell_type": "code",
@@ -112,7 +112,7 @@
         "filepaths = dict(selected_profiles.iter_rows())\n",
         "print(filepaths)"
       ],
-      "id": "a1718533"
+      "id": "89971273"
     },
     {
       "cell_type": "markdown",
@@ -121,7 +121,7 @@
         "We will lazy-load the dataframes and print the number of rows and\n",
         "columns"
       ],
-      "id": "822821df-e6a3-4961-94fd-097428cc5b55"
+      "id": "e71f5eb8-64ae-41ac-b359-28aa375b09c7"
     },
     {
       "cell_type": "code",
@@ -153,7 +153,7 @@
         "\n",
         "pl.DataFrame(info)"
       ],
-      "id": "fa248564"
+      "id": "cbee9d75"
     },
     {
       "cell_type": "markdown",
@@ -163,7 +163,7 @@
         "metadata columns. We will then sample rows and display the overview.\n",
         "Note that the collect() method forces some data to be loaded into memory."
       ],
-      "id": "f35ddf39-3278-471b-8418-76d0d849fa92"
+      "id": "6c9cfc15-15a1-4760-8550-180be8b450df"
     },
     {
       "cell_type": "code",
@@ -184,15 +184,15 @@
         "data = pl.scan_parquet(filepaths[\"crispr\"])\n",
         "data.select(pl.col(\"^Metadata.*$\").sample(n=5, seed=1)).collect()"
       ],
-      "id": "934f99b4"
+      "id": "61e696a0"
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "The following line excludes the metadata columns:"
       ],
-      "id": "02767b41-60ce-4dea-adf5-5b361bcd2695"
+      "id": "40dab565-812d-4e25-aa2a-68f2fb5c72a1"
     },
     {
       "cell_type": "code",
@@ -213,7 +213,7 @@
         "data_only = data.select(pl.all().exclude(\"^Metadata.*$\").sample(n=5, seed=1)).collect()\n",
         "data_only"
       ],
-      "id": "406a5e81"
+      "id": "0737a38c"
     },
     {
       "cell_type": "markdown",
@@ -223,7 +223,7 @@
         "with that tool. Keep in mind that this loads the entire dataframe into\n",
         "memory."
       ],
-      "id": "39e802de-0df6-4a51-bee3-7f37d19fb7e2"
+      "id": "e9cae16e-17b5-46e1-8d0c-5d72d96c0151"
     },
     {
       "cell_type": "code",
@@ -245,7 +245,7 @@
       "source": [
         "data_only.to_pandas()"
       ],
-      "id": "27af5b51"
+      "id": "56cdf676"
     }
   ],
   "nbformat": 4,

diff --git a/howto/2_add_metadata.html b/howto/2_add_metadata.html
@@ -258,15 +258,15 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 
 
 <p>A very common task when processing morphological profiles is knowing which ones are treatments and which ones are controls. Here we will explore how we can use broad-babel to accomplish this task.</p>
-<div id="7d0a683e" class="cell" title="Imports" data-execution_count="1">
+<div id="10c0cc39" class="cell" title="Imports" data-execution_count="1">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span>
 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> broad_babel.query <span class="im">import</span> get_mapper</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </details>
 </div>
 <p>We will be using the CRISPR dataset specified in our index csv.</p>
-<div id="88d1f294" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
+<div id="d47c9617" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span>
@@ -279,7 +279,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>For simplicity the contents of our processed profiles are minimal: “The profile origin” (source, plate and well) and the unique JUMP identifier for that perturbation. We will use broad-babel to further expand on this metadata, but for simplicity’s sake let us sample a subset of the data.</p>
-<div id="351b215b" class="cell" title="Subset data" data-execution_count="3">
+<div id="5aeaec3f" class="cell" title="Subset data" data-execution_count="3">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>jcp_ids <span class="op">=</span> (</span>
@@ -305,7 +305,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>We will use these JUMP ids to obtain a mapper that indicates the perturbation type (trt, negcon or, rarely, poscon)</p>
-<div id="e622e57b" class="cell" title="Pull mapper" data-execution_count="4">
+<div id="bdc1737c" class="cell" title="Pull mapper" data-execution_count="4">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>pert_mapper <span class="op">=</span> get_mapper(</span>
@@ -329,7 +329,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 <p>A couple of important notes about broad_babel’s ‘get_mapper’ and other functions: (1) they must be fed tuples, because calls are cached, which provides significant speed-ups for repeated calls; (2) ‘get_mapper’ works for datasets of up to a few tens of thousands of samples. If you try to use it to get a mapper for the entirety of the ‘compounds’ dataset, it is likely to fail. For these cases we suggest the more general function ‘run_query’. You can read more on this and other use cases in Babel’s <a href="https://github.com/broadinstitute/monorepo/tree/main/libs/jump_babel">readme</a>.</p>
 <p>We will now repeat the process to get their ‘standard’ name</p>
-<div id="02aca518" class="cell" title="Fetch standard name" data-execution_count="5">
+<div id="a02993ff" class="cell" title="Fetch standard name" data-execution_count="5">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>name_mapper <span class="op">=</span> get_mapper(</span>
@@ -354,7 +354,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>To wrap up, we will fetch all the available profiles for these perturbations and use the mappers to add the missing metadata. We also select a few features to showcase how selection can be performed in polars.</p>
-<div id="994cf4d0" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
+<div id="d18b5efb" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>subsample_profiles <span class="op">=</span> profiles.<span class="bu">filter</span>(</span>
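
Note on the "must be fed tuples" remark in the page above: the source only says that calls are cached. Assuming the cache behaves like Python's `functools.lru_cache` (an assumption, not a confirmed detail of broad_babel), the sketch below shows why a tuple works and a list does not: cache keys must be hashable. The `lookup` function and the ID `JCP2022_800001` are hypothetical stand-ins.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def lookup(ids: tuple) -> dict:
    """Hypothetical stand-in for a cached query; NOT broad_babel's implementation."""
    return {jcp: f"name_for_{jcp}" for jcp in ids}


# "JCP2022_800001" is a made-up ID used only for this illustration.
ids = ("JCP2022_800001", "JCP2022_800002")

first = lookup(ids)   # computed and stored in the cache
second = lookup(ids)  # same tuple, so the cached result is returned
hits = lookup.cache_info().hits

# A list is unhashable, so it cannot be used as a cache key.
try:
    lookup(list(ids))
    list_accepted = True
except TypeError:
    list_accepted = False

print(hits, list_accepted)
```

Converting the IDs with `tuple(...)` before the call is therefore enough to make a repeated query reuse the cached result, under the caching assumption stated above.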

diff --git a/howto/2_add_metadata.ipynb b/howto/2_add_metadata.ipynb
@@ -10,7 +10,7 @@
         "which ones are treatments and which ones are controls. Here we will\n",
         "explore how we can use broad-babel to accomplish this task."
       ],
-      "id": "7d85965b-da47-443e-ad8c-9a7423336be3"
+      "id": "8fd5e4b1-a894-478d-9b51-eba246ae2198"
     },
     {
       "cell_type": "code",
@@ -23,15 +23,15 @@
         "import polars as pl\n",
         "from broad_babel.query import get_mapper"
       ],
-      "id": "930c918a"
+      "id": "640371fa"
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "We will be using the CRISPR dataset specified in our index csv."
       ],
-      "id": "7fa5c424-c829-4e78-9951-de529b824888"
+      "id": "893faa8a-12cc-45a5-91a2-301ca664eef8"
     },
     {
       "cell_type": "code",
@@ -54,7 +54,7 @@
         "profiles = pl.scan_parquet(CRISPR_URL)\n",
         "print(profiles.collect_schema().names()[:6])"
       ],
-      "id": "dd1d7dab"
+      "id": "920e3741"
     },
     {
       "cell_type": "markdown",
@@ -65,7 +65,7 @@
         "for that perturbation. We will use broad-babel to further expand on this\n",
         "metadata, but for simplicity’s sake let us sample a subset of the data."
       ],
-      "id": "fc31e0dd-5253-4e2e-a640-2f8ba708bbdb"
+      "id": "1aac7d19-3035-4d6d-9154-548947944d9c"
     },
     {
       "cell_type": "code",
@@ -103,7 +103,7 @@
         "subsample = (*subsample, \"JCP2022_800002\")\n",
         "subsample"
       ],
-      "id": "d319fb90"
+      "id": "9d77d321"
     },
     {
       "cell_type": "markdown",
@@ -112,7 +112,7 @@
         "We will use these JUMP ids to obtain a mapper that indicates the\n",
         "perturbation type (trt, negcon or, rarely, poscon)"
       ],
-      "id": "69861367-1ed6-4618-ade6-d94c9de567b5"
+      "id": "161c6518-3c57-4f84-94f7-cca06e67428e"
     },
     {
       "cell_type": "code",
@@ -147,7 +147,7 @@
         ")\n",
         "pert_mapper"
       ],
-      "id": "95334ba4"
+      "id": "302d5fcd"
     },
     {
       "cell_type": "markdown",
@@ -164,7 +164,7 @@
         "\n",
         "We will now repeat the process to get their ‘standard’ name"
       ],
-      "id": "b66a09a1-951b-4ae8-96ad-11c9da8a0b16"
+      "id": "a32b6a45-651a-43a3-8f17-bf2d948eaf1d"
     },
     {
       "cell_type": "code",
@@ -201,7 +201,7 @@
         ")\n",
         "name_mapper"
       ],
-      "id": "44f3ca0d"
+      "id": "47042284"
     },
     {
       "cell_type": "markdown",
@@ -212,7 +212,7 @@
         "select a few features to showcase how selection can be performed in\n",
         "polars."
       ],
-      "id": "d997ca1e-eab6-488f-b4ad-958006876a70"
+      "id": "7b18a5c0-b8bc-4fca-9aa0-74e124382685"
     },
     {
       "cell_type": "code",
@@ -243,7 +243,7 @@
         "    pl.col((\"name\", \"pert_type\", \"^Metadata.*$\", \"^X_[0-3]$\"))\n",
         ").sort(by=\"pert_type\")"
       ],
-      "id": "2378e7bf"
+      "id": "02ff637d"
     }
   ],
   "nbformat": 4,