Built site for gh-pages

broadinstitute · Sep 11, 2024 · 92876bd
1 parent 581ad12
Showing 24 changed files with 483 additions and 414 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-78d52888
+62823df9
diff --git a/explanations/FAQ.ipynb b/explanations/FAQ.ipynb
@@ -152,7 +152,7 @@
         "        of these replicates’ value was in turn the mean of all the sites\n",
         "        and cells in a given well."
       ],
-      "id": "17ad2136-1321-43fb-b48a-fdcbc379f541"
+      "id": "3bc637b6-fba2-47bc-8dc5-99b0b989cdb6"
     }
   ],
   "nbformat": 4,

diff --git a/explanations/Resources.ipynb b/explanations/Resources.ipynb
@@ -28,7 +28,7 @@
         "    [website](https://www.springscience.com/jump-cp) for data\n",
         "    exploration (account needed)."
       ],
-      "id": "d986c00a-eb84-4ab5-a0dd-ad581a7b6da8"
+      "id": "9425c958-294b-4f8e-802d-cc560ff596f6"
     }
   ],
   "nbformat": 4,

diff --git a/explanations/glossary.ipynb b/explanations/glossary.ipynb
@@ -63,7 +63,7 @@
         "for compound probes). q-value: Expected False Discovery Rate (FDR): the\n",
         "proportion of false positives among all positive results."
       ],
-      "id": "44baf1c0-9d2c-4cc0-a21e-abbc1c056356"
+      "id": "90f98b40-68c7-4a7a-a020-d9bc61df80ef"
     }
   ],
   "nbformat": 4,

diff --git a/howto/1_retrieve_profiles.html b/howto/1_retrieve_profiles.html
@@ -258,7 +258,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 
 
 <p>This is a tutorial on how to access profiles from the <a href="https://github.com/jump-cellpainting/datasets">JUMP Cell Painting datasets</a>. We will use polars to fetch the data frames lazily, with the help of <code>s3fs</code> and <code>pyarrow</code>. We prefer lazy loading because the data can be too big to be handled in memory.</p>
-<div id="81474bff" class="cell" title="Imports" data-execution_count="1">
+<div id="2618eeb5" class="cell" title="Imports" data-execution_count="1">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -271,14 +271,14 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 <li><code>cpg0016-jump[compound]</code>: Chemical perturbations.</li>
 </ol>
 <p>Their explicit location is determined by the transformations that produce the datasets. The aws paths of the dataframes are built from a prefix below:</p>
-<div id="a23d7f3b" class="cell" title="Paths" data-execution_count="2">
+<div id="52c689bb" class="cell" title="Paths" data-execution_count="2">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </details>
 </div>
 <p>We use a version-controlled csv to release the latest corrected profiles</p>
-<div id="4316f786" class="cell" data-execution_count="3">
+<div id="a5e1cf36" class="cell" data-execution_count="3">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>profile_index <span class="op">=</span> pl.read_csv(INDEX_FILE)</span>
@@ -340,7 +340,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 </div>
 </div>
 <p>We do not need the ‘etag’ (used to check file integrity) column nor the ‘interpretable’ (i.e., before major modifications)</p>
-<div id="6b4dfca4" class="cell" data-execution_count="4">
+<div id="c1ad6c21" class="cell" data-execution_count="4">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>selected_profiles <span class="op">=</span> profile_index.<span class="bu">filter</span>(</span>
@@ -354,7 +354,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 </div>
 </div>
 <p>We will lazy-load the dataframes and print the number of rows and columns</p>
-<div id="d34fb368" class="cell" data-execution_count="5">
+<div id="4dc1dafd" class="cell" data-execution_count="5">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>info <span class="op">=</span> {k: [] <span class="cf">for</span> k <span class="kw">in</span> (<span class="st">"dataset"</span>, <span class="st">"#rows"</span>, <span class="st">"#cols"</span>, <span class="st">"#Metadata cols"</span>, <span class="st">"Size (MB)"</span>)}</span>
@@ -427,7 +427,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 </div>
 </div>
 <p>Let us now focus on the <code>crispr</code> dataset and use a regex to select the metadata columns. We will then sample rows and display the overview. Note that the collect() method enforces loading some data into memory.</p>
-<div id="c811cc8d" class="cell" data-execution_count="6">
+<div id="3a00f154" class="cell" data-execution_count="6">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>data <span class="op">=</span> pl.scan_parquet(filepaths[<span class="st">"crispr"</span>])</span>
@@ -496,7 +496,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 </div>
 </div>
 <p>The following line excludes the metadata columns:</p>
-<div id="15093fdf" class="cell" data-execution_count="7">
+<div id="ff05c332" class="cell" data-execution_count="7">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>data_only <span class="op">=</span> data.select(pl.<span class="bu">all</span>().exclude(<span class="st">"^Metadata.*$"</span>).sample(n<span class="op">=</span><span class="dv">5</span>, seed<span class="op">=</span><span class="dv">1</span>)).collect()</span>
@@ -1062,7 +1062,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 </div>
 </div>
 <p>Finally, we can convert this to <code>pandas</code> if we want to perform analyses with that tool. Keep in mind that this loads the entire dataframe into memory.</p>
-<div id="20df67a8" class="cell" data-execution_count="8">
+<div id="74296695" class="cell" data-execution_count="8">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>data_only.to_pandas()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>

diff --git a/howto/1_retrieve_profiles.ipynb b/howto/1_retrieve_profiles.ipynb
@@ -12,7 +12,7 @@
         "and `pyarrow`. We prefer lazy loading because the data can be too big to\n",
         "be handled in memory."
       ],
-      "id": "00451035-7dfb-4a3d-97a2-acf91788a5b7"
+      "id": "47bec2df-a884-445c-b717-b76787be64f7"
     },
     {
       "cell_type": "code",
@@ -24,7 +24,7 @@
       "source": [
         "import polars as pl"
       ],
-      "id": "1ab03215"
+      "id": "2de578e8"
     },
     {
       "cell_type": "markdown",
@@ -40,7 +40,7 @@
         "produce the datasets. The aws paths of the dataframes are built from a\n",
         "prefix below:"
       ],
-      "id": "6cdfb466-2e7b-4fb1-be8a-17612cbff44d"
+      "id": "a2b1ab4e-28a3-49be-bb90-36ac754e7ae0"
     },
     {
       "cell_type": "code",
@@ -52,15 +52,15 @@
       "source": [
         "INDEX_FILE = \"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv\""
       ],
-      "id": "6d4f44e7"
+      "id": "31129b7b"
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "We use a version-controlled csv to release the latest corrected profiles"
       ],
-      "id": "c1581cb4-93ff-4c54-82df-ca3c5add732c"
+      "id": "62b0d62b-4fa0-4a6f-a0ac-3ff51669034e"
     },
     {
       "cell_type": "code",
@@ -81,7 +81,7 @@
         "profile_index = pl.read_csv(INDEX_FILE)\n",
         "profile_index.head()"
       ],
-      "id": "64533136"
+      "id": "52fa9980"
     },
     {
       "cell_type": "markdown",
@@ -90,7 +90,7 @@
         "We do not need the ‘etag’ (used to check file integrity) column nor the\n",
         "‘interpretable’ (i.e., before major modifications)"
       ],
-      "id": "d2910b99-e47e-4ad3-b234-b2b1a9d5f048"
+      "id": "22349e94-37c7-4811-91de-8f73e77ab612"
     },
     {
       "cell_type": "code",
@@ -112,7 +112,7 @@
         "filepaths = dict(selected_profiles.iter_rows())\n",
         "print(filepaths)"
       ],
-      "id": "480cf3a2"
+      "id": "12faa4e1"
     },
     {
       "cell_type": "markdown",
@@ -121,7 +121,7 @@
         "We will lazy-load the dataframes and print the number of rows and\n",
         "columns"
       ],
-      "id": "8d1eb3da-fb2e-458b-9d4a-bf76a66886e8"
+      "id": "90fda18a-6fab-46f8-af7c-860bce8dfe71"
     },
     {
       "cell_type": "code",
@@ -153,7 +153,7 @@
         "\n",
         "pl.DataFrame(info)"
       ],
-      "id": "6544f26f"
+      "id": "ebde11f4"
     },
     {
       "cell_type": "markdown",
@@ -163,7 +163,7 @@
         "metadata columns. We will then sample rows and display the overview.\n",
         "Note that the collect() method enforces loading some data into memory."
       ],
-      "id": "ee78d621-784c-47ad-b2e4-223eda176ac1"
+      "id": "4580b7df-4169-4e10-a5bc-a19d118591eb"
     },
     {
       "cell_type": "code",
@@ -184,15 +184,15 @@
         "data = pl.scan_parquet(filepaths[\"crispr\"])\n",
         "data.select(pl.col(\"^Metadata.*$\").sample(n=5, seed=1)).collect()"
       ],
-      "id": "a7fce019"
+      "id": "83923b7c"
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "The following line excludes the metadata columns:"
       ],
-      "id": "5c329801-709a-4905-b054-1eb58d179391"
+      "id": "9e21bfeb-0e23-430f-821e-e8c88280ebbc"
     },
     {
       "cell_type": "code",
@@ -213,7 +213,7 @@
         "data_only = data.select(pl.all().exclude(\"^Metadata.*$\").sample(n=5, seed=1)).collect()\n",
         "data_only"
       ],
-      "id": "4c7da927"
+      "id": "b0717d22"
     },
     {
       "cell_type": "markdown",
@@ -223,7 +223,7 @@
         "with that tool. Keep in mind that this loads the entire dataframe into\n",
         "memory."
       ],
-      "id": "da1b8576-76fd-4de7-ad3f-0c693bd63d27"
+      "id": "bef13299-3a4d-45c8-ab71-10baf090e549"
     },
     {
       "cell_type": "code",
@@ -245,7 +245,7 @@
       "source": [
         "data_only.to_pandas()"
       ],
-      "id": "a134dad9"
+      "id": "c39531ca"
     }
   ],
   "nbformat": 4,

diff --git a/howto/2_add_metadata.html b/howto/2_add_metadata.html
@@ -258,15 +258,15 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 
 
 <p>A very common task when processing morphological profiles is knowing which ones are treatments and which ones are controls. Here we will explore how we can use broad-babel to accomplish this task.</p>
-<div id="c791aeca" class="cell" title="Imports" data-execution_count="1">
+<div id="f4abeb71" class="cell" title="Imports" data-execution_count="1">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span>
 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> broad_babel.query <span class="im">import</span> get_mapper</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </details>
 </div>
 <p>We will be using the CRISPR dataset specificed in our index csv.</p>
-<div id="2c5790d6" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
+<div id="e9aa2a66" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span>
@@ -279,7 +279,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>For simplicity the contents of our processed profiles are minimal: “The profile origin” (source, plate and well) and the unique JUMP identifier for that perturbation. We will use broad-babel to further expand on this metadata, but for simplicity’s sake let us sample subset of data.</p>
-<div id="5d961eb6" class="cell" title="Subset data" data-execution_count="3">
+<div id="59d9ab93" class="cell" title="Subset data" data-execution_count="3">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>jcp_ids <span class="op">=</span> (</span>
@@ -305,7 +305,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>We will use these JUMP ids to obtain a mapper that indicates the perturbation type (trt, negcon or, rarely, poscon)</p>
-<div id="61873cf3" class="cell" title="Pull mapper" data-execution_count="4">
+<div id="c90a024c" class="cell" title="Pull mapper" data-execution_count="4">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>pert_mapper <span class="op">=</span> get_mapper(</span>
@@ -329,7 +329,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 <p>A couple of important notes about broad_babel’s get mapper and other functions: - these must be fed tuples, as these are cached and provide significant speed-ups for repeated calls - ‘get-mapper’ works for datasets for up to a few tens of thousands of samples. If you try to use it to get a mapper for the entirety of the ‘compounds’ dataset it is likely to fail. For these cases we suggest the more general function ‘run_query’. You can read more on this and other use-cases on Babel’s <a href="https://github.com/broadinstitute/monorepo/tree/main/libs/jump_babel">readme</a>.</p>
 <p>We will now repeat the process to get their ‘standard’ name</p>
-<div id="36640526" class="cell" title="Fetch standard name" data-execution_count="5">
+<div id="4e4abdd9" class="cell" title="Fetch standard name" data-execution_count="5">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>name_mapper <span class="op">=</span> get_mapper(</span>
@@ -354,7 +354,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>To wrap up, we will fetch all the available profiles for these perturbations and use the mappers to add the missing metadata. We also select a few features to showcase how how selection can be performed in polars.</p>
-<div id="01f8872a" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
+<div id="ae8ec1e8" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>subsample_profiles <span class="op">=</span> profiles.<span class="bu">filter</span>(</span>