Built site for gh-pages

broadinstitute · Sep 12, 2024 · 1438c02
1 parent 9d47d63
commit 1438c02
Showing 24 changed files with 373 additions and 373 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-681c0219
+0bb361a4
diff --git a/explanations/FAQ.ipynb b/explanations/FAQ.ipynb
@@ -152,7 +152,7 @@
         "        of these replicates’ value was in turn the mean of all the sites\n",
         "        and cells in a given well."
       ],
-      "id": "8d5696cb-d7b6-4d19-9daf-316acf4fb80e"
+      "id": "76f1d9ce-f912-41b2-bcba-c93d55b1541d"
     }
   ],
   "nbformat": 4,

diff --git a/explanations/Resources.ipynb b/explanations/Resources.ipynb
@@ -28,7 +28,7 @@
         "    [website](https://www.springscience.com/jump-cp) for data\n",
         "    exploration (account needed)."
       ],
-      "id": "929910cb-1283-4ad0-b8b1-01ca932ba8c2"
+      "id": "9638699e-e48e-4448-a927-06063f27081c"
     }
   ],
   "nbformat": 4,

diff --git a/explanations/glossary.ipynb b/explanations/glossary.ipynb
@@ -63,7 +63,7 @@
         "for compound probes). q-value: Expected False Discovery Rate (FDR): the\n",
         "proportion of false positives among all positive results."
       ],
-      "id": "8a5d959e-1f0a-43a9-ad87-b95ef603e94e"
+      "id": "c06bf953-fe68-450b-a7a6-c8fd8e4c9a2b"
     }
   ],
   "nbformat": 4,

diff --git a/howto/1_retrieve_profiles.html b/howto/1_retrieve_profiles.html
@@ -260,7 +260,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 
 
 <p>This is a tutorial on how to access profiles from the <a href="https://github.com/jump-cellpainting/datasets">JUMP Cell Painting datasets</a>. We will use polars to fetch the data frames lazily, with the help of <code>s3fs</code> and <code>pyarrow</code>. We prefer lazy loading because the data can be too big to be handled in memory.</p>
-<div id="62ea9adc" class="cell" title="Imports" data-execution_count="1">
+<div id="dcc0eab6" class="cell" title="Imports" data-execution_count="1">
 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </div>
 <p>The shapes of the available datasets are:</p>
@@ -270,11 +270,11 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 <li><code>cpg0016-jump[compound]</code>: Chemical perturbations.</li>
 </ol>
 <p>Their explicit location is determined by the transformations that produce the datasets. The aws paths of the dataframes are built from a prefix below:</p>
-<div id="d992d9d3" class="cell" title="Paths" data-execution_count="2">
+<div id="18eed3b7" class="cell" title="Paths" data-execution_count="2">
 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </div>
 <p>We use a version-controlled csv to release the latest corrected profiles</p>
-<div id="94abcbc3" class="cell" data-execution_count="3">
+<div id="c6e5ae1a" class="cell" data-execution_count="3">
 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>profile_index <span class="op">=</span> pl.read_csv(INDEX_FILE)</span>
 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>profile_index.head()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-display" data-execution_count="3">
@@ -333,7 +333,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 </div>
 </div>
 <p>We do not need the ‘etag’ column (used to check file integrity) nor the ‘interpretable’ subset (i.e., profiles before major modifications).</p>
-<div id="f1ddc271" class="cell" data-execution_count="4">
+<div id="1af7c2fb" class="cell" data-execution_count="4">
 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>selected_profiles <span class="op">=</span> profile_index.<span class="bu">filter</span>(</span>
 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>    pl.col(<span class="st">"subset"</span>).is_in((<span class="st">"crispr"</span>, <span class="st">"orf"</span>, <span class="st">"compound"</span>))</span>
 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>).select(pl.exclude(<span class="st">"etag"</span>))</span>
@@ -344,7 +344,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 </div>
 </div>
 <p>We will lazy-load the dataframes and print the number of rows and columns</p>
-<div id="54e44892" class="cell" data-execution_count="5">
+<div id="5d88c9f0" class="cell" data-execution_count="5">
 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>info <span class="op">=</span> {k: [] <span class="cf">for</span> k <span class="kw">in</span> (<span class="st">"dataset"</span>, <span class="st">"#rows"</span>, <span class="st">"#cols"</span>, <span class="st">"#Metadata cols"</span>, <span class="st">"Size (MB)"</span>)}</span>
 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> name, path <span class="kw">in</span> filepaths.items():</span>
 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>    data <span class="op">=</span> pl.scan_parquet(path)</span>
@@ -414,7 +414,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 </div>
 </div>
 <p>Let us now focus on the <code>crispr</code> dataset and use a regex to select the metadata columns. We will then sample rows and display the overview. Note that the collect() method enforces loading some data into memory.</p>
-<div id="eaf2605d" class="cell" data-execution_count="6">
+<div id="8c8bbef7" class="cell" data-execution_count="6">
 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>data <span class="op">=</span> pl.scan_parquet(filepaths[<span class="st">"crispr"</span>])</span>
 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>data.select(pl.col(<span class="st">"^Metadata.*$"</span>).sample(n<span class="op">=</span><span class="dv">5</span>, seed<span class="op">=</span><span class="dv">1</span>)).collect()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-display" data-execution_count="6">
@@ -480,7 +480,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 </div>
 </div>
 <p>The following line excludes the metadata columns:</p>
-<div id="772cde92" class="cell" data-execution_count="7">
+<div id="795fc4f1" class="cell" data-execution_count="7">
 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>data_only <span class="op">=</span> data.select(pl.<span class="bu">all</span>().exclude(<span class="st">"^Metadata.*$"</span>).sample(n<span class="op">=</span><span class="dv">5</span>, seed<span class="op">=</span><span class="dv">1</span>)).collect()</span>
 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>data_only</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-display" data-execution_count="7">
@@ -1043,7 +1043,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
 </div>
 </div>
 <p>Finally, we can convert this to <code>pandas</code> if we want to perform analyses with that tool. Keep in mind that this loads the entire dataframe into memory.</p>
-<div id="5123b93b" class="cell" data-execution_count="8">
+<div id="da384f48" class="cell" data-execution_count="8">
 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>data_only.to_pandas()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-display" data-execution_count="8">
 <div>

diff --git a/howto/1_retrieve_profiles.ipynb b/howto/1_retrieve_profiles.ipynb
@@ -12,7 +12,7 @@
         "and `pyarrow`. We prefer lazy loading because the data can be too big to\n",
         "be handled in memory."
       ],
-      "id": "e97fd426-67e8-4ede-a761-020bf876ed25"
+      "id": "c207e014-f2db-4136-8569-5fac72a40e20"
     },
     {
       "cell_type": "code",
@@ -24,7 +24,7 @@
       "source": [
         "import polars as pl"
       ],
-      "id": "bda87b61"
+      "id": "050fdbe0"
     },
     {
       "cell_type": "markdown",
@@ -40,7 +40,7 @@
         "produce the datasets. The aws paths of the dataframes are built from a\n",
         "prefix below:"
       ],
-      "id": "12964272-caf6-4d5a-bffd-79e266306ec4"
+      "id": "456b6441-9b59-4668-a3f6-6ca9fd0638e2"
     },
     {
       "cell_type": "code",
@@ -52,15 +52,15 @@
       "source": [
         "INDEX_FILE = \"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv\""
       ],
-      "id": "40c00f57"
+      "id": "dff5e9f9"
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "We use a version-controlled csv to release the latest corrected profiles"
       ],
-      "id": "eaedc4b3-e2ff-497e-8b06-fa8ccad23f15"
+      "id": "294aca9f-26b0-4db9-b3ae-4cd328db8c6c"
     },
     {
       "cell_type": "code",
@@ -81,7 +81,7 @@
         "profile_index = pl.read_csv(INDEX_FILE)\n",
         "profile_index.head()"
       ],
-      "id": "da1585b1"
+      "id": "be3d7bc9"
     },
     {
       "cell_type": "markdown",
@@ -90,7 +90,7 @@
         "We do not need the ‘etag’ column (used to check file integrity) nor the\n",
         "‘interpretable’ subset (i.e., profiles before major modifications)."
       ],
-      "id": "fb519c55-606c-44a5-9de3-9646b3904a9b"
+      "id": "8748e5de-4a85-4aec-a40e-17fe629c4da9"
     },
     {
       "cell_type": "code",
@@ -112,7 +112,7 @@
         "filepaths = dict(selected_profiles.iter_rows())\n",
         "print(filepaths)"
       ],
-      "id": "fa128411"
+      "id": "f0d2b220"
     },
     {
       "cell_type": "markdown",
@@ -121,7 +121,7 @@
         "We will lazy-load the dataframes and print the number of rows and\n",
         "columns"
       ],
-      "id": "0f9f9d9b-5fad-4c4d-848c-917f996870b2"
+      "id": "8211888a-f29b-4140-8ada-2eb7fd26b9e2"
     },
     {
       "cell_type": "code",
@@ -153,7 +153,7 @@
         "\n",
         "pl.DataFrame(info)"
       ],
-      "id": "be9cab4b"
+      "id": "110f0995"
     },
     {
       "cell_type": "markdown",
@@ -163,7 +163,7 @@
         "metadata columns. We will then sample rows and display the overview.\n",
         "Note that the collect() method enforces loading some data into memory."
       ],
-      "id": "ce4f1a2f-13ee-4154-aca2-d1337ae63131"
+      "id": "1ba2c4e6-32d7-490c-8ae4-ea2ed71af70a"
     },
     {
       "cell_type": "code",
@@ -184,15 +184,15 @@
         "data = pl.scan_parquet(filepaths[\"crispr\"])\n",
         "data.select(pl.col(\"^Metadata.*$\").sample(n=5, seed=1)).collect()"
       ],
-      "id": "c6ee77bb"
+      "id": "6c420d21"
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "The following line excludes the metadata columns:"
       ],
-      "id": "8722a143-fe76-4580-a487-3299ad49840f"
+      "id": "1be03951-55c7-4e1c-bda9-0be92a0083d4"
     },
     {
       "cell_type": "code",
@@ -213,7 +213,7 @@
         "data_only = data.select(pl.all().exclude(\"^Metadata.*$\").sample(n=5, seed=1)).collect()\n",
         "data_only"
       ],
-      "id": "1c4f6b4c"
+      "id": "7ac64f6f"
     },
     {
       "cell_type": "markdown",
@@ -223,7 +223,7 @@
         "with that tool. Keep in mind that this loads the entire dataframe into\n",
         "memory."
       ],
-      "id": "0bdebe1f-9cdd-457a-a46c-f1b07966f120"
+      "id": "14c7a423-e8c6-4b18-b3f2-ee46f7350fa7"
     },
     {
       "cell_type": "code",
@@ -245,7 +245,7 @@
       "source": [
         "data_only.to_pandas()"
       ],
-      "id": "80e1a977"
+      "id": "c8a420b3"
     }
   ],
   "nbformat": 4,

diff --git a/howto/2_add_metadata.html b/howto/2_add_metadata.html
@@ -260,12 +260,12 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 
 
 <p>A very common task when processing morphological profiles is knowing which ones are treatments and which ones are controls. Here we will explore how we can use broad-babel to accomplish this task.</p>
-<div id="661e1bda" class="cell" title="Imports" data-execution_count="1">
+<div id="619eda04" class="cell" title="Imports" data-execution_count="1">
 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span>
 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> broad_babel.query <span class="im">import</span> get_mapper</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </div>
 <p>We will be using the CRISPR dataset specified in our index csv.</p>
-<div id="e82dc867" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
+<div id="b778ce48" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span>
 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>CRISPR_URL <span class="op">=</span> pl.read_csv(INDEX_FILE).<span class="bu">filter</span>(pl.col(<span class="st">"subset"</span>) <span class="op">==</span> <span class="st">"crispr"</span>).item(<span class="dv">0</span>, <span class="st">"url"</span>)</span>
 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>profiles <span class="op">=</span> pl.scan_parquet(CRISPR_URL)</span>
@@ -275,7 +275,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>For simplicity the contents of our processed profiles are minimal: “The profile origin” (source, plate and well) and the unique JUMP identifier for that perturbation. We will use broad-babel to further expand on this metadata, but for simplicity’s sake let us sample a subset of the data.</p>
-<div id="6b6af033" class="cell" title="Subset data" data-execution_count="3">
+<div id="c72a1211" class="cell" title="Subset data" data-execution_count="3">
 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>jcp_ids <span class="op">=</span> (</span>
 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>    profiles.select(pl.col(<span class="st">"Metadata_JCP2022"</span>)).unique().collect().to_series().sort()</span>
 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>)</span>
@@ -298,7 +298,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>We will use these JUMP ids to obtain a mapper that indicates the perturbation type (trt, negcon or, rarely, poscon)</p>
-<div id="b214eb16" class="cell" title="Pull mapper" data-execution_count="4">
+<div id="67377dbd" class="cell" title="Pull mapper" data-execution_count="4">
 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>pert_mapper <span class="op">=</span> get_mapper(</span>
 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>    subsample, input_column<span class="op">=</span><span class="st">"JCP2022"</span>, output_columns<span class="op">=</span><span class="st">"JCP2022,pert_type"</span></span>
 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>)</span>
@@ -319,7 +319,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 <p>A couple of important notes about broad_babel’s <code>get_mapper</code> and other functions:</p><ul><li>They must be fed tuples, because their results are cached, which provides significant speed-ups for repeated calls.</li><li><code>get_mapper</code> works for datasets of up to a few tens of thousands of samples. If you try to use it to get a mapper for the entirety of the ‘compounds’ dataset, it is likely to fail. For these cases we suggest the more general function <code>run_query</code>.</li></ul><p>You can read more on this and other use cases in Babel’s <a href="https://github.com/broadinstitute/monorepo/tree/main/libs/jump_babel">readme</a>.</p>
 <p>We will now repeat the process to get their ‘standard’ name</p>
-<div id="aa62c9c5" class="cell" title="Fetch standard name" data-execution_count="5">
+<div id="4a4119ac" class="cell" title="Fetch standard name" data-execution_count="5">
 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>name_mapper <span class="op">=</span> get_mapper(</span>
 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>    (<span class="op">*</span>subsample, <span class="st">"JCP2022_800002"</span>),</span>
 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>    input_column<span class="op">=</span><span class="st">"JCP2022"</span>,</span>
@@ -341,7 +341,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>To wrap up, we will fetch all the available profiles for these perturbations and use the mappers to add the missing metadata. We also select a few features to showcase how selection can be performed in polars.</p>
-<div id="7474b214" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
+<div id="afc5221d" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>subsample_profiles <span class="op">=</span> profiles.<span class="bu">filter</span>(</span>
 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>    pl.col(<span class="st">"Metadata_JCP2022"</span>).is_in(subsample)</span>
 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>).collect()</span>