Commit

Built site for gh-pages
Quarto_GHA_Runner committed Sep 11, 2024
1 parent 581ad12 commit 92876bd
Showing 24 changed files with 483 additions and 414 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
78d52888
62823df9
2 changes: 1 addition & 1 deletion explanations/FAQ.ipynb
@@ -152,7 +152,7 @@
" of these replicates’ value was in turn the mean of all the sites\n",
" and cells in a given well."
],
"id": "17ad2136-1321-43fb-b48a-fdcbc379f541"
"id": "3bc637b6-fba2-47bc-8dc5-99b0b989cdb6"
}
],
"nbformat": 4,
2 changes: 1 addition & 1 deletion explanations/Resources.ipynb
@@ -28,7 +28,7 @@
" [website](https://www.springscience.com/jump-cp) for data\n",
" exploration (account needed)."
],
"id": "d986c00a-eb84-4ab5-a0dd-ad581a7b6da8"
"id": "9425c958-294b-4f8e-802d-cc560ff596f6"
}
],
"nbformat": 4,
2 changes: 1 addition & 1 deletion explanations/glossary.ipynb
@@ -63,7 +63,7 @@
"for compound probes). q-value: Expected False Discovery Rate (FDR): the\n",
"proportion of false positives among all positive results."
],
"id": "44baf1c0-9d2c-4cc0-a21e-abbc1c056356"
"id": "90f98b40-68c7-4a7a-a020-d9bc61df80ef"
}
],
"nbformat": 4,
16 changes: 8 additions & 8 deletions howto/1_retrieve_profiles.html
@@ -258,7 +258,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>


<p>This is a tutorial on how to access profiles from the <a href="https://github.com/jump-cellpainting/datasets">JUMP Cell Painting datasets</a>. We will use polars to fetch the data frames lazily, with the help of <code>s3fs</code> and <code>pyarrow</code>. We prefer lazy loading because the data can be too big to be handled in memory.</p>
<div id="81474bff" class="cell" title="Imports" data-execution_count="1">
<div id="2618eeb5" class="cell" title="Imports" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -271,14 +271,14 @@ <h1 class="title">Retrieve JUMP profiles</h1>
<li><code>cpg0016-jump[compound]</code>: Chemical perturbations.</li>
</ol>
<p>Their explicit location is determined by the transformations that produce the datasets. The aws paths of the dataframes are built from a prefix below:</p>
<div id="a23d7f3b" class="cell" title="Paths" data-execution_count="2">
<div id="52c689bb" class="cell" title="Paths" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</details>
</div>
<p>We use a version-controlled csv to release the latest corrected profiles</p>
<div id="4316f786" class="cell" data-execution_count="3">
<div id="a5e1cf36" class="cell" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>profile_index <span class="op">=</span> pl.read_csv(INDEX_FILE)</span>
@@ -340,7 +340,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
</div>
</div>
<p>We do not need the ‘etag’ (used to check file integrity) column nor the ‘interpretable’ (i.e., before major modifications)</p>
<div id="6b4dfca4" class="cell" data-execution_count="4">
<div id="c1ad6c21" class="cell" data-execution_count="4">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>selected_profiles <span class="op">=</span> profile_index.<span class="bu">filter</span>(</span>
@@ -354,7 +354,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
</div>
</div>
<p>We will lazy-load the dataframes and print the number of rows and columns</p>
<div id="d34fb368" class="cell" data-execution_count="5">
<div id="4dc1dafd" class="cell" data-execution_count="5">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>info <span class="op">=</span> {k: [] <span class="cf">for</span> k <span class="kw">in</span> (<span class="st">"dataset"</span>, <span class="st">"#rows"</span>, <span class="st">"#cols"</span>, <span class="st">"#Metadata cols"</span>, <span class="st">"Size (MB)"</span>)}</span>
@@ -427,7 +427,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
</div>
</div>
<p>Let us now focus on the <code>crispr</code> dataset and use a regex to select the metadata columns. We will then sample rows and display the overview. Note that the collect() method enforces loading some data into memory.</p>
<div id="c811cc8d" class="cell" data-execution_count="6">
<div id="3a00f154" class="cell" data-execution_count="6">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>data <span class="op">=</span> pl.scan_parquet(filepaths[<span class="st">"crispr"</span>])</span>
@@ -496,7 +496,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
</div>
</div>
<p>The following line excludes the metadata columns:</p>
<div id="15093fdf" class="cell" data-execution_count="7">
<div id="ff05c332" class="cell" data-execution_count="7">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>data_only <span class="op">=</span> data.select(pl.<span class="bu">all</span>().exclude(<span class="st">"^Metadata.*$"</span>).sample(n<span class="op">=</span><span class="dv">5</span>, seed<span class="op">=</span><span class="dv">1</span>)).collect()</span>
@@ -1062,7 +1062,7 @@ <h1 class="title">Retrieve JUMP profiles</h1>
</div>
</div>
<p>Finally, we can convert this to <code>pandas</code> if we want to perform analyses with that tool. Keep in mind that this loads the entire dataframe into memory.</p>
<div id="20df67a8" class="cell" data-execution_count="8">
<div id="74296695" class="cell" data-execution_count="8">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>data_only.to_pandas()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
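Note on the tutorial text in this file: the pattern `^Metadata.*$` is used twice, once to select the metadata columns and once to exclude them. A minimal sketch of that partition follows, written with the standard-library `re` module instead of polars so that it runs without any data access. The helper `split_columns` and the example column names are hypothetical and are not taken from the JUMP profiles.

```python
import re

# Same pattern the tutorial passes to pl.col(...) and .exclude(...).
METADATA_PATTERN = re.compile(r"^Metadata.*$")


def split_columns(columns):
    """Partition column names into (metadata, features) using the regex."""
    metadata = [c for c in columns if METADATA_PATTERN.match(c)]
    features = [c for c in columns if not METADATA_PATTERN.match(c)]
    return metadata, features


# Hypothetical column names, for illustration only.
example_columns = ["Metadata_Source", "Metadata_Plate", "Metadata_Well", "X_1", "X_2"]
metadata_cols, feature_cols = split_columns(example_columns)
print(metadata_cols)
print(feature_cols)
```

In the tutorial itself the same split is expressed lazily with polars expressions, so nothing is loaded until `collect()` is called.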
32 changes: 16 additions & 16 deletions howto/1_retrieve_profiles.ipynb
@@ -12,7 +12,7 @@
"and `pyarrow`. We prefer lazy loading because the data can be too big to\n",
"be handled in memory."
],
"id": "00451035-7dfb-4a3d-97a2-acf91788a5b7"
"id": "47bec2df-a884-445c-b717-b76787be64f7"
},
{
"cell_type": "code",
@@ -24,7 +24,7 @@
"source": [
"import polars as pl"
],
"id": "1ab03215"
"id": "2de578e8"
},
{
"cell_type": "markdown",
@@ -40,7 +40,7 @@
"produce the datasets. The aws paths of the dataframes are built from a\n",
"prefix below:"
],
"id": "6cdfb466-2e7b-4fb1-be8a-17612cbff44d"
"id": "a2b1ab4e-28a3-49be-bb90-36ac754e7ae0"
},
{
"cell_type": "code",
@@ -52,15 +52,15 @@
"source": [
"INDEX_FILE = \"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv\""
],
"id": "6d4f44e7"
"id": "31129b7b"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use a version-controlled csv to release the latest corrected profiles"
],
"id": "c1581cb4-93ff-4c54-82df-ca3c5add732c"
"id": "62b0d62b-4fa0-4a6f-a0ac-3ff51669034e"
},
{
"cell_type": "code",
@@ -81,7 +81,7 @@
"profile_index = pl.read_csv(INDEX_FILE)\n",
"profile_index.head()"
],
"id": "64533136"
"id": "52fa9980"
},
{
"cell_type": "markdown",
@@ -90,7 +90,7 @@
"We do not need the ‘etag’ (used to check file integrity) column nor the\n",
"‘interpretable’ (i.e., before major modifications)"
],
"id": "d2910b99-e47e-4ad3-b234-b2b1a9d5f048"
"id": "22349e94-37c7-4811-91de-8f73e77ab612"
},
{
"cell_type": "code",
@@ -112,7 +112,7 @@
"filepaths = dict(selected_profiles.iter_rows())\n",
"print(filepaths)"
],
"id": "480cf3a2"
"id": "12faa4e1"
},
{
"cell_type": "markdown",
@@ -121,7 +121,7 @@
"We will lazy-load the dataframes and print the number of rows and\n",
"columns"
],
"id": "8d1eb3da-fb2e-458b-9d4a-bf76a66886e8"
"id": "90fda18a-6fab-46f8-af7c-860bce8dfe71"
},
{
"cell_type": "code",
@@ -153,7 +153,7 @@
"\n",
"pl.DataFrame(info)"
],
"id": "6544f26f"
"id": "ebde11f4"
},
{
"cell_type": "markdown",
@@ -163,7 +163,7 @@
"metadata columns. We will then sample rows and display the overview.\n",
"Note that the collect() method enforces loading some data into memory."
],
"id": "ee78d621-784c-47ad-b2e4-223eda176ac1"
"id": "4580b7df-4169-4e10-a5bc-a19d118591eb"
},
{
"cell_type": "code",
@@ -184,15 +184,15 @@
"data = pl.scan_parquet(filepaths[\"crispr\"])\n",
"data.select(pl.col(\"^Metadata.*$\").sample(n=5, seed=1)).collect()"
],
"id": "a7fce019"
"id": "83923b7c"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following line excludes the metadata columns:"
],
"id": "5c329801-709a-4905-b054-1eb58d179391"
"id": "9e21bfeb-0e23-430f-821e-e8c88280ebbc"
},
{
"cell_type": "code",
@@ -213,7 +213,7 @@
"data_only = data.select(pl.all().exclude(\"^Metadata.*$\").sample(n=5, seed=1)).collect()\n",
"data_only"
],
"id": "4c7da927"
"id": "b0717d22"
},
{
"cell_type": "markdown",
@@ -223,7 +223,7 @@
"with that tool. Keep in mind that this loads the entire dataframe into\n",
"memory."
],
"id": "da1b8576-76fd-4de7-ad3f-0c693bd63d27"
"id": "bef13299-3a4d-45c8-ab71-10baf090e549"
},
{
"cell_type": "code",
@@ -245,7 +245,7 @@
"source": [
"data_only.to_pandas()"
],
"id": "a134dad9"
"id": "c39531ca"
}
],
"nbformat": 4,
12 changes: 6 additions & 6 deletions howto/2_add_metadata.html
@@ -258,15 +258,15 @@ <h1 class="title">Incorporate metadata into profiles</h1>


<p>A very common task when processing morphological profiles is knowing which ones are treatments and which ones are controls. Here we will explore how we can use broad-babel to accomplish this task.</p>
<div id="c791aeca" class="cell" title="Imports" data-execution_count="1">
<div id="f4abeb71" class="cell" title="Imports" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> broad_babel.query <span class="im">import</span> get_mapper</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</details>
</div>
<p>We will be using the CRISPR dataset specified in our index csv.</p>
<div id="2c5790d6" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
<div id="e9aa2a66" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span>
@@ -279,7 +279,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
</div>
<p>For simplicity, the contents of our processed profiles are minimal: the profile origin (source, plate and well) and the unique JUMP identifier for that perturbation. We will use broad-babel to expand on this metadata, but to keep things brief let us first sample a subset of the data.</p>
<div id="5d961eb6" class="cell" title="Subset data" data-execution_count="3">
<div id="59d9ab93" class="cell" title="Subset data" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>jcp_ids <span class="op">=</span> (</span>
@@ -305,7 +305,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
</div>
<p>We will use these JUMP ids to obtain a mapper that indicates the perturbation type (trt, negcon or, rarely, poscon)</p>
<div id="61873cf3" class="cell" title="Pull mapper" data-execution_count="4">
<div id="c90a024c" class="cell" title="Pull mapper" data-execution_count="4">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>pert_mapper <span class="op">=</span> get_mapper(</span>
@@ -329,7 +329,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
<p>A couple of important notes about broad_babel’s ‘get_mapper’ and other functions: (1) they must be fed tuples, because their results are cached, which provides significant speed-ups for repeated calls; (2) ‘get_mapper’ works for datasets of up to a few tens of thousands of samples. If you try to use it to get a mapper for the entirety of the ‘compounds’ dataset, it is likely to fail. For these cases we suggest the more general function ‘run_query’. You can read more on this and other use cases in Babel’s <a href="https://github.com/broadinstitute/monorepo/tree/main/libs/jump_babel">readme</a>.</p>
<p>We will now repeat the process to get their ‘standard’ name</p>
<div id="36640526" class="cell" title="Fetch standard name" data-execution_count="5">
<div id="4e4abdd9" class="cell" title="Fetch standard name" data-execution_count="5">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>name_mapper <span class="op">=</span> get_mapper(</span>
@@ -354,7 +354,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
</div>
<p>To wrap up, we will fetch all the available profiles for these perturbations and use the mappers to add the missing metadata. We also select a few features to showcase how selection can be performed in polars.</p>
<div id="01f8872a" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
<div id="ae8ec1e8" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>subsample_profiles <span class="op">=</span> profiles.<span class="bu">filter</span>(</span>
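Note on the tutorial text in this file: the final step uses the two mappers to attach the perturbation type and the standard name to each profile. A minimal sketch of that lookup follows, assuming each mapper behaves like a plain dictionary keyed by JUMP id. Every identifier, label, column name and value below is made up for illustration, and `add_metadata` is a hypothetical helper, not part of broad-babel or polars.

```python
# Made-up mappers standing in for the results of get_mapper(...).
pert_mapper = {"JCP_A": "trt", "JCP_B": "negcon"}
name_mapper = {"JCP_A": "GENE_A", "JCP_B": "non-targeting"}

# Made-up profiles: one JUMP id column plus one feature column.
profiles = [
    {"Metadata_JCP2022": "JCP_A", "X_1": 0.12},
    {"Metadata_JCP2022": "JCP_B", "X_1": -0.40},
    {"Metadata_JCP2022": "JCP_C", "X_1": 0.05},  # id absent from both mappers
]


def add_metadata(rows, id_key="Metadata_JCP2022"):
    """Return copies of the rows with pert_type and name columns added."""
    enriched = []
    for row in rows:
        jcp_id = row[id_key]
        new_row = dict(row)
        # Unmapped ids get None rather than raising a KeyError.
        new_row["pert_type"] = pert_mapper.get(jcp_id)
        new_row["name"] = name_mapper.get(jcp_id)
        enriched.append(new_row)
    return enriched


enriched_profiles = add_metadata(profiles)
for row in enriched_profiles:
    print(row)
```

Returning `None` for unmapped ids keeps every row, which makes missing metadata easy to spot afterwards; dropping such rows instead would be an equally reasonable choice.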