Built site for gh-pages
Quarto_GHA_Runner committed Sep 10, 2024
1 parent 026ae51 commit c6f9c93
Showing 23 changed files with 378 additions and 362 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
b4391914
ec0b8683
2 changes: 1 addition & 1 deletion explanations/FAQ.ipynb
@@ -152,7 +152,7 @@
" of these replicates’ value was in turn the mean of all the sites\n",
" and cells in a given well."
],
"id": "6e5ec436-fb8d-4925-9e70-1f45bc9268ef"
"id": "25df5f6a-b7c0-4f2d-8c26-a8ac29fc882d"
}
],
"nbformat": 4,
2 changes: 1 addition & 1 deletion explanations/Resources.ipynb
@@ -28,7 +28,7 @@
" [website](https://www.springscience.com/jump-cp) for data\n",
" exploration (account needed)."
],
"id": "a6b0bdcd-bd2f-4aac-b20b-e17ce1756ae9"
"id": "645a9c0f-e9d7-485c-ac99-4fb2d40c679f"
}
],
"nbformat": 4,
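Both notebook hunks above change nothing but a cell's `"id"` value, presumably because the ids are regenerated on every Quarto render of the site. A minimal sketch for confirming that two revisions of a notebook differ only in cell ids, assuming both are available as nbformat JSON text; the helper names `strip_cell_ids` and `differs_only_in_ids` and the sample notebook are hypothetical:

```python
import copy
import json


def strip_cell_ids(notebook: dict) -> dict:
    """Return a copy of an nbformat notebook with every cell's "id" key removed."""
    stripped = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in stripped.get("cells", []):
        cell.pop("id", None)
    return stripped


def differs_only_in_ids(old_text: str, new_text: str) -> bool:
    """True when two notebook JSON strings are equal once cell ids are ignored."""
    return strip_cell_ids(json.loads(old_text)) == strip_cell_ids(json.loads(new_text))


# Hypothetical one-cell notebook; only the cell id changes between revisions.
old = {
    "cells": [{"cell_type": "markdown", "metadata": {}, "source": ["Hi"], "id": "aaa"}],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}
new = copy.deepcopy(old)
new["cells"][0]["id"] = "bbb"

print(differs_only_in_ids(json.dumps(old), json.dumps(new)))  # True
```

In practice the two texts could be obtained with `git show <rev>:<path>` for the parent and child commits.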
16 changes: 8 additions & 8 deletions howto/1_tutorial_basic.html
@@ -258,7 +258,7 @@ <h1 class="title">Access JUMP profiles</h1>


<p>This is a tutorial on how to access profiles from the <a href="https://github.com/jump-cellpainting/datasets">JUMP Cell Painting datasets</a>. We will use polars to fetch the data frames lazily, with the help of <code>s3fs</code> and <code>pyarrow</code>. We prefer lazy loading because the data can be too big to be handled in memory.</p>
<div id="5b6e9960" class="cell" title="Imports" data-execution_count="1">
<div id="469e074e" class="cell" title="Imports" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -271,14 +271,14 @@ <h1 class="title">Access JUMP profiles</h1>
<li><code>cpg0016-jump[compound]</code>: Chemical perturbations.</li>
</ol>
<p>Their explicit location is determined by the transformations that produce the datasets. The aws paths of the dataframes are built from a prefix below:</p>
<div id="af4b78fe" class="cell" title="Paths" data-execution_count="2">
<div id="088064e1" class="cell" title="Paths" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</details>
</div>
<p>We use a version-controlled csv to release the latest corrected profiles</p>
<div id="74943e98" class="cell" data-execution_count="3">
<div id="3a1f14df" class="cell" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>profile_index <span class="op">=</span> pl.read_csv(INDEX_FILE)</span>
@@ -340,7 +340,7 @@ <h1 class="title">Access JUMP profiles</h1>
</div>
</div>
<p>We do not need the ‘etag’ (used to check file integrity) column nor the ‘interpretable’ (i.e., before major modifications)</p>
<div id="8354db6f" class="cell" data-execution_count="4">
<div id="7d56aaee" class="cell" data-execution_count="4">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>selected_profiles <span class="op">=</span> profile_index.<span class="bu">filter</span>(</span>
@@ -354,7 +354,7 @@ <h1 class="title">Access JUMP profiles</h1>
</div>
</div>
<p>We will lazy-load the dataframes and print the number of rows and columns</p>
<div id="26b23e92" class="cell" data-execution_count="5">
<div id="ac704319" class="cell" data-execution_count="5">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>info <span class="op">=</span> {k: [] <span class="cf">for</span> k <span class="kw">in</span> (<span class="st">"dataset"</span>, <span class="st">"#rows"</span>, <span class="st">"#cols"</span>, <span class="st">"#Metadata cols"</span>, <span class="st">"Size (MB)"</span>)}</span>
@@ -427,7 +427,7 @@ <h1 class="title">Access JUMP profiles</h1>
</div>
</div>
<p>Let us now focus on the <code>crispr</code> dataset and use a regex to select the metadata columns. We will then sample rows and display the overview. Note that the collect() method enforces loading some data into memory.</p>
<div id="ddbf4de0" class="cell" data-execution_count="6">
<div id="4d5c424c" class="cell" data-execution_count="6">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>data <span class="op">=</span> pl.scan_parquet(filepaths[<span class="st">"crispr"</span>])</span>
@@ -496,7 +496,7 @@ <h1 class="title">Access JUMP profiles</h1>
</div>
</div>
<p>The following line excludes the metadata columns:</p>
<div id="84c89566" class="cell" data-execution_count="7">
<div id="74fa3a50" class="cell" data-execution_count="7">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>data_only <span class="op">=</span> data.select(pl.<span class="bu">all</span>().exclude(<span class="st">"^Metadata.*$"</span>).sample(n<span class="op">=</span><span class="dv">5</span>, seed<span class="op">=</span><span class="dv">1</span>)).collect()</span>
@@ -1062,7 +1062,7 @@ <h1 class="title">Access JUMP profiles</h1>
</div>
</div>
<p>Finally, we can convert this to <code>pandas</code> if we want to perform analyses with that tool. Keep in mind that this loads the entire dataframe into memory.</p>
<div id="8e7145b8" class="cell" data-execution_count="8">
<div id="f1aec322" class="cell" data-execution_count="8">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>data_only.to_pandas()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
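The tutorial reads a version-controlled index CSV, drops the 'etag' column and the 'interpretable' profiles, and turns the remaining two columns into a name-to-path dictionary with `dict(selected_profiles.iter_rows())`. The actual filter expression is elided in this diff, so the following is only a standard-library sketch of the same idea. It assumes hypothetical `subset`/`url`/`etag` columns and that 'interpretable' profiles can be recognized by their subset name:

```python
import csv
import io

# Hypothetical index contents: the column names, subset labels and URLs are
# placeholders, not the real profile_index.csv.
INDEX_CSV = """subset,url,etag
crispr,https://example.org/crispr.parquet,abc123
crispr_interpretable,https://example.org/crispr_interpretable.parquet,def456
orf,https://example.org/orf.parquet,789ghi
"""


def build_filepaths(csv_text: str) -> dict:
    """Map subset name -> URL, dropping the etag column and 'interpretable' rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["subset"]: row["url"]  # keep two fields only, like dict(iter_rows())
        for row in reader
        if "interpretable" not in row["subset"]
    }


filepaths = build_filepaths(INDEX_CSV)
print(filepaths)
# {'crispr': 'https://example.org/crispr.parquet', 'orf': 'https://example.org/orf.parquet'}
```

With polars, the equivalent filtering would normally stay inside a lazy query so that only the needed columns and rows are materialized.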
32 changes: 16 additions & 16 deletions howto/1_tutorial_basic.ipynb
@@ -12,7 +12,7 @@
"and `pyarrow`. We prefer lazy loading because the data can be too big to\n",
"be handled in memory."
],
"id": "ff68103d-6af8-4fd5-b0ee-70fcabe329df"
"id": "62f58375-d451-47a9-8644-7dd976c66a2a"
},
{
"cell_type": "code",
@@ -24,7 +24,7 @@
"source": [
"import polars as pl"
],
"id": "e13b4ef9"
"id": "f0e42d05"
},
{
"cell_type": "markdown",
@@ -40,7 +40,7 @@
"produce the datasets. The aws paths of the dataframes are built from a\n",
"prefix below:"
],
"id": "61085dc7-9269-46e8-941a-097269fbf12b"
"id": "aa8d4f4e-4be9-4520-b84a-f74b779b7b41"
},
{
"cell_type": "code",
@@ -52,15 +52,15 @@
"source": [
"INDEX_FILE = \"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv\""
],
"id": "7786fc79"
"id": "ee8aa7a8"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use a version-controlled csv to release the latest corrected profiles"
],
"id": "a9567c2a-a5ac-4d42-885d-c6026e063681"
"id": "3ab1925f-e25d-4223-a550-0c74819389e0"
},
{
"cell_type": "code",
@@ -81,7 +81,7 @@
"profile_index = pl.read_csv(INDEX_FILE)\n",
"profile_index.head()"
],
"id": "5cd94b24"
"id": "432bedaa"
},
{
"cell_type": "markdown",
@@ -90,7 +90,7 @@
"We do not need the ‘etag’ (used to check file integrity) column nor the\n",
"‘interpretable’ (i.e., before major modifications)"
],
"id": "75f132b9-eb38-46df-b1d6-7e8780c2cd4b"
"id": "b659ada1-c038-45bd-a546-0119daf651ab"
},
{
"cell_type": "code",
@@ -112,7 +112,7 @@
"filepaths = dict(selected_profiles.iter_rows())\n",
"print(filepaths)"
],
"id": "a1718533"
"id": "89971273"
},
{
"cell_type": "markdown",
@@ -121,7 +121,7 @@
"We will lazy-load the dataframes and print the number of rows and\n",
"columns"
],
"id": "822821df-e6a3-4961-94fd-097428cc5b55"
"id": "e71f5eb8-64ae-41ac-b359-28aa375b09c7"
},
{
"cell_type": "code",
@@ -153,7 +153,7 @@
"\n",
"pl.DataFrame(info)"
],
"id": "fa248564"
"id": "cbee9d75"
},
{
"cell_type": "markdown",
@@ -163,7 +163,7 @@
"metadata columns. We will then sample rows and display the overview.\n",
"Note that the collect() method enforces loading some data into memory."
],
"id": "f35ddf39-3278-471b-8418-76d0d849fa92"
"id": "6c9cfc15-15a1-4760-8550-180be8b450df"
},
{
"cell_type": "code",
@@ -184,15 +184,15 @@
"data = pl.scan_parquet(filepaths[\"crispr\"])\n",
"data.select(pl.col(\"^Metadata.*$\").sample(n=5, seed=1)).collect()"
],
"id": "934f99b4"
"id": "61e696a0"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following line excludes the metadata columns:"
],
"id": "02767b41-60ce-4dea-adf5-5b361bcd2695"
"id": "40dab565-812d-4e25-aa2a-68f2fb5c72a1"
},
{
"cell_type": "code",
@@ -213,7 +213,7 @@
"data_only = data.select(pl.all().exclude(\"^Metadata.*$\").sample(n=5, seed=1)).collect()\n",
"data_only"
],
"id": "406a5e81"
"id": "0737a38c"
},
{
"cell_type": "markdown",
@@ -223,7 +223,7 @@
"with that tool. Keep in mind that this loads the entire dataframe into\n",
"memory."
],
"id": "39e802de-0df6-4a51-bee3-7f37d19fb7e2"
"id": "e9cae16e-17b5-46e1-8d0c-5d72d96c0151"
},
{
"cell_type": "code",
@@ -245,7 +245,7 @@
"source": [
"data_only.to_pandas()"
],
"id": "27af5b51"
"id": "56cdf676"
}
],
"nbformat": 4,
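Polars interprets a column name that starts with `^` and ends with `$` as a regular expression. That is why `pl.col("^Metadata.*$")` selects every metadata column and `pl.all().exclude("^Metadata.*$")` keeps everything else. A plain-Python sketch of the same partition, using hypothetical column names, shows what the pattern matches. The length of the first list corresponds to a metadata-column count like the one in the tutorial's summary table:

```python
import re

# The same pattern the tutorial passes to pl.col() and .exclude().
METADATA_PATTERN = re.compile(r"^Metadata.*$")

# Hypothetical column names; the real schema may use other metadata fields.
columns = ["Metadata_Source", "Metadata_Plate", "Metadata_Well", "Metadata_JCP2022",
           "X_0", "X_1", "X_2"]


def split_columns(names, pattern=METADATA_PATTERN):
    """Partition column names into (metadata, features), preserving their order."""
    metadata = [name for name in names if pattern.match(name)]
    features = [name for name in names if not pattern.match(name)]
    return metadata, features


metadata_cols, feature_cols = split_columns(columns)
print(len(metadata_cols), feature_cols)  # 4 ['X_0', 'X_1', 'X_2']
```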
12 changes: 6 additions & 6 deletions howto/2_add_metadata.html
@@ -258,15 +258,15 @@ <h1 class="title">Incorporate metadata into profiles</h1>


<p>A very common task when processing morphological profiles is knowing which ones are treatments and which ones are controls. Here we will explore how we can use broad-babel to accomplish this task.</p>
<div id="7d0a683e" class="cell" title="Imports" data-execution_count="1">
<div id="10c0cc39" class="cell" title="Imports" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> broad_babel.query <span class="im">import</span> get_mapper</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</details>
</div>
<p>We will be using the CRISPR dataset specified in our index csv.</p>
<div id="88d1f294" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
<div id="d47c9617" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span>
@@ -279,7 +279,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
</div>
<p>For simplicity the contents of our processed profiles are minimal: “The profile origin” (source, plate and well) and the unique JUMP identifier for that perturbation. We will use broad-babel to further expand on this metadata, but for simplicity’s sake let us sample a subset of the data.</p>
<div id="351b215b" class="cell" title="Subset data" data-execution_count="3">
<div id="5aeaec3f" class="cell" title="Subset data" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>jcp_ids <span class="op">=</span> (</span>
@@ -305,7 +305,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
</div>
<p>We will use these JUMP ids to obtain a mapper that indicates the perturbation type (trt, negcon or, rarely, poscon)</p>
<div id="e622e57b" class="cell" title="Pull mapper" data-execution_count="4">
<div id="bdc1737c" class="cell" title="Pull mapper" data-execution_count="4">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>pert_mapper <span class="op">=</span> get_mapper(</span>
@@ -329,7 +329,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
<p>A couple of important notes about broad_babel’s get mapper and other functions: - these must be fed tuples, as these are cached and provide significant speed-ups for repeated calls - ‘get-mapper’ works for datasets for up to a few tens of thousands of samples. If you try to use it to get a mapper for the entirety of the ‘compounds’ dataset it is likely to fail. For these cases we suggest the more general function ‘run_query’. You can read more on this and other use-cases on Babel’s <a href="https://github.com/broadinstitute/monorepo/tree/main/libs/jump_babel">readme</a>.</p>
<p>We will now repeat the process to get their ‘standard’ name</p>
<div id="02aca518" class="cell" title="Fetch standard name" data-execution_count="5">
<div id="a02993ff" class="cell" title="Fetch standard name" data-execution_count="5">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>name_mapper <span class="op">=</span> get_mapper(</span>
@@ -354,7 +354,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
</div>
<p>To wrap up, we will fetch all the available profiles for these perturbations and use the mappers to add the missing metadata. We also select a few features to showcase how selection can be performed in polars.</p>
<div id="994cf4d0" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
<div id="d18b5efb" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>subsample_profiles <span class="op">=</span> profiles.<span class="bu">filter</span>(</span>
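The note that broad_babel's functions "must be fed tuples, as these are cached" follows from how Python memoization works. A decorator such as `functools.lru_cache` uses the call arguments as a dictionary key, so those arguments must be hashable. Tuples are hashable; lists are not. The following minimal sketch illustrates that constraint with a hypothetical `lookup_pert_type` function. It is not broad_babel's actual implementation, which this page does not show:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def lookup_pert_type(jcp_ids: tuple) -> dict:
    """Hypothetical cached query: lru_cache uses the arguments as a dict key."""
    # Placeholder logic; a real lookup would query a database.
    # Note: cache hits return the same dict object, so callers should not mutate it.
    return {jcp_id: "trt" for jcp_id in jcp_ids}


ids = ("ID_A", "ID_B")  # placeholder ids, not real JUMP identifiers
lookup_pert_type(ids)  # first call: computed and stored
lookup_pert_type(ids)  # second call: returned from the cache
print(lookup_pert_type.cache_info().hits)  # 1

try:
    lookup_pert_type(list(ids))  # a list cannot be hashed, so it cannot be a cache key
except TypeError as err:
    print("TypeError:", err)
```

Converting inputs with `tuple(...)` before the call, as the tutorial does, is enough to satisfy this requirement.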
24 changes: 12 additions & 12 deletions howto/2_add_metadata.ipynb
@@ -10,7 +10,7 @@
"which ones are treatments and which ones are controls. Here we will\n",
"explore how we can use broad-babel to accomplish this task."
],
"id": "7d85965b-da47-443e-ad8c-9a7423336be3"
"id": "8fd5e4b1-a894-478d-9b51-eba246ae2198"
},
{
"cell_type": "code",
@@ -23,15 +23,15 @@
"import polars as pl\n",
"from broad_babel.query import get_mapper"
],
"id": "930c918a"
"id": "640371fa"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will be using the CRISPR dataset specified in our index csv."
],
"id": "7fa5c424-c829-4e78-9951-de529b824888"
"id": "893faa8a-12cc-45a5-91a2-301ca664eef8"
},
{
"cell_type": "code",
@@ -54,7 +54,7 @@
"profiles = pl.scan_parquet(CRISPR_URL)\n",
"print(profiles.collect_schema().names()[:6])"
],
"id": "dd1d7dab"
"id": "920e3741"
},
{
"cell_type": "markdown",
@@ -65,7 +65,7 @@
"for that perturbation. We will use broad-babel to further expand on this\n",
"metadata, but for simplicity’s sake let us sample a subset of the data."
],
"id": "fc31e0dd-5253-4e2e-a640-2f8ba708bbdb"
"id": "1aac7d19-3035-4d6d-9154-548947944d9c"
},
{
"cell_type": "code",
@@ -103,7 +103,7 @@
"subsample = (*subsample, \"JCP2022_800002\")\n",
"subsample"
],
"id": "d319fb90"
"id": "9d77d321"
},
{
"cell_type": "markdown",
@@ -112,7 +112,7 @@
"We will use these JUMP ids to obtain a mapper that indicates the\n",
"perturbation type (trt, negcon or, rarely, poscon)"
],
"id": "69861367-1ed6-4618-ade6-d94c9de567b5"
"id": "161c6518-3c57-4f84-94f7-cca06e67428e"
},
{
"cell_type": "code",
@@ -147,7 +147,7 @@
")\n",
"pert_mapper"
],
"id": "95334ba4"
"id": "302d5fcd"
},
{
"cell_type": "markdown",
@@ -164,7 +164,7 @@
"\n",
"We will now repeat the process to get their ‘standard’ name"
],
"id": "b66a09a1-951b-4ae8-96ad-11c9da8a0b16"
"id": "a32b6a45-651a-43a3-8f17-bf2d948eaf1d"
},
{
"cell_type": "code",
@@ -201,7 +201,7 @@
")\n",
"name_mapper"
],
"id": "44f3ca0d"
"id": "47042284"
},
{
"cell_type": "markdown",
@@ -212,7 +212,7 @@
"select a few features to showcase how selection can be performed in\n",
"polars."
],
"id": "d997ca1e-eab6-488f-b4ad-958006876a70"
"id": "7b18a5c0-b8bc-4fca-9aa0-74e124382685"
},
{
"cell_type": "code",
@@ -243,7 +243,7 @@
" pl.col((\"name\", \"pert_type\", \"^Metadata.*$\", \"^X_[0-3]$\"))\n",
").sort(by=\"pert_type\")"
],
"id": "2378e7bf"
"id": "02ff637d"
}
],
"nbformat": 4,
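The final notebook cell is cut off in this diff. Its visible tail selects `name`, `pert_type`, the metadata columns and features `X_0` to `X_3`, then sorts by `pert_type`. The following is a dependency-free sketch of the same annotate-then-sort idea. It assumes the mappers behave like plain `{JUMP id: value}` dictionaries. All ids, names and values below are hypothetical placeholders:

```python
# Hypothetical mapper outputs ({JUMP id: value}); real ids and values come from broad_babel.
pert_mapper = {"ID_A": "trt", "ID_B": "negcon"}
name_mapper = {"ID_A": "name_A", "ID_B": "name_B"}

# Hypothetical profile rows: one metadata column plus two features.
profiles = [
    {"Metadata_JCP2022": "ID_A", "X_0": 0.1, "X_1": -0.3},
    {"Metadata_JCP2022": "ID_B", "X_0": 0.0, "X_1": 0.2},
    {"Metadata_JCP2022": "ID_A", "X_0": 0.4, "X_1": 0.1},
]


def annotate(rows, id_col, pert_mapper, name_mapper):
    """Prepend name and pert_type to each row; ids missing from a mapper get None."""
    return [
        {"name": name_mapper.get(row[id_col]), "pert_type": pert_mapper.get(row[id_col]), **row}
        for row in rows
    ]


annotated = sorted(
    annotate(profiles, "Metadata_JCP2022", pert_mapper, name_mapper),
    key=lambda row: row["pert_type"] or "",  # rows with pert_type None sort first
)
for row in annotated:
    print(row["pert_type"], row["name"], row["X_0"])
# negcon name_B 0.0
# trt name_A 0.1
# trt name_A 0.4
```

Because Python's `sorted` is stable, rows that share a `pert_type` keep their original relative order.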