Commit

Built site for gh-pages
Quarto_GHA_Runner committed Sep 6, 2024
1 parent fb6245b commit b0d4591
Showing 18 changed files with 317 additions and 317 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
3725b130
c3b5a916
2 changes: 1 addition & 1 deletion explanations/FAQ.ipynb
@@ -152,7 +152,7 @@
" of these replicates’ value was in turn the mean of all the sites\n",
" and cells in a given well."
],
"id": "c448e5f3-f5e5-4fdf-ab3f-d4bb158f02a2"
"id": "cd7e6da4-e105-4aac-9013-15901a018da2"
}
],
"nbformat": 4,
2 changes: 1 addition & 1 deletion explanations/Resources.ipynb
@@ -28,7 +28,7 @@
" [website](https://www.springscience.com/jump-cp) for data\n",
" exploration (account needed)."
],
"id": "d263bdd4-b80c-4c31-9f90-2d038be8c9db"
"id": "0f71de0c-bb20-4b54-aa45-4c457f1dfa9a"
}
],
"nbformat": 4,
16 changes: 8 additions & 8 deletions howto/1_tutorial_basic.html
@@ -246,7 +246,7 @@ <h1 class="title">Access JUMP profiles</h1>


<p>This is a tutorial on how to access profiles from the <a href="https://github.com/jump-cellpainting/datasets">JUMP Cell Painting datasets</a>. We will use polars to fetch the data frames lazily, with the help of <code>s3fs</code> and <code>pyarrow</code>. We prefer lazy loading because the data can be too big to be handled in memory.</p>
<div id="19263eb5" class="cell" title="Imports" data-execution_count="1">
<div id="c11c56df" class="cell" title="Imports" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -259,14 +259,14 @@ <h1 class="title">Access JUMP profiles</h1>
<li><code>cpg0016-jump[compound]</code>: Chemical perturbations.</li>
</ol>
<p>Their explicit location is determined by the transformations that produce the datasets. The aws paths of the dataframes are built from a prefix below:</p>
<div id="8bba0cbc" class="cell" title="Paths" data-execution_count="2">
<div id="e654a9ed" class="cell" title="Paths" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</details>
</div>
<p>We use a version-controlled csv to release the latest corrected profiles</p>
<div id="adba8b36" class="cell" data-execution_count="3">
<div id="142720f9" class="cell" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>profile_index <span class="op">=</span> pl.read_csv(INDEX_FILE)</span>
@@ -328,7 +328,7 @@ <h1 class="title">Access JUMP profiles</h1>
</div>
</div>
<p>We do not need the ‘etag’ (used to check file integrity) column nor the ‘interpretable’ (i.e., before major modifications)</p>
<div id="ec77f397" class="cell" data-execution_count="4">
<div id="704783a3" class="cell" data-execution_count="4">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>selected_profiles <span class="op">=</span> profile_index.<span class="bu">filter</span>(</span>
@@ -342,7 +342,7 @@ <h1 class="title">Access JUMP profiles</h1>
</div>
</div>
<p>We will lazy-load the dataframes and print the number of rows and columns</p>
<div id="2c898974" class="cell" data-execution_count="5">
<div id="9c6fa319" class="cell" data-execution_count="5">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>info <span class="op">=</span> {k: [] <span class="cf">for</span> k <span class="kw">in</span> (<span class="st">"dataset"</span>, <span class="st">"#rows"</span>, <span class="st">"#cols"</span>, <span class="st">"#Metadata cols"</span>, <span class="st">"Size (MB)"</span>)}</span>
@@ -415,7 +415,7 @@ <h1 class="title">Access JUMP profiles</h1>
</div>
</div>
<p>Let us now focus on the <code>crispr</code> dataset and use a regex to select the metadata columns. We will then sample rows and display the overview. Note that the collect() method enforces loading some data into memory.</p>
<div id="85fdaf08" class="cell" data-execution_count="6">
<div id="c26c5cc6" class="cell" data-execution_count="6">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>data <span class="op">=</span> pl.scan_parquet(filepaths[<span class="st">"crispr"</span>])</span>
@@ -484,7 +484,7 @@ <h1 class="title">Access JUMP profiles</h1>
</div>
</div>
<p>The following line excludes the metadata columns:</p>
<div id="dc8e7202" class="cell" data-execution_count="7">
<div id="67bad2d3" class="cell" data-execution_count="7">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>data_only <span class="op">=</span> data.select(pl.<span class="bu">all</span>().exclude(<span class="st">"^Metadata.*$"</span>).sample(n<span class="op">=</span><span class="dv">5</span>, seed<span class="op">=</span><span class="dv">1</span>)).collect()</span>
@@ -1050,7 +1050,7 @@ <h1 class="title">Access JUMP profiles</h1>
</div>
</div>
<p>Finally, we can convert this to <code>pandas</code> if we want to perform analyses with that tool. Keep in mind that this loads the entire dataframe into memory.</p>
<div id="ff959884" class="cell" data-execution_count="8">
<div id="fe0dc495" class="cell" data-execution_count="8">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>data_only.to_pandas()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
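The tutorial hunks above split columns with the regex `^Metadata.*$`, once to select and once to exclude. As a rough illustration of what that split amounts to, the sketch below applies the same pattern to a plain list of column names with Python's `re` module. It does not use polars, and every column name except `Metadata_JCP2022` is an assumption made up for the example.

```python
import re

# Pattern taken from the tutorial. The column names are hypothetical,
# apart from Metadata_JCP2022, which appears in the second tutorial.
METADATA_PATTERN = re.compile(r"^Metadata.*$")

columns = [
    "Metadata_Source",
    "Metadata_Plate",
    "Metadata_Well",
    "Metadata_JCP2022",
    "X_0",
    "X_1",
    "X_2",
]

# "Select" keeps the names that match; "exclude" keeps the rest.
metadata_cols = [c for c in columns if METADATA_PATTERN.match(c)]
feature_cols = [c for c in columns if not METADATA_PATTERN.match(c)]

print(metadata_cols)
print(feature_cols)
```

The two lists are disjoint and together cover every column, which is why the tutorial can use one pattern for both the metadata view and the feature-only view.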
32 changes: 16 additions & 16 deletions howto/1_tutorial_basic.ipynb
@@ -12,7 +12,7 @@
"and `pyarrow`. We prefer lazy loading because the data can be too big to\n",
"be handled in memory."
],
"id": "46282db1-6e9a-41ab-af9b-f044787e286d"
"id": "71086e70-b2c9-4f8d-a5be-66bd4670f0a6"
},
{
"cell_type": "code",
@@ -24,7 +24,7 @@
"source": [
"import polars as pl"
],
"id": "b4309070"
"id": "5344fc3b"
},
{
"cell_type": "markdown",
@@ -40,7 +40,7 @@
"produce the datasets. The aws paths of the dataframes are built from a\n",
"prefix below:"
],
"id": "f1590189-b856-43c3-9ee8-478b0d63d930"
"id": "c93ac8bf-8f94-40ac-80fc-b194fe705ef1"
},
{
"cell_type": "code",
@@ -52,15 +52,15 @@
"source": [
"INDEX_FILE = \"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv\""
],
"id": "a1e55cf5"
"id": "d63c3b83"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use a version-controlled csv to release the latest corrected profiles"
],
"id": "91aacd23-600d-43b9-960a-8b1c9e2a5624"
"id": "cf3ff280-8d46-4ff2-937f-d576e5ed9eb4"
},
{
"cell_type": "code",
@@ -81,7 +81,7 @@
"profile_index = pl.read_csv(INDEX_FILE)\n",
"profile_index.head()"
],
"id": "c0f3a2fd"
"id": "a36a9eee"
},
{
"cell_type": "markdown",
@@ -90,7 +90,7 @@
"We do not need the ‘etag’ (used to check file integrity) column nor the\n",
"‘interpretable’ (i.e., before major modifications)"
],
"id": "4d95966c-de78-4b23-bbc8-d300b26fc864"
"id": "92e8f7b1-342e-45e5-99c7-52f03c23a7c9"
},
{
"cell_type": "code",
@@ -112,7 +112,7 @@
"filepaths = dict(selected_profiles.iter_rows())\n",
"print(filepaths)"
],
"id": "a68bfed0"
"id": "8b9fbe03"
},
{
"cell_type": "markdown",
@@ -121,7 +121,7 @@
"We will lazy-load the dataframes and print the number of rows and\n",
"columns"
],
"id": "50b3644f-2298-4794-9f4a-d4ace3c3fd0f"
"id": "967d212a-fe71-4708-ad68-6c01a3488bb1"
},
{
"cell_type": "code",
@@ -153,7 +153,7 @@
"\n",
"pl.DataFrame(info)"
],
"id": "6002adba"
"id": "aeafeae0"
},
{
"cell_type": "markdown",
@@ -163,7 +163,7 @@
"metadata columns. We will then sample rows and display the overview.\n",
"Note that the collect() method enforces loading some data into memory."
],
"id": "fd073545-de8a-4f4d-8405-347c77733cac"
"id": "4e17598b-8645-4335-b1f7-2436f6192da7"
},
{
"cell_type": "code",
@@ -184,15 +184,15 @@
"data = pl.scan_parquet(filepaths[\"crispr\"])\n",
"data.select(pl.col(\"^Metadata.*$\").sample(n=5, seed=1)).collect()"
],
"id": "46930b2a"
"id": "a4b72d95"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following line excludes the metadata columns:"
],
"id": "576cceb3-5491-463a-a1b6-a9a89586d02a"
"id": "960bfb4f-efe2-47fb-b03b-51699f95e79d"
},
{
"cell_type": "code",
@@ -213,7 +213,7 @@
"data_only = data.select(pl.all().exclude(\"^Metadata.*$\").sample(n=5, seed=1)).collect()\n",
"data_only"
],
"id": "bad2728b"
"id": "fe95879c"
},
{
"cell_type": "markdown",
@@ -223,7 +223,7 @@
"with that tool. Keep in mind that this loads the entire dataframe into\n",
"memory."
],
"id": "2cbf0bc1-5a9f-4b26-b852-3d98d35cb717"
"id": "81e8dd65-c331-4fb2-9bbb-f63969568a06"
},
{
"cell_type": "code",
@@ -245,7 +245,7 @@
"source": [
"data_only.to_pandas()"
],
"id": "680033c9"
"id": "3f812c4e"
}
],
"nbformat": 4,
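The notebook builds `filepaths` with `dict(selected_profiles.iter_rows())`, which works when the filtered index has exactly two columns. The sketch below shows the same idea with the standard library only. The CSV text, its column names (`subset`, `url`, `etag`) and the URLs are assumptions for illustration and are not copied from the real `profile_index.csv`; the filter is likewise a guess at the intent, since the original filter expression is not visible in this diff.

```python
import csv
import io

# Hypothetical stand-in for the index file: column names and URLs are
# assumptions, not the contents of the real manifest.
INDEX_TEXT = """subset,url,etag
crispr,https://example.org/crispr.parquet,aaa
orf,https://example.org/orf.parquet,bbb
crispr_interpretable,https://example.org/crispr_interpretable.parquet,ccc
"""

rows = csv.DictReader(io.StringIO(INDEX_TEXT))

# Keep only two columns (dropping etag) and skip "interpretable" entries.
pairs = [
    (row["subset"], row["url"])
    for row in rows
    if "interpretable" not in row["subset"]
]

# dict() over an iterable of 2-tuples maps each first item to the second.
filepaths = dict(pairs)
print(filepaths["crispr"])
```

Keeping the result as a plain dictionary means a later step can look up a dataset by name, as in `filepaths["crispr"]`.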
12 changes: 6 additions & 6 deletions howto/2_add_metadata.html
@@ -246,7 +246,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>


<p>A very common task when processing morphological profiles is knowing which ones are treatments and which ones are controls. Here we will explore how we can use broad-babel to accomplish this task.</p>
<div id="c9daedc4" class="cell" title="Imports" data-execution_count="1">
<div id="88c1356a" class="cell" title="Imports" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span>
@@ -257,7 +257,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
</div>
<p>We will be using the CRISPR dataset specified in our index csv.</p>
<div id="1e366862" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
<div id="f6cca2f7" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span>
@@ -270,7 +270,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
</div>
<p>For simplicity the contents of our processed profiles are minimal: “The profile origin” (source, plate and well) and the unique JUMP identifier for that perturbation. We will use broad-babel to further expand on this metadata, but for simplicity’s sake let us sample a subset of data.</p>
<div id="390f5f1b" class="cell" title="Subset data" data-execution_count="3">
<div id="89ffa108" class="cell" title="Subset data" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>jcp_ids <span class="op">=</span> profiles.select(pl.col(<span class="st">"Metadata_JCP2022"</span>)).unique().collect().to_series().sort()</span>
@@ -294,7 +294,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
</div>
<p>We will use these JUMP ids to obtain a mapper that indicates the perturbation type (trt, negcon or, rarely, poscon)</p>
<div id="8999f352" class="cell" title="Pull mapper" data-execution_count="4">
<div id="7a16a62b" class="cell" title="Pull mapper" data-execution_count="4">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>pert_mapper <span class="op">=</span> get_mapper(subsample, input_column<span class="op">=</span><span class="st">"JCP2022"</span>, output_columns<span class="op">=</span><span class="st">"JCP2022,pert_type"</span>)</span>
@@ -316,7 +316,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
<p>A couple of important notes about broad_babel’s get mapper and other functions: - these must be fed tuples, as these are cached and provide significant speed-ups for repeated calls - ‘get-mapper’ works for datasets for up to a few tens of thousands of samples. If you try to use it to get a mapper for the entirety of the ‘compounds’ dataset it is likely to fail. For these cases we suggest the more general function ‘run_query’. You can read more on this and other use-cases on Babel’s <a href="https://github.com/broadinstitute/monorepo/tree/main/libs/jump_babel">readme</a>.</p>
<p>We will now repeat the process to get their ‘standard’ name</p>
<div id="7e4241c6" class="cell" title="Fetch standard name" data-execution_count="5">
<div id="642516f6" class="cell" title="Fetch standard name" data-execution_count="5">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>name_mapper <span class="op">=</span> get_mapper((<span class="op">*</span>subsample, <span class="st">"JCP2022_800002"</span>), input_column<span class="op">=</span><span class="st">"JCP2022"</span>, output_columns<span class="op">=</span><span class="st">"JCP2022,standard_key"</span>)</span>
@@ -337,7 +337,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
</div>
</div>
<p>To wrap up, we will fetch all the available profiles for these perturbations and use the mappers to add the missing metadata. We also select a few features to showcase how selection can be performed in polars.</p>
<div id="55e9cce9" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
<div id="9ec566b7" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>subsample_profiles <span class="op">=</span> profiles.<span class="bu">filter</span>(pl.col(<span class="st">"Metadata_JCP2022"</span>).is_in(subsample)).collect()</span>
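The note in this file says the broad_babel functions must be fed tuples because their results are cached. One common reason for such a requirement is that cache keys must be hashable. The sketch below demonstrates this with `functools.lru_cache`; whether broad_babel uses this exact mechanism is an assumption, and `lookup` is a hypothetical placeholder, not part of its API. The id `JCP2022_000001` is also made up.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def lookup(ids):
    # Placeholder for an expensive query; the returned mapping is made up.
    return {jcp_id: "trt" for jcp_id in ids}


ids = ("JCP2022_800002", "JCP2022_000001")

first = lookup(ids)   # computed on the first call
second = lookup(ids)  # served from the cache on the second call
hits = lookup.cache_info().hits

# A list is unhashable, so it cannot be used as a cache key.
try:
    lookup(list(ids))
    list_error = None
except TypeError as err:
    list_error = err

print(hits, type(list_error).__name__)
```

Under this assumption, converting a list or polars Series to a tuple before the call is what allows repeated calls to reuse the cached result.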
24 changes: 12 additions & 12 deletions howto/2_add_metadata.ipynb
@@ -10,7 +10,7 @@
"which ones are treatments and which ones are controls. Here we will\n",
"explore how we can use broad-babel to accomplish this task."
],
"id": "fe37da80-7a0c-4ade-90e1-97544f3a5593"
"id": "94851de9-e8fe-47cd-b7ab-926158a6d961"
},
{
"cell_type": "code",
@@ -23,15 +23,15 @@
"import polars as pl\n",
"from broad_babel.query import get_mapper"
],
"id": "85cc9e7d"
"id": "62066803"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will be using the CRISPR dataset specified in our index csv."
],
"id": "3648b6a2-fd02-4778-a8ed-f66e6ca410d3"
"id": "6a01d94a-b2eb-420e-83d9-e1f79478064e"
},
{
"cell_type": "code",
@@ -54,7 +54,7 @@
"profiles = pl.scan_parquet(CRISPR_URL)\n",
"print(profiles.collect_schema().names()[:6])"
],
"id": "55727930"
"id": "c5ccf468"
},
{
"cell_type": "markdown",
@@ -65,7 +65,7 @@
"for that perturbation. We will use broad-babel to further expand on this\n",
"metadata, but for simplicity’s sake let us sample a subset of data."
],
"id": "94b719b6-37ec-482c-8755-945e613b2414"
"id": "fea49430-cce6-4a7f-8460-e5cc0982dec5"
},
{
"cell_type": "code",
@@ -101,7 +101,7 @@
"subsample = (*subsample, \"JCP2022_800002\")\n",
"subsample"
],
"id": "310e271e"
"id": "08f0325b"
},
{
"cell_type": "markdown",
@@ -110,7 +110,7 @@
"We will use these JUMP ids to obtain a mapper that indicates the\n",
"perturbation type (trt, negcon or, rarely, poscon)"
],
"id": "aa599f86-db68-4c09-ab5d-ebf2ff0fb2a0"
"id": "fb3be569-364f-42a5-a858-cc3df5b4f238"
},
{
"cell_type": "code",
@@ -143,7 +143,7 @@
"pert_mapper = get_mapper(subsample, input_column=\"JCP2022\", output_columns=\"JCP2022,pert_type\")\n",
"pert_mapper"
],
"id": "79763a3e"
"id": "80f037b5"
},
{
"cell_type": "markdown",
@@ -160,7 +160,7 @@
"\n",
"We will now repeat the process to get their ‘standard’ name"
],
"id": "787bccb8-54b2-4b3a-8c6d-4ec668aa49b5"
"id": "2e3249ff-468f-44aa-b12c-3e9e7456fe79"
},
{
"cell_type": "code",
@@ -193,7 +193,7 @@
"name_mapper = get_mapper((*subsample, \"JCP2022_800002\"), input_column=\"JCP2022\", output_columns=\"JCP2022,standard_key\")\n",
"name_mapper"
],
"id": "2e265371"
"id": "c1dc2c7e"
},
{
"cell_type": "markdown",
@@ -204,7 +204,7 @@
"select a few features to showcase how selection can be performed in\n",
"polars."
],
"id": "d17b350d-92ad-48ae-b5bc-d4ee0d67b35e"
"id": "fe339190-a4f5-4857-8989-538f1b2eac62"
},
{
"cell_type": "code",
@@ -229,7 +229,7 @@
" pl.col(\"Metadata_JCP2022\").replace(name_mapper).alias(\"name\"))\n",
"profiles_with_meta.select(pl.col((\"name\",\"pert_type\", \"^Metadata.*$\", \"^X_[0-3]$\"))).sort(by=\"pert_type\")"
],
"id": "c0a2ced8"
"id": "620bdc72"
}
],
"nbformat": 4,
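The last notebook cell uses the mappers to attach `pert_type` and `name` to each profile and then sorts by `pert_type`. The sketch below mirrors that step with plain dictionaries rather than polars. The mapper values, the id `JCP2022_000001`, the unmapped id and the feature values are all made up for illustration; `add_pert_type` is a hypothetical helper, not a broad_babel or polars function.

```python
# Hypothetical mapper: the pert_type labels come from the tutorial text
# (trt, negcon, poscon), but these id-to-label pairs were not looked up.
pert_mapper = {"JCP2022_800002": "negcon", "JCP2022_000001": "trt"}

# Made-up profile rows with one metadata column and one feature column.
profiles = [
    {"Metadata_JCP2022": "JCP2022_000001", "X_0": -0.4},
    {"Metadata_JCP2022": "JCP2022_999999", "X_0": 1.2},
    {"Metadata_JCP2022": "JCP2022_800002", "X_0": 0.1},
]


def add_pert_type(rows, mapper):
    """Return new rows with a 'pert_type' key looked up from the JCP id."""
    return [
        {**row, "pert_type": mapper.get(row["Metadata_JCP2022"])}
        for row in rows
    ]


# Ids missing from the mapper get None; sort those first via an empty key.
annotated = sorted(
    add_pert_type(profiles, pert_mapper),
    key=lambda row: row["pert_type"] or "",
)

for row in annotated:
    print(row["Metadata_JCP2022"], row["pert_type"])
```

This sketch marks ids missing from the mapper with `None` so they are easy to spot; how the polars expression in the notebook treats such ids is not shown in this diff.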