Built site for gh-pages

broadinstitute · Sep 6, 2024 · b0d4591
1 parent fb6245b
commit b0d4591
Showing 18 changed files with 317 additions and 317 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-3725b130
+c3b5a916
diff --git a/explanations/FAQ.ipynb b/explanations/FAQ.ipynb
@@ -152,7 +152,7 @@
         "        of these replicates’ value was in turn the mean of all the sites\n",
         "        and cells in a given well."
       ],
-      "id": "c448e5f3-f5e5-4fdf-ab3f-d4bb158f02a2"
+      "id": "cd7e6da4-e105-4aac-9013-15901a018da2"
     }
   ],
   "nbformat": 4,

diff --git a/explanations/Resources.ipynb b/explanations/Resources.ipynb
@@ -28,7 +28,7 @@
         "    [website](https://www.springscience.com/jump-cp) for data\n",
         "    exploration (account needed)."
       ],
-      "id": "d263bdd4-b80c-4c31-9f90-2d038be8c9db"
+      "id": "0f71de0c-bb20-4b54-aa45-4c457f1dfa9a"
     }
   ],
   "nbformat": 4,

diff --git a/howto/1_tutorial_basic.html b/howto/1_tutorial_basic.html
@@ -246,7 +246,7 @@ <h1 class="title">Access JUMP profiles</h1>
 
 
 <p>This is a tutorial on how to access profiles from the <a href="https://github.com/jump-cellpainting/datasets">JUMP Cell Painting datasets</a>. We will use polars to fetch the data frames lazily, with the help of <code>s3fs</code> and <code>pyarrow</code>. We prefer lazy loading because the data can be too big to be handled in memory.</p>
-<div id="19263eb5" class="cell" title="Imports" data-execution_count="1">
+<div id="c11c56df" class="cell" title="Imports" data-execution_count="1">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -259,14 +259,14 @@ <h1 class="title">Access JUMP profiles</h1>
 <li><code>cpg0016-jump[compound]</code>: Chemical perturbations.</li>
 </ol>
 <p>Their explicit location is determined by the transformations that produce the datasets. The aws paths of the dataframes are built from a prefix below:</p>
-<div id="8bba0cbc" class="cell" title="Paths" data-execution_count="2">
+<div id="e654a9ed" class="cell" title="Paths" data-execution_count="2">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </details>
 </div>
 <p>We use a version-controlled csv to release the latest corrected profiles</p>
-<div id="adba8b36" class="cell" data-execution_count="3">
+<div id="142720f9" class="cell" data-execution_count="3">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>profile_index <span class="op">=</span> pl.read_csv(INDEX_FILE)</span>
@@ -328,7 +328,7 @@ <h1 class="title">Access JUMP profiles</h1>
 </div>
 </div>
 <p>We do not need the ‘etag’ column (used to check file integrity) or the ‘interpretable’ column (i.e., before major modifications).</p>
-<div id="ec77f397" class="cell" data-execution_count="4">
+<div id="704783a3" class="cell" data-execution_count="4">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>selected_profiles <span class="op">=</span> profile_index.<span class="bu">filter</span>(</span>
@@ -342,7 +342,7 @@ <h1 class="title">Access JUMP profiles</h1>
 </div>
 </div>
 <p>We will lazy-load the dataframes and print the number of rows and columns</p>
-<div id="2c898974" class="cell" data-execution_count="5">
+<div id="9c6fa319" class="cell" data-execution_count="5">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>info <span class="op">=</span> {k: [] <span class="cf">for</span> k <span class="kw">in</span> (<span class="st">"dataset"</span>, <span class="st">"#rows"</span>, <span class="st">"#cols"</span>, <span class="st">"#Metadata cols"</span>, <span class="st">"Size (MB)"</span>)}</span>
@@ -415,7 +415,7 @@ <h1 class="title">Access JUMP profiles</h1>
 </div>
 </div>
 <p>Let us now focus on the <code>crispr</code> dataset and use a regex to select the metadata columns. We will then sample rows and display the overview. Note that the collect() method enforces loading some data into memory.</p>
-<div id="85fdaf08" class="cell" data-execution_count="6">
+<div id="c26c5cc6" class="cell" data-execution_count="6">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>data <span class="op">=</span> pl.scan_parquet(filepaths[<span class="st">"crispr"</span>])</span>
@@ -484,7 +484,7 @@ <h1 class="title">Access JUMP profiles</h1>
 </div>
 </div>
 <p>The following line excludes the metadata columns:</p>
-<div id="dc8e7202" class="cell" data-execution_count="7">
+<div id="67bad2d3" class="cell" data-execution_count="7">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>data_only <span class="op">=</span> data.select(pl.<span class="bu">all</span>().exclude(<span class="st">"^Metadata.*$"</span>).sample(n<span class="op">=</span><span class="dv">5</span>, seed<span class="op">=</span><span class="dv">1</span>)).collect()</span>
@@ -1050,7 +1050,7 @@ <h1 class="title">Access JUMP profiles</h1>
 </div>
 </div>
 <p>Finally, we can convert this to <code>pandas</code> if we want to perform analyses with that tool. Keep in mind that this loads the entire dataframe into memory.</p>
-<div id="ff959884" class="cell" data-execution_count="8">
+<div id="fe0dc495" class="cell" data-execution_count="8">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>data_only.to_pandas()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
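
The tutorial above selects metadata columns with the pattern `^Metadata.*$` and excludes them with the same pattern. As a minimal sketch of what that pattern matches, the snippet below applies it to a list of column names with Python's standard `re` module rather than polars. The column names and the helper `split_columns` are illustrative assumptions, not taken from the dataset or the tutorial code.

```python
import re

# Hypothetical column names, for illustration only.
columns = [
    "Metadata_Source",
    "Metadata_Plate",
    "Metadata_Well",
    "Metadata_JCP2022",
    "X_1",
    "X_2",
]

# Same pattern the tutorial passes to pl.col(...) and .exclude(...).
METADATA_PATTERN = re.compile(r"^Metadata.*$")


def split_columns(names):
    """Split column names into (metadata, features) using the pattern."""
    metadata = [n for n in names if METADATA_PATTERN.match(n)]
    features = [n for n in names if not METADATA_PATTERN.match(n)]
    return metadata, features


metadata_cols, feature_cols = split_columns(columns)
print(metadata_cols)
print(feature_cols)
```

Every name that starts with `Metadata` lands in the first list, and everything else is treated as a feature column.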

diff --git a/howto/1_tutorial_basic.ipynb b/howto/1_tutorial_basic.ipynb
@@ -12,7 +12,7 @@
         "and `pyarrow`. We prefer lazy loading because the data can be too big to\n",
         "be handled in memory."
       ],
-      "id": "46282db1-6e9a-41ab-af9b-f044787e286d"
+      "id": "71086e70-b2c9-4f8d-a5be-66bd4670f0a6"
     },
     {
       "cell_type": "code",
@@ -24,7 +24,7 @@
       "source": [
         "import polars as pl"
       ],
-      "id": "b4309070"
+      "id": "5344fc3b"
     },
     {
       "cell_type": "markdown",
@@ -40,7 +40,7 @@
         "produce the datasets. The aws paths of the dataframes are built from a\n",
         "prefix below:"
       ],
-      "id": "f1590189-b856-43c3-9ee8-478b0d63d930"
+      "id": "c93ac8bf-8f94-40ac-80fc-b194fe705ef1"
     },
     {
       "cell_type": "code",
@@ -52,15 +52,15 @@
       "source": [
         "INDEX_FILE = \"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv\""
       ],
-      "id": "a1e55cf5"
+      "id": "d63c3b83"
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "We use a version-controlled csv to release the latest corrected profiles"
       ],
-      "id": "91aacd23-600d-43b9-960a-8b1c9e2a5624"
+      "id": "cf3ff280-8d46-4ff2-937f-d576e5ed9eb4"
     },
     {
       "cell_type": "code",
@@ -81,7 +81,7 @@
         "profile_index = pl.read_csv(INDEX_FILE)\n",
         "profile_index.head()"
       ],
-      "id": "c0f3a2fd"
+      "id": "a36a9eee"
     },
     {
       "cell_type": "markdown",
@@ -90,7 +90,7 @@
         "We do not need the ‘etag’ column (used to check file integrity) or the\n",
         "‘interpretable’ column (i.e., before major modifications)."
       ],
-      "id": "4d95966c-de78-4b23-bbc8-d300b26fc864"
+      "id": "92e8f7b1-342e-45e5-99c7-52f03c23a7c9"
     },
     {
       "cell_type": "code",
@@ -112,7 +112,7 @@
         "filepaths = dict(selected_profiles.iter_rows())\n",
         "print(filepaths)"
       ],
-      "id": "a68bfed0"
+      "id": "8b9fbe03"
     },
     {
       "cell_type": "markdown",
@@ -121,7 +121,7 @@
         "We will lazy-load the dataframes and print the number of rows and\n",
         "columns"
       ],
-      "id": "50b3644f-2298-4794-9f4a-d4ace3c3fd0f"
+      "id": "967d212a-fe71-4708-ad68-6c01a3488bb1"
     },
     {
       "cell_type": "code",
@@ -153,7 +153,7 @@
         "\n",
         "pl.DataFrame(info)"
       ],
-      "id": "6002adba"
+      "id": "aeafeae0"
     },
     {
       "cell_type": "markdown",
@@ -163,7 +163,7 @@
         "metadata columns. We will then sample rows and display the overview.\n",
         "Note that the collect() method enforces loading some data into memory."
       ],
-      "id": "fd073545-de8a-4f4d-8405-347c77733cac"
+      "id": "4e17598b-8645-4335-b1f7-2436f6192da7"
     },
     {
       "cell_type": "code",
@@ -184,15 +184,15 @@
         "data = pl.scan_parquet(filepaths[\"crispr\"])\n",
         "data.select(pl.col(\"^Metadata.*$\").sample(n=5, seed=1)).collect()"
       ],
-      "id": "46930b2a"
+      "id": "a4b72d95"
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "The following line excludes the metadata columns:"
       ],
-      "id": "576cceb3-5491-463a-a1b6-a9a89586d02a"
+      "id": "960bfb4f-efe2-47fb-b03b-51699f95e79d"
     },
     {
       "cell_type": "code",
@@ -213,7 +213,7 @@
         "data_only = data.select(pl.all().exclude(\"^Metadata.*$\").sample(n=5, seed=1)).collect()\n",
         "data_only"
       ],
-      "id": "bad2728b"
+      "id": "fe95879c"
     },
     {
       "cell_type": "markdown",
@@ -223,7 +223,7 @@
         "with that tool. Keep in mind that this loads the entire dataframe into\n",
         "memory."
       ],
-      "id": "2cbf0bc1-5a9f-4b26-b852-3d98d35cb717"
+      "id": "81e8dd65-c331-4fb2-9bbb-f63969568a06"
     },
     {
       "cell_type": "code",
@@ -245,7 +245,7 @@
       "source": [
         "data_only.to_pandas()"
       ],
-      "id": "680033c9"
+      "id": "3f812c4e"
     }
   ],
   "nbformat": 4,
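
The notebook builds `filepaths` with `dict(selected_profiles.iter_rows())`, which works only when each row has exactly two fields. The sketch below shows that step in isolation with the standard `csv` module. The index text, its column names (`subset`, `url`) and the `example.org` URLs are hypothetical stand-ins, not the contents of the real profile index.

```python
import csv
import io

# Hypothetical two-column index; the real index has more columns and real URLs.
INDEX_TEXT = """subset,url
crispr,https://example.org/crispr.parquet
orf,https://example.org/orf.parquet
"""

reader = csv.reader(io.StringIO(INDEX_TEXT))
next(reader)  # skip the header row
rows = [tuple(row) for row in reader]

# dict() accepts any iterable of (key, value) pairs, so two-field rows
# map directly to {dataset name: path}.
filepaths = dict(rows)
print(filepaths["crispr"])
```

If a row carried a third field, `dict()` would raise a `ValueError`, which is presumably why the unneeded columns are dropped before this step.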

diff --git a/howto/2_add_metadata.html b/howto/2_add_metadata.html
@@ -246,7 +246,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 
 
 <p>A very common task when processing morphological profiles is knowing which ones are treatments and which ones are controls. Here we will explore how we can use broad-babel to accomplish this task.</p>
-<div id="c9daedc4" class="cell" title="Imports" data-execution_count="1">
+<div id="88c1356a" class="cell" title="Imports" data-execution_count="1">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> polars <span class="im">as</span> pl</span>
@@ -257,7 +257,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>We will be using the CRISPR dataset specified in our index csv.</p>
-<div id="1e366862" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
+<div id="f6cca2f7" class="cell" title="Fetch the CRISPR dataset" data-execution_count="2">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>INDEX_FILE <span class="op">=</span> <span class="st">"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"</span></span>
@@ -270,7 +270,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>For simplicity, the contents of our processed profiles are minimal: “The profile origin” (source, plate and well) and the unique JUMP identifier for that perturbation. We will use broad-babel to further expand on this metadata, but for simplicity’s sake let us sample a subset of the data.</p>
-<div id="390f5f1b" class="cell" title="Subset data" data-execution_count="3">
+<div id="89ffa108" class="cell" title="Subset data" data-execution_count="3">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>jcp_ids <span class="op">=</span> profiles.select(pl.col(<span class="st">"Metadata_JCP2022"</span>)).unique().collect().to_series().sort()</span>
@@ -294,7 +294,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>We will use these JUMP ids to obtain a mapper that indicates the perturbation type (trt, negcon or, rarely, poscon)</p>
-<div id="8999f352" class="cell" title="Pull mapper" data-execution_count="4">
+<div id="7a16a62b" class="cell" title="Pull mapper" data-execution_count="4">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>pert_mapper <span class="op">=</span> get_mapper(subsample, input_column<span class="op">=</span><span class="st">"JCP2022"</span>, output_columns<span class="op">=</span><span class="st">"JCP2022,pert_type"</span>)</span>
@@ -316,7 +316,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 <p>A couple of important notes about broad_babel’s <code>get_mapper</code> and other functions: (1) they must be fed tuples, because calls are cached, which provides significant speed-ups for repeated calls; (2) <code>get_mapper</code> works for datasets of up to a few tens of thousands of samples. If you try to use it to get a mapper for the entirety of the ‘compounds’ dataset, it is likely to fail. For these cases we suggest the more general function <code>run_query</code>. You can read more on this and other use cases in Babel’s <a href="https://github.com/broadinstitute/monorepo/tree/main/libs/jump_babel">readme</a>.</p>
 <p>We will now repeat the process to get their ‘standard’ name</p>
-<div id="7e4241c6" class="cell" title="Fetch standard name" data-execution_count="5">
+<div id="642516f6" class="cell" title="Fetch standard name" data-execution_count="5">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>name_mapper <span class="op">=</span> get_mapper((<span class="op">*</span>subsample, <span class="st">"JCP2022_800002"</span>), input_column<span class="op">=</span><span class="st">"JCP2022"</span>, output_columns<span class="op">=</span><span class="st">"JCP2022,standard_key"</span>)</span>
@@ -337,7 +337,7 @@ <h1 class="title">Incorporate metadata into profiles</h1>
 </div>
 </div>
 <p>To wrap up, we will fetch all the available profiles for these perturbations and use the mappers to add the missing metadata. We also select a few features to showcase how selection can be performed in polars.</p>
-<div id="55e9cce9" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
+<div id="9ec566b7" class="cell" title="Filter profiles and merge metadata" data-execution_count="6">
 <details class="code-fold">
 <summary>Code</summary>
 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>subsample_profiles <span class="op">=</span> profiles.<span class="bu">filter</span>(pl.col(<span class="st">"Metadata_JCP2022"</span>).is_in(subsample)).collect()</span>
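
The page notes that broad_babel's functions must be fed tuples because calls are cached. A minimal sketch of why that matters follows, assuming the caching behaves like Python's `functools.lru_cache`; broad_babel's actual caching mechanism is not shown in this diff. The function `lookup` and the second identifier are hypothetical stand-ins, not part of broad_babel.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def lookup(ids):
    """Stand-in for a cached query: returns a made-up id -> label mapping."""
    return {i: f"label_for_{i}" for i in ids}


# "JCP2022_800002" appears in the tutorial; the second ID is hypothetical.
ids = ("JCP2022_800002", "JCP2022_000000")

first = lookup(ids)
second = lookup(ids)  # same tuple, so this call is served from the cache
print(lookup.cache_info().hits)

try:
    lookup(list(ids))  # lists are unhashable, so the cache cannot key on them
except TypeError as err:
    print("TypeError:", err)
```

Under this assumption a tuple can serve as a cache key because it is hashable, whereas a list fails before the function body runs.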

diff --git a/howto/2_add_metadata.ipynb b/howto/2_add_metadata.ipynb
@@ -10,7 +10,7 @@
         "which ones are treatments and which ones are controls. Here we will\n",
         "explore how we can use broad-babel to accomplish this task."
       ],
-      "id": "fe37da80-7a0c-4ade-90e1-97544f3a5593"
+      "id": "94851de9-e8fe-47cd-b7ab-926158a6d961"
     },
     {
       "cell_type": "code",
@@ -23,15 +23,15 @@
         "import polars as pl\n",
         "from broad_babel.query import get_mapper"
       ],
-      "id": "85cc9e7d"
+      "id": "62066803"
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "We will be using the CRISPR dataset specified in our index csv."
       ],
-      "id": "3648b6a2-fd02-4778-a8ed-f66e6ca410d3"
+      "id": "6a01d94a-b2eb-420e-83d9-e1f79478064e"
     },
     {
       "cell_type": "code",
@@ -54,7 +54,7 @@
         "profiles = pl.scan_parquet(CRISPR_URL)\n",
         "print(profiles.collect_schema().names()[:6])"
       ],
-      "id": "55727930"
+      "id": "c5ccf468"
     },
     {
       "cell_type": "markdown",
@@ -65,7 +65,7 @@
         "for that perturbation. We will use broad-babel to further expand on this\n",
         "metadata, but for simplicity’s sake let us sample a subset of the data."
       ],
-      "id": "94b719b6-37ec-482c-8755-945e613b2414"
+      "id": "fea49430-cce6-4a7f-8460-e5cc0982dec5"
     },
     {
       "cell_type": "code",
@@ -101,7 +101,7 @@
         "subsample = (*subsample, \"JCP2022_800002\")\n",
         "subsample"
       ],
-      "id": "310e271e"
+      "id": "08f0325b"
     },
     {
       "cell_type": "markdown",
@@ -110,7 +110,7 @@
         "We will use these JUMP ids to obtain a mapper that indicates the\n",
         "perturbation type (trt, negcon or, rarely, poscon)"
       ],
-      "id": "aa599f86-db68-4c09-ab5d-ebf2ff0fb2a0"
+      "id": "fb3be569-364f-42a5-a858-cc3df5b4f238"
     },
     {
       "cell_type": "code",
@@ -143,7 +143,7 @@
         "pert_mapper = get_mapper(subsample, input_column=\"JCP2022\", output_columns=\"JCP2022,pert_type\")\n",
         "pert_mapper"
       ],
-      "id": "79763a3e"
+      "id": "80f037b5"
     },
     {
       "cell_type": "markdown",
@@ -160,7 +160,7 @@
         "\n",
         "We will now repeat the process to get their ‘standard’ name"
       ],
-      "id": "787bccb8-54b2-4b3a-8c6d-4ec668aa49b5"
+      "id": "2e3249ff-468f-44aa-b12c-3e9e7456fe79"
     },
     {
       "cell_type": "code",
@@ -193,7 +193,7 @@
         "name_mapper = get_mapper((*subsample, \"JCP2022_800002\"), input_column=\"JCP2022\", output_columns=\"JCP2022,standard_key\")\n",
         "name_mapper"
       ],
-      "id": "2e265371"
+      "id": "c1dc2c7e"
     },
     {
       "cell_type": "markdown",
@@ -204,7 +204,7 @@
         "select a few features to showcase how selection can be performed in\n",
         "polars."
       ],
-      "id": "d17b350d-92ad-48ae-b5bc-d4ee0d67b35e"
+      "id": "fe339190-a4f5-4857-8989-538f1b2eac62"
     },
     {
       "cell_type": "code",
@@ -229,7 +229,7 @@
         "                                pl.col(\"Metadata_JCP2022\").replace(name_mapper).alias(\"name\"))\n",
         "profiles_with_meta.select(pl.col((\"name\",\"pert_type\", \"^Metadata.*$\", \"^X_[0-3]$\"))).sort(by=\"pert_type\")"
       ],
-      "id": "c0a2ced8"
+      "id": "620bdc72"
     }
   ],
   "nbformat": 4,