Code
import polars as pl
Retrieve JUMP profiles
cpg0016-jump[compound]: Chemical perturbations.
Their explicit location is determined by the transformations that produce the datasets. The AWS paths of the dataframes are built from the prefix below:
-Code
INDEX_FILE = "https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"
We use a version-controlled CSV to release the latest corrected profiles.
-Code
profile_index = pl.read_csv(INDEX_FILE)
@@ -340,7 +340,7 @@ profile_index Retrieve JUMP profiles
We need neither the ‘etag’ column (used to check file integrity) nor the ‘interpretable’ profiles (i.e., those before major modifications).
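The notebook performs this step with polars (pl.read_csv followed by filter). As a dependency-free illustration of the same idea, the sketch below parses a miniature manifest with Python's standard csv module and keeps only the wanted subsets, never reading the etag column. The manifest contents, including the column names subset, url and etag and the crispr_interpretable row, are hypothetical placeholders, not the real profile_index.csv.

```python
import csv
import io

# Hypothetical miniature manifest; the real profile_index.csv may use
# different column names and certainly different URLs.
MANIFEST = """subset,url,etag
compound,https://example.org/compound.parquet,abc123
crispr,https://example.org/crispr.parquet,def456
orf,https://example.org/orf.parquet,ghi789
crispr_interpretable,https://example.org/crispr_interpretable.parquet,jkl012
"""

WANTED = ("compound", "crispr", "orf")


def load_filepaths(text, wanted=WANTED):
    """Map each wanted subset to its URL; the etag column is simply not read."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["subset"]: row["url"] for row in reader if row["subset"] in wanted}


filepaths = load_filepaths(MANIFEST)
print(filepaths)
```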
-Code
selected_profiles = profile_index.filter(
@@ -354,7 +354,7 @@ selected_profiles Retrieve JUMP profiles
We will lazy-load the dataframes and print the number of rows and columns of each.
-Code
info = {k: [] for k in ("dataset", "#rows", "#cols", "#Metadata cols", "Size (MB)")}
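The info dictionary collects one list per column of the summary table. A minimal sketch of how it could be filled, assuming we already know each dataset's column names, row count and file size (the numbers below are hypothetical; in polars the column names of a LazyFrame can be read without loading the data via collect_schema(), as the add-metadata notebook does). Taking 1 MB as 2**20 bytes is our assumption.

```python
import re

# Hypothetical dataset descriptions; real JUMP profiles have many more columns.
datasets = {
    "crispr": {
        "columns": ["Metadata_Source", "Metadata_Plate", "Metadata_Well",
                    "Metadata_JCP2022", "X_0", "X_1", "X_2"],
        "n_rows": 1000,
        "size_bytes": 5 * 2**20,
    },
    "orf": {
        "columns": ["Metadata_Source", "Metadata_Plate", "Metadata_Well",
                    "Metadata_JCP2022", "X_0", "X_1"],
        "n_rows": 500,
        "size_bytes": 2**20,
    },
}

METADATA = re.compile(r"^Metadata")

info = {k: [] for k in ("dataset", "#rows", "#cols", "#Metadata cols", "Size (MB)")}
for name, d in datasets.items():
    info["dataset"].append(name)
    info["#rows"].append(d["n_rows"])
    info["#cols"].append(len(d["columns"]))
    # Count the columns whose name starts with "Metadata"
    info["#Metadata cols"].append(sum(1 for c in d["columns"] if METADATA.match(c)))
    info["Size (MB)"].append(round(d["size_bytes"] / 2**20, 2))

# One printed row per dataset
for row in zip(*info.values()):
    print(row)
```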
@@ -427,7 +427,7 @@ info Retrieve JUMP profiles
Let us now focus on the crispr dataset and use a regex to select the metadata columns. We will then sample rows and display the overview. Note that the collect() method forces some data to be loaded into memory.
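The regex selection and seeded sampling can be mimicked in plain Python to see what they do: keep only the columns matching ^Metadata.*$ and draw five rows reproducibly. The rows are synthetic, and random.Random(1) will not pick the same rows as polars' sample(n=5, seed=1); only the idea carries over.

```python
import random
import re

# Synthetic profiles: four metadata columns plus two feature columns
rows = [
    {
        "Metadata_Source": f"source_{i % 3}",
        "Metadata_Plate": f"plate_{i % 4}",
        "Metadata_Well": f"A{i:02d}",
        "Metadata_JCP2022": f"JCP2022_{i:06d}",
        "X_0": i * 0.1,
        "X_1": -i * 0.1,
    }
    for i in range(20)
]

METADATA = re.compile(r"^Metadata.*$")
meta_cols = [c for c in rows[0] if METADATA.match(c)]

# A seeded generator makes the sample reproducible across runs
rng = random.Random(1)
sample = [{c: row[c] for c in meta_cols} for row in rng.sample(rows, k=5)]
for r in sample:
    print(r)
```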
Code
data = pl.scan_parquet(filepaths["crispr"])
@@ -496,7 +496,7 @@ data Retrieve JUMP profiles
The following line excludes the metadata columns:
-Code
data_only = data.select(pl.all().exclude("^Metadata.*$").sample(n=5, seed=1)).collect()
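exclude() keeps the complement of the regex match, so the metadata columns and the feature columns partition the full column list. A quick plain-Python check of that property, using hypothetical column names:

```python
import re

columns = ["Metadata_Source", "Metadata_Plate", "Metadata_Well",
           "Metadata_JCP2022", "X_0", "X_1", "X_2"]

METADATA = re.compile(r"^Metadata.*$")
metadata_cols = [c for c in columns if METADATA.match(c)]
# The complement of the regex selection, analogous to pl.all().exclude("^Metadata.*$")
feature_cols = [c for c in columns if not METADATA.match(c)]

print(feature_cols)  # ['X_0', 'X_1', 'X_2']
```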
@@ -1062,7 +1062,7 @@ data_only Retrieve JUMP profiles
Finally, we can convert this to pandas if we want to perform analyses with that tool. Keep in mind that this loads the entire dataframe into memory.
Code
data_only.to_pandas()
Incorporate metadata into profiles
A very common task when processing morphological profiles is knowing which ones are treatments and which ones are controls. Here we will explore how we can use broad-babel to accomplish this task.
-Code
import polars as pl
@@ -266,7 +266,7 @@ Incorporate metadata into profiles
We will be using the CRISPR dataset specified in our index CSV.
-Code
INDEX_FILE = "https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"
@@ -279,7 +279,7 @@ INDEX_FILE Incorporate metadata into profiles
For simplicity, the contents of our processed profiles are minimal: the profile origin (source, plate and well) and the unique JUMP identifier for that perturbation. We will use broad-babel to further expand on this metadata, but for simplicity’s sake let us sample a subset of the data.
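The sampling step amounts to taking a few unique JUMP IDs and appending the ID that the notebook source appends (JCP2022_800002), which we take to be the known negative control mentioned in the activity tutorial. A plain-Python sketch with synthetic profiles; sorting the unique IDs before sampling keeps the result reproducible, since the iteration order of a set of strings can change between interpreter runs.

```python
import random

# Synthetic profiles: (source, plate, well, JUMP ID); 8 distinct IDs in total
profiles = [
    (f"source_{i % 2}", f"plate_{i % 3}", f"A{i:02d}", f"JCP2022_{i % 8:06d}")
    for i in range(40)
]

# Sort the unique IDs so the sample does not depend on set iteration order
unique_ids = sorted({p[3] for p in profiles})
rng = random.Random(42)
subsample = tuple(rng.sample(unique_ids, k=3))

# Append the JUMP ID the notebook adds as a known negative control
subsample = (*subsample, "JCP2022_800002")
print(subsample)
```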
-Code
jcp_ids = (
@@ -305,7 +305,7 @@ jcp_ids Incorporate metadata into profiles
We will use these JUMP IDs to obtain a mapper that indicates the perturbation type (trt, negcon or, rarely, poscon).
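Assuming the mapper behaves like a plain dictionary from JUMP ID to perturbation type, labelling profiles is one lookup per row. The mapper contents and IDs below are hypothetical and do not come from broad-babel; IDs missing from the mapper fall back to None so they are easy to spot.

```python
from collections import Counter

# Hypothetical mapper; a real one would come from broad_babel's get_mapper
pert_mapper = {
    "JCP2022_000001": "trt",
    "JCP2022_000002": "trt",
    "JCP2022_800002": "negcon",
}

profile_ids = ["JCP2022_000001", "JCP2022_800002", "JCP2022_000002",
               "JCP2022_800002", "JCP2022_999999"]

# dict.get returns None for IDs missing from the mapper
labels = [pert_mapper.get(jcp) for jcp in profile_ids]
counts = Counter(labels)
print(counts)
```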
-Code
pert_mapper = get_mapper(
@@ -329,7 +329,7 @@ pert_mapper Incorporate metadata into profiles
A couple of important notes about broad_babel’s get_mapper and other functions:
- They must be fed tuples, as their results are cached, which provides significant speed-ups for repeated calls.
- ‘get_mapper’ works for datasets of up to a few tens of thousands of samples. If you try to use it to get a mapper for the entirety of the ‘compounds’ dataset, it is likely to fail. For these cases we suggest the more general function ‘run_query’. You can read more on this and other use cases in Babel’s readme.
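Why tuples? Python's standard caching decorators key the cache on the call arguments, which must therefore be hashable; tuples are, lists are not. The sketch below applies functools.lru_cache to a hypothetical lookup function to show both the speed-up mechanism (the second identical call never reaches the function body) and the failure mode. Whether broad_babel uses lru_cache internally is an assumption on our part.

```python
from functools import lru_cache

calls = []  # records every time the function body actually runs


@lru_cache(maxsize=None)
def lookup(query):
    """Hypothetical stand-in for an expensive database query."""
    calls.append(query)
    # Return an immutable result so cached values cannot be mutated by callers
    return tuple((q, q.lower()) for q in query)


first = lookup(("CHRM4", "LY6K"))
second = lookup(("CHRM4", "LY6K"))  # served from the cache
print(len(calls))  # 1

try:
    lookup(["CHRM4", "LY6K"])  # lists are unhashable, so the cache key fails
    list_rejected = False
except TypeError:
    list_rejected = True
print(list_rejected)  # True
```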
We will now repeat the process to get their ‘standard’ names.
-Code
name_mapper = get_mapper(
@@ -354,7 +354,7 @@ name_mapper Incorporate metadata into profiles
To wrap up, we will fetch all the available profiles for these perturbations and use the mappers to add the missing metadata. We also select a few features to showcase how selection can be performed in polars.
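Adding the metadata boils down to two dictionary lookups per profile, followed by a column selection with the same regexes the notebook uses (^Metadata.*$ and ^X_[0-3]$) and a sort on pert_type. A stdlib sketch with hypothetical profiles and mappers (the names GENE_A and CONTROL_A are placeholders):

```python
import re

# Hypothetical profiles and mappers for illustration only
profiles = [
    {"Metadata_JCP2022": "JCP2022_000001", "Metadata_Well": "A01",
     "X_0": 0.1, "X_1": 0.2, "X_2": 0.3, "X_3": 0.4, "X_4": 0.5},
    {"Metadata_JCP2022": "JCP2022_800002", "Metadata_Well": "B01",
     "X_0": 0.0, "X_1": 0.0, "X_2": 0.1, "X_3": 0.0, "X_4": 0.2},
]
name_mapper = {"JCP2022_000001": "GENE_A", "JCP2022_800002": "CONTROL_A"}
pert_mapper = {"JCP2022_000001": "trt", "JCP2022_800002": "negcon"}

KEEP = [re.compile(p) for p in (r"^Metadata.*$", r"^X_[0-3]$")]

annotated = []
for p in profiles:
    jcp = p["Metadata_JCP2022"]
    row = {"name": name_mapper[jcp], "pert_type": pert_mapper[jcp]}
    # Keep only the columns matching one of the regexes
    row.update({k: v for k, v in p.items() if any(r.match(k) for r in KEEP)})
    annotated.append(row)

annotated.sort(key=lambda r: r["pert_type"])
for row in annotated:
    print(row)
```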
-Code
= profiles.filter(
diff --git a/howto/2_add_metadata.ipynb b/howto/2_add_metadata.ipynb
index 1094561..bbea796 100644
--- a/howto/2_add_metadata.ipynb
+++ b/howto/2_add_metadata.ipynb
@@ -10,7 +10,7 @@
"which ones are treatments and which ones are controls. Here we will\n",
"explore how we can use broad-babel to accomplish this task."
],
- "id": "45ca3be5-e1cb-426f-9d42-e26792dc9315"
+ "id": "da3ac58d-0ed3-4e12-8542-7f71905cd2e7"
},
{
"cell_type": "code",
@@ -23,7 +23,7 @@
"import polars as pl\n",
"from broad_babel.query import get_mapper"
],
- "id": "6a23b41f"
+ "id": "e86bf6be"
},
{
"cell_type": "markdown",
@@ -31,7 +31,7 @@
"source": [
"We will be using the CRISPR dataset specificed in our index csv."
],
- "id": "4e2482bb-d195-4492-87d4-405c31d6a5a1"
+ "id": "0153dfa3-79c6-4134-8eb0-dfbdaf1b055e"
},
{
"cell_type": "code",
@@ -54,7 +54,7 @@
"profiles = pl.scan_parquet(CRISPR_URL)\n",
"print(profiles.collect_schema().names()[:6])"
],
- "id": "e5a8c1da"
+ "id": "038abb17"
},
{
"cell_type": "markdown",
@@ -65,7 +65,7 @@
"for that perturbation. We will use broad-babel to further expand on this\n",
"metadata, but for simplicity’s sake let us sample subset of data."
],
- "id": "78e7df4f-24ba-4313-b05c-65adacda1d8e"
+ "id": "81c05e27-1ce5-4a89-af6e-58da9958dbb3"
},
{
"cell_type": "code",
@@ -103,7 +103,7 @@
"subsample = (*subsample, \"JCP2022_800002\")\n",
"subsample"
],
- "id": "8639fb0b"
+ "id": "13c5e916"
},
{
"cell_type": "markdown",
@@ -112,7 +112,7 @@
"We will use these JUMP ids to obtain a mapper that indicates the\n",
"perturbation type (trt, negcon or, rarely, poscon)"
],
- "id": "a2478cfe-073c-4c47-89ab-83928eb806e5"
+ "id": "2f8be60a-99c5-4b97-936f-656887dc6e8d"
},
{
"cell_type": "code",
@@ -147,7 +147,7 @@
")\n",
"pert_mapper"
],
- "id": "6f5caa7e"
+ "id": "90684b27"
},
{
"cell_type": "markdown",
@@ -164,7 +164,7 @@
"\n",
"We will now repeat the process to get their ‘standard’ name"
],
- "id": "156cb830-0a02-462a-9d23-d11f0c5bbc3d"
+ "id": "92af4ce5-15b4-4bff-b845-b60d82066fcb"
},
{
"cell_type": "code",
@@ -201,7 +201,7 @@
")\n",
"name_mapper"
],
- "id": "9fbd9b1a"
+ "id": "e5ad1d97"
},
{
"cell_type": "markdown",
@@ -212,7 +212,7 @@
"select a few features to showcase how how selection can be performed in\n",
"polars."
],
- "id": "bfebdbe0-edb9-4e6a-b0da-8a818621dab7"
+ "id": "79f47295-6816-4345-88dc-56300a6d27e1"
},
{
"cell_type": "code",
@@ -243,7 +243,7 @@
" pl.col((\"name\", \"pert_type\", \"^Metadata.*$\", \"^X_[0-3]$\"))\n",
").sort(by=\"pert_type\")"
],
- "id": "8b57de41"
+ "id": "ff9e1a6f"
}
],
"nbformat": 4,
diff --git a/howto/3_calculate_activity.html b/howto/3_calculate_activity.html
index 3019fa0..6293fcf 100644
--- a/howto/3_calculate_activity.html
+++ b/howto/3_calculate_activity.html
@@ -259,7 +259,7 @@ subsample_profiles Calculate phenotypic activity
A common first analysis for morphological datasets is the activity of the cells’ phenotypes. We will use the copairs package, which uses mean average precision to obtain a metric of replicability for any set of morphological profiles. In other words, it indicates how similar a given set of perturbations are to each other, relative to their negative controls, which are usually cells that have experienced no perturbation.
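To make the metric concrete, here is a from-scratch sketch of average precision for a single query profile: rank the candidates by similarity to the query and average the precision at each rank where a replicate (positive) appears. copairs does this for many profiles at once and far more efficiently; the use of cosine similarity and the toy two-dimensional vectors are our assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def average_precision(query, candidates, is_positive):
    """AP of retrieving the positives when candidates are ranked by similarity to query."""
    sims = [cosine(query, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: sims[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive's rank
    return sum(precisions) / len(precisions) if precisions else 0.0


query = [1.0, 0.0]
# Replicates (positives) point the same way as the query; negative controls do not
clean = average_precision(query, [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]],
                          [True, True, False, False])
# Here a negative control outranks every replicate
mixed = average_precision(query, [[1.0, 0.05], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0]],
                          [False, True, True, False])
print(clean, round(mixed, 4))  # 1.0 0.5833
```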
-
+
Code
import polars as pl
@@ -270,7 +270,7 @@ Calculate phenotypic activity
We will be using the CRISPR dataset specified in our index CSV, but we will select a subset of perturbations and the controls present.
-
+
Code
INDEX_FILE = "https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"
@@ -279,7 +279,7 @@ INDEX_FILE Calculate phenotypic activity
Sample perturbations and add a known negative control.
-
+
Code
jcp_ids = (
@@ -298,7 +298,7 @@ jcp_ids Calculate phenotypic activity
Now we create a mapper to label treatments and controls. See the previous tutorial for details on fetching metadata.
-
+
Code
pert_mapper = get_mapper(
@@ -310,7 +310,7 @@ pert_mapper Calculate phenotypic activity
Finally we use the parameters from . See the copairs wiki for more details on the parameters that copairs requires.
-
+
Code
pos_sameby = ["Metadata_JCP2022"]  # We want to match perturbations
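The *_sameby and *_diffby parameters describe which pairs of profiles get compared: as we read them, ‘sameby’ columns must be equal within a pair and ‘diffby’ columns must differ. A small sketch of that pairing rule on hypothetical rows; the negative-pair columns chosen below are illustrative, not necessarily the ones the notebook uses, and copairs' own implementation is much more efficient than this quadratic loop.

```python
from itertools import combinations


def matches(a, b, sameby, diffby):
    """True if a and b agree on every sameby column and differ on every diffby column."""
    return all(a[c] == b[c] for c in sameby) and all(a[c] != b[c] for c in diffby)


def build_pairs(rows, sameby, diffby):
    """Index pairs (i, j), i < j, satisfying the sameby/diffby rule."""
    return [(i, j) for i, j in combinations(range(len(rows)), 2)
            if matches(rows[i], rows[j], sameby, diffby)]


# Hypothetical profiles: one treatment and one negative control on two plates
rows = [
    {"Metadata_JCP2022": "JCP_A", "Metadata_Plate": "P1", "Metadata_pert_type": "trt"},
    {"Metadata_JCP2022": "JCP_A", "Metadata_Plate": "P2", "Metadata_pert_type": "trt"},
    {"Metadata_JCP2022": "JCP_NEG", "Metadata_Plate": "P1", "Metadata_pert_type": "negcon"},
    {"Metadata_JCP2022": "JCP_NEG", "Metadata_Plate": "P2", "Metadata_pert_type": "negcon"},
]

# Positives: replicates of the same perturbation
pos_pairs = build_pairs(rows, sameby=["Metadata_JCP2022"], diffby=[])
# Negatives (hypothetical choice): same plate, different perturbation type
neg_pairs = build_pairs(rows, sameby=["Metadata_Plate"], diffby=["Metadata_pert_type"])
print(pos_pairs, neg_pairs)
```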
@@ -339,12 +339,12 @@ pos_sameby Calculate phenotypic activity
@@ -439,7 +439,7 @@ Calculate phenotypic activity
The result of copairs is a dataframe containing, in addition to the original metadata, the average precision with which perturbations were retrieved. Perturbations that look more similar to each other than to the negative controls present in the same plates will have a higher average precision. Perturbations that do not differentiate themselves from negative controls will be closer to zero.
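Since the scores come per profile, summarising them per perturbation is a group-by and a mean, which gives the mean average precision (mAP). A plain-Python sketch with made-up AP values, ranking perturbations from most to least distinguishable from the negative controls:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (perturbation, average precision) results
results = [
    ("JCP_A", 0.9), ("JCP_A", 0.7),
    ("JCP_B", 0.1), ("JCP_B", 0.0),
    ("JCP_C", 0.5), ("JCP_C", 0.4),
]

by_pert = defaultdict(list)
for jcp, ap in results:
    by_pert[jcp].append(ap)

# Mean average precision per perturbation, highest first
map_scores = sorted(((jcp, mean(aps)) for jcp, aps in by_pert.items()),
                    key=lambda item: item[1], reverse=True)
for jcp, score in map_scores:
    print(f"{jcp}: {score:.2f}")
```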
To wrap up, we pull the standard gene symbols and plot the distribution of average precision.
-
+
Code
name_mapper = get_mapper(
@@ -467,7 +467,7 @@ name_mapper Calculate phenotypic activity
+
@@ -100,9 +100,6 @@
"search-label": "Search"
}
}
-
-
-
@@ -258,7 +255,7 @@ Query information of genes
This how-to focuses on linking gene names to external resources. Whilst not JUMP-specific, it is useful for fetching more information on perturbations that our analyses deem important without having to search for them manually. We will use Biopython; this how-to only explores a subset of its options, and the full Entrez documentation, which contains all of them, is a useful reference to keep at hand.
-
+
Code
import polars as pl
@@ -267,7 +264,7 @@ Query information of genes
We define the email address to send with our Entrez requests and the summary fields we want to retrieve.
-
+
Code
Entrez.email = "example@email.com"
@@ -280,14 +277,14 @@ Entrez.email Query information of genes
We will use a set of genes that we found in a JUMP cluster as an example.
-
+
Code
genes = ("CHRM4", "SCAPER", "GPR176", "LY6K")
Get the Entrez IDs of these genes and fetch their summaries.
-
+
Code
# Get a dictionary that maps Gene symbols to Entrez IDs
@@ -308,65 +305,76 @@ Query information of genes
)
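Each Entrez summary is a nested structure that the notebook indexes as record["DocumentSummarySet"]["DocumentSummary"][0]. Without contacting NCBI, the extraction step can be sketched on a hand-built dictionary of the same shape; all values are placeholders, not real NCBI data, and the Chromosome key simply stands in for a field we do not request. As the output table shows, OtherDesignations holds pipe-separated synonyms, so splitting on "|" recovers them.

```python
fields = ("Name", "Description", "Summary", "OtherDesignations")

# Hand-built record mimicking the nested shape; placeholder values only
record = {
    "DocumentSummarySet": {
        "DocumentSummary": [
            {
                "Name": "GENE_X",
                "Description": "placeholder description",
                "Summary": "placeholder summary",
                "OtherDesignations": "alias one|alias two",
                "Chromosome": "1",  # an extra field we do not request
            }
        ]
    }
}


def extract(record, fields=fields):
    """Keep only the requested fields of the first document summary."""
    summary = record["DocumentSummarySet"]["DocumentSummary"][0]
    return {k: summary[k] for k in fields}


entry = extract(record)
synonyms = entry["OtherDesignations"].split("|")
print(entry["Name"], synonyms)
```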
-
+
Code
- pl.DataFrame(entries)
+with pl.Config(fmt_str_lengths=1000):
+    print(pl.DataFrame(entries))
-
-
-
-shape: (4, 4)
-
-
-
-Name
-Description
-Summary
-OtherDesignations
-
-
-str
-str
-str
-str
-
-
-
-
-"GPR176"
-"G protein-coupled receptor 176"
-"Members of the G protein-coupl…
-"G-protein coupled receptor 176…
-
-
-"CHRM4"
-"cholinergic receptor muscarini…
-"The muscarinic cholinergic rec…
-"muscarinic acetylcholine recep…
-
-
-"LY6K"
-"lymphocyte antigen 6 family me…
-"Predicted to be involved in bi…
-"lymphocyte antigen 6K|cancer/t…
-
-
-"SCAPER"
-"S-phase cyclin A associated pr…
-"Predicted to enable nucleic ac…
-"S phase cyclin A-associated pr…
-
-
-
-
-
+
+shape: (4, 4)
+┌────────┬─────────────────────────────┬─────────────────────────────┬─────────────────────────────┐
+│ Name ┆ Description ┆ Summary ┆ OtherDesignations │
+│ --- ┆ --- ┆ --- ┆ --- │
+│ str ┆ str ┆ str ┆ str │
+╞════════╪═════════════════════════════╪═════════════════════════════╪═════════════════════════════╡
+│ GPR176 ┆ G protein-coupled receptor ┆ Members of the G ┆ G-protein coupled receptor │
+│ ┆ 176 ┆ protein-coupled receptor ┆ 176|probable G-protein │
+│ ┆ ┆ family, such as GPR176, are ┆ coupled receptor 176 │
+│ ┆ ┆ cell surface receptors ┆ │
+│ ┆ ┆ involved in responses to ┆ │
+│ ┆ ┆ hormones, growth factors, ┆ │
+│ ┆ ┆ and neurotransmitters (Hata ┆ │
+│ ┆ ┆ et al., 1995 [PubMed ┆ │
+│ ┆ ┆ 7893747]).[supplied by ┆ │
+│ ┆ ┆ OMIM, Jul 2008] ┆ │
+│ CHRM4 ┆ cholinergic receptor ┆ The muscarinic cholinergic ┆ muscarinic acetylcholine │
+│ ┆ muscarinic 4 ┆ receptors belong to a ┆ receptor M4|acetylcholine │
+│ ┆ ┆ larger family of G ┆ receptor, muscarinic 4 │
+│ ┆ ┆ protein-coupled receptors. ┆ │
+│ ┆ ┆ The functional diversity of ┆ │
+│ ┆ ┆ these receptors is defined ┆ │
+│ ┆ ┆ by the binding of ┆ │
+│ ┆ ┆ acetylcholine and includes ┆ │
+│ ┆ ┆ cellular responses such as ┆ │
+│ ┆ ┆ adenylate cyclase ┆ │
+│ ┆ ┆ inhibition, ┆ │
+│ ┆ ┆ phosphoinositide ┆ │
+│ ┆ ┆ degeneration, and potassium ┆ │
+│ ┆ ┆ channel mediation. ┆ │
+│ ┆ ┆ Muscarinic receptors ┆ │
+│ ┆ ┆ influence many effects of ┆ │
+│ ┆ ┆ acetylcholine in the ┆ │
+│ ┆ ┆ central and peripheral ┆ │
+│ ┆ ┆ nervous system. The ┆ │
+│ ┆ ┆ clinical implications of ┆ │
+│ ┆ ┆ this receptor are unknown; ┆ │
+│ ┆ ┆ however, mouse studies link ┆ │
+│ ┆ ┆ its function to adenylyl ┆ │
+│ ┆ ┆ cyclase inhibition. ┆ │
+│ ┆ ┆ [provided by RefSeq, Jul ┆ │
+│ ┆ ┆ 2008] ┆ │
+│ LY6K ┆ lymphocyte antigen 6 family ┆ Predicted to be involved in ┆ lymphocyte antigen │
+│ ┆ member K ┆ binding activity of sperm ┆ 6K|cancer/testis antigen │
+│ ┆ ┆ to zona pellucida. ┆ 97|lymphocyte antigen 6 │
+│ ┆ ┆ Predicted to act upstream ┆ complex, locus │
+│ ┆ ┆ of or within flagellated ┆ K|up-regulated in lung │
+│ ┆ ┆ sperm motility. Predicted ┆ cancer 10 │
+│ ┆ ┆ to be located in cell ┆ │
+│ ┆ ┆ surface; cytoplasm; and ┆ │
+│ ┆ ┆ plasma membrane. Predicted ┆ │
+│ ┆ ┆ to be active in acrosomal ┆ │
+│ ┆ ┆ vesicle. [provided by ┆ │
+│ ┆ ┆ Alliance of Genome ┆ │
+│ ┆ ┆ Resources, Apr 2022] ┆ │
+│ SCAPER ┆ S-phase cyclin A associated ┆ Predicted to enable nucleic ┆ S phase cyclin A-associated │
+│ ┆ protein in the ER ┆ acid binding activity and ┆ protein in the endoplasmic │
+│ ┆ ┆ zinc ion binding activity. ┆ reticulum|zinc finger │
+│ ┆ ┆ Located in cytosol and ┆ protein 291 │
+│ ┆ ┆ nuclear speck. [provided by ┆ │
+│ ┆ ┆ Alliance of Genome ┆ │
+│ ┆ ┆ Resources, Apr 2022] ┆ │
+└────────┴─────────────────────────────┴─────────────────────────────┴─────────────────────────────┘
diff --git a/howto/6_query_genes_externally.ipynb b/howto/6_query_genes_externally.ipynb
index 318c2f5..b3683bc 100644
--- a/howto/6_query_genes_externally.ipynb
+++ b/howto/6_query_genes_externally.ipynb
@@ -14,7 +14,7 @@
"[documentation](https://www.ncbi.nlm.nih.gov/books/NBK25501/), which\n",
"contains all the options, is a useful reference to keep in hand.."
],
- "id": "f25a0c19-a496-4f5a-8636-9fd85f01d21f"
+ "id": "7d3f7496-4528-4249-8e1c-fba864d239c2"
},
{
"cell_type": "code",
@@ -28,7 +28,7 @@
"from Bio import Entrez\n",
"from broad_babel.query import get_mapper"
],
- "id": "b20d0818"
+ "id": "3b44cccd"
},
{
"cell_type": "markdown",
@@ -36,7 +36,7 @@
"source": [
"We define"
],
- "id": "3c5dc13a-3a1f-4dc0-b0ee-863cd2e3505d"
+ "id": "0c4f7c8f-9988-4ded-9dd2-f62563e81b91"
},
{
"cell_type": "code",
@@ -52,7 +52,7 @@
" \"OtherDesignations\", # This gives us synonyms\n",
")"
],
- "id": "e2f17932"
+ "id": "da738c03"
},
{
"cell_type": "markdown",
@@ -61,7 +61,7 @@
"We will use a set of genes that we found in a JUMP cluster as an\n",
"example."
],
- "id": "19103f9a-d181-4430-81fa-1452c2aabeaf"
+ "id": "a7a8a2d5-b2c0-481e-9357-1622000a7a0a"
},
{
"cell_type": "code",
@@ -71,7 +71,7 @@
"source": [
"genes = (\"CHRM4\", \"SCAPER\", \"GPR176\", \"LY6K\")"
],
- "id": "3dd9e050"
+ "id": "b8637965"
},
{
"cell_type": "markdown",
@@ -79,7 +79,7 @@
"source": [
"Get the"
],
- "id": "cad009a5-b36b-4816-95cb-a4eca3528691"
+ "id": "6387dff8-fa09-4051-aac6-28a721f09352"
},
{
"cell_type": "code",
@@ -104,7 +104,7 @@
" {k: record[\"DocumentSummarySet\"][\"DocumentSummary\"][0][k] for k in fields}\n",
" )"
],
- "id": "7b7a39b4"
+ "id": "0de30308"
},
{
"cell_type": "code",
@@ -114,19 +114,80 @@
},
"outputs": [
{
- "output_type": "display_data",
- "metadata": {},
- "data": {
- "text/html": [
- ""
- ]
- }
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "shape: (4, 4)\n",
+ "┌────────┬─────────────────────────────┬─────────────────────────────┬─────────────────────────────┐\n",
+ "│ Name ┆ Description ┆ Summary ┆ OtherDesignations │\n",
+ "│ --- ┆ --- ┆ --- ┆ --- │\n",
+ "│ str ┆ str ┆ str ┆ str │\n",
+ "╞════════╪═════════════════════════════╪═════════════════════════════╪═════════════════════════════╡\n",
+ "│ GPR176 ┆ G protein-coupled receptor ┆ Members of the G ┆ G-protein coupled receptor │\n",
+ "│ ┆ 176 ┆ protein-coupled receptor ┆ 176|probable G-protein │\n",
+ "│ ┆ ┆ family, such as GPR176, are ┆ coupled receptor 176 │\n",
+ "│ ┆ ┆ cell surface receptors ┆ │\n",
+ "│ ┆ ┆ involved in responses to ┆ │\n",
+ "│ ┆ ┆ hormones, growth factors, ┆ │\n",
+ "│ ┆ ┆ and neurotransmitters (Hata ┆ │\n",
+ "│ ┆ ┆ et al., 1995 [PubMed ┆ │\n",
+ "│ ┆ ┆ 7893747]).[supplied by ┆ │\n",
+ "│ ┆ ┆ OMIM, Jul 2008] ┆ │\n",
+ "│ CHRM4 ┆ cholinergic receptor ┆ The muscarinic cholinergic ┆ muscarinic acetylcholine │\n",
+ "│ ┆ muscarinic 4 ┆ receptors belong to a ┆ receptor M4|acetylcholine │\n",
+ "│ ┆ ┆ larger family of G ┆ receptor, muscarinic 4 │\n",
+ "│ ┆ ┆ protein-coupled receptors. ┆ │\n",
+ "│ ┆ ┆ The functional diversity of ┆ │\n",
+ "│ ┆ ┆ these receptors is defined ┆ │\n",
+ "│ ┆ ┆ by the binding of ┆ │\n",
+ "│ ┆ ┆ acetylcholine and includes ┆ │\n",
+ "│ ┆ ┆ cellular responses such as ┆ │\n",
+ "│ ┆ ┆ adenylate cyclase ┆ │\n",
+ "│ ┆ ┆ inhibition, ┆ │\n",
+ "│ ┆ ┆ phosphoinositide ┆ │\n",
+ "│ ┆ ┆ degeneration, and potassium ┆ │\n",
+ "│ ┆ ┆ channel mediation. ┆ │\n",
+ "│ ┆ ┆ Muscarinic receptors ┆ │\n",
+ "│ ┆ ┆ influence many effects of ┆ │\n",
+ "│ ┆ ┆ acetylcholine in the ┆ │\n",
+ "│ ┆ ┆ central and peripheral ┆ │\n",
+ "│ ┆ ┆ nervous system. The ┆ │\n",
+ "│ ┆ ┆ clinical implications of ┆ │\n",
+ "│ ┆ ┆ this receptor are unknown; ┆ │\n",
+ "│ ┆ ┆ however, mouse studies link ┆ │\n",
+ "│ ┆ ┆ its function to adenylyl ┆ │\n",
+ "│ ┆ ┆ cyclase inhibition. ┆ │\n",
+ "│ ┆ ┆ [provided by RefSeq, Jul ┆ │\n",
+ "│ ┆ ┆ 2008] ┆ │\n",
+ "│ LY6K ┆ lymphocyte antigen 6 family ┆ Predicted to be involved in ┆ lymphocyte antigen │\n",
+ "│ ┆ member K ┆ binding activity of sperm ┆ 6K|cancer/testis antigen │\n",
+ "│ ┆ ┆ to zona pellucida. ┆ 97|lymphocyte antigen 6 │\n",
+ "│ ┆ ┆ Predicted to act upstream ┆ complex, locus │\n",
+ "│ ┆ ┆ of or within flagellated ┆ K|up-regulated in lung │\n",
+ "│ ┆ ┆ sperm motility. Predicted ┆ cancer 10 │\n",
+ "│ ┆ ┆ to be located in cell ┆ │\n",
+ "│ ┆ ┆ surface; cytoplasm; and ┆ │\n",
+ "│ ┆ ┆ plasma membrane. Predicted ┆ │\n",
+ "│ ┆ ┆ to be active in acrosomal ┆ │\n",
+ "│ ┆ ┆ vesicle. [provided by ┆ │\n",
+ "│ ┆ ┆ Alliance of Genome ┆ │\n",
+ "│ ┆ ┆ Resources, Apr 2022] ┆ │\n",
+ "│ SCAPER ┆ S-phase cyclin A associated ┆ Predicted to enable nucleic ┆ S phase cyclin A-associated │\n",
+ "│ ┆ protein in the ER ┆ acid binding activity and ┆ protein in the endoplasmic │\n",
+ "│ ┆ ┆ zinc ion binding activity. ┆ reticulum|zinc finger │\n",
+ "│ ┆ ┆ Located in cytosol and ┆ protein 291 │\n",
+ "│ ┆ ┆ nuclear speck. [provided by ┆ │\n",
+ "│ ┆ ┆ Alliance of Genome ┆ │\n",
+ "│ ┆ ┆ Resources, Apr 2022] ┆ │\n",
+ "└────────┴─────────────────────────────┴─────────────────────────────┴─────────────────────────────┘"
+ ]
}
],
"source": [
- "pl.DataFrame(entries)"
+ "with pl.Config(fmt_str_lengths=1000):\n",
+ " print(pl.DataFrame(entries))"
],
- "id": "f07b6354"
+ "id": "5bcdb87b"
}
],
"nbformat": 4,
diff --git a/index.ipynb b/index.ipynb
index d199566..db2d079 100644
--- a/index.ipynb
+++ b/index.ipynb
@@ -54,7 +54,7 @@
"novel biological insights. We aim to make this the one-stop shop for the\n",
"vast majority of JUMP questions, be it computational or biological."
],
- "id": "44e3e6f8-f1db-4629-bae1-cacd44dfe950"
+ "id": "5a9b1cf8-debe-488b-9378-6cb182fc163c"
}
],
"nbformat": 4,
diff --git a/readme.ipynb b/readme.ipynb
index 4f81681..1d4d895 100644
--- a/readme.ipynb
+++ b/readme.ipynb
@@ -17,7 +17,7 @@
"This repository can be used as a way to install essential dependencies\n",
"for an exploratory analysis of JUMP morphological data."
],
- "id": "6491d325-c6f3-476f-b778-3d2cd1b8ffe7"
+ "id": "417cf2c1-e9ba-4a35-a98d-c168bfa9716d"
}
],
"nbformat": 4,
diff --git a/search.json b/search.json
index a404379..f7eb378 100644
--- a/search.json
+++ b/search.json
@@ -182,7 +182,7 @@
"href": "howto/6_query_genes_externally.html",
"title": "Query information of genes",
"section": "",
- "text": "This how-to focuses on linking gene names outside. Whilst not JUMP-specific, it is useful to fetch more information on perturbations that our analysis deem important without having to manually search them. We will use Biopython, this only explores a subset of the options, the full Entrez documentation, which contains all the options, is a useful reference to keep in hand..\n\n\nCode\nimport polars as pl\nfrom Bio import Entrez\nfrom broad_babel.query import get_mapper\n\n\nWe define\n\n\nCode\nEntrez.email = \"example@email.com\"\nfields = (\n \"Name\",\n \"Description\",\n \"Summary\",\n \"OtherDesignations\", # This gives us synonyms\n)\n\n\nWe will use a set of genes that we found in a JUMP cluster as an example.\n\n\nCode\ngenes = (\"CHRM4\", \"SCAPER\", \"GPR176\", \"LY6K\")\n\n\nGet the\n\n\nCode\n# Get a dictionary that maps Gene symbols to Entrez IDs\nids = get_mapper(\n query=genes,\n input_column=\"standard_key\",\n output_columns=\"standard_key,NCBI_Gene_ID\",\n)\n\n# Fetch the summaries for these genes\nentries = []\nfor id_ in ids.values():\n stream = Entrez.esummary(db=\"gene\", id=id_)\n record = Entrez.read(stream)\n\n entries.append(\n {k: record[\"DocumentSummarySet\"][\"DocumentSummary\"][0][k] for k in fields}\n )\n\n\n\n\nCode\npl.DataFrame(entries)\n\n\n\n\nshape: (4, 4)\n\n\n\nName\nDescription\nSummary\nOtherDesignations\n\n\nstr\nstr\nstr\nstr\n\n\n\n\n\"GPR176\"\n\"G protein-coupled receptor 176\"\n\"Members of the G protein-coupl…\n\"G-protein coupled receptor 176…\n\n\n\"CHRM4\"\n\"cholinergic receptor muscarini…\n\"The muscarinic cholinergic rec…\n\"muscarinic acetylcholine recep…\n\n\n\"LY6K\"\n\"lymphocyte antigen 6 family me…\n\"Predicted to be involved in bi…\n\"lymphocyte antigen 6K|cancer/t…\n\n\n\"SCAPER\"\n\"S-phase cyclin A associated pr…\n\"Predicted to enable nucleic ac…\n\"S phase cyclin A-associated pr…",
+ "text": "This how-to focuses on linking gene names outside. Whilst not JUMP-specific, it is useful to fetch more information on perturbations that our analysis deem important without having to manually search them. We will use Biopython, this only explores a subset of the options, the full Entrez documentation, which contains all the options, is a useful reference to keep in hand..\n\n\nCode\nimport polars as pl\nfrom Bio import Entrez\nfrom broad_babel.query import get_mapper\n\n\nWe define\n\n\nCode\nEntrez.email = \"example@email.com\"\nfields = (\n \"Name\",\n \"Description\",\n \"Summary\",\n \"OtherDesignations\", # This gives us synonyms\n)\n\n\nWe will use a set of genes that we found in a JUMP cluster as an example.\n\n\nCode\ngenes = (\"CHRM4\", \"SCAPER\", \"GPR176\", \"LY6K\")\n\n\nGet the\n\n\nCode\n# Get a dictionary that maps Gene symbols to Entrez IDs\nids = get_mapper(\n query=genes,\n input_column=\"standard_key\",\n output_columns=\"standard_key,NCBI_Gene_ID\",\n)\n\n# Fetch the summaries for these genes\nentries = []\nfor id_ in ids.values():\n stream = Entrez.esummary(db=\"gene\", id=id_)\n record = Entrez.read(stream)\n\n entries.append(\n {k: record[\"DocumentSummarySet\"][\"DocumentSummary\"][0][k] for k in fields}\n )\n\n\n\n\nCode\nwith pl.Config(fmt_str_lengths=1000):\n print(pl.DataFrame(entries))\n\n\nshape: (4, 4)\n┌────────┬─────────────────────────────┬─────────────────────────────┬─────────────────────────────┐\n│ Name ┆ Description ┆ Summary ┆ OtherDesignations │\n│ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str │\n╞════════╪═════════════════════════════╪═════════════════════════════╪═════════════════════════════╡\n│ GPR176 ┆ G protein-coupled receptor ┆ Members of the G ┆ G-protein coupled receptor │\n│ ┆ 176 ┆ protein-coupled receptor ┆ 176|probable G-protein │\n│ ┆ ┆ family, such as GPR176, are ┆ coupled receptor 176 │\n│ ┆ ┆ cell surface receptors ┆ │\n│ ┆ ┆ involved in responses to ┆ │\n│ ┆ ┆ hormones, growth factors, ┆ 
│\n│ ┆ ┆ and neurotransmitters (Hata ┆ │\n│ ┆ ┆ et al., 1995 [PubMed ┆ │\n│ ┆ ┆ 7893747]).[supplied by ┆ │\n│ ┆ ┆ OMIM, Jul 2008] ┆ │\n│ CHRM4 ┆ cholinergic receptor ┆ The muscarinic cholinergic ┆ muscarinic acetylcholine │\n│ ┆ muscarinic 4 ┆ receptors belong to a ┆ receptor M4|acetylcholine │\n│ ┆ ┆ larger family of G ┆ receptor, muscarinic 4 │\n│ ┆ ┆ protein-coupled receptors. ┆ │\n│ ┆ ┆ The functional diversity of ┆ │\n│ ┆ ┆ these receptors is defined ┆ │\n│ ┆ ┆ by the binding of ┆ │\n│ ┆ ┆ acetylcholine and includes ┆ │\n│ ┆ ┆ cellular responses such as ┆ │\n│ ┆ ┆ adenylate cyclase ┆ │\n│ ┆ ┆ inhibition, ┆ │\n│ ┆ ┆ phosphoinositide ┆ │\n│ ┆ ┆ degeneration, and potassium ┆ │\n│ ┆ ┆ channel mediation. ┆ │\n│ ┆ ┆ Muscarinic receptors ┆ │\n│ ┆ ┆ influence many effects of ┆ │\n│ ┆ ┆ acetylcholine in the ┆ │\n│ ┆ ┆ central and peripheral ┆ │\n│ ┆ ┆ nervous system. The ┆ │\n│ ┆ ┆ clinical implications of ┆ │\n│ ┆ ┆ this receptor are unknown; ┆ │\n│ ┆ ┆ however, mouse studies link ┆ │\n│ ┆ ┆ its function to adenylyl ┆ │\n│ ┆ ┆ cyclase inhibition. ┆ │\n│ ┆ ┆ [provided by RefSeq, Jul ┆ │\n│ ┆ ┆ 2008] ┆ │\n│ LY6K ┆ lymphocyte antigen 6 family ┆ Predicted to be involved in ┆ lymphocyte antigen │\n│ ┆ member K ┆ binding activity of sperm ┆ 6K|cancer/testis antigen │\n│ ┆ ┆ to zona pellucida. ┆ 97|lymphocyte antigen 6 │\n│ ┆ ┆ Predicted to act upstream ┆ complex, locus │\n│ ┆ ┆ of or within flagellated ┆ K|up-regulated in lung │\n│ ┆ ┆ sperm motility. Predicted ┆ cancer 10 │\n│ ┆ ┆ to be located in cell ┆ │\n│ ┆ ┆ surface; cytoplasm; and ┆ │\n│ ┆ ┆ plasma membrane. Predicted ┆ │\n│ ┆ ┆ to be active in acrosomal ┆ │\n│ ┆ ┆ vesicle. [provided by ┆ │\n│ ┆ ┆ Alliance of Genome ┆ │\n│ ┆ ┆ Resources, Apr 2022] ┆ │\n│ SCAPER ┆ S-phase cyclin A associated ┆ Predicted to enable nucleic ┆ S phase cyclin A-associated │\n│ ┆ protein in the ER ┆ acid binding activity and ┆ protein in the endoplasmic │\n│ ┆ ┆ zinc ion binding activity. 
┆ reticulum|zinc finger │\n│ ┆ ┆ Located in cytosol and ┆ protein 291 │\n│ ┆ ┆ nuclear speck. [provided by ┆ │\n│ ┆ ┆ Alliance of Genome ┆ │\n│ ┆ ┆ Resources, Apr 2022] ┆ │\n└────────┴─────────────────────────────┴─────────────────────────────┴─────────────────────────────┘",
"crumbs": [
"How-To Guides",
"Query information of genes"
diff --git a/sitemap.xml b/sitemap.xml
index d356649..07b0fc8 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,46 +2,46 @@
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/3_calculate_activity.html
- 2024-09-11T00:06:44.718Z
+ 2024-09-11T00:17:49.904Z
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/2_add_metadata.html
- 2024-09-11T00:06:44.358Z
+ 2024-09-11T00:17:49.544Z
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/4_display_perturbation_images.html
- 2024-09-11T00:06:45.094Z
+ 2024-09-11T00:17:50.284Z
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/explanations/FAQ.html
- 2024-09-11T00:05:30.487Z
+ 2024-09-11T00:16:10.032Z
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/explanations/Resources.html
- 2024-09-11T00:05:30.487Z
+ 2024-09-11T00:16:10.032Z
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/index.html
- 2024-09-11T00:05:30.487Z
+ 2024-09-11T00:16:10.032Z
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/readme.html
- 2024-09-11T00:05:30.491Z
+ 2024-09-11T00:16:10.032Z
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/explanations/glossary.html
- 2024-09-11T00:05:30.487Z
+ 2024-09-11T00:16:10.032Z
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/5_explore_distance_clusters.html
- 2024-09-11T00:06:45.466Z
+ 2024-09-11T00:17:50.652Z
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/1_retrieve_profiles.html
- 2024-09-11T00:06:43.978Z
+ 2024-09-11T00:17:49.168Z
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/6_query_genes_externally.html
- 2024-09-11T00:06:45.834Z
+ 2024-09-11T00:17:51.008Z