diff --git a/.nojekyll b/.nojekyll
index 2e7cd67..b78f9b0 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-78d52888
\ No newline at end of file
+62823df9
\ No newline at end of file
diff --git a/explanations/FAQ.ipynb b/explanations/FAQ.ipynb
index acf9a6e..4624cb4 100644
--- a/explanations/FAQ.ipynb
+++ b/explanations/FAQ.ipynb
@@ -152,7 +152,7 @@
         " of these replicates’ value was in turn the mean of all the sites\n",
         " and cells in a given well."
       ],
-      "id": "17ad2136-1321-43fb-b48a-fdcbc379f541"
+      "id": "3bc637b6-fba2-47bc-8dc5-99b0b989cdb6"
     }
   ],
   "nbformat": 4,
diff --git a/explanations/Resources.ipynb b/explanations/Resources.ipynb
index e099eb5..ffe7ed7 100644
--- a/explanations/Resources.ipynb
+++ b/explanations/Resources.ipynb
@@ -28,7 +28,7 @@
         " [website](https://www.springscience.com/jump-cp) for data\n",
         " exploration (account needed)."
       ],
-      "id": "d986c00a-eb84-4ab5-a0dd-ad581a7b6da8"
+      "id": "9425c958-294b-4f8e-802d-cc560ff596f6"
     }
   ],
   "nbformat": 4,
diff --git a/explanations/glossary.ipynb b/explanations/glossary.ipynb
index 7dd1331..1ff843a 100644
--- a/explanations/glossary.ipynb
+++ b/explanations/glossary.ipynb
@@ -63,7 +63,7 @@
         "for compound probes). q-value: Expected False Discovery Rate (FDR): the\n",
         "proportion of false positives among all positive results."
       ],
-      "id": "44baf1c0-9d2c-4cc0-a21e-abbc1c056356"
+      "id": "90f98b40-68c7-4a7a-a020-d9bc61df80ef"
     }
   ],
   "nbformat": 4,
diff --git a/howto/1_retrieve_profiles.html b/howto/1_retrieve_profiles.html
index 8f050c6..cf6e57b 100644
--- a/howto/1_retrieve_profiles.html
+++ b/howto/1_retrieve_profiles.html
@@ -258,7 +258,7 @@

Retrieve JUMP profiles

This is a tutorial on how to access profiles from the JUMP Cell Painting datasets. We will use polars to fetch the data frames lazily, with the help of s3fs and pyarrow. We prefer lazy loading because the data can be too big to be handled in memory.

-
+
Code
import polars as pl
@@ -271,14 +271,14 @@

Retrieve JUMP profiles

  • cpg0016-jump[compound]: Chemical perturbations.
  • Their explicit location is determined by the transformations that produce the datasets. The AWS paths of the dataframes are built from the prefix below:

    -
    +
    Code
    INDEX_FILE = "https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"

    We use a version-controlled CSV file to release the latest corrected profiles.

    -
    +
    Code
    profile_index = pl.read_csv(INDEX_FILE)
    @@ -340,7 +340,7 @@ 

    Retrieve JUMP profiles

    We need neither the ‘etag’ column (used to check file integrity) nor the ‘interpretable’ profiles (i.e., those before major modifications).

    -
    +
    Code
    selected_profiles = profile_index.filter(
    @@ -354,7 +354,7 @@ 

    Retrieve JUMP profiles

    We will lazy-load the dataframes and print the number of rows and columns of each.

    -
    +
    Code
    info = {k: [] for k in ("dataset", "#rows", "#cols", "#Metadata cols", "Size (MB)")}
    @@ -427,7 +427,7 @@ 

    Retrieve JUMP profiles

    Let us now focus on the CRISPR dataset and use a regex to select the metadata columns. We will then sample rows and display them. Note that the collect() method executes the lazy query and loads the selected data into memory.
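To make the regex selection concrete, here is a minimal stdlib sketch of which columns `pl.col("^Metadata.*$")` keeps and which ones `pl.all().exclude("^Metadata.*$")` leaves behind. It relies on polars treating a column name that starts with `^` and ends with `$` as a regular expression. The column names are hypothetical, modelled on the ones used in these tutorials, and are not the actual CRISPR schema.

```python
import re

# Hypothetical column names, modelled on JUMP profiles (metadata + features)
columns = [
    "Metadata_Source",
    "Metadata_Plate",
    "Metadata_Well",
    "Metadata_JCP2022",
    "X_0",
    "X_1",
    "X_2",
]

METADATA_PATTERN = re.compile(r"^Metadata.*$")


def split_columns(names, pattern=METADATA_PATTERN):
    """Split column names into (metadata, features) using a full-match regex."""
    metadata = [n for n in names if pattern.fullmatch(n)]
    features = [n for n in names if not pattern.fullmatch(n)]
    return metadata, features


metadata_cols, feature_cols = split_columns(columns)
print(metadata_cols)  # what a pl.col("^Metadata.*$") selection would keep
print(feature_cols)   # what remains after excluding "^Metadata.*$"
```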

    -
    +
    Code
    data = pl.scan_parquet(filepaths["crispr"])
    @@ -496,7 +496,7 @@ 

    Retrieve JUMP profiles

    The following line excludes the metadata columns:

    -
    +
    Code
    data_only = data.select(pl.all().exclude("^Metadata.*$").sample(n=5, seed=1)).collect()
    @@ -1062,7 +1062,7 @@ 

    Retrieve JUMP profiles

    Finally, we can convert this to pandas if we want to perform analyses with that tool. Keep in mind that this loads the entire dataframe into memory.

    -
    +
    Code
    data_only.to_pandas()
    diff --git a/howto/1_retrieve_profiles.ipynb b/howto/1_retrieve_profiles.ipynb
    index 1deddce..567209e 100644
    --- a/howto/1_retrieve_profiles.ipynb
    +++ b/howto/1_retrieve_profiles.ipynb
    @@ -12,7 +12,7 @@
             "and `pyarrow`. We prefer lazy loading because the data can be too big to\n",
             "be handled in memory."
           ],
    -      "id": "00451035-7dfb-4a3d-97a2-acf91788a5b7"
    +      "id": "47bec2df-a884-445c-b717-b76787be64f7"
         },
         {
           "cell_type": "code",
    @@ -24,7 +24,7 @@
           "source": [
             "import polars as pl"
           ],
    -      "id": "1ab03215"
    +      "id": "2de578e8"
         },
         {
           "cell_type": "markdown",
    @@ -40,7 +40,7 @@
             "produce the datasets. The aws paths of the dataframes are built from a\n",
             "prefix below:"
           ],
    -      "id": "6cdfb466-2e7b-4fb1-be8a-17612cbff44d"
    +      "id": "a2b1ab4e-28a3-49be-bb90-36ac754e7ae0"
         },
         {
           "cell_type": "code",
    @@ -52,7 +52,7 @@
           "source": [
             "INDEX_FILE = \"https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv\""
           ],
    -      "id": "6d4f44e7"
    +      "id": "31129b7b"
         },
         {
           "cell_type": "markdown",
    @@ -60,7 +60,7 @@
           "source": [
             "We use a version-controlled csv to release the latest corrected profiles"
           ],
    -      "id": "c1581cb4-93ff-4c54-82df-ca3c5add732c"
    +      "id": "62b0d62b-4fa0-4a6f-a0ac-3ff51669034e"
         },
         {
           "cell_type": "code",
    @@ -81,7 +81,7 @@
             "profile_index = pl.read_csv(INDEX_FILE)\n",
             "profile_index.head()"
           ],
    -      "id": "64533136"
    +      "id": "52fa9980"
         },
         {
           "cell_type": "markdown",
    @@ -90,7 +90,7 @@
             "We do not need the ‘etag’ (used to check file integrity) column nor the\n",
             "‘interpretable’ (i.e., before major modifications)"
           ],
    -      "id": "d2910b99-e47e-4ad3-b234-b2b1a9d5f048"
    +      "id": "22349e94-37c7-4811-91de-8f73e77ab612"
         },
         {
           "cell_type": "code",
    @@ -112,7 +112,7 @@
             "filepaths = dict(selected_profiles.iter_rows())\n",
             "print(filepaths)"
           ],
    -      "id": "480cf3a2"
    +      "id": "12faa4e1"
         },
         {
           "cell_type": "markdown",
    @@ -121,7 +121,7 @@
             "We will lazy-load the dataframes and print the number of rows and\n",
             "columns"
           ],
    -      "id": "8d1eb3da-fb2e-458b-9d4a-bf76a66886e8"
    +      "id": "90fda18a-6fab-46f8-af7c-860bce8dfe71"
         },
         {
           "cell_type": "code",
    @@ -153,7 +153,7 @@
             "\n",
             "pl.DataFrame(info)"
           ],
    -      "id": "6544f26f"
    +      "id": "ebde11f4"
         },
         {
           "cell_type": "markdown",
    @@ -163,7 +163,7 @@
             "metadata columns. We will then sample rows and display the overview.\n",
             "Note that the collect() method enforces loading some data into memory."
           ],
    -      "id": "ee78d621-784c-47ad-b2e4-223eda176ac1"
    +      "id": "4580b7df-4169-4e10-a5bc-a19d118591eb"
         },
         {
           "cell_type": "code",
    @@ -184,7 +184,7 @@
             "data = pl.scan_parquet(filepaths[\"crispr\"])\n",
             "data.select(pl.col(\"^Metadata.*$\").sample(n=5, seed=1)).collect()"
           ],
    -      "id": "a7fce019"
    +      "id": "83923b7c"
         },
         {
           "cell_type": "markdown",
    @@ -192,7 +192,7 @@
           "source": [
             "The following line excludes the metadata columns:"
           ],
    -      "id": "5c329801-709a-4905-b054-1eb58d179391"
    +      "id": "9e21bfeb-0e23-430f-821e-e8c88280ebbc"
         },
         {
           "cell_type": "code",
    @@ -213,7 +213,7 @@
             "data_only = data.select(pl.all().exclude(\"^Metadata.*$\").sample(n=5, seed=1)).collect()\n",
             "data_only"
           ],
    -      "id": "4c7da927"
    +      "id": "b0717d22"
         },
         {
           "cell_type": "markdown",
    @@ -223,7 +223,7 @@
             "with that tool. Keep in mind that this loads the entire dataframe into\n",
             "memory."
           ],
    -      "id": "da1b8576-76fd-4de7-ad3f-0c693bd63d27"
    +      "id": "bef13299-3a4d-45c8-ab71-10baf090e549"
         },
         {
           "cell_type": "code",
    @@ -245,7 +245,7 @@
           "source": [
             "data_only.to_pandas()"
           ],
    -      "id": "a134dad9"
    +      "id": "c39531ca"
         }
       ],
       "nbformat": 4,
    diff --git a/howto/2_add_metadata.html b/howto/2_add_metadata.html
    index c57264e..ab4356a 100644
    --- a/howto/2_add_metadata.html
    +++ b/howto/2_add_metadata.html
    @@ -258,7 +258,7 @@

    Incorporate metadata into profiles

    A very common task when processing morphological profiles is knowing which ones are treatments and which ones are controls. Here we will explore how we can use broad-babel to accomplish this task.

    -
    +
    Code
    import polars as pl
    @@ -266,7 +266,7 @@ 

    Incorporate metadata into profiles

    We will be using the CRISPR dataset specified in our index CSV.

    -
    +
    Code
    INDEX_FILE = "https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"
    @@ -279,7 +279,7 @@ 

    Incorporate metadata into profiles

    For simplicity, the metadata in our processed profiles is minimal: the profile origin (source, plate and well) and the unique JUMP identifier of each perturbation. We will use broad-babel to further expand on this metadata, but for simplicity’s sake let us sample a subset of the data.

    -
    +
    Code
    jcp_ids = (
    @@ -305,7 +305,7 @@ 

    Incorporate metadata into profiles

    We will use these JUMP IDs to obtain a mapper that indicates the perturbation type (trt, negcon or, rarely, poscon).

    -
    +
    Code
    pert_mapper = get_mapper(
    @@ -329,7 +329,7 @@ 

    Incorporate metadata into profiles

    A couple of important notes about broad_babel’s get_mapper and other functions:
      • They must be fed tuples: their results are cached, which provides significant speed-ups for repeated calls.
      • get_mapper works for datasets of up to a few tens of thousands of samples. If you try to use it to get a mapper for the entire ‘compounds’ dataset, it is likely to fail. For these cases we suggest the more general function run_query. You can read more on this and other use cases in Babel’s README.
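The tuple requirement follows from how memoization works in Python: a cache keys each call by its arguments, so those arguments must be hashable. Below is a minimal sketch with `functools.lru_cache`. It assumes broad_babel caches in a similar way, which is an assumption on our part. `lookup_pert_type` is a hypothetical stand-in, not a broad_babel function, and the IDs are placeholders.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the (pretend) expensive lookup really runs


@lru_cache(maxsize=None)
def lookup_pert_type(jcp_ids: tuple) -> dict:
    """Hypothetical stand-in for a cached metadata query."""
    CALLS["count"] += 1
    # Pretend every ID is a treatment; a real query would hit a database
    return {jcp_id: "trt" for jcp_id in jcp_ids}


ids = ("JCP2022_A", "JCP2022_B")  # hypothetical IDs
lookup_pert_type(ids)
lookup_pert_type(ids)  # served from the cache, so the body does not run again
print(CALLS["count"])  # 1

try:
    lookup_pert_type(list(ids))  # lists are unhashable, so the cache rejects them
except TypeError as err:
    print("TypeError:", err)
```

Because the cache key is built before the function body runs, passing a list fails immediately instead of silently skipping the cache.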

    We will now repeat the process to get their ‘standard’ names.

    -
    +
    Code
    name_mapper = get_mapper(
    @@ -354,7 +354,7 @@ 

    Incorporate metadata into profiles

    To wrap up, we will fetch all the available profiles for these perturbations and use the mappers to add the missing metadata. We also select a few features to showcase how selection can be performed in polars.
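The annotation step can be pictured without polars. Each profile row gets a `name` and a `pert_type` looked up from the two mappers, and the rows are then sorted by `pert_type`. This is a minimal stdlib sketch. It assumes the mappers behave like plain dicts keyed by JUMP ID (the external-query how-to iterates over `ids.values()` on a `get_mapper` result). The IDs, names and rows are hypothetical, and `annotate` is an illustrative helper, not part of broad_babel.

```python
# Hypothetical mappers from JUMP ID to perturbation type and standard name
pert_mapper = {"JCP2022_A": "trt", "JCP2022_B": "trt", "JCP2022_C": "negcon"}
name_mapper = {"JCP2022_A": "GENE_A", "JCP2022_B": "GENE_B", "JCP2022_C": "CTRL"}

# Hypothetical profile rows: origin metadata, JUMP ID and one feature
profiles = [
    {"Metadata_Well": "A01", "Metadata_JCP2022": "JCP2022_C", "X_0": 0.1},
    {"Metadata_Well": "B02", "Metadata_JCP2022": "JCP2022_A", "X_0": 1.3},
    {"Metadata_Well": "C03", "Metadata_JCP2022": "JCP2022_B", "X_0": -0.4},
]


def annotate(rows, pert_mapper, name_mapper, key="Metadata_JCP2022"):
    """Return new rows with 'name' and 'pert_type' added, sorted by pert_type."""
    annotated = [
        {
            "name": name_mapper.get(row[key]),  # None if the ID is unknown
            "pert_type": pert_mapper.get(row[key]),
            **row,
        }
        for row in rows
    ]
    # Unknown pert_type (None) sorts first; the sort is stable within each type
    return sorted(annotated, key=lambda r: r["pert_type"] or "")


for row in annotate(profiles, pert_mapper, name_mapper):
    print(row["pert_type"], row["name"], row["Metadata_Well"])
```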

    -
    +
    Code
    subsample_profiles = profiles.filter(
    diff --git a/howto/2_add_metadata.ipynb b/howto/2_add_metadata.ipynb
    index 1094561..bbea796 100644
    --- a/howto/2_add_metadata.ipynb
    +++ b/howto/2_add_metadata.ipynb
    @@ -10,7 +10,7 @@
             "which ones are treatments and which ones are controls. Here we will\n",
             "explore how we can use broad-babel to accomplish this task."
           ],
    -      "id": "45ca3be5-e1cb-426f-9d42-e26792dc9315"
    +      "id": "da3ac58d-0ed3-4e12-8542-7f71905cd2e7"
         },
         {
           "cell_type": "code",
    @@ -23,7 +23,7 @@
             "import polars as pl\n",
             "from broad_babel.query import get_mapper"
           ],
    -      "id": "6a23b41f"
    +      "id": "e86bf6be"
         },
         {
           "cell_type": "markdown",
    @@ -31,7 +31,7 @@
           "source": [
             "We will be using the CRISPR dataset specified in our index csv."
           ],
    -      "id": "4e2482bb-d195-4492-87d4-405c31d6a5a1"
    +      "id": "0153dfa3-79c6-4134-8eb0-dfbdaf1b055e"
         },
         {
           "cell_type": "code",
    @@ -54,7 +54,7 @@
             "profiles = pl.scan_parquet(CRISPR_URL)\n",
             "print(profiles.collect_schema().names()[:6])"
           ],
    -      "id": "e5a8c1da"
    +      "id": "038abb17"
         },
         {
           "cell_type": "markdown",
    @@ -65,7 +65,7 @@
             "for that perturbation. We will use broad-babel to further expand on this\n",
             "metadata, but for simplicity’s sake let us sample a subset of the data."
           ],
    -      "id": "78e7df4f-24ba-4313-b05c-65adacda1d8e"
    +      "id": "81c05e27-1ce5-4a89-af6e-58da9958dbb3"
         },
         {
           "cell_type": "code",
    @@ -103,7 +103,7 @@
             "subsample = (*subsample, \"JCP2022_800002\")\n",
             "subsample"
           ],
    -      "id": "8639fb0b"
    +      "id": "13c5e916"
         },
         {
           "cell_type": "markdown",
    @@ -112,7 +112,7 @@
             "We will use these JUMP ids to obtain a mapper that indicates the\n",
             "perturbation type (trt, negcon or, rarely, poscon)"
           ],
    -      "id": "a2478cfe-073c-4c47-89ab-83928eb806e5"
    +      "id": "2f8be60a-99c5-4b97-936f-656887dc6e8d"
         },
         {
           "cell_type": "code",
    @@ -147,7 +147,7 @@
             ")\n",
             "pert_mapper"
           ],
    -      "id": "6f5caa7e"
    +      "id": "90684b27"
         },
         {
           "cell_type": "markdown",
    @@ -164,7 +164,7 @@
             "\n",
             "We will now repeat the process to get their ‘standard’ name"
           ],
    -      "id": "156cb830-0a02-462a-9d23-d11f0c5bbc3d"
    +      "id": "92af4ce5-15b4-4bff-b845-b60d82066fcb"
         },
         {
           "cell_type": "code",
    @@ -201,7 +201,7 @@
             ")\n",
             "name_mapper"
           ],
    -      "id": "9fbd9b1a"
    +      "id": "e5ad1d97"
         },
         {
           "cell_type": "markdown",
    @@ -212,7 +212,7 @@
             "select a few features to showcase how selection can be performed in\n",
             "polars."
           ],
    -      "id": "bfebdbe0-edb9-4e6a-b0da-8a818621dab7"
    +      "id": "79f47295-6816-4345-88dc-56300a6d27e1"
         },
         {
           "cell_type": "code",
    @@ -243,7 +243,7 @@
             "    pl.col((\"name\", \"pert_type\", \"^Metadata.*$\", \"^X_[0-3]$\"))\n",
             ").sort(by=\"pert_type\")"
           ],
    -      "id": "8b57de41"
    +      "id": "ff9e1a6f"
         }
       ],
       "nbformat": 4,
    diff --git a/howto/3_calculate_activity.html b/howto/3_calculate_activity.html
    index 3019fa0..6293fcf 100644
    --- a/howto/3_calculate_activity.html
    +++ b/howto/3_calculate_activity.html
    @@ -259,7 +259,7 @@ 

    Calculate phenotypic activity

    A common first analysis for morphological datasets is the activity of the cells’ phenotypes. We will use the copairs package, which uses mean average precision to obtain a metric of replicability for any set of morphological profiles. In other words, it indicates how similar the replicates of a given perturbation are to each other, relative to their negative controls, which are usually cells that received no perturbation.

    -
    +
    Code
    import polars as pl
    @@ -270,7 +270,7 @@ 

    Calculate phenotypic activity

    We will be using the CRISPR dataset specified in our index CSV, but we will select a subset of perturbations and the controls present.

    -
    +
    Code
    INDEX_FILE = "https://raw.githubusercontent.com/jump-cellpainting/datasets/50cd2ab93749ccbdb0919d3adf9277c14b6343dd/manifests/profile_index.csv"
    @@ -279,7 +279,7 @@ 

    Calculate phenotypic activity

    Sample perturbations and add a known negative control.
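Sampling with a fixed seed keeps the subset reproducible, and appending the control afterwards guarantees it is always present. Below is a stdlib sketch of that idea, assuming a pool of JUMP IDs is already available. The pool and the helper `sample_with_control` are hypothetical. The control ID is the one the add-metadata notebook appends (`JCP2022_800002`). The draw will not reproduce the tutorial's exact subset.

```python
import random


def sample_with_control(pool, n, control_id, seed=1):
    """Return a reproducible tuple of n IDs drawn from pool, with control_id appended.

    A tuple is returned so the result can be passed to cached (hashable-only) functions.
    """
    rng = random.Random(seed)  # local RNG: leaves the global random state untouched
    # Sort so the draw does not depend on the input order; never draw the control itself
    candidates = sorted(set(pool) - {control_id})
    picked = rng.sample(candidates, n)
    return (*picked, control_id)


pool = [f"JCP2022_{i:06d}" for i in range(100)]  # hypothetical IDs
subsample = sample_with_control(pool, n=5, control_id="JCP2022_800002")
print(subsample)
```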

    -
    +
    Code
    jcp_ids = (
    @@ -298,7 +298,7 @@ 

    Calculate phenotypic activity

    Now we create a mapper to label treatments and controls. See the previous tutorial for details on fetching metadata.

    -
    +
    Code
    pert_mapper = get_mapper(
    @@ -310,7 +310,7 @@ 

    Calculate phenotypic activity

    Finally we use the parameters from . See the copairs wiki for more details on the parameters that copairs requires.

    -
    +
    Code
    pos_sameby = ["Metadata_JCP2022"]  # We want to match perturbations
    @@ -339,12 +339,12 @@ 

    Calculate phenotypic activity

    @@ -439,7 +439,7 @@

    Calculate phenotypic activity

    The result of copairs is a dataframe containing, in addition to the original metadata, the average precision with which perturbations were retrieved. Perturbations whose profiles look more similar to each other than to the negative controls in the same plates will have a higher average precision. Perturbations that do not differentiate themselves from the negative controls will score closer to zero.
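For intuition about the reported score, here is a minimal sketch of average precision for a single query profile. The other profiles are ranked by similarity, replicates of the same perturbation count as positives and negative controls as negatives, and the precision at each positive's rank is averaged. This is the textbook definition of average precision, not copairs' implementation. The scores and labels are made up for illustration.

```python
def average_precision(scores, labels):
    """Average precision of a ranking.

    scores: similarity of each candidate to the query (higher = more similar)
    labels: 1 for a replicate of the query's perturbation, 0 for a negative control
    """
    # Rank candidates from most to least similar
    ranked = [label for _, label in sorted(zip(scores, labels), key=lambda p: -p[0])]
    n_pos = sum(ranked)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Replicates ranked above all controls
print(average_precision([0.9, 0.8, 0.1, 0.0], [1, 1, 0, 0]))  # 1.0
# Replicates interleaved with controls
print(average_precision([0.9, 0.8, 0.7, 0.6], [0, 1, 0, 1]))  # 0.5
```

Averaging this quantity over queries gives the mean average precision mentioned at the start of this how-to.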

    To wrap up we pull the standard gene symbol and plot the distribution of average precision.

    -
    +
    Code
    name_mapper = get_mapper(
    @@ -467,7 +467,7 @@ 

    Calculate phenotypic activity

    + @@ -100,9 +100,6 @@ "search-label": "Search" } } - - - @@ -258,7 +255,7 @@

    Query information of genes

    This how-to focuses on linking gene names to external resources. Whilst not JUMP-specific, it is useful for fetching more information on perturbations that our analyses deem important, without having to search for them manually. We will use Biopython. This only explores a subset of the options; the full Entrez documentation, which covers all of them, is a useful reference to keep at hand.

    -
    +
    Code
    import polars as pl
    @@ -267,7 +264,7 @@ 

    Query information of genes

    We define a contact email for Entrez and the summary fields we want to retrieve.

    -
    +
    Code
    Entrez.email = "example@email.com"
    @@ -280,14 +277,14 @@ 

    Query information of genes

    We will use a set of genes that we found in a JUMP cluster as an example.

    -
    +
    Code
    genes = ("CHRM4", "SCAPER", "GPR176", "LY6K")

    Get the Entrez gene IDs for our gene symbols and fetch their summaries.
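The code that follows indexes into the nested layout of the parsed esummary record. Here is a small sketch of the same field extraction as a reusable helper. `extract_fields` is hypothetical, and it assumes the parsed record is dict-like at each level. The toy record only mimics the nesting the tutorial uses (`DocumentSummarySet`, then `DocumentSummary`, then the first entry), with values copied from the table shown on this page. Real records come from `Entrez.read(...)`.

```python
FIELDS = ("Name", "Description", "Summary", "OtherDesignations")


def extract_fields(record, fields=FIELDS):
    """Pull the requested fields from the first DocumentSummary of a parsed esummary record."""
    summary = record["DocumentSummarySet"]["DocumentSummary"][0]
    # Missing fields become None instead of raising KeyError
    return {k: summary.get(k) for k in fields}


# Toy record mimicking only the nesting; no network access involved
toy_record = {
    "DocumentSummarySet": {
        "DocumentSummary": [
            {
                "Name": "GPR176",
                "Description": "G protein-coupled receptor 176",
                "OtherDesignations": "G-protein coupled receptor 176|probable G-protein coupled receptor 176",
            }
        ]
    }
}

row = extract_fields(toy_record)
print(row["Name"], "|", row["Description"])
print(row["Summary"])  # None: the toy record has no Summary field
```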

    -
    +
    Code
    # Get a dictionary that maps Gene symbols to Entrez IDs
    @@ -308,65 +305,76 @@ 

    Query information of genes

    )
    -
    +
    Code -
    pl.DataFrame(entries)
    +
    with pl.Config(fmt_str_lengths=1000):
    +    print(pl.DataFrame(entries))
    -
    -
    -
    -shape: (4, 4)
    -Name ┆ Description ┆ Summary ┆ OtherDesignations
    -str ┆ str ┆ str ┆ str
    -"GPR176" ┆ "G protein-coupled receptor 176" ┆ "Members of the G protein-coupl… ┆ "G-protein coupled receptor 176…
    -"CHRM4" ┆ "cholinergic receptor muscarini… ┆ "The muscarinic cholinergic rec… ┆ "muscarinic acetylcholine recep…
    -"LY6K" ┆ "lymphocyte antigen 6 family me… ┆ "Predicted to be involved in bi… ┆ "lymphocyte antigen 6K|cancer/t…
    -"SCAPER" ┆ "S-phase cyclin A associated pr… ┆ "Predicted to enable nucleic ac… ┆ "S phase cyclin A-associated pr…
    -
    -
    +
    +
    shape: (4, 4)
    +┌────────┬─────────────────────────────┬─────────────────────────────┬─────────────────────────────┐
    +│ Name   ┆ Description                 ┆ Summary                     ┆ OtherDesignations           │
    +│ ---    ┆ ---                         ┆ ---                         ┆ ---                         │
    +│ str    ┆ str                         ┆ str                         ┆ str                         │
    +╞════════╪═════════════════════════════╪═════════════════════════════╪═════════════════════════════╡
    +│ GPR176 ┆ G protein-coupled receptor  ┆ Members of the G            ┆ G-protein coupled receptor  │
    +│        ┆ 176                         ┆ protein-coupled receptor    ┆ 176|probable G-protein      │
    +│        ┆                             ┆ family, such as GPR176, are ┆ coupled receptor 176        │
    +│        ┆                             ┆ cell surface receptors      ┆                             │
    +│        ┆                             ┆ involved in responses to    ┆                             │
    +│        ┆                             ┆ hormones, growth factors,   ┆                             │
    +│        ┆                             ┆ and neurotransmitters (Hata ┆                             │
    +│        ┆                             ┆ et al., 1995 [PubMed        ┆                             │
    +│        ┆                             ┆ 7893747]).[supplied by      ┆                             │
    +│        ┆                             ┆ OMIM, Jul 2008]             ┆                             │
    +│ CHRM4  ┆ cholinergic receptor        ┆ The muscarinic cholinergic  ┆ muscarinic acetylcholine    │
    +│        ┆ muscarinic 4                ┆ receptors belong to a       ┆ receptor M4|acetylcholine   │
    +│        ┆                             ┆ larger family of G          ┆ receptor, muscarinic 4      │
    +│        ┆                             ┆ protein-coupled receptors.  ┆                             │
    +│        ┆                             ┆ The functional diversity of ┆                             │
    +│        ┆                             ┆ these receptors is defined  ┆                             │
    +│        ┆                             ┆ by the binding of           ┆                             │
    +│        ┆                             ┆ acetylcholine and includes  ┆                             │
    +│        ┆                             ┆ cellular responses such as  ┆                             │
    +│        ┆                             ┆ adenylate cyclase           ┆                             │
    +│        ┆                             ┆ inhibition,                 ┆                             │
    +│        ┆                             ┆ phosphoinositide            ┆                             │
    +│        ┆                             ┆ degeneration, and potassium ┆                             │
    +│        ┆                             ┆ channel mediation.          ┆                             │
    +│        ┆                             ┆ Muscarinic receptors        ┆                             │
    +│        ┆                             ┆ influence many effects of   ┆                             │
    +│        ┆                             ┆ acetylcholine in the        ┆                             │
    +│        ┆                             ┆ central and peripheral      ┆                             │
    +│        ┆                             ┆ nervous system. The         ┆                             │
    +│        ┆                             ┆ clinical implications of    ┆                             │
    +│        ┆                             ┆ this receptor are unknown;  ┆                             │
    +│        ┆                             ┆ however, mouse studies link ┆                             │
    +│        ┆                             ┆ its function to adenylyl    ┆                             │
    +│        ┆                             ┆ cyclase inhibition.         ┆                             │
    +│        ┆                             ┆ [provided by RefSeq, Jul    ┆                             │
    +│        ┆                             ┆ 2008]                       ┆                             │
    +│ LY6K   ┆ lymphocyte antigen 6 family ┆ Predicted to be involved in ┆ lymphocyte antigen          │
    +│        ┆ member K                    ┆ binding activity of sperm   ┆ 6K|cancer/testis antigen    │
    +│        ┆                             ┆ to zona pellucida.          ┆ 97|lymphocyte antigen 6     │
    +│        ┆                             ┆ Predicted to act upstream   ┆ complex, locus              │
    +│        ┆                             ┆ of or within flagellated    ┆ K|up-regulated in lung      │
    +│        ┆                             ┆ sperm motility. Predicted   ┆ cancer 10                   │
    +│        ┆                             ┆ to be located in cell       ┆                             │
    +│        ┆                             ┆ surface; cytoplasm; and     ┆                             │
    +│        ┆                             ┆ plasma membrane. Predicted  ┆                             │
    +│        ┆                             ┆ to be active in acrosomal   ┆                             │
    +│        ┆                             ┆ vesicle. [provided by       ┆                             │
    +│        ┆                             ┆ Alliance of Genome          ┆                             │
    +│        ┆                             ┆ Resources, Apr 2022]        ┆                             │
    +│ SCAPER ┆ S-phase cyclin A associated ┆ Predicted to enable nucleic ┆ S phase cyclin A-associated │
    +│        ┆ protein in the ER           ┆ acid binding activity and   ┆ protein in the endoplasmic  │
    +│        ┆                             ┆ zinc ion binding activity.  ┆ reticulum|zinc finger       │
    +│        ┆                             ┆ Located in cytosol and      ┆ protein 291                 │
    +│        ┆                             ┆ nuclear speck. [provided by ┆                             │
    +│        ┆                             ┆ Alliance of Genome          ┆                             │
    +│        ┆                             ┆ Resources, Apr 2022]        ┆                             │
    +└────────┴─────────────────────────────┴─────────────────────────────┴─────────────────────────────┘
    diff --git a/howto/6_query_genes_externally.ipynb b/howto/6_query_genes_externally.ipynb index 318c2f5..b3683bc 100644 --- a/howto/6_query_genes_externally.ipynb +++ b/howto/6_query_genes_externally.ipynb @@ -14,7 +14,7 @@ "[documentation](https://www.ncbi.nlm.nih.gov/books/NBK25501/), which\n", "contains all the options, is a useful reference to keep in hand.." ], - "id": "f25a0c19-a496-4f5a-8636-9fd85f01d21f" + "id": "7d3f7496-4528-4249-8e1c-fba864d239c2" }, { "cell_type": "code", @@ -28,7 +28,7 @@ "from Bio import Entrez\n", "from broad_babel.query import get_mapper" ], - "id": "b20d0818" + "id": "3b44cccd" }, { "cell_type": "markdown", @@ -36,7 +36,7 @@ "source": [ "We define" ], - "id": "3c5dc13a-3a1f-4dc0-b0ee-863cd2e3505d" + "id": "0c4f7c8f-9988-4ded-9dd2-f62563e81b91" }, { "cell_type": "code", @@ -52,7 +52,7 @@ " \"OtherDesignations\", # This gives us synonyms\n", ")" ], - "id": "e2f17932" + "id": "da738c03" }, { "cell_type": "markdown", @@ -61,7 +61,7 @@ "We will use a set of genes that we found in a JUMP cluster as an\n", "example." ], - "id": "19103f9a-d181-4430-81fa-1452c2aabeaf" + "id": "a7a8a2d5-b2c0-481e-9357-1622000a7a0a" }, { "cell_type": "code", @@ -71,7 +71,7 @@ "source": [ "genes = (\"CHRM4\", \"SCAPER\", \"GPR176\", \"LY6K\")" ], - "id": "3dd9e050" + "id": "b8637965" }, { "cell_type": "markdown", @@ -79,7 +79,7 @@ "source": [ "Get the" ], - "id": "cad009a5-b36b-4816-95cb-a4eca3528691" + "id": "6387dff8-fa09-4051-aac6-28a721f09352" }, { "cell_type": "code", @@ -104,7 +104,7 @@ " {k: record[\"DocumentSummarySet\"][\"DocumentSummary\"][0][k] for k in fields}\n", " )" ], - "id": "7b7a39b4" + "id": "0de30308" }, { "cell_type": "code", @@ -114,19 +114,80 @@ }, "outputs": [ { - "output_type": "display_data", - "metadata": {}, - "data": { - "text/html": [ - "
    " - ] - } + "output_type": "stream", + "name": "stdout", + "text": [ + "shape: (4, 4)\n", + "┌────────┬─────────────────────────────┬─────────────────────────────┬─────────────────────────────┐\n", + "│ Name ┆ Description ┆ Summary ┆ OtherDesignations │\n", + "│ --- ┆ --- ┆ --- ┆ --- │\n", + "│ str ┆ str ┆ str ┆ str │\n", + "╞════════╪═════════════════════════════╪═════════════════════════════╪═════════════════════════════╡\n", + "│ GPR176 ┆ G protein-coupled receptor ┆ Members of the G ┆ G-protein coupled receptor │\n", + "│ ┆ 176 ┆ protein-coupled receptor ┆ 176|probable G-protein │\n", + "│ ┆ ┆ family, such as GPR176, are ┆ coupled receptor 176 │\n", + "│ ┆ ┆ cell surface receptors ┆ │\n", + "│ ┆ ┆ involved in responses to ┆ │\n", + "│ ┆ ┆ hormones, growth factors, ┆ │\n", + "│ ┆ ┆ and neurotransmitters (Hata ┆ │\n", + "│ ┆ ┆ et al., 1995 [PubMed ┆ │\n", + "│ ┆ ┆ 7893747]).[supplied by ┆ │\n", + "│ ┆ ┆ OMIM, Jul 2008] ┆ │\n", + "│ CHRM4 ┆ cholinergic receptor ┆ The muscarinic cholinergic ┆ muscarinic acetylcholine │\n", + "│ ┆ muscarinic 4 ┆ receptors belong to a ┆ receptor M4|acetylcholine │\n", + "│ ┆ ┆ larger family of G ┆ receptor, muscarinic 4 │\n", + "│ ┆ ┆ protein-coupled receptors. ┆ │\n", + "│ ┆ ┆ The functional diversity of ┆ │\n", + "│ ┆ ┆ these receptors is defined ┆ │\n", + "│ ┆ ┆ by the binding of ┆ │\n", + "│ ┆ ┆ acetylcholine and includes ┆ │\n", + "│ ┆ ┆ cellular responses such as ┆ │\n", + "│ ┆ ┆ adenylate cyclase ┆ │\n", + "│ ┆ ┆ inhibition, ┆ │\n", + "│ ┆ ┆ phosphoinositide ┆ │\n", + "│ ┆ ┆ degeneration, and potassium ┆ │\n", + "│ ┆ ┆ channel mediation. ┆ │\n", + "│ ┆ ┆ Muscarinic receptors ┆ │\n", + "│ ┆ ┆ influence many effects of ┆ │\n", + "│ ┆ ┆ acetylcholine in the ┆ │\n", + "│ ┆ ┆ central and peripheral ┆ │\n", + "│ ┆ ┆ nervous system. 
The ┆ │\n", + "│ ┆ ┆ clinical implications of ┆ │\n", + "│ ┆ ┆ this receptor are unknown; ┆ │\n", + "│ ┆ ┆ however, mouse studies link ┆ │\n", + "│ ┆ ┆ its function to adenylyl ┆ │\n", + "│ ┆ ┆ cyclase inhibition. ┆ │\n", + "│ ┆ ┆ [provided by RefSeq, Jul ┆ │\n", + "│ ┆ ┆ 2008] ┆ │\n", + "│ LY6K ┆ lymphocyte antigen 6 family ┆ Predicted to be involved in ┆ lymphocyte antigen │\n", + "│ ┆ member K ┆ binding activity of sperm ┆ 6K|cancer/testis antigen │\n", + "│ ┆ ┆ to zona pellucida. ┆ 97|lymphocyte antigen 6 │\n", + "│ ┆ ┆ Predicted to act upstream ┆ complex, locus │\n", + "│ ┆ ┆ of or within flagellated ┆ K|up-regulated in lung │\n", + "│ ┆ ┆ sperm motility. Predicted ┆ cancer 10 │\n", + "│ ┆ ┆ to be located in cell ┆ │\n", + "│ ┆ ┆ surface; cytoplasm; and ┆ │\n", + "│ ┆ ┆ plasma membrane. Predicted ┆ │\n", + "│ ┆ ┆ to be active in acrosomal ┆ │\n", + "│ ┆ ┆ vesicle. [provided by ┆ │\n", + "│ ┆ ┆ Alliance of Genome ┆ │\n", + "│ ┆ ┆ Resources, Apr 2022] ┆ │\n", + "│ SCAPER ┆ S-phase cyclin A associated ┆ Predicted to enable nucleic ┆ S phase cyclin A-associated │\n", + "│ ┆ protein in the ER ┆ acid binding activity and ┆ protein in the endoplasmic │\n", + "│ ┆ ┆ zinc ion binding activity. ┆ reticulum|zinc finger │\n", + "│ ┆ ┆ Located in cytosol and ┆ protein 291 │\n", + "│ ┆ ┆ nuclear speck. [provided by ┆ │\n", + "│ ┆ ┆ Alliance of Genome ┆ │\n", + "│ ┆ ┆ Resources, Apr 2022] ┆ │\n", + "└────────┴─────────────────────────────┴─────────────────────────────┴─────────────────────────────┘" + ] } ], "source": [ - "pl.DataFrame(entries)" + "with pl.Config(fmt_str_lengths=1000):\n", + " print(pl.DataFrame(entries))" ], - "id": "f07b6354" + "id": "5bcdb87b" } ], "nbformat": 4, diff --git a/index.ipynb b/index.ipynb index d199566..db2d079 100644 --- a/index.ipynb +++ b/index.ipynb @@ -54,7 +54,7 @@ "novel biological insights. We aim to make this the one-stop shop for the\n", "vast majority of JUMP questions, be it computational or biological." 
], - "id": "44e3e6f8-f1db-4629-bae1-cacd44dfe950" + "id": "5a9b1cf8-debe-488b-9378-6cb182fc163c" } ], "nbformat": 4, diff --git a/readme.ipynb b/readme.ipynb index 4f81681..1d4d895 100644 --- a/readme.ipynb +++ b/readme.ipynb @@ -17,7 +17,7 @@ "This repository can be used as a way to install essential dependencies\n", "for an exploratory analysis of JUMP morphological data." ], - "id": "6491d325-c6f3-476f-b778-3d2cd1b8ffe7" + "id": "417cf2c1-e9ba-4a35-a98d-c168bfa9716d" } ], "nbformat": 4, diff --git a/search.json b/search.json index a404379..f7eb378 100644 --- a/search.json +++ b/search.json @@ -182,7 +182,7 @@ "href": "howto/6_query_genes_externally.html", "title": "Query information of genes", "section": "", - "text": "This how-to focuses on linking gene names outside. Whilst not JUMP-specific, it is useful to fetch more information on perturbations that our analysis deem important without having to manually search them. We will use Biopython, this only explores a subset of the options, the full Entrez documentation, which contains all the options, is a useful reference to keep in hand..\n\n\nCode\nimport polars as pl\nfrom Bio import Entrez\nfrom broad_babel.query import get_mapper\n\n\nWe define\n\n\nCode\nEntrez.email = \"example@email.com\"\nfields = (\n \"Name\",\n \"Description\",\n \"Summary\",\n \"OtherDesignations\", # This gives us synonyms\n)\n\n\nWe will use a set of genes that we found in a JUMP cluster as an example.\n\n\nCode\ngenes = (\"CHRM4\", \"SCAPER\", \"GPR176\", \"LY6K\")\n\n\nGet the\n\n\nCode\n# Get a dictionary that maps Gene symbols to Entrez IDs\nids = get_mapper(\n query=genes,\n input_column=\"standard_key\",\n output_columns=\"standard_key,NCBI_Gene_ID\",\n)\n\n# Fetch the summaries for these genes\nentries = []\nfor id_ in ids.values():\n stream = Entrez.esummary(db=\"gene\", id=id_)\n record = Entrez.read(stream)\n\n entries.append(\n {k: record[\"DocumentSummarySet\"][\"DocumentSummary\"][0][k] for k in fields}\n 
)\n\n\n\n\nCode\npl.DataFrame(entries)\n\n\n\n\nshape: (4, 4)\n\n\n\nName\nDescription\nSummary\nOtherDesignations\n\n\nstr\nstr\nstr\nstr\n\n\n\n\n\"GPR176\"\n\"G protein-coupled receptor 176\"\n\"Members of the G protein-coupl…\n\"G-protein coupled receptor 176…\n\n\n\"CHRM4\"\n\"cholinergic receptor muscarini…\n\"The muscarinic cholinergic rec…\n\"muscarinic acetylcholine recep…\n\n\n\"LY6K\"\n\"lymphocyte antigen 6 family me…\n\"Predicted to be involved in bi…\n\"lymphocyte antigen 6K|cancer/t…\n\n\n\"SCAPER\"\n\"S-phase cyclin A associated pr…\n\"Predicted to enable nucleic ac…\n\"S phase cyclin A-associated pr…", + "text": "This how-to focuses on linking gene names outside. Whilst not JUMP-specific, it is useful to fetch more information on perturbations that our analysis deem important without having to manually search them. We will use Biopython, this only explores a subset of the options, the full Entrez documentation, which contains all the options, is a useful reference to keep in hand..\n\n\nCode\nimport polars as pl\nfrom Bio import Entrez\nfrom broad_babel.query import get_mapper\n\n\nWe define\n\n\nCode\nEntrez.email = \"example@email.com\"\nfields = (\n \"Name\",\n \"Description\",\n \"Summary\",\n \"OtherDesignations\", # This gives us synonyms\n)\n\n\nWe will use a set of genes that we found in a JUMP cluster as an example.\n\n\nCode\ngenes = (\"CHRM4\", \"SCAPER\", \"GPR176\", \"LY6K\")\n\n\nGet the\n\n\nCode\n# Get a dictionary that maps Gene symbols to Entrez IDs\nids = get_mapper(\n query=genes,\n input_column=\"standard_key\",\n output_columns=\"standard_key,NCBI_Gene_ID\",\n)\n\n# Fetch the summaries for these genes\nentries = []\nfor id_ in ids.values():\n stream = Entrez.esummary(db=\"gene\", id=id_)\n record = Entrez.read(stream)\n\n entries.append(\n {k: record[\"DocumentSummarySet\"][\"DocumentSummary\"][0][k] for k in fields}\n )\n\n\n\n\nCode\nwith pl.Config(fmt_str_lengths=1000):\n print(pl.DataFrame(entries))\n\n\nshape: (4, 
4)\n┌────────┬─────────────────────────────┬─────────────────────────────┬─────────────────────────────┐\n│ Name ┆ Description ┆ Summary ┆ OtherDesignations │\n│ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str │\n╞════════╪═════════════════════════════╪═════════════════════════════╪═════════════════════════════╡\n│ GPR176 ┆ G protein-coupled receptor ┆ Members of the G ┆ G-protein coupled receptor │\n│ ┆ 176 ┆ protein-coupled receptor ┆ 176|probable G-protein │\n│ ┆ ┆ family, such as GPR176, are ┆ coupled receptor 176 │\n│ ┆ ┆ cell surface receptors ┆ │\n│ ┆ ┆ involved in responses to ┆ │\n│ ┆ ┆ hormones, growth factors, ┆ │\n│ ┆ ┆ and neurotransmitters (Hata ┆ │\n│ ┆ ┆ et al., 1995 [PubMed ┆ │\n│ ┆ ┆ 7893747]).[supplied by ┆ │\n│ ┆ ┆ OMIM, Jul 2008] ┆ │\n│ CHRM4 ┆ cholinergic receptor ┆ The muscarinic cholinergic ┆ muscarinic acetylcholine │\n│ ┆ muscarinic 4 ┆ receptors belong to a ┆ receptor M4|acetylcholine │\n│ ┆ ┆ larger family of G ┆ receptor, muscarinic 4 │\n│ ┆ ┆ protein-coupled receptors. ┆ │\n│ ┆ ┆ The functional diversity of ┆ │\n│ ┆ ┆ these receptors is defined ┆ │\n│ ┆ ┆ by the binding of ┆ │\n│ ┆ ┆ acetylcholine and includes ┆ │\n│ ┆ ┆ cellular responses such as ┆ │\n│ ┆ ┆ adenylate cyclase ┆ │\n│ ┆ ┆ inhibition, ┆ │\n│ ┆ ┆ phosphoinositide ┆ │\n│ ┆ ┆ degeneration, and potassium ┆ │\n│ ┆ ┆ channel mediation. ┆ │\n│ ┆ ┆ Muscarinic receptors ┆ │\n│ ┆ ┆ influence many effects of ┆ │\n│ ┆ ┆ acetylcholine in the ┆ │\n│ ┆ ┆ central and peripheral ┆ │\n│ ┆ ┆ nervous system. The ┆ │\n│ ┆ ┆ clinical implications of ┆ │\n│ ┆ ┆ this receptor are unknown; ┆ │\n│ ┆ ┆ however, mouse studies link ┆ │\n│ ┆ ┆ its function to adenylyl ┆ │\n│ ┆ ┆ cyclase inhibition. ┆ │\n│ ┆ ┆ [provided by RefSeq, Jul ┆ │\n│ ┆ ┆ 2008] ┆ │\n│ LY6K ┆ lymphocyte antigen 6 family ┆ Predicted to be involved in ┆ lymphocyte antigen │\n│ ┆ member K ┆ binding activity of sperm ┆ 6K|cancer/testis antigen │\n│ ┆ ┆ to zona pellucida. 
┆ 97|lymphocyte antigen 6 │\n│ ┆ ┆ Predicted to act upstream ┆ complex, locus │\n│ ┆ ┆ of or within flagellated ┆ K|up-regulated in lung │\n│ ┆ ┆ sperm motility. Predicted ┆ cancer 10 │\n│ ┆ ┆ to be located in cell ┆ │\n│ ┆ ┆ surface; cytoplasm; and ┆ │\n│ ┆ ┆ plasma membrane. Predicted ┆ │\n│ ┆ ┆ to be active in acrosomal ┆ │\n│ ┆ ┆ vesicle. [provided by ┆ │\n│ ┆ ┆ Alliance of Genome ┆ │\n│ ┆ ┆ Resources, Apr 2022] ┆ │\n│ SCAPER ┆ S-phase cyclin A associated ┆ Predicted to enable nucleic ┆ S phase cyclin A-associated │\n│ ┆ protein in the ER ┆ acid binding activity and ┆ protein in the endoplasmic │\n│ ┆ ┆ zinc ion binding activity. ┆ reticulum|zinc finger │\n│ ┆ ┆ Located in cytosol and ┆ protein 291 │\n│ ┆ ┆ nuclear speck. [provided by ┆ │\n│ ┆ ┆ Alliance of Genome ┆ │\n│ ┆ ┆ Resources, Apr 2022] ┆ │\n└────────┴─────────────────────────────┴─────────────────────────────┴─────────────────────────────┘", "crumbs": [ "How-To Guides", "Query information of genes" diff --git a/sitemap.xml b/sitemap.xml index d356649..07b0fc8 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,46 +2,46 @@ https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/3_calculate_activity.html - 2024-09-11T00:06:44.718Z + 2024-09-11T00:17:49.904Z https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/2_add_metadata.html - 2024-09-11T00:06:44.358Z + 2024-09-11T00:17:49.544Z https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/4_display_perturbation_images.html - 2024-09-11T00:06:45.094Z + 2024-09-11T00:17:50.284Z https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/explanations/FAQ.html - 2024-09-11T00:05:30.487Z + 2024-09-11T00:16:10.032Z https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/explanations/Resources.html - 2024-09-11T00:05:30.487Z + 2024-09-11T00:16:10.032Z https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/index.html - 2024-09-11T00:05:30.487Z + 2024-09-11T00:16:10.032Z 
https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/readme.html - 2024-09-11T00:05:30.491Z + 2024-09-11T00:16:10.032Z https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/explanations/glossary.html - 2024-09-11T00:05:30.487Z + 2024-09-11T00:16:10.032Z https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/5_explore_distance_clusters.html - 2024-09-11T00:06:45.466Z + 2024-09-11T00:17:50.652Z https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/1_retrieve_profiles.html - 2024-09-11T00:06:43.978Z + 2024-09-11T00:17:49.168Z https://broadinstitute.github.io/2023_12_JUMP_data_only_vignettes/howto/6_query_genes_externally.html - 2024-09-11T00:06:45.834Z + 2024-09-11T00:17:51.008Z