How to use CELLxGENE for manual annotations

Introduction

Welcome to our hands-on introduction to CELLxGENE, a key tool for our upcoming STSM. CELLxGENE lets you explore single-cell datasets interactively and offers a range of features to support your research.

To get started, follow the steps outlined here to familiarize yourself with the tool. We also encourage you to dive in and experiment with CELLxGENE independently. Feel free to sample cells and explore its functionalities at your own pace. Enjoy!

CELLxGENE is a toolkit for scientists exploring single-cell datasets and atlases. Its visualization features let you compare the embeddings produced by different data integration methods, color clusters by metadata annotations, inspect the expression of individual genes, and create and visualize gene sets. Its user-friendly interface makes single-cell analysis efficient and accessible, supporting both in-depth exploration and collaborative interpretation of complex biological data.

For our scRAFIKI (Single-cell RNA-seq Atlas Framework for Integration and Key Insights) project, we have curated two atlases that we invite you to explore using the CELLxGENE tool. You can access and analyze these atlases here:

User interface

The CELLxGENE interface gives an overview of the atlas data: each cell is shown as a point in the embedding plot at the center. We provide embeddings from three integration methods (Harmony, scVI, and scANVI), which you can switch between using the button below the embedding plot in the bottom-left corner. The toolbar provides the following functions:

Set populations to find marker genes
Find marker genes
Subset cells based on a current selection
Reset to all data
Lasso selection tool
Number coloring and move canvas tool
Display categorical labels (when coloring by a category)
Clip numerical values
Undo actions
Redo actions

On the left side, you'll find all categorical and numerical metadata of the dataset, with the categorical metadata listed first. Expand a category (>) to see all its values, and color the embedding plot by a category by clicking the "drop" icon next to it.

Scrolling down reveals the numerical metadata, with statistics and histograms for analyzing specific quantities in more detail. On the right side, you can select genes and create gene sets; these options are explained in the following steps.

Instructions for Key Procedures

Subsetting (Myeloid) cells

Color the embedding plot by the "cell_type" category, which contains the cell type of interest (e.g., Myeloid).
Expand the category by clicking its ">" icon to show all values.
Select only the "Myeloid" cell type. The selected Myeloid cells are now highlighted in the embedding plot.

Click the "Subset cells based on a current selection" button (3) to create the subset.

Now, you can focus on and analyze genes and gene sets specifically within the subset of Myeloid cells, streamlining your exploration.
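If you prefer to reproduce this subsetting step in a script, the idea is simply to keep the cells whose "cell_type" value is in the selected set. The sketch below uses plain Python and hypothetical toy data (the cell IDs and labels are made up). Assuming the atlas is loaded as an AnnData object with a "cell_type" column in `obs`, the equivalent would be `adata[adata.obs["cell_type"] == "Myeloid"]`.

```python
# Hypothetical toy data: cell ID -> "cell_type" value (not from the atlases).
cell_types = {
    "cell_1": "Myeloid",
    "cell_2": "T cell",
    "cell_3": "Myeloid",
    "cell_4": "B cell",
}


def subset_cells(labels, keep):
    """Return the IDs of cells whose label is in `keep`, in original order."""
    keep = set(keep)
    return [cell for cell, label in labels.items() if label in keep]


myeloid_cells = subset_cells(cell_types, ["Myeloid"])
print(myeloid_cells)  # ['cell_1', 'cell_3']
```

Passing several labels (e.g. `["Myeloid", "B cell"]`) mirrors selecting several values of the category before subsetting.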

Exploring marker genes

Search for genes of interest, such as known marker genes.
Expand the gene entry to view its expression distribution as a histogram.
Color your plot based on the expression of the selected gene.

The histogram shows how a gene's expression is distributed, and coloring by it shows how that expression relates to specific cell populations. You can also use bivariate plots to assess potential correlations between the expression of two genes by assigning one gene to the "x" axis and the other to the "y" axis.
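The bivariate plot lets you judge a correlation by eye. To put a number on it outside CELLxGENE, one common choice is the Pearson correlation coefficient; we do not assume the tool reports this value. A minimal sketch in plain Python, with hypothetical expression values for two made-up genes measured in the same cells:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root of the summed squared deviations for each gene.
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant gene")
    return cov / (dev_x * dev_y)


# Hypothetical expression of two made-up genes across the same five cells.
gene_x = [0.0, 1.2, 2.5, 3.1, 0.4]
gene_y = [0.1, 1.0, 2.7, 2.9, 0.2]
print(round(pearson(gene_x, gene_y), 3))
```

Values near 1 or -1 correspond to points lying close to a line in the bivariate plot; values near 0 indicate no linear relationship.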

Exploring gene sets

Create personalized gene sets by adding a name, description, and a list of genes.
Inspect the distribution of the gene set's average expression via its histogram.
Select genes for plotting and coloring to investigate co-regulation patterns and expression values.

This lets you curate your own gene sets and analyze co-regulation patterns and the combined expression dynamics of their genes across the datasets.
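As a rough sketch of what the gene set histogram summarizes, the code below computes a per-cell mean over the genes in a set and bins the scores into equal-width bins. We assume here that the summary is a plain per-cell average, as the wording above suggests; CELLxGENE's exact computation and binning may differ. The gene names and values are hypothetical.

```python
# Hypothetical expression matrix: gene -> values for the same three cells.
expression = {
    "GENE_A": [0.0, 2.0, 4.0],
    "GENE_B": [1.0, 1.0, 1.0],
    "GENE_C": [3.0, 0.0, 2.0],
}


def gene_set_score(expr, genes):
    """Per-cell mean expression over the genes in a gene set."""
    if not genes:
        raise ValueError("gene set is empty")
    missing = [g for g in genes if g not in expr]
    if missing:
        raise KeyError(f"genes not found: {missing}")
    n_cells = len(expr[genes[0]])
    return [sum(expr[g][i] for g in genes) / len(genes) for i in range(n_cells)]


def histogram(values, n_bins):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # avoid zero width if all values match
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        counts[min(int((v - low) / width), n_bins - 1)] += 1
    return counts


scores = gene_set_score(expression, ["GENE_A", "GENE_B", "GENE_C"])
print([round(s, 2) for s in scores])  # [1.33, 1.0, 2.33]
print(histogram(scores, 2))  # [2, 1]
```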

Enabling marker gene identification

Select a population of interest, either by metadata values as described above or with the lasso selection tool (4).
Click the "Set populations to find marker genes" button (1) to store the current selection as a population, and repeat for the population you want to compare it with.
Click the "Find marker genes" button (2).

CELLxGENE then identifies the marker genes that distinguish the selected populations and returns them as gene sets, listing the marker genes with their expression values so that you can analyze them in more detail.
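To our understanding, CELLxGENE's marker gene computation is based on Welch's t-test; the sketch below is not its implementation, only an illustration of the core idea under that assumption. It scores each gene by Welch's t statistic between two populations and ranks the genes so that the most population-1-specific ones come first. It omits p-values and multiple-testing correction, which a real marker gene analysis needs. All gene names and values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: the statistic is 0 or infinite.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_markers(expr, pop1, pop2, top_n=2):
    """Rank genes by Welch's t statistic of pop1 versus pop2 (cell indices)."""
    scores = {
        gene: welch_t([values[i] for i in pop1], [values[i] for i in pop2])
        for gene, values in expr.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical toy matrix: gene -> expression in cells 0..5.
expression = {
    "GENE_A": [5.0, 6.0, 5.5, 0.1, 0.3, 0.2],  # high in population 1
    "GENE_B": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8],  # similar in both
    "GENE_C": [0.0, 0.2, 0.1, 4.0, 4.5, 3.8],  # high in population 2
}
population_1 = [0, 1, 2]
population_2 = [3, 4, 5]

print(rank_markers(expression, population_1, population_2, top_n=1))  # ['GENE_A']
print(rank_markers(expression, population_2, population_1, top_n=1))  # ['GENE_C']
```

Swapping the two populations, as in the last line, gives the markers of the other population.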

Creating new categories

Create a new category with the button in the top-left corner and give it a name.
Add all labels to the category without assigning any cells; each label should contain 0 cells.
Select the cells corresponding to each label individually.
Assign the selected cells to the corresponding label.
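The bookkeeping behind these steps can be sketched as a small data structure: a named category holds labels, each label starts with 0 cells, and assigning a selection moves those cells to one label. The `Category` class is hypothetical and only models the workflow; we assume each cell belongs to at most one label within a category.

```python
class Category:
    """Hypothetical model of a user-created annotation category."""

    def __init__(self, name):
        self.name = name
        self.labels = {}  # label -> set of assigned cell IDs

    def add_label(self, label):
        # A new label starts empty, like the "0 cells" state in the GUI.
        self.labels.setdefault(label, set())

    def assign(self, label, cells):
        """Assign cells to `label`, removing them from any other label."""
        if label not in self.labels:
            raise KeyError(f"unknown label: {label}")
        cells = set(cells)
        for members in self.labels.values():
            members -= cells  # in-place set difference
        self.labels[label] |= cells

    def counts(self):
        return {label: len(members) for label, members in self.labels.items()}


annotation = Category("my_annotation")
for label in ["Myeloid", "Lymphoid"]:
    annotation.add_label(label)
print(annotation.counts())  # {'Myeloid': 0, 'Lymphoid': 0}

annotation.assign("Myeloid", ["cell_1", "cell_3"])
annotation.assign("Lymphoid", ["cell_2", "cell_3"])  # cell_3 moves to Lymphoid
print(annotation.counts())  # {'Myeloid': 1, 'Lymphoid': 2}
```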

Categories based on existing categories (metadata)

Create the new category.
If you want to change an existing category's assignments, you can instead duplicate the labels and assignments of an existing category.
Create all labels for the new annotation.
Color the embedding plot by the category you want to base the annotation on.
Select all clusters that belong to one label.
Assign them to the corresponding label.

In this example, we assigned coarse cell type annotations based on the clustering at resolution 2.0, which was computed on the scANVI integration.
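Annotating from an existing clustering amounts to a lookup table from cluster IDs to coarse labels. A minimal sketch, with hypothetical cluster IDs and labels (not the mapping used for the atlases); cells in clusters without a label fall back to "unassigned", a default chosen for this example:

```python
# Hypothetical cluster IDs (e.g. from a clustering at resolution 2.0)
# mapped to coarse labels; not the atlases' actual assignments.
cluster_to_label = {
    "0": "Myeloid",
    "3": "Myeloid",
    "1": "T cell",
    "2": "B cell",
}

# Hypothetical cell ID -> cluster ID.
cell_clusters = {"cell_1": "0", "cell_2": "1", "cell_3": "3", "cell_4": "2", "cell_5": "7"}


def annotate_from_clusters(clusters, mapping, default="unassigned"):
    """Give each cell the label of its cluster, or `default` if unmapped."""
    return {cell: mapping.get(cluster, default) for cell, cluster in clusters.items()}


coarse = annotate_from_clusters(cell_clusters, cluster_to_label)
print(coarse["cell_3"], coarse["cell_5"])  # Myeloid unassigned
```

Several clusters can map to the same label, which is how fine clusters collapse into coarse cell types.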

Categories based on genes and gene sets

Create all labels, one for each gene or gene set you want to annotate by.
Search for the gene or gene set and clip the values so that only the cells expressing it are selected.
Assign the cells expressing the gene(s) to the corresponding label.

In this example, we picked genes that show their expression distribution clearly.
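Selecting cells by a gene's expression after clipping can be approximated as keeping the cells whose values fall inside a percentile window. This sketch does not reproduce CELLxGENE's exact clipping behaviour; the percentile interpolation, the window bounds, and the cell and gene names are all assumptions for illustration.

```python
def percentile(values, q):
    """Percentile q (0-100) of a non-empty sequence, linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def cells_in_window(expr_by_cell, lower_q, upper_q):
    """Cells whose expression lies between two percentiles (inclusive)."""
    values = list(expr_by_cell.values())
    low, high = percentile(values, lower_q), percentile(values, upper_q)
    return [cell for cell, v in expr_by_cell.items() if low <= v <= high]


# Hypothetical expression of one made-up gene per cell.
gene_expr = {"c1": 0.0, "c2": 0.0, "c3": 1.5, "c4": 3.0, "c5": 8.0}
selected = cells_in_window(gene_expr, 50, 90)
print(selected)  # ['c3', 'c4']

# Assign the selected cells to a hypothetical label named after the gene.
annotation = {cell: ("GENE_X+" if cell in selected else "unassigned") for cell in gene_expr}
```

Raising the lower percentile excludes weakly expressing cells; lowering the upper one drops outliers such as `c5`.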

Reference Material

If these functionalities have sparked your interest, we recommend exploring the official CELLxGENE website and engaging with the tutorials. This will provide you with a comprehensive understanding of the tool's capabilities for your single-cell analysis.

If anything remains unclear, feel free to open a GitHub issue.