CellContrast: Reconstructing Spatial Relationships in Single-Cell RNA Sequencing Data via Deep Contrastive Learning
Contact: Yuanhua Huang, Ruibang Luo, Shumin Li
Email: [email protected], [email protected], [email protected]
CellContrast reconstructs spatial relationships for single-cell RNA sequencing (SC) data. Its fundamental assumption is that gene expression profiles can be projected into a latent space in which physically proximate cells show higher similarity. To achieve this, CellContrast employs a contrastive learning framework with an encoder-projector structure. The model is trained on spatial transcriptomics (ST) data and then applied to SC data to obtain spatially related representations. The results can be used in downstream tasks that require spatial information, such as cell-type co-localization and cell-cell communication analysis.
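To make the contrastive idea concrete, here is a minimal sketch of an InfoNCE-style loss, a common contrastive objective used here as a stand-in. It is not taken from the CellContrast code: the function names, the temperature value, and the toy vectors are all assumptions for illustration, and the actual loss and the rule for choosing positive pairs are defined in the paper and repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor (illustrative, not CellContrast's exact loss).

    The loss is small when the anchor is more similar to its positive
    (assumed here to be a physically nearby cell) than to the negatives.
    """
    pos = math.exp(cosine_similarity(anchor, positive) / temperature)
    neg = sum(
        math.exp(cosine_similarity(anchor, n) / temperature) for n in negatives
    )
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D projections: one spatial neighbor and two distant cells.
anchor = [1.0, 0.0]
neighbor = [0.9, 0.1]
distant = [[-1.0, 0.0], [0.0, 1.0]]

# Neighbor treated as the positive: the latent space already agrees with space.
well_separated = info_nce_loss(anchor, neighbor, distant)
# A distant cell treated as the positive: the latent space disagrees.
poorly_separated = info_nce_loss(anchor, distant[0], [neighbor, distant[1]])
```

Minimizing a loss of this kind pushes the projections of neighboring cells together and those of distant cells apart, which is the property the latent space is meant to have.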
The paper describing CellContrast's algorithms and results is published in Cell Patterns.
- v0.1 (Sep, 2023): Initial release.
CellContrast requires Python 3.9. To install it, follow the steps below:
- Install Miniconda3 if not already available.
- Clone this repository:
git clone https://github.com/HKU-BAL/CellContrast
- Navigate to the CellContrast directory:
cd CellContrast
- (5-10 minutes) Create a conda environment with the required dependencies:
conda env create -f environment.yml
- Activate the cellContrast environment you just created:
conda activate cellContrast
- Install PyTorch: refer to the PyTorch installation instructions as needed. For example, the command for installing a CPU-only PyTorch is:
conda install pytorch torchvision torchaudio cpuonly -c pytorch
CellContrast contains three main modules, train, eval, and inference, for model training, benchmarking evaluation, and inference of spatial relationships, respectively. In addition, we provide a reconstruct module that integrates train and inference. To check the available modules, run:
python cellContrast.py -h
## --train_data_path: required, your ST h5ad file
## --query_data_path: path of the query SC h5ad file
## --parameter_file: optional, our default for spot or single-cell ST, or your customized parameters
## --save_folder: optional, model output path
## --enable_denovo: optional, run MDS to map the SC-SC pairwise distances to a 2D pseudo space
## --save_path: path of the spatially reconstructed SC data
python cellContrast.py reconstruct \
--train_data_path train_ST.h5ad \
--query_data_path query_sc.h5ad \
--parameter_file parameters/parameters_spot.json \
--save_folder cellContrast_models/ \
--enable_denovo \
--save_path spatial_reconstructed_sc.h5ad
- Adopt the predefined parameters for imaging-based ST data by setting --single_cell.
## --train_data_path: required, your ST h5ad file
## --query_data_path: path of the query SC h5ad file
## --single_cell: switch to the predefined parameters for imaging-based ST
## --parameter_file: optional, our default for spot or single-cell ST, or your customized parameters
## --save_folder: optional, model output path
## --enable_denovo: optional, run MDS to map the SC-SC pairwise distances to a 2D pseudo space
## --save_path: path of the spatially reconstructed SC data
python cellContrast.py reconstruct \
--train_data_path train_ST.h5ad \
--query_data_path query_sc.h5ad \
--single_cell \
--parameter_file parameters/parameters_singleCell.json \
--save_folder cellContrast_models/ \
--enable_denovo \
--save_path spatial_reconstructed_sc.h5ad
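The --enable_denovo option is described as running MDS on the SC-SC pairwise distances to obtain a 2D pseudo space. As a rough illustration of what that step does, the sketch below implements classical (Torgerson) MDS in plain Python with power iteration. This is a stand-in technique: which MDS variant and library CellContrast uses is not stated here, and the function name classical_mds and the toy points are hypothetical.

```python
import math


def classical_mds(dist, n_components=2, n_iter=500):
    """Embed points in n_components dimensions from a symmetric distance matrix.

    Classical MDS: double-center the squared distances, then scale the leading
    eigenvectors by the square roots of their eigenvalues. Eigenpairs are found
    by power iteration with deflation.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering; dist is symmetric, so column means equal row means.
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]
    coords = [[0.0] * n_components for _ in range(n)]
    for k in range(n_components):
        # Deterministic, normalized start vector.
        v = [float(i + 1) for i in range(n)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Toy example: four cells on a 3 x 1 rectangle, known only through distances.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (3.0, 1.0)]
distances = [[math.dist(p, q) for q in points] for p in points]
embedding = classical_mds(distances)
```

The recovered coordinates preserve pairwise distances but not the original orientation: the layout may be rotated, reflected, or shifted, which is why the result is a pseudo space rather than tissue coordinates.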
The CellContrast model is trained on ST data, which should be in AnnData format with the true locations in .obs[['x','y']]. The model can be trained with the following command:
‼️ The default parameters are defined for sequencing-based ST. To adopt the predefined parameters for imaging-based ST data, set --single_cell.
## --train_data_path: required, your ST h5ad file
## --save_folder: optional, model output path
## --single_cell: default: not enabled. Set this flag to switch to our predefined parameters for imaging-based ST
## --parameter_file: optional, our default for spot or single-cell ST, or your customized parameters
python cellContrast.py train \
--train_data_path train_ST.h5ad \
--save_folder cellContrast_models/ \
--single_cell \
--parameter_file parameters/parameters_singleCell.json
## Output file: cellContrast_models/epoch_3000.pt
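Since training expects the true locations in the x and y columns of .obs, a quick check of the column names before training can save a failed run. The helper below is a small sketch and not part of CellContrast; the name missing_coordinate_columns and the example column lists are hypothetical.

```python
def missing_coordinate_columns(obs_columns, required=("x", "y")):
    """Return the required coordinate columns that are absent from obs_columns."""
    present = set(obs_columns)
    return [name for name in required if name not in present]


# Hypothetical column listings standing in for adata.obs.columns.
print(missing_coordinate_columns(["x", "y", "cell_type"]))  # []
print(missing_coordinate_columns(["x", "cell_type"]))       # ['y']
```

With the anndata package installed, the same check could be run on a real file as missing_coordinate_columns(anndata.read_h5ad("train_ST.h5ad").obs.columns).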
Benchmarking performance can be evaluated with the following command. Three metrics are reported: nearest neighbor hit, Jensen-Shannon distance, and Spearman's rank correlation.
## --ref_data_path: path of the reference ST h5ad file
## --query_data_path: path of the testing h5ad file with true locations
## --model_folder: folder of the trained model
## --parameter_file: use the parameter file from the training phase
## --save_path: evaluation result path
python cellContrast.py eval \
--ref_data_path ref_ST.h5ad \
--query_data_path query_ST.h5ad \
--model_folder cellContrast_models \
--parameter_file parameters/parameters_singleCell.json \
--save_path results.csv
## Output file: results.csv with neighbor hit, JSD, and Spearman's rank correlation for each testing sample.
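For readers unfamiliar with two of the reported metrics, the sketch below gives textbook definitions of the Jensen-Shannon distance (base-2 logarithm, so values fall in [0, 1]) and Spearman's rank correlation. How CellContrast builds the inputs to these metrics, and how it defines the nearest neighbor hit, is not covered; the function names are hypothetical.

```python
import math


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2)."""
    p = [x / sum(p) for x in p]
    q = [x / sum(q) for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_correlation(x, y):
    """Pearson correlation of the ranks of x and y (inputs must not be constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

A Jensen-Shannon distance of 0 means the two distributions are identical, and a Spearman correlation of 1 means two sets of distances are ordered identically, so lower JSD and higher correlation both indicate better agreement with the true layout.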
The spatial relationships of SC data can be obtained with the following command:
## --ref_data_path: path of the reference ST h5ad file
## --query_data_path: path of the query SC h5ad file
## --model_folder: folder of the trained model (the --save_folder used for training)
## --parameter_file: use the parameter file from the training phase
## --save_path: path of the spatially reconstructed SC data
## --enable_denovo: optional, run MDS to map the SC-SC pairwise distances to a 2D pseudo space
python cellContrast.py inference \
--ref_data_path train_ST.h5ad \
--query_data_path query_sc.h5ad \
--model_folder cellContrast_models/ \
--parameter_file parameters/parameters_singleCell.json \
--save_path spatial_reconstructed_sc.h5ad \
--enable_denovo
## Output file: spatially reconstructed h5ad file in AnnData format
- Fields newly added to spatial_reconstructed_sc.h5ad: .uns[['cosine sim of rep','representation','referenced x','referenced y','de novo x','de novo y']]
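The field names above suggest that query cells are compared with reference ST spots by the cosine similarity of their representations, and that locations are taken from the reference. One plausible reading, sketched below, assigns each query cell the coordinates of its most similar reference spot. This is an assumption made for illustration, not the documented behavior of CellContrast, and every name and value in the code is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def borrow_reference_locations(query_reps, ref_reps, ref_xy):
    """For each query cell, return (location, similarity) of its best reference match.

    Assumed rule: the reference spot with the highest cosine similarity to the
    query representation lends its (x, y) coordinates.
    """
    results = []
    for q in query_reps:
        sims = [cosine_similarity(q, r) for r in ref_reps]
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((ref_xy[best], sims[best]))
    return results


# Toy representations: two query cells and two reference spots.
query_reps = [[1.0, 0.1], [0.0, 2.0]]
ref_reps = [[2.0, 0.0], [0.0, 1.0]]
ref_xy = [(10.0, 20.0), (30.0, 40.0)]
matches = borrow_reference_locations(query_reps, ref_reps, ref_xy)
```

Because cosine similarity ignores vector length, the second query cell matches the second reference spot exactly even though the two vectors differ in magnitude.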