Visium heart dataset #67

wxicu · 2024-11-07T08:18:00Z

Hi, thanks for implementing this fantastic tool!

I'm wondering whether the Visium heart dataset is corrupted: the samples GT_IZ_P9 and GT_IZ_P9_rep2 (ACH0012 and ACH0013) appear to be identical. In the meantime, I'm downloading the raw data for GT_IZ_P9 from Zenodo (https://zenodo.org/records/6580069) to replace the affected sample in adata. However, I noticed that adata.obs contains additional columns, so I'm not sure how to reproduce them. Would it be possible to share the data preprocessing notebook on GitHub? Thanks a lot!

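import numpy as np
import graphcompass as gc

adata = gc.datasets.visium_heart()
# compare the expression matrices of the two replicates spot by spot
np.array_equal(
    adata[adata.obs['sample'] == 'GT_IZ_P9'].X.toarray(),
    adata[adata.obs['sample'] == 'GT_IZ_P9_rep2'].X.toarray()
)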
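Comparing two samples directly only tests one pair. To check whether any other samples in the object are duplicated as well, one option is to fingerprint each sample's matrix and group samples whose fingerprints match. A minimal sketch, assuming spots are the rows of `X` (a dense NumPy array or SciPy sparse matrix) and `samples` holds one label per row; `find_duplicate_samples` is a hypothetical helper, not part of GraphCompass:

```python
import hashlib
from collections import defaultdict

import numpy as np
import scipy.sparse as sp


def find_duplicate_samples(X, samples):
    """Return lists of sample labels whose expression matrices are identical."""
    samples = np.asarray(samples)
    groups = defaultdict(list)
    # dict.fromkeys keeps labels unique and in order of first appearance
    for label in dict.fromkeys(samples.tolist()):
        rows = X[samples == label]
        if sp.issparse(rows):
            rows = rows.toarray()
        rows = np.ascontiguousarray(rows)
        # hash shape and dtype too, so differently shaped blocks are not confused
        digest = hashlib.sha256()
        digest.update(f"{rows.shape}{rows.dtype}".encode())
        digest.update(rows.tobytes())
        groups[digest.hexdigest()].append(label)
    # keep only fingerprints shared by more than one sample
    return [labels for labels in groups.values() if len(labels) > 1]
```

For example, `find_duplicate_samples(adata.X, adata.obs['sample'])` should group GT_IZ_P9 with GT_IZ_P9_rep2 if their matrices are identical, and would list any other duplicated samples alongside them. Hashing avoids comparing every pair of samples; a match can still be confirmed with `np.array_equal` as above.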


merelkuijs · 2024-11-07T21:29:39Z

@mayarali, I see that we have uploaded our MIBI-TOF pre-processing notebook (under notebooks/processing), but not our Visium heart one. I checked our hackathon repository and saw that Giovanni concatenated the samples, then saved the concatenated object as adata_processed.h5ad. I'll DM you the path of Giovanni's notebook and the location of the sample objects.

I think it would be good to check if Giovanni's data folder contains any duplicates, but I don't have access to the Helmholtz cluster.

merelkuijs · 2024-11-07T22:44:35Z

Thanks for bringing this to our attention, Xichen. I can confirm that the samples are identical, but they aren't supposed to be.

The extra columns you see are probably the columns added after deconvolution. A colleague of ours deconvolved the data using cell2location, but she has since left the lab, and I'm not sure where her script is stored. I've asked her, but it may be a few days before she replies.

Since the names of the faulty samples are pretty similar, I think something might have gone wrong while saving the deconvolved data. We can check and correct this using our colleague's script.

merelkuijs · 2024-11-25T17:33:37Z

Hi Xichen, thanks for your patience!

Our colleague got back to us. According to her, the deconvolution was performed by the original authors. It was made available at https://cellxgene.cziscience.com/collections/8191c283-0816-424b-9b61-c3e1d6258a77. The authors have uploaded data for each sample separately. I checked GT_IZ_P9 and GT_IZ_P9_rep2, and they are different, so it seems the duplication happened when my colleague concatenated the data.

I will try to correct our data soon. Stay tuned!
Merel

wxicu · 2024-11-25T17:49:05Z

Hi, thank you for taking care of this. I'm working on the multicell project mentored by @bio-la, so I have already tried to fix the data myself and I'm happy to share my script in case it helps. It may look complicated because I need the raw counts to rerun cell2location; you can skip that part. I also noticed that your colleague annotated the genes. The processed data shared by the original authors only provides gene names, so I tried to map them back to gene IDs and annotate them in the same way.

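import graphcompass as gc
import scanpy as sc

# DATA_PATH is a user-defined local folder containing the downloaded files
adata = gc.datasets.visium_heart()
# drop the faulty GT_IZ_P9 copy; .copy() turns the view into a standalone AnnData
adata = adata[adata.obs['sample'] != 'GT_IZ_P9'].copy()
adata.layers['normalized'] = adata.X.copy()
adata.X = adata.raw.X  # restore raw counts (needed to rerun cell2location)
var_all = adata.var.copy()

adata_p9 = sc.read(f"{DATA_PATH}/Visium_GT_IZ_P9.h5ad")
adata_p9.layers['normalized'] = adata_p9.X.copy()
adata_p9.X = adata_p9.raw.X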

# fetch metadata
adata_p9.obs['tissue'] = "heart left ventricle"
adata_p9.obs['sample'] = "GT_IZ_P9"
adata_p9.obs['disease'] = "myocardial infarction"
adata_p9.obs['organism'] = "Homo sapiens"
adata_p9.obs['assay'] = "Visium Spatial Gene Expression"
adata_p9.obs['ethnicity'] = "European"
adata_p9.obs['condition'] = "GT_IZ"
adata_p9.obs['sex'] = 'male'
adata_p9.obs['development_stage'] = "52-year-old human stage"
adata_p9.obs['cell_type'] = adata_p9.obs['cell_type_original'].map(adata.obs[['cell_type','cell_type_original']].set_index('cell_type_original').to_dict()['cell_type'])

# Fetch feature names and ids; duplicated symbols end in '.1' in the processed file but '-1' after read_10x_mtx
diff_var_name = {'TBCE.1':'TBCE-1',
 'LINC01238.1': 'LINC01238-1',
 'CYB561D2.1': 'CYB561D2-1',
 'MATR3.1': 'MATR3-1',
 'HSPA14.1': 'HSPA14-1',
 'TMSB15B.1': 'TMSB15B-1'
 }
adata_p9_raw = sc.read_10x_mtx(f"{DATA_PATH}/ACH0012/outs/Volumes/RicoData2/MI_project/MI_revisions/HCA_submission/spatial/ACH0012/outs/filtered_feature_bc_matrix")
gene_map = adata_p9_raw.var['gene_ids'].to_dict()
adata_p9.var['feature_id'] = adata_p9.var['features'].apply(lambda x: diff_var_name.get(x, x)).map(gene_map)
adata_p9.var = adata_p9.var.set_index('feature_id', drop=True)

adata = sc.concat([adata, adata_p9], axis=0)
adata.var = adata.var.merge(var_all, left_index=True, right_index=True, how='left')
adata.var = adata.var.merge(adata_p9.var, left_index=True, right_on='feature_id')
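The hard-coded `diff_var_name` dictionary covers the six duplicated symbols in this sample. A slightly more general variant is sketched below. It assumes that the processed file de-duplicates symbols R-style (`.1`, `.2` and so on) while `read_10x_mtx` uses `-1`, `-2`, and that both number duplicates in the same order. It only rewrites a symbol when the exact name is missing from `gene_map`, so symbols that genuinely end in `.<digit>` are kept as they are. `map_symbols_to_ids` is a hypothetical helper, not part of scanpy or GraphCompass:

```python
import re

# trailing ".<number>" de-duplication suffix (assumed R make.unique style)
_DOT_SUFFIX = re.compile(r"\.(\d+)$")


def map_symbols_to_ids(symbols, gene_map):
    """Map gene symbols to gene IDs, tolerating '.N' vs '-N' duplicate suffixes.

    gene_map maps symbols as produced by read_10x_mtx to gene IDs.
    Returns one ID per symbol, or None where no match was found.
    """
    ids = []
    for symbol in symbols:
        if symbol in gene_map:
            # exact match first, so real names ending in '.N' are untouched
            ids.append(gene_map[symbol])
            continue
        # otherwise try the '-N' spelling of a '.N' suffix
        candidate = _DOT_SUFFIX.sub(r"-\1", symbol)
        ids.append(gene_map.get(candidate))
    return ids
```

It could stand in for the `.apply(...).map(gene_map)` line, e.g. `adata_p9.var['feature_id'] = map_symbols_to_ids(adata_p9.var['features'], gene_map)`; any remaining `None` entries mark symbols to resolve by hand before the column is used as the index.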

wxicu · 2024-11-25T17:50:28Z

The h5ad file was downloaded from: https://zenodo.org/records/6578047
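In the script above, `cell_type` for GT_IZ_P9 is filled through a `cell_type_original` to `cell_type` lookup built from the remaining samples. Two cases pass silently there: a label that occurs only in GT_IZ_P9 maps to NaN, and if one original label were ever paired with two harmonised labels, `to_dict()` would keep only one of them without warning. A minimal sketch that surfaces both, assuming the same two column names; `harmonise_labels` is a hypothetical helper:

```python
import pandas as pd


def harmonise_labels(reference, original_labels):
    """Translate original cell-type labels via a lookup built from `reference`.

    reference: DataFrame with 'cell_type_original' and 'cell_type' columns.
    original_labels: Series of labels to translate.
    Returns (mapped Series, sorted list of labels that had no mapping).
    """
    # one row per distinct (original, harmonised) pair
    pairs = reference[["cell_type_original", "cell_type"]].astype(object).drop_duplicates()
    # an original label paired with more than one harmonised label is ambiguous
    clash = pairs.loc[pairs["cell_type_original"].duplicated(), "cell_type_original"]
    if not clash.empty:
        raise ValueError(f"ambiguous labels: {sorted(clash.astype(str).unique())}")
    lookup = dict(zip(pairs["cell_type_original"], pairs["cell_type"]))
    labels = original_labels.astype(object)
    mapped = labels.map(lookup)
    # labels absent from the reference map to NaN; report them
    unmapped = sorted(labels[mapped.isna()].dropna().astype(str).unique())
    return mapped, unmapped
```

Usage would be `mapped, missing = harmonise_labels(adata.obs, adata_p9.obs['cell_type_original'])` followed by `adata_p9.obs['cell_type'] = mapped`, checking that `missing` is empty first.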