05_ aneuploidy inference using copykat and infercnv for a subselection of Wilms tumor from SCPCP000006 #790

maud-p · 2024-10-07T12:20:18Z

If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.

This issue follows the discussion in PR #776.

Describe the goals of the changes to the analysis module.

On a subselection of samples, I want to infer aneuploidy and/or copy number variations (CNVs) to help identify normal and cancer cells.

For this, I compare copykat and infercnv. For each method, I wanted to compare a few parameters.

copykat

In copykat, a few parameters can be used to fine-tune the results. In particular, copykat can be run with or without a set of normal cells. It is important to note that CopyKAT has difficulty predicting tumor and normal cells in pediatric and liquid tumors, which have few CNAs. CopyKAT provides two ways around this so that it still returns a prediction in these cases: 1) input a vector of cell names of known normal cells from the same dataset, or 2) try to search for T cells (see the copykat documentation). I thus tested both with and without a reference, but I am quite convinced that providing a few normal cells helps the function.
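To make the reference option concrete: in the R script, the known normal cells would be handed to copykat as a character vector of cell names (copykat's `norm.cell.names` argument). The sketch below is written in Python and does not call copykat; the annotation table, the cell names, and the choice of immune and endothelial cells as "normal" are all hypothetical. It only shows how the reference vector for the "with reference" and "without reference" runs could be assembled.

```python
from typing import Dict, Iterable, List

# Hypothetical per-cell labels (cell name -> cell type) from a prior annotation
# of the same sample; names and labels are made up for illustration.
annotations: Dict[str, str] = {
    "cell_01": "immune",
    "cell_02": "endothelial",
    "cell_03": "epithelial",
    "cell_04": "immune",
    "cell_05": "unknown",
}


def normal_reference_cells(
    annotations: Dict[str, str], normal_types: Iterable[str]
) -> List[str]:
    """Return the names of cells whose label is one of the assumed-normal types."""
    wanted = set(normal_types)
    return sorted(cell for cell, label in annotations.items() if label in wanted)


# "With reference" run: immune and endothelial cells treated as normal (assumption).
with_reference = normal_reference_cells(annotations, ["immune", "endothelial"])
# "Without reference" run: no normal cells are supplied.
without_reference: List[str] = []
```

Only cells with a confident normal label end up in the reference; cells labelled "unknown" are deliberately left out so they cannot pull the baseline toward tumor profiles.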

Another parameter I wanted to test is the clustering distance. In copykat, the options are "euclidean" distance and the correlational distances, i.e. 1 - "pearson" and 1 - "spearman" correlation. In general, correlational distances tend to favor noisy data, while euclidean distance tends to favor data with larger CN segments. I thus tested euclidean and spearman. In our dataset, I think euclidean (default) is performing best.
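To illustrate what the distance options measure, here is a small stdlib-only Python sketch (not copykat's own code) computing euclidean, 1 - Pearson and 1 - Spearman distances between two made-up copy-number profiles along a few genomic windows. The second cell has the same profile shape as the first but twice the amplitude: both correlational distances treat the two cells as identical, while euclidean distance separates them, because it also depends on the size of the copy-number shift.

```python
import math
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance; sensitive to the amplitude of the profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; depends only on the shape of the profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up relative copy-number values in five genomic windows.
cell_a = [0.1, 0.1, 0.5, 0.5, -0.2]
cell_b = [2 * v for v in cell_a]  # same shape, twice the amplitude

d_euclid = euclidean(cell_a, cell_b)
d_pearson = 1 - pearson(cell_a, cell_b)
d_spearman = 1 - spearman(cell_a, cell_b)
```

Here `d_pearson` and `d_spearman` are (numerically) zero while `d_euclid` is about 0.75, which is one way to see why the choice of distance can change how cells with large versus subtle CN segments cluster.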

infercnv

Another way of inferring copy number alterations from tumor single-cell RNA-seq data is infercnv. In a previous discussion, we were not sure about the impact of how the healthy reference is defined. For that reason, I ran infercnv with no normal cells as reference, or with immune and/or endothelial cells as reference.
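For context, infercnv takes a cell annotation file (two tab-separated columns, cell name and group, with no header) and, in `CreateInfercnvObject`, a `ref_group_names` argument naming the groups to use as the normal reference; `NULL` means no dedicated reference. The Python sketch below is an illustration under assumptions, not the module's R code: the cell labels are hypothetical, and the four reference conditions are my reading of "no normal cells, or immune and/or endothelial cells". It writes the annotation table and checks that every requested reference group actually occurs in the sample.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical per-cell labels; group names are assumptions for illustration.
cell_labels: Dict[str, str] = {
    "cell_01": "immune",
    "cell_02": "endothelial",
    "cell_03": "unknown",
    "cell_04": "immune",
}

# Reference conditions; None stands for "no normal cells as reference".
REFERENCE_CONDITIONS: Dict[str, Optional[List[str]]] = {
    "no_reference": None,
    "immune": ["immune"],
    "endothelial": ["endothelial"],
    "immune_endothelial": ["immune", "endothelial"],
}


def annotations_tsv(labels: Dict[str, str]) -> str:
    """Two tab-separated columns (cell name, group), no header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for cell, group in sorted(labels.items()):
        writer.writerow([cell, group])
    return buf.getvalue()


def missing_reference_groups(
    labels: Dict[str, str], ref_groups: Optional[List[str]]
) -> List[str]:
    """Requested reference groups that do not occur in this sample's labels."""
    if ref_groups is None:
        return []
    present = set(labels.values())
    return [g for g in ref_groups if g not in present]
```

Checking for missing groups per sample is a design choice of this sketch: a sample with, say, no endothelial cells cannot meaningfully be run under the endothelial-reference condition, so it is better to skip or flag that run up front.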

What will your pull request contain?

The PR contains:

R scripts to run copykat and infercnv on a selection of samples
results from 05_copykat.R and 06_infercnv.R, which will be transferred via the S3 bucket
notebook_templates that explore the copykat and infercnv results and compare them
notebooks for a selection of samples

Will you require additional software beyond what is already in the analysis module?

no

Will you require different computational resources beyond what the analysis module already uses?

No response

If known, when do you expect to file the pull request?

today or tomorrow


sjspielman · 2024-10-07T13:36:01Z

Hi @maud-p, thanks for filing this issue with your plans for the next steps! It sounds like you've taken some time to explore how to best perform these steps, which is great. The one thing I want to say for now is, we want to make sure that decisions you've made here are visible in the module itself. Just as one example, you wrote above,

I thus tested euclidean and spearman. In our dataset, I think euclidean (default) is performing best.

There should be some result in the module that indeed demonstrates this. In other words, we don't want to only have code running euclidean distance without also a notebook or similar that provides evidence for euclidean outperforming spearman, to ultimately bolster the results. In this case, I would expect to see a notebook as part of the PR (but of course this can be a few smaller PRs depending on how you are structuring the code, which I'm happy to chat more about strategies for!) that demonstrates why you chose euclidean over spearman.

maud-p · 2024-10-07T13:43:24Z

Hi @sjspielman, thank you for the clarification.

For copykat, there will be 2 notebooks, 05_cnv_copykat_{distance}_exploration_{sample_id}.html, one per distance, each comparing results with and without normal cells as reference.

There will also be one notebook per sample tested, 06_cnv_exploration_{sample_id}.html, that plots the CNV heatmaps of all conditions tested, for both copykat and infercnv.

It is a bit long to run and might not be easy to interpret, but that is the plan ;)
I thought putting everything together in one notebook would make it easier. Then we can select 1 or 2 method/parameter combinations to investigate in more detail in a next step :)
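As a quick way to see which notebooks this plan produces per sample, the Python sketch below expands the two file name templates from the comment above. The sample ID used in the example is a placeholder, not a real SCPCP000006 sample, and the list of distances assumes only euclidean and spearman are compared.

```python
from typing import List

# The two copykat distances compared in this issue (assumed to be the only ones).
DISTANCES = ["euclidean", "spearman"]


def expected_notebooks(sample_id: str) -> List[str]:
    """Rendered notebooks planned for one sample: one per copykat distance,
    plus one notebook combining copykat and infercnv heatmaps."""
    copykat = [f"05_cnv_copykat_{d}_exploration_{sample_id}.html" for d in DISTANCES]
    combined = [f"06_cnv_exploration_{sample_id}.html"]
    return copykat + combined


# Placeholder sample ID, for illustration only.
planned = expected_notebooks("SAMPLE_X")
```

So each tested sample yields three HTML reports, which also gives a simple checklist when reviewing the 05 and 06 PRs separately.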

sjspielman · 2024-10-07T14:13:10Z

Sounds like a good plan for notebooks! To help support a faster review, it would be best to split this up into a couple PRs (it seems counterintuitive that more PRs will go faster, but it will help in the end!):

A PR that contains code to just run inferCNV and/or copyKAT (assuming this is in a separate script and/or function)
The 05 notebook you describe above
The 06 notebook you describe above

Let me know if this makes sense for how you've written your code or if you had another idea for how to split this up into smaller PRs.

maud-p added the analysis label Oct 7, 2024

maud-p mentioned this issue Oct 8, 2024

Scripts to run copykat and infercnv for a subselection of Wilms tumor from SCPCP000006 #801

Merged


maud-p changed the title ~~05_ aneuploidy inference using copykat for a subselection of Wilms tumor from SCPCP000006~~ 05_ aneuploidy inference using copykat and infercnv for a subselection of Wilms tumor from SCPCP000006 Oct 8, 2024

maud-p mentioned this issue Oct 8, 2024

06_explore cnv results using copykat and infercnv for a subselection of Wilms tumor from SCPCP000006 #802

Closed


This was referenced Oct 15, 2024

SCPCP000006_05_explore_COPYKAT #813

Merged

06 explore infercnv for Wilms tumor -06 #828

Merged
