05_ aneuploidy inference using copykat and infercnv for a subselection of Wilms tumor from SCPCP000006 #790
Hi @maud-p, thanks for filing this issue with your plans for the next steps! It sounds like you've taken some time to explore how to best perform these steps, which is great. The one thing I want to say for now is, we want to make sure that decisions you've made here are visible in the module itself. Just as one example, you wrote above,
There should be some result in the module that actually demonstrates this. In other words, we don't want to only have code running euclidean distance without also having a notebook (or similar) that provides evidence for euclidean outperforming spearman, to ultimately bolster the results. In this case, I would expect to see a notebook as part of the PR that demonstrates why you chose euclidean over spearman (but of course this can be a few smaller PRs depending on how you are structuring the code, and I'm happy to chat more about strategies for that!).
Hi @sjspielman, thank you for clarifying. There will be one notebook per sample tested. They will be a bit long to run and might not be easy to interpret, but that would be the plan ;)
Sounds like a good plan for notebooks! To help support a faster review, it would be best to split this up into a couple PRs (it seems counterintuitive that more PRs will go faster, but it will help in the end!):
Let me know if this makes sense for how you've written your code, or if you had another idea for how to split this up into smaller PRs.
If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.
This PR follows the discussion from PR #776.
Describe the goals of the changes to the analysis module.
On a subselection of samples, I want to try to infer aneuploidy and/or copy number variations (CNVs) to help identify normal and cancer cells.
For this, I compare copykat and infercnv. For each method, I wanted to compare a few parameters.
copykat
In copykat, a few parameters can be used to fine-tune the results. In particular, we can run copykat with or without a set of normal cells. It is important to note that CopyKAT has difficulty predicting tumor and normal cells in pediatric and liquid tumors that have few CNAs. CopyKAT provides two ways to bypass this and give a definite output instead of failing outright: 1) input a vector of cell names of known normal cells from the same dataset, or 2) try to search for T cells (see copykat). I thus tested both with and without a reference, but I am quite convinced that providing a few normal cells helps the function.
One parameter I also wanted to test is the clustering method. In copykat, the clustering options include "euclidean" distance and correlational distances, i.e. 1 - "pearson" and 1 - "spearman" similarity. In general, correlational distances tend to favor noisy data, while euclidean distance tends to favor data with larger CN segments. I thus tested euclidean and spearman. In our dataset, I think euclidean (the default) performs best.
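To make the difference between these clustering options concrete, here is a minimal Python sketch on made-up data. It is not copykat's code (copykat is an R package and clusters whole single-cell CNA matrices); it only computes the three distances between two hypothetical per-bin copy-number profiles, cell_a and cell_b, that carry the same gained segment at different amplitudes.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation, clamped to [-1, 1] to absorb rounding error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return max(-1.0, min(1.0, cov / (sx * sy)))

def ranks(x):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with x[order[i]].
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical smoothed log-ratio profiles over 8 genomic bins.
cell_a = [0.0, 0.0, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0]  # gain over bins 3-5
cell_b = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # same segment, 2x amplitude

for name, d in [
    ("euclidean", euclidean(cell_a, cell_b)),
    ("1 - pearson", 1 - pearson(cell_a, cell_b)),
    ("1 - spearman", 1 - spearman(cell_a, cell_b)),
]:
    print(f"{name:>12}: {d:.3f}")
```

Because 1 - Pearson and 1 - Spearman depend only on the shape (or rank order) of a profile, both treat the two cells as identical, while Euclidean distance separates them by the amplitude difference. This is one way to see why Euclidean distance is more sensitive to large CN segments; on its own it does not show which metric works better for this dataset.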
infercnv
Another way of inferring copy number alterations from tumor single-cell RNA-seq data is infercnv. In a previous discussion, we were not sure about the impact of how the healthy reference is defined. For that reason, I ran infercnv with no normal cells as reference, or with immune and/or endothelial cells as reference.
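The reference configurations above can be written down explicitly before running infercnv. The Python sketch below is hypothetical: it is not the module's 06_infercnv.R, the cell names, labels and the REFERENCE_SETS mapping are invented for illustration, and the exact list of configurations is assumed from the "immune and/or endothelial" wording. It produces the two-column, header-less, tab-delimited cell annotation text that infercnv's CreateInfercnvObject() reads through its annotations_file argument, and lists which cells would form the reference in each configuration.

```python
import csv
import io

# Hypothetical cell labels, e.g. from an earlier cell type annotation step.
cell_types = {
    "cell_001": "immune",
    "cell_002": "endothelial",
    "cell_003": "unknown",
    "cell_004": "immune",
}

# Reference configurations compared in this issue; None means no reference.
REFERENCE_SETS = {
    "no_reference": None,
    "immune": ["immune"],
    "endothelial": ["endothelial"],
    "immune_endothelial": ["immune", "endothelial"],
}

def annotation_tsv(labels):
    """Header-less, tab-delimited text with one 'cell<TAB>group' row per cell."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for cell, group in sorted(labels.items()):
        writer.writerow([cell, group])
    return buf.getvalue()

def reference_cells(labels, ref_groups):
    """Cells used as the normal baseline; empty when there is no reference."""
    if ref_groups is None:
        return []
    return sorted(cell for cell, group in labels.items() if group in ref_groups)

print(annotation_tsv(cell_types), end="")
for config, groups in REFERENCE_SETS.items():
    print(config, reference_cells(cell_types, groups))
```

In R, the chosen group names would be passed as ref_group_names to CreateInfercnvObject() (left as NULL for the run without a reference). The same vector of reference cell names could also be given to copykat's norm.cell.names argument, so that both methods are compared against the same normal cells.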
What will your pull request contain?
The PR contains:
- copykat and infercnv runs on a selection of samples
- the scripts 05_copykat.R and 06_infercnv.R
- results, which will be transferred via the S3 bucket
- notebooks exploring copykat and infercnv and comparing their results
Will you require additional software beyond what is already in the analysis module?
no
Will you require different computational resources beyond what the analysis module already uses?
No response
If known, when do you expect to file the pull request?
today or tomorrow