Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCPCP000006_05_explore_COPYKAT #813

Merged
merged 11 commits into from
Oct 16, 2024

Conversation

maud-p
Copy link
Contributor

@maud-p maud-p commented Oct 15, 2024

Purpose/implementation Section

This PR is following the discussion from the PR#776.
It explores the results generated in PR#801.

Please link to the GitHub issue that this pull request addresses.

#790

What is the goal of this pull request?

We wanted to explore the copykat results generated in PR#801.

copykat results have been obtained with or without normal cells as reference, using either an euclidean or statistical (spearman) method for CNV heatmap clustering.
This impact the final decision made by copykat for each cell to be either aneuploid or diploid, and it is thus crucial to explore the results using the different methods.
For each of theselected samples, we explore the results in the notebooks 05_cnv_copykat_{distance_parameter}_exploration_{sample_id}.html.
These notebooks are inspired by the plots written for the Ewing Sarcoma analysis in 03-copykat.Rmd.

Briefly describe the general approach you took to achieve this goal.

I get inspiration from the Ewing Sarcoma analysis in 03-copykat.Rmd to quickly check the results of copykat.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes, same kind of notebook for exploration of infercnv results.

What is your summary of the results?

Looking at copykat heatmaps, I can really see how the function annotated a cell as aneuploid or diploid when using spearman as a clustering distance. The results seems to be more meaningfull when using an euclidean distance (i.e. we see CNV in aneuploid predicted cells and not diploid, mostly).
For that reason, I think that using spearman clustering distance should be avoided.

Provide directions for reviewers

I am not confident about continuing with copykat.
I'll do the same kind of exploration of the infercnv results in another PR, maybe adding some additional umap of predicted_cell.types to have everything in the same notebook.

Author checklists

Analysis module and review

Reproducibility checklist

  • Code in this pull request has been added to the GitHub Action workflow that runs this module.
  • The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
  • If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
  • If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

@jaclyn-taroni jaclyn-taroni requested review from sjspielman and removed request for jaclyn-taroni October 15, 2024 16:23
Copy link
Member

@sjspielman sjspielman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm convinced - let's not use copyKAT 😄
I'm also really glad that you were able to adapt code from a different analysis module, too - it's a great way to work within the OpenScPCA project!

I left a few very small wording comments, but otherwise this seems good to go! Once you make the small changes I suggested, I will approve this and we can move onto looking at inferCNV.

analyses/cell-type-wilms-tumor-06/notebook/README.md Outdated Show resolved Hide resolved

From the heatmap of CNV and the mean CNV detection plots, it is difficult to see any CNV pattern that drive the identification of aneuploid cells.
The assignment of the aneuploidy/diploidy value might relies on very few CNV and/or an arbitrary threshold.
This might be why the assignement of aneuploidy/diploidy values differs between condition (and between runs!!).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow - does it really differ between runs? I'm surprised to hear this (but I believe you!!), because the copyKAT code itself actually hardcodes a random seed; I actually filed an issue about this last week: navinlabcode/copykat#121. So, I'd expect it to be very reproducible. This means that it must be calling something else that is somehow un-setting the seed.

Nothing we can do about it, it's just interesting to note...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I figured this out when running multiple times to save the heatmap not in jpeg but png format. I can send you examples tomorrow (I struggle with the vpn connexion now 😞, hopefully better tomorrow! )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sjspielman some example to illustrate.
For the sample -208, I show you the heatmap of CNV for copykat ran with an euclidean distance, no reference, two days apart (to then save the heatmap in png instead of jpeg:
image
image

This is not completly different but not identical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sample -205
image
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for -184 it is much more striking ❓❗
image

@sjspielman
Copy link
Member

@maud-p can you also regenerate the html notebooks since the Rmd has changed? Thanks!

@maud-p
Copy link
Contributor Author

maud-p commented Oct 16, 2024

@maud-p can you also regenerate the html notebooks since the Rmd has changed? Thanks!

Sure, it is running! Sorry for missing this part!

Copy link
Member

@sjspielman sjspielman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is ready to go! The GitHub Action failure is not something to worry about - or at least, it wasn't caused by your code, so it doesn't prevent this from moving forward.

@sjspielman
Copy link
Member

Hm actually, we do need to try and get that check to pass, which is a matter of "re-triggering" the github action. Can you make a commit with a small "meaningless" change, like something very small (even just a space) somewhere in the README file? That will hopefully get the action to work.

@sjspielman
Copy link
Member

@jashapiro can you have a look at what's up with this pre-commit action (I don't have permissions)?This PR is otherwise approved.

@jashapiro
Copy link
Member

@jashapiro can you have a look at what's up with this pre-commit action (I don't have permissions)?This PR is otherwise approved.

We seem to be timing out on cloning the repo. This could be a more significant problem than just this PR: There is a time limit of 30s for cloning and we seem to be taking ~25 seconds in other PRs, meaning we are probably quite close to the repo size limit for the service.

@sjspielman
Copy link
Member

@maud-p We're going to have to look into this action failure on our end a bit more. Until then, you can feel free to file the next PR looking at infer CNV.

@sjspielman sjspielman merged commit a5c3623 into AlexsLemonade:main Oct 16, 2024
3 checks passed
@maud-p
Copy link
Contributor Author

maud-p commented Oct 16, 2024

Wow I was out of office few hours and problem already solved! Thank you so much @sjspielman @jashapiro 🚀

@maud-p maud-p deleted the 05_explore_copykat branch October 16, 2024 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants