
Widespread inflated metrics for label projection due to leakage #386

Open
kthorner opened this issue Feb 22, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@kthorner

kthorner commented Feb 22, 2024

I was interested in the benchmarks for label projection, hoping to implement the "best" method (logistic regression) in a project.

Using the example pancreas dataset, I was unable to replicate the reported performance (e.g. 99% accuracy for the random split, which from experience seemed too high). Going through the code, I saw that "process_dataset" takes an already-processed h5ad file, performs an 80:20 split, and passes those subsets to the various methods.

Focusing on my example, which uses PCs as features: openproblems computes the PCA on all of the data, whereas I computed it only on the training set and then applied the same centering/scaling/rotation to the test set. Otherwise these benchmarks don't reflect how a method would perform on new data.
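
For clarity, here is a minimal sketch of the leakage-free variant I have in mind, written as a generic scikit-learn workflow rather than the actual openproblems code; X, y and the parameter choices are placeholders:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# X: cells x genes expression matrix, y: cell-type labels.
# Both are placeholders standing in for the pancreas data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the centering/scaling and the PCA rotation on the training split only...
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=50).fit(scaler.transform(X_train))

# ...then apply exactly the same transformations to the held-out test split.
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))

clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
print("held-out accuracy:", clf.score(Z_test, y_test))
```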

As it currently stands, the metrics, and therefore the rankings, cannot be relied upon. This is especially a problem for methods that use PCA; in theory, the leakage could give them an apparent edge over methods operating directly on genes.

kthorner added the bug label on Feb 22, 2024
@rcannood
Member

Hi @kthorner! Thanks for your interest in the label projection task!

Just to clarify, the results currently available on the website originate from the openproblems-bio/openproblems repository. We're planning on creating versioned releases of the results generated by the openproblems-v2 repository very soon. A preview of these results can be found here: https://openproblems-v2-results--openproblems.netlify.app/results/label_projection/.

When I look at the raw results from the v2 platform, I see that some methods get high accuracy scores on some of the datasets. It would be worthwhile to investigate in more detail why that is.

> Focusing on my example, which uses PCs as features: openproblems computes the PCA on all of the data, whereas I computed it only on the training set and then applied the same centering/scaling/rotation to the test set. Otherwise these benchmarks don't reflect how a method would perform on new data.

I'm not sure why you are computing the PCA in this manner. The expression data is what can be observed for both the train and the test data, so there is no need to compute it only on the training data and then apply those transformations to the test data. It would be something completely different if the dimensionality reduction were somehow using the ground-truth label information.
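
To make the distinction concrete, here is a rough sketch of the setup I have in mind; it is not the actual openproblems pipeline code, and X_all, y, train_idx and test_idx are placeholders. The dimensionality reduction only ever sees expression values, never the labels:

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# The embedding is computed from the expression of all cells (train + test),
# which is observable either way; the labels are not used at this step.
Z = PCA(n_components=50).fit_transform(X_all)

# Only the training labels are used to fit the classifier, which is then
# evaluated on the held-out cells.
clf = LogisticRegression(max_iter=1000).fit(Z[train_idx], y[train_idx])
predictions = clf.predict(Z[test_idx])
```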

Would you be willing to attend our weekly working meeting next Wednesday on Discord? I'd be happy to discuss this in more detail.

@kthorner
Author

Hi @rcannood, I appreciate the response. Time permitting, I'd like to get more involved with the project and will try to attend.

I'll keep my response brief here, but I found a semi-related issue: openproblems-bio/openproblems#771. I'll need to think about it more, but I view scANVI as more of a special case. Generally speaking, however, pre-processing cannot come before splitting.
