Releases · openproblems-bio/openproblems

08 Sep 05:47

rcannood

v2.0.0

7ffb753

v2.0.0 Latest


A major update to the OpenProblems framework, switching from a Python-based framework to a Viash + Nextflow-based framework. This update features the same concepts as the previous version, but with a new implementation that is more flexible, scalable, and maintainable.

Most relevant parts of the overall structure:

src/tasks: Benchmarking tasks:
- batch_integration: Batch integration
- denoising: Denoising
- dimensionality_reduction: Dimensionality reduction
- match_modalities: Match modalities
- predict_modality: Predict modality
- spatial_decomposition: Spatial decomposition
- spatially_variable_genes: Spatially variable genes
src/datasets: Components for creating common datasets. Loaders:
- cellxgene_census: Query cells from a CellxGene Census
- openproblems_neurips2021_bmmc: Fetch a dataset from the OpenProblems NeurIPS2021 competition
- openproblems_neurips2022_pbmc: Fetch a dataset from the OpenProblems NeurIPS2022 competition
- openproblems_v1: Fetch a legacy OpenProblems v1 dataset
- openproblems_v1_multimodal: Fetch a legacy OpenProblems v1 multimodal dataset
- tenx_vision: Fetch and convert a 10x Visium dataset
- zenodo_spatial: Fetch and process an AnnData file containing DBiT-seq, MERFISH, seqFISH, Slide-seq v2, STARmap, and Stereo-seq data from Zenodo.
- zenodo_spatial_slidetags: Download a compressed file containing a gene expression matrix and spatial locations from Zenodo.
src/common: Common components used by all tasks.
- check_dataset_schema: Check whether an h5ad dataset adheres to a dataset schema
- check_yaml_schema: Check whether a YAML adheres to a JSON schema
- comp_tests: Reusable component unit tests
- create_component: Create a Viash component.
- create_task_readme: Create a README for an OpenProblems task.
- extract_metadata: Extract the .uns metadata from an h5ad file.
- helper_functions: Commonly used helper functions in Python or in R.
- process_task_results: Process the raw task results (containing raw logs, unprocessed component configs, and various metrics) into nicely formatted task results.
- schemas: JSON schemas for YAML files in the repository
- sync_test_resources: Synchronise the test resources from S3 to resources_test

For more information related to the structure of this repository, see the documentation.


07 Sep 15:00

rcannood

v1.0.0

f54e8e9

v1.0.0

Note: This changelog was automatically generated from the git log.

New functionality

Added cell2location to the spatial_decomposition task.
Added nearest-neighbor ranking matrix computation to _utils.
Datasets now store nearest-neighbor ranking matrix in adata.obsm["X_ranking"].
Added support for parsing Nextflow output and generating benchmark results for the website.
Added max_samples parameter to qlocal, qglobal, qnn_auc, lcmc, qnn, and continuity metrics to allow for subsampling of data for faster computation.
Added new scArches based methods: scarches_scanvi_xgb_all_genes and scarches_scanvi_xgb_hvg.
Added prediction_method parameter to _scanvi_scarches to specify prediction method.
Added _pred_xgb function to perform XGBoost prediction based on latent representations.
Added obsm parameter to _xgboost function to allow specifying the embedding space for XGBoost training.
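The max_samples entry above describes subsampling data before computing the nearest-neighbor ranking metrics. The following is a minimal sketch of that idea, assuming uniform random subsampling without replacement. The helper name `subsample_rows` and its `seed` argument are hypothetical and are not taken from the openproblems code base.

```python
import random


def subsample_rows(rows, max_samples=None, seed=0):
    """Return at most max_samples rows, drawn uniformly without replacement.

    Hypothetical helper for illustration only; not the openproblems implementation.
    """
    # No limit requested, or the data is already small enough: keep everything.
    if max_samples is None or len(rows) <= max_samples:
        return list(rows)
    rng = random.Random(seed)
    # Sample row indices, then sort them so the original row order is preserved.
    keep = sorted(rng.sample(range(len(rows)), max_samples))
    return [rows[i] for i in keep]


if __name__ == "__main__":
    data = list(range(100))
    print(len(subsample_rows(data, max_samples=10)))
```

Fixing the seed keeps the subsample reproducible between runs, which matters when a metric is compared across methods on the same dataset.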

Major changes

Updated scvi-tools to version 0.20 in both Python and R environments.
Updated datasets to include nearest-neighbor ranking matrix.
Modified dimensionality reduction task to include nearest-neighbor ranking matrix computation in dataset generation.
Refactored the website update workflow to use JSON instead of Markdown.
Updated the website generation process to remove duplicate BibTeX entries.
Added a new parse_metadata.py script for generating metadata for the website.
Added a new function, get_member_id, to openproblems.utils to get the member ID of a task, dataset, method, or metric.
Removed the redundant computation and storage of the nearest-neighbor ranking matrix in datasets.

Minor changes

Updated method names to be shorter and more consistent across tasks.
Improved method summaries for clarity.
Updated JAX and jaxlib versions to 0.4.6.
Updated dependencies to support new versions of Snakemake and GitPython.
Removed code related to "nbt2022-reproducibility" repo and merged it into the main website.
Updated the schema for benchmark results to include submission time, code version, and resource usage metrics.
Improved error handling and added logging to the parsing script.
Removed the "raw.json" file from the results directory and merged all data into a single "results.json" file.
Updated the workflow to upload the final results to the website's results directory instead of the data directory.
Removed unnecessary code and refactored the parsing script for better readability.
Added unit tests for the new parsing script.
Updated the run_tests workflow to skip testing on the test_website branch.
Updated the run_tests workflow to skip testing on the test_process branch.
Updated the create-pull-request step to set the author for the pull request.
Updated the run_tests workflow to skip testing on pull request reviews.
Updated the update_website_content workflow to update the website on the main branch.
Updated the main.bib file to fix a typo.
Removed extraneous headings from task README files.
Updated generate_test_matrix.py to use the new openproblems.utils.get_member_id function.
Updated the website generation process to copy BibTeX files to the correct location.
Updated the process_requires section in setup.py to include gitpython.
Updated git commit hash generation for openproblems functions.
Modified _xgboost to allow for specifying tree_method.
Modified _scanvi_scarches to consistently use unlabeled_category.
Modified _scanvi_scarches to remove unnecessary copying of labels.
Removed _scanvi_scarches functions that were redundant with _scanvi_scarches.
Removed unused _scanvi functions.
Modified _scanvi_scarches to allow for specifying prediction_method and handle unlabeled_category consistently.

Documentation

Improved the documentation of the auprc metric.
Improved the documentation of the cell2location methods.
Documented sub-stub task behaviour.

Bug fixes

Fixed an error in neuralee_default where the subsample_genes argument could be too small.
Fixed an error in knn_naive where the is_baseline argument was set to False.
Fixed calculation of ranking matrix in _utils to include ties.
Fixed a bug in load_tenx_5k_pbmc() where a warning about non-unique variable names was being raised.
Removed the unused _utils.py file.
Removed the X_ranking entry from the obsm attribute of datasets.
The _fit() function in nn_ranking.py now subsamples the data if max_samples is specified.
The nn_ranking metrics now use subsampling in the _fit() function to improve performance.
Fixed the git hash generation for openproblems functions.
Fixed a warning about pkg_resources being deprecated.
Removed unnecessary fetch-depth: 1 from the workflow.
Fixed a potential issue in _scanvi_scarches where labels_pred could be overwritten.
Fixed a potential issue in _pred_xgb where num_round wasn't being used correctly.
Fixed an issue where baseline methods were not being filtered correctly from the benchmark results.
Fixed an issue where metrics with all NaN values were not being removed from the benchmark results.
Fixed an issue where some metrics were not being parsed correctly from the Nextflow output.
Fixed an issue where the "mean_score" field was not being calculated correctly for each method.
Fixed an issue where the "code_version" field was not being populated correctly for each method.
Fixed an issue where the "submission_time" field was not being populated correctly for each method.
Fixed an issue where the resource usage metrics were not being parsed correctly from the Nextflow output.

Full Changelog: v0.8.0...v1.0.0
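One of the bug fixes above concerns including ties when computing the ranking matrix. The changelog does not say which tie rule is used, so the sketch below assumes the common "average rank" convention, where tied values share the mean of the positions they occupy. The function names `rank_with_ties` and `ranking_matrix` are hypothetical.

```python
def rank_with_ties(values):
    """Rank values in ascending order (1-based); ties share their average rank.

    Illustrative only: the average-rank tie rule is an assumption, not a
    confirmed detail of the openproblems implementation.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values equal to the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Average of the 1-based positions start+1 .. end+1.
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def ranking_matrix(distances):
    """Apply tie-aware ranking to each row of a distance matrix."""
    return [rank_with_ties(row) for row in distances]


if __name__ == "__main__":
    print(rank_with_ties([0.0, 2.0, 2.0, 5.0]))
```

Without tie handling, two equidistant neighbours would receive arbitrary distinct ranks that depend on sort order, which can make rank-based metrics unstable.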


21 Feb 15:40

scottgigante-immunai

v0.8.0

4f21b1b

v0.8.0

What's Changed

Fix DR baselines by @scottgigante-immunai in #816
set adata.uns['is_baseline'] by @scottgigante-immunai in #820
Copy anndata in metric decorator by @scottgigante-immunai in #819
Don't recompute X_emb and neighborhood graph for baseline datasets by @danielStrobl in #823
Changes in destVI code (#826) by @scottgigante-immunai in #827
Set explicit token permissions by @scottgigante-immunai in #828
Warnings fix by @scottgigante-immunai in #831
Harmonize batch integration dataset APIs by @scottgigante-immunai in #834
new common baselines and cross import by @danielStrobl in #825
jitter baseline patch by @danielStrobl in #838
Add reversed norm order for ALRA in Denoising Task by @wes-lewis in #835

Full Changelog: v0.7.4...v0.8.0

Contributors

danielStrobl, wes-lewis, and scottgigante-immunai


02 Feb 00:04

scottgigante-immunai

v0.7.0

49c83bf

v0.7.0

What's Changed

Fix docker image builds by @scottgigante-immunai in #758
[Dimensionality reduction] Fix normalization in baselines by @scottgigante-immunai in #760
downgrade gtfparse and polars by @scottgigante-immunai in #766
Fix output headers order by @scottgigante-immunai in #769
Convert references to bib by @scottgigante-immunai in #720
fix typo in bibliography path by @scottgigante-immunai in #774
More bibliography typos by @scottgigante in #775
Pre-normalize dimensionality reduction datasets by @scottgigante-immunai in #768
Add pymde to dimensionality reduction by @scottgigante-immunai in #767
Fix flaky R installations in docker build by @scottgigante-immunai in #783
save initial layer in X for adata_pre by @danielStrobl in #784
Filter datasets by celltype by @scottgigante-immunai in #770
Pass raw counts to neuralee by @scottgigante-immunai in #779
Label projection describe datasets by @mxposed in #776
Add missing DR references by @rcannood in #782
Bugfix/lowercase GitHub repo owner by @scottgigante-immunai in #794
Upgrade isort by @scottgigante-immunai in #795
Update styler to 1.9.0 by @github-actions in #787
[auto] Update docker version by @github-actions in #798
Update bslib to 0.4.2 by @github-actions in #759
add missing logfc decorator by @dbdimitrov in #796
Add ALRA preprocessing identical to literature by @wes-lewis in #763
run CI on PRs only with approving review by @scottgigante-immunai in #804
add new workflow to add status by @scottgigante-immunai in #805
Update bioc/scran to 1.26.2 by @github-actions in #799
Specify PR number by @scottgigante-immunai in #808
add magic with reverse norm order by @scottgigante-immunai in #797
Bump pymde from 0.1.15 to 0.1.18 in /docker/openproblems-python-pytorch by @dependabot in #801
Update scvi-tools requirement from ~=0.16 to ~=0.19 in /docker/openproblems-r-pytorch by @dependabot in #731
Use graph and embedding metrics for feature and embedding subtask by @danielStrobl in #807
Fix typo in dimensionality reduction dataset names by @lazappi in #802
add new dataloaders by @danielStrobl in #792
rmse -> distance correlation by @scottgigante-immunai in #811
CPM -> CP10k by @scottgigante-immunai in #812
change multimodal data integration task name to matching modalities by @LuckyMD in #778
updated scib version by @danielStrobl in #793
Daniel strobl hvg conservation fix by @danielStrobl in #785
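The "CPM -> CP10k" entry above refers to switching the normalization target from counts per million to counts per 10,000. As a rough illustration of the difference, here is a minimal sketch of the generic definition, assuming each row is one cell and each column one gene. The function names are hypothetical and this is not the code from the referenced pull request.

```python
def normalize_counts(counts, target_sum):
    """Scale each cell (row) so that its counts sum to target_sum.

    Generic library-size normalization, shown for illustration only.
    """
    normalized = []
    for row in counts:
        total = sum(row)
        # Cells with no counts are left as all zeros to avoid dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([value * scale for value in row])
    return normalized


def cpm(counts):
    """Counts per million."""
    return normalize_counts(counts, 1e6)


def cp10k(counts):
    """Counts per 10,000."""
    return normalize_counts(counts, 1e4)


if __name__ == "__main__":
    print(cp10k([[1, 3], [0, 0]]))
```

The two variants differ only by a constant factor of 100 per cell, so the choice mainly affects the scale of values seen by downstream steps such as log transformation.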