Releases: openproblems-bio/openproblems
v2.0.0
A major update to the OpenProblems framework, switching from a Python-based framework to a Viash + Nextflow-based framework. This update features the same concepts as the previous version, but with a new implementation that is more flexible, scalable, and maintainable.
Most relevant parts of the overall structure:
- src/tasks: Benchmarking tasks:
  - batch_integration: Batch integration
  - denoising: Denoising
  - dimensionality_reduction: Dimensionality reduction
  - match_modalities: Match modalities
  - predict_modality: Predict modality
  - spatial_decomposition: Spatial decomposition
  - spatially_variable_genes: Spatially variable genes
- src/datasets: Components for creating common datasets. Loaders:
  - cellxgene_census: Query cells from a CellxGene Census
  - openproblems_neurips2021_bmmc: Fetch a dataset from the OpenProblems NeurIPS2021 competition
  - openproblems_neurips2022_pbmc: Fetch a dataset from the OpenProblems NeurIPS2022 competition
  - openproblems_v1: Fetch a legacy OpenProblems v1 dataset
  - openproblems_v1_multimodal: Fetch a legacy OpenProblems v1 multimodal dataset
  - tenx_vision: Fetch and convert a 10x Visium dataset
  - zenodo_spatial: Fetch and process an AnnData file containing DBiT-seq, MERFISH, seqFISH, Slide-seq v2, STARmap, and Stereo-seq data from Zenodo
  - zenodo_spatial_slidetags: Download a compressed file containing a gene expression matrix and spatial locations from Zenodo
- src/common: Common components used by all tasks.
  - check_dataset_schema: Check whether an h5ad dataset adheres to a dataset schema
  - check_yaml_schema: Check whether a YAML file adheres to a JSON schema
  - comp_tests: Reusable component unit tests
  - create_component: Create a Viash component
  - create_task_readme: Create a README for an OpenProblems task
  - extract_metadata: Extract the .uns metadata from an h5ad file
  - helper_functions: Commonly used helper functions in Python or R
  - process_task_results: Process the raw task results (containing raw logs, unprocessed component configs, and various metrics) into nicely formatted task results
  - schemas: JSON schemas for YAML files in the repository
  - sync_test_resources: Synchronise the test resources from s3 to resources_test
For more information related to the structure of this repository, see the documentation.
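As a rough illustration of what a check such as check_dataset_schema does, the sketch below reports which required slots a dataset is missing. It is not the component's implementation: plain dicts stand in for the YAML schema and the h5ad file, and the function name missing_slots, the schema layout, and the slot names are all hypothetical.

```python
def missing_slots(dataset, schema):
    """Return "group/name" entries that the schema requires but the dataset lacks."""
    missing = []
    for group, slots in schema.items():
        present = dataset.get(group, {})
        for slot in slots:
            # Slots are treated as required unless the schema says otherwise.
            if slot.get("required", True) and slot["name"] not in present:
                missing.append(f"{group}/{slot['name']}")
    return missing


# Hypothetical schema and dataset; dicts stand in for YAML and h5ad content.
schema = {
    "layers": [{"name": "counts", "required": True}],
    "obs": [
        {"name": "batch", "required": True},
        {"name": "cell_type", "required": False},
    ],
    "uns": [{"name": "dataset_id", "required": True}],
}
dataset = {
    "layers": {"counts": None},
    "obs": {"batch": None},
    "uns": {},
}

print(missing_slots(dataset, schema))
```

Returning the list of missing slots, rather than failing on the first one, lets a caller report every schema violation in a single pass.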
v1.0.0
Note: This changelog was automatically generated from the git log.
New functionality
- Added cell2location to the spatial_decomposition task.
- Added nearest-neighbor ranking matrix computation to _utils.
- Datasets now store nearest-neighbor ranking matrix in adata.obsm["X_ranking"].
- Added support for parsing Nextflow output and generating benchmark results for the website.
- Added max_samples parameter to qlocal, qglobal, qnn_auc, lcmc, qnn, and continuity metrics to allow for subsampling of data for faster computation.
- Added new scArches based methods: scarches_scanvi_xgb_all_genes and scarches_scanvi_xgb_hvg.
- Added prediction_method parameter to _scanvi_scarches to specify prediction method.
- Added _pred_xgb function to perform XGBoost prediction based on latent representations.
- Added obsm parameter to _xgboost function to allow specifying the embedding space for XGBoost training.
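The nearest-neighbor ranking matrix and the max_samples subsampling mentioned above can be illustrated with a small standard-library sketch. This is not the code in _utils or nn_ranking.py: the function names subsample and ranking_matrix, the Euclidean distance, and the tie rule (tied distances share the lowest rank) are assumptions made for the example.

```python
import math
import random


def subsample(points, max_samples=None, seed=0):
    """Return at most max_samples points, drawn without replacement."""
    if max_samples is None or len(points) <= max_samples:
        return list(points)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(points)), max_samples))
    return [points[i] for i in keep]


def ranking_matrix(points):
    """Entry [i][j] is the rank of point j by distance from point i.

    Tied distances share the lowest rank, so a point's rank equals the
    number of points that are strictly closer to point i.
    """
    n = len(points)
    ranks = []
    for i in range(n):
        dists = [math.dist(points[i], points[j]) for j in range(n)]
        ranks.append([sum(d < dists[j] for d in dists) for j in range(n)])
    return ranks


points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (3.0, 0.0)]
R = ranking_matrix(subsample(points, max_samples=None))
for row in R:
    print(row)
```

In the first row the two points at distance 1 from the origin both receive rank 1, which is the kind of tie a ranking computation has to handle explicitly. The pairwise computation grows quadratically with the number of points, which is why capping the sample size speeds up metrics built on it.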
Major changes
- Updated scvi-tools to version 0.20 in both Python and R environments.
- Updated datasets to include nearest-neighbor ranking matrix.
- Modified dimensionality reduction task to include nearest-neighbor ranking matrix computation in dataset generation.
- Refactored the website update workflow to use JSON instead of Markdown.
- Updated the website generation process to remove duplicate BibTex entries.
- Added a new parse_metadata.py script for generating metadata for the website.
- Added a new function to openproblems.utils.py to get the member ID of a task, dataset, method or metric.
- Removed the redundant computation and storage of the nearest-neighbor ranking matrix in datasets.
Minor changes
- Updated method names to be shorter and more consistent across tasks.
- Improved method summaries for clarity.
- Updated JAX and JAXlib versions to 0.4.6.
- Updated dependencies to support new versions of Snakemake and GitPython.
- Removed code related to "nbt2022-reproducibility" repo and merged it into the main website.
- Updated the schema for benchmark results to include submission time, code version, and resource usage metrics.
- Improved error handling and added logging to the parsing script.
- Removed the "raw.json" file from the results directory and merged all data into a single "results.json" file.
- Updated the workflow to upload the final results to the website's results directory instead of the data directory.
- Removed unnecessary code and refactored the parsing script for better readability.
- Added unit tests for the new parsing script.
- Updated the run_tests workflow to skip testing on the test_website branch.
- Updated the run_tests workflow to skip testing on the test_process branch.
- Updated the create-pull-request step to set the author for the pull request.
- Updated the run_tests workflow to skip testing on pull request reviews.
- Updated the update_website_content workflow to update the website on the main branch.
- Updated the main.bib file to fix a typo.
- Removed extraneous headings from task README files.
- Updated generate_test_matrix.py to use the new openproblems.utils.get_member_id function.
- Updated the website generation process to copy BibTex files to the correct location.
- Updated the process_requires section in setup.py to include gitpython.
- Updated git commit hash generation for openproblems functions.
- Modified _xgboost to allow for specifying tree_method.
- Modified _scanvi_scarches to consistently use unlabeled_category.
- Modified _scanvi_scarches to remove unnecessary copying of labels.
- Removed _scanvi_scarches functions that were redundant with _scanvi_scarches.
- Removed unused _scanvi functions.
- Modified _scanvi_scarches to allow for specifying prediction_method and handle unlabeled_category consistently.
Documentation
- Improved the documentation of the auprc metric.
- Improved the documentation of the cell2location methods.
- Document sub-stub task behaviour
Bug fixes
- Fixed an error in neuralee_default where the subsample_genes argument could be too small.
- Fixed an error in knn_naive where the is_baseline argument was set to False.
- Fixed calculation of ranking matrix in _utils to include ties.
- Fixed a bug in load_tenx_5k_pbmc() where a warning about non-unique variable names was being raised.
- Removed the unused _utils.py file.
- Removed the X_ranking entry from the obsm attribute of datasets.
- The _fit() function in nn_ranking.py now subsamples the data if max_samples is specified.
- The nn_ranking metrics now use subsampling in the _fit() function to improve performance.
- Fixed the git hash generation for openproblems functions
- Fixed a warning about pkg_resources being deprecated
- Removed unnecessary fetch-depth: 1 from workflow
- Fixed potential issue in _scanvi_scarches where labels_pred could be overwritten
- Fixed potential issue in _pred_xgb where num_round wasn't being used correctly
- Fixed an issue where baseline methods were not being filtered correctly from the benchmark results.
- Fixed an issue where metrics with all NaN values were not being removed from the benchmark results.
- Fixed an issue where some metrics were not being parsed correctly from the Nextflow output.
- Fixed an issue where the "mean_score" field was not being calculated correctly for each method.
- Fixed an issue where the "code_version" field was not being populated correctly for each method.
- Fixed an issue where the "submission_time" field was not being populated correctly for each method.
- Fixed an issue where the resource usage metrics were not being parsed correctly from the Nextflow output.
- Updated the run_tests workflow to skip testing on the test_website branch.
- Updated the run_tests workflow to skip testing on the test_process branch.
- Updated the create-pull-request step to set the author for the pull request.
- Updated the run_tests workflow to skip testing on pull request reviews.
- Updated the `update_website_
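Several of the fixes above concern how benchmark results are post-processed: filtering baseline methods, dropping metrics that are NaN for every method, and computing a mean score per method. The sketch below shows one way those three steps could fit together. It is not the project's parsing script: the record layout, the function name summarize, and the use of a plain unweighted mean are assumptions, and the real pipeline may scale or normalize scores differently.

```python
import math


def summarize(results):
    """Return (kept metric names, mean score per non-baseline method).

    results: list of dicts with keys "method", "is_baseline", and
    "metrics" (a mapping from metric name to float).
    """
    metric_names = {name for r in results for name in r["metrics"]}
    # Drop metrics whose value is NaN (or absent) for every method.
    kept = sorted(
        name
        for name in metric_names
        if not all(math.isnan(r["metrics"].get(name, math.nan)) for r in results)
    )
    mean_score = {}
    for r in results:
        if r["is_baseline"]:
            continue  # baselines are excluded from the summary
        values = [
            r["metrics"][name]
            for name in kept
            if name in r["metrics"] and not math.isnan(r["metrics"][name])
        ]
        mean_score[r["method"]] = sum(values) / len(values) if values else math.nan
    return kept, mean_score


# Hypothetical records for illustration only.
results = [
    {"method": "random", "is_baseline": True,
     "metrics": {"accuracy": 0.25, "broken": math.nan}},
    {"method": "knn", "is_baseline": False,
     "metrics": {"accuracy": 0.75, "f1": 0.5, "broken": math.nan}},
]
kept, mean_score = summarize(results)
print(kept, mean_score)
```

Deciding which metrics to keep before averaging means every method is scored over the same set of metrics, so a metric that failed everywhere cannot skew the comparison.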
Full Changelog: v0.8.0...v1.0.0
v0.8.0
What's Changed
- Fix DR baselines by @scottgigante-immunai in #816
- set adata.uns['is_baseline'] by @scottgigante-immunai in #820
- Copy anndata in metric decorator by @scottgigante-immunai in #819
- Don't recompute X_emb and neighborhood graph for baseline datasets by @danielStrobl in #823
- Changes in destVI code (#826) by @scottgigante-immunai in #827
- Set explicit token permissions by @scottgigante-immunai in #828
- Warnings fix by @scottgigante-immunai in #831
- Harmonize batch integration dataset APIs by @scottgigante-immunai in #834
- new common baselines and cross import by @danielStrobl in #825
- jitter baseline patch by @danielStrobl in #838
- Add reversed norm order for ALRA in Denoising Task by @wes-lewis in #835
Full Changelog: v0.7.4...v0.8.0
v0.7.0
What's Changed
- Fix docker image builds by @scottgigante-immunai in #758
- [Dimensionality reduction] Fix normalization in baselines by @scottgigante-immunai in #760
- downgrade gtfparse and polars by @scottgigante-immunai in #766
- Fix output headers order by @scottgigante-immunai in #769
- Convert references to bib by @scottgigante-immunai in #720
- fix typo in bibliography path by @scottgigante-immunai in #774
- More bibliography typos by @scottgigante in #775
- Pre-normalize dimensionality reduction datasets by @scottgigante-immunai in #768
- Add pymde to dimensionality reduction by @scottgigante-immunai in #767
- Fix flaky R installations in docker build by @scottgigante-immunai in #783
- save initial layer in X for adata_pre by @danielStrobl in #784
- Filter datasets by celltype by @scottgigante-immunai in #770
- Pass raw counts to neuralee by @scottgigante-immunai in #779
- Label projection describe datasets by @mxposed in #776
- Add missing DR references by @rcannood in #782
- Bugfix/lowercase GitHub repo owner by @scottgigante-immunai in #794
- Upgrade isort by @scottgigante-immunai in #795
- Update styler to 1.9.0 by @github-actions in #787
- [auto] Update docker version by @github-actions in #798
- Update bslib to 0.4.2 by @github-actions in #759
- add missing logfc decorator by @dbdimitrov in #796
- Add ALRA preprocessing identical to literature by @wes-lewis in #763
- run CI on PRs only with approving review by @scottgigante-immunai in #804
- add new workflow to add status by @scottgigante-immunai in #805
- Update bioc/scran to 1.26.2 by @github-actions in #799
- Specify PR number by @scottgigante-immunai in #808
- add magic with reverse norm order by @scottgigante-immunai in #797
- Bump pymde from 0.1.15 to 0.1.18 in /docker/openproblems-python-pytorch by @dependabot in #801
- Update scvi-tools requirement from ~=0.16 to ~=0.19 in /docker/openproblems-r-pytorch by @dependabot in #731
- Use graph and embedding metrics for feature and embedding subtask by @danielStrobl in #807
- Fix typo in dimensionality reduction dataset names by @lazappi in #802
- add new dataloaders by @danielStrobl in #792
- rmse -> distance correlation by @scottgigante-immunai in #811
- CPM -> CP10k by @scottgigante-immunai in #812
- change multimodal data integration task name to matching modalities by @LuckyMD in #778
- updated scib version by @danielStrobl in #793
- Daniel strobl hvg conservation fix by @danielStrobl in #785
Full Changelog: v0.6.1...v0.7.0