Releases: openproblems-bio/openproblems

v2.0.0

08 Sep 05:47

A major update to the OpenProblems framework, switching from a Python-based framework to a Viash + Nextflow-based framework. This update features the same concepts as the previous version, but with a new implementation that is more flexible, scalable, and maintainable.

Most relevant parts of the overall structure:

  • src/tasks: Benchmarking tasks:

    • batch_integration: Batch integration
    • denoising: Denoising
    • dimensionality_reduction: Dimensionality reduction
    • match_modalities: Match modalities
    • predict_modality: Predict modality
    • spatial_decomposition: Spatial decomposition
    • spatially_variable_genes: Spatially variable genes
  • src/datasets: Components for creating common datasets. Loaders:

    • cellxgene_census: Query cells from a CellxGene Census
    • openproblems_neurips2021_bmmc: Fetch a dataset from the OpenProblems NeurIPS2021 competition
    • openproblems_neurips2022_pbmc: Fetch a dataset from the OpenProblems NeurIPS2022 competition
    • openproblems_v1: Fetch a legacy OpenProblems v1 dataset
    • openproblems_v1_multimodal: Fetch a legacy OpenProblems v1 multimodal dataset
    • tenx_vision: Fetch and convert a 10x Visium dataset
    • zenodo_spatial: Fetch and process an AnnData file containing DBiT-seq, MERFISH, seqFISH, Slide-seq v2, STARmap, and Stereo-seq data from Zenodo
    • zenodo_spatial_slidetags: Download a compressed file containing a gene expression matrix and spatial locations from Zenodo
  • src/common: Common components used by all tasks.

    • check_dataset_schema: Check whether an h5ad dataset adheres to a dataset schema
    • check_yaml_schema: Check whether a YAML adheres to a JSON schema
    • comp_tests: Reusable component unit tests
    • create_component: Create a Viash component
    • create_task_readme: Create a README for an OpenProblems task.
    • extract_metadata: Extract the .uns metadata from an h5ad file.
    • helper_functions: Commonly used helper functions in Python or R
    • process_task_results: Process the raw task results (containing raw logs, unprocessed component configs, and various metrics) into nicely formatted task results
    • schemas: JSON schemas for YAML files in the repository
    • sync_test_resources: Synchronise the test resources from s3 to resources_test
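
As a rough illustration of what a check such as check_dataset_schema has to do, the sketch below compares the fields present in a dataset against the fields a schema requires. It is not the component's actual implementation: the schema layout, the slot names, and the function find_missing_fields are all assumptions made for this example, and plain dictionaries stand in for an h5ad file.

```python
def find_missing_fields(dataset_slots, schema):
    """Return the required fields that are absent from a dataset.

    dataset_slots: hypothetical stand-in for an h5ad file, mapping a slot
        name (e.g. "obs", "uns") to the set of keys present in that slot.
    schema: hypothetical schema, mapping a slot name to the list of keys
        that must be present.
    """
    missing = []
    for slot, required in schema.items():
        present = dataset_slots.get(slot, set())
        for name in required:
            if name not in present:
                missing.append(f"{slot}['{name}']")
    return missing


# Hypothetical schema and dataset, for illustration only.
schema = {"layers": ["counts"], "obs": ["batch"], "uns": ["dataset_id"]}
dataset = {"layers": {"counts"}, "obs": {"cell_type"}, "uns": {"dataset_id"}}

missing = find_missing_fields(dataset, schema)
if missing:
    print("Dataset does not adhere to the schema; missing:", ", ".join(missing))
```

Returning the full list of missing fields, rather than stopping at the first one, lets a single run report every problem in a dataset.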

For more information related to the structure of this repository, see the documentation.

v1.0.0

07 Sep 15:00

Note: This changelog was automatically generated from the git log.

New functionality

  • Added cell2location to the spatial_decomposition task.
  • Added nearest-neighbor ranking matrix computation to _utils.
  • Datasets now store nearest-neighbor ranking matrix in adata.obsm["X_ranking"].
  • Added support for parsing Nextflow output and generating benchmark results for the website.
  • Added max_samples parameter to qlocal, qglobal, qnn_auc, lcmc, qnn, and continuity metrics to allow for subsampling of data for faster computation.
  • Added new scArches based methods: scarches_scanvi_xgb_all_genes and scarches_scanvi_xgb_hvg.
  • Added prediction_method parameter to _scanvi_scarches to specify prediction method.
  • Added _pred_xgb function to perform XGBoost prediction based on latent representations.
  • Added obsm parameter to _xgboost function to allow specifying the embedding space for XGBoost training.
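
The nearest-neighbor ranking matrix and the max_samples subsampling mentioned above can be sketched as follows. This is a minimal illustration in NumPy, not the code from _utils or nn_ranking.py: the function names, the Euclidean distance, the fixed seed, and the tie convention (tied points share the lowest rank) are all assumptions.

```python
import numpy as np


def subsample(X, max_samples, seed=0):
    """Return at most max_samples rows of X, chosen without replacement."""
    if max_samples is None or X.shape[0] <= max_samples:
        return X
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=max_samples, replace=False)
    return X[np.sort(idx)]  # keep the original row order


def pairwise_distances(X):
    """Dense Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def ranking_matrix(D):
    """R[i, j] = number of points strictly closer to i than j is.

    Points at equal distance from i receive the same rank (an assumed
    convention for handling ties). Uses O(n^3) memory, so it is only
    suitable for small or subsampled inputs.
    """
    return (D[:, :, None] > D[:, None, :]).sum(axis=2)


# Four 1-D points; the second and third coincide, so they tie.
X = np.array([[0.0], [1.0], [1.0], [3.0]])
R = ranking_matrix(pairwise_distances(subsample(X, max_samples=None)))
print(R)
```

Because the ranking is quadratic in the number of cells at best, capping the input with something like max_samples is what keeps rank-based metrics tractable on large datasets.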

Major changes

  • Updated scvi-tools to version 0.20 in both Python and R environments.
  • Updated datasets to include nearest-neighbor ranking matrix.
  • Modified dimensionality reduction task to include nearest-neighbor ranking matrix computation in dataset generation.
  • Refactored the website update workflow to use JSON instead of Markdown.
  • Updated the website generation process to remove duplicate BibTeX entries.
  • Added a new parse_metadata.py script for generating metadata for the website.
  • Added a new function, get_member_id, to openproblems.utils to get the member ID of a task, dataset, method, or metric.
  • Removed the redundant computation and storage of the nearest-neighbor ranking matrix in datasets.
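
One way to remove duplicate BibTeX entries, as the website generation process is described as doing, is to keep only the first entry for each citation key. The sketch below is an assumption about how this could be done with a regular expression; the function name dedupe_bibtex is hypothetical, and the approach assumes each entry starts at the beginning of a line.

```python
import re

# Matches the start of an entry such as "@article{smith2020," and captures the key.
ENTRY_START = re.compile(r"^@\w+\{([^,\s]+),", re.MULTILINE)


def dedupe_bibtex(text):
    """Keep the first entry for each citation key and drop later repeats."""
    starts = [(m.start(), m.group(1)) for m in ENTRY_START.finditer(text)]
    seen = set()
    kept = []
    for i, (pos, key) in enumerate(starts):
        # An entry runs until the next entry starts (or the end of the text).
        end = starts[i + 1][0] if i + 1 < len(starts) else len(text)
        if key not in seen:
            seen.add(key)
            kept.append(text[pos:end].strip())
    return "\n\n".join(kept) + ("\n" if kept else "")


# Hypothetical bibliography with one repeated key.
bib = (
    "@article{a2020,\n  title={A}\n}\n\n"
    "@article{b2021,\n  title={B}\n}\n\n"
    "@article{a2020,\n  title={A}\n}\n"
)
print(dedupe_bibtex(bib))
```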

Minor changes

  • Updated method names to be shorter and more consistent across tasks.
  • Improved method summaries for clarity.
  • Updated JAX and JAXlib versions to 0.4.6.
  • Updated dependencies to support new versions of Snakemake and GitPython.
  • Removed code related to "nbt2022-reproducibility" repo and merged it into the main website.
  • Updated the schema for benchmark results to include submission time, code version, and resource usage metrics.
  • Improved error handling and added logging to the parsing script.
  • Removed the "raw.json" file from the results directory and merged all data into a single "results.json" file.
  • Updated the workflow to upload the final results to the website's results directory instead of the data directory.
  • Removed unnecessary code and refactored the parsing script for better readability.
  • Added unit tests for the new parsing script.
  • Updated the run_tests workflow to skip testing on the test_website branch.
  • Updated the run_tests workflow to skip testing on the test_process branch.
  • Updated the create-pull-request step to set the author for the pull request.
  • Updated the run_tests workflow to skip testing on pull request reviews.
  • Updated the update_website_content workflow to update the website on the main branch.
  • Updated the main.bib file to fix a typo.
  • Removed extraneous headings from task README files.
  • Updated generate_test_matrix.py to use the new openproblems.utils.get_member_id function.
  • Updated the website generation process to copy BibTeX files to the correct location.
  • Updated the process_requires section in setup.py to include gitpython.
  • Updated git commit hash generation for openproblems functions.
  • Modified _xgboost to allow for specifying tree_method.
  • Modified _scanvi_scarches to consistently use unlabeled_category.
  • Modified _scanvi_scarches to remove unnecessary copying of labels.
  • Removed _scanvi_scarches functions that were redundant with _scanvi_scarches.
  • Removed unused _scanvi functions.
  • Modified _scanvi_scarches to allow for specifying prediction_method and handle unlabeled_category consistently.
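
The results handling described in these notes (filtering baseline methods, dropping metrics whose values are all NaN, and computing a mean score per method) could look roughly like the sketch below. The record layout, the keys "method_name", "is_baseline" and "metric_values", and the function process_results are assumptions for illustration; only the "mean_score" field name appears in the notes, and the real parsing script may differ.

```python
import math
from statistics import fmean


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def process_results(rows):
    """Drop all-NaN metrics and baseline rows, then add a mean_score per method."""
    metrics = sorted({name for row in rows for name in row["metric_values"]})
    # A metric is usable if at least one row has a non-NaN value for it.
    usable = [
        name
        for name in metrics
        if not all(_is_nan(row["metric_values"].get(name, math.nan)) for row in rows)
    ]
    processed = []
    for row in rows:
        if row.get("is_baseline"):
            continue  # assumed behaviour: baselines are excluded from the output
        values = {name: row["metric_values"].get(name, math.nan) for name in usable}
        finite = [v for v in values.values() if not _is_nan(v)]
        processed.append(
            {
                "method_name": row["method_name"],
                "metric_values": values,
                "mean_score": fmean(finite) if finite else math.nan,
            }
        )
    return processed


# Hypothetical raw results: metric "b" is NaN everywhere, "rand" is a baseline.
nan = math.nan
rows = [
    {"method_name": "m1", "is_baseline": False, "metric_values": {"a": 0.5, "b": nan, "c": 1.0}},
    {"method_name": "m2", "is_baseline": False, "metric_values": {"a": 1.0, "b": nan, "c": nan}},
    {"method_name": "rand", "is_baseline": True, "metric_values": {"a": 0.0, "b": nan, "c": 0.0}},
]
results = process_results(rows)
print(results)
```

In this sketch a method's mean ignores its own NaN values; whether the real pipeline does the same or treats them as zero is not stated in the notes.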

Documentation

  • Improved the documentation of the auprc metric.
  • Improved the documentation of the cell2location methods.
  • Documented sub-stub task behaviour.

Bug fixes

  • Fixed an error in neuralee_default where the subsample_genes argument could be too small.
  • Fixed an error in knn_naive where the is_baseline argument was set to False.
  • Fixed calculation of ranking matrix in _utils to include ties.
  • Fixed a bug in load_tenx_5k_pbmc() where a warning about non-unique variable names was being raised.
  • Removed the unused _utils.py file.
  • Removed the X_ranking entry from the obsm attribute of datasets.
  • The _fit() function in nn_ranking.py now subsamples the data if max_samples is specified.
  • The nn_ranking metrics now use subsampling in the _fit() function to improve performance.
  • Fixed the git hash generation for openproblems functions.
  • Fixed a warning about pkg_resources being deprecated.
  • Removed unnecessary fetch-depth: 1 from workflow.
  • Fixed potential issue in _scanvi_scarches where labels_pred could be overwritten.
  • Fixed potential issue in _pred_xgb where num_round wasn't being used correctly.
  • Fixed an issue where baseline methods were not being filtered correctly from the benchmark results.
  • Fixed an issue where metrics with all NaN values were not being removed from the benchmark results.
  • Fixed an issue where some metrics were not being parsed correctly from the Nextflow output.
  • Fixed an issue where the "mean_score" field was not being calculated correctly for each method.
  • Fixed an issue where the "code_version" field was not being populated correctly for each method.
  • Fixed an issue where the "submission_time" field was not being populated correctly for each method.
  • Fixed an issue where the resource usage metrics were not being parsed correctly from the Nextflow output.

Full Changelog: v0.8.0...v1.0.0

v0.8.0

21 Feb 15:40

What's Changed

Full Changelog: v0.7.4...v0.8.0

v0.7.0

02 Feb 00:04

What's Changed

Full Changelog: v0.6.1...v0.7.0