BenchHub

Installation instructions

 devtools::install_github("SydneyBioX/BenchHub")

Goals

Clean and simple user interface (API), with clear information and error messages using modern CLI outputs (progress bars, stack tracing, colours, …).
Data handling, including caching and data integrity verification (MD5 checksum?).
Simplify novel method evaluation by providing sample code and inputs for the evaluate function.
Ensure that it is easy and highly flexible to add new datasets from various sources.
- Use YAML to encode dataset metadata to ensure language interoperability.
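
The data-handling goal above raises MD5 checksums as a possible integrity check. As a minimal sketch of how such a check could work, the snippet below compares a file's digest against an expected value using base R's `tools::md5sum()`. The helper `verifyChecksum()` is hypothetical and is not part of the BenchHub API.

```r
# Hypothetical helper, not part of the BenchHub API: compare a file's MD5
# digest against an expected value (e.g., one stored in dataset metadata).
verifyChecksum <- function(path, expected) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  actual <- unname(tools::md5sum(path))
  if (!identical(tolower(actual), tolower(expected))) {
    stop("Checksum mismatch for ", path,
         ": expected ", expected, ", got ", actual)
  }
  invisible(TRUE)
}

# Usage: write a small file, record its digest, then verify it.
tmp <- tempfile(fileext = ".txt")
writeLines("example data", tmp)
digest <- unname(tools::md5sum(tmp))
verifyChecksum(tmp, digest)
```

Raising an error on mismatch, rather than returning FALSE, keeps a corrupted download from silently reaching an evaluation.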

Finding Datasets

Datasets can be found by data type, task, or name pattern.

library(BenchHub)

findData(dataType = "SST", task = "Celltype Classification")
# outputs a data frame with metadata about the matched datasets

findData(name = "*IMC*") # match datasets which have IMC in their name
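
The `name = "*IMC*"` argument reads like a glob-style pattern. One way such matching could be done, purely as a sketch, is to translate the glob into a regular expression with `utils::glob2rx()` and filter a metadata table. The `datasets` table, its dataset names, and the `matchByName()` helper are all hypothetical and invented for illustration.

```r
# Hypothetical metadata table; the dataset names are invented for illustration.
datasets <- data.frame(
  name = c("lung_IMC_2021", "pbmc_scRNA", "breast_IMC_panel"),
  dataType = c("SST", "scRNA-seq", "SST"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose name matches a glob pattern.
matchByName <- function(metadata, pattern) {
  regex <- utils::glob2rx(pattern)  # translate the glob into a regular expression
  metadata[grepl(regex, metadata$name), , drop = FALSE]
}

matched <- matchByName(datasets, "*IMC*")
matched$name  # "lung_IMC_2021" "breast_IMC_panel"
```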

Getting Data

Get data from various sources.

trio <- getTrio("openproblems_v1/tenx_1k_pbmc") # some automatic caching, etc.
trio$tasks()
# outputs information about tasks, metrics and "gold standards" of the included data

Method Evaluation

Evaluate your method.

myCellTypes <- myMethod1(trio$getSCE())
mySegmentation <- myMethod2(trio$getRaster())

outputs <- list("cellTypes" = myCellTypes, "segmentation" = mySegmentation)

evaluation <- trio$evaluate(outputs)

Get an evaluation template

Generate a template showing what the input to trio$evaluate() must look like for a successful evaluation.

template <- trio$getTemplate(task = c("Celltype Classification", "Cell Segmentation"))

names(template) # [1] "celltype"     "segmentation"
length(template$celltype) # [1] 1000
typeof(template$celltype[1]) # [1] "character"
unique(template$celltype) # [1] "a" "b"

Generate an assertion that checks the format of the method output.

checkInput <- trio$checkInput(myCellTypes, task = "Celltype Classification")
# informative errors that allow the user to make their outputs conform to the input of trio$evaluate()
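
As a rough sketch of the kind of check such an assertion might perform, the snippet below compares a method's output against a template entry and stops with a descriptive message on the first mismatch. The helper `checkAgainstTemplate()` is hypothetical and is not the BenchHub implementation. The template mirrors the example above (1000 character labels drawn from "a" and "b").

```r
# Hypothetical helper, not the BenchHub implementation: check that a method's
# output has the same type and length as the corresponding template entry.
checkAgainstTemplate <- function(output, template) {
  if (typeof(output) != typeof(template)) {
    stop("Expected type '", typeof(template), "' but got '",
         typeof(output), "'.")
  }
  if (length(output) != length(template)) {
    stop("Expected length ", length(template), " but got ",
         length(output), ".")
  }
  invisible(TRUE)
}

# Template shaped like the example above: 1000 character labels.
templateCelltype <- rep(c("a", "b"), times = 500)

checkAgainstTemplate(rep("a", 1000), templateCelltype)  # passes silently
result <- try(checkAgainstTemplate(1:1000, templateCelltype), silent = TRUE)
inherits(result, "try-error")  # TRUE: integer output, character expected
```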

Potential Features

Multiple Datasets

Get multiple datasets in one trio and perform the evaluation on all relevant datasets within the trio.

trio <- getTrio(c("openproblems_v1/tenx_1k_pbmc", "openproblems_v1/tenx_5k_pbmc"))

User Defined Cache Directory

Allow users to define a cache directory for the datasets (e.g., in our case, the biostat folder).
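
One possible way to resolve the cache location, sketched under assumptions: take an explicit argument first, then an R option, then a per-user default from `tools::R_user_dir()` (available in R 4.0.0 and later). The helper `resolveCacheDir()` and the option name `BenchHub.cache_dir` are hypothetical and are not part of the package.

```r
# Hypothetical helper: resolve the cache directory from, in order,
# an explicit argument, a (hypothetical) option, or a per-user default.
resolveCacheDir <- function(cacheDir = NULL) {
  if (is.null(cacheDir)) {
    cacheDir <- getOption(
      "BenchHub.cache_dir",
      default = tools::R_user_dir("BenchHub", which = "cache")
    )
  }
  dir.create(cacheDir, recursive = TRUE, showWarnings = FALSE)
  normalizePath(cacheDir)
}

# Usage: point the cache at a shared folder (a temporary one here).
shared <- file.path(tempdir(), "benchhub-cache")
cachePath <- resolveCacheDir(shared)
dir.exists(cachePath)
```

Setting the option once per session (or in a site-wide profile) would let a group share one cache without passing the path on every call.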
