DataDepository

An R package for uploading datasets to Google BigQuery for public sharing, so that they can easily be integrated with other public datasets.

Why?

Before an analysis can begin, a number of datasets often have to be downloaded, cleaned and standardized.

A central repository that stores each dataset after standardization would reduce the time required for the next analysis that uses the same source data, because nobody would need to download and parse that dataset again.

What?

BigQuery is a serverless database that is an attractive place to store and share datasets of general interest, for a number of reasons:

Very fast: joining two PubChem files (100 million chemical structures and 70 million names) took less than 3 minutes, without having to define an index.
Very cheap: there is no fee for a server. Instead there is a small fee for storing data (10 GB free, then $0.02 per additional GB per month, i.e. roughly $20 per month for 1 TB) and a fee for querying it (1 TB free per month, then $5 per additional TB), at the prices available when this was written.
It has a web UI from which data can be uploaded and queried.
It has a REST API (and many client libraries).
Metadata can be attached to describe each dataset.
Every dataset can be referenced by a unique URL.
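The storage and query fees quoted in the list above can be turned into a monthly estimate. The sketch below uses only those quoted prices, which may not match BigQuery's current price list. It takes 1 TB as 1000 GB, as the list's own "1TB for $20" approximation does. The 3 TB of monthly query volume in the second line is a hypothetical figure chosen for illustration:

$$
\begin{aligned}
C_{\text{storage}} &= (1000 - 10)\,\text{GB} \times \$0.02/\text{GB} = \$19.80 \approx \$20 \text{ per month for 1 TB stored} \\
C_{\text{query}} &= (3 - 1)\,\text{TB} \times \$5/\text{TB} = \$10 \text{ per month for a hypothetical 3 TB scanned}
\end{aligned}
$$

Only the data beyond each free allowance is billed, so small shared datasets that are queried occasionally may cost nothing at all.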

Examples

Compound names from PubChem mapped onto InChIKeys
Compound activities from ChEMBL enhanced with InChIKeys
Count of compounds appearing in databases based on UniChem
Shiny app to query BigQuery

iainmwallace/DataDepository

Folders and files

Latest commit

History

Repository files navigation

DataDepository

Why?

What?

Examples

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages