A data specification for harmonizing One Health Canadian COVID pathogen genomics contextual data. The specification provides standardized (ontology-based) fields and terms, which are implemented via the DataHarmonizer and supported by field and term reference guides as well as curation and new term request SOPs.

cidgoh/CanCOGeN_Contextual_Data_Specification

The CanCOGeN Contextual Data Specification

About

The CanCOGeN Data Specification is designed to enhance the sharing and interoperability of genomic and clinical data in the fight against the COVID-19 pandemic. Developed collaboratively with input from a wide range of public health agencies, academic institutions, and research organizations, this standard provides a unified, ontology-based framework for data collection, storage, and exchange. It supports collaborative research and informed policy decisions while emphasizing stringent data privacy and security measures that protect sensitive information and ensure compliance with ethical and legal guidelines. See Genome Canada's website for more information on the Canadian COVID-19 Genomics Network (CanCOGeN).

What are ontologies and how do they improve data quality?

Labs collect, encode and store information in different ways. They use different fields, terms and formats, they categorize variables differently, and the meaning of a word can change depending on the focus of the organization. Think of the word “plant”: to someone in agriculture, a “plant” could be an organism that carries out photosynthesis, while a food regulator might understand a “plant” to be a factory where food products are made. This variability makes comparing, integrating and analyzing data generated by different organizations as difficult as trying to compare apples, oranges and bananas.

Ontologies are collections of controlled vocabulary terms arranged in a hierarchy, in which all terms are linked by logical relationships. Ontologies are open source and are meant to represent “universal truth” as much as possible (that is, they are not tied to one organization’s vocabulary or use case). Ontologies encode synonyms, which enables mapping between the specific language used by different organizations, and every term in an ontology is assigned a globally unique and persistent identifier. Using ontology terms to standardize CanCOGeN contextual data not only makes the data more interoperable through a common language, it also helps make contextual data FAIR (Findable, Accessible, Interoperable, Reusable).
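To make the synonym-mapping idea concrete, the Python sketch below normalizes free-text values from different labs onto one preferred label and persistent identifier. The lookup table and the `standardize` helper are hypothetical illustrations, not part of the CanCOGeN specification; the two NCBI Taxonomy identifiers (NCBITaxon:9606 for Homo sapiens, NCBITaxon:2697049 for SARS-CoV-2) are real, but a production mapping would come from the specification's reference guides and the source ontologies.

```python
# Hypothetical lookup table for illustration only; NOT taken from the
# CanCOGeN specification. Each lower-cased synonym maps to a
# (preferred label, persistent ontology identifier) pair.
SARS_COV_2 = ("Severe acute respiratory syndrome coronavirus 2", "NCBITaxon:2697049")
HOMO_SAPIENS = ("Homo sapiens", "NCBITaxon:9606")

SYNONYMS: dict[str, tuple[str, str]] = {
    "human": HOMO_SAPIENS,
    "homo sapiens": HOMO_SAPIENS,
    "sars-cov-2": SARS_COV_2,
    "2019-ncov": SARS_COV_2,
}


def standardize(value: str) -> tuple[str, str]:
    """Map a free-text value to its (preferred label, ontology ID)."""
    # Collapse runs of whitespace and ignore case so trivial variants match.
    key = " ".join(value.split()).lower()
    try:
        return SYNONYMS[key]
    except KeyError:
        raise KeyError(f"no ontology term found for {value!r}") from None


# Two labs that record the host differently end up with the same identifier.
lab_a = standardize("Human")
lab_b = standardize("  homo   sapiens ")
print(lab_a == lab_b, lab_a[1])
```

Because every variant resolves to the same identifier, downstream comparisons can be made on the ID rather than on whatever wording each lab happened to use.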

The CanCOGeN Contextual Data Specification Package

This specification is currently implemented only via a DataHarmonizer validation template; however, any tool can be used to implement it. Accompanying Field and Term Reference Guides (which provide definitions and additional specific guidance) and a curation Standard Operating Procedure (SOP) can also be found in this repository.
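Because the specification is not tied to one tool, the core of a validator can be sketched in a few lines. The Python example below checks a record for missing required fields, values outside a picklist, and fields the template does not define. The field names, picklist values and the `validate` helper are hypothetical and do not reproduce the actual CanCOGeN template.

```python
# Hypothetical template fragment: field names and picklists are invented
# for illustration and do NOT reproduce the CanCOGeN template.
# "picklist": None means the field accepts free text.
TEMPLATE: dict[str, dict] = {
    "host_scientific_name": {"required": True, "picklist": {"Homo sapiens", "Felis catus"}},
    "sample_collection_date": {"required": True, "picklist": None},
    "specimen_processing": {"required": False, "picklist": {"Virus passage", "Specimens pooled"}},
}


def validate(record: dict[str, str], template: dict[str, dict] = TEMPLATE) -> list[str]:
    """Return human-readable problems; an empty list means the record passed."""
    problems: list[str] = []
    for field, rule in template.items():
        value = record.get(field, "").strip()
        if not value:
            # Empty or absent values only matter for required fields.
            if rule["required"]:
                problems.append(f"{field}: required value is missing")
            continue
        picklist = rule["picklist"]
        if picklist is not None and value not in picklist:
            problems.append(f"{field}: {value!r} is not an allowed term")
    # Flag any field the template does not define.
    for field in record:
        if field not in template:
            problems.append(f"{field}: not a field in the template")
    return problems


print(validate({"host_scientific_name": "human", "lab_notes": "n/a"}))
```

Returning a list of problems rather than stopping at the first error lets a curator see every issue in a record at once; whether a given tool behaves this way is a design choice, not something the specification prescribes.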

New terms and/or term changes can be requested using issue request forms, with additional guidance on how to do so outlined in the New Term Request (NTR) SOP. These resources are available in the files of this repository and are listed below under Package Contents.

Version Control

Please note that development of the specification is dynamic, and it will be updated periodically to address user needs. Versions are numbered in the format x.y.z, where:

x = Field level changes
y = Term value / ID level changes
z = Definition, guidance, example, formatting, or other uncategorized changes

Descriptions of changes are provided in release notes for every new version.
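As a rough illustration of the x.y.z scheme, the Python sketch below computes the next version for a given kind of change and names the most significant component that differs between two versions. The helper names are hypothetical, and resetting the lower components to zero after a bump is an assumption borrowed from common semantic-versioning practice; the scheme above does not state it.

```python
# Component position -> kind of change it records (x, y, z).
CHANGE_KINDS = ("field", "term", "other")


def parse_version(version: str) -> tuple[int, int, int]:
    """Split an 'x.y.z' string into three non-negative integers."""
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a version of the form x.y.z, got {version!r}")
    x, y, z = (int(p) for p in parts)
    return x, y, z


def bump(version: str, kind: str) -> str:
    """Return the next version for a change of the given kind.

    Assumption: components to the right of the bumped one reset to 0.
    """
    if kind not in CHANGE_KINDS:
        raise ValueError(f"unknown change kind {kind!r}")
    parts = list(parse_version(version))
    i = CHANGE_KINDS.index(kind)
    parts[i] += 1
    for j in range(i + 1, len(parts)):
        parts[j] = 0
    return ".".join(str(p) for p in parts)


def change_kind(old: str, new: str) -> str:
    """Name the most significant component that differs, or 'none'."""
    for kind, a, b in zip(CHANGE_KINDS, parse_version(old), parse_version(new)):
        if a != b:
            return kind
    return "none"


print(bump("1.2.3", "term"), change_kind("1.2.3", "2.0.0"))
```

Reading the kind of change directly off the version number tells a user whether an update can affect their field mappings (x), their picklist values (y), or only documentation (z), before they open the release notes.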

Package Contents

Data Collection Template

Field and Term Reference Guides

Curation SOP

DataHarmonizer Instructions and SOP

New Term Request (NTR) SOP

Contacts

For more information and/or assistance, contact Emma Griffiths at [email protected] or submit a repository issue request.

License

Pending / To Be Determined

Acknowledgements

Brought to you by the Centre for Infectious Disease Genomics and One Health (CIDGOH).
