A high-level data model of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc.) and their associations.
Biolink Model is designed as a way of standardizing types and relational structures in knowledge graphs (KGs), where the KG may be either a property graph or RDF triple store.
The schema is expressed in YAML, which is translated to:
- Individual pages for each class in the model, e.g. https://w3id.org/biolink/vocab/Gene
- An OWL ontology, also available on BioPortal
- Python dataclasses, also available on PyPI
- ShEx (RDF shape constraints)
- GraphQL
- protobuf
- JSON Schema
- prefix-mapping (a simple mapping of prefix to IRI expansion)
- Java classes
The schema assumes a property graph, where nodes represent individual entities and edges represent relationships between entities. Biolink Model provides a schema for representing both nodes and edges.
The model itself can be divided into a few parts:
- Entities (subjects and objects)
- Predicates (relationships between core concepts)
- Associations (statements including evidence and provenance)
- Entity Slots (node properties)
- Edge Slots (edge properties)
An entity corresponds to a database entity or a concept, represented as a node in a property graph.
All typed entities are subclasses of NamedThing.
Each entity has:
- its own unique stable URI
- mappings to other ontologies (SIO, SO, etc.)
- a list of valid ID prefixes
These entity types are high-level terms that can be used to categorize nodes in a KG.
For more detailed typing, use specific terms from an ontology.
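As a rough illustration of how the list of valid ID prefixes might be used, the sketch below checks whether a node identifier carries a prefix allowed for its category. The `VALID_ID_PREFIXES` table and the `has_valid_prefix` helper are hypothetical names introduced here, and the prefix lists are illustrative placeholders; the authoritative lists are defined in the model YAML.

```python
from typing import Dict, List

# Hypothetical excerpt: category -> ID prefixes accepted for that category.
# These values are assumptions for illustration, not copied from the model.
VALID_ID_PREFIXES: Dict[str, List[str]] = {
    "Gene": ["NCBIGene"],
    "ChemicalSubstance": ["CHEBI"],
}


def has_valid_prefix(category: str, curie: str) -> bool:
    """Return True if the CURIE's prefix is allowed for the given category."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        return False  # not of the form prefix:local_id
    return prefix in VALID_ID_PREFIXES.get(category, [])
```

For example, `has_valid_prefix("Gene", "NCBIGene:123")` passes under these assumed lists, while `has_valid_prefix("Gene", "CHEBI:123")` does not. The identifiers are placeholders, not real records.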
An association is a typed relationship between two entities, usually supported by evidence and provenance. It is represented as an edge (relationship) between two nodes in a property graph.
All edge types are subclasses of Association.
An association connects a subject node and an object node via a relation property. The relation property determines the nature of the association.
Certain associations can have additional properties such as provided_by, has_evidence, and publications.
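A minimal sketch of such an edge, written as a hand-rolled Python dataclass, may make the shape concrete. This is not the generated Biolink Python class: the field types (strings and lists of strings) are assumptions, and every identifier below is a placeholder rather than a real record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Association:
    """Illustrative edge: subject and object hold node IDs, relation types the edge."""

    subject: str
    relation: str
    object: str
    # Optional edge properties named in the text; list-valued by assumption.
    provided_by: List[str] = field(default_factory=list)
    has_evidence: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)


# Placeholder values only; none of these refer to real database entries.
edge = Association(
    subject="NCBIGene:123",
    relation="EXAMPLE:relation",
    object="CHEBI:456",
    publications=["PMID:0000000"],
)
```

Using `default_factory` gives each edge its own empty list for the optional properties, so adding evidence to one edge does not leak into another.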
The term slot refers collectively to node properties and edge properties.
The model defines two types of slots:
- node property - every node property is a subclass of node property
- association slot - every edge property is a subclass of association slot
Browse the Biolink Model to explore all defined entities, associations, and slots.
See Biolink Model JSON-LD context for a list of CURIE prefix mappings.
These include prefix expansions such as:
"CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
"NCBIGene": "http://www.ncbi.nlm.nih.gov/gene/",
"NCIT": "http://purl.obolibrary.org/obo/NCIT_",
Note: We do not curate these prefix expansions in Biolink Model. Instead, we take them from upstream sources via PrefixCommons, and we specify a priority order of upstream sources to resolve conflicts. See the default_curi_maps tag at the top of biolink-model.yaml.
We also specify a small set of top-level prefix overrides via the prefixes tag at the top of the YAML.
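To show what a prefix expansion does in practice, here is a small sketch that expands a CURIE to a full IRI and contracts an IRI back to a CURIE. It uses only the three mappings quoted above; the `expand_curie` and `contract_iri` helpers are hypothetical names introduced for this example and are not part of any Biolink tooling.

```python
from typing import Mapping, Optional

# The three example expansions quoted above.
PREFIX_MAP = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "NCBIGene": "http://www.ncbi.nlm.nih.gov/gene/",
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
}


def expand_curie(curie: str, prefix_map: Mapping[str, str] = PREFIX_MAP) -> str:
    """Expand a CURIE of the form prefix:local_id into a full IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefix_map[prefix] + local_id


def contract_iri(iri: str, prefix_map: Mapping[str, str] = PREFIX_MAP) -> Optional[str]:
    """Contract an IRI to a CURIE, or return None if no namespace matches."""
    matches = [(ns, prefix) for prefix, ns in prefix_map.items() if iri.startswith(ns)]
    if not matches:
        return None
    # Prefer the longest matching namespace when several apply.
    ns, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{iri[len(ns):]}"
```

With this map, `expand_curie("CHEBI:15377")` yields `http://purl.obolibrary.org/obo/CHEBI_15377`, and `contract_iri` reverses the mapping for IRIs in a known namespace.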
Biolink Model aims to represent knowledge in a graph form regardless of the graph representation used.
The following pages give recommendations for using Biolink Model with each style of representation.
- Neo4j: see Mapping to Neo4j
- RDF: see Mapping to RDF
Unni DR, Moxon SAT, Bada M, Brush M, Bruskiewich R, Caufield JH, Clemons PA, Dancik V, Dumontier M, Fecho K, Glusman G, Hadlock JJ, Harris NL, Joshi A, Putman T, Qin G, Ramsey SA, Shefchek KA, Solbrig H, Soman K, Thessen AE, Haendel MA, Bizon C, Mungall CJ, The Biomedical Data Translator Consortium (2022). Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science. Clin Transl Sci. Wiley; 2022 Jun 6; https://onlinelibrary.wiley.com/doi/10.1111/cts.13302