Scripts and other utilities relevant to the CORD-19 dataset

gyorilab/covid-19

INDRA applications and models for COVID-19


INDRA integrates multiple text-mining systems and pathway databases to automatically extract mechanistic knowledge from the biomedical literature and, through a process of knowledge assembly, build executable models and causal networks. Based on profiling and perturbational data, these models can be contextualized to be cell-type specific and used to explain experimental observations or to make predictions.

In the context of the ongoing COVID-19 pandemic, the INDRA team at the Laboratory of Systems Pharmacology, Harvard Medical School is working on understanding the mechanisms by which SARS-CoV-2 infects cells and the subsequent host response process, with the goal of finding new therapeutics using INDRA.

Results

A self-updating model of COVID-19 literature

EMMAA (Ecosystem of Machine-maintained Models with Automated Analysis) makes available a set of computational models that are kept up-to-date using automated machine reading, knowledge-assembly, and model generation, integrating new discoveries immediately as they become available.

The EMMAA COVID-19 model integrates all literature made available under the COVID-19 Open Research Dataset Challenge (CORD-19) and combines it with newly appearing papers from PubMed (about 300 every day) as well as bioRxiv and medRxiv preprints. It also integrates content from CTD, DrugBank, VirHostNet, and many other pathway databases.

  • The set of all statements in the model can be browsed and curated here.
  • This is a stable link to get the latest dump of all statements in the model as JSON: here.
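As a minimal sketch of working with such a dump, the snippet below tallies statements by type and evidence by reader. It assumes the dump is a JSON list of objects in INDRA's Statement JSON format, each with a `type` field and an `evidence` list whose entries carry `source_api`. The inline sample stands in for the downloaded file, and its entity names and sentences are placeholders, not real model content.

```python
import json
from collections import Counter

# Tiny inline stand-in for the JSON dump. All names and sentences here are
# illustrative placeholders, not actual statements from the model.
SAMPLE_DUMP = """
[
  {"type": "Inhibition",
   "subj": {"name": "DRUG_A"}, "obj": {"name": "PROTEIN_X"},
   "evidence": [{"source_api": "reach", "text": "placeholder sentence 1"},
                {"source_api": "sparser", "text": "placeholder sentence 2"}]},
  {"type": "Activation",
   "subj": {"name": "PROTEIN_X"}, "obj": {"name": "PROTEIN_Y"},
   "evidence": [{"source_api": "reach", "text": "placeholder sentence 3"}]},
  {"type": "Inhibition",
   "subj": {"name": "DRUG_B"}, "obj": {"name": "PROTEIN_Y"},
   "evidence": [{"source_api": "reach", "text": "placeholder sentence 4"}]}
]
"""


def summarize(statements):
    """Count statements per type and evidence entries per source."""
    type_counts = Counter(stmt["type"] for stmt in statements)
    source_counts = Counter()
    for stmt in statements:
        for ev in stmt.get("evidence", []):
            source_counts[ev.get("source_api", "unknown")] += 1
    return type_counts, source_counts


statements = json.loads(SAMPLE_DUMP)
type_counts, source_counts = summarize(statements)
print(type_counts.most_common())
print(source_counts.most_common())
```

With the real dump, `SAMPLE_DUMP` would be replaced by the contents of the downloaded file. If INDRA is installed, its own JSON deserialization utilities are likely a better fit than raw dictionaries.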

The model is also used to construct causal, mechanistic explanations for around 2,800 drug-virus effects.

The EMMAA COVID-19 model is also on Twitter (@covid19_emmaa) where it provides updates on the findings that it learns from the literature and also new experimental observations (such as drug effects on viruses, as described above) that it can explain based on these new pieces of knowledge.

INDRA aligned with the COVID-19 Disease Map

The [`COVID-19 Disease Map`](https://www.nature.com/articles/s41597-020-0477-8) brings together top pathway curators and modelers from around the world to create a set of models to elucidate the molecular mechanisms behind COVID-19.

We used INDRA statements assembled from all available biomedical literature and a multitude of pathway databases to find evidence for all interactions in the COVID-19 Disease Map, and to suggest other mechanisms that haven't yet been included. The results are available here.

Based on this alignment, we also implemented a feature to find small-molecule inhibitors for a given pathway in the COVID-19 Disease Map. The results for the Interferon Type I pathway are available here.

We also used our Gilda system to find appropriate grounding (database identifiers) to ungrounded entities used in the Disease Map. The results of this are available here.

Reports on drugs affecting targets relevant for COVID-19

We used INDRA to assemble all known small molecules that can inhibit a set of protein targets that are of particular interest in treating COVID-19. These reports are organized as browsable web pages that allow drilling down into specific literature evidence, linking to supporting publications, and curating any incorrect relationships. The target-specific reports are available here: [`ACE2`](https://indra-covid19.s3.amazonaws.com/drugs_for_target/ACE2.html), [`TMPRSS2`](https://indra-covid19.s3.amazonaws.com/drugs_for_target/TMPRSS2.html), [`CTSB`](https://indra-covid19.s3.amazonaws.com/drugs_for_target/CTSB.html), [`CTSL`](https://indra-covid19.s3.amazonaws.com/drugs_for_target/CTSL.html), [`FURIN`](https://indra-covid19.s3.amazonaws.com/drugs_for_target/FURIN.html).

We also compiled similar reports on the downstream effects of some specific drugs of interest to our collaborators. These can be found here: amodiaquine, hydroxychloroquine.

While we added some customizations to these reports, similar results can be obtained by querying the INDRA DB directly.

CORD-19 documents prioritized for pathway curators

To support the COVID-19 Disease Map curator community, we generated a ranking of articles in the CORD-19 corpus by the amount of molecular mechanistic information they were likely to contain. For each article, the dataset lists 1) the total number of mechanistic events extracted by all NLP systems supported by INDRA, 2) the number of *unique* events extracted from the document, and 3) the number of unique events where subject and object were both molecular entities (i.e., proteins or chemicals). Because the CORD-19 corpus contains many documents that are not directly relevant to coronavirus biology, we also generated rankings for the subset of documents tagged with the MeSH term for "coronavirus" in PubMed (MeSH ID D017934). The datasets are available at the links below: [`All CORD-19 articles`](https://indra-covid19.s3.amazonaws.com/covid_docs_ranked_all.csv), [`Coronavirus articles only`](https://indra-covid19.s3.amazonaws.com/covid_docs_ranked_corona.csv)
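The following is a small sketch of how a curator might re-rank such a file by one of the three counts. The column names (`doc_id`, `total_events`, `unique_events`, `unique_molecular_events`) and the rows are assumptions made up for illustration. The actual CSV headers may differ and should be checked before use.

```python
import csv
import io

# Hypothetical column names and made-up rows; the real CSV headers may differ.
SAMPLE_CSV = """doc_id,total_events,unique_events,unique_molecular_events
doc_a,40,12,3
doc_b,25,20,15
doc_c,60,30,9
"""


def top_documents(csv_text, key="unique_molecular_events", n=2):
    """Return the n documents with the highest value in the given column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row[key]), reverse=True)
    return [(row["doc_id"], int(row[key])) for row in rows[:n]]


print(top_documents(SAMPLE_CSV))
print(top_documents(SAMPLE_CSV, key="total_events", n=1))
```

Sorting on the molecular-entity count rather than the total favors documents whose extracted events are both distinct and grounded to proteins or chemicals, which is typically what a pathway curator is looking for.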

Semantic search over INDRA COVID-19 results

Another interface for browsing INDRA COVID-19 literature assembly results is [`semviz.org`](https://www.semviz.org/), an approach to semantic browsing of biomedical relations developed at [`Brandeis University`](https://brandeis-llc.github.io/). The INDRA results are available on [`this page`](http://morbius.cs-i.brandeis.edu:23762/login?next=%2Fapp%2Fkibana#/dashboard/2b613e90-7cf0-11ea-8a44-496b85e05ba5) (login: semvizuser/semviz). A tutorial video showing how to use this interface with INDRA results to construct hypotheses about COVID-19 is available [`here`](http://www.voxicon.net/wp-content/uploads/2020/06/semviz.mp4).

Integrations and collaborations

CoronaWhy

CoronaWhy is a globally distributed, volunteer-powered research organisation, assisting the medical community’s ability to answer key questions related to COVID-19.

INDRA is a key part of the CoronaWhy software infrastructure, serving as an entry point for accessing multiple text-mining systems and pathway databases and for assembling causal models from these sources.

COVIDminer

INDRA coupled to Reach serves as the back-end for the [`COVIDminer`](https://rupertoverall.net/covidminer/) application developed by [`Rupert Overall`](https://rupertoverall.net/). COVIDminer allows searching for entities of interest for COVID-19 and visualizing the set of interactions in their neighborhood as a graph. By clicking on graph nodes or edges, users can learn more about each entity as well as the supporting publication and the specific sentence serving as evidence for relations.

General technologies for COVID-19

We have developed several applications that are generally applicable to biomedical research and can therefore also be used to study COVID-19.

  • INDRA: INDRA can be used as a Python package or a web service to collect relevant information from the literature and pathway databases and build custom COVID-19 models.
  • INDRA database: The INDRA database website provides a search interface to find INDRA Statements assembled from the biomedical literature, browse their supporting evidence, and curate any errors. An example search relevant to COVID-19 is Object: TMPRSS2 to find entities that regulate the TMPRSS2 protease, which is crucial for SARS-CoV-2 entry into human cells.
  • INDRA network search: The INDRA network search allows finding causal paths, shared regulators, and common targets between two entities. An example search relevant to COVID-19 is Subject: ACE2, Object: MTOR (see here).
  • Dialogue.bio: The dialogue.bio website allows launching dedicated human-machine dialogue sessions where you can upload your data (e.g., DE gene lists or gene expression profiles), discuss relevant mechanisms, and build model hypotheses using simple English dialogue. For instance, you could try the following series of questions: "what is ACE2?", "what does it regulate?", "which of those are transcription factors?".
  • CLARE: CLARE is a machine assistant that can be installed in any Slack workspace as an application. It supports direct messages or messages in channels to conduct dialogues about biological mechanisms. See demo video here. It is currently deployed in multiple workspaces and has answered hundreds of questions from COVID-19 researchers since the pandemic began. Please contact us if you would like to install CLARE in your Slack workspace.
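To illustrate the kind of query the network search answers (causal paths and common targets between two entities), here is a toy sketch using breadth-first search over a small directed graph. It is not the implementation behind the INDRA network search. Only the endpoints ACE2 and MTOR come from the example above, while the intermediate nodes `N1` to `N3` and all edges are invented placeholders with no biological meaning.

```python
from collections import deque

# Toy directed graph. ACE2 and MTOR are the endpoints of the example search;
# the intermediate nodes N1..N3 and every edge are invented placeholders.
EDGES = {
    "ACE2": ["N1", "N2"],
    "N1": ["N3"],
    "N2": ["MTOR"],
    "N3": ["MTOR"],
}


def shortest_path(edges, source, target):
    """Breadth-first search returning one shortest directed path, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in edges.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def common_targets(edges, a, b):
    """Nodes that are direct downstream neighbors of both a and b."""
    return sorted(set(edges.get(a, [])) & set(edges.get(b, [])))


print(shortest_path(EDGES, "ACE2", "MTOR"))
print(common_targets(EDGES, "N2", "N3"))
```

In a real setting, each edge would correspond to one or more assembled statements with supporting evidence, so a returned path can be inspected and curated step by step.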

Funding

This work is funded under the DARPA Communicating with Computers (W911NF-15-1-0544), DARPA Automating Scientific Knowledge Extraction (HR00111990009) and DARPA Automated Scientific Discovery Framework (W911NF-18-1-0124) programs.
