miniproject: viral epidemics and disease
Lakshmi Devi Priya edited this page Jul 16, 2020 · 34 revisions
Priya
Dheeraj kumar
- Using the communal corpus `epidemic50noCov`, consisting of 50 articles.
- Scrutinizing the 50 articles to identify the true positives and false positives, that is, whether each article is about a viral epidemic or not.
- Using `ami search` to find whether the articles mention any comorbidity in a viral epidemic.
- Sectioning the articles using `ami:section` to extract the relevant information on comorbidity. Annotating with dictionaries to create ami DataTables.
- Refining and rerunning the query to get a corpus of 950 articles.
- Using a relevant ML technique to classify whether the articles are about a viral epidemic and which diseases/disorders co-occur.
- A spreadsheet and a graph will be developed showing the comorbidities during a viral epidemic and their counts.
- Development of an ML model for data classification, assessed on accuracy.
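As a rough illustration of the planned spreadsheet of comorbidities and their counts, the sketch below tallies disease mentions per article and renders them as CSV. The article IDs, the disease lists and the helper names `count_comorbidities` and `to_csv` are invented for illustration; real input would come from the ami output, whose format is not assumed here.

```python
import csv
import io
from collections import Counter


def count_comorbidities(article_diseases):
    """Count in how many articles each disease term appears.

    article_diseases maps an article ID to the disease terms found in it
    (a hypothetical structure, not an ami output format).
    """
    counts = Counter()
    for diseases in article_diseases.values():
        # Use a set so a term repeated within one article counts once.
        counts.update(set(diseases))
    return counts


def to_csv(counts):
    """Render the counts as CSV text, most frequent term first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["disease", "article_count"])
    # Sort by descending count, then alphabetically for a stable order.
    for disease, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([disease, n])
    return buffer.getvalue()


# Made-up example data, for illustration only.
example = {
    "article_1": ["diabetes", "hypertension", "diabetes"],
    "article_2": ["diabetes"],
    "article_3": ["asthma", "hypertension"],
}
print(to_csv(count_comorbidities(example)))
```

The resulting CSV text can be saved to a file and opened in any spreadsheet program, and the same counts could feed a bar chart.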
- Initially the communal corpus `epidemic50noCov` will be used.
- Later a corpus of 950 articles will be created.
- `getpapers` to create the corpus of 950 articles from EuPMC.
- `AMI` for creating and using dictionaries, sectioning.
- `SPARQL` for creating dictionaries.
- `KNIME` for workflow and analytics.
- `keras` for binary classification.
(by collaborator Dheeraj)
Our first aim is this: if we can recognize diseases, then medicines can be given for them.
In this mini-project, we find the diseases that occur during a "viral epidemic" with the help of the disease dictionary, using ContentMine software (getpapers and ami).
- The disease dictionary is updated with the names of all diseases, which helps in searching for particular disease terms in the articles, just as an ordinary dictionary contains a store of words.
- Its sources are ICD-10 (by WHO) and Wikidata, and it was created using ami.
- The corpus is a group of articles on viral epidemics and diseases. These articles contain information regarding diseases which is to be simplified.
- It is a group of 950 articles that have been downloaded from EuPMC via getpapers.
EuPMC is a PubMed Central website with a large number of scientific research articles. We are analyzing some of these articles, downloaded using getpapers, for our mini-project.
- getpapers is ContentMine software capable of downloading large numbers of articles from EuPMC.
- See https://github.com/petermr/openVirus/wiki/getpapers for usage.
- ami is also ContentMine software. It is used for creating a dictionary, searching the articles for the disease terms held in the dictionary, sectioning downloaded articles and gathering information from them.
- For example, we have used it to create a dictionary of diseases.
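To make the idea of dictionary-based searching concrete, here is a minimal sketch of looking up dictionary terms in article text. It is not how ami is implemented; the tiny dictionary, the sample sentence and the function name `find_terms` are assumptions made for illustration.

```python
import re


def find_terms(text, dictionary):
    """Return the dictionary entries whose term or synonyms occur in text.

    dictionary maps a preferred term to a list of synonyms
    (a hypothetical structure, not the ami dictionary format).
    """
    found = []
    for term, synonyms in dictionary.items():
        for name in [term, *synonyms]:
            # Match whole words only, ignoring case.
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.append(term)
                break  # one hit is enough for this entry
    return found


# Made-up dictionary entries and text, for illustration only.
disease_dictionary = {
    "influenza": ["flu"],
    "diabetes mellitus": ["diabetes"],
    "measles": [],
}
sample = "During the flu outbreak, patients with diabetes were at higher risk."
print(find_terms(sample, disease_dictionary))
```

Reporting the preferred term rather than the matched synonym keeps the counts for one disease together, which is one reason a dictionary with synonyms is useful.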
- I have read about getpapers and EuPMC, learned about advanced search in EuPMC, and read some of its articles.
- I am reading about Wikidata and learning how to update the dictionary.
- As noted above, if diseases are known, then medicines can be given accordingly. Therefore, our main goal is to find the names of the diseases that co-occur during viral epidemics.
- In this mini-project my main goal is to update the dictionary with ICD-10 using Wikidata.
- The 50 articles in the communal corpus `epidemic50noCov` were manually classified as true or false positives and a spreadsheet was developed.
- `ami search` was used on the corpus of 50 articles and the HTML DataTables on the `disease` dictionary were created.
- The corpus was sectioned using `ami section` as per the reference at https://github.com/petermr/openVirus/wiki/ami:section.
- `getpapers` was used to create a corpus of `950` articles regarding human viral epidemics (except COVID-19) with the command `getpapers -q "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2" -o mpc -f mpc/log.txt -k 950 -x -p`. This created `950` JATS files, a `log` text document, `949` XML files and `903` PDF files.
- `ami search` was used successfully on the 950-article corpus, which was split into 4 folders each containing 200-250 articles.
- The 950-article corpus was sectioned successfully using `ami section`.
- The 950-article corpus was uploaded to GitHub (thanks to Ambreen).
- Another `disease` dictionary was created with synonyms.
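The split of the 950-article corpus into 4 folders of 200-250 articles could be sketched as below. This only illustrates the chunking step: the article IDs are placeholders, and `split_into_batches` is a hypothetical helper, not part of getpapers or ami.

```python
def split_into_batches(items, n_batches):
    """Split items into n_batches consecutive groups of near-equal size."""
    base, extra = divmod(len(items), n_batches)
    batches = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches take one additional item.
        size = base + (1 if i < extra else 0)
        batches.append(items[start:start + size])
        start += size
    return batches


# Placeholder IDs standing in for the 950 article folders.
article_ids = [f"article_{i:03d}" for i in range(950)]
batches = split_into_batches(article_ids, 4)
print([len(b) for b in batches])
```

Each batch could then be moved into its own folder before running `ami search` on it.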
(in the 950 article corpus)
- To manually classify the articles as true or false positives (in progress).
- To use `KNIME` software for binary classification.
- To test the accuracy of the data classification.
- Learning `KNIME` and `Keras` to use in binary classification.
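Testing a binary classification for accuracy amounts to comparing the predicted labels with the manual true/false-positive labels. A minimal sketch in plain Python follows. The label lists are invented, and `confusion_counts` and `accuracy` are hypothetical helpers, not KNIME or Keras functions. Here a label of `True` is assumed to mean that the article really is about a viral epidemic.

```python
def confusion_counts(manual, predicted):
    """Count TP, FP, TN, FN for binary labels (True = about a viral epidemic)."""
    if len(manual) != len(predicted):
        raise ValueError("label lists must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, guess in zip(manual, predicted):
        if guess and truth:
            counts["TP"] += 1
        elif guess and not truth:
            counts["FP"] += 1
        elif not guess and not truth:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def accuracy(manual, predicted):
    """Fraction of articles where the prediction matches the manual label."""
    c = confusion_counts(manual, predicted)
    total = sum(c.values())
    return (c["TP"] + c["TN"]) / total if total else 0.0


# Invented labels for five articles, for illustration only.
manual_labels = [True, True, False, False, True]
model_labels = [True, False, False, True, True]
print(confusion_counts(manual_labels, model_labels))
print(accuracy(manual_labels, model_labels))
```

Keeping the four counts alongside the single accuracy figure shows whether errors are mostly missed epidemic articles or mostly irrelevant articles let through.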
(Reference from Ambreen's update)
- Download VS Code and clone the openVirus repository onto your system.
- Open the `openVirus` folder in VS Code (don't close it).
- Now open the openVirus folder in your file system and make your changes in it.
- Return to the minimized VS Code window. Commit the changes by selecting the commit symbol. This may take some time, depending on the size of the files being uploaded.
- After adding the remote repository, push the changes to GitHub. See this video for other clarification.