miniproject: viral epidemics and disease
Lakshmi Devi Priya edited this page Jul 8, 2020
Priya
Dheeraj
- Use the communal corpus epidemic50noCov, which consists of 50 articles.
- Scrutinize the 50 articles manually to identify true and false positives, that is, whether or not each article is about a viral epidemic.
- Use ami search to find whether the articles mention any comorbidity in a viral epidemic.
- Section the articles using ami:section to extract the relevant information on comorbidity, and annotate them with dictionaries to create ami DataTables.
- Refine and rerun the query to build a corpus of 950 articles.
- Use a suitable ML technique to classify whether the articles are about a viral epidemic and which diseases/disorders co-occur with it.
- A spreadsheet and a graph will be developed showing the comorbidities reported during viral epidemics and their counts.
- An ML model for data classification will be developed and evaluated for accuracy.
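The comorbidity spreadsheet described above is essentially a table of "term → number of articles mentioning it". In the project this counting is done by ami search with its dictionaries; the plain-Python sketch below only illustrates the idea. The term list, the article IDs and the texts are hypothetical stand-ins for an ami dictionary and the sectioned article text, and the function names are made up for this example.

```python
"""Count how many articles mention each comorbidity term and write a CSV table.

Minimal sketch: `terms` stands in for an ami dictionary and `articles` for the
sectioned article text; all names and data here are hypothetical.
"""
from __future__ import annotations

import csv
import io
import re
from collections import Counter


def count_comorbidities(articles: dict[str, str], terms: list[str]) -> Counter:
    """Return, for each term, the number of articles whose text mentions it."""
    counts: Counter = Counter()
    for text in articles.values():
        lowered = text.lower()
        for term in terms:
            # Whole-word, case-insensitive match; each article counts once per term.
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered):
                counts[term] += 1
    return counts


def to_csv(counts: Counter) -> str:
    """Render the counts as a two-column CSV table, most frequent first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["comorbidity", "article_count"])
    for term, n in counts.most_common():
        writer.writerow([term, n])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical toy data, not results from epidemic50noCov.
    articles = {
        "PMC0001": "Patients with diabetes and hypertension had worse outcomes.",
        "PMC0002": "Asthma was common among hospitalised cases.",
        "PMC0003": "Hypertension was the most frequent comorbidity.",
    }
    terms = ["diabetes", "hypertension", "asthma", "obesity"]
    print(to_csv(count_comorbidities(articles, terms)), end="")
```

The CSV output can be opened as a spreadsheet or loaded into KNIME to draw the graph. Counting articles rather than raw mentions keeps one long article from dominating the table.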
- Initially, the communal corpus epidemic50noCov will be used.
- Later, a corpus of 950 articles will be created.
- getpapers to create the corpus of 950 articles from EuPMC.
- AMI for creating and using dictionaries, and for sectioning.
- SPARQL for creating dictionaries.
- KNIME for workflow and analytics.
(by collaborator Dheeraj)
- The 50 articles in the communal corpus epidemic50noCov were manually classified as true or false positives, and a spreadsheet of the results was developed.
- ami search was run on the 50-article corpus, and HTML DataTables were created using the disease dictionary.
- The corpus was sectioned using ami section, following https://github.com/petermr/openVirus/wiki/ami:section.
- getpapers was used to create a corpus of 950 articles on human viral epidemics (except COVID-19) with the command getpapers -q "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2" -o mpc -f mpc/log.txt -k 950 -x -p. This produced 950 JATS files, a log text document, 949 XML files and 903 PDF files.
- ami search was run successfully on the 950-article corpus, which was split into 4 folders of 200-250 articles each.
- The 950-article corpus was sectioned successfully using ami section.
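The file counts and the 4-folder split above can be checked and reproduced with a short script. This is a sketch under an assumption: it takes getpapers to have written one sub-folder per article under the output directory (here mpc, from the command above), holding fulltext.xml and/or fulltext.pdf. The function names and the batch1 to batch4 folder names are hypothetical.

```python
"""Tally getpapers output and split the corpus into batches for ami.

Sketch assuming getpapers wrote one sub-folder per article under the output
directory (e.g. mpc/PMC1234567/) containing fulltext.xml and/or fulltext.pdf.
Function and folder names are hypothetical.
"""
from __future__ import annotations

import shutil
from pathlib import Path


def article_dirs(corpus: Path) -> list[Path]:
    """Per-article sub-folders (plain files such as log.txt are skipped), sorted."""
    return sorted(
        p for p in corpus.iterdir() if p.is_dir() and not p.name.startswith(".")
    )


def tally(corpus: Path) -> dict[str, int]:
    """Count articles and how many of them have full-text XML and PDF."""
    dirs = article_dirs(corpus)
    return {
        "articles": len(dirs),
        "xml": sum((d / "fulltext.xml").is_file() for d in dirs),
        "pdf": sum((d / "fulltext.pdf").is_file() for d in dirs),
    }


def split_into_batches(corpus: Path, out: Path, n_batches: int = 4) -> list[int]:
    """Copy article folders round-robin into n_batches sub-corpora; return sizes.

    `out` should lie outside `corpus` so the copies are not re-scanned.
    """
    sizes = [0] * n_batches
    for i, d in enumerate(article_dirs(corpus)):
        batch = out / f"batch{i % n_batches + 1}"
        # copytree creates the batch folder on first use.
        shutil.copytree(d, batch / d.name)
        sizes[i % n_batches] += 1
    return sizes


if __name__ == "__main__":
    corpus = Path("mpc")
    if corpus.is_dir():
        print(tally(corpus))
        print(split_into_batches(corpus, Path("mpc_batches")))
```

A round-robin split keeps the batch sizes within one article of each other; for 950 articles in 4 batches that gives sizes of 237 or 238, inside the 200-250 range used here.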
(in the 950 article corpus)
- To upload the 950-article corpus to GitHub (issue being rectified).
- To manually classify the articles as true or false positives.
- To use KNIME for binary classification.
- To test the accuracy of the data classification.
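Testing the classification "on accuracy" amounts to comparing the model's output with the manual true/false-positive labels through a confusion matrix. The sketch below is a hedged illustration and does not reproduce the project's KNIME workflow. It assumes both label sets are available as article ID → boolean mappings (True = the article is about a viral epidemic), and the example IDs and values are hypothetical.

```python
"""Compare model predictions with the manual true/false-positive labels.

Sketch: labels are booleans (True = article is about a viral epidemic); the
example IDs and values are hypothetical, not results from this project.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Scores:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def precision(self) -> float:
        predicted = self.tp + self.fp
        return self.tp / predicted if predicted else 0.0

    @property
    def recall(self) -> float:
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0


def score(manual: dict[str, bool], predicted: dict[str, bool]) -> Scores:
    """Build the confusion matrix over articles labelled in both mappings."""
    tp = fp = fn = tn = 0
    for article_id in manual.keys() & predicted.keys():
        truth, guess = manual[article_id], predicted[article_id]
        if guess and truth:
            tp += 1
        elif guess:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return Scores(tp, fp, fn, tn)


if __name__ == "__main__":
    manual = {"PMC1": True, "PMC2": False, "PMC3": True, "PMC4": False}
    predicted = {"PMC1": True, "PMC2": True, "PMC3": False, "PMC4": False}
    s = score(manual, predicted)
    print(s, f"accuracy={s.accuracy:.2f}", f"precision={s.precision:.2f}")
```

The same function also measures the search query itself. If every retrieved article is treated as a predicted positive, for example score(manual, {k: True for k in manual}), then precision is the share of manually confirmed true positives in the corpus. KNIME's Scorer node should report comparable confusion-matrix statistics inside a workflow.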
- Learning KNIME for use in binary classification.