This project is part of the Society Impact Project - Electronics and Electrical Communication Department - Faculty of Engineering - Cairo University - Dr. Hassan Mostafa
We developed a search-engine-style website to help the medical community in their fight against COVID-19, under the supervision of Dr. Hassan Mostafa.
This repo contains a website for a COVID-19 search engine that the medical community can use to search for topics in published COVID-19 papers. It is based on LDA and was trained on more than 40,000 papers.
COVID-19 Open Research Dataset (CORD-19) is a free resource of scholarly articles, aggregated by a coalition of leading research groups, about COVID-19 and the coronavirus family of viruses. The dataset can be found on Semantic Scholar and there is a research challenge on Kaggle.
This project builds an index over the CORD-19 dataset to assist with analysis and data discovery. A series of tasks were explored to identify relevant articles and help find answers to key scientific questions on a number of COVID-19 research topics.
The following files show the top query results for each task provided in the CORD-19 Research Challenge using this model. A highlights section is also shown for each task, which highlights the most relevant sentences from the query results.
- What is known about transmission, incubation, and environmental stability?
- What do we know about COVID-19 risk factors?
- What do we know about virus genetics, origin, and evolution?
- What do we know about vaccines and therapeutics?
- What do we know about non-pharmaceutical interventions?
- What has been published about medical care?
- What do we know about diagnostics and surveillance?
- What has been published about information sharing and inter-sectoral collaboration?
- What has been published about ethical and social science considerations?
You can use Git to clone the repository from GitHub and install it.
Python 3.5+ is supported.
Download all the files in the Download CORD-19 section on Semantic Scholar. Go to the directory with the files and run the following commands.
cd <download_path>
# For each tar.gz file, run the following (<dir> is a directory name of your choice)
mkdir <dir> && tar -C <dir> -xvzf <file.tar.gz>
Once completed, there should be a file named metadata.csv and a subdirectory for each data subset containing all of its JSON articles.
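For anyone who prefers to script the extraction step, the following is a minimal Python sketch of the same idea, not part of this repo. The function names `extract_archives` and `summarize_layout` are hypothetical, and the sketch assumes every archive sits directly inside the download directory and is extracted into a folder named after the archive.

```python
import tarfile
from pathlib import Path


def extract_archives(download_path):
    """Extract each *.tar.gz in download_path into a folder named after the archive."""
    root = Path(download_path)
    extracted = []
    for archive in sorted(root.glob("*.tar.gz")):
        # "subset.tar.gz" is unpacked into "<download_path>/subset/"
        target = root / archive.name[: -len(".tar.gz")]
        target.mkdir(exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
        extracted.append(target)
    return extracted


def summarize_layout(download_path):
    """Report whether metadata.csv exists and how many JSON files each subdirectory holds."""
    root = Path(download_path)
    has_metadata = (root / "metadata.csv").is_file()
    json_counts = {
        entry.name: sum(1 for _ in entry.rglob("*.json"))
        for entry in sorted(root.iterdir())
        if entry.is_dir()
    }
    return has_metadata, json_counts
```

Calling `extract_archives("<download_path>")` followed by `summarize_layout("<download_path>")` gives a quick check that metadata.csv is present and that each subset directory contains JSON articles. Only extract archives you downloaded from a source you trust.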
To build the model locally:
# Run loader.py to prepare the dataset
python loader.py
# Build model files
python model.py
The model will be stored in the same directory.
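To give a feel for what dataset preparation can involve, here is a hedged sketch that turns metadata.csv into one text per paper. It is not the code in loader.py, whose contents are not shown here. The function name `load_documents` and the `min_chars` threshold are assumptions for illustration; the `title` and `abstract` columns are the ones found in the CORD-19 metadata file.

```python
import csv


def load_documents(metadata_path, min_chars=50):
    """Return one text per paper by joining its title and abstract."""
    docs = []
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            title = (row.get("title") or "").strip()
            abstract = (row.get("abstract") or "").strip()
            text = (title + " " + abstract).strip()
            # Skip entries with too little text to be useful for topic modelling
            # (the threshold is an arbitrary choice for this sketch).
            if len(text) >= min_chars:
                docs.append(text)
    return docs
```

The returned list of strings is the kind of input a bag-of-words vectorizer expects, so it could be passed directly to a topic model.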
The model is built on LDA (Latent Dirichlet Allocation) and uses CountVectorizer to turn the papers into word-count features.
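The combination of CountVectorizer and LDA can be illustrated with a small sketch. This is not the code in model.py: it assumes scikit-learn's `CountVectorizer` and `LatentDirichletAllocation`, uses six made-up sentences in place of the CORD-19 papers, and ranks results by cosine similarity between topic distributions, which is one plausible choice rather than the repo's confirmed method. The `search` helper is hypothetical.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for paper texts; the real model is trained on CORD-19 papers.
docs = [
    "vaccine trial shows antibody response in patients",
    "antibody levels after vaccine dose in adults",
    "therapeutic drug reduces viral load in patients",
    "airborne transmission of virus in indoor spaces",
    "incubation period and transmission among contacts",
    "surface stability of virus and environmental transmission",
]

# Bag-of-words counts, dropping common English stop words.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Two topics keep the toy example small; a real model would use many more.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic distribution per document


def search(query, top_k=3):
    """Rank documents by cosine similarity between query and document topics."""
    query_topics = lda.transform(vectorizer.transform([query]))[0]
    norms = np.linalg.norm(doc_topics, axis=1) * np.linalg.norm(query_topics)
    sims = doc_topics @ query_topics / norms
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order]


for index, score in search("vaccine antibody response"):
    print(round(score, 3), docs[index])
```

Comparing topic distributions rather than raw words lets a query match papers that discuss the same theme with different vocabulary, at the cost of losing exact keyword matches.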
A video demo of the working website is available at this link.
- Dr Hassan Mostafa
- Abdallah Ahmed
- Abdelrahman Ahmed
- Mohamed Sabry
- Mohamed Abd Elhalim
- Youssef Mostafa