This is my personal project for the course "Internet & Applications" [3.5.63.8]
Parse the CORD-19 dataset, then find and display all articles that mention a specific drug provided by the user.
- Implementation Code (This repo)
- A Readme (This file)
- Presentation
- Demonstration video
1. Receive the search query (a drug name) from the user.
2. Use a Java Servlet to pass the query to the JavaScript code.
3. Parse the .csv file row by row using PapaParse. Each row contains the metadata for a separate article.
4. Search through the row and the linked pmc/pdf files for mentions of the drug.
5. If no mentions were found, discard the row.
6. Repeat steps 3-5 for all rows.
7. Serve the final table.
8. (Optional) Serve a complementary Articles per Year chart using Highcharts.
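The filtering steps above can be sketched roughly as follows. This is a minimal illustration and not the code in script.js: the helper names (`textMentions`, `rowMentionsDrug`, `filterRows`) are hypothetical, and the `title` and `abstract` column names are assumed from the CORD-19 metadata.csv header. In the browser, rows would typically arrive one at a time from PapaParse's `step` callback rather than from an in-memory array.

```javascript
// Hypothetical helpers; none of these names come from the repository.

// Case-insensitive substring check for the drug name in a piece of text.
function textMentions(text, drug) {
  if (typeof text !== 'string' || typeof drug !== 'string') return false;
  const needle = drug.trim().toLowerCase();
  return needle !== '' && text.toLowerCase().includes(needle);
}

// A row matches if the drug appears in its metadata fields or in any
// text extracted from its linked pmc/pdf files (passed in as strings).
// Assumed columns: `title` and `abstract`.
function rowMentionsDrug(row, drug, linkedTexts = []) {
  const fields = [row.title, row.abstract, ...linkedTexts];
  return fields.some((field) => textMentions(field, drug));
}

// Keep only the rows that mention the drug; all other rows are discarded.
function filterRows(rows, drug) {
  return rows.filter((row) => rowMentionsDrug(row, drug));
}

// Made-up sample rows, for illustration only.
const sampleRows = [
  { title: 'Remdesivir in severe cases', abstract: 'A trial.' },
  { title: 'Mask usage', abstract: 'No drugs discussed.' },
];
const hits = filterRows(sampleRows, 'remdesivir');
console.log(hits.length);
```

A plain substring match is the simplest choice here; it would also match the drug name inside longer words, so a stricter word-boundary check may be preferable for short names.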
The folder /data contains the files of the CORD-19 dataset. These files must be downloaded and extracted so that they match the directory structure below.
The .java file must be compiled before running:
cd WEB-INF/classes
javac searchServlet.java
The implementation is based on JavaScript code, which has been split into 3 files for easier viewing:
- script.js: contains the 3 functions that perform the main implementation of the project.
- macros.js: contains several smaller functions that are utilized by the main program.
- chart.js: contains the code that creates the Articles per Year chart at the end of each search.
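As a rough idea of what the Articles per Year chart involves, the sketch below counts matching articles by year and hands the result to Highcharts. It is not the contents of chart.js: the function name `countArticlesPerYear`, the container id `chart`, and the assumption that each row has a `publish_time` string beginning with a four-digit year are all illustrative.

```javascript
// Count articles per publication year.
// Assumes each row has a `publish_time` value starting with a 4-digit year.
function countArticlesPerYear(rows) {
  const counts = new Map();
  for (const row of rows) {
    const match = /^(\d{4})/.exec(row.publish_time || '');
    if (!match) continue; // skip rows without a usable date
    const year = Number(match[1]);
    counts.set(year, (counts.get(year) || 0) + 1);
  }
  // Return [year, count] pairs sorted by year.
  return [...counts.entries()].sort((a, b) => a[0] - b[0]);
}

// Made-up sample rows, for illustration only.
const perYear = countArticlesPerYear([
  { publish_time: '2020-03-14' },
  { publish_time: '2020' },
  { publish_time: '2019-11-02' },
  { publish_time: '' },
]);
console.log(perYear); // [[2019, 1], [2020, 2]]

// Draw only when Highcharts and a page are available (i.e. in the browser).
// The element id 'chart' is a hypothetical container.
if (typeof Highcharts !== 'undefined' && typeof document !== 'undefined') {
  Highcharts.chart('chart', {
    chart: { type: 'column' },
    title: { text: 'Articles per Year' },
    xAxis: { type: 'category' },
    yAxis: { title: { text: 'Articles' } },
    series: [{
      name: 'Articles',
      data: perYear.map(([year, count]) => [String(year), count]),
    }],
  });
}
```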
$root
├ data
│   ├ metadata.csv
│   └ document_parses
│       ├ pdf_json
│       │   └ *.json
│       └ pmc_json
│           └ *.xml.json
├ src
│   ├ chart.js
│   ├ macros.js
│   ├ script.js
│   └ style.css
├ WEB-INF
│   ├ classes
│   │   ├ searchServlet.class
│   │   └ searchServlet.java
│   └ web.xml
└ index.jsp
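To connect a metadata row to the files in this layout, the paths stored in the row have to be resolved against the data folder. A minimal sketch, assuming the `pdf_json_files` and `pmc_json_files` columns hold paths relative to the dataset root, separated by `;` when there are several; the function name `linkedFilePaths`, the `data` prefix argument, and the sample file names are illustrative and not taken from the repository.

```javascript
// Resolve the pmc/pdf JSON files linked from one metadata row.
// Assumed columns: `pmc_json_files` and `pdf_json_files`.
function linkedFilePaths(row, dataRoot = 'data') {
  const columns = [row.pmc_json_files, row.pdf_json_files];
  return columns
    .filter((value) => typeof value === 'string' && value.trim() !== '')
    .flatMap((value) => value.split(';')) // several files may be listed
    .map((path) => path.trim())
    .filter((path) => path !== '')
    .map((path) => dataRoot + '/' + path);
}

// Made-up row, for illustration only.
const paths = linkedFilePaths({
  pdf_json_files: 'document_parses/pdf_json/abc.json; document_parses/pdf_json/def.json',
  pmc_json_files: 'document_parses/pmc_json/PMC1.xml.json',
});
console.log(paths);
```

Rows with neither column filled in simply yield an empty list, so only the metadata fields would be searched for them.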
Used PapaParse for the .csv parsing.
Used Highcharts for the charts.
A Java servlet was used for calling the JavaScript functions.
Everything is written in pure JavaScript, no jQuery.
The project was run and tested on an Apache Tomcat server.