
Jupyter Notebooks

Installation

https://jupyter.readthedocs.io/en/latest/install.html#install-and-use
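
As a quick check that the installation worked, the notebook package can be imported from Python. This is only a minimal sketch and assumes Jupyter was installed with pip as described in the linked instructions.

PYTHON CODE

# Assumption: Jupyter Notebook has already been installed (e.g. pip install notebook)
# following the linked instructions; this only confirms that the package imports.
import notebook
print(notebook.__version__)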

getpapers and ami

Has anyone managed to get getpapers or ami running in a notebook?
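
One approach that should work (an untested sketch, not a confirmed recipe) is to call the getpapers command line from a notebook cell via subprocess, assuming getpapers is already installed and on the PATH; the query and flags below are only illustrative (check getpapers --help).

PYTHON CODE

# Hedged sketch: run the getpapers CLI from inside a notebook cell.
# Assumes getpapers (npm) is installed and on the PATH; query and flags are illustrative.
import subprocess

result = subprocess.run(
    ["getpapers", "-q", "coronavirus", "-o", "papers", "-x", "-k", "20"],
    capture_output=True, text=True,
)
print(result.stdout)
print(result.stderr)

The same command can also be typed directly in a cell using the notebook's ! shell escape; ami could in principle be invoked the same way, but neither has been confirmed here.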


Examples for openVirus

Contributor: Ambreen H

A Jupyter notebook was used for writing the Python code:

Python was used to remove flag symbols from XML Dictionaries:

  • The SPARQL endpoint file was first converted into the standard dictionary format using amidict (for reference see above)
  • The new XML file was read in Python and all characters within the grandchild elements (i.e. the synonyms) were converted to ASCII, which left the flag-only synonym elements empty

PYTHON CODE

import re

iname = "E:\\ami_try\\Dictionaries\\country_converted.xml"
oname = "E:\\ami_try\\Dictionaries\\country_converted2.xml"

# Match a whole <synonym>...</synonym> element, keeping the tags and content as groups.
pat = re.compile(r'(\s*<synonym>)(.*?)(</synonym>\s*)', re.U)

with open(iname, "rb") as fin:
    with open(oname, "wb") as fout:
        for line in fin:
            # Decoding as ASCII and ignoring errors drops the non-ASCII flag symbols.
            line = line.decode('ascii', errors='ignore')
            m = pat.search(line)
            if m:
                g = m.groups()
                # Rebuild the matched synonym line in lowercase.
                line = g[0].lower() + g[1].lower() + g[2].lower()
            fout.write(line.encode('utf-8'))
  • The empty elements were then deleted using Python to create a new .xml file containing all the synonyms except the flags.

PYTHON CODE

from lxml import etree

def remove_empty_tag(tag, original_file, new_file):
    root = etree.parse(original_file)
    # Remove every <tag> element that has no child nodes or text.
    for element in root.xpath(f".//*[self::{tag} and not(node())]"):
        element.getparent().remove(element)
    # Serialize "root" and create a new tree using an XMLParser to clean up
    # formatting caused by removing elements.
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.fromstring(etree.tostring(root), parser=parser)
    # Write to new file.
    etree.ElementTree(tree).write(new_file, pretty_print=True,
                                  xml_declaration=True, encoding="utf-8")

remove_empty_tag("synonym",
                 "E:\\ami_try\\Dictionaries\\country_converted2.xml",
                 "E:\\ami_try\\Dictionaries\\country_converted3.xml")

All of the code is reusable with a little modification.

New dictionary for reference


Smoke_test for ML in Jupyter

Tester: Ambreen H

The code was written in Python to import the data from the XML files and clean it, creating a CSV file for binary classification of the data.

Data preparation for ML

In order to run the machine learning model, proper data preparation is necessary.

  • The following libraries were used: xml.etree.ElementTree (as ET), string, os and re
  • A function was written to locate the XML files and extract the abstract from each
  • This was done on a small number of papers (11 positives and 11 negatives)
  • The abstracts were cleaned by removing unnecessary characters, converting the text to lowercase and removing subheadings such as 'abstract'
  • Finally, a single data file was created in CSV format with 3 columns: the name of the file, the entire cleaned abstract text, and whether the paper is a true positive or false positive (a minimal sketch of this pipeline follows)
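
The sketch below is a hypothetical reconstruction of that pipeline, not the linked code file: the folder names ('positives' and 'negatives'), the label values and the use of an <abstract> element are assumptions made for illustration.

PYTHON CODE

import csv
import os
import re
import string
import xml.etree.ElementTree as ET

def extract_abstract(xml_path):
    """Return the abstract text from a fulltext XML file, or '' if none is found."""
    tree = ET.parse(xml_path)
    abstract = tree.find(".//abstract")
    if abstract is None:
        return ""
    return " ".join(abstract.itertext())

def clean_text(text):
    """Lowercase, drop the 'abstract' subheading and punctuation, squeeze whitespace."""
    text = text.lower()
    text = re.sub(r"\babstract\b", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def build_csv(folders_and_labels, out_csv):
    """folders_and_labels: list of (folder, label) pairs, e.g. [('positives', 'TP'), ('negatives', 'FP')]."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "abstract", "label"])
        for folder, label in folders_and_labels:
            for root_dir, _, files in os.walk(folder):
                for name in files:
                    if name.endswith(".xml"):
                        path = os.path.join(root_dir, name)
                        writer.writerow([name, clean_text(extract_abstract(path)), label])

build_csv([("positives", "TP"), ("negatives", "FP")], "smoke_test.csv")

With 11 positive and 11 negative papers this would produce a 22-row CSV ready for a binary classifier.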

Code File


