Ambreen H edited this page Aug 2, 2020

Jupyter Notebooks


getpapers and ami

Has anyone managed to get getpapers or ami running in a notebook?

examples for openVirus

contributor Ambreen H

A Jupyter notebook was used to write the Python code.

Python was used to remove flag symbols from the XML dictionaries:

  • The SPARQL endpoint file was first converted into the standard format using amidict (for reference, see above).
  • The new XML file was read into Python and every non-ASCII character within the grandchild elements (i.e. the synonyms) was dropped by decoding each line as ASCII. This emptied the synonym elements that contained only flag symbols.


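import re

iname = "E:\\ami_try\\Dictionaries\\country_converted.xml"
oname = "E:\\ami_try\\Dictionaries\\country_converted2.xml"
# Matches a line that holds a single <synonym> element.
pat = re.compile(r'(\s*<synonym>)(.*?)(</synonym>\s*)', re.U)
with open(iname, "rb") as fin:
    with open(oname, "wb") as fout:
        for line in fin:
            # Ignore every non-ASCII byte; flag symbols vanish, leaving empty synonyms.
            line = line.decode('ascii', errors='ignore')
            m = pat.match(line)
            if m:
                g = m.groups()
                # Lowercase the whole synonym line.
                line = g[0].lower() + g[1].lower() + g[2].lower()
            # Write every line, matched or not, to the new file.
            fout.write(line.encode('ascii'))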
  • The empty elements were then deleted using Python, producing a new .xml file that contains all synonyms except the flags.


from lxml import etree

def remove_empty_tag(tag, original_file, new_file):
    root = etree.parse(original_file)
    # Select every <tag> element that has no child nodes (no text, no children).
    for element in root.xpath(f".//*[self::{tag} and not(node())]"):
        element.getparent().remove(element)
    # Serialize "root" and create a new tree using an XMLParser to clean up
    # formatting caused by removing elements.
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.fromstring(etree.tostring(root), parser=parser)
    # Write to new file.
    etree.ElementTree(tree).write(new_file, pretty_print=True, xml_declaration=True, encoding="utf-8")

remove_empty_tag("synonym", "E:\\ami_try\\Dictionaries\\country_converted2.xml", "E:\\ami_try\\Dictionaries\\country_converted3.xml")

All of the code can be reused with minor modification, for example by changing the file paths and the tag name.
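
The effect of the ASCII decoding step can be checked on a single line. The sketch below is illustrative only: `strip_non_ascii` is a hypothetical helper that is not part of the notebook, and the three sample lines are made up. It relies on the fact that flag emoji are encoded in UTF-8 entirely as non-ASCII bytes, so they disappear when those bytes are ignored.

```python
def strip_non_ascii(raw: bytes) -> str:
    """Decode bytes as ASCII, silently dropping every non-ASCII byte."""
    return raw.decode("ascii", errors="ignore")


# Hypothetical sample lines, encoded the way they would be read in "rb" mode.
flag_line = "<synonym>\U0001F1EE\U0001F1F3</synonym>".encode("utf-8")  # flag emoji
name_line = "<synonym>India</synonym>".encode("utf-8")
accent_line = "<synonym>C\u00f4te d'Ivoire</synonym>".encode("utf-8")

print(strip_non_ascii(flag_line))    # <synonym></synonym>
print(strip_non_ascii(name_line))    # <synonym>India</synonym>
print(strip_non_ascii(accent_line))  # <synonym>Cte d'Ivoire</synonym>
```

The last line shows a side effect worth checking when reusing this approach: accented letters are dropped along with the flags, so synonyms containing them are altered rather than emptied.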

New dictionary for reference

Smoke_test for ML in Jupyter

Tester: Ambreen H

The code was written in Python to import data from XML files and clean it, producing a CSV document for binary classification.

Data preparation for ML

Proper data preparation is necessary before the machine learning model can be run.

  • The following libraries were used: xml.etree.ElementTree as ET, string, os and re.
  • A function was written to locate the XML files and extract the abstract from each.
  • This was done on a small number of papers (11 positives and 11 negatives).
  • The abstract was cleaned by removing unnecessary characters, converting the text to lowercase and removing subheadings such as 'abstract'.
  • Finally, a single data file was created in CSV format with three columns: the file name, the entire cleaned text of the abstract, and a label marking the result as a false positive or true positive.
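
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It makes several assumptions: each paper's abstract sits in an `<abstract>` element, the `csv` module is used for output, the subheading words removed are a hypothetical list, and labels are supplied per directory (for example 1 for true positives and 0 for false positives). All function names are hypothetical.

```python
import csv
import os
import re
import string
import xml.etree.ElementTree as ET


def find_xml_files(root_dir):
    """Yield the paths of all .xml files below root_dir."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if name.lower().endswith(".xml"):
                yield os.path.join(dirpath, name)


def extract_abstract(xml_path):
    """Return the text of the first <abstract> element, or '' if there is none."""
    root = ET.parse(xml_path).getroot()
    abstract = root.find(".//abstract")  # assumed element name
    if abstract is None:
        return ""
    return " ".join(abstract.itertext())


def clean_text(text):
    """Lowercase, replace punctuation with spaces, drop subheading words."""
    text = text.lower()
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    # Hypothetical subheading list; note it also removes these words mid-sentence.
    text = re.sub(r"\b(abstract|background|methods|results|conclusions?)\b", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_csv(labelled_dirs, csv_path):
    """Write one row per paper: file name, cleaned abstract, label.

    labelled_dirs maps a directory of XML files to the label for its papers.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        writer.writerow(["file", "abstract", "label"])
        for directory, label in labelled_dirs.items():
            for path in find_xml_files(directory):
                text = clean_text(extract_abstract(path))
                writer.writerow([os.path.basename(path), text, label])
```

A call such as `build_csv({"positives": 1, "negatives": 0}, "abstracts.csv")` would then produce the three-column file, where the directory names and output name are placeholders.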

Code File

