Python SAX parser to extract tags from XML
- Free software: MIT license
- Documentation: https://saxtract.readthedocs.io
Uses a SAX parser to keep a fixed memory footprint while parsing an XML file, 'extracting' the requested tags and pushing them to an output stream.
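The streaming approach described above can be sketched with the standard library's `xml.sax` module. This is a minimal illustration, not the package's actual code: the `TagExtractor` class and the `extract` function are hypothetical names chosen for this example. The handler only buffers the text of the element it is currently inside, which is why memory use does not grow with the size of the file.

```python
import io
import xml.sax


class TagExtractor(xml.sax.ContentHandler):
    """Hypothetical handler: write the text of every `tag` element to `out`."""

    def __init__(self, tag, out):
        super().__init__()
        self.tag = tag
        self.out = out
        self._depth = 0    # > 0 while the parser is inside a matching element
        self._chunks = []  # text pieces of the current matching element only

    def startElement(self, name, attrs):
        if name == self.tag:
            self._depth += 1

    def characters(self, content):
        # The parser may deliver one text node in several pieces.
        if self._depth:
            self._chunks.append(content)

    def endElement(self, name):
        if name == self.tag:
            self._depth -= 1
            if self._depth == 0:
                # Flush one line per extracted element, then drop the buffer.
                self.out.write("".join(self._chunks).strip() + "\n")
                self._chunks.clear()


def extract(source, tag, out):
    """Stream `source` (a filename or binary file object) through the handler."""
    xml.sax.parse(source, TagExtractor(tag, out))
```

For example, `extract("test.xml", "authors", sys.stdout)` would print the text of each `authors` element in `test.xml`, one per line, without loading the whole document.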
In performance tests on the DBLP dataset trimmed down to 10k records, SaxTrack took roughly 60% of the DOM parser's run time (see the run below) and used about half the memory:
python tests/perf_tests.py --filename test.xml --tag authors --runs 5
SaxTrack run took ~0.05381571219999999s
DOM Parser run took ~0.09159613900000001s
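A comparison of this kind can be approximated with the standard library alone. The sketch below is an assumption about how such a measurement might look, not the contents of `tests/perf_tests.py`: the functions `count_with_sax`, `count_with_dom`, and `measure` are hypothetical. Note that `tracemalloc` only sees allocations made through Python's allocator, so the reported peaks are indicative rather than exact.

```python
import time
import tracemalloc
import xml.dom.minidom
import xml.sax


class _Counter(xml.sax.ContentHandler):
    """Count occurrences of `tag` without keeping any part of the document."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.tag:
            self.count += 1


def count_with_sax(data, tag):
    handler = _Counter(tag)
    xml.sax.parseString(data, handler)
    return handler.count


def count_with_dom(data, tag):
    # minidom builds the full tree in memory before it can be queried.
    return len(xml.dom.minidom.parseString(data).getElementsByTagName(tag))


def measure(func, data, tag, runs=5):
    """Return (result, mean seconds per run, peak traced bytes); runs >= 1."""
    start = time.perf_counter()
    for _ in range(runs):
        result = func(data, tag)
    elapsed = (time.perf_counter() - start) / runs

    # Memory is measured in a separate pass so tracing does not skew timing.
    tracemalloc.start()
    func(data, tag)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak
```

Calling `measure(count_with_sax, data, "authors")` and `measure(count_with_dom, data, "authors")` on the same bytes gives comparable timings and peak figures for the two approaches.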
Allow XSD/DTD input for validation
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
The main parser code was copied from tutorialspoint