
danwald/saxtract


SaxTract


Python SAX parser to extract XML

Features

Uses a SAX parser to keep a fixed memory footprint while parsing an XML file, 'extracting' the requested tags and pushing them to an output stream.
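A minimal sketch of the idea, assuming Python's standard-library `xml.sax` module. The `TagExtractor` handler and `extract` function are hypothetical illustrations, not SaxTract's actual API. Each matching element is written to the output stream as its events arrive, and nothing is kept in a tree, so memory use does not grow with the size of the input file.

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class TagExtractor(xml.sax.ContentHandler):
    """Hypothetical handler: streams every <tag>...</tag> element to `out`."""

    def __init__(self, tag, out):
        super().__init__()
        self.tag = tag
        self.out = out
        self.depth = 0  # number of open elements inside the current match

    def startElement(self, name, attrs):
        # Start copying at a matching tag, or keep copying its children.
        if self.depth or name == self.tag:
            self.depth += 1
            attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
            self.out.write(f"<{name}{attr_text}>")

    def characters(self, content):
        # Text may arrive in several chunks; escaping is per character, so that is safe.
        if self.depth:
            self.out.write(escape(content))

    def endElement(self, name):
        if self.depth:
            self.out.write(f"</{name}>")
            self.depth -= 1
            if self.depth == 0:
                self.out.write("\n")  # one extracted element per line


def extract(source, tag, out):
    """Hypothetical helper: `source` is a filename or a binary file object."""
    xml.sax.parse(source, TagExtractor(tag, out))


if __name__ == "__main__":
    doc = b"<dblp><article><author>Ada</author><title>T</title></article></dblp>"
    buf = io.StringIO()
    extract(io.BytesIO(doc), "author", buf)
    print(buf.getvalue(), end="")
```

Tracking a depth counter instead of a boolean means a matching element nested inside another match is copied as part of the outer element rather than being emitted twice.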

In performance tests on a subset of the dblp dataset trimmed down to 10k records, SaxTract ran in about half the time and with about half the memory footprint of a DOM parser:

python tests/perf_tests.py --filename test.xml --tag authors --runs 5

SaxTrack run took ~0.05381571219999999s
DOM Parser run took ~0.09159613900000001s
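A rough way to run this kind of comparison yourself, assuming only the standard library: a SAX handler that counts tags versus `xml.dom.minidom`, timed with `timeit` and measured with `tracemalloc`. This is a hypothetical harness, not the repository's `tests/perf_tests.py`, and `make_doc` builds synthetic records rather than the dblp data used above, so its numbers will differ from the ones shown.

```python
import timeit
import tracemalloc
import xml.sax
from xml.dom import minidom


class CountHandler(xml.sax.ContentHandler):
    """Counts elements named `tag` without building a tree."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.tag:
            self.count += 1


def sax_count(data, tag):
    handler = CountHandler(tag)
    xml.sax.parseString(data, handler)
    return handler.count


def dom_count(data, tag):
    # minidom must build the whole document tree before it can be queried.
    return len(minidom.parseString(data).getElementsByTagName(tag))


def make_doc(n):
    # Hypothetical synthetic input, NOT the dblp dataset.
    records = "".join(
        f"<article><author>Author {i}</author><title>Title {i}</title></article>"
        for i in range(n)
    )
    return f"<dblp>{records}</dblp>".encode()


def peak_bytes(fn, *args):
    # Peak of allocations traced by tracemalloc while fn runs.
    tracemalloc.start()
    try:
        fn(*args)
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    data = make_doc(10_000)
    for label, fn in (("SAX", sax_count), ("DOM", dom_count)):
        secs = min(timeit.repeat(lambda: fn(data, "author"), number=1, repeat=5))
        print(f"{label}: {secs:.4f}s, peak {peak_bytes(fn, data, 'author')} bytes")
```

Taking the minimum over several runs reduces noise from other processes; `tracemalloc` only sees memory allocated through Python's allocators, so treat its peak as an approximation of the real footprint.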

Todo

Allow XSD/DTD input for validation

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

The main parser code was copied from TutorialsPoint.
