Releases: internetarchive/iari
4.1.2
What's Changed
- Rewrite check-url with new auth key support and expose errors from testdeadlink by @dpriskorn in #884
- Fix bug with parsing of URLs by @dpriskorn in #887
- Cleanup, update dependencies and fix ClassVar issues by @dpriskorn in #889
- Add dehydrate parameter to article endpoint by @dpriskorn in #892
Full Changelog: 4.1.1...4.1.2
4.1.1
Breaking changes
None
What's Changed
- More debug output by @dpriskorn in #863
- Test pdf debug output by @dpriskorn in #865
- Use testdeadlink api also by @dpriskorn in #866
- Rewrite check url logic by @dpriskorn in #869
Full Changelog: 4.1.0...4.1.1
4.1.0
This release introduces the following breaking changes:
- pdf endpoint: The output key "text_links" was changed to "links_from_original_text"
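Clients that read the pdf endpoint output may need to handle both key names during an upgrade. The sketch below is a hypothetical client-side helper, not part of IARI. It assumes the response has already been parsed into a dict, that the key sits at the top level, and that its value is a list. Only the two key names come from the release note above.

```python
def get_pdf_links(payload: dict) -> list:
    """Return the links from a parsed pdf endpoint response (hypothetical helper)."""
    # "links_from_original_text" is the key name from 4.1.0 onward;
    # "text_links" is the name used before 4.1.0.
    # Assumption: the key is at the top level and holds a list.
    for key in ("links_from_original_text", "text_links"):
        if key in payload:
            return payload[key]
    # Neither key present: treat as "no links" rather than raising.
    return []
```

Checking the new key first means the helper prefers the 4.1.0 output if both keys are ever present.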
What's Changed
- Cleanup repository by @dpriskorn in #829
- Add language detection to 3 endpoints by @dpriskorn in #830
- Add wikitext output on reference in article endpoint by @dpriskorn in #832
- Improve Reference class by @dpriskorn in #835
- Fix fld counts by @dpriskorn in #838
- fix slash in name extraction by @dpriskorn in #840
- Fix forward slash bug by @dpriskorn in #841
- Support revisions in article endpoint by @dpriskorn in #826
- Support debug output by @dpriskorn in #848
- Fix json encoding bug by @dpriskorn in #850
- Rewrite link extraction from text by @dpriskorn in #856
- Cleanup before release of 4.1.0 by @dpriskorn in #859
Full Changelog: 4.0.0-beta0...4.0.0-beta1
4.0.0-beta0
First beta release of v2 of the API.
No breaking changes since the last release.
This release fixes issues with URL and top-level domain extraction. It also adds the ORES score to the output of the article endpoint.
What's Changed
- Cleanup pypdf and make tests pass by @dpriskorn in #775
- Improve pdf endpoint with annotation extraction by @dpriskorn in #792
- Add word counts to pdf endpoint by @dpriskorn in #794
- Cleanup and improve readme by @dpriskorn in #796
- Fix helper scripts by @dpriskorn in #797
- Add ORES score to article endpoint by @dpriskorn in #806
- Fix read timeout bug by @dpriskorn in #804
- Simplify Python code with flake8-simplify by @cclauss in #780
- Remove unnecessary function calls to dict() and list() by @cclauss in #779
- Lint for Python asyncio issues by @cclauss in #816
- Support parsing Wayback Machine urls in the fld parser by @dpriskorn in #825
- Fix erroneous FLD and URL extraction by @dpriskorn in #827
Full Changelog: 4.0.0-alpha0...4.0.0-beta0
4.0.0-alpha0
Bumped version because the scope is now wider than Wikipedia sites with the new pdf and xhtml endpoints.
What's Changed
- Fix mutation bug with references by @dpriskorn in #702
- Add spoofing headers to check-url by @dpriskorn in #703
- Return response headers from check-url by @dpriskorn in #706
- Deprecate lang+site+title by @dpriskorn in #713
- Deprecate unused methods in Template by @dpriskorn in #714
- Extract ISBNs by @dpriskorn in #715
- Filter lines in sections to avoid false positives by @dpriskorn in #716
- Fix all tests by @dpriskorn in #718
- Fix bug with no templates being extracted by @dpriskorn in #722
- Allow redirects to avoid 3xx status codes by @dpriskorn in #724
- minor copyedit by @metasj in #725
- Add test for URL which returns different status codes over time by @dpriskorn in #729
- Rewrite check-url to fix WM urls being reported as malformed urls by @dpriskorn in #731
- Make WARI stupid by adding a regex parameter by @dpriskorn in #732
- Support any language version by @dpriskorn in #735
- Rewrite parsing and add section names to all references by @dpriskorn in #750
- Add new pdf endpoint by @dpriskorn in #752
- Improve output of pdf endpoint by @dpriskorn in #755
- Switch to pypdf by @dpriskorn in #759
- New xhtml endpoint by @dpriskorn in #760
- Replace bandit, flake8, isort, and pyupgrade with ruff by @cclauss in #763
- Improve pdf endpoint errors by @dpriskorn in #765
- Rewrite pdf handler to fix incomplete and missing links by @dpriskorn in #772
Full Changelog: 3.0.0-alpha4...4.0.0-alpha0
3.0.0-alpha4
What's Changed
- Add support for lookup in fatcat by @dpriskorn in #665
- Extract id from fatcat by @dpriskorn in #666
- New all endpoint by @dpriskorn in #669
- Fix checking endpoint syntax by @dpriskorn in #674
- Fix check-doi None issue by @dpriskorn in #675
- Support ia scholar also in check-doi by @dpriskorn in #679
- Remove limit and offset from references endpoint by @dpriskorn in #680
- Support cache and refresh on article, all, and the checking endpoints by @dpriskorn in #681
- Deprecate code by @dpriskorn in #684
- Expose all dehydrated references in the article endpoint by @dpriskorn in #689
- Rewrite references endpoint by @dpriskorn in #690
- Support url parameter also by @dpriskorn in #691
Full Changelog: 3.0.0-alpha3...3.0.0-alpha4
3.0.0-alpha3
What's Changed
- New check-url endpoint by @dpriskorn in #654
- Support checking of DOI by @dpriskorn in #658
- Cleanup docs by @dpriskorn in #659
- Support check-url save to disk by @dpriskorn in #660
Full Changelog: 3.0.0-alpha2...3.0.0-alpha3
3.0.0-alpha2
What's Changed
- Add statistics for references to get_statistics by @dpriskorn in #503
- Add support for general references also by @dpriskorn in #512
- Rewrite to fix "article not found" bug by @dpriskorn in #515
- Update documentation and diagrams by @dpriskorn in #518
- Update test coverage by @dpriskorn in #519
- Rename API attributes and improve documentation by @dpriskorn in #527
- Rename API attribute by @dpriskorn in #533
- API improvements by @dpriskorn in #534
- Fix plain text detection bug by @dpriskorn in #535
- Test isbn template by @dpriskorn in #536
- Move hashing to hashing by @dpriskorn in #537
- Detect wayback url by @dpriskorn in #538
- Detect archive details urls by @dpriskorn in #539
- Detect books.google.com by @dpriskorn in #540
- Support first level domain counts by @dpriskorn in #543
- Fix google books url detection by @dpriskorn in #545
- Release 3.0.0-alpha1 by @dpriskorn in #553
- New model WikipediaUrl by @dpriskorn in #563
- Nested output on the get-statistics endpoint by @dpriskorn in #566
- Fix 500 errors caused by url exceptions by @dpriskorn in #578
- Fix DNS NoAnswer by @dpriskorn in #580
- Add timing, timestamp, page_id, lang and site by @dpriskorn in #581
- Support detecting IP addresses by @dpriskorn in #582
- Support more detailed agg in Links by @dpriskorn in #587
- Add support for deprecated reference templates by @dpriskorn in #588
- Support json disk input/output by @dpriskorn in #591
- Rewrite to support more details on references by @dpriskorn in #592
- Fix bug with check_urls by @dpriskorn in #594
- Disable checks causing 502 by @dpriskorn in #595
- Rewrite WikipediaUrl to improve it by @dpriskorn in #597
- Turn off too verbose url logging by @dpriskorn in #598
- Support more sections with references by @dpriskorn in #599
- Add support for refresh argument by @dpriskorn in #604
- Detect more template types and output templates by @dpriskorn in #609
- Use dead to find dead code by @dpriskorn in #612
- Deprecate all wikibase and hashing code by @dpriskorn in #614
- Add new get-urls endpoint by @dpriskorn in #616
- Simplify with any() and merging nested if statements by @cclauss in #621
- Major rewrite to version 2 of WARI by @dpriskorn in #651
Full Changelog: 3.0.0-alpha0...3.0.0-alpha2
3.0.0-alpha0
This is the first release with the new get_statistics endpoint.
What's Changed
- Add new add-job user script by @dpriskorn in #318
- New user script ia-sandbox-link by @dpriskorn in #322
- Add namespace check by @dpriskorn in #325
- Improve development notes by @dpriskorn in #327
- Add test coverage helper by @dpriskorn in #328
- Fix no match bug in the wikidata-qid endpoint by @dpriskorn in #329
- Honor wikibase configuration variable by @dpriskorn in #331
- Add tests for wikidata-qid endpoint by @dpriskorn in #332
- Improve README by @dpriskorn in #334
- Add article import sequence by @dpriskorn in #335
- organize tests by @dpriskorn in #336
- Support cite dictionary by @dpriskorn in #337
- Deprecate WikibaseCrudDelete by @dpriskorn in #338
- Fix bug too many open files/sockets by @dpriskorn in #339
- Fix message cache issue by @dpriskorn in #351
- Fix failing tests by @cclauss in #368
- Disable import of articles with 500+ references by @dpriskorn in #371
- Fix UI message by @dpriskorn in #373
- Fix descriptions by @dpriskorn in #374
- Don't prepare reference claim when not needed by @dpriskorn in #375
- Add ingester helper script by @dpriskorn in #377
- Improve handling of invalid google books id by @dpriskorn in #378
- Fix "working on ..." message by @dpriskorn in #379
- Small fixes by @dpriskorn in #412
- Support storing raw template by @dpriskorn in #416
- Add terminology by @dpriskorn in #479
- Rewrite with new classes for parse and extraction by @dpriskorn in #480
- Rewrite to analyze only and output to get_statistics endpoint by @dpriskorn in #481
- Prepare release 3.0.0-alpha0 by @dpriskorn in #502
Full Changelog: 2.1.0-alpha3...3.0.0-alpha0
2.1.0-alpha3
What's Changed
- Deprecate flush cache by @dpriskorn in #143
- Add tests for last update functionality by @dpriskorn in #144
- Support last update information by @dpriskorn in #145
- Deprecate flush cache by @dpriskorn in #146
- Fix google books page parse error by @dpriskorn in #149
- Deprecate google books support by @dpriskorn in #150
- Fix remaining google books bugs by @dpriskorn in #151
- Update and publish diagrams by @dpriskorn in #176
- Use threads in worker by @dpriskorn in #243
- Improve logging format by @dpriskorn in #244
- Rename the software by @dpriskorn in #245
- Clean up work_queue class by @dpriskorn in #246
- Fix url parsing by @dpriskorn in #253
- Fix hashing based on url by @dpriskorn in #260
- Fix rebuild cache and wikibase url slash-issue by @dpriskorn in #275
- Fix rdf prefix urls by @dpriskorn in #276
- Add terminology by @dpriskorn in #277
- Prepare 2.1.0-alpha3 by @dpriskorn in #291
Full Changelog: 2.1.0-alpha2...2.1.0-alpha3