Overhaul search() signature and docs (#94)
The signature for `search()` was unwieldy and complicated, and it included a number of parameters that did nothing, or that could break things and should never be used (e.g. `page` and `pageSize`). I've tried to clean up most of those issues here:

- Adds a default value for `limit` in order to fix a very bad footgun. (See #65)
- Adds a lot more detail to the docs, explains special formatting for `url` and complex considerations for `limit`.
- Removes non-functional or breakage-prone parameters that were only included because they were part of the HTTP API. Some are things this library already does automatically and aren’t useful to adjust (e.g. `gzip`); others let you break things (e.g. `page` and `pageSize`); and some are implementation details that users shouldn’t be bothered with (e.g. `resumeKey`, `previous_result`).
- Removes the ability to specify arbitrary extra keyword parameters to be passed directly to the API (there are so many ways to break things here; I argued for this originally so we didn’t have to maintain as much, but it’s just not good).
- Makes all parameters use snake_case.
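
For anyone upgrading call sites, a minimal sketch of translating keyword arguments written for the old signature might look like the following. The helper `migrate_search_kwargs` is hypothetical and is not part of the `wayback` library; the rename and removal lists are taken from the release notes in this commit.

```python
# Hypothetical upgrade helper; NOT part of the wayback library.

# camelCase parameters that were renamed to snake_case in search().
RENAMED = {
    "matchType": "match_type",
    "fastLatest": "fast_latest",
    "resolveRevisits": "resolve_revisits",
}

# Parameters that were removed from search() entirely.
REMOVED = {"gzip", "showResumeKey", "resumeKey", "page", "pageSize", "previous_result"}


def migrate_search_kwargs(old_kwargs):
    """Convert keyword arguments written for the old search() signature.

    Renamed parameters are mapped to their snake_case names. Removed
    parameters raise TypeError so they get noticed instead of silently
    dropped. Any other name passes through unchanged; because search()
    no longer accepts arbitrary extra keywords, an unknown name will
    still fail when the new search() is called.
    """
    removed = sorted(REMOVED.intersection(old_kwargs))
    if removed:
        raise TypeError("search() no longer accepts: " + ", ".join(removed))
    return {RENAMED.get(name, name): value for name, value in old_kwargs.items()}


print(migrate_search_kwargs({"matchType": "prefix", "limit": 500}))
# {'match_type': 'prefix', 'limit': 500}
```

Raising on removed names, rather than quietly discarding them, keeps a caller who relied on `page` or `pageSize` from getting different results without any warning.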

Internally, the only real change is that `search()` now pages through results with a loop instead of a recursive call. This was required in order to avoid exposing internal details as parameters, but it is also probably better for the call stack and memory usage on large queries.
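
The `_client.py` diff is not rendered on this page, so the following is only a sketch of loop-based pagination, not the library's actual implementation. It assumes the CDX server's documented behavior that, with `showResumeKey=true`, a page cut off by `limit` ends with a blank line followed by a resume key, which the next request sends back as `resumeKey`. The `fetch` callable and the in-memory fake server are hypothetical stand-ins for real HTTP requests, and the default of 1000 simply mirrors the `limit=1000` seen in the re-recorded test requests; treat it as an assumption about the library's default.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical: sends one CDX request with these query parameters and
# returns the response body as text.
Fetch = Callable[[Dict[str, str]], str]


def iter_cdx_lines(fetch: Fetch, url: str, limit: int = 1000) -> Iterator[str]:
    """Yield CDX result lines from every page, looping instead of recursing."""
    params = {"url": url, "limit": str(limit), "showResumeKey": "true"}
    while True:
        lines = fetch(params).splitlines()
        resume_key = None
        # Assumed format: a page that hit `limit` ends with a blank line
        # followed by the resume key for the next page.
        if len(lines) >= 2 and lines[-2] == "":
            resume_key = lines[-1]
            lines = lines[:-2]
        yield from (line for line in lines if line)
        if not resume_key:
            return
        # No recursion: update the parameters and request the next page.
        params = {**params, "resumeKey": resume_key}


def make_fake_fetch(records: List[str]) -> Fetch:
    """Hypothetical in-memory server that pages `records` by `limit`.

    Its resume key is just the next offset; real resume keys are opaque.
    """
    def fetch(params: Dict[str, str]) -> str:
        start = int(params.get("resumeKey", "0"))
        page_size = int(params["limit"])
        body = "\n".join(records[start:start + page_size])
        if start + page_size < len(records):
            body += "\n\n" + str(start + page_size)
        return body
    return fetch


records = [f"capture-{i}" for i in range(5)]
print(list(iter_cdx_lines(make_fake_fetch(records), "example.com", limit=2)))
# ['capture-0', 'capture-1', 'capture-2', 'capture-3', 'capture-4']
```

Because every page is requested from inside one `while` loop, the call stack stays flat however many pages a query spans, and the resume key never has to surface as a public parameter.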

Fixes #65.
Mr0grog authored Oct 27, 2022
1 parent 8847b2c commit bce65fd
Showing 9 changed files with 1,408 additions and 15,348 deletions.
16 changes: 15 additions & 1 deletion docs/source/release-history.rst
@@ -5,7 +5,21 @@ Release History
 In Development
 --------------
 
-TBD
+**Breaking Change:** This release includes a significant overhaul of parameters for :meth:`wayback.WaybackClient.search`.
+
+- The ``limit`` parameter now has a default value. There are very few cases where you should not set a ``limit`` (not doing so will typically break pagination), and there is now a default value to help prevent mistakes. We’ve also added documentation to explain how and when to adjust this value, since it is pretty complex. (:issue:`65`)
+
+- Removed parameters that did nothing, could break search, or that were for internal use only: ``gzip``, ``showResumeKey``, ``resumeKey``, ``page``, ``pageSize``, ``previous_result``.
+
+- Removed support for extra, arbitrary keyword parameters that could be added to each request to the search API.
+
+- All parameters now use snake_case. (Previously, parameters that were passed unchanged to the HTTP API used camelCase, while others used snake_case.)
+
+  - ``matchType`` → ``match_type``
+  - ``fastLatest`` → ``fast_latest``
+  - ``resolveRevisits`` → ``resolve_revisits``
+
+- Expanded the method documentation to explain things in more depth and link to more external references.
 
 
 v0.3.3 (2022-09-30)
362 changes: 197 additions & 165 deletions wayback/_client.py

Large diffs are not rendered by default.

32 changes: 21 additions & 11 deletions wayback/tests/cassettes/test_search
@@ -5,9 +5,9 @@ interactions:
Accept-Encoding:
- gzip, deflate
User-Agent:
- edgi.web_monitoring.WaybackClient/0.2.0a1.post0.dev0+g800608f
- wayback/0.3.3.post6.dev0+ga3e1512 (+https://github.com/edgi-govdata-archiving/wayback)
method: GET
uri: http://web.archive.org/cdx/search/cdx?url=nasa.gov&from=19961001000000&to=19970201000000&showResumeKey=true&resolveRevisits=true
uri: http://web.archive.org/cdx/search/cdx?url=nasa.gov&limit=1000&from=19961001000000&to=19970201000000&showResumeKey=true&resolveRevisits=true
response:
body:
string: !!binary |
@@ -17,25 +17,35 @@ interactions:
headers:
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- text/plain;charset=UTF-8
Date:
- Mon, 18 Nov 2019 17:11:10 GMT
- Wed, 26 Oct 2022 22:56:17 GMT
Permissions-Policy:
- interest-cohort=()
Referrer-Policy:
- no-referrer-when-downgrade
Server:
- nginx/1.15.8
- nginx/1.19.5
Transfer-Encoding:
- chunked
X-App-Server:
- wwwb-app0
X-Cache-Key:
- httpweb.archive.org/cdx/search/cdx?url=nasa.gov&from=19961001000000&to=19970201000000&showResumeKey=true&resolveRevisits=trueUS
X-NA:
- '0'
X-NID:
- '-'
X-Page-Cache:
- BYPASS
X-RL:
- '0'
X-location:
- cdx
X-ts:
content-encoding:
- gzip
x-app-server:
- wwwb-app52
x-tr:
- '81'
x-ts:
- '200'
status:
code: 200
43 changes: 27 additions & 16 deletions wayback/tests/cassettes/test_search_does_not_repeat_results
@@ -5,39 +5,50 @@ interactions:
Accept-Encoding:
- gzip, deflate
User-Agent:
- wayback/0.2.3.post3.dev0+g5a994c7 (+https://github.com/edgi-govdata-archiving/wayback)
- wayback/0.3.3.post6.dev0+ga3e1512 (+https://github.com/edgi-govdata-archiving/wayback)
method: GET
uri: http://web.archive.org/cdx/search/cdx?url=energystar.gov%2F&from=20200612000000&to=20200613000000&showResumeKey=true&resolveRevisits=true
uri: http://web.archive.org/cdx/search/cdx?url=energystar.gov%2F&limit=1000&from=20200612000000&to=20200613000000&showResumeKey=true&resolveRevisits=true
response:
body:
string: !!binary |
H4sIAAAAAAAAAM3UQUvDMBQH8Ps+Ra6CLi/vJen01rlClVJcqNbuNqR0A3XShla/vekuA8HsEsru
CT/+7/15zaG/rj/rtvnp7La94gwBAbRAEBIgYjtrv+44Pz2ZN4eeM1t/W76zH++MQLA0y42Oy5ey
un+mKk7yitIqzR6e8jLX+jViknDWXIYk5FHqHDUMw/wvN2zbN97W/b7bW3bDVsuYcqnkxmwkrpNl
YtTaFES4eiyKzCimQV+Qp4WSFGyS5JcUBk6m/veEm6PQU3RklDB0ssjjoVpM0/5RCt5G386UjDDc
zjxtHCWi6XZ2LMmt1zvFc1/Ocy6AZ5LOW1BoT3g8QhIQ0vMWZeQw3NmC2S9DoXwg1QYAAA==
H4sIAAAAAAAAAL3UwUvDMBTH8fv+ilwFXZL3ktfprXOFKqW4Uq3dbUjpBuqkDa3+97a7DATfGITe
Wz78ki+pD9119Vk19U/rts2VFKBAKdKgtFEqEDvnvu6kPH0yrw+dFK76dnLnPt4FKi3iJM0oLF6K
8v4ZyzBKS4zLOHl4SouU6DUQBmFWc5I2R6kdqL7v53+5ftu8yabq9u3eiRuxWoaYGms22cbAOlpG
mV1nOSKsHvM8yawgRYxH2hr0tgx5yYLnZfZ/Tw/nqGmKOxsl8L0sYDyNZIn1TvOGX85zFojzwC6m
qX+UztR/4TI+EWsC8JcIE/8oIfodxtzYMclbr4WYgDnIwVugb08zHgJqNVknIwf+Hkk1+wUMuvHv
aQYAAA==
headers:
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- text/plain;charset=UTF-8
Date:
- Wed, 17 Jun 2020 16:24:27 GMT
- Wed, 26 Oct 2022 22:56:17 GMT
Permissions-Policy:
- interest-cohort=()
Referrer-Policy:
- no-referrer-when-downgrade
Server:
- nginx/1.15.8
- nginx/1.19.5
Transfer-Encoding:
- chunked
X-App-Server:
- wwwb-app15
X-Cache-Key:
- httpweb.archive.org/cdx/search/cdx?url=energystar.gov%2F&from=20200612000000&to=20200613000000&showResumeKey=true&resolveRevisits=trueUS
X-NA:
- '0'
X-NID:
- '-'
X-Page-Cache:
- BYPASS
X-RL:
- '0'
X-location:
- cdx
X-ts:
content-encoding:
- gzip
x-app-server:
- wwwb-app14
x-tr:
- '120'
x-ts:
- '200'
status:
code: 200