-
-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul search()
signature and docs
#94
Overhaul search()
signature and docs
#94
Conversation
It turns out that the pagination strategy we use with Wayback's CDX search does not work correctly if you don't set a `limit`, so this change sets one by default. However, there is no value that's always safe, so this needs to be used with some care. A limit that's too low may mean you never get any results even when you should, and a limit that's too high may get ignored (or cause other unknown issues). The right values really depend on how frequently you expect the Internet Archive to have captured versions of the URLs you are looking for. :( Fixes #65.
Also use a loop instead of recursion for pagination.
We originally used camelCase when the name and functionality were the same as a parameter in the HTTP API. It was intended to make things clear and to support arbitrarily specifying additional keyword arguments that would get sent in the request. However, we no longer support additional, arbitrary keyword arguments, and the mixed case created more inconvenience and mess than help. This makes all parameters use the same snake_case scheme.
Note: the size reduction here is mostly because recorded responses for tests are gzipped, which was not the case before. (I think what happened is that the CDX server now respects the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like these changes to simplify usage and guide users down the happy path -- also the snake_case changes are most welcome.
In #94 and #102, we renamed several method parameters. This brings back the old names, but with deprecation warnings. We'll remove them fully in v0.5.0. (This does *not* support parameters that were completely *removed*, though. I did a search on GitHub, and it doesn't look like anybody is actually using them, so it didn't seem worth keeping support for them.)
The signature for
search()
was unwieldy, complicated, and included a bunch of parameters that did nothing or that might cause broken behavior and should never be used (e.g.page
andpageSize
). While adding a default value forlimit
to resolve #65, I went ahead and cleaned upsearch()
:limit
in order to fix a very bad footgun. (See Add a defaultlimit
toWaybackClient.search()
#65)url
and complex considerations forlimit
.gzip
), in others you could break things (e.g.page
andpageSize
), and some are implementation details that users shouldn’t be bothered with (e.g.resumeKey
,previous_result
).Internally, the only real change is that this is now a loop instead of a recursive call. This was required in order to not expose internal details as parameters, but is also probably better for call stack and memory management on large queries.
Fixes #65.
@edsu are these API changes going to be OK for you? I’ll make sure this is released as 0.4.0 rather than 0.3.x. If you have any feedback on the docs changes, too, I’d appreciate it (you can preview the build docs from the “checks” at the bottom of the PR).
I might also try to include a first pass at #8 before releasing.