- [update] - dependencies
- [update] - dependencies
- [update] - dependencies
- [fix] - allow setting mongodb url #92 (Thanks Vid!)
- update deps
- [fix] - for encoding in references #89
- [new] - support for custom urls #88
- update deps
- update to wtf_wikipedia 7.2.9
- [fix] for doc.title() in custom parser
- update deps
- more consistent template json, via wtf_wikipedia@7
- removal of empty `[]` results in `Section`
- fs fixes for node > 9
- major json format changes from wtf_wikipedia v6.0.0
- get skip_redirects actually working
- reduce default batch_size even lower
- add `verbose_skip` option, to log disambig/redirect skipping
- ⚠️ remove `.infoboxes` and `.citations` from the top-level result. This is duplicate data; find them both in `section[i].templates`
- improve handling of redirect pages
- refactor encoding logic
- update deps, wtf library improvements
- relicense as MIT
- use latest mongo api
- bugfix for runtime parsing error
- update to wtf_wikipedia v4.2.0
- support passing-in arbitrary functions to worker
- fix connection time-outs & improve logging output
- change default collection name to `pages`
- add `.custom()` function support
- MASSIVE SPEEDUP! full re-write by @devrim 🙏 to fix #59
- rename from `wikipedia-to-mongo` to `dumpster-dive`
- use wtf_wikipedia v3 (a big re-factor too!)
- use `line-by-line` and `worker-nodes` to run parsing in parallel
- add a 3s 'break' to avoid build-up of mongo inserts
- add new --verbose and --skip_first options
- add try/catch
- support --skip_redirects and --skip_disambig
- updates to use [email protected]
- a major result-format change
- renames bin cmd to `wiki2mongo`
- supports use from the CLI, or via JavaScript `require()`
- support --plaintext flag