From 4d0126c09cc3711c3cc3b6d7b30dd14d9a2fe28f Mon Sep 17 00:00:00 2001
From: xiaodaigh
Date: Tue, 18 May 2021 16:14:18 +1000
Subject: [PATCH] updated readme

---
 README.md | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d8b2e9e..b9858ba 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,25 @@
 # TableScraper.jl
-Easy to scrape tables from webapges
+
+This package provides a single function,
+
+`scrape_tables(url)`
+
+which scrapes the tables wrapped in `<table>` tags at `url` and returns them as [Tables.jl](https://github.com/JuliaData/Tables.jl)-compatible row tables.
+
+By default, the function uses `Cascadia.nodeText` to extract the text from each `<td>` node.
+
+If you need more than the text of each cell, you can use
+
+```julia
+scrape_tables(url, identity)
+```
+
+to keep the cells as `Gumbo.HTMLNode`s for more advanced extraction.
+
+More generally, you can pass any function as the `cell_transform` argument to apply a custom transformation to each `<td>` node before the tables are returned.
+
+For example:
+
+```julia
+scrape_tables(url, cell_transform)
+```