Commit

updated readme
xiaodaigh committed May 18, 2021
1 parent b27e313 commit 4d0126c
Showing 1 changed file with 24 additions and 1 deletion.
25 changes: 24 additions & 1 deletion README.md
@@ -1,2 +1,25 @@
# TableScraper.jl
Easily scrape tables from webpages.

This package provides a single function,

`scrape_tables(url)`

which finds the tables wrapped in `<table>` tags on the page at `url` and returns them as [Tables.jl](https://github.com/JuliaData/Tables.jl)-compatible row tables.
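To make "Tables.jl-compatible row table" concrete, here is a minimal sketch that uses stand-in data rather than a real scrape. The `scraped` vector and its column names are assumptions for illustration only; the README does not state the exact type that `scrape_tables` returns. The `Tables.*` calls are part of the generic Tables.jl interface and should work on any compatible table.

```julia
using Tables

# Stand-in for one scraped table (hypothetical data, not real output):
# a vector of NamedTuples is itself a valid Tables.jl row table.
scraped = [
    (Language = "Julia", Year = "2012"),
    (Language = "Python", Year = "1991"),
]

# Check that the object satisfies the Tables.jl interface.
is_table = Tables.istable(scraped)

# Iterate row by row and read a column value from each row.
languages = [Tables.getcolumn(row, :Language) for row in Tables.rows(scraped)]

# Convert to a column-oriented form (a NamedTuple of vectors).
cols = Tables.columntable(scraped)
```

Because the result follows the Tables.jl interface, it can usually be handed to any sink that accepts Tables.jl input, such as a data frame constructor or a CSV writer.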

By default the function uses `Cascadia.nodeText` to extract the text from each `<td>` node.

If you need more than the text of each cell, pass `identity` as the second argument:

```
scrape_tables(url, identity)
```

This keeps the cells as `Gumbo.HTMLNode`s so that you can do more advanced extraction yourself.

You can also pass any callable as the `cell_transform` argument to apply a custom transformation to each `<td>` node before the result is returned.

For example:

```
scrape_tables(url, cell_transform)
```
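As a sketch of what a `cell_transform` might look like, the function below returns the first link target found in a cell and falls back to the cell text otherwise. The name `link_or_text` and the inline HTML are hypothetical. The example applies the function to `<td>` nodes parsed directly with Gumbo and Cascadia, so it runs without a network request and without assuming anything about how `scrape_tables` works internally.

```julia
using Gumbo      # parsehtml, getattr
using Cascadia   # Selector, nodeText

# Hypothetical transform: return the href of the first <a> inside the cell,
# or the plain cell text when the cell contains no link.
function link_or_text(cell)
    links = eachmatch(Selector("a"), cell)
    isempty(links) && return nodeText(cell)
    return getattr(first(links), "href", nodeText(cell))
end

# A small inline document stands in for a live page.
html = """
<table>
  <tr><td><a href="https://julialang.org">Julia</a></td><td>1.6</td></tr>
</table>
"""
doc = parsehtml(html)

# Apply the transform to every <td> node, as the README describes.
cells = eachmatch(Selector("td"), doc.root)
results = [link_or_text(cell) for cell in cells]

# With a real page, the same function would be passed as the second argument:
# scrape_tables(url, link_or_text)
```

Writing the transform as an ordinary function of a single node keeps it easy to test on small HTML snippets before pointing it at a real page.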
