chore(book): add page docs
j-mendez committed Dec 27, 2023
1 parent b93f9c0 commit d653219
Showing 3 changed files with 74 additions and 0 deletions.
1 change: 1 addition & 0 deletions book/src/SUMMARY.md
@@ -10,5 +10,6 @@
# Configuration

- [Website](./website.md)
- [Page](./page.md)
- [Environment](./env.md)

72 changes: 72 additions & 0 deletions book/src/page.md
@@ -0,0 +1,72 @@
# Page

A single page on a website, useful if you just need the root URL.

## New Page

Get a new page with content.

The first parameter is the URL, followed by whether subdomains should be included, and finally whether to include TLDs in links.

Calling `page.fetch` is needed to get the content.

```python
import asyncio
from spider_rs import Page

async def main():
    page = Page("https://choosealicense.com")
    await page.fetch()

asyncio.run(main())
```

## Page Links

Get all the links found on a page.

```python
import asyncio
from spider_rs import Page

async def main():
    page = Page("https://choosealicense.com")
    await page.fetch()
    links = page.get_links()
    print(links)

asyncio.run(main())
```

## Page Html

Get the HTML markup for the page.

```python
import asyncio
from spider_rs import Page

async def main():
    page = Page("https://choosealicense.com")
    await page.fetch()
    html = page.get_html()
    print(html)

asyncio.run(main())
```

## Page Bytes

Get the raw bytes of a page, for example to store the file in a database.

```python
import asyncio
from spider_rs import Page

async def main():
    page = Page("https://choosealicense.com")
    await page.fetch()
    page_bytes = page.get_bytes()
    print(page_bytes)

asyncio.run(main())
```
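The bytes returned above can be persisted directly as a BLOB. A minimal sketch using Python's built-in `sqlite3`; the placeholder payload stands in for what a real `page.get_bytes()` call would return:

```python
import sqlite3

# Placeholder payload standing in for page.get_bytes() after a fetch
page_url = "https://choosealicense.com"
page_bytes = b"<html><body>placeholder</body></html>"

# Store the raw bytes as a BLOB keyed by URL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, body BLOB)")
conn.execute("INSERT INTO pages (url, body) VALUES (?, ?)", (page_url, page_bytes))
conn.commit()

# Read it back to confirm a byte-for-byte round trip
stored = conn.execute(
    "SELECT body FROM pages WHERE url = ?", (page_url,)
).fetchone()[0]
assert stored == page_bytes
```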
1 change: 1 addition & 0 deletions book/src/website.md
@@ -272,6 +272,7 @@ class Subscription:
async def main():
    website = Website("https://choosealicense.com")
    website.crawl(Subscription())
    # sleep for 2 seconds, then stop the crawl
    website.stop()

asyncio.run(main())
