chore(website): expose chrome connection url
j-mendez committed Sep 24, 2024
1 parent 30e963d commit ce6726b
Showing 3 changed files with 52 additions and 2 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,7 +1,7 @@
[package]
edition = "2021"
name = "spider_rs"
-version = "0.0.52"
+version = "0.0.53"
repository = "https://github.com/spider-rs/spider-py"
license = "MIT"
description = "The fastest web crawler and indexer."
28 changes: 28 additions & 0 deletions book/src/website.md
@@ -136,6 +136,20 @@ async def main():
asyncio.run(main())
```

### Chrome Remote Connection

Set the remote connection URL for a Chrome instance. This can be a JSON endpoint or a direct WebSocket (`ws`) connection.

```py
import asyncio
from spider_rs import Website

async def main():
    website = Website("https://choosealicense.com").with_chrome_connection("http://localhost:9222/json/version")

asyncio.run(main())
```
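
For context on the two URL forms: a Chrome instance started with `--remote-debugging-port=9222` serves a JSON document at `/json/version`, and its `webSocketDebuggerUrl` field holds the direct `ws://` address. The sketch below shows that relationship using a hand-written sample payload. The helper name `extract_ws_url` and the sample values are assumptions for illustration; they are not part of `spider_rs`, and this is not a description of how the library resolves the URL internally.

```py
import json


def extract_ws_url(version_payload: str) -> str:
    """Return the WebSocket debugger URL from a /json/version response body."""
    data = json.loads(version_payload)
    ws_url = data.get("webSocketDebuggerUrl", "")
    # Reject payloads that do not carry a usable ws:// or wss:// address.
    if not ws_url.startswith(("ws://", "wss://")):
        raise ValueError("payload has no WebSocket debugger URL")
    return ws_url


# Hand-written sample shaped like Chrome's /json/version response (values are made up).
sample = json.dumps({
    "Browser": "Chrome/129.0.0.0",
    "webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/example-id",
})

print(extract_ws_url(sample))
```

Either form, the `http://.../json/version` endpoint or the `ws://` address it reports, is the kind of value the section above describes passing to `with_chrome_connection`.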

### External Domains

Add external domains to include with the website.
@@ -338,6 +352,20 @@ async def main():
asyncio.run(main())
```

### Preserve Host

Preserve the HOST HTTP header.

```py
import asyncio
from spider_rs import Website

async def main():
    website = Website("https://choosealicense.com").with_preserve_host_header(True)

asyncio.run(main())
```

## Chaining

You can chain all of the configs together for simple configuration.
24 changes: 23 additions & 1 deletion src/website.rs
@@ -722,10 +722,32 @@ impl Website {
     pub fn with_return_page_links(
         mut slf: PyRefMut<'_, Self>,
         return_page_links: bool,
     ) -> PyRefMut<'_, Self> {
+        slf.inner.with_return_page_links(return_page_links);
+        slf
+    }
+
+    /// Set the connection url for the chrome instance. This method does nothing if the `chrome` feature is not enabled.
+    pub fn with_chrome_connection(
+        mut slf: PyRefMut<'_, Self>,
+        chome_connection: String,
+    ) -> PyRefMut<'_, Self> {
         slf
             .inner
-            .with_return_page_links(return_page_links);
+            .with_chrome_connection(if chome_connection.is_empty() {
+                None
+            } else {
+                Some(chome_connection)
+            });
+        slf
+    }
+
+    /// Preserve the HOST header.
+    pub fn with_preserve_host_header(
+        mut slf: PyRefMut<'_, Self>,
+        preserve: bool,
+    ) -> PyRefMut<'_, Self> {
+        slf.inner.with_preserve_host_header(preserve);
         slf
     }
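
The `with_chrome_connection` wrapper above maps an empty string to `None` before passing it on, so an empty URL means no remote connection is configured. The following is a minimal Python sketch of that same normalization rule, for illustration only. `normalize_connection` is a hypothetical helper, not a `spider_rs` function.

```py
from typing import Optional


def normalize_connection(url: str) -> Optional[str]:
    """Mirror the wrapper's rule: an empty string means no connection URL."""
    # Like Rust's String::is_empty, only a zero-length string counts as empty;
    # a whitespace-only string is passed through unchanged.
    return url if url else None


print(normalize_connection(""))
print(normalize_connection("http://localhost:9222/json/version"))
```

Taking a plain string and treating empty as absent keeps the Python-facing signature simple, at the cost of not distinguishing between an empty value and no value.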
