This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
chore(docs): add website config intro
Showing 4 changed files with 82 additions and 3 deletions.
```diff
@@ -1,5 +1,16 @@
 # Summary
 
 [Introduction](./README.md)
-- [Getting Started](./getting-started.md)
-- [A simple example](./simple.md)
+
+# User Guide
+
+- [Getting Started](./getting-started.md)
+- [A simple example](./simple.md)
+
+# Config
+
+- [Website](./website.md)
+
+# Features
+
+- [Cron Job](./cron-job.md)
```
@@ -0,0 +1,17 @@

# Cron Jobs

Use a cron job to gather website data on a recurring schedule, at any time of day.

```ts
import { Website, type NPage } from "@spider-rs/spider-rs";

// Schedule recurring crawls using a cron expression.
const website = new Website("https://choosealicense.com")
  .withCron("1/5 * * * * *")
  .build();

// Collect every page delivered by the cron runs.
const links: NPage[] = [];

const onPageEvent = (err: Error | null, value: NPage) => {
  if (err) {
    console.error(err);
    return;
  }
  links.push(value);
};

// Start the cron job; `handle` refers to the running job.
const handle = await website.runCron(onPageEvent);
```
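
The `onPageEvent` callback above follows Node's error-first convention: the first argument is an `Error` or `null`, and the second is the page. The self-contained sketch below shows that shape on its own. `PageLike`, `PageCallback`, and `createCollector` are hypothetical names that are not part of `@spider-rs/spider-rs`, and the sketch assumes only that each page has a `url`. Rather than running a real cron job, it simulates a few invocations.

```typescript
// Hypothetical stand-in for the library's NPage type; only `url` is assumed.
interface PageLike {
  url: string;
}

// Node-style error-first callback, the same shape as onPageEvent above.
type PageCallback<T> = (err: Error | null, value: T) => void;

// Build a callback that stores delivered pages and errors in separate lists,
// so a failed request is recorded instead of being pushed as a page.
function createCollector<T>(): { onPageEvent: PageCallback<T>; pages: T[]; errors: Error[] } {
  const pages: T[] = [];
  const errors: Error[] = [];
  const onPageEvent: PageCallback<T> = (err, value) => {
    if (err) {
      errors.push(err);
      return;
    }
    pages.push(value);
  };
  return { onPageEvent, pages, errors };
}

// Simulate three invocations, as repeated cron runs might produce.
const { onPageEvent, pages, errors } = createCollector<PageLike>();
onPageEvent(null, { url: "https://choosealicense.com/" });
onPageEvent(new Error("request timed out"), { url: "https://choosealicense.com/licenses/" });
onPageEvent(null, { url: "https://choosealicense.com/about/" });
// pages.length → 2, errors.length → 1
```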
```diff
@@ -6,4 +6,4 @@ Install the package.
 yarn add @spider-rs/spider-rs
 # or
 npm install @spider-rs/spider-rs
-```
+```
```
@@ -0,0 +1,51 @@

# Website

The `Website` class is the foundation of the spider.

## Builder pattern

We use the builder pattern to configure the website for crawling.

\*Note: replace `https://choosealicense.com` in the examples below with the URL of the website you want to crawl.

All of the examples use TypeScript.

```ts
import { Website } from "@spider-rs/spider-rs";

const website = new Website("https://choosealicense.com");
```
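
In the builder pattern, each `with*` call records one option and returns the builder, which is what allows the calls to be chained. `build()` then produces the finished configuration. The self-contained sketch below is a hypothetical `CrawlConfigBuilder` written only to illustrate the pattern. It is not the `Website` class, whose internals may differ.

```typescript
// Hypothetical configuration shape, for illustration only; the real Website
// class keeps its own internal options.
interface CrawlConfig {
  readonly url: string;
  readonly headers: Readonly<Record<string, string>>;
  readonly blacklist: readonly string[];
  readonly cron?: string;
}

// Minimal builder: each with* method records one option and returns `this`,
// which is what makes chained calls possible.
class CrawlConfigBuilder {
  private readonly url: string;
  private headers: Record<string, string> = {};
  private blacklist: string[] = [];
  private cron: string | undefined = undefined;

  constructor(url: string) {
    this.url = url;
  }

  withHeaders(headers: Record<string, string>): this {
    this.headers = { ...this.headers, ...headers };
    return this;
  }

  withBlacklistUrl(patterns: string[]): this {
    this.blacklist = [...this.blacklist, ...patterns];
    return this;
  }

  withCron(expression: string): this {
    this.cron = expression;
    return this;
  }

  // Finish configuration: copy the collected options into a frozen object.
  build(): CrawlConfig {
    return Object.freeze({
      url: this.url,
      headers: Object.freeze({ ...this.headers }),
      blacklist: Object.freeze([...this.blacklist]),
      cron: this.cron,
    });
  }
}

// Usage mirroring the examples on this page.
const config = new CrawlConfigBuilder("https://choosealicense.com")
  .withHeaders({ authorization: "somerandomjwt" })
  .withBlacklistUrl(["/blog"])
  .build();
```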

### Custom Headers

Add custom HTTP headers to send when crawling or scraping.

```ts
const website = new Website("https://choosealicense.com")
  .withHeaders({
    authorization: "somerandomjwt",
  })
  .build();
```
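
HTTP header names are case-insensitive, so `authorization` and `Authorization` refer to the same header. The minimal sketch below assumes that custom headers take precedence over any defaults. Under that assumption, the hypothetical `mergeHeaders` helper (not a library function) lowercases header names before merging, so that a custom value replaces a default regardless of capitalization.

```typescript
type HeaderMap = Record<string, string>;

// Hypothetical helper: combine default and custom headers into one map.
// Names are lowercased first because HTTP header names are case-insensitive;
// entries from `custom` override entries from `defaults` with the same name.
function mergeHeaders(defaults: HeaderMap, custom: HeaderMap): HeaderMap {
  const merged: HeaderMap = {};
  for (const [name, value] of Object.entries(defaults)) {
    merged[name.toLowerCase()] = value;
  }
  for (const [name, value] of Object.entries(custom)) {
    merged[name.toLowerCase()] = value;
  }
  return merged;
}

const requestHeaders = mergeHeaders(
  { "User-Agent": "example-crawler", Authorization: "none" },
  { authorization: "somerandomjwt" },
);
// requestHeaders → { "user-agent": "example-crawler", authorization: "somerandomjwt" }
```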

### Blacklist

Prevent crawling a set of paths, URLs, or regex patterns. Entries are strings; as in the example, a `RegExp` can be passed by its `.source`.

```ts
const website = new Website("https://choosealicense.com")
  .withBlacklistUrl(["/blog", new RegExp("/books").source, "/resume"])
  .build();
```
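
The example mixes plain strings with `new RegExp("/books").source`, which suggests that each entry is read as a regular-expression source. The sketch below makes exactly that assumption. It also assumes each pattern is tested, unanchored, against the full URL. `isBlacklisted` is a hypothetical helper, not the library's matcher.

```typescript
// Hypothetical matcher, not the library's implementation. Assumptions: every
// blacklist entry is a regular-expression source string, and patterns are
// tested, unanchored, against the full URL.
function isBlacklisted(url: string, patterns: readonly string[]): boolean {
  return patterns.some((source) => new RegExp(source).test(url));
}

// Same entries as the example above. RegExp#source returns the pattern as a
// string, which is why a RegExp can sit alongside plain strings in the list.
const blacklist = ["/blog", new RegExp("/books").source, "/resume"];

const discovered = [
  "https://choosealicense.com/licenses/mit/",
  "https://choosealicense.com/blog/",
  "https://choosealicense.com/books/",
];

// Keep only the URLs that match none of the patterns.
const toCrawl = discovered.filter((url) => !isBlacklisted(url, blacklist));
// toCrawl → ["https://choosealicense.com/licenses/mit/"]
```

The patterns in this sketch are unanchored, so `/blog` would also exclude a path such as `/blogroll`. Under the same assumptions, a stricter pattern such as `^https://choosealicense\.com/blog/` avoids that.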

### Crons

Set up a cron job that runs in the background on a schedule written in cron syntax.

```ts
const website = new Website("https://choosealicense.com")
  .withCron("1/5 * * * * *")
  .build();
```
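
The expression `1/5 * * * * *` has six fields. Assume the library reads it in the seconds-first six-field layout: seconds, minutes, hours, day of month, month, day of week. Under that reading, `1/5` in the seconds field means "every 5 seconds, starting at second 1", so runs would fire at seconds 1, 6, 11, and so on up to 56 of every minute. The hypothetical `expandField` helper below (not part of `@spider-rs/spider-rs`) makes that reading explicit for a single field.

```typescript
// Hypothetical helper, not part of @spider-rs/spider-rs: expand one cron field
// into the sorted values it matches within the inclusive range [min, max].
// Supports "*", "n", "a-b", comma lists, and steps "*/s", "a/s", "a-b/s".
function expandField(field: string, min: number, max: number): number[] {
  const values = new Set<number>();
  for (const part of field.split(",")) {
    const pieces = part.split("/");
    const rangePart = pieces[0];
    const stepPart: string | undefined = pieces.length > 1 ? pieces[1] : undefined;
    const step = stepPart === undefined ? 1 : Number(stepPart);

    let start: number;
    let end: number;
    if (rangePart === "*") {
      start = min;
      end = max;
    } else if (rangePart.includes("-")) {
      [start, end] = rangePart.split("-").map(Number);
    } else {
      start = Number(rangePart);
      // "a/s" runs from a up to max; a bare "a" matches only a.
      end = stepPart === undefined ? start : max;
    }

    const valid =
      [start, end, step].every((n) => Number.isInteger(n)) &&
      step >= 1 && start >= min && end <= max && start <= end;
    if (!valid) throw new Error(`invalid cron field part: ${part}`);

    for (let v = start; v <= end; v += step) values.add(v);
  }
  return Array.from(values).sort((a, b) => a - b);
}

// Assumption: six fields, seconds first.
const expression = "1/5 * * * * *";
const fields = expression.trim().split(/\s+/);
const seconds = expandField(fields[0], 0, 59);
// seconds → [1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56]
```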

See the [cron](./cron-job.md) section for details on how to use the cron job.