Commit

chore(docs): add website config intro
j-mendez committed Nov 28, 2023
1 parent ab57dde commit 7b5800c
Showing 4 changed files with 82 additions and 3 deletions.
15 changes: 13 additions & 2 deletions book/src/SUMMARY.md
@@ -1,5 +1,16 @@
# Summary

[Introduction](./README.md)
- [Getting Started](./getting-started.md)
- [A simple example](./simple.md)

# User Guide

- [Getting Started](./getting-started.md)
- [A simple example](./simple.md)

# Config

- [Website](./website.md)

# Features

- [Cron Job](./cron-job.md)
17 changes: 17 additions & 0 deletions book/src/cron-job.md
@@ -0,0 +1,17 @@
# Cron Jobs

Use a cron job to crawl a website on a recurring schedule, at any time of day, and gather its data.

```ts
import { Website, type NPage } from "@spider-rs/spider-rs";

const website = new Website("https://choosealicense.com")
  .withCron("1/5 * * * * *")
  .build();

// Collect every page reported while the cron job runs.
const links: NPage[] = [];

const onPageEvent = (err: Error | null, value: NPage) => {
  links.push(value);
};

// Start the cron job; `handle` refers to the running job.
const handle = await website.runCron(onPageEvent);
```
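The expression `1/5 * * * * *` has six fields. Assuming the seconds-first layout used by six-field cron schedulers (seconds, minutes, hours, day of month, month, day of week), `1/5` in the seconds field means "start at second 1 and repeat every 5 seconds". The hypothetical `expandStep` helper below is not part of `@spider-rs/spider-rs`; it is a small sketch that expands one field of that form so the schedule can be checked by hand.

```typescript
// Hypothetical helper (not part of @spider-rs/spider-rs): expand one cron field
// written as "*", "n", "*/step", or "n/step" into the values it selects
// within [min, max]. Ranges ("1-5") and lists ("1,2") are not handled.
function expandStep(field: string, min: number, max: number): number[] {
  const isInt = (n: number) => isFinite(n) && Math.floor(n) === n;
  const slash = field.indexOf("/");
  const base = slash === -1 ? field : field.slice(0, slash);
  const stepText = slash === -1 ? null : field.slice(slash + 1);

  const start = base === "*" ? min : Number(base);
  const step = stepText === null ? 1 : Number(stepText);
  // A bare number such as "7" selects only that value; "*" or "n/step" runs up to max.
  const end = base !== "*" && stepText === null ? start : max;

  if (base === "" || stepText === "" || !isInt(start) || !isInt(step) || step <= 0 || start < min || start > max) {
    throw new Error(`unsupported cron field: ${field}`);
  }

  const values: number[] = [];
  for (let value = start; value <= end; value += step) {
    values.push(value);
  }
  return values;
}

// Seconds field of "1/5 * * * * *", assuming the first field is seconds (0-59).
const seconds = expandStep("1/5", 0, 59);
```

Under that assumption the job fires 12 times per minute, at seconds 1, 6, 11, and so on up to 56.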
2 changes: 1 addition & 1 deletion book/src/getting-started.md
@@ -6,4 +6,4 @@ Install the package.
yarn add @spider-rs/spider-rs
# or
npm install @spider-rs/spider-rs
```
```
51 changes: 51 additions & 0 deletions book/src/website.md
@@ -0,0 +1,51 @@
# Website

The `Website` class is the foundation of the spider.

## Builder pattern

We use the builder pattern to configure the website for crawling.

\*Note: replace `https://choosealicense.com` in the examples below with your target website URL.

All of the examples use TypeScript.

```ts
import { Website } from "@spider-rs/spider-rs";

const website = new Website("https://choosealicense.com");
```
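The snippet above only constructs a `Website`. To show what chaining `with*` calls and finishing with `build()` means in general, here is a minimal, self-contained builder. `CrawlConfigBuilder` and `CrawlConfig` are hypothetical names used for illustration; this is a sketch of the pattern, not how `@spider-rs/spider-rs` implements `Website`.

```typescript
// Hypothetical config shape for illustration only; not the library's real types.
interface CrawlConfig {
  readonly url: string;
  readonly headers: Readonly<Record<string, string>>;
  readonly blacklist: readonly string[];
  readonly cron?: string;
}

class CrawlConfigBuilder {
  private headers: Record<string, string> = {};
  private blacklist: string[] = [];
  private cron?: string;

  constructor(private readonly url: string) {}

  // Each with* method records one option and returns `this` so calls can chain.
  withHeaders(headers: Record<string, string>): this {
    this.headers = { ...this.headers, ...headers };
    return this;
  }

  withBlacklistUrl(patterns: string[]): this {
    this.blacklist = [...this.blacklist, ...patterns];
    return this;
  }

  withCron(expression: string): this {
    this.cron = expression;
    return this;
  }

  // build() freezes the collected options into a single immutable config object.
  build(): CrawlConfig {
    return Object.freeze({
      url: this.url,
      headers: Object.freeze({ ...this.headers }),
      blacklist: Object.freeze([...this.blacklist]),
      cron: this.cron,
    });
  }
}

const config = new CrawlConfigBuilder("https://choosealicense.com")
  .withHeaders({ authorization: "somerandomjwt" })
  .withBlacklistUrl(["/blog"])
  .build();
```

Because each `with*` method returns `this`, options can be set in any order, and `build()` returns a frozen snapshot, so later builder calls cannot change a config that was already built.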

### Custom Headers

Add custom HTTP headers to use when crawling/scraping.

```ts
const website = new Website("https://choosealicense.com")
  .withHeaders({
    authorization: "somerandomjwt",
  })
  .build();
```
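If headers come from several places (for example shared defaults plus per-site values), it can help to merge them into one record before passing it to `withHeaders`. HTTP header field names are case-insensitive, so the hypothetical `mergeHeaders` helper below lowercases them to avoid sending both `Authorization` and `authorization`. Whether `@spider-rs/spider-rs` normalizes header names itself is not stated here, so treat this as an optional precaution rather than a requirement.

```typescript
// Hypothetical helper: merge header records, lowercasing names so that
// "Authorization" and "authorization" collapse into a single entry
// (HTTP header field names are case-insensitive). Later records win.
function mergeHeaders(...records: Array<Record<string, string>>): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const record of records) {
    for (const name of Object.keys(record)) {
      merged[name.toLowerCase()] = record[name];
    }
  }
  return merged;
}

// The result is the kind of record you would pass to .withHeaders(...).
const headers = mergeHeaders(
  { "User-Agent": "my-crawler", Authorization: "placeholder" },
  { authorization: "somerandomjwt" },
);
```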

### Blacklist

Prevent crawling specific paths, URLs, or patterns, including regular expressions.

```ts
const website = new Website("https://choosealicense.com")
  .withBlacklistUrl(["/blog", new RegExp("/books").source, "/resume"])
  .build();
```
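The example passes `new RegExp("/books").source` because the list holds strings: `.source` turns a `RegExp` into its pattern text, and JavaScript escapes the slash, so the resulting string is `\/books`. The sketch below uses a hypothetical `toBlacklist` helper to normalize a mixed list the same way, and a hypothetical `previewBlocked` function to check sample URLs locally, assuming each entry is interpreted as a regular expression that can match anywhere in the URL. The crawler's actual matching rules are not described here, so confirm them before relying on this preview.

```typescript
// Hypothetical helper: normalize a mix of strings and RegExp objects into the
// string patterns that withBlacklistUrl() receives in the example above.
function toBlacklist(entries: Array<string | RegExp>): string[] {
  return entries.map((entry) => (typeof entry === "string" ? entry : entry.source));
}

// Hypothetical local preview, ASSUMING each entry is a regex that may match
// anywhere in the URL. The crawler's real matching rules may differ.
function previewBlocked(urls: string[], patterns: string[]): string[] {
  const compiled = patterns.map((pattern) => new RegExp(pattern));
  return urls.filter((url) => compiled.some((re) => re.test(url)));
}

const blacklist = toBlacklist(["/blog", new RegExp("/books"), "/resume"]);

const blocked = previewBlocked(
  [
    "https://choosealicense.com/blog/post",
    "https://choosealicense.com/books/intro",
    "https://choosealicense.com/licenses/mit/",
  ],
  blacklist,
);
```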

### Crons

Set up a cron job that runs in the background on a schedule written in cron syntax.

```ts
const website = new Website("https://choosealicense.com")
  .withCron("1/5 * * * * *")
  .build();
```

See the [cron](./cron-job.md) section for details on how to use the cron job.
