chore(book): add github url
j-mendez committed Dec 4, 2023
1 parent eae3243 commit 1837e08
Showing 5 changed files with 42 additions and 12 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/bench.yml
@@ -34,7 +34,7 @@ jobs:
 run: corepack enable && corepack prepare yarn@stable --activate

 - name: Install Deps
-run: yarn --no-immutable && yarn build && cd bench npm i
+run: yarn --no-immutable && yarn build && cd bench && npm i

 - name: Run Bench @spider-rs/spider-rs
 run: yarn bench
6 changes: 3 additions & 3 deletions bench/README.md
@@ -13,14 +13,14 @@ Test url: `https://choosealicense.com` (small)
 | `libraries` | `speed` |
 | :-------------------------------- | :-------------------- |
 | **`spider-rs: crawl 10 samples`** | `76ms`(✅ **1.00x**) |
-| **`crawlee: crawl 10 samples`** | `1.6s` (✅ **1.00x**) |
+| **`crawlee: crawl 10 samples`** | `1s` (✅ **1.00x**) |

 Test url: `https://rsseau.fr` (medium)
 211 pages

 | `libraries` | `speed` |
 | :-------------------------------- | :------------------- |
-| **`spider-rs: crawl 10 samples`** | `1s` (✅ **1.00x**) |
+| **`spider-rs: crawl 10 samples`** | `0.5s` (✅ **1.00x**) |
 | **`crawlee: crawl 10 samples`** | `72s` (✅ **1.00x**) |

 ```sh
@@ -47,4 +47,4 @@ Test url: `https://rsseau.fr` (medium)
 | **`spider-rs: crawl 10 samples`** | `2.5s` (✅ **1.00x**) |
 | **`crawlee: crawl 10 samples`** | `75s` (✅ **1.00x**) |

-The performance scales the larger the website and if throttling is needed. Linux benchmarks are about 10x faster than macOS for spider-rs.
+The performance scales the larger the website and if throttling is needed. Linux benchmarks are about 10x faster than macOS for spider-rs.
6 changes: 5 additions & 1 deletion book/book.toml
@@ -1,6 +1,10 @@
 [book]
-authors = ["j-mendez"]
+authors = ["Jeff Mendez"]
 language = "en"
 multilingual = false
 src = "src"
 title = "spider-rs"
+
+[output.html]
+git-repository-url = "https://github.com/spider-rs/spider-nodejs/tree/main/book"
+edit-url-template = "https://github.com/spider-rs/spider-nodejs/edit/main/book/{path}"
12 changes: 11 additions & 1 deletion book/src/README.md
@@ -1,6 +1,16 @@
 # Introduction

-Spider-RS is the fastest web crawler and indexer written in Rust ported to Node.js.
+`Spider-RS` is the fastest web crawler and indexer written in Rust ported to Node.js.
+
+- Concurrent
+- Streaming
+- Decentralization
+- Headless Chrome Rendering
+- HTTP Proxies
+- Cron Jobs
+- Subscriptions
+- Blacklisting and Budgeting Depth
+- Written in Rust for speed, safety, and simplicity

 Spider powers some big tools and helps bring the crawling aspect to almost no downtime with the correct setup, view the [spider](https://github.com/spider-rs/spider) project to learn more.

28 changes: 22 additions & 6 deletions book/src/benchmarks.md
@@ -1,15 +1,33 @@
# Benchmarks

The speed of Spider-RS ported compared to other tools.
```sh
Linux
8-core CPU
32 GB of RAM memory
-----------------------
```

Spider is about 1,000x (small websites) 10,000x (medium websites), and 100,000x (production grade websites) times faster than the popular crawlee library even with the node port performance hits.
Test url: `https://choosealicense.com` (small)
32 pages

| `libraries` | `speed` |
| :-------------------------------- | :-------------------- |
| **`spider-rs: crawl 10 samples`** | `76ms`(✅ **1.00x**) |
| **`crawlee: crawl 10 samples`** | `1s` (✅ **1.00x**) |

Test url: `https://rsseau.fr` (medium)
211 pages

| `libraries` | `speed` |
| :-------------------------------- | :------------------- |
| **`spider-rs: crawl 10 samples`** | `0.5s` (✅ **1.00x**) |
| **`crawlee: crawl 10 samples`** | `72s` (✅ **1.00x**) |

```sh
----------------------
mac Apple M1 Max
10-core CPU
64 GB of RAM memory
1 TB of SSD disk space
-----------------------
```

@@ -29,6 +47,4 @@ Test url: `https://rsseau.fr` (medium)
 | **`spider-rs: crawl 10 samples`** | `2.5s` (✅ **1.00x**) |
 | **`crawlee: crawl 10 samples`** | `75s` (✅ **1.00x**) |

-The performance scales the larger the website and if throttling is needed.
-
-Linux benchmarks are about 10x faster than macOS for spider-rs.
+The performance scales the larger the website and if throttling is needed. Linux benchmarks are about 10x faster than macOS for spider-rs.
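
Every row in the benchmark tables reports `1.00x`, so the tables do not show how the two libraries compare with each other. As a rough sanity check, the ratio can be worked out from the Linux timings added to `book/src/benchmarks.md` in this commit. The sketch below is not part of the commit: `BenchRow`, `linuxRows`, and `speedup` are illustrative names, and the timings are copied from the tables with `1s`, `0.5s`, and `72s` converted to milliseconds.

```typescript
// Hypothetical helper, not part of the repository.
// Timings are in milliseconds, copied from the Linux tables (10 samples each).
interface BenchRow {
  site: string;
  spiderMs: number;
  crawleeMs: number;
}

const linuxRows: BenchRow[] = [
  { site: "https://choosealicense.com", spiderMs: 76, crawleeMs: 1000 },
  { site: "https://rsseau.fr", spiderMs: 500, crawleeMs: 72000 },
];

// How many times longer crawlee took than spider-rs on the same site.
function speedup(row: BenchRow): number {
  return row.crawleeMs / row.spiderMs;
}

for (const row of linuxRows) {
  console.log(`${row.site}: ${speedup(row).toFixed(1)}x`);
}
// https://choosealicense.com: 13.2x
// https://rsseau.fr: 144.0x
```

On these figures the gap widens from roughly 13x on the small site to 144x on the medium one, which matches the note that performance scales with the size of the website.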
