Merge branch 'develop'
clemlesne committed Aug 16, 2024
2 parents 2c92ad2 + 5168f3c commit 6ddb4a8
Showing 1 changed file with 58 additions and 1 deletion.
59 changes: 58 additions & 1 deletion README.md
@@ -27,7 +27,7 @@ Indexer:
- [x] Embed chunks with OpenAI embeddings
- [x] Indexed content is semantically searchable with [Azure AI Search](https://learn.microsoft.com/en-us/azure/search)

-## How to use the CLI
+## How to use

### Scrape a website

@@ -111,6 +111,63 @@ For documentation on all available options, run:
```
scrape-it-now index run --help
```

## Architecture

### Scrape

```mermaid
graph LR
cli["CLI"]
web["Website"]
subgraph "Azure Queue Storage"
to_chunck["To chunk"]
to_scrape["To scrape"]
end
subgraph "Azure Blob Storage"
subgraph "Container"
job["job"]
scraped["scraped"]
state["state"]
end
end
cli -- 1. Pull message --> to_scrape
cli -- 2. Get cache --> scraped
cli -- 3. Browse --> web
cli -- 4. Update cache --> scraped
cli -- 5. Push state --> state
cli -- 6. Add message --> to_scrape
cli -- 7. Add message --> to_chunck
cli -- 8. Update state --> job
```
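
The eight steps in the diagram can be read as one iteration of a worker loop. The sketch below is illustrative only and is not the project's implementation: it replaces Azure Queue Storage and Azure Blob Storage with in-memory Python objects, and the names `Stores`, `scrape_once` and `browse` are hypothetical. It also assumes that a cache hit skips the browse step, that `state` holds one marker per URL, and that `job` holds an aggregate counter; the diagram does not specify any of these.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Stores:
    """In-memory stand-ins (assumption) for the queues and blobs in the diagram."""

    to_scrape: deque = field(default_factory=deque)  # queue to_scrape
    to_chunck: deque = field(default_factory=deque)  # queue to_chunck
    scraped: dict = field(default_factory=dict)  # blob "scraped": page cache
    state: dict = field(default_factory=dict)  # blob "state": one marker per URL
    job: dict = field(default_factory=dict)  # blob "job": aggregate counters


def scrape_once(stores: Stores, browse: Callable[[str], dict]) -> bool:
    """Run one iteration of the scrape flow; return False when the queue is empty."""
    if not stores.to_scrape:
        return False
    url = stores.to_scrape.popleft()  # 1. Pull message
    page = stores.scraped.get(url)  # 2. Get cache
    if page is None:  # assumption: a cache hit skips browsing
        page = browse(url)  # 3. Browse
        stores.scraped[url] = page  # 4. Update cache
    stores.state[url] = "done"  # 5. Push state
    for link in page["links"]:  # 6. Add message (newly found URLs)
        if link not in stores.state and link not in stores.to_scrape:
            stores.to_scrape.append(link)
    stores.to_chunck.append(url)  # 7. Add message (hand over to the indexer)
    stores.job["processed"] = stores.job.get("processed", 0) + 1  # 8. Update state
    return True
```

Calling `scrape_once` in a loop until it returns `False` drains the queue. Here `browse` is any callable that returns a dictionary with `content` and `links` keys, which is a shape chosen for this sketch.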

### Index

```mermaid
graph LR
ai_search["Azure AI Search"]
cli["CLI"]
embeddings["Azure OpenAI Embeddings"]
subgraph "Azure Queue Storage"
to_chunck["To chunk"]
end
subgraph "Azure Blob Storage"
subgraph "Container"
scraped["scraped"]
end
end
cli -- 1. Pull message --> to_chunck
cli -- 2. Get cache --> scraped
cli -- 3. Chunk --> cli
cli -- 4. Embed --> embeddings
cli -- 5. Push to search --> ai_search
```
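
The index flow can be sketched the same way. This is a minimal illustration under stated assumptions, not the project's code: the queue, the blob cache and the search index are plain Python objects, `embed` is a caller-supplied function standing in for the embeddings service, and `chunk_text`, `index_once`, the fixed-size character chunking and the `url#position` document ID are all hypothetical choices made for the example.

```python
from collections import deque
from typing import Callable


def chunk_text(text: str, size: int) -> list[str]:
    """Split text into fixed-size character windows (a simplification)."""
    return [text[i : i + size] for i in range(0, len(text), size)]


def index_once(
    to_chunck: deque,
    scraped: dict,
    embed: Callable[[str], list[float]],
    search_index: list,
    chunk_size: int = 20,
) -> bool:
    """Run one iteration of the index flow; return False when the queue is empty."""
    if not to_chunck:
        return False
    url = to_chunck.popleft()  # 1. Pull message
    page = scraped[url]  # 2. Get cache
    chunks = chunk_text(page["content"], chunk_size)  # 3. Chunk
    for position, chunk in enumerate(chunks):
        vector = embed(chunk)  # 4. Embed
        search_index.append(  # 5. Push to search
            {"id": f"{url}#{position}", "content": chunk, "vector": vector}
        )
    return True
```

In this sketch, `search_index` is a list of documents. A real deployment would send them to the search service instead.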

## Advanced usage

### Source environment variables