This repository has been archived by the owner on Jul 21, 2021. It is now read-only.

Prevent duplicates by using URL as unique identifier #2

Open
schliflo opened this issue Feb 18, 2021 · 0 comments

Labels
bug Something isn't working

Comments

@schliflo
Member

Entries that are updated after their first crawl sometimes produce duplicates when they are re-crawled. This could be prevented by using the URL as a unique identifier and, on re-crawl, updating the existing entry for that URL instead of creating a new one.
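The issue does not say which storage the crawler uses, so the following is only a minimal sketch of the proposed "URL as unique key, update on re-crawl" behaviour, using Python's built-in `sqlite3` as a stand-in. The table `entries`, its columns, and the helpers `normalize_url`, `open_store` and `upsert_entry` are all hypothetical. The URL normalization step (lowercasing scheme and host, dropping the `#fragment`) is an extra assumption beyond what the issue proposes. SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.

```python
import sqlite3
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization so trivially different spellings of one URL share a key."""
    parts = urlsplit(url.strip())
    # Lowercase scheme and host, default an empty path to "/", and drop the
    # #fragment (it is never sent to the server, so it cannot change the page).
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a hypothetical entry store; the UNIQUE constraint makes url the identity."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               id      INTEGER PRIMARY KEY,
               url     TEXT NOT NULL UNIQUE,
               title   TEXT,
               content TEXT
           )"""
    )
    return conn


def upsert_entry(conn: sqlite3.Connection, url: str, title: str, content: str) -> None:
    """Insert a new entry, or update the existing row if this URL was crawled before."""
    conn.execute(
        """INSERT INTO entries (url, title, content) VALUES (?, ?, ?)
           ON CONFLICT(url) DO UPDATE SET
               title   = excluded.title,
               content = excluded.content""",
        (normalize_url(url), title, content),
    )
    conn.commit()


# Usage: the second crawl of the same page updates the row instead of duplicating it.
conn = open_store()
upsert_entry(conn, "https://Example.com/post/1", "Draft", "v1")
upsert_entry(conn, "https://example.com/post/1#comments", "Final", "v2")
print(conn.execute("SELECT url, title, content FROM entries").fetchall())
# prints [('https://example.com/post/1', 'Final', 'v2')]
```

Letting the database enforce uniqueness, rather than checking "does this URL exist?" in application code before inserting, also avoids a race in which two concurrent crawl workers both see no row and both insert one.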

@schliflo schliflo added the bug Something isn't working label Feb 18, 2021
Projects
None yet
Development

No branches or pull requests

2 participants