The following sections explain each setting in the crawler
configuration:
allowed-domains
- Description: Whitelist of domains the crawler is allowed to visit.
- Default Value:
[]
(empty list)
- Example:
old.reddit.com
→ the crawler visits only old.reddit.com
body-size
- Description: Maximum size of the HTTP response body in bytes.
- Default Value:
0
→ unlimited
cache-dir
- Description: Directory path for caching. Leave empty for no caching.
- Default Value:
""
(empty string)
crypto
- Description: Enable or disable crypto-related features.
- Default Value:
false
debug
- Description: Enable or disable debugging mode for GoColly.
- Default Value:
false
disallowed-domains
- Description: Blacklist of domains the crawler must not visit.
- Default Value:
[]
(empty list)
- Example:
reddit.com
→ the crawler will not visit any reddit.com URLs
disallowed-url-filters
- Description: List of regular expressions; URLs that match any of them are not visited.
- Default Value:
[]
(empty list)
- Example:
http://httpbin\.org/h.+
email
- Description: Enable or disable email-related features.
- Default Value:
false
ignore-robots-txt
- Description: If true, the crawler ignores the robots.txt file.
- Default Value:
false
limit-delay
- Description: Delay in seconds between requests.
- Default Value:
0
limit-random-delay
- Description: Random delay in seconds added to the fixed delay.
- Default Value:
0
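As a rough illustration of how `limit-delay` and `limit-random-delay` combine, the sketch below assumes both settings are handed to GoColly's limit rule unchanged, so that a random extra wait between 0 and `limit-random-delay` is added to the fixed delay. The function name `wait_before_request` is hypothetical and not part of the crawler.

```python
import random


def wait_before_request(limit_delay: float, limit_random_delay: float) -> float:
    """Return the number of seconds to wait before the next request.

    Assumption: the random component is uniform in [0, limit_random_delay]
    and is added on top of the fixed limit_delay.
    """
    extra = random.uniform(0, limit_random_delay) if limit_random_delay > 0 else 0.0
    return limit_delay + extra


# With limit-delay: 2 and limit-random-delay: 3 the wait falls between 2 and 5 seconds.
print(wait_before_request(2, 3))
```

With both values left at 0 the function returns 0, which corresponds to the defaults above: no pause between requests.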
max-depth
- Description: Maximum depth for crawling links.
- Default Value:
0
→ unlimited depth
phone
- Description: List of countries whose phone numbers should be parsed.
- Default Value:
[]
(empty list)
- Example:
"RU,NL,DE,US"
→ list only the countries you need; it does not have to be every country
queue-max-size
- Description: Maximum size of the crawler's queue.
- Default Value:
50000
queue-threads
- Description: Number of threads used for crawling.
- Default Value:
4
tor
- Description: Run the crawler through a Tor proxy and allow crawling of .onion links.
- Default Value:
false
url-filters
- Description: List of regular expressions that URLs must match to be visited.
- Default Value:
[]
(empty list)
- Example:
http://httpbin\.org/h.+
(?:https?://)?(?:www)?(\\S*?\\.onion)\\b
→ the second pattern limits the crawl to .onion domains only
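To make the interplay between `url-filters` and `disallowed-url-filters` concrete, here is a minimal sketch. It assumes the two lists behave like GoColly's `URLFilters` and `DisallowedURLFilters`: disallowed patterns are checked first, patterns may match anywhere in the URL, and an empty `url-filters` list allows everything. The helper `is_url_allowed` is hypothetical, and Python's `re` module stands in for Go's regexp engine.

```python
import re


def is_url_allowed(url, url_filters, disallowed_url_filters):
    """Decide whether a URL passes the two regex lists.

    Assumptions: patterns are searched anywhere in the URL, disallowed
    patterns win, and an empty url_filters list allows every URL.
    """
    if any(re.search(p, url) for p in disallowed_url_filters):
        return False
    if not url_filters:
        return True
    return any(re.search(p, url) for p in url_filters)


# The .onion pattern from the example, written with single backslashes
# (the doubled ones are presumably string escaping in the config file).
ONION_ONLY = [r"(?:https?://)?(?:www)?(\S*?\.onion)\b"]

print(is_url_allowed("http://exampleonionsite.onion/forum", ONION_ONLY, []))  # True
print(is_url_allowed("https://example.com/", ONION_ONLY, []))  # False
print(is_url_allowed("http://httpbin.org/headers", [], [r"http://httpbin\.org/h.+"]))  # False
```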
url-revisit
- Description: Enable or disable revisiting URLs.
- Default Value:
false
urls
- Description: List of starting URLs for the crawler.
- Default Value:
[]
(empty list)
- Example:
urls:
- https://example.com
- https://example2.com
user-agent
- Description: User agent string for HTTP requests.
- Example:
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0
- Source: useragents.me
keywords
- Description: A keyword, a sentence, or a list of keywords to search for.
- Default Value:
[]
- Example:
search -k owasp -k hacking -k "Please hack the box!"
associations
- Description: Specify the different SQL tables you want to export from the database.
- Default:
all
- Values:
"WP" - WordPress
"E" - Email
"P" - PhoneNumbers
"C" - Crypto
criteria
- Value:
{}
(empty JSON)
- Description: Criteria for the exporter.
- Explanation: If a criterion uses the LIKE keyword, the exporter automatically performs an SQL LIKE statement. There is no need to add extra % characters inside the criteria.
- Usage:
pryingdeep -q 'title=test,"url=LIKE example.com"'
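The following sketch shows one way a criteria string such as the one in the usage line could be turned into a parameterized WHERE clause. It is not the exporter's actual implementation: the function `build_where`, the choice to join conditions with AND, and the wrapping of LIKE values in `%` on both sides are all assumptions made for illustration.

```python
import csv
import re


def build_where(criteria: str):
    """Turn a criteria string into a parameterized WHERE clause.

    Hypothetical parser: comma-separated key=value pairs, where a pair that
    contains spaces is wrapped in double quotes. Assumption: a value that
    starts with "LIKE " is wrapped in % on both sides.
    """
    clauses, params = [], []
    for pair in next(csv.reader([criteria])):
        column, _, value = pair.partition("=")
        column = column.strip()
        # Column names cannot be bound as parameters, so accept plain identifiers only.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", column):
            raise ValueError(f"invalid column name: {column!r}")
        if value.startswith("LIKE "):
            clauses.append(f"{column} LIKE ?")
            params.append(f"%{value[len('LIKE '):]}%")
        else:
            clauses.append(f"{column} = ?")
            params.append(value)
    return " AND ".join(clauses), params


print(build_where('title=test,"url=LIKE example.com"'))
# ('title = ? AND url LIKE ?', ['test', '%example.com%'])
```

Binding the values as parameters instead of pasting them into the SQL text keeps user-supplied criteria from altering the query itself.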
filepath
- Value:
data.json
- Description: Filepath for the exporter output.
- Default Value:
data.json
limit
- Description: Limit the exporter to a certain number of items. 0 means every row inside the database.
- Default Value:
0
raw-sql
- Value:
false
- Description: Enable or disable raw SQL queries.
- Default Value:
false
raw-sql-filepath
- Default:
pkg/querybuilder/queries/select.sql
- Description: Filepath for the raw SQL queries.
sort-by
- Value:
url
- Description: Field to sort by, used in a generic ORDER BY clause.
- Default Value:
status_code
sort-order
- Value:
asc
- Description: Sort order for the exporter.
offset
- Value:
0
- Description: Number of records to skip during export. If you want the IDs to start from 1, set `sort-by` to `id` and `sort-order` to `asc`. Otherwise the rows are skipped in a different order, and you may get records starting from ID 50 when you asked for an offset of 1.
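The interaction between `sort-by`, `sort-order`, `limit`, and `offset` can be tried out with a small in-memory database. This is only a sketch: Python's `sqlite3` stands in for whatever database the exporter really uses, and the table name `pages`, its columns, and the function `export_rows` are hypothetical.

```python
import sqlite3

# Only these columns may appear in ORDER BY, since identifiers cannot be bound as parameters.
SORTABLE = {"id", "url", "status_code"}


def export_rows(conn, sort_by="status_code", sort_order="asc", limit=0, offset=0):
    """Return rows ordered, skipped, and capped as the exporter options describe."""
    if sort_by not in SORTABLE or sort_order not in ("asc", "desc"):
        raise ValueError("unsupported sort option")
    # In SQLite, LIMIT -1 means "no limit"; this models limit: 0 (every row).
    sql_limit = limit if limit > 0 else -1
    query = (
        "SELECT id, url, status_code FROM pages "
        f"ORDER BY {sort_by} {sort_order} LIMIT ? OFFSET ?"
    )
    return conn.execute(query, (sql_limit, offset)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, status_code INTEGER)")
conn.executemany(
    "INSERT INTO pages (url, status_code) VALUES (?, ?)",
    [("https://example.com/a", 404), ("https://example.com/b", 200), ("https://example.com/c", 301)],
)

# Sorted by id, an offset of 1 skips id 1 and starts at id 2.
print([row[0] for row in export_rows(conn, sort_by="id", offset=1)])  # [2, 3]
# Sorted by status_code, the same offset skips the row with the lowest status code instead.
print([row[0] for row in export_rows(conn, sort_by="status_code", offset=1)])  # [3, 1]
```

The second call shows why the offset can look surprising: rows are skipped after sorting, so the first ID returned depends on the sort field.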
-s, --silent
- Default:
false
- Description: Use this flag to disable logging and run silently.
-z, --save-config
- Default:
false
- Description: Use this flag to save chosen options to your .yaml configuration.
-c, --config <path>
- Value: The path to the .yaml configuration file. Keep the filename as pryingdeep, otherwise the program will break.
- Description: Use this flag to specify the path to the .yaml configuration.