End to End Testing #162

Open
66 tasks
willmhowes opened this issue Nov 18, 2024 · 0 comments
Labels
enhancement New feature or request internal-only This PR/Issue is reserved for the IA team

willmhowes commented Nov 18, 2024

Using a mocked server, perform a series of crawls that test whether the resulting WARC matches the parameters given to the crawl at runtime. Note that this feature does NOT test the validity of a WARC under the WARC spec; that testing is intended to be handled by the WARC-writing library used by Zeno.
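As a rough illustration of the mocked-server idea, here is a minimal Python sketch, not Zeno's actual test harness. It starts a local HTTP server that records what each request looked like, runs a stand-in "crawl" against it, and returns what the server observed so it can be compared with the parameter that was passed in. Everything named here (`RecordingHandler`, `stand_in_crawl`, `run_user_agent_check`) is hypothetical; a real test would launch the Zeno binary instead of `stand_in_crawl` and would also inspect the WARC it wrote.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingHandler(BaseHTTPRequestHandler):
    """Serves a fixed page and records the path and User-Agent of every request."""

    seen = []  # shared across requests: list of (path, user_agent)

    def do_GET(self):
        RecordingHandler.seen.append((self.path, self.headers.get("User-Agent")))
        body = b"<html><body>mock page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence the default per-request logging on stderr.
        pass


def start_mock_server():
    """Bind to an ephemeral port on localhost and serve in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def stand_in_crawl(url, user_agent):
    """Hypothetical placeholder for the crawler; a real test would run the binary."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore env proxies
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request) as response:
        return response.read()


def run_user_agent_check(user_agent="e2e-test-agent/1.0"):
    """Crawl one page with the given user agent and return what the server saw."""
    RecordingHandler.seen.clear()
    server = start_mock_server()
    try:
        port = server.server_address[1]
        stand_in_crawl(f"http://127.0.0.1:{port}/index.html", user_agent)
    finally:
        server.shutdown()
        server.server_close()
    return list(RecordingHandler.seen)
```

The value returned by `run_user_agent_check` can then be asserted against the user agent given to the crawl; the same pattern extends to other parameters whose effect is visible on the wire.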

Here is the list of configurable flags shown by running the binary with -h:

  • --api
  • --api-port string
  • --bypass-proxy strings
  • --capture-alternate-pages
  • --cdx-cookie string
  • --cert-validation
  • --concurrent-sleep-length int
  • --config-file string
  • --consul-address string
  • --consul-config
  • --consul-password string
  • --consul-path string
  • --consul-user string
  • --cookies string
  • --crawl-max-time-limit int
  • --crawl-time-limit int
  • --debug
  • --disable-assets-capture
  • --disable-html-tag strings
  • --disable-ipv4
  • --disable-ipv6
  • --disable-local-dedupe
  • --disable-seencheck
  • --domains-crawl
  • --es-index-prefix zeno
  • --es-password string
  • --es-url string
  • --es-user string
  • --exclude-host strings
  • --exclude-string strings
  • --handover
  • --headless
  • --http-timeout int
  • --include-host strings
  • --include-string strings
  • --ipv6-anyip
  • --job string
  • --json
  • --keep-cookies
  • --live-stats
  • --log-file-output-dir string
  • --log-level string
  • --max-concurrent-assets int
  • --max-concurrent-per-domain int
  • --max-hops uint8
  • --max-redirect int
  • --max-retry int
  • --min-space-required int
  • --no-stdout-log
  • --no-ytdlp
  • --prometheus
  • --prometheus-prefix string
  • --proxy string
  • --random-local-ip
  • --ultrasafe-queue
  • --user-agent string
  • --warc-cdx-dedupe-server string
  • --warc-dedupe-size int
  • --warc-on-disk
  • --warc-operator string
  • --warc-pool-size int
  • --warc-prefix string
  • --warc-size int
  • --warc-temp-dir string
  • -w, --workers int
  • --ytdlp-path string
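One possible way to organise a per-flag checklist like the one above is a table that maps each flag to a check over the captured records. The Python sketch below is only an illustration under stated assumptions: the record shape (a dict with `url` and `user_agent` keys) is hypothetical and would in practice be extracted from the WARC, and the meanings assumed for `--exclude-host` (no capture from the listed hosts) and `--user-agent` (every request carries that value) are assumptions, not taken from Zeno's documentation.

```python
from urllib.parse import urlsplit


def check_exclude_host(captures, excluded_hosts):
    """Return captured URLs whose host was supposed to be excluded (assumed semantics)."""
    excluded = set(excluded_hosts)
    return [c["url"] for c in captures if urlsplit(c["url"]).hostname in excluded]


def check_user_agent(captures, expected):
    """Return captured URLs whose recorded user agent differs from the expected one."""
    return [c["url"] for c in captures if c.get("user_agent") != expected]


# Hypothetical registry: one entry per flag that has a check implemented so far.
CHECKS = {
    "--exclude-host": check_exclude_host,
    "--user-agent": check_user_agent,
}


def verify(captures, params):
    """Run every registered check that matches a given parameter.

    Returns a dict mapping each failing flag to the offending URLs;
    flags without a registered check are skipped.
    """
    failures = {}
    for flag, value in params.items():
        check = CHECKS.get(flag)
        if check is None:
            continue
        offending = check(captures, value)
        if offending:
            failures[flag] = offending
    return failures
```

For example, calling `verify(captures, {"--exclude-host": ["b.test"]})` on records that include a capture from `b.test` reports that URL under `--exclude-host`. Keeping the checks in a table makes it easy to see which of the listed flags still lack coverage.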
@willmhowes willmhowes added enhancement New feature or request internal-only This PR/Issue is reserved for the IA team labels Nov 18, 2024