Update downloader.sh #18
base: main
Conversation
Hello my Lord.... I'm trying to contribute to this gem, so I want to check whether PRs are welcome by proposing a simple update for a single downloader; if appreciated, I can cover all the downloaders quickly :)

List of improvements:
- Dependency check: ensures curl, dig, jq, and mktemp are installed before proceeding.
- Error handling in downloads: added retries and error messages for failed downloads.
- Background downloads: parallelized downloads for faster execution using & and wait.
- Error handling in dig calls: added a || true fallback to prevent script failure if DNS resolution fails.
- Recursive SPF record resolution: improved robustness with additional error handling.
- Temp directory with trap: ensured the temporary directory is always cleaned up on script exit.
- Improved sorting: used sort -u for efficient sorting and deduplication in one step.
- Output directory creation: ensured the output directory is created before saving the results.
- File size check: added validation to check whether the output files are empty or failed to generate (see the sketch after this list).
- Logging: enhanced logging for error messages and successful file creation.
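A minimal sketch of the file size check and logging items above, assuming the output file names used elsewhere in the script; it is not the exact PR diff:

output_dir="google"
for f in "$output_dir/ipv4.txt" "$output_dir/ipv6.txt"; do
    # -s is true only if the file exists and is non-empty
    if [ ! -s "$f" ]; then
        echo "Error: $f is empty or was not generated" >&2
        exit 1
    fi
    echo "Wrote $(wc -l < "$f") entries to $f"
done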
Thanks for the great MR, but I don't understand why so many changes are needed: this script runs in GitHub workers and everything works fine. After these optimizations it looks like the script could run on any machine, but there is no such purpose.
set -euo pipefail
set -x

# Check for required dependencies
A dependency check is not needed for this script, because it runs only in GitHub workers.
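For reference, the check in question is presumably the standard command -v loop; a sketch (the hunk below shows only its closing fi and done):

for cmd in curl dig jq mktemp; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "Error: required command '$cmd' is not installed" >&2
        exit 1
    fi
done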
fi
done

# Create a temporary directory and ensure cleanup on exit
I think it's not needed: /tmp is available every time, and the worker lives for less than a minute.
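For context, the cleanup being discussed is presumably the usual mktemp-plus-trap idiom, roughly:

temp_dir="$(mktemp -d)"
# Remove the temporary directory however the script exits
trap 'rm -rf "$temp_dir"' EXIT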
# Function to download files with retries and error handling
Retries are not needed either; this repo has existed for about 3 years and I've never had this problem.
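For context, a retrying download helper along the lines the PR describes might look like this; the function name and retry count are illustrative, not the PR's exact code:

download() {
    local url="$1" dest="$2" attempt
    for attempt in 1 2 3; do
        # -f makes curl fail on HTTP errors so a retry actually triggers
        if curl -fsS "$url" -o "$dest"; then
            return 0
        fi
        echo "Download of $url failed (attempt $attempt), retrying..." >&2
        sleep 2
    done
    echo "Error: giving up on $url after 3 attempts" >&2
    return 1
}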
# get from public ranges
curl -s https://www.gstatic.com/ipranges/goog.txt > /tmp/goog.txt
curl -s https://www.gstatic.com/ipranges/cloud.json > /tmp/cloud.json
# Parallel downloads with retries
How many processor cores are available in GitHub workers? Why would this be necessary when downloading files smaller than 1 MB?
In the free plan, if I'm not wrong, you may use up to 2 cores :)
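Worth noting that the parallelism here is presumably plain job control rather than anything core-bound, e.g.:

curl -s https://www.gstatic.com/ipranges/goog.txt > /tmp/goog.txt &
curl -s https://www.gstatic.com/ipranges/cloud.json > /tmp/cloud.json &
wait

Backgrounded downloads overlap network I/O, not CPU, so the core count barely matters; for sub-1 MB files the saving is negligible either way, as noted above.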
# Public GoogleBot IP ranges
# From: https://developers.google.com/search/docs/advanced/crawling/verifying-googlebot
curl -s https://developers.google.com/search/apis/ipranges/googlebot.json > /tmp/googlebot.json
# Fetch Google netblocks using dig command
This seems like it could be useful
fetch_netblocks() {
    local idx=2
    local txt
    txt="$(dig TXT _netblocks.google.com +short @8.8.8.8 || true)"
The || true will hide network problems from us that we might otherwise have noticed; in that case we just won't know anything?
Uhm.. maybe the best approach would be to collect errors as artifacts for further investigation.. but as you already said, it works, and I completely agree with you: this PR is not needed :) Keep it as my thank-you message for your immensely useful repo. I have used it every day for years :)
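A sketch of that artifact idea, redirecting dig's stderr to a log file that the workflow could upload afterwards (the log path is hypothetical):

txt="$(dig TXT _netblocks.google.com +short @8.8.8.8 2>>/tmp/dns-errors.log || true)"
# After the run, upload /tmp/dns-errors.log as a workflow artifact if it is non-empty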
get_dns_spf() {
  dig @8.8.8.8 +short txt "$1" |
  tr ' ' '\n' |
- while read entry; do
+ while read -r entry; do
From help read: "-r: do not allow backslashes to escape any characters".
What's that for?
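For illustration, the flag only matters when the input contains backslashes:

printf 'a\\b\n' | while read entry; do echo "$entry"; done     # prints: ab
printf 'a\\b\n' | while read -r entry; do echo "$entry"; done  # prints: a\b

SPF entries like ip4:... contain no backslashes, so here read -r is just the safe default that tools like shellcheck recommend.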
grep ':' /tmp/netblocks.txt >> /tmp/google-ipv6.txt
# Sort and deduplicate results, and ensure target directory exists
output_dir="google"
mkdir -p "$output_dir"
The directory already exists, and this file is already in it.
# Sort and deduplicate results, and ensure target directory exists
output_dir="google"
mkdir -p "$output_dir"
sort -u "$temp_dir/google-ipv4.txt" > "$output_dir/ipv4.txt"
Use sort -V:

    -V, --version-sort    natural sort of (version) numbers within text

Without this sorting, it's gonna look bad, see #5 - Q3
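For example, with CIDR lines the two orderings differ (illustrative input):

printf '10.10.0.0/16\n10.2.0.0/16\n10.2.0.0/16\n' | sort -u
# 10.10.0.0/16
# 10.2.0.0/16
printf '10.10.0.0/16\n10.2.0.0/16\n10.2.0.0/16\n' | sort -Vu
# 10.2.0.0/16
# 10.10.0.0/16

sort -Vu keeps the one-step deduplication while producing the natural ordering.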