-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for validating GitHub Release Asset checksums #34
Conversation
This commit is functionally complete, however the --help menu rendering is quite ugly now. I'll fix that next.
A bummer, I just filed this as a GitHub issue with the "newbie" label. It would make for a good trial project :) Will review shortly. |
Ah, it did occur to me that it felt a little trial-projectish, but I wanted to actually use it in my Confluent setup. Thanks for the forthcoming review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good addition. Checksumming modules seems worthwhile too. We'll also have to expose these params in gruntwork-install
. This definitely would've been a good newbie task...
Release](https://help.github.com/articles/creating-releases/)--to download. It only works with the `--tag` option. | ||
- `--release-asset-checksum` (**Optional**): The checksum that a release asset should have. Fetch will fail if this value | ||
is non-empty and does not match the checksum computed by Fetch. | ||
- `--release-asset-checksum-algo` (**Optional**): The algorithm fetch will use to compute a checksum of the release asset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems equally important to have checksums for modules from the --modules
param. Probably the simplest option is to publish the commit ID (sha1) and compare that to the commit ID of the repo we just downloaded via git clone
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the --modules
pattern of gruntwork-install
? Well, can't you just use Fetch to download the exact commit you want to achieve what you're suggesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, --modules
. We typically use tags for specifying the version we want. Should our recommendation be that users specify both the tag and the commit ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, keep in mind that fetch allows you to use tag constrain expressions, but I see where you're going with this. Yes, I think we should permit using both a --tag
and a --commit-id
param at the same time. Fetch can then error out if the two don't match. I think I'll file a GitHub issue for that for now, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func verifyChecksumOfReleaseAsset(assetPath, checksum, algorithm string) *FetchError { | ||
computedChecksum, err := computeChecksum(assetPath, algorithm) | ||
if err != nil { | ||
return newError(ERROR_WHILE_COMPUTING_CHECKSUM, err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's generally nicer when writing tests and debugging to return a different type for each, well, type of error. All you do is define a struct (or even a type alias) and add an Error() string
method to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear you. This was an early Golang tool I wrote and I didn't want to refactor all the errors throughout the code so I stuck with the existing idiom.
checksum.go
Outdated
return newError(CHECKSUM_DOES_NOT_MATCH, fmt.Sprintf("Expected to receive checksum value %s, but instead got %s for Release Asset at %s", computedChecksum, checksum, assetPath)) | ||
} | ||
|
||
fmt.Printf("Checksum matches!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not use any sort of logging library with fetch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was surprised about that as well. Again, I didn't want to invest time in updating those kinds of things so I stuck with the existing idiom.
checksum.go
Outdated
return newError(ERROR_WHILE_COMPUTING_CHECKSUM, err.Error()) | ||
} | ||
if computedChecksum != checksum { | ||
return newError(CHECKSUM_DOES_NOT_MATCH, fmt.Sprintf("Expected to receive checksum value %s, but instead got %s for Release Asset at %s", computedChecksum, checksum, assetPath)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to provide a bit more info about what this could mean in this error message. E.g, "This means that either you are using the wrong checksum value in your call to fetch
(e.g., perhaps you updated the version of the module you're installing but not the checksum?) or that someone has replaced the asset with a potentially dangerous one and you should be very careful about proceeding."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. I'll update now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checksum.go
Outdated
} | ||
|
||
func computeChecksum(filePath string, algorithm string) (string, error) { | ||
var checksum string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a useless mutable value to have. I'd just return "", err
in all error cases below and call return hasherToString
in the non-error cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sometimes like to declare the return var at the top, but your suggestion is probably a little cleaner. updated.
checksum.go
Outdated
defer file.Close() | ||
|
||
switch algorithm { | ||
case "sha256": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this case statement is identical in all but the hasher
used. Perhaps refactor into a getHasher
method that returns sha256.New()
or sha512.New()
and then simplify this method to just a single path that calls io.Copy
and hasherToString
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. Updated.
|
||
// Checksums can be computed by running "shasum -a [256|512] /path/to/file" on any UNIX system | ||
const SAMPLE_RELEASE_ASSET_CHECKSUM_SHA256="4314590d802760c29a532e2ef22689d4656d184b3daa63f96bc8b8f76f5d22f0" | ||
const SAMPLE_RELEASE_ASSET_CHECKSUM_SHA512="28d9e487c1001e3c28d915c9edd3ed37632f10b923bd94d4d9ac6d28c0af659abbe2456da167763d51def2182fef01c3f73c67edf527d4ed1389a28ba10db332" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do you users get these checksums from? If it's from the /releases
page, then anyone who can replace the asset with a fake one could also replace the checksum with a fake one. To be truly useful, wouldn't we have to publish these checksums in some totally separate location? Or are you making the assumption that when someone goes to get the checksum value, they are verifying that the code is valid, and this merely ensures no one can pull the carpet out from under them later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. When I went to download a third-party health checker for Kafka, I realized how risky (and yet common!) including such a library is. If an attacker wanted to exploit a library that many people use, one way is to find a way into someone's GitHub account (leaked GitHub token?), leave the commits unaltered and update only the release asset (where in most cases no checksum at all is published).
In this case, I'm making the assumption that at the time you download the code, you have an opportunity to look around the repo and conclude for yourself that it appears not to be compromised. There's only so much you can do here, of course, but at least if in the future when you've long forgotten about this library, something malicious happens, you'll be notified.
In the end, I suppose it's just more defense in depth.
func downloadReleaseAsset(assetName string, destPath string, githubRepo GitHubRepo, tag string) (string, error) { | ||
var assetPath string | ||
|
||
if assetName == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. This code got updated when I changed assetNames
from a []string
to a string
. That being said, I think the error handling could generally be improved here, as discussed above.
All feedback responded to and tests passing, so merging now. |
Fetch is used to download both GitHub files and Release Assets. When downloading a Release Asset, it's helpful to verify the SHA256 or SHA512 hash of the release asset (the "checksum") so that you can confirm the release asset hasn't been changed from the time you first downloaded it. This PR adds optional support for doing that.
This use case came up for me when I was downloading a third-party health checker tool using Fetch. Without verifying the checksum, the downloaded tool could be changed without anyone knowing, a potential security issue given that we do not control the third-party repo. Using bash to validate the checksum is an option, but fetch support leads to slightly cleaner code and avoids platform compatibility issues.
In addition, I opted to make a backwards-incompatible change to simplify the UX. Previously, fetch allowed users to download more than one Release Asset in a single call. Once I implemented this PR, the fetch interface quickly became unwieldy with so many flags to pass in when using multiple release assets.
It occurred to me that (a) downloading multiple release assets is itself probably a less likely use case, and (b) users can simply call fetch multiple times, once for each release asset they want to download. Therefore, I opted for the UX simplicity with no loss of functionality but at the cost of backwards-incompatible change.
Other updates: