Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for validating GitHub Release Asset checksums #34

Merged
merged 5 commits into from
Feb 23, 2018

Conversation

josh-padnick
Copy link
Contributor

Fetch is used to download both GitHub files and Release Assets. When downloading a Release Asset, it's helpful to verify the SHA256 or SHA512 hash of the release asset (the "checksum") so that you can confirm the release asset hasn't been changed from the time you first downloaded it. This PR adds optional support for doing that.

This use case came up for me when I was downloading a third-party health checker tool using Fetch. Without verifying the checksum, the downloaded tool could be changed without anyone knowing, a potential security issue given that we do not control the third-party repo. Using bash to validate the checksum is an option, but fetch support leads to slightly cleaner code and avoids platform compatibility issues.

In addition, I opted to make a backwards-incompatible change to simplify the UX. Previously, fetch allowed users to download more than one Release Asset in a single call. Once I implemented this PR, the fetch interface quickly became unwieldy with so many flags to pass in when using multiple release assets.

It occurred to me that (a) downloading multiple release assets is itself probably a less likely use case, and (b) users can simply call fetch multiple times, once for each release asset they want to download. Therefore, I opted for the UX simplicity with no loss of functionality but at the cost of backwards-incompatible change.

Other updates:

  • Slight cleanup to the README

This commit is functionally complete, however the --help menu rendering
is quite ugly now. I'll fix that next.
@brikis98
Copy link
Member

A bummer, I just filed this as a GitHub issue with the "newbie" label. It would make for a good trial project :)

Will review shortly.

@josh-padnick
Copy link
Contributor Author

Ah, it did occur to me that it felt a little trial-projectish, but I wanted to actually use it in my Confluent setup. Thanks for the forthcoming review!

Copy link
Member

@brikis98 brikis98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good addition. Checksumming modules seems worthwhile too. We'll also have to expose these params in gruntwork-install. This definitely would've been a good newbie task...

Release](https://help.github.com/articles/creating-releases/)--to download. It only works with the `--tag` option.
- `--release-asset-checksum` (**Optional**): The checksum that a release asset should have. Fetch will fail if this value
is non-empty and does not match the checksum computed by Fetch.
- `--release-asset-checksum-algo` (**Optional**): The algorithm fetch will use to compute a checksum of the release asset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems equally important to have checksums for modules from the --modules param. Probably the simplest option is to publish the commit ID (sha1) and compare that to the commit ID of the repo we just downloaded via git clone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the --modules pattern of gruntwork-install? Well, can't you just use Fetch to download the exact commit you want to achieve what you're suggesting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, --modules. We typically use tags for specifying the version we want. Should our recommendation be that users specify both the tag and the commit ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, keep in mind that fetch allows you to use tag constrain expressions, but I see where you're going with this. Yes, I think we should permit using both a --tag and a --commit-id param at the same time. Fetch can then error out if the two don't match. I think I'll file a GitHub issue for that for now, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#35

func verifyChecksumOfReleaseAsset(assetPath, checksum, algorithm string) *FetchError {
computedChecksum, err := computeChecksum(assetPath, algorithm)
if err != nil {
return newError(ERROR_WHILE_COMPUTING_CHECKSUM, err.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generally nicer when writing tests and debugging to return a different type for each, well, type of error. All you do is define a struct (or even a type alias) and add an Error() string method to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you. This was an early Golang tool I wrote and I didn't want to refactor all the errors throughout the code so I stuck with the existing idiom.

checksum.go Outdated
return newError(CHECKSUM_DOES_NOT_MATCH, fmt.Sprintf("Expected to receive checksum value %s, but instead got %s for Release Asset at %s", computedChecksum, checksum, assetPath))
}

fmt.Printf("Checksum matches!")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not use any sort of logging library with fetch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised about that as well. Again, I didn't want to invest time in updating those kinds of things so I stuck with the existing idiom.

checksum.go Outdated
return newError(ERROR_WHILE_COMPUTING_CHECKSUM, err.Error())
}
if computedChecksum != checksum {
return newError(CHECKSUM_DOES_NOT_MATCH, fmt.Sprintf("Expected to receive checksum value %s, but instead got %s for Release Asset at %s", computedChecksum, checksum, assetPath))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to provide a bit more info about what this could mean in this error message. E.g, "This means that either you are using the wrong checksum value in your call to fetch (e.g., perhaps you updated the version of the module you're installing but not the checksum?) or that someone has replaced the asset with a potentially dangerous one and you should be very careful about proceeding."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I'll update now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checksum.go Outdated
}

func computeChecksum(filePath string, algorithm string) (string, error) {
var checksum string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a useless mutable value to have. I'd just return "", err in all error cases below and call return hasherToString in the non-error cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sometimes like to declare the return var at the top, but your suggestion is probably a little cleaner. updated.

checksum.go Outdated
defer file.Close()

switch algorithm {
case "sha256":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this case statement is identical in all but the hasher used. Perhaps refactor into a getHasher method that returns sha256.New() or sha512.New() and then simplify this method to just a single path that calls io.Copy and hasherToString.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Updated.


// Checksums can be computed by running "shasum -a [256|512] /path/to/file" on any UNIX system
const SAMPLE_RELEASE_ASSET_CHECKSUM_SHA256="4314590d802760c29a532e2ef22689d4656d184b3daa63f96bc8b8f76f5d22f0"
const SAMPLE_RELEASE_ASSET_CHECKSUM_SHA512="28d9e487c1001e3c28d915c9edd3ed37632f10b923bd94d4d9ac6d28c0af659abbe2456da167763d51def2182fef01c3f73c67edf527d4ed1389a28ba10db332"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you users get these checksums from? If it's from the /releases page, then anyone who can replace the asset with a fake one could also replace the checksum with a fake one. To be truly useful, wouldn't we have to publish these checksums in some totally separate location? Or are you making the assumption that when someone goes to get the checksum value, they are verifying that the code is valid, and this merely ensures no one can pull the carpet out from under them later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. When I went to download a third-party health checker for Kafka, I realized how risky (and yet common!) including such a library is. If an attacker wanted to exploit a library that many people use, one way is to find a way into someone's GitHub account (leaked GitHub token?), leave the commits unaltered and update only the release asset (where in most cases no checksum at all is published).

In this case, I'm making the assumption that at the time you download the code, you have an opportunity to look around the repo and conclude for yourself that it appears not to be compromised. There's only so much you can do here, of course, but at least if in the future when you've long forgotten about this library, something malicious happens, you'll be notified.

In the end, I suppose it's just more defense in depth.

func downloadReleaseAsset(assetName string, destPath string, githubRepo GitHubRepo, tag string) (string, error) {
var assetPath string

if assetName == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This code got updated when I changed assetNames from a []string to a string. That being said, I think the error handling could generally be improved here, as discussed above.

@josh-padnick
Copy link
Contributor Author

All feedback responded to and tests passing, so merging now.

@josh-padnick josh-padnick merged commit 349ab84 into master Feb 23, 2018
@josh-padnick josh-padnick deleted the checksum branch February 23, 2018 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants