Add support for validating GitHub Release Asset checksums #34

Merged
merged 5 commits into from
Feb 23, 2018
Changes from 3 commits
3 changes: 3 additions & 0 deletions .gitignore
@@ -2,6 +2,9 @@
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
.idea

# Covers Visual Studio Code
.vscode

# If the binary gets built during dev, don't commit it.
fetch
fetch.exe
42 changes: 23 additions & 19 deletions README.md
@@ -3,6 +3,24 @@
fetch makes it easy to download files, folders, or release assets from a specific commit, branch, or tag of
a public or private GitHub repo.

#### Motivation

[Gruntwork](http://gruntwork.io) helps software teams get up and running on AWS with DevOps best practices and
world-class infrastructure in about a day. Sometimes we publish scripts and binaries that clients use in their
infrastructure, and we want an easy way to install a specific version of one of those scripts and binaries. While this
is fairly straightforward to do with public GitHub repos, as you can usually `curl` or `wget` a public URL, it's much
trickier to do with private GitHub repos, as you have to make multiple API calls, parse JSON responses, and handle
authentication. Fetch makes it possible to handle all of these cases with a one-liner.

#### Features

- Download from a specific git tag, branch, or commit SHA.
- Download a single file, a subset of files, or all files from the repo.
- Download a binary asset from a specific release.
- Verify the SHA256 or SHA512 checksum of a binary asset.
- Download from public repos, or from private repos by specifying a [GitHub Personal Access Token](https://help.github.com/articles/creating-an-access-token-for-command-line-use/).
- When specifying a git tag, you can specify either exactly the tag you want, or a [Tag Constraint Expression](#tag-constraint-expressions) to do things like "get the latest non-breaking version" of this repo. Note that fetch assumes git tags are specified according to [Semantic Versioning](http://semver.org/) principles.

#### Quick examples

Download folder `/baz` from tag `0.1.3` of a GitHub repo and save it to `/tmp/baz`:
@@ -19,23 +37,6 @@ fetch --repo="https://github.com/foo/bar" --tag="0.1.5" --release-asset="foo.exe

See more examples in the [Examples section](#examples).

#### Features

- Download from a specific git tag, branch, or commit SHA.
- Download a single file, a subset of files, or all files from the repo.
- Download a binary asset from a specific release.
- Download from public repos, or from private repos by specifying a [GitHub Personal Access Token](https://help.github.com/articles/creating-an-access-token-for-command-line-use/).
- When specifying a git tag, you can specify either exactly the tag you want, or a [Tag Constraint Expression](#tag-constraint-expressions) to do things like "get the latest non-breaking version" of this repo. Note that fetch assumes git tags are specified according to [Semantic Versioning](http://semver.org/) principles.

#### Motivation

[Gruntwork](http://gruntwork.io) helps software teams get up and running on AWS with DevOps best practices and
world-class infrastructure in about 2 weeks. Sometimes we publish scripts and binaries that clients use in their
infrastructure, and we want an easy way to install a specific version of one of those scripts and binaries. While this
is fairly straightforward to do with public GitHub repos, as you can usually `curl` or `wget` a public URL, it's much
trickier to do with private GitHub repos, as you have to make multiple API calls, parse JSON responses, and handle
authentication. Fetch makes it possible to handle all of these cases with a one-liner.

## Installation

Download the fetch binary from the [GitHub Releases](https://github.com/gruntwork-io/fetch/releases) tab.
@@ -64,8 +65,11 @@ The supported options are:
the `/folder` path and all files below it). By default, all files are downloaded from the repo unless `--source-path`
or `--release-asset` is specified. This option can be specified more than once.
- `--release-asset` (**Optional**): The name of a release asset--that is, a binary uploaded to a [GitHub
Release](https://help.github.com/articles/creating-releases/)--to download. This option can be specified more than
once. It only works with the `--tag` option.
Release](https://help.github.com/articles/creating-releases/)--to download. It only works with the `--tag` option.
- `--release-asset-checksum` (**Optional**): The checksum that a release asset should have. Fetch will fail if this value
is non-empty and does not match the checksum computed by Fetch.
- `--release-asset-checksum-algo` (**Optional**): The algorithm fetch will use to compute a checksum of the release asset.
Member

It seems equally important to have checksums for modules from the --modules param. Probably the simplest option is to publish the commit ID (sha1) and compare that to the commit ID of the repo we just downloaded via git clone.

Contributor Author

Do you mean the --modules pattern of gruntwork-install? Well, can't you just use Fetch to download the exact commit you want to achieve what you're suggesting?

Member

Yes, --modules. We typically use tags for specifying the version we want. Should our recommendation be that users specify both the tag and the commit ID?

Contributor Author

Well, keep in mind that fetch allows you to use tag constraint expressions, but I see where you're going with this. Yes, I think we should permit using both a --tag and a --commit-id param at the same time. Fetch can then error out if the two don't match. I think I'll file a GitHub issue for that for now, though.

Contributor Author

#35

Supported values are `sha256` and `sha512`.
- `--github-oauth-token` (**Optional**): A [GitHub Personal Access
Token](https://help.github.com/articles/creating-an-access-token-for-command-line-use/). Required if you're
downloading from private GitHub repos. **NOTE:** fetch will also look for this token using the `GITHUB_OAUTH_TOKEN`
63 changes: 63 additions & 0 deletions checksum.go
@@ -0,0 +1,63 @@
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"hash"
	"io"
	"os"
)

func verifyChecksumOfReleaseAsset(assetPath, checksum, algorithm string) *FetchError {
	computedChecksum, err := computeChecksum(assetPath, algorithm)
	if err != nil {
		return newError(ERROR_WHILE_COMPUTING_CHECKSUM, err.Error())
Member

It's generally nicer when writing tests and debugging to return a different type for each, well, type of error. All you do is define a struct (or even a type alias) and add an Error() string method to it.

Contributor Author

I hear you. This was an early Golang tool I wrote and I didn't want to refactor all the errors throughout the code so I stuck with the existing idiom.
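
The typed-error approach suggested above could look roughly like the following. This is a minimal sketch, not code from this PR: `ChecksumMismatchError`, `UnsupportedAlgorithmError`, `compareChecksums`, and `isChecksumMismatch` are hypothetical names chosen for illustration, and `errors.As` assumes Go 1.13 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// ChecksumMismatchError is a hypothetical struct-based error type that
// carries the details of a failed checksum comparison.
type ChecksumMismatchError struct {
	AssetPath string
	Expected  string
	Actual    string
}

// Error makes ChecksumMismatchError satisfy the built-in error interface.
func (e ChecksumMismatchError) Error() string {
	return fmt.Sprintf("expected checksum %s but computed %s for release asset at %s",
		e.Expected, e.Actual, e.AssetPath)
}

// UnsupportedAlgorithmError is a hypothetical error type defined over string,
// for cases where a single value is all the context the error needs.
type UnsupportedAlgorithmError string

func (e UnsupportedAlgorithmError) Error() string {
	return fmt.Sprintf("the checksum algorithm %q is not supported", string(e))
}

// compareChecksums returns a ChecksumMismatchError when the two values differ.
func compareChecksums(assetPath, expected, actual string) error {
	if expected != actual {
		return ChecksumMismatchError{AssetPath: assetPath, Expected: expected, Actual: actual}
	}
	return nil
}

// isChecksumMismatch reports whether err is, or wraps, a ChecksumMismatchError.
func isChecksumMismatch(err error) bool {
	var mismatch ChecksumMismatchError
	return errors.As(err, &mismatch)
}

func main() {
	err := compareChecksums("/tmp/asset", "abc", "def")
	if isChecksumMismatch(err) {
		fmt.Println("checksum mismatch:", err)
	}
}
```

With distinct types, a test can assert on the kind of failure rather than matching on message text or a numeric error code.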

	}
	if computedChecksum != checksum {
		return newError(CHECKSUM_DOES_NOT_MATCH, fmt.Sprintf("Expected to receive checksum value %s, but instead got %s for Release Asset at %s", checksum, computedChecksum, assetPath))
Member

You may want to provide a bit more info about what this could mean in this error message. E.g., "This means that either you are using the wrong checksum value in your call to fetch (e.g., perhaps you updated the version of the module you're installing but not the checksum?) or that someone has replaced the asset with a potentially dangerous one and you should be very careful about proceeding."

Contributor Author

That's a good idea. I'll update now.
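
One way the more informative message might be assembled is sketched below. The helper name `checksumMismatchMessage` and the exact wording are assumptions for illustration; the text that was actually committed is not shown in this diff.

```go
package main

import "fmt"

// checksumMismatchMessage is a hypothetical helper that builds a mismatch
// message explaining the two likely causes the reviewer describes.
func checksumMismatchMessage(expected, actual, assetPath string) string {
	return fmt.Sprintf(
		"Expected checksum %s, but computed %s for release asset at %s. "+
			"Either the expected checksum is wrong (for example, the version was updated but the checksum was not), "+
			"or the asset has been replaced and may be unsafe to use.",
		expected, actual, assetPath)
}

func main() {
	fmt.Println(checksumMismatchMessage("abc", "def", "/tmp/asset"))
}
```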

	}

	fmt.Printf("Checksum matches!\n")
Member

Do we not use any sort of logging library with fetch?

Contributor Author

I was surprised about that as well. Again, I didn't want to invest time in updating those kinds of things so I stuck with the existing idiom.


	return nil
}

func computeChecksum(filePath string, algorithm string) (string, error) {
	var checksum string
Member

This seems like a useless mutable value to have. I'd just return "", err in all error cases below and call return hasherToString in the non-error cases.

Contributor Author

I sometimes like to declare the return var at the top, but your suggestion is probably a little cleaner. Updated.


	file, err := os.Open(filePath)
	if err != nil {
		return checksum, err
	}
	defer file.Close()

	switch algorithm {
	case "sha256":
Member

Seems like this case statement is identical in all but the hasher used. Perhaps refactor into a getHasher method that returns sha256.New() or sha512.New() and then simplify this method to just a single path that calls io.Copy and hasherToString.

Contributor Author

Good suggestion. Updated.
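
The two suggestions above (drop the mutable `checksum` variable and factor the algorithm choice into `getHasher`) might combine as in the following sketch. It is an illustration of the reviewer's proposal, not necessarily the code that was merged; `computeChecksumOfReader` is a hypothetical helper added here so the hashing path can be exercised without a file on disk.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"hash"
	"io"
	"os"
	"strings"
)

// getHasher maps an algorithm name to a fresh hash.Hash, or returns an error
// for an unsupported name.
func getHasher(algorithm string) (hash.Hash, error) {
	switch algorithm {
	case "sha256":
		return sha256.New(), nil
	case "sha512":
		return sha512.New(), nil
	default:
		return nil, fmt.Errorf("the checksum algorithm %q is not supported", algorithm)
	}
}

// computeChecksumOfReader (hypothetical) streams r through the chosen hasher
// and returns the hex-encoded digest.
func computeChecksumOfReader(r io.Reader, algorithm string) (string, error) {
	hasher, err := getHasher(algorithm)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(hasher, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(hasher.Sum(nil)), nil
}

// computeChecksum opens the file and delegates to the single hashing path.
func computeChecksum(filePath, algorithm string) (string, error) {
	file, err := os.Open(filePath)
	if err != nil {
		return "", err
	}
	defer file.Close()

	return computeChecksumOfReader(file, algorithm)
}

func main() {
	sum, err := computeChecksumOfReader(strings.NewReader("hello"), "sha256")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(sum)
}
```

Every error path returns `"", err` directly, and supporting another algorithm only requires a new case in `getHasher`.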

		fmt.Printf("Computing checksum of release asset using SHA256\n")
		hasher := sha256.New()
		if _, err := io.Copy(hasher, file); err != nil {
			return checksum, err
		}

		checksum = hasherToString(hasher)
	case "sha512":
		fmt.Printf("Computing checksum of release asset using SHA512\n")
		hasher := sha512.New()
		if _, err := io.Copy(hasher, file); err != nil {
			return checksum, err
		}

		checksum = hasherToString(hasher)
	default:
		return checksum, fmt.Errorf("The checksum algorithm \"%s\" is not supported", algorithm)
	}

	return checksum, nil
}

// Convert a hasher instance (the common interface used by all Golang hashing functions) to the string value of that hasher
func hasherToString(hasher hash.Hash) string {
	return hex.EncodeToString(hasher.Sum(nil))
}
51 changes: 51 additions & 0 deletions checksum_test.go
@@ -0,0 +1,51 @@
package main

import (
	"io/ioutil"
	"testing"

	"github.com/stretchr/testify/assert"
)

const SAMPLE_RELEASE_ASSET_GITHUB_REPO_URL = "https://github.com/gruntwork-io/health-checker"
const SAMPLE_RELEASE_ASSET_VERSION = "v0.0.2"
const SAMPLE_RELEASE_ASSET_NAME = "health-checker_linux_amd64"

// Checksums can be computed by running "shasum -a [256|512] /path/to/file" on any UNIX system
const SAMPLE_RELEASE_ASSET_CHECKSUM_SHA256 = "4314590d802760c29a532e2ef22689d4656d184b3daa63f96bc8b8f76f5d22f0"
const SAMPLE_RELEASE_ASSET_CHECKSUM_SHA512 = "28d9e487c1001e3c28d915c9edd3ed37632f10b923bd94d4d9ac6d28c0af659abbe2456da167763d51def2182fef01c3f73c67edf527d4ed1389a28ba10db332"
Member

Where do users get these checksums from? If it's from the /releases page, then anyone who can replace the asset with a fake one could also replace the checksum with a fake one. To be truly useful, wouldn't we have to publish these checksums in some totally separate location? Or are you making the assumption that when someone goes to get the checksum value, they are verifying that the code is valid, and this merely ensures no one can pull the carpet out from under them later?

Contributor Author

Exactly. When I went to download a third-party health checker for Kafka, I realized how risky (and yet common!) including such a library is. If an attacker wanted to exploit a library that many people use, one way is to find a way into someone's GitHub account (leaked GitHub token?), leave the commits unaltered and update only the release asset (where in most cases no checksum at all is published).

In this case, I'm making the assumption that at the time you download the code, you have an opportunity to look around the repo and conclude for yourself that it appears not to be compromised. There's only so much you can do here, of course, but at least if something malicious happens in the future, when you've long forgotten about this library, you'll be notified.

In the end, I suppose it's just more defense in depth.


func TestVerifyReleaseAsset(t *testing.T) {
	tmpDir := mkTempDir(t)

	githubRepo, err := ParseUrlIntoGitHubRepo(SAMPLE_RELEASE_ASSET_GITHUB_REPO_URL, "")
	if err != nil {
		t.Fatalf("Failed to parse sample release asset GitHub URL into Fetch GitHubRepo struct: %s", err)
	}

	assetPath, fetchErr := downloadReleaseAsset(SAMPLE_RELEASE_ASSET_NAME, tmpDir, githubRepo, SAMPLE_RELEASE_ASSET_VERSION)
	if fetchErr != nil {
		t.Fatalf("Failed to download release asset: %s", fetchErr)
	}

	checksumSha256, fetchErr := computeChecksum(assetPath, "sha256")
	if fetchErr != nil {
		t.Fatalf("Failed to compute file checksum: %s", fetchErr)
	}

	checksumSha512, fetchErr := computeChecksum(assetPath, "sha512")
	if fetchErr != nil {
		t.Fatalf("Failed to compute file checksum: %s", fetchErr)
	}

	assert.Equal(t, SAMPLE_RELEASE_ASSET_CHECKSUM_SHA256, checksumSha256, "SHA256 checksum of sample asset failed to match.")
	assert.Equal(t, SAMPLE_RELEASE_ASSET_CHECKSUM_SHA512, checksumSha512, "SHA512 checksum of sample asset failed to match.")
}

func mkTempDir(t *testing.T) string {
	tmpDir, err := ioutil.TempDir("", "")
	if err != nil {
		t.Fatalf("Failed to create temp directory: %s", err)
	}

	return tmpDir
}
2 changes: 2 additions & 0 deletions fetch_error_constants.go
@@ -8,3 +8,5 @@ const INVALID_GITHUB_TOKEN_OR_ACCESS_DENIED = 401
const REPO_DOES_NOT_EXIST_OR_ACCESS_DENIED = 404

const FAILED_TO_DOWNLOAD_FILE = 500
const CHECKSUM_DOES_NOT_MATCH = 510
const ERROR_WHILE_COMPUTING_CHECKSUM = 520