Support optimized fetching and caching #137
The performance comparisons to note:

Full `protofetch fetch` (416 MB, ~2 minutes):

```
% time protofetch fetch
 INFO Resolving github.com/grpc/grpc
 INFO Fetching dependencies source files...
 INFO Copying proto files from appbiotic descriptor...
 INFO Creating new worktree for grpc_health_v1 at /Users/kris/.cache/protofetch/dependencies/grpc_health_v1/b8a04acbbf18fd1c805e5d53d62ed9fa4721a4d1.
protofetch fetch  58.81s user 6.38s system 51% cpu 2:05.90 total
```

Shallow git clone (178 MB, 5 seconds):

```
% time git clone --depth=1 https://github.com/grpc/grpc
Cloning into 'grpc'...
remote: Enumerating objects: 13476, done.
remote: Counting objects: 100% (13476/13476), done.
remote: Compressing objects: 100% (8196/8196), done.
remote: Total 13476 (delta 4632), reused 10045 (delta 3867), pack-reused 0
Receiving objects: 100% (13476/13476), 19.37 MiB | 9.00 MiB/s, done.
Resolving deltas: 100% (4632/4632), done.
Updating files: 100% (12308/12308), done.
git clone --depth=1 https://github.com/grpc/grpc  1.13s user 1.07s system 41% cpu 5.265 total
```

Command-line tarball fetch of only the proto files (~4 seconds):

```
% time curl -sSL https://github.com/grpc/grpc/archive/b8a04acbbf18fd1c805e5d53d62ed9fa4721a4d1.tar.gz | tar -C protos --strip-components 3 -zxf - '**/*.proto'
curl -sSL   0.23s user 0.07s system 8% cpu 3.720 total
tar -C protos --strip-components 3 -zxf - '**/*.proto'  0.24s user 0.05s system 7% cpu 3.720 total
```

Repackaging the proto files into a …
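A minimal sketch of that repackaging idea, using locally created files in place of the `curl | tar` pipeline above so it runs without network access (all paths and file contents here are illustrative):

```shell
set -eu
# Stand-in for files extracted by the curl | tar pipeline above.
rm -rf /tmp/src /tmp/out /tmp/protos.tar.gz
mkdir -p /tmp/src/grpc/health/v1
printf 'syntax = "proto3";\n' > /tmp/src/grpc/health/v1/health.proto
printf 'not a proto\n' > /tmp/src/README.md

# Repackage only the .proto files into a compact, cacheable tarball.
( cd /tmp/src && find . -name '*.proto' -print0 | tar -czf /tmp/protos.tar.gz --null -T - )

# Later, stream-decode just those files on demand.
mkdir -p /tmp/out
tar -C /tmp/out -xzf /tmp/protos.tar.gz
find /tmp/out -name '*.proto'
```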
Hi @kriswuollett, thank you for opening the issue. There is definitely some room for optimization here; we just didn't have enough time to work on it. I will try to take a look.

The "expected" workflow is that you specify a tag or a branch in your …
Ah, yes, the lockfile helps. My previous experience with something similar was showing: sha256 checksums with Bazel's http_archive for external dependencies, which are not necessarily git repositories.
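The Bazel pattern referred to looks roughly like this (a sketch with a placeholder workspace name and checksum; only the archive URL scheme comes from this thread):

```python
# WORKSPACE sketch: http_archive pins exact archive content by sha256,
# independent of whether a tag or branch later moves.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "com_github_grpc_grpc",  # hypothetical workspace name
    urls = ["https://github.com/grpc/grpc/archive/b8a04acbbf18fd1c805e5d53d62ed9fa4721a4d1.tar.gz"],
    strip_prefix = "grpc-b8a04acbbf18fd1c805e5d53d62ed9fa4721a4d1",
    sha256 = "<checksum of the archive>",  # placeholder; Bazel rejects a mismatch
)
```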
- When we know the commit hash, only fetch this commit (and its ancestors).
- When we only have a revision/branch, only fetch the relevant refs (and their ancestors).

This makes fetches significantly faster. For example, for googleapis/googleapis, it decreases the time from 1m20s to about 30s. An even bigger improvement would be:

1. Shallow fetch. This is supported by libgit2, but I couldn't make it work.
2. Sparse checkout. This is not even supported by libgit2.

#137
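The two improvements above can be sketched with the plain git CLI (which, unlike libgit2, supports both). A throwaway local repository stands in for the remote so the example runs offline; paths and file contents are illustrative:

```shell
set -eu
rm -rf /tmp/demo-remote /tmp/demo-clone
# Throwaway "remote" with one pinned commit containing a proto and a big file.
git init -q /tmp/demo-remote
cd /tmp/demo-remote
git config user.email demo@example.com
git config user.name demo
# Needed when the requested commit is not a branch tip.
git config uploadpack.allowAnySHA1InWant true
mkdir -p proto
printf 'syntax = "proto3";\n' > proto/health.proto
printf 'large payload\n' > big_file.bin
git add . && git commit -qm "pinned commit"
SHA=$(git rev-parse HEAD)

# 1. Shallow-fetch exactly the pinned commit (depth 1, no history).
git init -q /tmp/demo-clone
cd /tmp/demo-clone
git remote add origin /tmp/demo-remote
git fetch -q --depth=1 origin "$SHA"

# 2. Sparse checkout: materialize only the proto files in the worktree.
git config core.sparseCheckout true
echo '*.proto' > .git/info/sparse-checkout
git checkout -q "$SHA"
find . -name '*.proto'
```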
Just started trying out `protofetch` upon a recommendation. However, the first thing I noticed is how slow it can be depending on the source, which is currently only git. The example I encountered was setting up the following dependency:

The `grpc/health/v1/health.proto` file is just 2416 B. It looks like protofetch mirrored the entire repo into `~/.cache/protofetch/github.com/grpc/grpc`, taking up 416 MB and about a minute to be ready. Performance is machine and network dependent, of course; I'm using an M2 Mac. When doing a shallow git clone myself, this is the output, which also shows network performance:

The shallow clone takes up less space, just 178 MB.
So my thought was: even if one could use a repo mirror to support multiple versions of different deps from the same source, would it really beat the efficiency, in practice, of a "shallow" fetch that strips out all but the proto files? Perhaps even wrapped in a `.tar.gz` that could just be streamed and decoded in memory when needed. I'd think actual git mirrors or clones would only be necessary if fetching git submodules were supported. It shouldn't be the user's fault that the proto repo is too large.
I also noticed that `revision` can be a tag or a hash. IMO both should be supported, with the hash used to confirm the tag when both are provided. Git tags are not constants, and being able to specify both serves as functional documentation rather than the manual code comment I'd otherwise be writing, as in the example above.

In any case, if there is ever a desire to make a potentially breaking config change, I'd think it would be great to support different fetch types, such as plain HTTP (tarball) with optional sha256 checks, even though the hash of a source such as a git source archive may not be guaranteed long-term on some platforms.
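A sketch of what pinning both could look like in a module descriptor; the field names below are hypothetical, not protofetch's actual schema, and the tag value is invented:

```toml
[dependencies.grpc]
url = "github.com/grpc/grpc"
tag = "v1.50.0"  # hypothetical: documents intent in a human-readable way
hash = "b8a04acbbf18fd1c805e5d53d62ed9fa4721a4d1"  # verifies the tag hasn't moved
```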