Reducing nu_scripts git repo size by removing images and some obsolete files and rewriting history #970
Sounds good to me. I don't know anything about git-lfs. Does it change the way you interact with the repo?
@fdncred Users who have cloned the 'nu_scripts' repository without installing 'git-lfs' will see text files in place of binary files, like this:
For those who have it installed, nothing will change. On the GitHub site, images in Markdown documents will be displayed as usual, and no changes are needed. Here are instructions on how to move existing files to 'git-lfs': https://docs.github.com/en/repositories/working-with-files/managing-large-files/moving-a-file-in-your-repository-to-git-large-file-storage

In short, we will need to use 'filter-repo' to rewrite history and remove the binary files. But before that, we will need to accept or reject the current PRs. I've done this only once before, so I will review all the steps and let you know what's needed. I will rehearse everything on my clones to make sure that everything goes well, and then show you the results. Does that sound okay?
What happens when someone doesn't have git lfs installed after we've made this change?
In place of
Those who have git-lfs will see the regular PNG image.
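For reference, the text file that replaces an image in a clone without git-lfs is a small pointer in the Git LFS spec v1 format: a `version` line, the SHA-256 `oid` of the real content, and its `size` in bytes. As an illustration only, the hypothetical `lfs_pointer` helper below prints the pointer that would stand in for a given file (git-lfs itself can preview one with `git lfs pointer --file=<path>`). It assumes either `sha256sum` (GNU coreutils) or `shasum` (macOS) is available.

```sh
# Hypothetical helper (not part of git-lfs): print the Git LFS pointer
# that a clone without git-lfs would contain instead of the real file.
lfs_pointer() {
  file=$1
  # SHA-256 of the file contents, via GNU coreutils or macOS shasum.
  if command -v sha256sum >/dev/null 2>&1; then
    oid=$(sha256sum "$file" | cut -d ' ' -f 1)
  else
    oid=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
  fi
  # Size in bytes; tr strips the padding some wc implementations add.
  size=$(wc -c < "$file" | tr -d ' ')
  printf 'version https://git-lfs.github.com/spec/v1\n'
  printf 'oid sha256:%s\n' "$oid"
  printf 'size %s\n' "$size"
}
```

For example, `lfs_pointer themes/screenshots/github-light.png` prints the three-line pointer for that screenshot, which is roughly what someone without git-lfs would find in its place.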
This operation has benefits and considerations that I propose to discuss and communicate with the community. However, I'm still gathering information, and these are just intermediate results. I have tested all the operations, and the results can be found here:
Benefits:
Considerations and Caveats:
I've just found out that 'git lfs' bandwidth is billed, and only 1 GB is free. Plus, there are other downsides, like losing the signatures of signed commits and PR history pointing to the wrong commits. Even though a smaller repo would be much tidier, the downsides are quite noticeable. I will stop here.
That said, for new binary objects it would be better to use 'git lfs' to avoid increasing the repo size further.
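A minimal sketch of that, assuming `*.png` is the pattern we want to route through LFS: `git lfs track '*.png'` records the pattern in `.gitattributes` with the attributes shown below. The hypothetical `lfs_track` helper writes the same line directly, and only once, so it works even on a machine without git-lfs. Contributors who add images still need git-lfs installed (`git lfs install`) for new files to actually be stored in LFS, and this does not shrink the existing history.

```sh
# Hypothetical helper: add a Git LFS tracking rule to .gitattributes,
# the same line `git lfs track '<pattern>'` would write. Idempotent.
lfs_track() {
  pattern=$1
  line="$pattern filter=lfs diff=lfs merge=lfs -text"
  touch .gitattributes
  # -x: whole-line match, -F: fixed string (the pattern contains '*').
  grep -qxF -- "$line" .gitattributes || printf '%s\n' "$line" >> .gitattributes
}
```

Then commit the rule: `lfs_track '*.png' && git add .gitattributes && git commit -m 'Track PNG files with Git LFS'`.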
Another thought is to put the screenshots in another repo and have a submodule in this repo. I think then you could clone and only get the scripts unless you wanted both. I don't really know how all that works though.
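To check how the submodule variant would behave, here is a throwaway sketch in which local repositories stand in for the two GitHub repos (the names `screenshots` and `scripts` and the file contents are hypothetical). A regular `git clone` checks out the scripts and leaves the submodule directory empty; the images are fetched only by `git submodule update --init` (or by cloning with `--recurse-submodules`).

```sh
# Demo of the "screenshots in a separate repo + submodule" layout,
# using throwaway local repositories.
set -eu
work=$(mktemp -d)
cd "$work"

# git with a throwaway identity. protocol.file.allow=always is needed only
# because this demo uses local paths as submodule URLs (recent Git versions
# block the file transport for submodules by default).
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c protocol.file.allow=always "$@"
}

# 1. A separate repo that holds only the screenshots.
g init -q screenshots
printf 'not really a png\n' > screenshots/github-light.png
g -C screenshots add github-light.png
g -C screenshots commit -q -m "Add theme screenshot"

# 2. The scripts repo references it as a submodule at themes/screenshots.
g init -q scripts
g -C scripts submodule --quiet add "$work/screenshots" themes/screenshots
g -C scripts commit -q -m "Add screenshots submodule"

# 3. A plain clone gets the scripts; themes/screenshots exists but is empty.
g clone -q scripts plain-clone
ls -A plain-clone/themes/screenshots    # prints nothing

# 4. The images are downloaded only when someone asks for them.
g -C plain-clone submodule --quiet update --init themes/screenshots
ls plain-clone/themes/screenshots       # github-light.png
```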
I like this variant, and it is feasible with the workflow that I described. But again, it will also involve rewriting history to reduce the size of this repo, which might be a bit painful for current contributors and their forks. I think I should ask the community when I have the energy for that.
Personally I think that's a hit we're ultimately going to have to take, so probably the sooner the better. I'd recommend keeping the issue open, since it is an issue, even if we don't have a full solution just yet.
(Title changed from "…git-lfs objects to reduce the repo size" to "Reducing nu_scripts git repo size by removing images and some obsolete files and rewriting history".)
Here is the rationale and the actions that I propose.

```nu
# Let's check the size of the `.git` folder
> du .git/ | get 0.physical
236.2 MB

# List every blob in the pack with its size. `size` is the uncompressed object size;
# `size-in-packfile` is its compressed (and possibly deltified) size in the pack.
> let a = git verify-pack -v .git/objects/pack/*.idx | lines | split column -r ' +' | rename SHA-1 type size size-in-packfile offset-in-packfile depth base-SHA-1 | where type == 'blob' | select SHA-1 size | into filesize size
# Map each object to a path it appears under
> let b = git rev-list --objects --all | lines | split column -r ' +' | rename SHA-1 filename
> let sizes = $a | join $b SHA-1 | sort-by size -r

# Let's examine the largest files
> $sizes | first 5 | print
╭─#─┬──────────────────SHA-1───────────────────┬───size───┬──────────────────────filename───────────────────────╮
│ 0 │ 87cd314f73c3bd004ea52b060d8a8d6bf40fcbe7 │ 1.3 MB │ custom-completions/auto-generate/completions/mvn.nu │
│ 1 │ 09e461721303572ea79c390feec763c498b68335 │ 369.0 KB │ assets/gh-emoji.json │
│ 2 │ c2dc98f8c9aa296affb97d2dbeb84f2c6dc09adf │ 241.6 KB │ themes/screenshots/github-light-colorblind.png │
│ 3 │ 9a258215d6435fac43dbe125b8cf2d77a6cc5b82 │ 241.3 KB │ themes/screenshots/github-light-default.png │
│ 4 │ e8557e9b8cbb79b0705792cb316e46bfd98c5559 │ 240.6 KB │ themes/screenshots/github-light.png │
╰─#─┴──────────────────SHA-1───────────────────┴───size───┴──────────────────────filename───────────────────────╯

# Notice that `custom-completions/auto-generate/completions/mvn.nu` seems broken and therefore useless,
# so I propose to delete it as well.

# Let's check how much space non-`.png` files occupy
> $sizes | where filename !~ '.*\.png' | get size | math sum | print
7.0 MB

# Let's check the size of `.png` files
> $sizes | where filename =~ '\.png' | get size | math sum | print
253.4 MB

# Now we execute the command that removes the broken completions in `mvn.nu` and all binary `.png` files
> python3 /Users/user/.local/bin/git-filter-repo --path custom-completions/auto-generate/completions/mvn.nu --path-glob '**/*.png' --invert-paths
> du .git/ | get 0.physical
4.4 MB

# The .git folder size was reduced from 236.2 MB to 4.4 MB.

# And now I push the results to my home repo
# (git-filter-repo removes the `origin` remote after a rewrite, so a new remote is added)
> git remote add mu https://github.com/maxim-uvarov/nu_scripts_reduced_size
> git push mu main --force
```
Is there a way to render the themes as text/html/markdown/ansi so we could store that instead of images?
Possibly? I'm not a CSS whiz, so I'm not sure how difficult it would be. As for Markdown, we'd have to find a library that supports RGB colors through ANSI. I'm not sure whether Shiki does. The one used by Discord, for instance, only supports color names. I found that out the hard way when I tried to copy ANSI output from Nushell to Discord, only to find out that my theme was the problem. I had to switch to
Some additional thoughts on this. I've been playing with NUPM and other options for installing packages, and it's also slow. What should be a split-second download takes ~20 seconds on a 1 Gbps link. Even at 250 MB, I think we need to fix it. It may not sound like much for a repo, but this is also a package that we want Nushell users (not just developers) to be able to install. There are two stages to this:
We'll lose all stars if we start a new repo. Not really interested in that.
Also, there is some discussion on the topic in the general channel on Discord and in this thread
I don't see much of a problem with the history of PRs and commits. This is the commit history of my fork, where I have run
For example, in this commit, maxim-uvarov/nu_scripts_reduced_size@1a5fac0, we can see the PR number. With this number we can find the PR and go through all the comments there. Links to hashes will continue to work. The only difference is that the commit pages will say that those commits don't belong to the current repository:
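Since commit messages survive the rewrite, the PR number can be recovered from the commit itself. A minimal sketch, assuming the subject carries the PR number the way GitHub squash merges write it (`Title (#123)`) or the way merge commits do (`Merge pull request #123 from ...`); `pr_url_for_commit` and `REPO_URL` are hypothetical names.

```sh
# Hypothetical helper: print the GitHub PR URL referenced by a commit subject.
# REPO_URL is an assumption; point it at the repository the PRs live in.
REPO_URL=${REPO_URL:-https://github.com/nushell/nu_scripts}

pr_url_for_commit() {
  subject=$(git log -1 --format=%s "$1") || return 1
  # Take the last "#<digits>" in the subject (squash merges append it at the end).
  num=$(printf '%s\n' "$subject" | grep -o '#[0-9][0-9]*' | tail -n 1 | tr -d '#')
  [ -n "$num" ] || return 1
  printf '%s/pull/%s\n' "$REPO_URL" "$num"
}
```

Run inside a clone, `pr_url_for_commit <commit>` prints the PR link for that commit, or fails if the subject carries no PR number.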
Rewriting the history is supposed to be done by a trusted member of the core team. This means that the only things that will change are:
Messages and authors of commits, along with their dates, will remain unchanged.

The consequences don't seem too bad to me because:

- We can keep a backup clone of the repo that won't be synced anymore. All the hashes and commits there will remain unchanged.
- We can write some Markdown explaining what happened, with a link to the backup clone, and commit it immediately after the rewrite.
- We can add a comment to all completed PRs on GitHub explaining where to find the history of changes if needed (even though all the links to hashes will still work, GitHub will indicate that the commits no longer belong to the repo).

So, from my point of view, we won't harm the repository's archeology much. The most problematic issues, as I see it, are the current unmerged PRs, forks that will need to be resynced or rebased, and the related inconveniences.
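The backup clone mentioned above can be taken with `git clone --mirror`, which creates a bare repository holding every ref the remote advertises. A minimal sketch; the helper name and the backup directory name are hypothetical.

```sh
# Hypothetical helper: keep a frozen, full-history copy before the rewrite.
# --mirror creates a bare repo and copies every advertised ref (branches,
# tags, and on GitHub also refs/pull/*), so the old commit hashes stay
# reachable in the backup.
backup_before_rewrite() {
  src=$1    # e.g. https://github.com/nushell/nu_scripts.git
  dest=$2   # new directory for the backup, e.g. nu_scripts-backup.git
  git clone --quiet --mirror "$src" "$dest"
}
```

Usage: `backup_before_rewrite https://github.com/nushell/nu_scripts.git nu_scripts-backup.git`. To publish the backup as its own repository, pushing branches and tags explicitly (`git push --all <backup-url>`, then `git push --tags <backup-url>`) is probably safer than `git push --mirror`, which GitHub is likely to reject for the `refs/pull/*` refs.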
Probably worth adding this message from @devyn on Discord in here for the context/history:
Related: PoC
It's progress 🚀
My current progress on
That's very cool! Can't wait to see your script.
Now the repo size seems huge to me:
Apparently, that's because images are stored here.
Can we reduce the size of the repo by storing images using git-lfs? I can do all the preparations and transfer the results to contributors with push rights.