Question: Is it possible for git filter-repo to squash file changes to the initial commit that added the file #492

Open
Morph1984 opened this issue Jul 24, 2023 · 1 comment

Comments

@Morph1984

I have a repo with thousands of poorly compressed PNGs that were added across many commits.
As such, I pushed a commit which aggressively recompresses all the PNGs. This, however, increased the overall size of the repository, as deltas are tracked.
Therefore, I'd like to squash these changes into the initial commit(s) that added the files, so that it is as if the files were already compressed when they were added, thus removing any file deltas.

@newren
Owner

newren commented Aug 1, 2024

Sure. One way to do it is that if you have a program that recompresses a given png, perhaps invoked by a command line like

   recompress --png --level 8 ${PNG_FILE}

then you could run

   lint-history --relevant 'return filename.endswith(b".png")' recompress --png --level 8

and it'd go through history modifying all the png files by passing them to the recompress program and replacing their contents with whatever that command provided.
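
For intuition, the per-file step described above can be sketched roughly as follows. This is only an illustration of the idea, not lint-history's actual code: the helper name run_in_place is hypothetical, and it assumes the command takes the file path as its last argument and rewrites that file in place (as the recompress example above would).

```python
import os
import subprocess
import tempfile


def run_in_place(contents, command, suffix=".png"):
    """Write contents to a temp file, run command on it, return the new bytes.

    Hypothetical helper: assumes `command` (a list of arguments) accepts the
    file path as its final argument and modifies that file in place.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(contents)
        # Raises CalledProcessError if the command exits non-zero.
        subprocess.check_call(list(command) + [path])
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

With something like run_in_place(old_bytes, ["recompress", "--png", "--level", "8"]), the returned bytes would be what replaces the blob's contents.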

A second, independent way to do it would be to get all the object names of the blobs in question. You should be able to do that by running git log -1 --raw --no-abbrev ${COMMIT_WHERE_YOU_COMPRESSED_PNGS}. That would show output like:

:100755 100755 edf570fde099c0705432a389b96cb86489beda09 9cce52ae0806d695956dcf662cd74b497eaa7b12 M      foo.png
:100755 100755 644f7c55e1a88a29779dc86b9ff92f512bf9bc11 88b02e9e45c0a62db2f1751b6c065b0c2e538820 M      bar.png

then, using those object names and filenames, you could make a commit callback that just modifies the values, e.g.:

git filter-repo --commit-callback '
    for change in commit.file_changes:
        if change.filename == b"foo.png" and change.blob_id == b"edf570fde099c0705432a389b96cb86489beda09":
            change.blob_id = b"9cce52ae0806d695956dcf662cd74b497eaa7b12"
        if change.filename == b"bar.png" and change.blob_id == b"644f7c55e1a88a29779dc86b9ff92f512bf9bc11":
            change.blob_id = b"88b02e9e45c0a62db2f1751b6c065b0c2e538820"
'

Naturally, you'd have a much longer list than this where I only have two example if-statements and reassignments, but it shows the idea. This second method also relies on the fact that you didn't specify any filtering commands that would need the blob contents (such as a --blob-callback or --replace-text or something), but that seems safe since any blob filtering would kind of mess up what you're trying to do here anyway. (And if you are trying to do blob filtering of things other than pngs, you could just call filter-repo twice, once to fix up the pngs as shown here, and then the second invocation to make the other changes.)
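
Since writing thousands of if-statements by hand isn't practical, the callback body could be generated from the raw output instead. Below is a minimal sketch of that, under some assumptions: the function names (parse_raw_diff, render_callback, callback_for_commit) are made up for illustration, only "M" entries ending in .png are picked up, and paths are assumed to be plain (git may quote paths containing unusual characters, which this does not handle).

```python
import subprocess


def parse_raw_diff(raw_output):
    """Map (path, old blob id) -> new blob id from `git log --raw --no-abbrev` text."""
    replacements = {}
    for line in raw_output.splitlines():
        if not line.startswith(":"):
            continue  # skip the commit header and message lines
        # Real git output separates the status from the path with a tab.
        meta, _, path = line.partition("\t")
        fields = meta.split()
        # fields: [":<old mode>", "<new mode>", "<old id>", "<new id>", "<status>"]
        if len(fields) != 5 or fields[4] != "M" or not path.endswith(".png"):
            continue
        replacements[(path.encode(), fields[2].encode())] = fields[3].encode()
    return replacements


def render_callback(replacements):
    """Build the Python text to pass to `git filter-repo --commit-callback`."""
    return (
        f"replacements = {replacements!r}\n"
        "for change in commit.file_changes:\n"
        "    new_id = replacements.get((change.filename, change.blob_id))\n"
        "    if new_id is not None:\n"
        "        change.blob_id = new_id\n"
    )


def callback_for_commit(commit_id):
    """Run git log on the recompression commit and return the callback text."""
    raw = subprocess.run(
        ["git", "log", "-1", "--raw", "--no-abbrev", commit_id],
        capture_output=True, text=True, check=True,
    ).stdout
    return render_callback(parse_raw_diff(raw))
```

The text returned by callback_for_commit could be saved to a file or passed directly as the --commit-callback argument. Note that the generated body rebuilds the dictionary for every commit, which is simple but not the fastest option for very long lists.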

Anyway, does that help? (Or would it, if I had gotten back to you about a year sooner?)
