Filter file contents (was: Suggestion: add --as-binary option to convert line endings) #95
My spontaneous reaction is that this is outside the scope of fast-export. Filter branch is the tool to use when you have to mangle the content of the version controlled files. Additionally I see a number of problems with the implementation (just to be clear, even if these problems are addressed I'm not very likely to include the proposed functionality):
Indeed, using filter-branch after fast-import is the usual approach. However, converting line endings is the most common, and often the only, post-import operation when migrating repos from hg to git. Wouldn't it be convenient to handle that with just an extra flag or two instead of running filter-branch (which is quite slow) afterwards? I believe restricting the conversion to LF only is justified because git uses LF line endings for text files internally, and it takes some contortions to force git to store them in a different format even if one wanted to. I implemented the rest of your very reasonable suggestions.
[edited to add an hg-hash parameter to the filter] As I said, I think this is outside the scope of fast-export. Since your first proposal, this simple filter has already been extended to handle one more line ending. Pretty soon someone else will want support for converting character encodings, running clang-format on C code, stripping old forgotten What about adding a new flag
How about this one? I pass
@atykhyy can you please add this option to
OK, thanks for the quick reaction. But I don't think I understand how to use the filter. I guess the filtering script has no access to the filtered files; it simply executes in a git repo with an empty index... So HOW do I properly change the contents? Should I amend each file in each commit? I still don't see how to do that with only hashes and paths that don't exist in the filesystem... To be honest, I expected the filtering script to have direct access to the files, but if there's a way to convert line endings or make global substitutions like this, I'd be glad to know...
Yes, amend each file. There is no way the filtering script can have direct access to the files in Git; they aren't imported until the output of
instead. Send the output of
Ah, so the file contents are piped into the script and its stdout is used as the new file contents! That's super convenient. I came up with the following script for the documentation:

```sh
#!/bin/sh
#
# This script is executed in your **git** repository root
# for every file in every commit.
#
# The file contents are piped into it, and its output is used
# as the new file contents. It also takes three arguments:
path=$1
hash=$2
is_binary=$3
# Convert line endings without touching binaries:
if [ "$is_binary" -eq 1 ]; then
  cat        # binary: pass through unchanged
else
  dos2unix   # text: CRLF -> LF
fi
```
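For anyone who would rather not depend on dos2unix, here is a hypothetical Python counterpart of that filter (not part of fast-export). It assumes the contract described in this thread: the original contents arrive on stdin, whatever is written to stdout becomes the new contents, and the arguments are path, hash and is_binary ("1" or "0"). It only rewrites CRLF pairs to LF; lone CRs and any other dos2unix behaviour are out of scope.

```python
#!/usr/bin/env python3
# Hypothetical Python counterpart of the sh filter above (not part of
# fast-export). Assumed contract: contents on stdin, new contents on
# stdout, arguments <path> <hash> <is_binary>.
import sys


def convert(data, is_binary):
    """Return the filtered contents for one file."""
    if is_binary == "1":
        # Mercurial reports the file as binary: pass it through untouched.
        return data
    # Convert CRLF to LF only; lone CRs (old Mac line endings) are kept.
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    path, file_hash, is_binary = sys.argv[1:4]  # path/hash unused here
    sys.stdout.buffer.write(convert(sys.stdin.buffer.read(), is_binary))
```

The script acts as a filter only when called with exactly three arguments, which is how the exporter invokes it; the file name you save it under is up to you.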
While doing the documentation, could you add a section to the README explaining the feature? Why not use the dos2unix example from the discussion with @Himura2la?
Otherwise this looks good. Address the comments and squash it into a single commit, and I'll merge it. I don't know why the comment about binascii.hexlify below shows up as "outdated", but have a look at it too.
hg-fast-export.py
```python
f=ctx.filectx(file)
d=f.data()
if filter_contents:
    a=filter_contents + [filename,binascii.hexlify(f.filenode()),'1' if f.isbinary() else '0']
```
I don't know, that's why I'm asking, but I would expect a function for converting the raw node to hex to be available already in one of the Mercurial packages. Why not use it instead of pulling in an additional package?
I'll look for one.
OK. What do you think about
If the
Added an example to the README and squashed.
Some variable naming nitpicks, looks good otherwise.
README.md
```sh
# $2 = Mercurial's hash of the file
# $3 = "1" if Mercurial reports the file as binary, otherwise "0"

if "$3"; then dos2unix; else cat; fi
```
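Try

```sh
X=1
if "$X" ; then echo true ; else echo false ; fi
```

and you'll see that this doesn't work: sh tries to run `1` as a command, which fails, so the condition is always false. You need `if [ "$3" = "1" ]; ...` (POSIX `[` uses a single `=`).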
hg-fast-export.py
```python
d=f.data()
if filter_contents:
    import subprocess
    a=filter_contents + [filename,node.hex(f.filenode()),'1' if f.isbinary() else '0']
```
`a`? Give it a more descriptive name, please. What about `filter_cmd`?
hg-fast-export.py
```python
if encoding:
    filename=file.decode(encoding).encode('utf8')
else:
    filename=file
f=ctx.filectx(file)
```
`f`: what is it? Call it `file_info` so that I don't mix it up with all the other file-related variables.
hg-fast-export.py
```python
import subprocess
a=filter_contents + [filename,node.hex(f.filenode()),'1' if f.isbinary() else '0']
try:
    p=subprocess.Popen(a,stdin=subprocess.PIPE,stdout=subprocess.PIPE)
```
`subproc` instead of `p`.
Done.
Fixed variable naming.
Merged, thank you for your contribution. |
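The snippets reviewed above build an argument list (the filter command plus path, hash and a "1"/"0" binary flag) and start the filter with `subprocess.Popen`. The standalone sketch below shows that host-side step under the same contract, using the names suggested in review (`filter_cmd`, `subproc`); the function name `run_filter` and the error handling are illustrative assumptions, not the merged code.

```python
import subprocess
import sys


def run_filter(filter_cmd, filename, file_hash, is_binary, data):
    """Pipe one file's contents through an external filter command.

    filter_cmd is the filter as an argv list, e.g. ["sh", "filter.sh"].
    The filter gets <path> <hash> <"1"|"0"> appended as arguments, reads
    the original contents on stdin and writes the new contents to stdout.
    """
    args = filter_cmd + [filename, file_hash, "1" if is_binary else "0"]
    subproc = subprocess.Popen(args, stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE)
    # communicate() feeds stdin and drains stdout concurrently, so a large
    # file cannot deadlock with both pipe buffers full.
    filtered, _ = subproc.communicate(data)
    if subproc.returncode != 0:
        raise RuntimeError("filter %r exited with status %d"
                           % (args, subproc.returncode))
    return filtered


if __name__ == "__main__":
    # Usage example: a tiny inline Python "filter" that upper-cases input.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]
    print(run_filter(upper, "a.txt", "0" * 40, False, b"hello\r\n"))
    # prints b'HELLO\r\n'
```

Using `communicate()` instead of separate writes to `stdin` and reads from `stdout` is the usual way to avoid a pipe deadlock when a filter starts producing output before it has consumed all of its input.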
Converting line endings in text files to LF is often needed when migrating from Mercurial to Git. Right now it is usually done with git filter-branch and other heavyweight approaches, but it is very easy to do in fast-export, provided that binary files have distinct filenames (e.g. distinct extensions). This change adds an option, `--as-binary`, that turns on automatic conversion of line endings to LF and takes a list of binary file extensions, e.g. `hg-fast-export ..\repo --as-binary=.so,.dll`. As an implementation side effect, dot-files (e.g. `.hgignore`) are treated as text.
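The extension-based classification behind `--as-binary` can be sketched as follows. This is an assumption about how it might be done, not the PR's code: it uses `os.path.splitext`, which ignores leading dots, and that is one straightforward way the dot-file side effect described above would arise. The helper names are hypothetical.

```python
import os


def parse_binary_exts(option_value):
    """Split an --as-binary value like ".so,.dll" into a set of extensions."""
    return {ext.strip() for ext in option_value.split(",") if ext.strip()}


def is_binary_by_ext(path, binary_exts):
    """Classify a file as binary purely by its extension (an assumption)."""
    # os.path.splitext ignores leading dots, so ".hgignore" (or
    # "sub/.hgignore") has an empty extension and always counts as text.
    _, ext = os.path.splitext(path)
    return ext in binary_exts


if __name__ == "__main__":
    exts = parse_binary_exts(".so,.dll")
    for path in ["lib/foo.dll", "src/main.c", ".hgignore"]:
        print(path, "binary" if is_binary_by_ext(path, exts) else "text")
```

Running it prints `lib/foo.dll binary`, `src/main.c text` and `.hgignore text`. Even listing `.hgignore` itself as a "binary extension" would not change the result, because the whole dot-file name is treated as the base name.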