Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking out a dangling symlink on Windows is treated as a hard error #1354

Closed
EliahKagan opened this issue May 6, 2024 · 4 comments · Fixed by #1363
Closed

Checking out a dangling symlink on Windows is treated as a hard error #1354

EliahKagan opened this issue May 6, 2024 · 4 comments · Fixed by #1363
Labels
acknowledged an issue is accepted as shortcoming to be fixed C-Windows Windows-specific issues help wanted Extra attention is needed

Comments

@EliahKagan
Copy link
Member

EliahKagan commented May 6, 2024

Current behavior 😯

A repository may contain a dangling (i.e., broken) symbolic link. On Windows, when such a repository's working tree is checked out, such as in the checkout that occurs in a clone done with gix clone, this causes the entire checkout to fail. gix clone reports:

Error: IO error while writing blob or reading file metadata or changing filetype

Caused by:
    The system cannot find the file specified. (os error 2)

The cause appears to be that the symbolic link's target is not found, due to not existing (or otherwise being in a location where the user running gix cannot list its metadata).

On Windows, file and directory symlinks are two separate kinds of symlinks, which different metadata in the symlink itself. Therefore, on Windows, both Git and gitoxide check what kind of file the target is, so that, for symlinks that are not broken, they are created as the correct kind of symlink for the target. However, whereas Git falls back to creating file symlinks when the target is missing (as if the target were a regular file), gitoxide treats this as a hard error.

Expected behavior 🤔

Actually, I am not sure what the best thing to do is. I am inclined to say that we should follow the Git behavior here, but really the main problem is that the difference is unexpected. Therefore, one of the following approaches should be taken:

  • Fall back to creating a file symlink, when the target cannot be found, as Git does.
  • Keep failing, but report a specific error. Also document this behavior prominently.
  • Keep failing, but only for that specific file; have the checkout otherwise proceed, failing with an error status and specific error messages for any non-created symlinks, but creating the other files. Also document this behavior prominently.
  • Fall back to creating a file symlink, but also report a warning about it, in case that is not what the user wants.
  • Some combination of the above, by having it be configurable (and, at least if the default behavior does not match that of Git, then documenting it prominently).

When core.symlinks is false, there is no failure, since no attempt is made to create symbolic links. So in that case nothing needs to be done differently for this. (Note that this issue is, as far as I can tell, entirely independent of #1353, and the steps to reproduce given below make sure to avoid the effect of #1353.)

It seems to me that there are substantial advantages of being able to create a dangling symlink when cloning, though also some disadvantages. These are the cases I am aware of:

  • The symlink may be broken due to a bug in the repository, but if core.symlinks is set to true then the user probably prefers to work on that problem with the actual dangling symlink, and may especially prefer that the other files be checked out.
  • The symlink may be broken due to an unintentionally missing file outside the working tree, or inside the working tree (ignored or not). It could be fixed by putting the target file in place, though on Windows the target would have to be a regular file for this to work (without recreating the symlink).
  • The symlink may be broken due to an intentionally missing file, intended to be created after cloning. This is the same as the previous case, except that creating the dangling symlink is likely to be strongly preferred, at least if it will be possible to dereference properly once the file exists (as would be the case with Git's default, if it is a regular file).
  • The symlink is intended only to work on Unix-like systems. For example, maybe it's a symlink to something in /etc. But it should at least be possible to check out the repository on Windows. In this case, it doesn't matter what happens with the symlink itself.

Git behavior

Git creates a file symbolic link if it cannot figure out what the target is supposed to be.

Although that behavior seems good, the absence of any outwardly observable special treatment for this case is arguably not ideal. Git does not warn or otherwise print any message to indicate that this happened, and the clone reports success, i.e. exit status 0.

Commands such as git status do not show that anything is amiss, though that is arguably ideal, at least in the absence of other design changes, given such commands also do not show that anything special is going on when core.symlinks is set to false and regular files are created in their place instead.

Steps to reproduce 🕹

Create a repository with a dangling symlink

This can be done on a Unix-like system or Windows. On Windows, if you have trouble, you can create a symlink to an actual file, then delete the target. As in #1353, I created the repository on Ubuntu and uploaded it to a SSH remote to ensure that the problem was not specific to file:// remotes.

git init has-dangling-symlink
cd has-dangling-symlink
ln -s target symlink
git add symlink
git commit -m 'Initial commit'

Or use the has-dangling-symlink repository, which is already set up. (The rest of these instructions are written assuming that is used.)

The remaining steps are to be done on Windows, since this is a Windows-specific bug. It should also be done with a user account capable of creating symbolic links; at minimum, doing it in an account that cannot do this would not decisively show the bug. However, no global or other Git configuration related to symbolic links is required, since -c is used below in setting core.symlinks to true.

Check that the clone should succeed

On Windows, in PowerShell or Git Bash, ensure you have no has-dangling-symlink directory by running this command and checking that it gives an expected error:

ls has-dangling-symlink

That is important since the effect of attempting to clone to an already-populated location could potentially be confused with the situations being shown here.

Attempt to clone with gix

gix clone -c core.symlinks=true git@github.com:EliahKagan/has-dangling-symlink.git

This fails, with an error message as shown above, and the has-dangling-symlink directory is not kept.

Verify that it works when it need not create symlinks

When core.symlinks is set to false, as is often the case on Windows, there is no error. To check that this is the case, first verify that there is still no has-dangling-symlink directory, then run a command like the above gix clone command but with false instead of true:

gix clone -c core.symlinks=false git@github.com:EliahKagan/has-dangling-symlink.git

This succeeds, and (since core.symlinks is false) a regular file is created in place of the symlink.

Optionally, compare to Git

This may be a good idea to do, especially if there is any doubt that the test setup is one where the user is capable of creating symlinks. (It is also part of reproducing this bug in the sense that it verifies that this really is a way that gix behaves differently from git.)

First delete the has-dangling-symlink directory that was created in the previous step:

  • In PowerShell: rm -r -fo has-dangling-symlink
  • In Git Bash: rm -rf has-dangling-symlink

Now run a command like the original clone command, but with gix instead of git:

git -c core.symlinks=true clone git@github.com:EliahKagan/has-dangling-symlink.git

This should succeed and create the dangling symbolic link. That it is a symbolic link, and that its target is a file that is not present, can be verified by running:

ls -l has-dangling-symlink

This should show that symlink -> target while also not listing any file target.

@Byron Byron added C-Windows Windows-specific issues acknowledged an issue is accepted as shortcoming to be fixed help wanted Extra attention is needed labels May 6, 2024
@Byron
Copy link
Member

Byron commented May 6, 2024

Thanks a lot for reporting!

I also think that it appears reasonable to start out Git-compatible and fallback to creating a dangling file symlink at first.
In terms of error-handling, I am not sure i understand what git clone does - does it exit with non-zero status, i.e. officially fails? Ideally, that behaviour can also be reproduced in gix eventually, event though I think the API should report success while making these special files discoverable, so the caller (gix CLI in this case) can turn that into an error if it wanted to.

@EliahKagan
Copy link
Member Author

EliahKagan commented May 6, 2024

In terms of error-handling, I am not sure i understand what git clone does - does it exit with non-zero status, i.e. officially fails?

Git silently falls back to creating a file symlink, does not report that anything of note occurred, and exits indicating success (i.e. with a status of 0).

C:\Users\ek\src> git --version
git version 2.45.0.windows.1
C:\Users\ek\src> git -c core.symlinks=true clone [email protected]:EliahKagan/has-dangling-symlink.git
Cloning into 'has-dangling-symlink'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 7 (delta 0), reused 7 (delta 0), pack-reused 0
Receiving objects: 100% (7/7), done.
C:\Users\ek\src> $LASTEXITCODE
0
C:\Users\ek\src> ls has-dangling-symlink

    Directory: C:\Users\ek\src\has-dangling-symlink

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---            5/6/2024  3:05 PM            617 COPYING
-a---            5/6/2024  3:05 PM            275 README.md
la---            5/6/2024  3:05 PM              0 symlink -> target

C:\Users\ek\src> cat has-dangling-symlink/symlink
Get-Content: Could not find file 'C:\Users\ek\src\has-dangling-symlink\symlink'.
C:\Users\ek\src> echo 'Newly created target file contents.' >has-dangling-symlink/target
C:\Users\ek\src> cat has-dangling-symlink/symlink
Newly created target file contents.

I'll slightly expand I have slightly expanded this issue's "Git behavior" subsection with a summary of this.

That demonstration is on Windows 10, using PowerShell 7.4.2. The wording of the error message from cat (which in PowerShell on Windows is an alias of Get-Content) is somewhat confusing, but matches the wording when Get-Content is used in PowerShell on non-Windows systems to attempt to read through a dangling symlink.

In contrast, the wording of the external cat command (at least in Ubuntu) is cat: has-dangling-symlink/symlink: No such file or directory, which is also potentially confusing, but at least does not state explicitly that the file that can't be found is one whose path is shown.

cat (and Get-Content) message details are of course not relevant to the bug at hand here, only to interpreting the cat part of the above transcript and comparing it to similar code using other systems or shells.


There's a nuance here that I didn't cover in the issue description (and still haven't). I think it is really not Windows, per se, to which the distinction between file and directory symlinks apples, but rather the NTFS filesystem. Windows can access other filesystems that do not have this distinction, and other operating systems can access NTFS (and have had stable read-write drivers for NTFS for a long time, so often do).

A common case of accessing NTFS from a non-Windows system, even though it is not always a best practice, would be using Git or related tools (thus presumably people do, and will, also do this with gix, ein, and other applications that use the gitoxide library crates) from WSL to operate on the mounted Windows partition, which is usually but not always via paths like /mnt/c/.... But that is far from the only case.

However, some operations do actually work in NTFS in Windows through a file symlink to a directory, while others do not. I am unsure if any operations will actually fail through a file symlink to a directory on any other operating systems accessing NTFS, even in the case where it is the host Windows filesystem accessed through drive mounts under /mnt in a WSL system.

To illustrate this broadly, in a way that does not even bring in WSL or anything but native Windows, consider the following, which is still just in PowerShell on Windows:

git -c core.symlinks=true clone git@github.com:EliahKagan/has-dangling-symlink.git
mkdir has-dangling-symlink/target
echo 'Contents of file inside symlink target directory.' >has-dangling-symlink/target/inside

These commands all successfully display content of inside:

cat has-dangling-symlink/target/inside
cat has-dangling-symlink/symlink/inside
cat has-dangling-symlink\target\inside
cat has-dangling-symlink\symlink\inside

These commands both successfully open inside in notepad.exe, because they do not use the symlink:

notepad has-dangling-symlink/target/inside
notepad has-dangling-symlink\target\inside

Neither of these commands open inside in notepad.exe, which thinks the file does not exist, because sometimes Notepad does not traverse through a file symlink to a directory:

notepad has-dangling-symlink/symlink/inside
notepad has-dangling-symlink\target\inside

In contrast, renaming inside to inside.txt allows both those notepad commands to work! After such a rename has been done, these work, even without changing the type of the symlink:

notepad has-dangling-symlink/symlink/inside.txt
notepad has-dangling-symlink\symlink\inside.txt

Presumably the relevant difference is in Notepad's approach to checking if a file exists with the name given, before deciding to try with a .txt extension. But they core point is that unexpected weird things go wrong in some but not all cases on Windows when a file symlink points to a directory. I still think it is probably worthwhile to default to the Git behavior here, but I mention this because it's relevant to how hard it would be to exhaustively test the effect of using wrong-typed symlinks in other scenarios.

(In all of the above, the purpose of the / and \ versions is to show that the choice of directory separator is not the cause, but is instead irrelevant. This is very often wrongly claimed to be irrelevant, or alternatively claimed to be relevant to some applications but irrelevant from the perspective of the Windows API, which is not true either. But in this case, as tested, it is irrelevant.)

Once the directory target exists, the symlink can be fixed by removing it and having Git check it out again. One way to verify it really is currently a file symlink is to run:

cmd /c dir has-dangling-symlink

The resulting listing includes a row like this, with <SYMLINK>:

05/06/2024  03:20 PM    <SYMLINK>      symlink [target]

Then delete the symlink and have Git check it out again, and check it again:

rm has-dangling-symlink/symlink
git -C has-dangling-symlink restore symlink
cmd /c dir has-dangling-symlink

The resulting listing includes a row like this, now with <SYMLINKD> rather than <SYMLINK>:

05/06/2024  03:55 PM    <SYMLINKD>     symlink [target]

And all four notepad commands work after that to open the file inside, even if the file inside (and its name in those commands) is changed back from inside.txt or inside.

Maybe Git has some way to automatically perform those replacements without opening or deleting any other files unnecessarily, but I don't know of it. That might be a good thing to provide, if it could be done correctly.

NTFS has other kinds of reparse points besides symlinks, such as junctions, which is typically unimportant in the context of Git because Git does not represent them and will not create them, and because users who create them tend to be aware of their implications.* But a tool that checks to ensure symlinks are of the correct type and applies automatic fixes depending on what is going on at their targets should possibly be aware of them, at least to ensure no unexpected recursive deletions occur (junctions are almost always dereferenced).

But for improving the default behavior described in this issue, I don't believe any kinds of NTFS reparse points besides file symlinks and directory symlinks have to be considered.

In case it is ever helpful, this item in the Cygwin FAQ gives some insight into the complexity of symbolic links on Windows, a complexity that remains in significant part even when the Cygwin-specific parts are disregarded. (As I understand it, Rust doesn't target Cygwin at this time, not even experimentally; so the Cygwin parts shouldn't come into play here.)

* Edit: I had previously claimed that junctions were among the kinds of reparse points that usually require special privileges to create, which is in accurate. Limited user accounts on Windows can create junctions. See #1354 (comment).

Byron added a commit that referenced this issue May 13, 2024
Byron added a commit that referenced this issue May 13, 2024
Byron added a commit that referenced this issue May 13, 2024
@Byron
Copy link
Member

Byron commented May 13, 2024

With commit 78dedeb the problem is definitely fixed locally, turning the test from red to green.

However, on CI, I have never seen failure which must mean that the filesystem probe doesn't think that symlinks are supported (Edit: it's probably because it's currently implemented in a way that is not thread-safe, but with the fix the probe can actually be simplified (while still not safe for concurrent operations)).

@EliahKagan
Copy link
Member Author

EliahKagan commented Oct 16, 2024

In #1354 (comment), I had said:

NTFS has other kinds of reparse points besides symlinks, such as junctions, which is typically unimportant in the context of Git because Git does not represent them and will not create them, and because users who create them are usually aware of their implications (and must usually be administrators or otherwise privileged to do so).

Although much of that is accurate, the parenthesized portion is very much inaccurate with respect to junctions, which limited user accounts on Windows can create. Sorry about that! I've edited the comment to avoid further misinforming those who read it.

I noticed the above error when looking into whether gitoxide treats NTFS junctions in a reasonable way when it encounters them in working trees, including in gix clean commands where it would be undesirable for it to traverse through them and delete files inside the destination. So far, I have not found any such problems. The behavior in gitoxide does not always match that of Git. For example, in simple cases, a gix clean command will fail with an "access denied" error when it encounters a junction, while a similar git clean command will delete the junction. But this behavior of gitoxide is non-destructive and I'm not sure it's even a bug. The Git behavior is not always ideal or even intended (and there are several issues about junctions in the Git repository), and varies significantly across different git commands.

If on deeper examination I encounter behavior around junctions in gitoxide that looks like a bug, then I'll open a separate issue for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed C-Windows Windows-specific issues help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants