
Drive letter on Windows? #34

Open
melMass opened this issue Nov 19, 2022 · 5 comments

Comments

@melMass

melMass commented Nov 19, 2022

Hi,

I'm using Nushell, which relies on this crate for its glob feature.
Unfortunately, it fails to resolve Windows drive letters. After reading Wax's README, I guess it has to do with the repetition token?

Is there a way to escape these?

Here are a few non-working samples (in nu):
nushell/#7125

I'm a bit short on time lately, but if you have pointers for solving this, I can also propose a PR at some point.

Thanks

@olson-sean-k
Owner

Thanks for the information and context!

it fails to resolve windows drive letters

Wax does not support Windows path prefixes such as drive letters and other volume specifiers by design. For applications like Nushell's glob CLI, I recommend bridging the gap with a mechanism for specifying a tree using a native path. For example, glob could accept an option for this like so:

> glob --tree=\\server\share '**/*.txt'

An option like this would only be necessary when a platform-specific file system feature is needed (such as a Windows UNC path in this example).
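To make the --tree suggestion concrete, here is a minimal sketch of how a CLI might keep a native tree path separate from a portable pattern, so that only the pattern is ever parsed as a glob. GlobArgs and parse_glob_args are hypothetical names that use only the standard library; the Wax step itself is left as a comment, since the exact walk API may differ between Wax versions.

```rust
use std::path::PathBuf;

/// Hypothetical parsed form of `glob [--tree=<native path>] <pattern>`.
/// `tree` is an opaque native path (drive letter, UNC share, ...) and is
/// never interpreted as a glob; only `pattern` would be handed to Wax.
#[derive(Debug, PartialEq)]
struct GlobArgs {
    tree: PathBuf,
    pattern: String,
}

/// Parses arguments; `tree` defaults to the current directory when absent.
fn parse_glob_args<I, S>(args: I) -> Result<GlobArgs, String>
where
    I: IntoIterator<Item = S>,
    S: Into<String>,
{
    let mut tree = PathBuf::from(".");
    let mut pattern = None;
    for arg in args {
        let arg: String = arg.into();
        if let Some(path) = arg.strip_prefix("--tree=") {
            // Native, platform-specific part: stored verbatim.
            tree = PathBuf::from(path);
        } else if pattern.is_none() {
            pattern = Some(arg);
        } else {
            return Err(format!("unexpected extra argument: {arg}"));
        }
    }
    let pattern = pattern.ok_or_else(|| "missing pattern".to_string())?;
    // Next step (not shown, assumption): build a Wax glob from `pattern`,
    // e.g. with Glob::new, and walk it starting at `tree`.
    Ok(GlobArgs { tree, pattern })
}
```

For example, the arguments --tree=\\server\share and **/*.txt yield a tree of \\server\share and the portable pattern **/*.txt.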

@virtualritz

virtualritz commented Apr 16, 2023

This is really a bummer, as Wax IMHO is most useful for cross-platform CLI utilities on Windows. On macOS and Linux, the shells most people use have some sort of globbing support anyway.

It would be great if at least UNC paths with forward slashes could be supported, e.g. //./C:/foo/bar/**.

@bobhy

bobhy commented Sep 30, 2023

Late to the party, but I'm the most recent maintainer of glob in nushell and was in the process of extending other nushell file system commands to use wax when I ran into this.

The idea of manually splitting the pattern and path on the command line might work for glob, which has only one pattern.
But it would be pretty awkward for a command like cp <pat1> <pat2> ... <dest>.

So I'm looking at workarounds in Nushell that might salvage the rooted-pattern scenario. Here's what I'm playing around with right now:
a bit of preprocessing of the user-specified pattern, on Windows only:

  1. replace all \ with /
  2. if pattern starts with <driveletter>:, escape it as <driveletter>\:

So a rooted glob like glob 'C:\Users\<me>\test/**' becomes glob 'C\:/Users/<me>/test/**'
and works for me.
Since Windows already accepts forward slashes in UNC paths, I think this will work for UNC paths too.
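The two preprocessing steps above amount to a small string transformation. This is a minimal sketch; preprocess_windows_pattern is a hypothetical name rather than Nushell's actual code, and it assumes, as the comment reports, that Wax accepts \: as an escaped colon.

```rust
/// Sketch of the Windows-only preprocessing:
///   1. replace every `\` with `/`;
///   2. if the pattern starts with `<driveletter>:`, escape the colon as `\:`
///      so Wax does not treat it as special syntax.
fn preprocess_windows_pattern(pattern: &str) -> String {
    // Step 1: backslashes become forward slashes. This also removes any
    // backslash escapes the user may have written.
    let mut out = pattern.replace('\\', "/");
    // Step 2: escape a leading drive-letter colon, e.g. `C:` -> `C\:`.
    let bytes = out.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        out.insert(1, '\\');
    }
    out
}
```

UNC-style input such as \\server\share\*.txt simply becomes //server/share/*.txt, because only a leading drive letter triggers step 2.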

It means the user cannot escape a metacharacter with \, but that seems a "small" loss. The user might be able to work around this by
making it into a one-character class, so test/\* could be specified as test/[*].

It's interesting to note that the .NET globbing functions strictly require separating the root directory from the pattern:
https://learn.microsoft.com/en-us/dotnet/core/extensions/file-globbing#get-all-matching-files. I'm guessing they couldn't come up with a more elegant solution...

I'm off to code up this workaround in nu and see how it goes...

@olson-sean-k
Owner

olson-sean-k commented Oct 2, 2023

Thanks for sharing, @bobhy! I'm really curious to see how this goes and appreciate seeing what you're working on! I have a few thoughts about this.

Wax glob expressions are designed to be as portable as possible. Windows path prefixes are fairly complex and definitely not portable, which is one of the main reasons that they are explicitly not supported.

the .NET globbing functions strictly require separating the root directory from the pattern

My inkling is that the developers of these APIs wanted to punt on some of the same issues I've thought about that can occur within Windows path prefixes. For example, what happens if a pattern occurs within a prefix? What does \\*\*\*.txt mean? Some parts of that pattern may be possible to implement, but there are many different error cases (I don't think \\* is actually possible). How do verbatim paths interact with patterns? What does \\.\** mean? Rejecting all of these may be a bit surprising, and I think mixing native paths with globbing patterns muddies the waters conceptually.
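To illustrate the kind of error cases described above, here is a simplified, hypothetical sketch that splits a Windows prefix off by hand and rejects input where glob metacharacters land inside the prefix, as in \\*\*\*.txt or \\.\**. Parsing is done manually because std::path only recognizes prefixes when compiled for Windows. The prefix grammar is deliberately reduced (verbatim UNC paths such as \\?\UNC\server\share are not handled), and the metacharacter list is an assumption, not Wax's exact set.

```rust
#[derive(Debug, PartialEq)]
enum PrefixKind {
    Verbatim, // \\?\<component>
    Device,   // \\.\<component>
    Unc,      // \\<server>\<share>
    Drive,    // <letter>:
    Plain,    // no Windows prefix at all
}

// Characters treated as glob metacharacters in this sketch (an assumption).
const META: &[char] = &['*', '?', '[', ']', '{', '}', '<', '>'];

/// Splits `input` into (kind, prefix, rest) and rejects prefixes that
/// contain metacharacters.
fn split_windows_prefix(input: &str) -> Result<(PrefixKind, &str, &str), String> {
    let bytes = input.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        return Ok((PrefixKind::Drive, &input[..2], &input[2..]));
    }
    // (kind, length of the fixed marker, number of components after it)
    let (kind, marker, count) = if input.starts_with(r"\\?\") {
        (PrefixKind::Verbatim, 4, 1)
    } else if input.starts_with(r"\\.\") {
        (PrefixKind::Device, 4, 1)
    } else if input.starts_with(r"\\") {
        (PrefixKind::Unc, 2, 2)
    } else {
        return Ok((PrefixKind::Plain, "", input));
    };
    // Find the end of `count` backslash-separated components after the marker.
    let mut end = marker;
    for i in 0..count {
        if i > 0 {
            end += 1; // step over the separating backslash
        }
        if end >= input.len() {
            return Err(format!("incomplete prefix in {input:?}"));
        }
        end = input[end..].find('\\').map_or(input.len(), |j| end + j);
    }
    let (prefix, rest) = input.split_at(end);
    // A pattern inside the prefix (server, share, device name) is rejected.
    if prefix[marker..].contains(META) {
        return Err(format!("pattern inside path prefix {prefix:?}"));
    }
    Ok((kind, prefix, rest))
}
```

Rejecting these inputs outright is one conservative choice; as noted above, it may still surprise users who expect \\*\... to enumerate servers or shares.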

it would be pretty awkward for a command like cp <pat1> <pat2> ... <dest>

I agree! I'd also like to caution that mass file operations like this are tricky and dangerous. I factored Wax out of Nym, which attempts to do this kind of thing (it's very incomplete; I've been writing various libraries to improve it and haven't looped back to it yet). This is one of the main reasons that variance and exhaustiveness queries exist in Wax. Most users won't care about this at all, but it turns out that these sorts of properties are important for doing this safely and correctly and I've been spending a lot of time refactoring Wax to provide correct (or at least conservative) answers to these queries.

IMO, accepting multiple independent patterns like this should probably be avoided in basic commands. One way to do this is to remove globbing support from commands and instead rely solely on pipelines (as I suggested in a referencing Nushell bug). So this example becomes something more like glob <pat> | cp <dest>. That's a big departure from what most (all?) other shells do though (where globbing is often provided by the shell itself).

If multiple independent patterns with varying prefixes are a must, then I think preprocessing like this is a reasonable approach. I'd recommend a Nushell syntax that explicitly separates a native path prefix from the pattern, such as <path>|<pat>. In your example, we'd get something like C:\Users\<me>|test/**. These prefixed patterns could specify only a path, only a pattern, or both. I can even imagine the shell syntax highlighting this, so when a Unix user copies and pastes some command line from a Windows user on the Internet, they can immediately see and modify the platform-specific parts.
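A minimal sketch of the proposed <path>|<pat> syntax, assuming the split happens at the first | (a character Windows reserves in file names, though Unix paths may contain it). PrefixedPattern and parse_prefixed are hypothetical names; empty halves are treated as absent so that a path alone, a pattern alone, or both can be expressed.

```rust
/// Hypothetical prefixed pattern of the form `<path>|<pat>`.
#[derive(Debug, PartialEq)]
struct PrefixedPattern<'a> {
    path: Option<&'a str>,    // native, platform-specific part (never globbed)
    pattern: Option<&'a str>, // portable glob expression
}

/// Treats an empty half as "not specified".
fn non_empty(s: &str) -> Option<&str> {
    if s.is_empty() { None } else { Some(s) }
}

fn parse_prefixed(arg: &str) -> PrefixedPattern<'_> {
    match arg.split_once('|') {
        // `C:\Users\me|test/**`, `C:\Users\me|` or `|test/**`
        Some((path, pattern)) => PrefixedPattern {
            path: non_empty(path),
            pattern: non_empty(pattern),
        },
        // No separator: the whole argument is a portable pattern.
        None => PrefixedPattern { path: None, pattern: non_empty(arg) },
    }
}
```

The path half would then serve as a native root directory, and only the pattern half would be parsed as a glob.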

@bobhy

bobhy commented Oct 4, 2023

One thing I've learned from this and the conversation back at nushell/nushell#10498 is that globbing at the command line is different from globbing in "code", and probably needs to be.
Wax seems comfortably positioned in the "coding" space: a powerful tool that has some sharp edges. Nushell has an internal globbing library that works pretty well at the command line (for Windows and other OSes). I can fix what was bugging me by extending that library to support '{}', and that could be the end of the story.
But that would leave nushell with a custom glob library to maintain.

If you were interested in positioning wax in the command-line arena, I could propose some extensions, and maybe implement them, too. Caveat: I haven't looked at the code at all, so I'm pretty much talking through my hat here. But with that in mind:
I think the big game is to support native Windows paths, including Rust-designated "verbatim" paths, without having to quote colons, backslashes and question marks.

  • If you worry that any of what follows is bad for wax::Glob, consider implementing it as an alternate pattern type with all the same methods, e.g., ArgGlob.
  • wax could accept : as a literal except within <>. This wouldn't break any currently correct wax patterns, so it's arguably a benign fix.
  • likewise, wax could accept ( as a literal when not followed by ?<option>). (Needed for C:\Program Files (x86).) This also doesn't invalidate currently correct patterns, but it does approach the edge of accepting a nonsense pattern with a typo and doing something unexpected.
  • for the //?/ "verbatim" path (why couldn't Rust devs just call it "Windows extended path", like the rest of the world?), wax could accept ? as a literal unless within metacharacter braces or adjacent to an active metacharacter. This does break some currently correct patterns. But wax's own docs say a standalone ? is rare; it gains power in combination with other pattern contexts, so maybe this is no big loss? Or maybe only accept ? as a literal if the pattern starts with \\?\ (or //?/)?
  • and the big one: treat \ as a literal, except within metachar brackets: [], <> and {}. This is a breaking change: a user who needs to escape a metachar outside brackets would use [c] instead. All I can say in defense is that this is the rule Nushell's internal glob uses, and it's well accepted within that user community.
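Two of the proposed rules (a literal : outside <>, and \ as a path separator outside [], <> and {}) could also be prototyped as a preprocessing pass in front of Wax rather than as a change to Wax itself. This is a hypothetical sketch (preprocess_arg_glob is an illustrative name): it rewrites separator backslashes to / and escapes literal colons as \:, assuming Wax accepts that escape. It does not cover the ( or ? rules and does not check that brackets are balanced.

```rust
/// Sketch of two proposed command-line rules as a preprocessing pass:
///  - outside `[]`, `<>` and `{}`, `\` is a path separator (rewritten to `/`);
///  - outside `<>`, `:` is literal (escaped as `\:`).
/// Inside brackets, `\` keeps its escaping role and is copied with the
/// character it escapes.
fn preprocess_arg_glob(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len() + 4);
    let mut depth = 0usize; // nesting depth of [], <> and {} combined
    let mut angle = 0usize; // nesting depth of <> alone
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' if depth > 0 => {
                // Escape inside brackets: copy it and the escaped char as-is.
                out.push(c);
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            '\\' => out.push('/'),
            ':' if angle == 0 => out.push_str("\\:"),
            '[' | '<' | '{' => {
                depth += 1;
                if c == '<' {
                    angle += 1;
                }
                out.push(c);
            }
            ']' | '>' | '}' => {
                depth = depth.saturating_sub(1);
                if c == '>' {
                    angle = angle.saturating_sub(1);
                }
                out.push(c);
            }
            _ => out.push(c),
        }
    }
    out
}
```

For example, C:\Users\me\*.txt becomes C\:/Users/me/*.txt, while the colon inside a repetition such as <*/:0,> is left untouched.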
