Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

direct filter! api for FileTree? #43

Open
Moelf opened this issue Jan 23, 2021 · 8 comments
Open

direct filter! api for FileTree? #43

Moelf opened this issue Jan 23, 2021 · 8 comments

Comments

@Moelf
Copy link

Moelf commented Jan 23, 2021

IMHO it's an anti-pattern to do:

tree[r"regex"]

and currently, you will need to do something close to:

tree = filter(x->!contains(x.name,r"\$|%"), tree);

we should probably allow filter!(pred, tree) to work directly?

(if not not to directly apply predicate on tree.children is a debatable question)

@Moelf Moelf changed the title direct filter api for FileTree? direct filter! api for FileTree? Jan 23, 2021
@DrChainsaw
Copy link
Collaborator

IMHO it's an anti-pattern to do

Yikes, I use regexes all the time when using FileTrees. Is it possible to get an ELI5 version of this issue?

When I see filter! I think of a mutating version of filter, but the second example looks like some kind of negative filter (i.e ! means not instead of mutates). Thats something I have wanted too (everything which does not match a regex) so perhaps I'm just projecting here :)

@Moelf
Copy link
Author

Moelf commented Feb 13, 2021

anti-pattern because getindex(array, RegEx) does not make sense, filter(predicate, array) is the universal syntax for this semantics in Julia

@DrChainsaw
Copy link
Collaborator

Ah, now I understand.

I guess it is just to enable the convenient bracket syntax, similar to what e.g. DataFrames and Dicts do. Not saying you don't have a point, but FileTrees are not arrays so maybe they don't need to follow array conventions.

@Datseris
Copy link

Datseris commented Aug 7, 2022

Couldn't agree more with the original statement that this is not the intuitive way. Persoanlly, I would even like to have a filter option at tree creation. I.e., allow something like

taxi_dir = FileTree("taxi-data", f::Function = x -> true) # by default no filtering

or a keyword filter = x -> true that only puts files that survive the filter in the tree.

@Datseris
Copy link

Datseris commented Aug 7, 2022

In fact, I didn't even understand that the way to create a tree of "only .x files" was by making the full tree and then doing tree[glob"*/*.x"]. I only understood this after I found this issue.

@DrChainsaw
Copy link
Collaborator

I have also missed the predicate when creating the tree. Apart from the filtering, one can also return nothing when loading:

ft = load(FileTree("taxi-data")) do f
     endswith(name(f), ".x") && return nothing
     # code to load .x files
end

Unless original author has any objections, I'll accept a PR for the FileTree predicate (make it first argument to support do-syntax).

I'm also somewhat sympathetic to the getindex issue even though I like the current way just for the sake of convenience. Maybe this is whataboutism, but DataFrames support the same convention:

julia> df = DataFrame(a = 1:3, aa = 2:4, b = 11:13)
3×3 DataFrame
 Row │ a      aa     b     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   11      2     11
   22      3     12
   33      4     13

julia> df[:, r"a"]
3×2 DataFrame
 Row │ a      aa    
     │ Int64  Int64 
─────┼──────────────
   11      2
   22      3
   33      4

If the main gripe is the negation, maybe something like tree[Not(r"bla")] or tree[!, r"bla"]), but it does not look very appealing. Negative regexes are also possible to construct afaik (at least for simple match patterns), but it is a bit messy.

Afaik, filter already works on FileTrees, but it gives you the whole node to allow for filtering on values. Not sure we would like to limit it so you only see the names/paths.

julia> t = maketree("root" => [(name="a", value=2), (name="b", value=3)])
root/
├─ a (Int64)
└─ b (Int64)

julia> filter(Returns(true), t)
root/
├─ a (Int64)
└─ b (Int64)

julia> filter(contains("root")  path , t; dirs=false)
root/
├─ a (Int64)
└─ b (Int64)

julia> filter(contains("a")  path , t; dirs=false)
root/
└─ a (Int64)

julia> filter(!contains("a")  path , t; dirs=false)
root/
└─ b (Int64)

Something seems off with the docstring though. It seems to merge the docstrings for map and filter. Furthermore it seems to be incorrect as it has the same semantics as filter for other collections (true means keep). I'll fix that when time permits.

 help?> filter
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  

  map(f, tree::FileTree; walk=FileTrees.postwalk, dirs=true)


  apply f to every node in the tree. To only visit File nodes, pass dirs=false.

  walk can be either FileTrees.postwalk or FileTrees.postwalk.

  filter(f, tree::FileTree; walk=FileTrees.postwalk, dirs=true)


  remove every node x from tree where f(x) is true. f(x) must return a boolean value.

  ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  

@shashi
Copy link
Owner

shashi commented Aug 10, 2022

In fact, I didn't even understand that the way to create a tree of "only .x files" was by making the full tree and then doing tree[glob"/.x"]. I only understood this after I found this issue.

We should probably have FileTree(glob"*/*.x") work.

I do not consider the API to be "finished". Would love to accept any PRs with improvements! I tend to avoid having too many options if the same effect can be achieved through existing features.

@DrChainsaw
Copy link
Collaborator

We should probably have FileTree(glob"/.x") work.

This is a great idea!

I suppose the equivalent for regex won't make sense though, so maybe we also want the function version?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants