Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] LFS threshold #264

Open
Brilator opened this issue Sep 13, 2024 · 13 comments
Open

[Feature Request] LFS threshold #264

Brilator opened this issue Sep 13, 2024 · 13 comments
Assignees
Labels
Type: Feature Request This item is confirmed by the maintainers to be a request for a new feature

Comments

@Brilator
Copy link
Member

For the DataHUB (instance in Freiburg) a non-LFS size limit of 10 MB was implemented to avoid pile-up of large repos / data volumes.

If users now go over that limit (i.e. track >10MB files with pure git, not LFS) they will currently run into the error remote: fatal: pack exceeds maximum allowed size (10.00 MiB).

@github-actions github-actions bot added the Status: Needs Triage This item is up for investigation. label Sep 13, 2024
@Brilator
Copy link
Member Author

@TetraW

Also relevant for ARC commander @HLWeil

@Hannah-Doerpholz
Copy link
Contributor

I also had this exact problem this weekend. I ended up needing to reset the last commit from the ARCitect, and then push tiny chunks of data in about 8 separate pushes through the terminal. Maybe a warning should at least be implemented that tells users about this issue when they try to push new data.

@j-bauer
Copy link

j-bauer commented Sep 16, 2024

We are increasing this limit to 100MB for now. Not sure if that is enough, but we need a way to prevent users from uploading what should be LFS files into the repository directly.

A banner will be shown to the users when this is in place. That should then be seen from both the UI and on the terminal.

Edit: Banner is in place.

@Hannah-Doerpholz
Copy link
Contributor

Is it not possible to forcibly track large files, regardless of whether or not users set another, higher LFS file size limit? To rather put a cap on the user input there (e.g. if a user sets the LFS limit to 1 GB, it is still set to 100 MB, maybe with a warning)? But I get your point, this is definitely something that should be discussed. Then again, that LFS file size limit can only be adjusted in the ARCitect, not for when a user tries to push changes directly through their terminal...

@j-bauer
Copy link

j-bauer commented Sep 16, 2024

This is quite a complex topic and has been discussed extensively on this git-lfs issue which is still open and not solved.

Git's hooking mechanism is probably the answer here. Ideally we would need a server-side hook to enforce it independently of the client's config. Most workarounds/hack use client-side precommit hooks, which seem pretty straightforward, but that would be something that each user would need to setup.

We will look into what can be done with server-side hooks and go from there. And yes, let us setup a meeting to discuss this matter.

@JonasLukasczyk
Copy link
Collaborator

The LFS size limit in ARCitect never knew about this upper limit. It was just an easy mechanism for users to select which files they want to track with LFS. It would be easy to enforce that limit inside ARCitect, but then of course other tools like the ARCCommander will not profit from this. It is also possible to add a pre-commit hook in ARCitect, but with the same issue.

@Brilator
Copy link
Member Author

Brilator commented Sep 24, 2024

Ok this issue pops up too often now with both ARCitect and ARC Commander, windows and mac.

We need to adapt the lfs threshold in the tools to that of the datahub.
Btw, I think 10MB or 50 MB should absolutely suffice. Or are there any other strings attached with lfs-tracked files (less secure, no backups)?

@JonasLukasczyk
Copy link
Collaborator

I think setting the default threshold to 99MB is not correct. Ideally users should track data files with LFS, even if they are just 10MB. The threshold in the UI exists only to adjust the LFS limit for one commit to easily distinguish between LFS and non-LFS files. In ARCitect it would now suggest the following changes:

  1. we remove the limit from the ui
  2. for every new file/folder that are going to be commited, we add a checkbox in front
  3. users have to check the box if they want to track that file/folder with LFS
  4. for files larger than 99MB this is forced on

@JonasLukasczyk
Copy link
Collaborator

To emphasize that the current threshold is apparently misunderstood, imagine a directory containing 1000 files each 90MB. With the increased limit it looks like this would be ok to be tracked without LFS, but it definitely should be!

@Hannah-Doerpholz
Copy link
Contributor

I think adding folder-specific checkboxes is a good idea. If it is possible to check a folder that then automatically also checks all files within that folder (in case a user adds one folder with data output that contains 1000 files, they don't have to check 1000 files individually). On the other hand, when the list is very long, listing for example 1000 new files in multiple different directories, users might feel discouraged to scroll through the entire commit list just to find the right folders to check. We definitely have to consider how to present the commit list in that case

@JonasLukasczyk
Copy link
Collaborator

I suggest to display new/modified files as a file tree where folders containing more than 10 files are closed by default, but every folder will have a checkbox:
image

@JonasLukasczyk JonasLukasczyk self-assigned this Sep 25, 2024
@JonasLukasczyk JonasLukasczyk added Type: Feature Request This item is confirmed by the maintainers to be a request for a new feature and removed Status: Needs Triage This item is up for investigation. labels Sep 25, 2024
@Brilator
Copy link
Member Author

I get your points. The PR on ARC Commander really just decreased an already existing default threshold towards working with the current datahub limit.

Adding a check-box layer helps as long as it does not roadblock. So the >99MB enforcement definitely also helps; ideally together with a message.

@HLWeil
Copy link
Member

HLWeil commented Sep 25, 2024

I suggest to display new/modified files as a file tree where folders containing more than 10 files are closed by default, but every folder will have a checkbox: image

That seems like a reasonable approach.

@kMutagene kMutagene changed the title [Feature Request] cap LFS threshold at 10 MB [Feature Request] LFS threshold Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Request This item is confirmed by the maintainers to be a request for a new feature
Projects
Status: No status
Development

No branches or pull requests

5 participants