Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File permissions change when unpacking with tar_filter enabled #349

Open
ncoghlan opened this issue Oct 14, 2024 · 5 comments
Open

File permissions change when unpacking with tar_filter enabled #349

ncoghlan opened this issue Oct 14, 2024 · 5 comments
Labels
needs-decision Undecided if this should be done

Comments

@ncoghlan
Copy link

ncoghlan commented Oct 14, 2024

Many files in the published Linux archives appear to have 0o664 file permissions rather than 0o644 permissions.

I picked this up when using Python 3.11 to unpack a python-build-installer archive in fully_trusted mode and then repack it gave a different artifact hash than unpacking it using tar_filter mode in Python 3.12 and repacking it.

diffoscope blamed the discrepancy in the repacked archives on a whole lot of group write bits getting cleared (diffoscope_pbs311_extract_with_tar_filter_enabled.log), which is one of the expected behaviours described in the tar_filter docs.

A regular local CPython make altinstall build didn't produce any files with 0o664 permissions, hence filing this as an issue here rather than with CPython directly.

Edit: fixed Python version number reference. Python 3.11 always unpacks in fully_trusted mode, since extraction filters were only added in Python 3.12.

@zanieb
Copy link
Member

zanieb commented Oct 17, 2024

Just to clarify, you expect all of the files to have 0c644 permissions instead of 0c664 to match the upstream behavior?

@ncoghlan
Copy link
Author

ncoghlan commented Oct 18, 2024

Right, 0o644 is what I expected to see (and also what a local build and altinstall of CPython gave me).

The difference in the currently published tarballs can be observed via the tar extraction filters in Python 3.12+:

  • with fully_trusted_filter: files end up with 0o664 permissions
  • with tar_filter: files end up with 0o644 permissions (due to the group write bit getting cleared)

Looking again now, I did find python/cpython#58507, suggesting there may be some build environment dependencies in how that works, so I'll ask the CPython release team if they have any thoughts on this.

@indygreg
Copy link
Collaborator

indygreg commented Oct 18, 2024

This project has custom Python code for writing the tar archive. This is to ensure the tar is as deterministic as possible.

Some of the things that code does is set the owner to 0:0 / root:root and always sets group bits when the owner bit is set. These were all arbitrary choices of mine.

What harm comes from having group writable bits set? i.e. convince me 644 is superior to 664. I've seen this done both ways and it isn't clear to me why one method is superior.

@ncoghlan
Copy link
Author

The only reason I noticed is because I was tidying up some consuming code to eliminate the deprecation warning that comes from not setting an extraction filter in Python 3.12+ when unpacking tarballs: https://docs.python.org/3/library/tarfile.html#extraction-filters.

The "file permissions should be 0o644" comes from the way the standard library's tar_filter extraction filter works, where one of the steps is "Clear high mode bits (setuid, setgid, sticky) and group/other write bits (S_IWGRP | S_IWOTH)."

According to https://peps.python.org/pep-0706/#filters the choice to clear the group and other write bits was as a rough approximation of using GNU tar using the active umask, without making the standard library's behaviour actually depend on the umask setting.

Nothing actually breaks with the 0o664 permissions (as far as I'm aware), it just means that unpacking and repacking the tarball doesn't round-trip nicely when using tar_filter in 3.12+.

@ncoghlan
Copy link
Author

ncoghlan commented Nov 1, 2024

FWIW, the project where this came up is now public: https://pypi.org/project/venvstacks/

It now filters the mode bits when producing its own repacked archives (https://github.com/lmstudio-ai/venvstacks/blob/59c9c23cfbfca9bcef096b7cab3dfdb91ce35a71/src/venvstacks/pack_venv.py#L382)

This means that even though the tar filter is only applied when unpacking the base runtimes on 3.12+ (https://github.com/lmstudio-ai/venvstacks/blob/59c9c23cfbfca9bcef096b7cab3dfdb91ce35a71/src/venvstacks/stacks.py#L891), the repacked output is the same regardless of Python version.

Given that python-build-standalone has worked this way from the beginning, and repacking in a way that would even detect that tar_filter is changing the permission bits (let alone caring that it did so) is a fairly unusual thing to be doing, I think keeping the existing behaviour and closing this as "wontfix" would be an entirely reasonable outcome.

@charliermarsh charliermarsh added the needs-decision Undecided if this should be done label Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-decision Undecided if this should be done
Projects
None yet
Development

No branches or pull requests

4 participants