Latest stable release 3602.2.0 introduces docker ownership issue; chown does not work for docker builds #1203

Closed
Exelscior opened this issue Oct 6, 2023 · 12 comments
Labels: channel/stable (Issue concerns the Stable channel.) · kind/bug (Something isn't working)

Comments

@Exelscior

Description

Since updating from our previous release, 3510.2.8 for Microsoft Azure, to the latest stable release, 3602.2.0 for Microsoft Azure, the Docker runtime has been updated from version 20.10.23 to version 20.10.24.
With the current stable version, docker builds and runs appear to omit ownership information during filesystem extraction, and all files end up owned by root regardless of chown usage during the build.
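For reference, a quick way to confirm which engine and Go toolchain a node is running, as a sketch using only the standard docker CLI (nothing Flatcar-specific is assumed):

```sh
# Engine version shipped with the Flatcar release (20.10.23 vs 20.10.24 here)
docker version --format '{{.Server.Version}}'

# Go version the engine binaries were built with; the regression discussed
# below only shows up in binaries built with Go older than 1.19
docker version | grep -i 'go version'
```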

Impact

Building container images and running them on the current stable version of Flatcar is likely to lead to permission-denied errors for non-root containers attempting to write files, as all files belong to root regardless of any chown commands used during the build.

Environment and steps to reproduce

To reproduce the bug:

  1. Set-up: Flatcar 3602.2.0 for Microsoft Azure with built-in docker runtime 20.10.24
  2. Task: Run the following docker commands as a user in the docker group or as root
  3. Action(s) (consolidated in the sketch after this list):
    a. docker run --name bug-test alpine ls -l /etc/shadow
    This command shows that the file is owned by root:shadow (UID/GID 0/42)
    b. docker export bug-test | tar tv etc/shadow
    This command shows that the file is owned by UID/GID 0/0
  4. Error: The second command should show that the file is owned by UID/GID 0/42, but it shows 0/0 (root/root)
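The same steps as a runnable sketch (assumes a Flatcar 3602.2.0 node with the bundled Docker 20.10.24 and a user in the docker group; tar is given an explicit -f - so the archive is read from stdin, otherwise identical to the commands above):

```sh
# Reproduce the missing-ownership symptom end to end
docker run --name bug-test alpine ls -l /etc/shadow   # inside the container: root:shadow (0/42)
docker export bug-test | tar -tvf - etc/shadow        # affected host: owner column shows 0/0
docker rm bug-test                                    # remove the throwaway container
```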

Expected behavior

To show intended behaviour:

  1. Set-up: Flatcar 3510.2.8 for Microsoft Azure with built-in docker runtime 20.10.23
  2. Task: Run the following docker commands as a user in the docker group or as root
  3. Action(s) (a one-line comparison check is sketched after this list):
    a. docker run --name bug-test alpine ls -l /etc/shadow
    This command shows that the file is owned by root:shadow (UID/GID 0/42)
    b. docker export bug-test | tar tv etc/shadow
    This command shows that the file is owned by UID/GID 0/42
  4. Result: The error is gone; the file is correctly shown as owned by UID/GID 0/42 (root/shadow)
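To tell the two cases apart quickly on any host, here is a sketch that prints only the owner column of the exported entry (the column may show names or numeric IDs depending on the tar build, so treat the exact strings as illustrative):

```sh
docker run --name bug-test alpine true
# Fixed host (e.g. Docker 20.10.23):       prints 0/42 or root/shadow
# Affected host (3602.2.0's 20.10.24):     prints 0/0 or root/root
docker export bug-test | tar -tvf - etc/shadow | awk '{print $2}'
docker rm bug-test
```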

Additional information

We have worked around this for now with a forced rollback to Flatcar 3510.2.8 for Microsoft Azure.

After more searching, this appears to be exactly the same bug that Ubuntu hit back in August, as referenced here: https://bugs.launchpad.net/ubuntu/+source/docker.io-app/+bug/2029523/

The tar archive created by the docker export command is missing ownership information (all files are owned by root). If this archive is then used to recreate a filesystem for unprivileged processes (for example via docker import, or by unpacking it and chrooting into it), they can fail with a permission-denied error or in some other way.

This bug happens when the package is built with a Go version older than 1.19: Go 1.19 introduced the unix build tag, which upstream uses to determine whether it should add UNIX-specific attributes to the archive. Older Go versions silently ignore this source code, and the result is missing UIDs and GIDs in the tar archives. As Go 1.20 was backported to the affected releases, the attached patches use that version to fix the bug.

The issue is caused by this change moby/moby@721358e#diff-12919f88ca9c04e478a6ffdf37e9a67ccdd2997afdc2e51adb1e67c53dcdbd8cL5-R5 combined with packaging built with Go 1.18. It can be fixed by appending this tag to DOCKER_BUILDTAGS, but building the package with a newer Go version is even better.
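For anyone building the engine themselves, a hedged sketch of the two workarounds above. The make target and the way DOCKER_BUILDTAGS is propagated depend on how your packaging wraps moby's build scripts, so treat the invocation as illustrative; only the variable name itself is taken from the report:

```sh
# Option 1: keep the old toolchain but add the build tag the archive code expects
DOCKER_BUILDTAGS="${DOCKER_BUILDTAGS} unix" make binary

# Option 2 (preferred): build with Go >= 1.19, which satisfies the unix
# build constraint on its own, so no extra tag is needed
go version   # should report go1.19 or newer before building
```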

Exelscior added the kind/bug label on Oct 6, 2023
jepio pushed a commit to flatcar/scripts that referenced this issue Oct 9, 2023
Go 1.18 is already EOL, so it no longer receives security updates. Since
upstream docker projects already use Go 1.19, Flatcar should also have it.

See also
https://github.com/moby/moby/blob/5d6db842238e3c4f5f9fb9ad70ea46b35227d084/Dockerfile#L6.

(cherry picked from commit 93a8983)
Signed-off-by: Jeremi Piotrowski <[email protected]>
Fixes: flatcar/Flatcar#1203
jepio added the channel/stable label on Oct 9, 2023
@jepio
Member

jepio commented Oct 9, 2023

Thanks for the report. Opened a PR and will make sure it's part of the next stable release.

@jepio
Member

jepio commented Oct 9, 2023

This is already fixed in Flatcar >=3619.0.0.
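(To check which Flatcar release a given node is actually running, the version is exposed in the standard os-release file; a minimal sketch:)

```sh
# Prints lines like NAME="Flatcar Container Linux by Kinvolk" and VERSION=3602.2.0
grep -E '^(NAME|VERSION)=' /etc/os-release
```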

@Threadache

@t-lo, @jepio

This issue seems sufficient to justify deprecating the current stable build 3602.2.0 as it is a significant breaking change. Is this possible?

@jepio
Member

jepio commented Oct 9, 2023

Can you tell me more about how this is impacting you? What we could do is halt the rollout of stable-3602 until it's fixed; I'm looking to see if that is a viable way forward.

@Threadache

Threadache commented Oct 9, 2023

Our software uses Flatcar as the image for hosting Docker. Our build & test pipeline targets the latest stable version of Flatcar. Due to this issue, the latest stable is not compatible with the approach we use to interact with Docker, and our pipeline has broken as a result.

In production environments we have specific pinned versions only, and are not affected.

We have swapped our build & test pipeline to target LTS to resolve the pipeline issues, but we will need to find another way of testing compatibility with Flatcar stable if this is a longer-term thing.

We are affected in GCP, not Azure, so if you could roll back the latest release in GCP, I expect that would resolve the issue for us too.

@gerald-blackford

I have raised #1205 in order to pin against a previous flatcar-stable version, but the impact that @Threadache has described affects me in the same way.

@Threadache

Threadache commented Oct 9, 2023

@jepio @t-lo

I've documented how we're affected above.

If we didn't have a particularly diligent operations team, who insisted on controlling the specific version we're using in production via a private repo, then we'd have downtime right now from following the recommendations in the flatcar docs.

https://www.flatcar.org/docs/latest/installing/cloud/gcp/

> The Stable channel should be used by production clusters. Versions of Flatcar Container Linux are battle-tested within the Beta and Alpha channels before being promoted. The current version is Flatcar Container Linux 3602.2.0.

Also, the link in this section that provides instructions to remove the auto-updates is broken (404)

> Flatcar Container Linux is designed to be updated automatically with different schedules per channel. You can disable this feature, …

@jepio
Member

jepio commented Oct 9, 2023

> @jepio @t-lo
>
> I've documented how we're affected above.
>
> If we didn't have a particularly diligent operations team, who insisted on controlling the specific version we're using in production via a private repo, then we'd have downtime right now from following the recommendations in the flatcar docs.
>
> https://www.flatcar.org/docs/latest/installing/cloud/gcp/

I'm sorry for this issue making it into stable. The 3602 release was baking in beta for 4 months and this went unnoticed.

Are you able to temporarily switch to either beta (unaffected) or pin to the previous flatcar stable release on your end (#1205 (comment))? Does that work for you?

We would fast-track a fixed 3602 stable release with an ETA of next week.

> The Stable channel should be used by production clusters. Versions of Flatcar Container Linux are battle-tested within the Beta and Alpha channels before being promoted. The current version is Flatcar Container Linux 3602.2.0.
>
> Also, the link in this section that provides instructions to remove the auto-updates is broken (404)
>
> Flatcar Container Linux is designed to be updated automatically with different schedules per channel. You can disable this feature, …

I've filed a PR to fix the docs flatcar-archive/flatcar-docs#341.

@Threadache

Threadache commented Oct 9, 2023

> I'm sorry for this issue making it into stable. The 3602 release was baking in beta for 4 months and this went unnoticed.

Not a problem, our Ops team kept us safe; I just wanted to be clear on the impact and potential impact.

> Are you able to temporarily switch to either beta (unaffected) or pin to the previous flatcar stable release on your end (#1205 (comment))? Does that work for you?

We're using LTS now, and it's working. We're looking at options to test compatibility with different flatcar versions in the background so we get advance warning in future before pipelines start failing.

> We would fast-track a fixed 3602 stable release with an ETA of next week.

That's great, thank you.

> I've filed a PR to fix the docs flatcar-archive/flatcar-docs#341.

I didn't realise it was just a bad link, thanks. Looks like it points to https://www.flatcar.org/docs/latest/setup/releases/update-strategies/

@tormath1
Contributor

tormath1 commented Oct 9, 2023

@Exelscior @Threadache thanks for the report: I added a test case to our test suite to prevent this kind of regression from happening again (flatcar/mantle#462).
On a side note, we frequently ask Flatcar users to run some Beta nodes in their workloads. As @jepio mentioned, this release was baking for 4 months: we consider Beta "prod-ready" enough to run actual workloads. That's a nice way to contribute to the project and to prevent other folks from running into this kind of situation 💪

@tormath1
Contributor

@Exelscior hello, we just released a new Flatcar Stable version (3602.2.1) which should solve your issue. Can you confirm?

@Exelscior
Author

> @Exelscior hello, we just released a new Flatcar Stable version (3602.2.1) which should solve your issue. Can you confirm?

@tormath1 I can confirm that Flatcar Stable version (3602.2.1) resolves the issue.
Thanks for the heads-up.
