Mount volumes before copying into a container #24655
base: main
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mheon. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Still needs a test, putting this up so I can have some folks look at it.
Is there a linked Jira or GitHub issue, or why are you looking into this?
libpod/container_copy_common.go
Outdated
```go
defer func() {
	vol.lock.Lock()
	if err := vol.unmount(false); err != nil {
		logrus.Errorf("Unmounting volume %s after container %s copy: %v", vol.Name(), c.ID(), err)
	}
	vol.lock.Unlock()
}()
```
This does not look right. This function isn't doing any work here; it is returning another function, so the defer runs before the copy. You must add that to unmount() in the if/else case above.
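For illustration (the names here are made up, not from the PR), a minimal sketch of the pitfall being described: a defer placed in a function that merely builds and returns the copy closure fires as soon as the builder returns, before the copy ever runs.

```go
package main

import "fmt"

// buildCopier returns a closure that performs the "copy". The defer below is
// registered in buildCopier itself, so it fires as soon as buildCopier
// returns -- before the returned closure ever runs.
func buildCopier() func() {
	defer fmt.Println("unmount (fires too early, before the copy)")
	return func() {
		fmt.Println("copying...")
	}
}

func main() {
	copyFn := buildCopier() // "unmount" prints here
	copyFn()                // the copy only happens afterwards
}
```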
/hold Requires extensive rework, DO NOT MERGE until I repush
Force-pushed from 39a7445 to a34abed.
/hold cancel
Cockpit tests failed for commit a34abed. @martinpitt, @jelly, @mvollmer please check.
This entire thing looks extremely racy; during the copy the container is not locked, so it is possible for us to unmount the rootfs while the copy happens, I think, and do other bad things. The volume logic also strikes me as off. But I guess most of these are pre-existing issues, so not really relevant to your permissions/volume mounting fixes.
libpod/container_copy_common.go
Outdated
```go
// This must be the first cleanup function so it fires before volume unmounts happen.
cleanupFuncs = append([]func(){func() {
	// This is a gross hack to ensure correct permissions
	// on a volume that was copied into that needed, but did
	// not receive, a copy-up.
	volume.lock.Lock()

	if err := volume.update(); err != nil {
		logrus.Errorf("Unable to update volume %s status: %v", volume.Name(), err)
		volume.lock.Unlock()
		return
	}

	if volume.state.NeedsCopyUp && volume.state.NeedsChown {
		volume.state.NeedsCopyUp = false
		volume.state.CopiedUp = true
		if err := volume.save(); err != nil {
			logrus.Errorf("Unable to save volume %s state: %v", volume.Name(), err)
			volume.lock.Unlock()
			return
		}

		volume.lock.Unlock()

		for _, namedVol := range c.config.NamedVolumes {
			if namedVol.Name == volume.Name() {
				if err := c.fixVolumePermissions(namedVol); err != nil {
					logrus.Errorf("Unable to fix volume %s permissions: %v", volume.Name(), err)
				}
				return
			}
		}
	}
}}, cleanupFuncs...)
```
I don't understand this part at all. The comment is not clear to me.
I really dislike how this locks/unlocks the same volume twice, for no apparent reason other than that fixVolumePermissions() also locks.
Basically: fixVolumePermissions chowns the base of the volume according to where it is mounted. However, to match Docker's somewhat convoluted copy-up behavior, it only does this when the volume is not empty or this is the first time we populated the volume - a copy-up. Copy-up overrides the checks that halt the chown if a volume has contents. This is basically treating the podman cp into the volume as a copy-up event if one didn't happen, to ensure we chown the volume despite it now being populated. It has the convenient side effect of avoiding any subsequent attempts to copy up into the volume if it's mounted again.
I'll put this in a comment in the code, but I'm leaving it here so I don't forget - the logic here is somewhat torturous.
libpod/container_copy_common.go
Outdated
```go
// If we are not running, generate an OCI spec.
// Then use that to fix permissions on all the named volumes.
// Has to be *after* everything is mounted.
```
Here it is also not clear how running generateSpec() is fixing the permissions. Personally, I would much rather split out exactly what is needed here into a separate function from generateSpec(), so we don't do more work than needed; then it should also be very clear from looking at such a function which part is relevant.
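A rough, generic sketch of the kind of split being suggested (stand-in names and types, not actual libpod code): instead of calling a broad spec-generation function for its side information, extract the one piece the permission fix needs into a small helper whose name says what it does.

```go
package main

import "fmt"

// containerConfig is a simplified stand-in for the real container config.
type containerConfig struct {
	User string
}

// fullSpec mimics a broad generateSpec()-style function: it computes many
// things, even though the permission fix only needs the ownership part.
func fullSpec(cfg containerConfig) (owner string, mounts []string, env []string) {
	return cfg.User, []string{"/data"}, []string{"PATH=/usr/bin"}
}

// volumeOwner is the narrow helper the review suggests: it computes only
// what the permission fix actually needs, so the intent is obvious at the
// call site and no extra work is done.
func volumeOwner(cfg containerConfig) string {
	return cfg.User
}

func main() {
	cfg := containerConfig{User: "1000:1000"}

	owner, _, _ := fullSpec(cfg) // broad call: extra results are discarded
	fmt.Println("via fullSpec:", owner)

	fmt.Println("via volumeOwner:", volumeOwner(cfg)) // narrow call: intent is clear
}
```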
```go
cleanupFuncs = append(cleanupFuncs, func() {
	vol.lock.Lock()
	if err := vol.unmount(false); err != nil {
```
The issue here is that if anyone kills the copy process before we do cleanup, we leak all volume mounts forever, as the reference count (v.state.MountCount) will not be decremented.
I don't really know how we fix that. I could inhibit shutdown while the cleanup functions are running, but that doesn't help us if someone kills the copy before then. I could inhibit shutdown for the entirety of the copy, but that would really violate user expectations.
We almost need a cleanup process for the volume mounts, but how would we signal it that it should run?
Yeah, I really don't know here... If it is killed, there is no way to run such a process. The only point would be that it would allow the user to manually recover, but then again, how would they know how often to run this command? They could run it twice and decrement the counter more than it was incremented, which would also cause other issues.
I guess one option would be to register a new shutdown handler that runs the cleanup for SIGINT/TERM, which I think would already help for the majority of cases. That is not straightforward either, as we would need to remove the handler again once we are done.
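A minimal stdlib sketch of that registration idea (the PR ends up using the libpod shutdown package for this; the names below are made up): install a SIGINT/SIGTERM handler that performs the unmount, and drop it again once the copy finishes normally.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Hypothetical cleanup that would decrement the volume mount count.
	unmountVolume := func() { fmt.Println("unmounting volume / decrementing mount count") }

	// Register: run the cleanup if we receive SIGINT/SIGTERM mid-copy.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
	done := make(chan struct{})
	go func() {
		select {
		case <-sigCh:
			unmountVolume()
			os.Exit(1)
		case <-done:
			// Copy finished normally; nothing to do here.
		}
	}()

	// ... the copy itself would happen here ...

	// Unregister the handler and run the cleanup on the normal path.
	close(done)
	signal.Stop(sigCh)
	unmountVolume()
}
```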
Force-pushed from 36b7462 to 36f4765.
Cockpit tests failed for commit 31d6526. @martinpitt, @jelly, @mvollmer please check.
libpod/container_copy_common.go
Outdated
```go
if err := c.syncContainer(); err != nil {
	logrus.Errorf("Unable to sync container %s state: %v", c.ID(), err)
	return
}
```
Shouldn't the sync happen directly after the lock/unlock, so we know the state is sane for cleanupFunc() and c.unmount()?
Unmount doesn't rely on it, but the cleanup functions might. I'll move it.
libpod/container_copy_common.go
Outdated
```go
cleanupFuncs = append(cleanupFuncs, func() {
	_ = shutdown.Unregister(fmt.Sprintf("volume unmount %s", vol.Name()))

	if err := volUnmountFunc(); err != nil {
		logrus.Errorf("Unmounting container %s volume %s: %v", c.ID(), vol.Name(), err)
	}
})

if err := shutdown.Register(fmt.Sprintf("volume unmount %s", vol.Name()), func(_ os.Signal) error {
	return volUnmountFunc()
}); err != nil && !errors.Is(err, shutdown.ErrHandlerExists) {
	return nil, fmt.Errorf("adding shutdown handler for volume %s unmount: %w", vol.Name(), err)
}
```
Split out fmt.Sprintf("volume unmount %s", vol.Name()) into the outer scope and then use the variable in both callers. That way we do not have to worry about typos or changes in only one place.
Second, do we need to add some random UID to the handler name as well? It does not matter for local podman, as there can only be one command running at a time, but with the service you could in theory get two copies into the same volume at the same time, and if the service gets a SIGINT/TERM we would need to unmount twice to keep the mount count correct.
Random ID probably a good idea, I'll add it.
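A sketch of both suggestions applied to the hunk above (illustrative only; it assumes the same surrounding context plus a crypto/rand import, and the exact naming may differ): hoist the handler name into a single variable and give it a random suffix, so two concurrent copies into the same volume under the service each register and unregister their own handler.

```go
// Build the handler name once, with a random suffix so concurrent copies
// into the same volume don't collide on the handler name.
suffix := make([]byte, 8)
if _, err := rand.Read(suffix); err != nil {
	return nil, err
}
handlerName := fmt.Sprintf("volume unmount %s %x", vol.Name(), suffix)

cleanupFuncs = append(cleanupFuncs, func() {
	_ = shutdown.Unregister(handlerName)

	if err := volUnmountFunc(); err != nil {
		logrus.Errorf("Unmounting container %s volume %s: %v", c.ID(), vol.Name(), err)
	}
})

if err := shutdown.Register(handlerName, func(_ os.Signal) error {
	return volUnmountFunc()
}); err != nil && !errors.Is(err, shutdown.ErrHandlerExists) {
	return nil, fmt.Errorf("adding shutdown handler for volume %s unmount: %w", vol.Name(), err)
}
```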
test/e2e/cp_test.go
Outdated
```go
ctrCreate.WaitWithDefaultTimeout()
Expect(ctrCreate).To(ExitCleanly())

cp := podmanTest.Podman([]string{"cp", "/etc/hosts", fmt.Sprintf("%s:%s", ctrName, ctrVolPath)})
```
We should not assume the host's /etc/hosts exists or that it will be owned by root. Just create your own dummy file on the host in the test.
Also, doesn't this cause inconsistent behavior for rootless, as UID 0 on the host is not mapped in the userns?
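A sketch of the suggested test change (assuming the usual e2e helpers such as podmanTest.TempDir and the Gomega matchers already used above; exact names may differ): create a throwaway file on the host and copy that instead of /etc/hosts, so the test controls both existence and ownership.

```go
// Create a dummy source file in the test's temp dir instead of relying on /etc/hosts.
srcFile := filepath.Join(podmanTest.TempDir, "cp-src.txt")
err := os.WriteFile(srcFile, []byte("test content\n"), 0o644)
Expect(err).ToNot(HaveOccurred())

cp := podmanTest.Podman([]string{"cp", srcFile, fmt.Sprintf("%s:%s", ctrName, ctrVolPath)})
cp.WaitWithDefaultTimeout()
Expect(cp).To(ExitCleanly())
```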
Cockpit tests failed for commit 0fe846e. @martinpitt, @jelly, @mvollmer please check.
Force-pushed from 7acd44e to 638e629.
Cockpit tests failed for commit 36b7462. @martinpitt, @jelly, @mvollmer please check.
Cockpit tests failed for commit 36f4765. @martinpitt, @jelly, @mvollmer please check.
Force-pushed from 638e629 to 2a65a70.
Wrt. the failed cockpit rawhide test: I reported the criu/kernel regression at https://bugzilla.redhat.com/show_bug.cgi?id=2328985 and silenced it in our tests in cockpit-project/bots#7146. From now on, this failure will be ignored, i.e. tests should go back to green. Please either retry or re-push.
Force-pushed from 8eba026 to 7de9b6d.
Realization: the unmount-on-shutdown code here is inadequate. The copy will likely be ongoing, so the unmount will fail because the mount is busy. Adding a context to the copy itself could solve this, but I don't know if that's worth it.
I think for API usage it would make a lot of sense to cancel the copy. But given that buildah's copier.Put/Get, which that code uses to do the copy, do not take a context, that would require changes there first - which is largely out of scope for this fix, of course.
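For the API case, the general shape would look something like the sketch below (purely illustrative; as noted, buildah's copier would need to grow context support before the copy could truly be cancelled, so this wrapper can only stop waiting, not stop the copy):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// copyWithContext wraps a copy routine that has no context support. On
// cancellation it only abandons the wait; the copy keeps running in the
// background, which is why real cancellation needs changes in the copier.
func copyWithContext(ctx context.Context, doCopy func() error) error {
	errCh := make(chan error, 1)
	go func() { errCh <- doCopy() }()

	select {
	case err := <-errCh:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	err := copyWithContext(ctx, func() error {
		time.Sleep(time.Second) // stand-in for a long copy
		return nil
	})
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true: we stopped waiting
}
```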
Also, you have to rebase to fix CI. (As always, please rebase on each push to avoid merging on an old base that might cause conflicts.)
This reverts commit 5de7b7c. We now require the Unregister shutdown handler function for handling unmounting named volumes after `podman cp` into a stopped container. Signed-off-by: Matt Heon <[email protected]>
This solves several problems with copying into volumes on a container that is not running.

The first, and most obvious, is that we were previously entirely unable to copy into a volume that required mounting - like image volumes, volume plugins, and volumes that specified mount options.

The second is that this fixed several permissions and content issues with a fresh volume and a container that has not been run before. A copy-up will not have occurred, so permissions on the volume root will not have been set and content will not have been copied into the volume.

If the container is running, this is very low cost - we maintain a mount counter for named volumes, so it's just an increment in the DB if the volume actually needs mounting, and a no-op if it doesn't.

Unfortunately, we also have to fix permissions, and that is rather more complicated. This involves an ugly set of manual edits to the volume state to ensure that the permissions fixes actually worked, as the code was never meant to be used in this way. It's really ugly, but necessary to reach full Docker compatibility.

Fixes containers#24405

Signed-off-by: Matthew Heon <[email protected]>
Force-pushed from 7de9b6d to e66b788.
This solves several problems with copying into volumes on a container that is not running.
The first, and most obvious, is that we were previously entirely unable to copy into a volume that required mounting - like image volumes, volume plugins, and volumes that specified mount options.
The second is that this fixed several permissions and content issues with a fresh volume and a container that has not been run before. A copy-up will not have occurred, so permissions on the volume root will not have been set and content will not have been copied into the volume.
If the container is running, this is very low cost - we maintain a mount counter for named volumes, so it's just an increment in the DB if the volume actually needs mounting, and a no-op if it doesn't.
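As a rough illustration of the mount-counter behavior described above (a simplified stand-in, not the actual libpod code): only the first reference performs the real mount, later users just bump the counter, and only the last unmount tears the mount down - which is also why a killed copy that never decrements the counter leaks the mount.

```go
package main

import "fmt"

// volume is a simplified stand-in for libpod's named volume state; the real
// code persists MountCount in the database.
type volume struct {
	mountCount int
	mounted    bool
}

// mount performs the real mount only on the first reference; any later
// caller (such as a podman cp into a running container) just increments.
func (v *volume) mount() {
	v.mountCount++
	if v.mountCount == 1 {
		v.mounted = true
		fmt.Println("real mount performed")
		return
	}
	fmt.Println("already mounted; count is now", v.mountCount)
}

// unmount mirrors mount: the mount is only torn down when the last
// reference goes away.
func (v *volume) unmount() {
	v.mountCount--
	if v.mountCount == 0 {
		v.mounted = false
		fmt.Println("real unmount performed")
	}
}

func main() {
	v := &volume{}
	v.mount()   // running container: real mount
	v.mount()   // podman cp: just an increment
	v.unmount() // copy done: decrement
	v.unmount() // container stops: real unmount
}
```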
Does this PR introduce a user-facing change?