
Expose optional COSA_SUPERMIN_SIZE to set rootfs image partition size #3956

Closed

Conversation

mtalexan (Contributor)

When 'supermin --build ...' generates the rootfs, kernel, and initrd, it's passed a size to use for the rootfs. That sets the upper bound on the size of the ostree that can be added to the rootfs of the images cosa generates. The size was hardcoded to 10G, which is a reasonable starting point, but different use cases can easily exceed it. For now, follow the COSA_SUPERMIN_MEMORY example and expose an optional COSA_SUPERMIN_SIZE environment variable that lets a caller override the rootfs size when they need it to be bigger or smaller than the default 10G.
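In sketch form, the change amounts to something like the following (illustrative only; the variable names and surrounding arguments aren't the literal cmdlib.sh code):

```bash
# Before: the rootfs size passed to supermin was hardcoded.
#   supermin --build "${workdir}/supermin.d" --size 10G -f ext2 -o "${vmbuilddir}"

# After: honor an optional override, keeping 10G as the default.
supermin_size="${COSA_SUPERMIN_SIZE:-10G}"
supermin --build "${workdir}/supermin.d" --size "${supermin_size}" -f ext2 -o "${vmbuilddir}"
```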

/close #3955


There's already a COSA_SUPERMIN_MEMORY environment variable that's used to optionally configure how much RAM is allocated to the kola qemuexec call when it generates the actual qcow2 disk image file from the supermin kernel, initrd, and rootfs.
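For example, a hypothetical invocation with both knobs set (values are illustrative):

```bash
# Give the supermin VM more RAM and a 20G rootfs image instead of the
# default 10G (COSA_SUPERMIN_SIZE is the new variable proposed here).
COSA_SUPERMIN_MEMORY=4096 COSA_SUPERMIN_SIZE=20G cosa buildextend-metal
```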

At some point in the future we could consider expanding this change so that the size is either specified via the image.yaml file, or (even better) calculated from the size of the ostree we're going to put into the rootfs.
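A rough sketch of that auto-sizing idea (entirely hypothetical; the repo path and headroom are assumptions):

```bash
# Hypothetical: measure the ostree repo we plan to pull into the rootfs
# and round up, leaving a few gigabytes of slack for everything else.
ostree_bytes=$(du -sB1 "${workdir}/tmp/repo" | cut -f1)
supermin_size_gb=$(( (ostree_bytes / 1024**3) + 3 ))
supermin --build "${workdir}/supermin.d" --size "${supermin_size_gb}G" -f ext2 -o "${vmbuilddir}"
```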


openshift-ci bot commented Nov 20, 2024

Hi @mtalexan. Thanks for your PR.

I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mtalexan (Contributor, Author)

As I was going through some changes in this repo that weren't on my older fork, I found that this entire runvm() function is effectively deprecated. It's only used for direct CLI calls, which are being deprecated in favor of osbuild calls. When osbuild is used, the runvm-osbuild script runs instead of this runvm() function, and that script handles dynamically calculating the rootfs size.


That said, unless the direct CLI is completely deprecated, I believe this is a trivial-enough change that it could be worth merging until the runvm() function is completely removed.

@jlebon (Member) commented Nov 20, 2024

runvm isn't deprecated. It's the low-level wrapper for supermin that everything else uses (including how runvm-osbuild is run).

Hmm, I'm not quite sure about this change though. Are you not using the cache qcow2? All the heavy work should be happening there.

@mtalexan (Contributor, Author)

> runvm isn't deprecated. It's the low-level wrapper for supermin that everything else uses (including how runvm-osbuild is run).
>
> Hmm, I'm not quite sure about this change though. Are you not using the cache qcow2? All the heavy work should be happening there.

I'm not sure what you mean by "the cache qcow2". From what I can tell, nothing outside runvm() is able to change anything about the rootfs partition; it's all hardcoded right there in runvm() as part of the supermin --build call. Maybe I'm misunderstanding something about the flow here?

As best I can tell from trying to reverse engineer the code and tools over the last few years, supermin creates a rootfs, kernel, and initrd file, and an init script defined in runvm() populates them with some of the packages shared from the parent/host system and then sets up some networking and virtio. The files are then passed to a kola qemuexec call that attaches them as if they were the appropriate disk partitions, along with a virtio serial device so commands can be run.

Nothing I'm aware of after that point can resize the rootfs, though I have seen recent-ish code that says it does qcow2 virtual disk resizing. How those files end up as partitions in the qcow2 has never been clear to me, but at least in older code the rootfs file itself was either determining the size of the partition or being used directly as the target the ostree was pulled into. Some of my configs end up with a 13G ostree pull, and changing this hardcoded 10G size to 20G fixed the resulting build error at that time.
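For reference, the general supermin flow I'm describing looks roughly like this standalone sketch (paths and the package list are made up, not what cosa actually uses):

```bash
# Two-step supermin flow: prepare a package set, then build the appliance.
mkdir -p /tmp/supermin.d /tmp/supermin-build
supermin --prepare bash coreutils -o /tmp/supermin.d
supermin --build /tmp/supermin.d --size 10G -f ext2 -o /tmp/supermin-build
ls /tmp/supermin-build
# -> kernel  initrd  root   ("root" is the fixed-size ext2 rootfs image)
```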

@dustymabe (Member)

> the cache qcow

```bash
# Run with cache disk with optional snapshot=on, which means no changes get written back to
# the cache disk. `runvm_with_cache_snapshot on` will set snapshotting to on.
runvm_with_cache_snapshot() {
    local snapshot=$1; shift
    local cache_size=${RUNVM_CACHE_SIZE:-30G}
    # "cache2" has an explicit label so we can find it in qemu easily
    if [ ! -f "${workdir}"/cache/cache2.qcow2 ]; then
        qemu-img create -f qcow2 cache2.qcow2.tmp "$cache_size"
        (
            # shellcheck source=src/libguestfish.sh
            source /usr/lib/coreos-assembler/libguestfish.sh
            virt-format --filesystem=xfs --label=cosa-cache -a cache2.qcow2.tmp)
        mv -T cache2.qcow2.tmp "${workdir}"/cache/cache2.qcow2
    fi
    # And remove the old one
    rm -vf "${workdir}"/cache/cache.qcow2
    cache_args+=("-drive" "if=none,id=cache,discard=unmap,snapshot=${snapshot},file=${workdir}/cache/cache2.qcow2" \
                 "-device" "virtio-blk,drive=cache")
    runvm "${cache_args[@]}" "$@"
}

runvm_with_cache() {
    local snapshot='off'
    runvm_with_cache_snapshot $snapshot "$@"
}
```

We try not to store much inside the supermin VM root disk because it's ephemeral from run to run. The cache qcow2 gets created on demand, and you can affect its size with the RUNVM_CACHE_SIZE environment variable.
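For example (illustrative invocation; note from the snippet above that the size only takes effect when the cache disk is first created):

```bash
# Grow the on-demand cache disk; only matters before
# ${workdir}/cache/cache2.qcow2 exists.
RUNVM_CACHE_SIZE=50G cosa build
```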

@mtalexan (Contributor, Author)

I figured out what's going on now and how the cache applies, but it took a lot of digging.

I originally encountered this back when cmd-buildextend-metal was the primary initial build file and it was calling runvm ... create_disk.sh. Back then the ostree wasn't in a separately mounted (snapshotted) cache disk in the VM; it was created in a ${workdir}/cache folder that was part of the supermin rootfs. My ostree exceeded the hardcoded 10G rootfs size, so the build failed before the ostree could be put into the virtio-target device associated with the qcow2 file that create_disk.sh partitions, mounts, and writes to.

I see now that cmd-osbuild is the primary file, and it's doing runvm_with_cache_snapshot ... runvm-osbuild to create the output image. How the image is created is now part of the osbuild tool itself, but it runs with a separate qcow2 file attached as a cosa-cache-labeled, XFS-formatted disk that the supermin-init-prelude.sh script (called from the inline VM init script in runvm()) mounts over the top of the ${workdir}/cache directory where osbuild does all its work and image generation.
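Conceptually, that mount-over step amounts to something like the following (a sketch based on the cosa-cache label shown above, not the literal supermin-init-prelude.sh code):

```bash
# Mount the cache disk by its label over the working cache directory,
# so everything written there lands on the persistent qcow2 instead of
# the ephemeral supermin rootfs.
mkdir -p "${workdir}/cache"
mount /dev/disk/by-label/cosa-cache "${workdir}/cache"
```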

So it appears my fix was only relevant to a historical bug that was fixed in a better way.

mtalexan closed this Nov 20, 2024
mtalexan deleted the add-configurable-rootfs-image-size branch November 25, 2024 19:26
Successfully merging this pull request may close these issues:

Allow rootfs optionally >10G in images