Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable remote tasks to be run in cluster #1216

Merged
merged 2 commits into from
Aug 15, 2024

Conversation

arewm
Copy link
Member

@arewm arewm commented Jul 29, 2024

By default, we should run builds matching the local architecture in-cluster to reduce the overhead of provisioning platforms. This will enable a fully matrixed build for all images using only the remote builds. This change will require the multi-platform controller to set the /ssh/host to localhost in order for the builds to run in-cluster.

In a change from the prior behavior of auto-appending the architecture to the tag, we will now append a sanitized version of the entire PLATFORM to the image tag upon request. This behavior is now needs to be explicitly requested which fixes a bug where users cannot specify the specific tag desired when using a remote build.

NOTE: There are not currently any local platforms configured. These need to be added before this PR have an effect in production.

@arewm arewm force-pushed the enable-local-in-remote branch from e09b4e7 to 500e20d Compare July 29, 2024 16:00
Copy link
Member

@ifireball ifireball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found what seems to be one major issue where the task essentially becomes a noop in the local case.

Also since the buildah-remote task is now gaining local capabilities, it may be desirable to stop using the task-generator script and instead switch to maintaining the generated code directly. We can do this under the assumption that the remote-capable task will eventually replace the local-only task. Doing this will IMO simplify maintenance going forward.

task-generator/remote/main.go Outdated Show resolved Hide resolved
task-generator/remote/main.go Outdated Show resolved Hide resolved
task-generator/remote/main.go Show resolved Hide resolved
@arewm arewm force-pushed the enable-local-in-remote branch from 500e20d to a724511 Compare July 30, 2024 17:57
@chmeliik
Copy link
Contributor

Also since the buildah-remote task is now gaining local capabilities, it may be desirable to stop using the task-generator script and instead switch to maintaining the generated code directly.

+1, the task-generator is quickly becoming very difficult to understand (the diff here is kind of impossible to review)

Can we instead add the remote capabilities to regular buildah

  • and make buildah-remote an alias for buildah (the same way as all the buildah-*gb variants, but with no patch.yaml)
  • or deprecate buildah-remote altogether?

@arewm
Copy link
Member Author

arewm commented Jul 31, 2024

We certainly can consolidate the tasks. Changes like the one that I proposed in #1185 will become much simpler as well.

If we consolidate, however, we will have to ensure that we support builds when the multi-platform-controller is not enabled. I don't think that we will be able to consolidate before the controller supports local builds.

@arewm arewm mentioned this pull request Aug 2, 2024
5 tasks
@arewm arewm force-pushed the enable-local-in-remote branch from a724511 to e3da7bd Compare August 5, 2024 17:36
@arewm arewm force-pushed the enable-local-in-remote branch 3 times, most recently from 71e0059 to de9fc4b Compare August 5, 2024 18:34
@arewm arewm requested a review from ifireball August 5, 2024 18:34
@arewm arewm force-pushed the enable-local-in-remote branch from bc9027e to 5e0b00f Compare August 12, 2024 12:48
@arewm arewm force-pushed the enable-local-in-remote branch 4 times, most recently from b569064 to 7c63f42 Compare August 12, 2024 14:10
@arewm arewm marked this pull request as ready for review August 12, 2024 14:10
@arewm arewm changed the title WIP: Enable remote tasks to be run in cluster Enable remote tasks to be run in cluster Aug 12, 2024
@arewm arewm force-pushed the enable-local-in-remote branch from 7c63f42 to 06514d8 Compare August 12, 2024 17:02
@arewm arewm requested a review from a team August 13, 2024 00:24
@arewm arewm force-pushed the enable-local-in-remote branch from 26af9a4 to 3a8c8b2 Compare August 13, 2024 15:43
@arewm arewm force-pushed the enable-local-in-remote branch 7 times, most recently from f4e5b34 to 33b2df7 Compare August 14, 2024 13:10
@arewm arewm force-pushed the enable-local-in-remote branch from 33b2df7 to 957d109 Compare August 14, 2024 13:40
@chmeliik
Copy link
Contributor

This change will require the multi-platform controller to set
the /ssh/host to localhost in order for the builds to run in-cluster.

Ah, so we can't really replace replace the regular buildah task with this one if we don't get rid of this requirement. The basic buildah use case shouldn't require multi-platform-controller

@chmeliik
Copy link
Contributor

This change will require the multi-platform controller to set
the /ssh/host to localhost in order for the builds to run in-cluster.

Ah, so we can't really replace replace the regular buildah task with this one if we don't get rid of this requirement. The basic buildah use case shouldn't require multi-platform-controller

Another problem for unifying the tasks is that the user doesn't get to decide whether to build in-cluster or on a VM - the multi-platform-controller config does.

@chmeliik
Copy link
Contributor

Another problem for unifying the tasks is that the user doesn't get to decide whether to build in-cluster or on a VM - the multi-platform-controller config does.

Now that I think about it, won't these changes together with https://github.com/redhat-appstudio/infra-deployments/pull/4329/files break the option to run x86 builds on a VM?

Copy link
Contributor

@chmeliik chmeliik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable, my main concern is if we're not removing the option to run x86 builds on a VM (#1216 (comment))

task/buildah-remote/0.2/buildah-remote.yaml Show resolved Hide resolved
task/buildah-remote/0.2/buildah-remote.yaml Show resolved Hide resolved
task-generator/remote/main.go Show resolved Hide resolved
task-generator/remote/main.go Outdated Show resolved Hide resolved
task-generator/remote/main.go Show resolved Hide resolved
@arewm
Copy link
Member Author

arewm commented Aug 14, 2024

Another problem for unifying the tasks is that the user doesn't get to decide whether to build in-cluster or on a VM - the multi-platform-controller config does.

Now that I think about it, won't these changes together with https://github.com/redhat-appstudio/infra-deployments/pull/4329/files break the option to run x86 builds on a VM?

The user decides what platform they want to build by specifying the parameter. The configuration determines whether that should happen locally or on a VM if necessary. If it doesn't matter whether builds happen locally or remotely, then this implementation is left up to the system. One challenge for the host configs/MPC controller will be to identify the best way to migrate workloads from a VM-configured platform to a local-configured platform (or vice versa). I think that these challenges exist outside of the scope of the currently proposed changes to buildah-remote.

For now, there are only AMD nodes in clusters, but there might be ARM nodes on the cluster in the future. With the changes in redhat-appstudio/infra-deployments/pull/4329, users can still run x86 builds on a VM, they just need to select one of the configured non-local hosts (i.e. using any of the */amd64 platforms. There is a desire to move linux/amd64 to be run in-cluster but this will be a breaking change and we need to migrate any current users off of it onto the new linux-mlarge/amd64 platform first (this change will happen in the future if it happens at all).

Today, localhost can be easily used to indicate that you want to build in-cluster as I used in my tests. When multiple node platforms are available, we will have to figure out how to enable the local builds to be scheduled to the right nodes.

@chmeliik
Copy link
Contributor

chmeliik commented Aug 14, 2024

users can still run x86 builds on a VM, they just need to select one of the configured non-local hosts (i.e. using any of the */amd64 platforms

There is a desire to move linux/amd64 to be run in-cluster but this will be a breaking change and we need to migrate any current users off of it onto the new linux-mlarge/amd64 platform first

Ah right, it's amd64, not x86_64. local-platforms: "linux/x86_64, in that PR confused me. This will need good docs 😅 It's all opaque behavior of the system, not something easily understandable to a user. Also, how is linux-mlarge/amd64 a "platform"?

@arewm arewm force-pushed the enable-local-in-remote branch from 957d109 to 0d07675 Compare August 14, 2024 15:32
arewm added 2 commits August 15, 2024 09:44
By default, we should run builds matching the local architecture
in-cluster to reduce the overhead of provisioning platforms. This will
enable a fully matrixed build for all images using only the remote
builds. This change will require the multi-platform controller to set
the /ssh/host to localhost in order for the builds to run in-cluster.

In a change from the prior behavior, we will now append a sanitized
version of the entire PLATFORM to the image tag upon request. We will no
longer try to extract just the arch from the PLATFORM as all platforms
may not follow the `os/arch` pattern. By appending the entire PLATFORM,
we remove any dependency on how local/remote platforms are configured. This
behavior is now needs to be explicitly requested.

Signed-off-by: arewm <[email protected]>
Updating the image used for the remote tasks resulted in a bump in the
buildah version. This includes
containers/common@08fc0b450 which no longer
sets the repository or tag when pulling without the optional name. To
properly populate the image in the container's registry, we need to push
and pull it with the optional name.

Signed-off-by: arewm <[email protected]>
@arewm arewm force-pushed the enable-local-in-remote branch from 0d07675 to 49c8395 Compare August 15, 2024 13:51
@arewm arewm enabled auto-merge August 15, 2024 13:53
@arewm arewm added this pull request to the merge queue Aug 15, 2024
Merged via the queue into konflux-ci:main with commit 24b12ab Aug 15, 2024
9 checks passed
@ifireball
Copy link
Member

ifireball commented Sep 5, 2024

@chmeliik

Another problem for unifying the tasks is that the user doesn't get to decide whether to build in-cluster or on a VM - the multi-platform-controller config does.

Now that I think about it, won't these changes together with https://github.com/redhat-appstudio/infra-deployments/pull/4329/files break the option to run x86 builds on a VM?

They still get to decide, by a platform name that is explicit about it such as linux-mlarge/amd64 as opposed to assuming linux/amd64 would end on a VM, which is something they shouldn't care about if they only want to build for a foreign platform as opposed to explicitly asking to be on a VM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants