OpenStack: automatically populate RHCOS image #2473
pkg/asset/rhcos/image.go
case azure.Name:
	osimage, err = rhcos.VHD(ctx)
case baremetal.Name:
	// Note that baremetal IPI currently uses the OpenStack image
	// because this contains the necessary ironic config drive
	// ignition support, which isn't enabled in the UPI BM images
	osimage, err = rhcos.OpenStack(ctx)
-case none.Name, vsphere.Name:
+case none.Name, openstack.Name, vsphere.Name:
Shouldn't it be clusterID.InfraID + "-rhcos" for the openstack case, so that you don't have to check for an empty value each time?
It isn't immediately obvious why we do things differently for OpenStack. Could you also explain why we don't use osimage, err = rhcos.OpenStack(ctx)? I'm thinking it may be so that we can destroy a cluster in a tenant without impacting other clusters in the same tenant. It would be nice to add a comment.
clusterID is not generated yet here, so we don't know its infra ID.
We do things differently for OpenStack because image creation is not idempotent there: if we ask Terraform to create an "rhcos" image in Glance 5 times, it will create 5 different images, which is not true for other platforms.
rhcos.OpenStack(ctx) returns a URL where we can find the image file (like https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.3/43.80.20191002.1/x86_64/rhcos-43.80.20191002.1-openstack.x86_64.qcow2), but we expect osImage to return a Glance image name, "rhcos" for instance. So we can't use this function here directly. Instead, image names are generated later, and at that point, if this string is empty, we know that Terraform needs to create a new image named "<InfraID>-rhcos".
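The non-idempotency point can be illustrated with a toy model. This is a minimal sketch, not the Glance API: fakeGlance and its methods are hypothetical, and the only behavior it borrows from the comment above is that creating an image by name always allocates a new image, even if one with that name already exists.

```go
package main

import "fmt"

// glanceImage is a toy record; a real Glance image has many more fields.
type glanceImage struct {
	ID   string
	Name string
}

// fakeGlance is a hypothetical in-memory stand-in for the image service.
// Names are not unique: every Create call allocates a fresh ID.
type fakeGlance struct {
	images []glanceImage
	nextID int
}

// Create always adds a new image and returns its ID.
func (g *fakeGlance) Create(name string) string {
	g.nextID++
	id := fmt.Sprintf("img-%d", g.nextID)
	g.images = append(g.images, glanceImage{ID: id, Name: name})
	return id
}

// CountByName reports how many images share the given name.
func (g *fakeGlance) CountByName(name string) int {
	n := 0
	for _, img := range g.images {
		if img.Name == name {
			n++
		}
	}
	return n
}
```

Five creations of "rhcos" leave five images with that name, whereas a per-cluster name such as "<InfraID>-rhcos" maps each cluster to exactly one image it can safely delete on destroy.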
Is anything preventing us from generating the clusterID earlier, then?
My point being, the image name should never be an empty string. It is either generated based on the clusterID or set via OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE.
pkg/asset/cluster/tfvars.go
// If baseImage is empty then it was not provided by the user and we need to create it.
// For doing this we get the image URL and give it to Terraform.
// Image name in this case will be: <InfraID>-rhcos, i.e. user-rdd6e-rhcos
nit: s/user-rdd6e-rhcos/clustername-rdd6e-rhcos/
pkg/asset/cluster/tfvars.go
baseImage := string(*rhcosImage)
var baseImageURL string

// If baseImage is empty then it was not provided by the user and we need to create it.
I don't think we need to keep the option of providing a base image; the installer pins the image for a reason.
It is an OCP requirement that users can provide a base image to the installer by setting the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE variable. It is also useful for CI, where we don't need to create a new image every time because we can reuse the existing one.
Fair enough.
/retest
pkg/asset/cluster/tfvars.go
ctx, cancel := context.WithTimeout(context.TODO(), 30*time.Second)
defer cancel()

baseImageURL, err = rhcosplatforms.OpenStack(ctx)
Why are we not calling this function in the rhcos image asset (8a4153b#diff-a5ccce89cb15a18ce4324bf3aa273a6fL81)?
rhcosplatforms.OpenStack(ctx) returns a URL (https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.3/43.80.20191002.1/x86_64/rhcos-43.80.20191002.1-openstack.x86_64.qcow2), but we expect the osImage function to return a Glance image name (e.g. "rhcos"), so we can't use the function there directly.
Until now, no "production" use case depended on the "RHCOS builds API" at https://releases-art-rhcos.svc.ci.openshift.org/. One thing we probably should do is have the downloader try finding content first at https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/ or so.
Also, is the download code verifying the download using the sha256?
/retest
/test e2e-openstack
/retest
@cgwalters Download URLs are taken from the rhcos.json file. I think we can update the rhcos.OpenStackImageURL function in a separate PR to add alternative locations there. Checksum validation has been added!
The Terraform verify_checksum option checks the integrity of the image in Glance (it's enabled by default, BTW, but your change is still welcome because it's better to be explicit). We also need to check the integrity of the downloaded image (verify that its sha256 matches the one in https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/sha256sum.txt).
data/data/openstack/main.tf
resource "openstack_images_image_v2" "base_image" {
// we need to create a new image only if the base image url has been provided + base image name is <cluster_id>-rhcos
count = var.openstack_base_image_url != "" && var.openstack_base_image == "${var.cluster_id}-rhcos"? 1 : 0 |
Thanks. terraform fmt wants to add a space before the question mark.
yep, done
We already have that in the installer itself. And more importantly, if the user has GPG-verified their installer binary (or its checksum via other means), then that transitively covers the OpenStack image too. If we fetch a checksum file separately over HTTPS, we only have the TLS guarantee (any CA could have MITM'd).
Just for cross-reference: supporting this was part of the idea in #1399
@cgwalters Yeah, I saw that! The issue is that OpenStack Glance doesn't support sha256 checksums (md5 only), so we can't ask Terraform to verify it directly. My intention was to merge this patch first, because we need the feature badly, and then figure out how we can add support for sha256 checksums.
pkg/asset/cluster/tfvars.go
if baseImage == clusterID.InfraID+"-rhcos" {
	var err error
	ctx, cancel := context.WithTimeout(context.TODO(), 30*time.Second)
	defer cancel()
What is this doing? It looks like it's always cancelling a context that is unused?
OK, overall ... One idea I had in a different PR is to have a URL endpoint like https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos?sha256=0123abc..., i.e. we fetch by digest, the same way we do container images today. So the installer would always try ... (and probably we should change things so that "released" installers only use mirror.openshift.com or so).
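The fetch-by-digest idea can be sketched as URL construction. This is a minimal sketch of the proposal only: the ?sha256= endpoint is the idea floated in the comment, not an existing service, and digestURL is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// sha256Hex matches a lowercase hex-encoded SHA-256 digest (64 characters).
var sha256Hex = regexp.MustCompile(`^[0-9a-f]{64}$`)

// digestURL builds a content-addressed URL of the form base?sha256=<digest>.
// The endpoint shape is the hypothetical one proposed in the review.
func digestURL(base, digest string) (string, error) {
	if !sha256Hex.MatchString(digest) {
		return "", fmt.Errorf("invalid sha256 digest %q", digest)
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("sha256", digest)
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```

As with container images, the client would still hash what it receives and reject anything that does not match the digest it asked for; the URL alone does not guarantee integrity.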
Still looking at the change, but I had some more fundamental questions. The goal for ... for Azure, it is the link to the VHD file in some Azure storage account. Before this, the value for OpenStack was a Glance image, and I think now that value should be the link to the OpenStack QCOW. One thing to note is that these envs are internal and development-only, and I would be fine changing their behavior.
=========
When going from proto-image (links) to platform-based images, the machine objects will always point to the platform-based image. So I would prefer that the machine objects be updated to use ... With this, the Terraform passing should be simple, as you only provide the URL to Terraform and it creates the corresponding/matching image ^^
/hold Just tried this PR. Because of a gophercloud limitation (?), this basically pulls the 1.8 GB RHCOS image locally before uploading it to Glance, and it does so for every deployment. It also keeps all images cached in
This happens once per invocation of
@sdodson Once each time the source image changes, but it doesn't delete the data afterwards. I will update the PR description to make this clearer.
/test e2e-openstack |
/retest |
resource "openstack_images_image_v2" "base_image" {
// we need to create a new image only if the base image url has been provided, plus base image name is <cluster_id>-rhcos
count = var.openstack_base_image_url != "" && var.openstack_base_image_name == "${var.cluster_id}-rhcos" ? 1 : 0 |
This can be simplified to var.openstack_base_image_url != "", right?
I'd rather leave it like that, just in case the user exports TF_VAR_openstack_base_image_url. We've had an issue in our CI where a previous version of this patch created a duplicate rhcos image, and it caused all the jobs to fail.
Yeah, this is an optional check to prevent possible regressions that could break our CI.
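The guard in the Terraform count expression can be restated as a Go predicate to make the two conditions explicit. This is a minimal sketch: shouldCreateBaseImage and imageCount are hypothetical names that mirror the HCL above; they are not installer code.

```go
package main

// shouldCreateBaseImage mirrors the Terraform expression
//
//	count = var.openstack_base_image_url != "" && var.openstack_base_image_name == "${var.cluster_id}-rhcos" ? 1 : 0
//
// Requiring both conditions means a stray TF_VAR_openstack_base_image_url
// alone cannot trigger an upload when the name points at a pre-created image.
func shouldCreateBaseImage(baseImageURL, baseImageName, clusterID string) bool {
	return baseImageURL != "" && baseImageName == clusterID+"-rhcos"
}

// imageCount converts the decision into Terraform's count value (0 or 1).
func imageCount(baseImageURL, baseImageName, clusterID string) int {
	if shouldCreateBaseImage(baseImageURL, baseImageName, clusterID) {
		return 1
	}
	return 0
}
```

With the simplified url-only check, the second case below (a leftover URL plus a pre-created "rhcos" name) would upload a duplicate image, which is the CI failure described above.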
pkg/asset/machines/master.go
imageName := clusterID.InfraID + "-rhcos"
// Here we check whether rhcosImage is a url or not. If this is the case, it means that the image was
// created by the installer and we have to use the universal name "<infraID>-rhcos". Otherwise, it means
// that we are given the name of the pre-created Glance image, which we should use for node provisioning.
_, err = url.ParseRequestURI(string(*rhcosImage))
if err != nil {
	imageName = string(*rhcosImage)
}
Should this be a function?
I thought about it... I absolutely agree that it makes sense, but I don't know where to put this function properly. There is no place for OpenStack utility functions, unfortunately. I'll try to figure out where we can put it.
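Extracted into a function, the snippet above might look like this. It is a minimal sketch: glanceImageName is a hypothetical name and where such a helper should live is still the open question. It keeps the original rule of treating anything url.ParseRequestURI accepts as a URL.

```go
package main

import "net/url"

// glanceImageName returns the Glance image name nodes should boot from.
// If rhcosImage parses as a request URI, the installer uploads it under the
// per-cluster name "<infraID>-rhcos"; otherwise rhcosImage is taken to be
// the name of a pre-created Glance image.
func glanceImageName(rhcosImage, infraID string) string {
	if _, err := url.ParseRequestURI(rhcosImage); err == nil {
		return infraID + "-rhcos"
	}
	return rhcosImage
}
```

One caveat of this check: url.ParseRequestURI also accepts absolute paths such as "/tmp/rhcos.qcow2" and strings with a scheme-like prefix such as "rhcos:4.3", so a Glance image name containing a colon would be treated as a URL.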
pkg/asset/machines/master.go
imageName := clusterID.InfraID + "-rhcos"
// Here we check whether rhcosImage is a url or not. If this is the case, it means that the image was
// created by the installer and we have to use the universal name "<infraID>-rhcos". Otherwise, it means
Nit: rhcosImage being a URL doesn't necessarily mean it was created by the installer (the user can pass a URL to OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE), but it does mean it needs to be imported into Glance under a unique name, clusterID.InfraID + "-rhcos".
Not exactly right... If rhcosImage is a URL, then the installer always creates an image from this URL with the name clusterID.InfraID + "-rhcos". It doesn't matter how the URL was obtained, from rhcos.json or from the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE variable.
description = "Name of the base image to use for the nodes."
}

variable "openstack_base_image_url" {
type = string
We need a default value; otherwise it fails with "The input variable "openstack_base_image_url" has not been assigned a value." when exporting OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE.
Funny thing, I did that but forgot to git commit --amend it, and pushed the patch without this change. Agreed, this is important.
@Fedosin: The following test failed, say
/hold cancel
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abhinavdahiya, cgwalters, Fedosin, mandre.
This PR allows the installer to automatically populate a Glance image if one was not pre-created by the user.
Goals:
Not included in the goals of this work:
Generation of the URL from which we download the image file. It is obtained by the existing rhcos.OpenStack function based on the data from the rhcos.json file.
Prerequisites:
In OpenShift 4.2, the name of the asset was hardcoded to "rhcos". Although it can be overridden with the environment variable, users still have to create the image manually.
This is inconvenient for users and can lead to unpleasant consequences, such as OpenShift clusters of different versions accidentally using the same RHCOS image.
Technical solution:
We consider 3 situations:
1. The RHCOS image file URL is generated from the rhcos.json file by the rhcos.OpenStack function. A new Glance image called "<InfraID>-rhcos" is created by Terraform.
2. The RHCOS image file URL is provided by the variable, and the rhcos.json data is ignored. A new Glance image called "<InfraID>-rhcos" is created by Terraform.
3. The installer reuses the existing Glance image defined by the variable; no new Glance images are created.
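The three situations can be summarized as one decision function. This is a minimal sketch, not the installer's code: resolveBaseImage, baseImage and isURL are hypothetical names, override stands for the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE value, and defaultURL stands in for rhcos.OpenStack reading rhcos.json.

```go
package main

import (
	"errors"
	"net/url"
)

// baseImage is what the Terraform variables need: the Glance image name
// and, when the installer must upload the image, the URL to download from.
type baseImage struct {
	Name string
	URL  string // empty means "reuse the existing Glance image"
}

// resolveBaseImage maps the override value onto the three situations.
func resolveBaseImage(override, infraID string, defaultURL func() (string, error)) (baseImage, error) {
	uploadName := infraID + "-rhcos"
	switch {
	case override == "":
		// Situation 1: URL from rhcos.json, new image "<InfraID>-rhcos".
		u, err := defaultURL()
		if err != nil {
			return baseImage{}, err
		}
		if u == "" {
			return baseImage{}, errors.New("no RHCOS image URL available")
		}
		return baseImage{Name: uploadName, URL: u}, nil
	case isURL(override):
		// Situation 2: URL from the variable; rhcos.json is ignored.
		return baseImage{Name: uploadName, URL: override}, nil
	default:
		// Situation 3: reuse the pre-created Glance image named by the variable.
		return baseImage{Name: override}, nil
	}
}

// isURL uses the same check as the master.go snippet in this PR.
func isURL(s string) bool {
	_, err := url.ParseRequestURI(s)
	return err == nil
}
```

Only situations 1 and 2 produce a non-empty URL, which is what makes Terraform create the "<InfraID>-rhcos" image.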
Gotchas:
Future work: