Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1712562: pin the 4.1 GA candidate RHCOS build boot images #1770

Conversation

imcleod
Copy link
Contributor

@imcleod imcleod commented May 21, 2019

Putting WIP for now so we can discuss the URI (and destination branch) here

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 21, 2019
@wking wking changed the title WIP - pin the 4.1 GA candidate RHCOS build boot images WIP: Bug 1712562: pin the 4.1 GA candidate RHCOS build boot images May 21, 2019
@wking
Copy link
Member

wking commented May 21, 2019

I bumped the PR subject to reference https://bugzilla.redhat.com/show_bug.cgi?id=1712562

data/data/rhcos.json Outdated Show resolved Hide resolved
data/data/rhcos.json Show resolved Hide resolved
data/data/rhcos.json Outdated Show resolved Hide resolved
@imcleod imcleod force-pushed the feature/4.1-ga-image-pin-take2 branch from f3611e1 to c270bf8 Compare May 21, 2019 20:56
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 21, 2019
@wking wking changed the title WIP: Bug 1712562: pin the 4.1 GA candidate RHCOS build boot images Bug 1712562: pin the 4.1 GA candidate RHCOS build boot images May 21, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 21, 2019
@wking
Copy link
Member

wking commented May 21, 2019

9f1e50dd8129 looks good to me.

/approve
/hold

Waiting on folks to green-light the issue for 4.1.0.

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 21, 2019
@imcleod imcleod requested a review from crawford May 21, 2019 21:20
@wking
Copy link
Member

wking commented May 21, 2019

@tracyrankin set the issue target to 4.1.0 (why doesn't Bugzilla drop anchors so I can link these metadata changes). So we're all set to go here once we get approval from @crawford or other lead that they're ok with how this PR is resolving the issue.

@crawford
Copy link
Contributor

Why are we using anything other than the output of that script? How will the removal of those sections affect non-supported platforms? (gracefully error? panic?)

@wking
Copy link
Member

wking commented May 21, 2019

Why are we using anything other than the output of that script?

Because there's no way for the installer to access or otherwise expose the removed content, with the
exception of libvirt, which we're not building anyway. And removing the content avoids breaking in a CI-registry baseURI when folks should be using the mirrors.

How will the removal of those sections affect non-supported platforms?

You might be able to panic libvirt, but we aren't building for libvirt (I'll check). We don't have consumers for the other removed data.

@wking
Copy link
Member

wking commented May 21, 2019

Oh, and we can't use the mirrors for baseURI, because the 410.8.20190520.0 data hasn't been pushed up there yet.

@wking
Copy link
Member

wking commented May 21, 2019

$ git checkout origin/pr/1770
$ git describe --dirty
unreleased-master-995-g9f1e50dd8
$ TAGS=libvirt hack/build.sh
$ mkdir testing
$ cat <<EOF >testing/install-config.yaml
> apiVersion: v1
> baseDomain: installer.testing
> metadata:
>   name: wking
> platform:
>   libvirt:
>     URI: qemu+tcp://192.168.122.1/system
> pullSecret: |
>   {
>     ...REDACTED...
>   }
> EOF
$ ./bin/openshift-install --dir testing create cluster
INFO Fetching OS image: .                         
FATAL failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": failed to get libvirt Terraform variables: failed to use cached libvirt image: Get : unsupported protocol scheme "" 

Not the greatest error message, but not a panic.

@wking
Copy link
Member

wking commented May 21, 2019

e2e-aws-upgrade:

fail [k8s.io/kubernetes/test/e2e/upgrades/apps/job.go:63]: Expected
    <bool>: false
to be true

That's a known issue.

/test e2e-aws-upgrade

@crawford
Copy link
Contributor

I'm having trouble understanding this statement:

And removing the content avoids breaking in a CI-registry baseURI when folks should be using the mirrors.

What is the harm in leaving behind that URL? Is it just to prevent the accidental use of those CI artifacts?

@abhinavdahiya
Copy link
Contributor

abhinavdahiya commented May 21, 2019

The images section cannot be removed:

/hold

@wking
Copy link
Member

wking commented May 21, 2019

What is the harm in leaving behind that URL? Is it just to prevent the accidental use of those CI artifacts?

Yeah. If folks want to use these artifacts for official releases like what comes out of this branch, they should be pulling them from the mirrors, not the CI-cluster's forwarder.

The images section cannot be removed:

  • It breaks libvirt, for as much as we think that's developer only that's still important

I don't really care either way for libvirt. Folks running libvirt can supply their own QEMU URI regardless of what we do here. I'd be happy with a built-in link to the mirrors for the libvirt folks, but since this isn't up on the mirrors yet that's not possible. I'm ok with the CI-redirector link for the libvirt folks, because breaking them when the CI cluster happens to be down or the redirector is otherwise out of action won't break production users.

  • The images section is used for UPI vsphere, metal CI testing and those are 4.1 GA requirements.

Having these tests diverge from our docs is a potential source of CI/user divergence. But good point, I'd forgotten about this "consumer API" :p.

And CI cannot use the upstream mirror locations.

Why not? Or is this just "we may make breaking changes to the bootimages that we need to preflight in CI", in which case I'm concerned about how we'd roll that out in the 4.1 docs. Can we save breaking bootimage changes for 4.2?

I'm not strongly committed to this second commit. If you're both ok with the first, we can drop it and get this landed without further discussion. But if we do want to land a baseURI to support these other consumers, it seems like the CI-redirector is not where we want to be pointing.

@crawford
Copy link
Contributor

I'm fine with just that first commit.

@wking wking force-pushed the feature/4.1-ga-image-pin-take2 branch from 9f1e50d to c270bf8 Compare May 21, 2019 23:16
@openshift-ci-robot openshift-ci-robot removed the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 21, 2019
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 21, 2019
@wking
Copy link
Member

wking commented May 21, 2019

I'm fine with just that first commit.

Ok, I've dropped 9f1e50dd8 (talked to @imcleod about it earlier, and he's fine either way).

@crawford
Copy link
Contributor

/approve

@abhinavdahiya
Copy link
Contributor

What is the harm in leaving behind that URL? Is it just to prevent the accidental use of those CI artifacts?

Yeah. If folks want to use these artifacts for official releases like what comes out of this branch, they should be pulling them from the mirrors, not the CI-cluster's forwarder.

The images section cannot be removed:

  • It breaks libvirt, for as much as we think that's developer only that's still important

I don't really care either way for libvirt.

Libvirt is an IPI style i do care about not breaking this way.

Folks running libvirt can supply their own QEMU URI regardless of what we do here. I'd be happy with a built-in link to the mirrors for the libvirt folks, but since this isn't up on the mirrors yet that's not possible. I'm ok with the CI-redirector link for the libvirt folks, because breaking them when the CI cluster happens to be down or the redirector is otherwise out of action won't break production users.

The goal is to treat IPI same, so any divergence we do there, i'm against that. LIbvirt needs to do the same as AWS, ie use the embedded.

  • The images section is used for UPI vsphere, metal CI testing and those are 4.1 GA requirements.

Having these tests diverge from our docs is a potential source of CI/user divergence. But good point, I'd forgotten about this "consumer API" :p.

CI doesn't test mirrors, it tests the source code at that point of time. we promote the RHCOS pipeline to mirrros not the other way around...
And CI is not diverging from docs. Docs mention to use the "approved" bootimages, for master the embedded rhcos.json is the "approved".

And CI cannot use the upstream mirror locations.

Why not? Or is this just "we may make breaking changes to the bootimages that we need to preflight in CI", in which case I'm concerned about how we'd roll that out in the 4.1 docs. Can we save breaking bootimage changes for 4.2?

I'm not strongly committed to this second commit. If you're both ok with the first, we can drop it and get this landed without further discussion. But if we do want to land a baseURI to support these other consumers, it seems like the CI-redirector is not where we want to be pointing.

c270bf8 lgtm

/hold cancel
/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels May 21, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford, imcleod, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,crawford,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking
Copy link
Member

wking commented May 22, 2019

Libvirt needs to do the same as AWS, ie use the embedded.

This is fine with me, I just think the CI redirector is the wrong choice for embedding.

CI doesn't test mirrors, it tests the source code at that point of time. we promote the RHCOS pipeline to mirrros not the other way around.

We want to cover both the RHCOS pipeline and the mirror-facing docs. Ideally we'd default to the mirrors for general release-branch CI, because that exercises the flow we're documenting for users. If there's some issue with the documented flow (mirrors down, mirrors flaky, accidentally uploaded wrong content to the mirrors), mirror-facing CI would turn it up. We also want to cover new RHCOS, but only for the installer jobs that are explicitly about gating RHCOS bootimage promotion. I'm fine passing in overrides for the installer jobs in those cases, but I think the rest of the org should be using the mirrors for their release-4.1 preflights.

And really, I'm less concerned about CI than I am about production consumers. We can clobber whatever we need to clobber in CI. But when we decide to give users a way to get bootimages that isn't "grep the source" or "read the docs" (#1399), I think we'll need to make the production/mirror vs. CI distinction more clear. Or maybe we'll have moved beyond needing bootimages by then ;).

@openshift-merge-robot openshift-merge-robot merged commit c651738 into openshift:release-4.1 May 22, 2019
@openshift-ci-robot
Copy link
Contributor

@imcleod: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-scaleup-rhel7 c270bf8 link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants