Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCO: manage machines userdata secret #368

Merged
merged 1 commit into from
Jun 29, 2020

Conversation

runcom
Copy link
Member

@runcom runcom commented Jun 10, 2020

Currently the installer creates pointer Ignition configs which end up as user data secrets referenced by the installer-generated Machine(Sets). That makes things like the Ignition v2 (spec 3) transition tricky.
The key point is that the Secret holding the initial stub ignition is completely unmanaged after it's created by the installer.
In order to upgrade to the new Ignition spec, the MCO needs a way to update the Secret to the new Ignition spec.

Signed-off-by: Antonio Murdaca [email protected]

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 10, 2020
@runcom
Copy link
Member Author

runcom commented Jun 10, 2020

@openshift/openshift-team-mco-maintainers ptal

@runcom
Copy link
Member Author

runcom commented Jun 10, 2020

/cc @cgwalters @abhinavdahiya as well

@openshift-ci-robot
Copy link

@runcom: GitHub didn't allow me to request PR reviews from the following users: well, as.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @cgwalters @abhinavdahiya as well

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM!

enhancements/machine-config/user-data-secret-managed.md Outdated Show resolved Hide resolved
enhancements/machine-config/user-data-secret-managed.md Outdated Show resolved Hide resolved
enhancements/machine-config/user-data-secret-managed.md Outdated Show resolved Hide resolved
enhancements/machine-config/user-data-secret-managed.md Outdated Show resolved Hide resolved
@runcom runcom force-pushed the userdata-managed branch from 9143084 to f23c6dd Compare June 11, 2020 12:33
@runcom
Copy link
Member Author

runcom commented Jun 11, 2020

@enxebre ptal as well (changes nothing for MAO but this contains possible future steps as well)

@runcom runcom force-pushed the userdata-managed branch from f23c6dd to 857456d Compare June 15, 2020 13:12
Comment on lines +67 to +68
- The installer keeps creating the Secret that holds the stub ignition config (at any given version) under a new name
- The secret will change name from `<role>-user-data` to `<role>-user-data-managed` which is going to be v3 at a later time before the release
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not,

  • the installer continues to create the user-provided secret as <role>-user-data
  • mco can now generate a secret <role>-user-data-managed based on the user/installer provided one. _copying if it must or migrating if it can for upgrades.
  • the installer changes to using managed one for machinesets as install-time utilizing mco's generation feature.
  • the installer continues to use user/installer created pointer for control-plane until we move to mahine-api managed one.

this doesn't break the existence or expectation of the previously created user-data-secret and still allowing us to move/migrate most of our users to generated one.

Copy link
Contributor

@yuqi-zhang yuqi-zhang Jun 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the installer changes to using managed one for machinesets as install-time utilizing mco's generation feature

Does the installer run bootstrap MC* components before machinesets are generated? (not sure of the ordering there)

the installer continues to use user/installer created pointer for control-plane until we move to machine-api managed one.

A bit confused on what you mean here, doesn't the MAO inherit the secret from installer directly today?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Abhinav is bringing up the "installation" part only here - when we migrate the installer to v3, the pointer config <role>-user-data will also be v3 so installation still works and the MCO can take over copying that and moving it to <role>-user-data-managed for further changes in the future.

Still, I have the same concerns Jerry has around timing, the MCO runs too late to be able to inject the new pointer config name to machinesets

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think related to what abhinav is saying, the installer can probably get away with just having a user-data secret and not generating the new one. Which means:

  • for existing installs, the installer isn't involved and the existing user-data on the system is unchanged, and the MCO will have a user-data-managed for bootimage updating for later (or manual new machinesets)
  • for new installs, the user-data will be rendered as V3 initially. The MCO user-data-managed will also be V3. This is so we don't have to involve machine-api in this enhancement directly, and MAO support can come at a later stage (alongside bootimage changes)

This also will be fine from an UPI perspective I think.

I think I made some mistakes during our initial discussions. Since we're not directly supporting bootimage upgrades yet, this enhancement is really allowing for 2 things:

  1. if the user creates a new machineset with spec 3 bootimages, they have a secret to reference to fetch their initial ignition config
  2. to facilitate upgrading to spec 3 bootimages, later

For the 4.6 timeframe if we're not looking to support existing clusters upgrading bootimages, I guess this enhancement will not be a blocker (but needed for the future for sure)

@runcom
Copy link
Member Author

runcom commented Jun 16, 2020

I think related to what abhinav is saying, the installer can probably get away with just having a user-data secret and not generating the new one. Which means:

  • for existing installs, the installer isn't involved and the existing user-data on the system is unchanged, and the MCO will have a user-data-managed for bootimage updating for later (or manual new machinesets)
  • for new installs, the user-data will be rendered as V3 initially. The MCO user-data-managed will also be V3. This is so we don't have to involve machine-api in this enhancement directly, and MAO support can come at a later stage (alongside bootimage changes)

This also will be fine from an UPI perspective I think.

right, let's just clarify what Abhinav means by having MCO generates the stub before machinesets cause that's not the case today and I'm not sure what that would involve

For the 4.6 timeframe if we're not looking to support existing clusters upgrading bootimages, I guess this enhancement will not be a blocker (but needed for the future for sure)

definitely not a blocker but the first point is still important so I think we can go with this anyway since the change isn't too invasive and still allows for future tweaks

@runcom runcom force-pushed the userdata-managed branch from 857456d to a2c9fc9 Compare June 18, 2020 13:30
@runcom
Copy link
Member Author

runcom commented Jun 18, 2020

alrighty, so I've updated this pending only some clarification from #368 (comment)

I've marked the conversations as Resolved but feel free to unresolve them if you think there's still something I'll have to address

This should be ready now I guess - calling for a last check 👼 @yuqi-zhang @enxebre @abhinavdahiya

@yuqi-zhang
Copy link
Contributor

A couple of things I'm not 100% clear on, it sounds like for new clusters on spec 3, there will be no user-data, only user-data-managed. Is that correct?

From the proposal both the installer and the MCO will be generating xxx-user-data-managed at some point. Is this similar to how the bootstrap MCO vs actual MCO handling of machineconfigs? i.e. the installer generates a base user-data-managed and then the MCO generates basically the same one, but has the option to update if needed?

Also from the proposal it sounds like user-data-managed will soon be synonymous with spec 3. xxx-user-data will no longer be generated but will stay on existing clusters until we have a way to upgrade bootimages. Is that correct as well?

@runcom
Copy link
Member Author

runcom commented Jun 18, 2020

it sounds like for new clusters on spec 3, there will be no user-data, only user-data-managed. Is that correct?

Correct, that should be ok as we don’t expect users to bring v2 machines anymore anyway since those are new installs

From the proposal both the installer and the MCO will be generating xxx-user-data-managed at some point.

I’m not fully sure about this as what we now right now is that the installer is responsible to generate it but we’ll take over as soon as the cluster runs. Basically nothing is changing except we get the ownership after install and we’re free to update it the way we want.
So I guess, to reply to your question, yes, we need to generate the very same stub like we do for MCs and that’s why my code PRs wish to centralize the generation (either in mco or installer, no preference) so that chances that we drift will be minimal.

Also from the proposal it sounds like user-data-managed will soon be synonymous with spec 3. xxx-user-data will no longer be generated but will stay on existing clusters until we have a way to upgrade bootimages. Is that correct as well?

Right, at some point we could even run a routine on upgrades that verifies that machinesets are all v3 and we can even delete the old user data (just thinking out loud).

So, I think this enhancement isn’t really changing anything other than names and centralizing the generation of the stub itself. The mco takeover is, as we discussed, a thing we need for scaling up using v3 bootimages anyway.

Hope it makes sense!

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think this sounds good. I think there's value in having a new secret on the system, so this path is preferred to updating just the existing one.

The proposal is pretty simple today:

- The installer keeps creating the Secret that holds the stub ignition config (at any given version) under a new name
- The secret will change name from `<role>-user-data` to `<role>-user-data-managed` which is going to be v3 at a later time before the release
Copy link
Contributor

@michaelgugino michaelgugino Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<role> is insufficient. We should make a userdata for each machineconfigpool.

Copy link
Member Author

@runcom runcom Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is insufficient. We should make a userdata for each machineconfigpool.

where are we not doing that since we do this just for master and worker? I'm not sure what you're referring to, creating userdata for all machineconfigpool in a cluster from the installer or mco isn't something we have now nor we plan to support, we just support master and worker and any custom created (manually...) machinesets will set that to the worker/compute userdata...

Also, if you could expand on your thoughts that would be better next time :)

Copy link
Member Author

@runcom runcom Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're referring to already created custom machinesets (pools are out of the picture here as they have to inherit from the worker MCP) then since we don't support automatic bootimages update, then we don't have this issue as if you're changing the bootimage manually (in 4.6) you have to create this yourself, not the cluster.

What you're (I guess..) referring to will become an issue once we go and update bootimages, but we're not there nor this proposal make that change harder in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@runcom runcom Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#319

that is not related to this enhancement is it and there's room to improve even when this is in right? I'll comment on the issue I guess

Copy link
Member Author

@runcom runcom Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I might just ask then, do we forsee that going with this enhancement will prevent #319 from landing at a later time?
I want to also echo that #319 is definitely on our radar and super important but unlikely to conflict with this wrt the ignition migration.

I’ll do something meanwhile and discuss #319 with our PO and GL to see if we have or make space to make it happen based on the current team s priorities (but we cannot make promises as is)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I guess I really care about in this proposal is the naming. We should have a 1 to 1 mapping of user-data secrets to machine-config-pools. Since we're touching naming here, we should do that now.

As far as being a prereq, I guess what I'm thinking is, that this enhancement is not much of an enhancement. The 'real' enhancement is some larger story, and this is an implementation detail (of an intermediate step) rather than a real enhancement. So sorting these details out is the prereq, not necessarily implementing them. If we change the names later, more cleanup, more things to migrate.

Not to say that this general problem is unique to this enhancement, but I think it's an increasing trend around this repo that we're treating small stories as an enhancement, so having bigger picture discussions is more difficult.

If there's a compelling reason to not name the user-data the installer creates <mcp-name>-user-data now, I'm not sure what it is. "Roles" are more or less going away in k8s as far as I understand them, so mapping user-data to roles long term doesn't make sense. It does make sense to map the user-data to the config that it's actually requesting IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a compelling reason to not name the user-data the installer creates -user-data now, I'm not sure what it is. "Roles" are more or less going away in k8s as far as I understand them, so mapping user-data to roles long term doesn't make sense. It does make sense to map the user-data to the config that it's actually requesting IMO.

What I think we’re saying is that we’re not chanting the naming. The role prefix has always been there and we’re not changing them with this, maybe that’s the disconnection we have for this enhancement? What we do with this in 4.6 is appending -managed and that effectively doesn’t change any behavior. It doesn’t make it better to decouple roles as you said but it doesn’t make it worse either so I’m not fully sure again what we’re even discussing about.
Everybody realizes that #319 and what you described in your comment is important but it doesn’t change the fact that this enhancement isn’t about that. This is not changing naming either again, it’s adding a suffix which I can’t see how it would complicate things later on. Having the 1-1 mapping is IMO trivial as well to do in the MCO besides landing this or not.
We’re not postponing a proper enhancement to get away with our stuff, we’re discussing different things, we’re not sticking to the “bad” pattern of not following what an enhancement should be, we’re discussing different things that happen to be related.

Copy link
Member Author

@runcom runcom Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're touching naming here, we should do that now.

I’ll keep disagreeing with this also, we’re not changing naming, we’re touching naming in a not-regressive way to implement something else which won’t impact any such change from happening in the near future. What you’re saying is again implying that while we here we might as well do #319 which isn’t, again, the case as these are different thing and touching the naming (in a non regressive way) doesn’t imply that we get to fix #319 which targets another issue - I’ll state this again, this enhancement, be it a real one or just implementation details, is about migrating to ignition v3.

Thinking more about this: doing the -user-data-whatever is a trivial change MCO side but that doesn't account for the installer only generating the user-data for only masters and workers because UPI has to be supported so unless somebody has cycles to do that in this release (which we don't) I'm still thinking we can do this by just appending -managed to the user-data and later account for #319.

All I guess I really care about in this proposal is the naming. We should have a 1 to 1 mapping of user-data secrets to machine-config-pools. Since we're touching naming here, we should do that now.

we're already doing that, master and worker are not just roles but MCPs as well (which we are not changing) - you want to sneak in the support for custom pools as per #319 which is unrelated to this change and it seems there's an "easy" workaround openshift/machine-config-operator#1619 (comment) and I've already stated that we're gonna work on #319 outside of the migration scope

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, i agree with @michaelgugino's proposal in #319 . taken by itself the naming change he is proposing sounds like it is not being contested. that said, i think what @runcom is saying here about this enhancement not changing the current behavior with regards to naming makes some sense to me as well from the perspective of planning this work. although i am a little confused about adding -managed to some of these names as that sounds like we are starting to get into a naming change.

personally i would love to see #319 be able to be included in this work, but i understand the rationale for separating the two issues.

just my thoughts =)

@yuqi-zhang
Copy link
Contributor

Given that the conversation has mostly died down I am going to go ahead and give this the LGTM to ensure the spec 3 transition work can progress accordingly. If anyone feels that this as a standalone enhancement is problematic, please feel free to re-open the discussion and we can modify as we go.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 29, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, runcom, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 8097f7a into openshift:master Jun 29, 2020
@runcom runcom deleted the userdata-managed branch June 30, 2020 08:58
enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Aug 3, 2020
This openshift/installer#3730 changed the name of the userdata generated by the installer to satisfy MCO ign v2 -> v3 migration and let the new secret be under mco management. openshift/enhancements#368.
@cgwalters
Copy link
Member

OK this got reverted. If we re-do this I'd like to consider trying to implement something along
openshift/machine-config-operator#1619
and potentially openshift/machine-config-operator#1720 too.

See conversation in coreos/ignition#1126

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants