-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🌱 Update rollout lib #276
🌱 Update rollout lib #276
Conversation
/assign @haoqing0110 |
Adding |
1b3f924
to
bd8457a
Compare
cluster/v1alpha1/helpers.go
Outdated
if needToRollout { | ||
rolloutClusters[cluster] = status | ||
rolloutClusters = append(rolloutClusters, status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check the rolloutClusters length I think, should not put all the needToRollout existing clusters into the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This iterate on the existing clusterRolloutStatus, Assuming it shouldn't never reach the length. okay will add it for safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For addon, there's the possibility that 3 managedclusteraddons are created(existing) and status is "ToApply", if don't check the length, all of the 3 will be returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming the existingClusterStatus hold only clusterStatus; progressing, succeeded, failed and timeout cause the clusterStatus ToApply means the cluster workload does not exist yet. So as addon controller you should not add clusterStatus ToApply to the existingClusterStatus cause the addon (workload) does not exist. What do you think ?
I will add check to ignore the ToApply clusterStatus at the existingClusterStatus loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the check for the ToApply status, so if the existingClusterStatus has all clusters in ToApply status, the RolloutResult will contain only the required maxConcurrency or the required decisionGroup. Added unitTest as well for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ToApply means the desired status not applied yet, no matter it's a fresh install (no current workload) or upgrade case (current workload does not match the desired workload) .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I added the check for the ToApply state as well.
|
||
clusterGroupKeys := clusterGroupsMap.GetOrderedGroupKeys() | ||
for _, status := range existingClusterStatus { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is to rollout by group, why go through the existingClusterStatus and put it to result? what will the returned rolloutClusters be like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it return the clusters that are still progressing, failing (before the timeout) and new clusters add to the group after the rollout starts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still have some concerns about going through the existingClusterStatus first and then going through the rest of the clusters, for both progressivePerCluster and progressivePerGroup.
For example, the env has 100 clusters with status Succeed, then rollout starts, what's the expected cluster status returned by existingClusterStatus? For addon existingClusterStatus will return ToApply when rollout starts, in this case, all the clusters will be returned by GetRolloutCluster(), which doesn't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your point, same as above let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another gap after this change is that, in addon, one cma will have multiple placement, there's placement overlap. For example:
The config-v1 existingClusterStatus has cluster1 and config-v2 existingClusterStatus has cluster2, cluster3. For config-v1, cluster2 is not in existing cluster and it should also not be treated as added cluster and return in rollout result.
installStrategy:
type: Placements
placements:
- name: placement1 // placement1 select cluster1 and cluster2
namespace: default
configs:
- group: addon.open-cluster-management.io
resource: addonhubconfigs
name: config-v1
rolloutStrategy:
type: Progressive
- name: placement2 // placement2 select cluster2 and cluster3
namespace: default
configs:
- group: addon.open-cluster-management.io
resource: addonhubconfigs
name: config-v2
rolloutStrategy:
type: Progressive
I'm thinking should we just rollout the clusters returned by ClusterRolloutStatusFunc and put added/deleted cluster in another field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removed (deleted) clusters are already in another field ClustersRemoved .
Regards the added clusters, not sure if I understand the example above correctly. Config-v1 is associated with placement1 and config-v2 is associated with placement2; those are 2 different configs with 2 different placements. However; as cluster-2 is associated with config-v1 and config-v2, cluster-2 must apply config-v2 as it has higher version. Is that correct ?
If yes, I guess this is more addon logic that is need to be handled inside the addon controller. what do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addon has returned a SKIP status for cluster2 when rollout config-v1 to indicate not to rollout it. Not sure if PR can handle the case when cluster2 is not in existingClusterStatus and user want to SKIP rollout it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the idea for existingClusterStatus; if the the existing cluster has Skip status it will not be considered in the rolloutResult->ClustersToRollout. The addon set the cluster2 ClusterRolloutStatus to skip (for configv1 and placement1) then the rolloutResult->ClustersToRollout will not have it. I added comments here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, addon might need to put SKIP cluster in existingClusterStatus so that won't put it in ClustersToRollout. Sounds good.
b6551e1
to
c92b89c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! I've left some minor comments.
(Regarding the current discussion about removed clusters: In general, I like having the removed clusters available to the controller so that they can be handled right away, but I'm still processing the existing discussion about it.)
cluster/v1alpha1/helpers.go
Outdated
} | ||
|
||
// ClusterRolloutStatusFunc defines a function to return the rollout status for a managed cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ClusterRolloutStatusFunc defines a function to return the rollout status for a managed cluster. | |
// ClusterRolloutStatusFunc defines a function to return the rollout status for a managed cluster for a given workload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// ClustersToRollout is a map where the key is the cluster name and the value is the ClusterRolloutStatus. | ||
ClustersToRollout map[string]ClusterRolloutStatus | ||
// ClustersTimeOut is a map where the key is the cluster name and the value is the ClusterRolloutStatus. | ||
ClustersTimeOut map[string]ClusterRolloutStatus | ||
// ClustersToRollout is a slice of ClusterRolloutStatus that will be rolled out. | ||
ClustersToRollout []ClusterRolloutStatus | ||
// ClustersTimeOut is a slice of ClusterRolloutStatus that are timeout. | ||
ClustersTimeOut []ClusterRolloutStatus | ||
// ClustersRemoved is a slice of ClusterRolloutStatus that are removed. | ||
ClustersRemoved []ClusterRolloutStatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this is an array now so that the mapping to each cluster's name is clearer to the end user? I kind of liked the map[string]ClusterRolloutStatus
for uniqueness and simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand its easier to iterate with a map but with the changes that happen we don't really need the clusterName to be in a map. Plus its better to move the clusterName to the ClusterRolloutStatus struct to be identifiable object by itself.
cluster/v1alpha1/helpers.go
Outdated
// Calculate the length for progressive rollOut | ||
// If the MaxConcurrency not defined, total clusters length is considered as maxConcurrency. | ||
clusterGroups = r.pdTracker.ExistingClusterGroupsBesides(groupKeys...) | ||
length, err := calculateLength(strategy.Progressive.MaxConcurrency, len(clusterGroups.GetClusters())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: This is outside of this PR's intent, but could we consider renaming calculateLength()
to something more descriptive like calculateRolloutSize()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
for _, cluster := range clusters { | ||
if clusterStatus.ClusterName == cluster { | ||
exist = true | ||
currentClusterStatus = append(currentClusterStatus, clusterStatus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a break
here so that the loop doesn't continue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, done
cluster/v1alpha1/helpers.go
Outdated
} | ||
|
||
existingClusters[status.ClusterName] = true | ||
if status.Status == Succeeded || status.Status == TimeOut { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: This could be a switch
statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, done.
cluster/v1alpha1/helpers.go
Outdated
if len(rolloutClusters) >= length { | ||
return RolloutResult{ | ||
ClustersToRollout: rolloutClusters, | ||
ClustersTimeOut: timeoutClusters, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this check redundant since it's already in the loop above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, removed.
c92b89c
to
718293e
Compare
timeoutClusters := []ClusterRolloutStatus{} | ||
existingClusters := make(map[string]bool) | ||
|
||
for _, status := range existingClusterStatus { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should here sort the existingClusterStatus by name by alphabetical order to ensure every time returns the same result?
cluster/v1alpha1/helpers.go
Outdated
// Set as false to consider the cluster in the decisionGroups iteration. | ||
existingClusters[status.ClusterName] = false | ||
case Failed, Progressing: | ||
newStatus, needToRollout := determineRolloutStatusAndContinue(status, timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can here combine the switch case and determineRolloutStatusAndContinue's switch case into one function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, done.
718293e
to
003b118
Compare
// +k8s:deepcopy-gen=false | ||
type RolloutHandler struct { | ||
type RolloutHandler[T any] struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be runtime.Object rather than any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runtime.Object will required casting for the used Type and We want the RolloutHandler to be initiated with a certain type (eg; manifestwork, policy, addon, ...etc) not a runtime.object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but any is a more loose constraint, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yes "any" is kind of loose, but we are safe. The main idea behind make a generic Type "any" is to avoid doing casting (or force certain type) at the clusterRolloutStatusFunc Function implementation similar to the unit test example here AND the helper lib is safe we don't call the clusterRolloutStatusFunc any where, its just a func definition that will be used to create the existing ClustersRollOutStatus at the consumer API side. What do you think ? Is that make sense ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run make update
will fail on deepcopy functions generating step:
Generating deepcopy funcs
W1017 14:10:32.195161 62703 parse.go:863] Making unsupported type entry "T" for: &types.TypeParam{check:(*types.Checker)(nil), id:0x1, obj:(*types.TypeName)(0xc009651900), index:0, bound:(*types.Interface)(0xc00009a960)}
F1017 14:10:32.221298 62703 deepcopy.go:890] Hit an unsupported type open-cluster-management.io/api/cluster/v1alpha1.ClusterRolloutStatusFunc[T] for open-cluster-management.io/api/cluster/v1alpha1.ClusterRolloutStatusFunc[T], from open-cluster-management.io/api/cluster/v1alpha1.RolloutHandler[T]
Seems deepcopy lib doesn't support generic type yet. kubernetes/gengo#225
Either we revert the normal struct or we disable package-level generation and explicitly add the comment to each struct we need do deepcopy function generating: #288
cc @haoqing0110
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @dhaiducek , I test controller-gen
with latest api main branch, it fails and generates unexpected code for generic type:
./_output/tools/bin/controller-gen object:headerFile="hack/empty.txt" paths="./cluster/v1alpha1"
open-cluster-management.io/api/cluster/v1alpha1:-: invalid type: func(clusterName string, workload T) (open-cluster-management.io/api/cluster/v1alpha1.ClusterRolloutStatus, error)
Error: not all generators ran successfully
run `controller-gen object:headerFile=hack/empty.txt paths=./cluster/v1alpha1 -w` to see all available markers, or `controller-gen object:headerFile=hack/empty.txt paths=./cluster/v1alpha1 -h` for usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned above, we can also change to the explicit way to claim which struct needs gen deepcopy funcs: #288
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it seems the gengo
tooling may not be respecting the deepcopy-gen=false
tag? In any case, I don't have a great deal of familiarity with deepcopy, but here's a commit that might work as an alternative to selectively generating the APIs:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xuezhaojun I'm able to reproduce it as well. having the same issue as @dhaiducek sounds like deepCopy generator does not respect the deepcopy-gen=false
tag !
Based on the deepCopy code doc here setting the deepcopy-gen=false should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for providing the alternative! I will follow with it.
cluster/v1alpha1/helpers.go
Outdated
@@ -83,20 +88,21 @@ func NewRolloutHandler(pdTracker *clusterv1beta1.PlacementDecisionClustersTracke | |||
// | |||
// ClustersTimeOut: If the cluster status is Progressing or Failed, and the status lasts longer than timeout defined in strategy, | |||
// will list them RolloutResult.ClustersTimeOut with status TimeOut. | |||
func (r *RolloutHandler) GetRolloutCluster(rolloutStrategy RolloutStrategy, statusFunc ClusterRolloutStatusFunc) (*RolloutStrategy, RolloutResult, error) { | |||
// func (r *RolloutHandler) GetRolloutCluster(rolloutStrategy RolloutStrategy, statusFunc ClusterRolloutStatusFunc) (*RolloutStrategy, RolloutResult, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this line
cluster/v1alpha1/helpers.go
Outdated
} | ||
|
||
func progressivePerCluster(clusterGroupsMap clusterv1beta1.ClusterGroupsMap, length int, timeout time.Duration, existingClusterStatus []ClusterRolloutStatus) RolloutResult { | ||
rolloutClusters := []ClusterRolloutStatus{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use var rolloutClusters, timeoutClusters []ClusterRolloutStatus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, done
Signed-off-by: melserngawy <[email protected]>
003b118
to
ac0fe2c
Compare
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: qiujian16, serngawy The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
bf4f47e
into
open-cluster-management-io:main
Signed-off-by: melserngawy <[email protected]>
Signed-off-by: melserngawy <[email protected]>
Signed-off-by: melserngawy <[email protected]>
Signed-off-by: melserngawy <[email protected]> Fix: run `make update` fail because deepcopy doesn't support generic type. (open-cluster-management-io#288) Signed-off-by: xuezhaojun <[email protected]> Fix verify ci step missing. (open-cluster-management-io#289) Signed-off-by: xuezhaojun <[email protected]> :bug: Use controller-runtime for deepcopy generation for cluster:v1alpha1 (open-cluster-management-io#291) * Revert "Fix: run `make update` fail because deepcopy doesn't support generic type. (open-cluster-management-io#288)" This reverts commit ae208c8. Signed-off-by: Dale Haiducek <[email protected]> * Use `controller-gen` for deepcopy cluster:v1alpha1 GenGo isn't respecting the `+k8s:deepcopy-gen=false` tag to skip generation for the generic type Signed-off-by: Dale Haiducek <[email protected]> --------- Signed-off-by: Dale Haiducek <[email protected]> 🐛 add ca bundle to addon proxy settings (open-cluster-management-io#293) Signed-off-by: Yang Le <[email protected]> Revert "remove ClusterSet ClusterSetBinding API version v1beta1 (open-cluster-management-io#266)" This reverts commit 9675ab7. Signed-off-by: haoqing0110 <[email protected]>
Fix: run `make update` fail because deepcopy doesn't support generic type. (#288) Fix verify ci step missing. (#289) :bug: Use controller-runtime for deepcopy generation for cluster:v1alpha1 (#291) * Revert "Fix: run `make update` fail because deepcopy doesn't support generic type. (#288)" This reverts commit ae208c8. * Use `controller-gen` for deepcopy cluster:v1alpha1 GenGo isn't respecting the `+k8s:deepcopy-gen=false` tag to skip generation for the generic type --------- 🐛 add ca bundle to addon proxy settings (#293) Revert "remove ClusterSet ClusterSetBinding API version v1beta1 (#266)" This reverts commit 9675ab7. Signed-off-by: haoqing0110 <[email protected]> Co-authored-by: Mohamed ElSerngawy <[email protected]>
Summary
Related issue(s)
Fixes #