Add Localities to FoundationDBClusterSpec #1353

manfontan
Collaborator

@manfontan manfontan commented Sep 15, 2022

Description

This PR implements the first step towards supporting the three data hall configuration in the operator.

The current changes allow configuring three data halls using Localities and Node Selectors.

Since the sidecar currently does not read the topology key label from the pods, we provide the NodeSelectors as part of the Locality information. Each node selector must correspond to a data hall.

TopologyKey logic is not implemented in this PR. A validation should report an error if this value is included in the localities spec.

The data_hall locality would look something like this:

Locality{
	Key:   FDBLocalityDataHallKey,
	Value: "",
	// TopologyKey logic is not implemented in this PR (see the note above).
	TopologyKey: corev1.LabelTopologyZone,
	NodeSelectors: [][]string{
		{"foundationdb", "zone1"},
		{"foundationdb", "zone2"},
		{"foundationdb", "zone3"},
	},
}
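
To make the shape of the locality and the TopologyKey validation concrete, here is a minimal sketch. The `Locality` struct below only mirrors the fields shown in the example, and `validateLocality` is a hypothetical helper; neither is taken from the operator's actual API, and the `"data_hall"` key string is assumed to match FDBLocalityDataHallKey.

```go
package main

import (
	"errors"
	"fmt"
)

// Locality mirrors the fields of the example above. The field set is an
// assumption based on the PR description, not the operator's real type.
type Locality struct {
	Key           string
	Value         string
	TopologyKey   string
	NodeSelectors [][]string
}

// validateLocality is a hypothetical helper. It rejects TopologyKey, which
// this PR does not implement, and requires every node selector to be a
// {label key, label value} pair.
func validateLocality(l Locality) error {
	if l.TopologyKey != "" {
		return errors.New("topologyKey is not supported yet")
	}
	for i, ns := range l.NodeSelectors {
		if len(ns) != 2 {
			return fmt.Errorf("nodeSelectors[%d]: expected a key/value pair, got %d elements", i, len(ns))
		}
	}
	return nil
}

func main() {
	l := Locality{
		Key: "data_hall", // assumed value of FDBLocalityDataHallKey
		NodeSelectors: [][]string{
			{"foundationdb", "zone1"},
			{"foundationdb", "zone2"},
			{"foundationdb", "zone3"},
		},
	}
	fmt.Println(validateLocality(l)) // prints <nil>

	// "topology.kubernetes.io/zone" is the value of corev1.LabelTopologyZone.
	l.TopologyKey = "topology.kubernetes.io/zone"
	fmt.Println(validateLocality(l)) // prints the topologyKey error
}
```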

When adding pods, the operator reads the FDB status and assigns a node selector (NS) to each pod based on the current process distribution across data halls (DH). If no processes carry data hall information, it picks a random node selector.
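
The selection described above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `pickNodeSelector` and `hallCounts` are hypothetical names, and "depending on the process distribution" is interpreted here as "prefer the data hall with the fewest processes".

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickNodeSelector returns the data hall (node selector value) that currently
// hosts the fewest processes. hallCounts maps a data hall name to the number
// of processes reported for it in the FDB status. Both names are hypothetical.
func pickNodeSelector(selectors []string, hallCounts map[string]int) string {
	if len(selectors) == 0 {
		return ""
	}
	// No process reports a data hall yet: fall back to a random selector.
	if len(hallCounts) == 0 {
		return selectors[rand.Intn(len(selectors))]
	}
	// A hall missing from hallCounts counts as zero processes.
	// Ties are resolved in favor of the earlier selector in the list.
	best := selectors[0]
	for _, s := range selectors[1:] {
		if hallCounts[s] < hallCounts[best] {
			best = s
		}
	}
	return best
}

func main() {
	selectors := []string{"zone1", "zone2", "zone3"}
	counts := map[string]int{"zone1": 3, "zone2": 1, "zone3": 2}
	fmt.Println(pickNodeSelector(selectors, counts)) // prints zone2
}
```

Choosing the least-populated data hall keeps the cluster balanced as pods are added one at a time; the random fallback only matters for the very first processes.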

The ProcessGroup Status info will be updated to include the DataHall locality information. This info will be used during pod updates to add the NodeSelector from the Locality in order to calculate the PodSpecHash.
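
A small sketch of why the data hall has to be recorded in the status: if the hash is derived from the full pod spec, the node selector is part of the hashed input, so the operator must be able to reproduce the same selector on later updates. The `podSpec` type and `specHash` function are simplified stand-ins, and hashing the JSON encoding with SHA-256 is an assumption for illustration rather than a statement about how the operator computes PodSpecHash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// podSpec is a heavily simplified, hypothetical stand-in for a pod spec.
type podSpec struct {
	Image        string            `json:"image"`
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`
}

// specHash hashes the JSON encoding of the spec. encoding/json sorts map
// keys, so the same spec always yields the same hash.
func specHash(spec podSpec) (string, error) {
	data, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	zone1 := podSpec{Image: "foundationdb", NodeSelector: map[string]string{"foundationdb": "zone1"}}
	zone2 := podSpec{Image: "foundationdb", NodeSelector: map[string]string{"foundationdb": "zone2"}}

	h1, _ := specHash(zone1)
	h2, _ := specHash(zone2)
	fmt.Println(h1 == h2) // prints false: a different node selector changes the hash
}
```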

For removals, ChooseDistributedProcesses has been updated to include FDBLocalityDataHallKey in its fields, so that it picks a maximally well-distributed set of processes.
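
As a rough illustration of what "well-distributed" means with the data hall as a field, the sketch below picks processes round-robin across data halls. It is deliberately simplified and is not the operator's ChooseDistributedProcesses; `processInfo` and `chooseDistributed` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// processInfo is a hypothetical, minimal view of a process and its data hall.
type processInfo struct {
	ID       string
	DataHall string
}

// chooseDistributed picks up to count processes, taking one per data hall in
// turn so that the chosen set is spread as evenly as possible.
func chooseDistributed(candidates []processInfo, count int) []processInfo {
	if count <= 0 {
		return nil
	}
	byHall := map[string][]processInfo{}
	for _, p := range candidates {
		byHall[p.DataHall] = append(byHall[p.DataHall], p)
	}
	halls := make([]string, 0, len(byHall))
	for h := range byHall {
		halls = append(halls, h)
	}
	sort.Strings(halls) // deterministic iteration order

	chosen := make([]processInfo, 0, count)
	for len(chosen) < count {
		progressed := false
		for _, h := range halls {
			if len(chosen) == count {
				break
			}
			if len(byHall[h]) > 0 {
				chosen = append(chosen, byHall[h][0])
				byHall[h] = byHall[h][1:]
				progressed = true
			}
		}
		if !progressed {
			break // fewer candidates than requested
		}
	}
	return chosen
}

func main() {
	candidates := []processInfo{
		{"p1", "dh1"}, {"p2", "dh1"}, {"p3", "dh1"},
		{"p4", "dh2"}, {"p5", "dh3"},
	}
	// Picks one process from each of dh1, dh2 and dh3.
	for _, p := range chooseDistributed(candidates, 3) {
		fmt.Println(p.ID, p.DataHall)
	}
}
```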

As discussed in:
#348

Type of change

Please select one of the options below.

New feature (non-breaking change which adds functionality)

Discussion

The changes in this PR are based on
https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/design/three_datahall.md

Testing

Please describe the tests that you ran to verify your changes. Unit tests?
Manual testing?

docker run --rm --entrypoint=/bin/bash -ti --platform="linux/amd64" -v $(pwd):/work -w /work docker.io/library/golang:1.18.5 ./scripts/setup_container.sh
root@f7a6ffa5d70d:/work# make manifests
...
/work
/go/bin/controller-gen "crd:maxDescLen=0,crdVersions=v1,generateEmbeddedObjectMeta=true" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
# Per default controller-gen will generate a ClusterRole for our example we want to use a Role and the namespace marker doesn't
# work since it requires a namespace and kustomize doesn't support to change the Kind.
make: Warning: File 'config/crd/bases/apps.foundationdb.org_foundationdbclusters.yaml' has modification time 0.35 s in the future
make: warning:  Clock skew detected.  Your build may be incomplete.
root@f7a6ffa5d70d:/work# make test
go test  ./... -coverprofile cover.out
?       github.com/FoundationDB/fdb-kubernetes-operator [no test files]
ok      github.com/FoundationDB/fdb-kubernetes-operator/api/v1beta1     0.388s  coverage: 41.2% of statements
ok      github.com/FoundationDB/fdb-kubernetes-operator/api/v1beta2     0.516s  coverage: 46.8% of statements
?       github.com/FoundationDB/fdb-kubernetes-operator/cmd/po-docgen   [no test files]
...

Do we need to perform additional testing once this is merged, or perform in a larger testing environment?
These changes should not have any effect, so I believe additional testing is not required.

Documentation

Did you update relevant documentation within this repository?
No

If this change is adding new functionality, do we need to describe it in our user manual?
Yes, once the three data hall functionality is fully implemented, it should be documented.

If this change is adding or removing subreconcilers, have we updated the core technical design doc to reflect that?
N/A

If this change is adding new safety checks or new potential failure modes, have we documented how to debug potential issues?
N/A

Follow-up

Are there any follow-up issues that we should pursue in the future?

Does this introduce new defaults that we should re-evaluate in the future?
No

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 10d2e4a
  • Duration 4:05:51
  • Result: ❌ FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

Member

@johscheuer johscheuer left a comment

Is the plan to add the functionality in another PR?

api/v1beta1/foundationdbcluster_types.go (outdated, resolved)
api/v1beta1/foundationdbcluster_types.go (outdated, resolved)
@manfontan
Collaborator Author

Is the plan to add the functionality in another PR?

I can continue working on this PR no problem. If that is your preference.

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

@johscheuer
Member

Is the plan to add the functionality in another PR?

I can continue working on this PR no problem. If that is your preference.

Personally, I would prefer to implement them in the same PR; otherwise we have new fields on the CRD that have no effect.

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: fceb6a1
  • Duration 4:06:44
  • Result: ❌ FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 2f0b162
  • Duration 2:47:39
  • Result: ❌ FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: fc6166e
  • Duration 4:07:56
  • Result: ❌ FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: eb7cc09
  • Duration 4:07:44
  • Result: ❌ FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

Signed-off-by: Manuel Fontan <[email protected]>
Signed-off-by: Manuel Fontan <[email protected]>
Signed-off-by: Manuel Fontan <[email protected]>
Signed-off-by: Manuel Fontan <[email protected]>
Signed-off-by: Manuel Fontan <[email protected]>
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 3a2a36d
  • Duration 4:11:42
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

minor fixes.

Signed-off-by: Manuel Fontan <[email protected]>
Signed-off-by: Manuel Fontan <[email protected]>
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 06972fa
  • Duration 4:11:28
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 85c6873
  • Duration 4:11:47
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 1ac900a
  • Duration 4:11:38
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 05b1207
  • Duration 4:11:30
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@manfontan
Collaborator Author

@johscheuer I have updated the PR with your feedback and added some unit tests (I can add a few more if required).
I have been looking at e2e testing, but I am not sure how long it would take me to get this PR e2e tested.
If you think this can be done in a reasonable time, I would be happy to do so.

@manfontan manfontan requested a review from johscheuer March 19, 2023 12:28
Member

@johscheuer johscheuer left a comment

Thanks for the PR. I'll block some time this week to review it.

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 80ca79c
  • Duration 0:05:23
  • Result: ❌ FAILED
  • Error: Error while executing command: make -C e2e compile. Reason: exit status 2
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Signed-off-by: Manuel Fontan <[email protected]>
Signed-off-by: Manuel Fontan <[email protected]>
Signed-off-by: Manuel Fontan <[email protected]>
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: af004d4
  • Duration 4:10:37
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: f32bf80
  • Duration 4:10:16
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 8312fde
  • Duration 4:10:32
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: da7001d
  • Duration 4:10:23
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Signed-off-by: Manuel Fontan <[email protected]>
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: fad17f9
  • Duration 4:10:36
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@manfontan manfontan requested a review from johscheuer April 26, 2023 19:26
@manfontan manfontan closed this May 30, 2023
@manfontan
Collaborator Author

Duplicates #1651

This implementation is for a single FoundationDBCluster, but since it has been idle for such a long time and initial three_data_hall support is on the way, I will close the PR for now.

@manfontan manfontan deleted the CDF-1816-Add-fdb-cluster-localities branch June 13, 2023 21:42