add instruction to modify controller deployment
Signed-off-by: Sunyanan Choochotkaew <[email protected]>
sunya-ch committed Apr 22, 2024
1 parent 0fa6570 commit 2b07f90
Showing 2 changed files with 30 additions and 2 deletions.
8 changes: 7 additions & 1 deletion README.md
@@ -18,6 +18,7 @@
- [One-time all-to-all - *recommended for small cluster (<10 HostInterfaces)*](#one-time-all-to-all---recommended-for-small-cluster-10-hostinterfaces)
- [Health checker service](#health-checker-service)
- [Uninstallation](#uninstallation)
- [Troubleshooting](#troubleshooting)

<!-- /TOC -->

@@ -252,4 +253,9 @@ Deploy health check and agents to the cluster to serve a functional and connetio

```bash
operator-sdk cleanup multi-nic-cni-operator --delete-all -n multi-nic-cni-operator
```
```

# Troubleshooting
The Multi-NIC CNI operator is composed of multiple components. Common issues can arise from missing required configuration (e.g., security rules on the VPC), versioning, and the 3rd-party CNI.

As a first step in troubleshooting a failure, please check the [Troubleshooting (Common Issues) page](https://foundation-model-stack.github.io/multi-nic-cni/troubleshooting/troubleshooting/).
24 changes: 23 additions & 1 deletion document/docs/troubleshooting/troubleshooting.md
@@ -1,10 +1,11 @@
# Manual Troubleshooting
# Manual Troubleshooting (Common Issues)

**Please first confirm which features each multi-nic-cni release version supports [here](../release/index.md).**

<!-- TOC tocDepth:2..3 chapterDepth:3..6 -->

- [Issues](#issues)
- [Multi-NIC CNI Controller gets OOMKilled](#multi-nic-cni-controller-gets-oomkilled)
- [Pod failed to start](#pod-failed-to-start)
- [Pod failed to start (Summary Table)](#pod-failed-to-start-summary-table)
- [Ping failed](#ping-failed)
@@ -25,6 +26,7 @@
- [Update daemon pod to use latest version](#update-daemon-pod-to-use-latest-version)
- [Update controller to use latest version](#update-controller-to-use-latest-version)
- [Safe upgrade Multi-NIC CNI operator](#safe-upgrade-multi-nic-cni-operator)
- [Customize Multi-NIC CNI controller of operator](#customize-multi-nic-cni-controller-of-operator)

<!-- /TOC -->

@@ -40,6 +42,10 @@ export FAILED_NODE_IP = # IP of FAILED_NODE
export MULTI_NIC_NAMESPACE= # namespace where multi-nic cni operator is deployed, default=multi-nic-cni-operator
```

### Multi-NIC CNI Controller gets OOMKilled

This is an expected issue in a large cluster, where the controller requires a large amount of memory to operate. Please adjust the resource limit in the controller deployment. If the operator was installed via operator hub or an operator bundle, follow the steps to modify the deployment in [Customize Multi-NIC CNI controller of operator](#customize-multi-nic-cni-controller-of-operator).
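
To confirm that a pod was in fact OOMKilled before changing limits, one option is to list the last termination reason of each pod in the operator namespace. The following is a minimal sketch and not part of the official procedure: `filter_oomkilled` is a hypothetical helper name, and it assumes input lines of the form `<pod name><TAB><reasons>`, such as those produced by the `kubectl` jsonpath query shown in the comment.

```shell
# Hypothetical helper: read "<pod name><TAB><termination reasons>" lines on
# stdin and print only the names of pods whose reasons include OOMKilled.
filter_oomkilled() {
  awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Example usage (requires cluster access; MULTI_NIC_NAMESPACE is set as above):
#   kubectl get pods -n "$MULTI_NIC_NAMESPACE" \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
#     | filter_oomkilled
```

If the controller pod appears in the output, its container was last terminated for exceeding its memory limit.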

### Pod failed to start

**Issue:**
@@ -461,3 +467,19 @@ Log in to FAILED_NODE with `oc debug node/$FAILED_NODE` or using [nettools](http
<br>

Otherwise, check [live migration](https://github.com/foundation-model-stack/multi-nic-cni/tree/doc/live-migration)

### Customize Multi-NIC CNI controller of operator
If the multi-nic-cni operator is managed by the Operator Lifecycle Manager (OLM), that is, installed by `operator-sdk run bundle` or via operator hub, any modification to the controller deployment (the multi-nic-cni controller pod) will be overridden by OLM.

To modify a value such as the resource request/limit of the controller pod, you need to edit the `.spec.install.spec.deployments` section in the ClusterServiceVersion (CSV) resource of the multi-nic-cni operator.

You can locate the CSV resource of the multi-nic-cni operator in your cluster with the following command.

```bash
kubectl get csv -l operators.coreos.com/multi-nic-cni-operator.multi-nic-cni-operator -A
```
*Before v1.0.5, the CSV is created in all namespaces. You need to edit the CSV in the namespace where the controller is deployed. Modifications to the CSV in other namespaces will not be applied.*
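
As an alternative to editing the CSV interactively, the memory limit could be set with a JSON Patch. This is a sketch under stated assumptions rather than a documented procedure: `build_memory_limit_patch` is a hypothetical helper, the deployment and container indexes (`0` and `0` in the usage comment) and the `1Gi` value are placeholders to check against your own CSV, and the `resources.limits` object is assumed to exist already in the targeted container.

```shell
# Hypothetical helper: print a JSON Patch that sets the memory limit of one
# container under .spec.install.spec.deployments of a ClusterServiceVersion.
#   $1 = index of the deployment entry   (assumption: verify in your CSV)
#   $2 = index of the container          (assumption: verify in your CSV)
#   $3 = memory limit, e.g. 1Gi          (placeholder value)
# Note: the "add" operation requires resources.limits to exist already.
build_memory_limit_patch() {
  printf '[{"op":"add","path":"/spec/install/spec/deployments/%s/spec/template/spec/containers/%s/resources/limits/memory","value":"%s"}]' \
    "$1" "$2" "$3"
}

# Example usage (requires cluster access; CSV_NAME is the name returned by
# the `kubectl get csv` command above):
#   kubectl patch csv "$CSV_NAME" -n "$MULTI_NIC_NAMESPACE" --type=json \
#     -p "$(build_memory_limit_patch 0 0 1Gi)"
```

Because the change is made in the CSV rather than in the deployment itself, it should not be reverted by OLM, as described above.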
