Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Bring your own lab (BYOL) doc - 2nd revision
sferlin committed Nov 23, 2023
1 parent 8594aaf, commit 810e2da
Showing 4 changed files with 22 additions and 34 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
```diff
@@ -10,6 +10,9 @@ rh_labs:
 - alias
 - scalelab

+byol_labs:
+- byol
+
 labs:
 alias:
 dns:
```
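The second file in this commit rewords how the BYOL guide tells the bastion's two interfaces apart: the *lab_network* interface has an IP assigned (L3), while the control-plane interface has none (L2), and the guide later asks you to record the names and MACs of these NICs for the inventory. As a minimal sketch, assuming the machine's `iproute2` supports JSON output via `ip -j addr show`, the hypothetical helper `classify_interfaces` below sorts Ethernet links by whether they carry a global IPv4 address. It is not part of `jetlag`, and a NIC without an address may simply be down or unplugged, so confirm the result (for example the link state) before copying names into the inventory.

```python
import json


def classify_interfaces(ip_json: str) -> dict[str, list[tuple[str, str]]]:
    """Split Ethernet NICs into L3 (global IPv4 address) and L2 (no IPv4 address).

    `ip_json` is the text printed by `ip -j addr show`. Each entry is returned
    as (interface name, MAC address), the two values the guide asks you to record.
    """
    result: dict[str, list[tuple[str, str]]] = {"l3": [], "l2": []}
    for link in json.loads(ip_json):
        # Skip loopback and other non-Ethernet links.
        if link.get("link_type") != "ether":
            continue
        # IPv6 link-local (fe80::) addresses show up even on pure L2 NICs,
        # so only a global-scope IPv4 address marks an interface as L3.
        has_ipv4 = any(
            addr.get("family") == "inet" and addr.get("scope") == "global"
            for addr in link.get("addr_info", [])
        )
        entry = (link["ifname"], link.get("address", ""))
        result["l3" if has_ipv4 else "l2"].append(entry)
    return result
```

On the machine itself, pass it the captured command output, e.g. `classify_interfaces(subprocess.run(["ip", "-j", "addr", "show"], capture_output=True, text=True, check=True).stdout)`. The `l3` entries are candidates for the `lab_interface` value in the inventory excerpts, and the `l2` entries are candidates for the control-plane NIC.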
````diff
@@ -5,8 +5,8 @@ Assuming you received a set of machines, this guide will walk you through gettin
 In a BYOL, due to the non-standard interface names and NIC PCI slots, we have to craft jetlag's inventory file by hand.

 The bastion machine needs 2 interfaces:
-- The interface connected to the network, i.e., with an IP assigned, a L3 network.
-- The control-plane interface, from which the cluster nodes are accessed (this is a L2 network, i.e., it does have an IP assigned).
+- The interface connected to the network, i.e., with an IP assigned, a L3 network. This interface usually referred to as *lab_network* as it provides the connectivity into the bastion machine.
+- The control-plane interface, from which the cluster nodes are accessed (this is a L2 network, i.e., it does not have an IP assigned).

 The cluster machines need (at least) 1 interface:
 - The control-plane interface, from which other cluster nodes are accessed.
````
````diff
@@ -31,23 +31,8 @@ _**Table of Contents**_
 Sometimes your bastion may have undesirable settings such as `firewalld` or `iptables` with rules in place. These are recommended to be fixed, e.g., `firewalld` can be silenced and `iptables` cleaned.

 1. Select your bastion machine from the allocation
-2. Copy your ssh keys to the designated bastion machine
-
-```console
-[user@fedora ~]$ ssh-copy-id [email protected]
-/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
-/usr/bin/ssh-copy-id: INFO: 2 key(s) remain to be installed -- if you are prompted now it is to install the new keys
-Warning: Permanently added 'xxx-r660.machine.com,x.x.x.x' (ECDSA) to the list of known hosts.
-[email protected]'s password:
-
-Number of key(s) added: 2
-
-Now try logging into the machine, with: "ssh '[email protected]'"
-and check to make sure that only the key(s) you wanted were added.
-[user@fedora ~]$
-```
-
-3. Install some additional tools to help after reboot
+2. Install some additional tools to help after reboot

 Also, make sure RHEL has repositories added and an active subscription, since `jetlag` will require some packages: `dnsmasq`, `frr`, `golang-bin`, `httpd` and `httpd-tools`, `ipmitool`, `python3-pip`, `podman`, and `skopeo`.

````
````diff
@@ -58,7 +43,7 @@ Updating Subscription Management repositories.
 Complete!
 ```

-4. Setup ssh keys on the bastion and copy to itself to permit local ansible interactions
+3. Setup ssh keys on the bastion and copy to itself to permit local ansible interactions

 ```console
 [root@xxx-r660 ~]# ssh-keygen
````
````diff
@@ -90,7 +75,7 @@ and check to make sure that only the key(s) you wanted were added.
 [root@xxx-r660 ~]#
 ```

-5. Clone `jetlag`
+4. Clone `jetlag`

 ```console
 [root@xxx-r660 ~]# git clone https://github.com/redhat-performance/jetlag.git
````
````diff
@@ -103,7 +88,7 @@ Receiving objects: 100% (4510/4510), 831.98 KiB | 21.33 MiB/s, done.
 Resolving deltas: 100% (2450/2450), done.
 ```

-6. Download your pull_secret.txt from [console.redhat.com/openshift/downloads](https://console.redhat.com/openshift/downloads) and place it in the root directory of `jetlag`
+5. Download your pull_secret.txt from [console.redhat.com/openshift/downloads](https://console.redhat.com/openshift/downloads) and place it in the root directory of `jetlag`

 ```console
 [root@xxx-r660 jetlag]# cat pull_secret.txt
````
````diff
@@ -112,7 +97,7 @@ Resolving deltas: 100% (2450/2450), done.
 ...
 ```

-7. Change to `jetlag` directory, and then run `source bootstrap.sh`
+6. Change to `jetlag` directory, and then run `source bootstrap.sh`

 ```console
 [root@xxx-r660 ~]# cd jetlag/
````
````diff
@@ -122,7 +107,7 @@ Collecting pip
 (.ansible) [root@xxx-r660 jetlag]#
 ```

-8. Subsequent bastion setup attempts
+7. Subsequent bastion setup attempts

 If you wish to install a different OCP version after having prepared your bastion, [clean up](https://github.com/redhat-performance/jetlag/blob/main/docs/troubleshooting.md#bastion---clean-all-container-services--podman-pods) all running pods and rerun the `setup-bastion.yml` playbook:

````
````diff
@@ -145,7 +130,7 @@ Copy the vars file and edit it to create the inventory with your lab info:

 ### Lab & cluster infrastructure vars

-Change `lab` to `lab: scalelab or ibmcloud or alias`
+Change `lab` to `lab: scalelab or ibmcloud or alias or byol`

 Change `lab_cloud` to `lab_cloud: na`

````
````diff
@@ -210,7 +195,7 @@ The `ansible/vars/all.yml` now resembles ..
 # Lab & cluster infrastructure vars
 ################################################################################
 # Which lab to be deployed into (Ex scalelab)
-lab: alias
+lab: byol
 # Which cloud in the lab environment (Ex cloud42)
 lab_cloud: na
````
````diff
@@ -303,8 +288,9 @@ Choose wisely which server for which role: bastion, masters and workers. Make su
 - Record the names and MACs of their L3 network NIC to be used for the inventory.
 - Choose the control-plane NICs, the L2 NIC interface.
 - Record the interface names and MACs of the chosen control-plane interfaces.
+- Make sure you have root access to the bms, i.e., idrac for Dell. In the example below the bmc_user and bmc_password are set to root and password.

-Now, copy the inventory file and edit it with the above info from your lab:
+Now, copy the inventory file and edit it with the above info manually for your lab:

 ```
 # Create inventory playbook will generate this for you much easier
````
````diff
@@ -330,7 +316,7 @@ boot_iso=discovery.iso
 bmc_user=root
 bmc_password=password
 lab_interface=<lab_mac interface name>
-network_interface=<anything, not used>
+network_interface=<anything>
 network_prefix=24
 gateway=198.18.10.1
 dns1=198.18.10.1
````
````diff
@@ -346,7 +332,7 @@ boot_iso=discovery.iso
 bmc_user=root
 bmc_password=password
 lab_interface=<lab_mac interface name>
-network_interface=<anything, not used>
+network_interface=<anything>
 network_prefix=24
 gateway=198.18.10.1
 dns1=198.18.10.1
````
````diff
@@ -408,11 +394,9 @@ In jetlab, we divide the cluster installation process in two phases: (1) Setup t

 - The task ''Clean lab interfaces'' will fail if there is no file at this location `/root/clean-interface.sh`.

 - I observed that the file `/root/bm/opm-linux.tar.gz` could fail to be downloaded.

 - The task ''Stop and disable iptables'' failed because `dnf install iptables-services` and `systemctl start` needed to be done.

-- Most importantly, while the setup-bastion can finish successfully, some pods did not come up with "permission denied". In [blog](https://www.redhat.com/sysadmin/container-permission-denied-errors) SELinux seemed to be the cause when containers mount a writable volume, as starting the container manually (without mounting the volume) worked. The direct fix to this issue is appending the ":Z" flag at the end (instead of touching SELinux in general) in a few locations in `jetlag`:
+- Most importantly, while the setup-bastion can successfully finish, some pods did not come up showing "permission denied". In [blog](https://www.redhat.com/sysadmin/container-permission-denied-errors) SELinux seemed to be the cause when containers mount a writable volume, as starting the container manually (without mounting the volume) worked. The direct fix to this issue is appending the ":Z" flag at the end (instead of touching SELinux in general) in a few locations in `jetlag`:
 - In `ansible/roles/bastion-assisted-installer/tasks/main.yml`:
 - `/opt/assisted-service/data/postgresql:/var/lib/pgsql:Z`
 - `/opt/assisted-service/nginx-ui.conf:/opt/bitnami/nginx/conf/server_blocks/nginx.conf:Z`
````
````diff
@@ -429,4 +413,4 @@ In jetlab, we divide the cluster installation process in two phases: (1) Setup t
 - Make sure to inspect the 'BIOS Settings' in the machine for both, the *boot order* and *boot type*. `jetlag` will mount the .iso and instruct the machines for a one-time boot, where, later, they should be able to boot from the disk. In other words, check if the string in the boot order field contains the hard disk. Once booted, in the virtual console, you will see the L3 NIC interface with an 198.10.18.x address, which is correct according to our `byol.yml` above.
 - [badfish](https://github.com/redhat-performance/badfish) could be used, however, it uses FQDN for the machines only, and we also do not have the configuration for Del R660 and R760 in the interface config file yet.

-- In the assistant installer GUI, under cluster events, if you observe any *permission denied* error, it is related to the SELinux issue pointed out above. If you however notice an issue related to *wrong booted device*, make sure to observe in your virtual console, if the machines booted from the disk, and if the boot order contains the option. This is a classic boot order issue. The correct steps in the assistant installer procedures are that the control-plane nodes will boot from the disk, be configured, and join the control-plane "nominated" as the bootstrap node (around 45 t0 47% of the installation) to continue with the installation of the worker nodes.
+- In the assistant installer GUI, under cluster events, if you observe any *permission denied* error, it is related to the SELinux issue pointed out previously. If you however notice an issue related to *wrong booted device*, make sure to observe in your virtual console, if the machines booted from the disk, and if the boot order contains the disk option. This is a classic boot order issue. The steps in the assistant installer are that the control-plane nodes will boot from the disk to be configured, and then join the control-plane "nominated" as the bootstrap node (this happens around 45-47% of the installation) to continue with the installation of the worker nodes.
````
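The troubleshooting notes in the diff above trace the "permission denied" pods to SELinux labels on writable volume mounts and fix them by appending `:Z` to the volume specs in `ansible/roles/bastion-assisted-installer/tasks/main.yml`. The same string transformation is sketched below as a hypothetical Python helper, `with_selinux_relabel`, which is not part of `jetlag`. It assumes podman's `SOURCE:DEST[:OPTIONS]` volume syntax, where options are comma-separated, so a spec that already carries `ro` becomes `ro,Z` instead of gaining a fourth colon-separated field.

```python
def with_selinux_relabel(volume: str, option: str = "Z") -> str:
    """Return a podman volume spec (SOURCE:DEST[:OPTIONS]) with an SELinux relabel option.

    "Z" relabels the host path with a private label for this container only;
    "z" applies a shared label usable by several containers. A spec that
    already has either option is returned unchanged.
    """
    if option not in ("Z", "z"):
        raise ValueError("option must be 'Z' or 'z'")
    parts = volume.split(":")
    if len(parts) == 2:
        # No options yet: append the relabel option as the third field.
        return f"{volume}:{option}"
    if len(parts) == 3:
        opts = [o for o in parts[2].split(",") if o]
        if "Z" in opts or "z" in opts:
            return volume
        # Options are comma-separated, so extend the existing list.
        return f"{parts[0]}:{parts[1]}:{','.join(opts + [option])}"
    raise ValueError(f"expected SOURCE:DEST[:OPTIONS], got {volume!r}")
```

For example, `with_selinux_relabel("/opt/assisted-service/data/postgresql:/var/lib/pgsql")` returns `"/opt/assisted-service/data/postgresql:/var/lib/pgsql:Z"`, matching the first path listed in the notes. Because `Z` changes the label of the host directory itself, it suits dedicated paths such as those under `/opt/assisted-service/`, and it should not be pointed at shared system directories.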