Bring your own lab (BYOL) doc #407

Merged 10 commits on Dec 21, 2023

@sferlin (Contributor) commented Nov 10, 2023

Changes needed in jetlag to perform a connected bare-metal (bm) IPv4 install on Dell r660/r670 machines in a BYOL running RHEL 9.2.

@akrzos (Member) left a comment

Got some feedback. Thanks for getting this in and working through all of those issues in a new lab!

ansible/inventory/inventory-bm-byol.sample
bmc_user=root
bmc_password=password
lab_interface=<lab_mac interface name>
network_interface=<anything, not used>
Member

Might want to be careful saying "anything, not used" since the network_interface is the var used to determine the actual interface for the network that the ocp cluster is deployed on.

Contributor

I found that this var needs to be present, but its value is not used. Maybe it is overwritten by all.yml.

Contributor Author

In my experiments I set it to an arbitrary value and it had no effect. The sample file sets it to "eth0".


The bastion machine needs 2 interfaces:
- The interface connected to the network, i.e., with an IP assigned, a L3 network.
- The control-plane interface, from which the cluster nodes are accessed (this is a L2 network, i.e., it does have an IP assigned).
Member

I think with

> it does have an IP assigned

you meant "does not have an IP assigned"?

In a BYOL, due to the non-standard interface names and NIC PCI slots, we have to craft jetlag's inventory file by hand.

The bastion machine needs 2 interfaces:
- The interface connected to the network, i.e., with an IP assigned, a L3 network.
Member

Might want to add a note here that this network is typically referenced as a lab network as it provides the connectivity into the bastion machine.

# Lab & cluster infrastructure vars
################################################################################
# Which lab to be deployed into (Ex scalelab)
lab: alias
Member

We will want to change this to byol when we add that term to the permitted list.


- The disks could vary from SATA/SAS to NVME, and therefore the /dev/disk/by-path IDs will vary.

- The task ''Clean lab interfaces'' will fail if there is no file at this location `/root/clean-interface.sh`.
Member

We can add a when statement and skip when lab is set to byol to address this.
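
A minimal sketch of that guard, assuming the task runs the script through the `shell` module and that `byol` becomes an accepted value of `lab`. Only the task name and the script path come from the doc under review; the actual task body in jetlag may look different.

```yaml
# Hypothetical shape of the existing task; the module choice is an assumption.
- name: Clean lab interfaces
  ansible.builtin.shell: /root/clean-interface.sh
  # Skip on BYOL bastions, where the lab-provided script is not present.
  when: lab != "byol"
```

With the condition in place, Ansible reports the task as skipped on BYOL bastions instead of failing on the missing file.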


- The task ''Clean lab interfaces'' will fail if there is no file at this location `/root/clean-interface.sh`.

- `/root/bm/opm-linux.tar.gz` failed to be downloaded.
Member

This is fixed now thanks to @radez with #404


- `/root/bm/opm-linux.tar.gz` failed to be downloaded.

- The task 'Stop and disable iptables' failed because dnf install iptables-services and start with systemctl needed to be done.
Member

We might want to add some error handling on this task or make sure that iptables-services is in the list of packages to be installed or perhaps there is a better solution. I'll take a look in the near future.
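
One possible shape for the "make sure the package is installed" option, as a sketch only. It assumes the existing task manages a unit named `iptables` through the `systemd` module; this is not the change that was eventually merged.

```yaml
# Assumed fix: install the package that provides the iptables unit first.
- name: Ensure iptables-services is installed
  ansible.builtin.dnf:
    name: iptables-services
    state: present

# Task name taken from the doc; module and unit name are assumptions.
- name: Stop and disable iptables
  ansible.builtin.systemd:
    name: iptables
    state: stopped
    enabled: false
```

Installing the package first means the unit exists on a fresh RHEL 9 host, so the stop/disable step has something to act on.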

- The task 'Stop and disable iptables' failed because dnf install iptables-services and start with systemctl needed to be done.

Some pods of the setup-bastion did not come up with "permission denied" issues. In our [blog](https://www.redhat.com/sysadmin/container-permission-denied-errors) SELinux seemed to be the cause when containers mount a writable volume, as starting the container manually (without mounting the volume) worked, i.e., UID 26 was also correct.
The suggestion of appending a ":Z" flag at the end (instead of touching SELinux in general) fixed it.
Member

Ah, this makes sense. The typical Scale and Alias lab machines come with RHEL 8 and with SELinux set to permissive instead of enforcing. It seems the machines in your lab have SELinux set to enforcing. I don't particularly like the idea of requiring users to disable SELinux or make it permissive, so IMHO we should instead check whether the :Z flag works out of the box on Scale/Alias lab machines.
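
For illustration, a volume mount with the `:Z` suffix could look like the sketch below. It assumes the containers are created with the `containers.podman.podman_container` module; the container name, image, and paths are placeholders, not jetlag's actual values.

```yaml
# Hypothetical container; name, image, and paths are placeholders.
- name: Run a container with a relabeled writable volume
  containers.podman.podman_container:
    name: example-db
    image: registry.example.com/example/db:latest
    state: started
    volume:
      # :Z asks podman to relabel the host directory with a private
      # SELinux label so the container process can write to it.
      - /opt/example-db/data:/var/lib/data:Z
```

The suffix leaves the host's SELinux mode untouched, which is why it is preferable to switching the host to permissive.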

<IP or FQDN> ansible_ssh_user=root bmc_address=<IP or FQDN>

[bastion:vars]
bmc_user=root
Contributor

"root" should be generalized; it is not always "root".

[controlplane:vars]
role=master
boot_iso=discovery.iso
bmc_user=root
Contributor

"root". Ditto

[worker:vars]
role=worker
boot_iso=discovery.iso
bmc_user=root
Contributor

"root". Ditto.

Complete!
```

4. Setup ssh keys on the bastion and copy to itself to permit local ansible interactions
Contributor

If we generate new ssh keys, we do not need step 2 above.

# you must stop and rm all assisted-installer containers on the bastion and rerun
# the setup-bastion step in order to setup your bastion's assisted-installer to
# the version you specified
ocp_release_image: quay.io/openshift-release-dev/ocp-release:4.14.1-x86_64
Contributor

Generalize the name

ocp_release_image: quay.io/openshift-release-dev/ocp-release:4.14.1-x86_64

# This should just match the above release image version (Ex: 4.13)
openshift_version: "4.14"
Contributor

Generalize the version

smcipmitool_url:

bastion_lab_interface: eno8303 #ens1f0
bastion_controlplane_interface: ens1f0 #eno8303
Contributor

These two interfaces are not fixed. The lab_interface is whatever the BYOL comes with. The controlplane_interface is a choice decided when we crafted the inventory file.
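
A generalized form of these two vars might read as follows. This is a sketch: the placeholder wording is an assumption, modeled on the `<...>` style used in the sample inventory.

```yaml
# Placeholders only: substitute the interface names found on your own bastion.
# The lab interface is whatever the BYOL machine comes with; the control-plane
# interface is the one chosen when the inventory file was crafted.
bastion_lab_interface: "<lab interface name>"
bastion_controlplane_interface: "<control-plane interface name>"
```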

################################################################################
# Network configuration for all bm cluster and rwn control-plane nodes
controlplane_lab_interface: eno8303 #eno12399
controlplane_network_interface: eno12399 #eno8303
Contributor

Same here. These two interfaces are not fixed. Choose them correctly.


### Lab & cluster infrastructure vars

Change `lab` to `lab: scalelab or ibmcloud or alias`
Collaborator

I second this suggestion


Change `lab` to `lab: scalelab or ibmcloud or alias`

Change `lab_cloud` to `lab_cloud: na`
Collaborator

If we use byol in rh_labs, then this should be `lab_cloud: byol`.

docs/bastion-deploy-bm-byol.md (outdated)

- I observed that the file `/root/bm/opm-linux.tar.gz` could fail to be downloaded.

- The task ''Stop and disable iptables'' failed because `dnf install iptables-services` and `systemctl start` needed to be done.
Collaborator

Same, maybe we can make the entire tone of this guide "instructional" as opposed to writing a more personal account of your experience? Applies to the next few statements also.

@sferlin force-pushed the doc_byol branch 2 times, most recently from ac50f12 to b22d255, on November 23, 2023 13:26
@akrzos (Member) left a comment

The SELinux changes work even with SELinux in permissive mode; however, we should apply them to all volume mounts for all podman containers in jetlag.

@akrzos (Member) left a comment

I think we need a few more changes before merging this.
