Skip to content

Setting up a reliable robot communication with Zenoh

boxanm edited this page Dec 30, 2024 · 6 revisions

This tutorial will show you how to set up Zenoh and CycloneDDS for reliable communication between multiple computers with ROS 2. The principal demand for the communication on the Warthog was to get the same functionality as we used to have on ROS 1. Nevertheless, Zenoh is quite powerful and has some nice features over classic ROS 1.

Theoretically, it should be possible to also get a ROS 1-like experience when running zenoh on your laptop and connecting to the Warthog over wifi. However, my tests showed it wasn't working reliably. So, instead, use either Foxglove or a remote desktop.

Prerequirements:

  • Ubuntu 22
  • ROS 2 Humble

While the same config would probably work on ROS 2 Iron/Jazzy and later, it is probably better to replace DDS with Zenoh completely as the rmw_middleware.

This guide somewhat follows the guides described on CycloneDDS and zenoh-plugin-ros2dds Githubs.

Installation

CycloneDDS

We will start by installing CycloneDDS. CycloneDDS is an rmw from the same authors as Zenoh. They recommend using it instead of the default FastDDS, so that's what we are gonna do. Download and install CycloneDDS. The easiest way is to use apt:

sudo apt install ros-humble-rmw-cyclonedds-cpp

Now, we need to tell ROS that CycloneDDS is the preferred rmw implementation. That can be done with the following command:

export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp

To make the change persistent across reboots, place it in your .bashrc file, or if your robot is configured with norlab_robot, the correct place is in scripts/bash/env.sh.

We don't necessarily want to use the default CycloneDDS configuration. Instead, we will create a cyclone.xml file and tell CycloneDDS to use it with

export CYCLONEDDS_URI=file:///home/robot/ros2_ws/src/norlab_robot/scripts/config/cyclone_dds.xml

On all three Warthog computers the file contains something like this:

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config
https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
<Domain>
    <General>
     <DontRoute>true</DontRoute>
    </General>
    <Discovery>
      <ParticipantIndex>auto</ParticipantIndex>
      <MaxAutoParticipantIndex>100</MaxAutoParticipantIndex>
    </Discovery>
    <Internal>
      <SocketReceiveBufferSize min="10MB"/>
    </Internal>
    </Domain>
</CycloneDDS>

Notice that we set the ParticipantIndex to auto and the max value to 100. This is important if the computer launches many nodes. Having multiple nodes with the same index would not work. For your reference, the full list of possible configurations is located here, and Eclipse also provides a guide on how to use it.

To verify that everything is working correctly (for now, outside of ROS), open two terminal windows. In the first one, execute

CYCLONEDDS_URI=file:///home/robot/ros2_ws/src/norlab_robot/scripts/config/cyclone_dds.xml ddsperf sanity

and in the second one, call

CYCLONEDDS_URI=file:///home/robot/ros2_ws/src/norlab_robot/scripts/config/cyclone_dds.xml ddsperf pong

You should see the window launching sanity showing some statistics on the communication. The logging stops when you kill the pong command. To verify that you have a functional ROS communication (local only), open two terminals again. In the first one, run

RMW_IMPLEMENTATION=rmw_cyclonedds_cpp \
 CYCLONEDDS_URI=file:///home/robot/ros2_ws/src/norlab_robot/scripts/config/cyclone_dds.xml \
 ros2 run demo_nodes_cpp talker

in the second one, execute

RMW_IMPLEMENTATION=rmw_cyclonedds_cpp \
 CYCLONEDDS_URI=file:///home/robot/ros2_ws/src/norlab_robot/scripts/config/cyclone_dds.xml \
 ros2 run demo_nodes_cpp listener

If you encountered any problems during these basic tests, scroll down to the known issues section.

While it is possible to do way more with this cool rmw implementation, I didn't succeed in getting inter-computer communication work when relying only on CycloneDDS. So instead, we will install zenoh.

Zenoh

zenoh-bridge-ros2dds is a bridge between local DDS implementation and other zenoh instances on the network. We will start by installing it with

echo "deb [trusted=yes] https://download.eclipse.org/zenoh/debian-repo/ /" | sudo tee -a /etc/apt/sources.list > /dev/null
sudo apt update
sudo apt install zenoh-bridge-ros2dds

Since zenoh replaces the DDS communication layer between computers, we must ensure that no DDS communication can occur between our devices. The minimum that we need to do is to set

export ROS_LOCALHOST_ONLY=1

or if your system is Iron/Jazzy and later:

export ROS_AUTOMATIC_DISCOVERY_RANGE=LOCALHOST

Unfortunately the ROS_LOCALHOST_ONLY doesn't seem to work properly, so we also need to modify each computer's ROS_DOMAIN_ID to differ. On the Warthog, I chose:

  • low-level: 215
  • jetson-orin: 216
  • high-level: 3

When this is done, you can execute zenoh-bridge-ros2dds and zenoh-bridge-ros2dds -e tcp/<robot-ip>:7447 on your two devices (❗this will not work on the Warthog since there's another instance of zenoh-bridge-ros2dds already running❗️). After launching the two bridges, you can try the talker listener demo mentioned earlier and should see messages being sent between your devices.

Similarly to CycloneDDS, zenoh can also be configured with a file. In norlab_robot, the file is located inside /home/robot/ros2_ws/src/norlab_robot/scripts/config/zenoh-bridge.json5. To launch zenoh-bridge-ros2dds using that file, use the -c flag. You can refer to the three JSON files for each computer:

There are two main sections inside these files:

Plugins
{
  plugins: {
    ////
    //// ROS2 related configuration
    //// All settings are optional and are unset by default - uncomment the ones you want to set
    ////
    ros2dds: {
   }
}

Contains settings related to the bridge between DDS and Zenoh. Here, you can setup a whitelist/blacklist for topics or downsample the publishing rate of a given topic. On the Warthog, we use this to filter out most of ZED X camera's topics. See Jetson Orin for more info.

Zenoh

Main zenoh configuration. A full list of default values is located here. Here, we can specify the device's name:

metadata: {
    name: "warthog-low-level",
  },

and other metadata. Note that all our devices are configured as a router. There are other options described in Zenoh's documentation. The sections called scouting and gossip ensure that a new device is discovered automatically without us specifying all the IP addresses, interfaces, and routes between devices in advance. While, in general, we wouldn't need to list the devices in either the connect or listen sections, we can do some optimization and avoid some issues by doing so. For example, high-level and jetson-orin are connected through two routes:

  • 192.168.3.*/24 through the switch
  • 10.0.0.*/24 a direct connection

we can specify that the desired route is through the direct link:

connect: {
    endpoints: [
      "tcp/10.0.0.2:7447"
    ]
  },

and similarly on jetson-orin.

Another optimization we can do is to only listen on the interfaces where we actually expect Zenoh traffic. On the high-level, these would be:

listen: {
    endpoints: [
      // "<proto>/<ip>:<port>"
      "tcp/192.168.3.3:7447#iface=enp1s0f1",
      "tcp/10.0.0.1:7447#iface=enp7s0"
    ]
  },

Finally, the low-level typically boots even before the router is started. This is a problem as the interface it's connected to the rest of the network is down at the moment zenoh starts. To cope with this, specify that Warthog's subnet is on interface enp0s31f6:

listen: {
    endpoints: [
        "tcp/192.168.3.2:7447#iface=enp0s31f6"
    ]
  },

Once you are happy with your config, you can create a systemctl service that start's zenoh automatically on boot. On the Warthog, they are always located in /etc/systemd/system/launch_zenoh.service and contain this:

[Unit]
Description=Eclipse Zenoh Bridge for ROS2 with a DDS RMW
Documentation=https://github.com/eclipse-zenoh/zenoh-plugin-ros2dds
After=network-online.target robot_bringup.service lo-multicast.service
Wants=network-online.target robot_bringup.service lo-multicast.service

[Service]
Type=forking
WorkingDirectory=/home/robot/
ExecStart=/home/robot/ros2_ws/src/norlab_robot/scripts/startup/launch_zenoh.sh
KillMode=mixed
KillSignal=SIGINT
RestartKillSignal=SIGINT
Restart=on-failure
RestartSec=2
TimeoutStopSec=10s
User=robot
StandardOutput=journal
StandardError=journal
SyslogIdentifier=zenoh-bridge-ros2dds

[Install]
WantedBy=default.target

The bridge itself is started in a screen with a script located in scripts/startup/launch_zenoh.sh. Executing the bridge directly in the service did not work correctly.


Known issues

CycloneDDS

  1. If ddsperf sanity returns
ddsperf: failed to increase socket receive buffer size to at least 10485760 bytes, current is 425984 bytes
dds_create_participant(domain -1) failed: -1

you need to increase the maximum memory used for IP fragments to at least 128 MB: sudo sysctl -w net.ipv4.ipfrag_high_thresh=134217728 and the maximum Linux kernel receive buffer size to at least 33 MB: sudo sysctl -w net.core.rmem_max=33554432. StereoLabs guide recommends even higher values (2 GB)

To make these changes permanent, modify them in /etc/sysctl.d/10-cyclone-max.conf:

net.core.rmem_max=33554432
net.ipv4.ipfrag_high_thresh=134217728

Zenoh

  1. Any ros2 command complains about multicast not being enabled on the loopback interface: You can fix this for the current boot with this command: sudo ip l set lo multicast on. Setting this permanently is not easy because every device uses a different networking manager (yes, each of low-level, high-level and jetson-orin rely on a different tool 🤯). What I did instead was configuring a systemctl service that starts before launch_zenoh.service. The service is located in /etc/systemd/system/lo-multicast.service and contains:
[Unit]
Description=Enable multicast on the loopback interface
Before=zenoh-bridge-ros2dds.service

[Service]
Type=oneshot
ExecStart=/sbin/ip link set lo multicast on
RemainAfterExit=true

[Install]
WantedBy=multi-user.target

Make sure to include

After=lo-multicast.service
Wants=lo-multicast.service

in your launch_zenoh.service file.

  1. jetson-orin doesn't show all topics published at low-level. The missing topics are /controller/cmd_vel /joint_states /warthog/platform/cmd_lights /warthog/platform/emergency_stop and at the time of writing this guide, there wasn't any publisher on these 4 topics, so it is a correct behavior.

  2. It is impossible to downsample large ROS messages (for example images) when using zenoh-bridge-ros2dds. I created a Github issue asking about this.

  3. When nothing works anymore, it's often a good idea to restart all zenoh services. Use the function below to restart launch_zenoh.service on all three computers:

restart_all_zenoh() {
	password="" # fill the password value
	command="sudo -S systemctl restart launch_zenoh.service"
	echo $password | ssh warthog-high-level $command
	echo $password | ssh warthog-low-level $command
	echo $password | ssh warthog-jetson-orin $command
}

Norlab's Robots

Protocols

Templates

Resources

Grants

Datasets

Mapping

Deep Learning

ROS

Ubuntu

Docker (work in progress)

Tips & tricks

Clone this wiki locally