Cannot connect to gateway on a host with many network interfaces by setting EPICS_PVA_ADDR_LIST #162

Open
goetzpf opened this issue Dec 5, 2024 · 3 comments

goetzpf commented Dec 5, 2024

I have the following setup:

  • The Gateway host has several interfaces.
  • The pva gateway is configured to run the server on the second network interface
  • The client is in a separate network that has only a route to this second interface
  • The client connects by setting EPICS_PVA_ADDR_LIST

When I use EPICS_PVA_ADDR_LIST to connect directly to the gateway, the response to the channel discovery request contains the wrong source interface address, not the one that was selected with EPICS_PVA_ADDR_LIST.

Here is the concrete example:

The gateway host has (among others) these interfaces:

$ ip -4 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
10: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 172.30.134.57/24 brd 172.30.134.255 scope global bond0
       valid_lft forever preferred_lft forever
11: bond3: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.16.7/24 brd 192.168.16.255 scope global bond3
       valid_lft forever preferred_lft forever

This is the gateway configuration file:

{
    "version":2,
    "clients":[
        {
            "name":"GWCLIENT1",
            "provider":"pva",
            "addrlist":"172.30.130.255",
            "autoaddrlist":false,
            "bcastport":5076
        }
    ],
    "servers":[
        {
            "name":"CASERVER1",
            "clients":["GWCLIENT1"],
            "interface":["192.168.16.7"],
            "addrlist":"192.168.16.255",
            "autoaddrlist":false,
            "serverport":5075,
            "bcastport":5076,
            "getholdoff":0,
            "statusprefix":"PV:",
            "access":"",
            "pvlist":""
        }
    ]
}
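
As a quick sanity check of the configuration, the following sketch maps each configured address to the local interface whose subnet contains it. It is not part of the gateway: `INTERFACES` and `interface_for` are illustrative names, and the interface table is simply copied from the `ip -4 addr` output quoted in this report.

```python
import ipaddress

# Interface addresses copied from the `ip -4 addr` output of the gateway host.
INTERFACES = {
    "lo": ipaddress.ip_interface("127.0.0.1/8"),
    "bond0": ipaddress.ip_interface("172.30.134.57/24"),
    "bond3": ipaddress.ip_interface("192.168.16.7/24"),
}


def interface_for(address):
    """Return the name of the interface whose subnet contains address, or None."""
    ip = ipaddress.ip_address(address)
    for name, iface in INTERFACES.items():
        if ip in iface.network:
            return name
    return None


# Addresses taken from the gateway configuration file.
for label, addr in [
    ("server interface", "192.168.16.7"),
    ("server addrlist", "192.168.16.255"),
    ("client addrlist", "172.30.130.255"),
]:
    print(f"{label:17} {addr:15} -> {interface_for(addr)}")
```

Both server entries fall into the bond3 subnet. The client addrlist 172.30.130.255 lies outside the directly attached /24 subnets listed here, which may well be intentional if that subnet is reached through a router.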

I run this command on the client host (ip 193.149.12.42):

$ (export EPICS_PVA_DEBUG=100; export EPICS_PVA_AUTO_ADDR_LIST=NO; export EPICS_PVA_ADDR_LIST=192.168.16.7; pvinfo idadm:calc1)
2024-12-05T16:29:14.637 Creating datagram socket from: 0.0.0.0:58085.
2024-12-05T16:29:14.637 Broadcast address #0: 192.168.16.7:5076. (unicast)
2024-12-05T16:29:14.637 Setting up UDP for interface 193.149.12.42/255.255.255.0, broadcast 193.149.12.255, dest .
2024-12-05T16:29:14.637 Creating datagram socket from: 193.149.12.42:5076.
2024-12-05T16:29:14.637 Creating datagram socket from: 193.149.12.255:5076.
2024-12-05T16:29:14.637 Creating datagram socket from: 224.0.0.128:5076.
2024-12-05T16:29:14.637 Local multicast enabled on 127.0.0.1/224.0.0.128:5076.
2024-12-05T16:29:14.637 Sending 57 bytes 0.0.0.0:58085 -> 192.168.16.7:5076.
2024-12-05T16:29:14.638 UDP Client Rx (53) 0.0.0.0:58085 <- 172.30.134.57:5076
...

The relevant information is this:

EPICS_PVA_ADDR_LIST is set to connect directly to 192.168.16.7. The response UDP packet has a source address of 172.30.134.57, not 192.168.16.7 as it should be. In our network setup, the client host has a route to 192.168.16.7 but not to 172.30.134.57, so it cannot connect.
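
The mismatch can also be picked out of the debug output mechanically. The sketch below is only an illustration: the helper name `mismatched_replies` is made up, and the regular expressions assume the exact log line format shown in the EPICS_PVA_DEBUG output of this report, which may differ between pvAccessCPP versions.

```python
import re

# Two lines copied from the EPICS_PVA_DEBUG output of the pvinfo call.
LOG = """\
2024-12-05T16:29:14.637 Sending 57 bytes 0.0.0.0:58085 -> 192.168.16.7:5076.
2024-12-05T16:29:14.638 UDP Client Rx (53) 0.0.0.0:58085 <- 172.30.134.57:5076
"""

# Assumed line formats, derived only from the two lines above.
SENT = re.compile(r"Sending \d+ bytes \S+ -> (\d+\.\d+\.\d+\.\d+):(\d+)")
RECEIVED = re.compile(r"UDP Client Rx \(\d+\) \S+ <- (\d+\.\d+\.\d+\.\d+):(\d+)")


def mismatched_replies(log):
    """Return reply source addresses that were never used as a search destination."""
    sent = {m.group(1) for m in SENT.finditer(log)}
    received = {m.group(1) for m in RECEIVED.finditer(log)}
    return sorted(received - sent)


print(mismatched_replies(LOG))
```

For the two lines above this reports 172.30.134.57, the address the client has no route to.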

I have verified that the same command line works when I connect to a soft IOC on a host with many interfaces, where the IOC was started with EPICS_PVAS_INTF_ADDR_LIST set to only one of the network interfaces.

So this must be a problem of the PVA gateway and not of the pvAccess implementation in EPICS base.

@mdavidsaver
Member

@goetzpf As you describe it, your configuration seems correct. Which makes me think that something more is going on.

  • Can you provide your routing table? ip -4 route
  • Can you access any gateway status PV? eg pvget PV:clients
  • In addition to pvinfo (from pvAccessCPP), please also test with pvxinfo (from PVXS). This shouldn't make a difference, but is a useful data point.
  • With pvxinfo, instead of setting EPICS_PVA_ADDR_LIST, set EPICS_PVA_NAME_SERVERS=192.168.16.7 to avoid UDP entirely.
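
The two discovery modes in the last two bullets differ only in their environment. A minimal sketch of switching between them from a script follows; `discovery_env` is a hypothetical helper, the PV name and gateway address are the ones from this report, and the `pvxinfo` call is skipped when the tool is not on the PATH.

```python
import os
import shutil
import subprocess


def discovery_env(gateway, use_name_server):
    """Build an environment for one of the two discovery modes."""
    env = dict(os.environ)
    env["EPICS_PVA_AUTO_ADDR_LIST"] = "NO"
    # Start clean so that only one mechanism is active at a time.
    env.pop("EPICS_PVA_ADDR_LIST", None)
    env.pop("EPICS_PVA_NAME_SERVERS", None)
    if use_name_server:
        env["EPICS_PVA_NAME_SERVERS"] = gateway  # search over TCP, no UDP
    else:
        env["EPICS_PVA_ADDR_LIST"] = gateway     # unicast UDP search
    return env


if __name__ == "__main__" and shutil.which("pvxinfo"):
    for use_name_server in (False, True):
        try:
            result = subprocess.run(
                ["pvxinfo", "idadm:calc1"],
                env=discovery_env("192.168.16.7", use_name_server),
                capture_output=True, text=True, timeout=30,
            )
            print("name server" if use_name_server else "UDP search",
                  "exit code", result.returncode)
        except subprocess.TimeoutExpired:
            print("name server" if use_name_server else "UDP search", "timed out")
```

If the name-server run succeeds while the UDP run fails, the TCP side of the gateway is reachable and the problem is confined to the UDP search reply.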

If none of these give any hints, could you repeat your test while running a packet capture on the gateway host?

e.g. tshark -i lo -i bond0 -i bond3 -w capture.pcapng

You could then transfer capture.pcapng to another host and filter only the PVA traffic.

@goetzpf
Author

goetzpf commented Dec 18, 2024

Hello Michael,

thank you for your quick response.

On 12/10/24 01:12, mdavidsaver wrote:

> @goetzpf As you describe it, your configuration seems correct. Which makes me think that something more is going on.
>
> Can you provide your routing table? ip -4 route

I have added the output of "ip -4 route" for the client host and the gateway host as attachment.

> Can you access any gateway status PV? eg [pvget PV:clients](https://epics-base.github.io/p4p/gw.html#status-pvs)

No, this doesn't work either.

> In addition to pvinfo (from pvAccessCPP), please also test with pvxinfo (from PVXS). This shouldn't make a difference, but is a useful data point.

pvxinfo gives this:

EPICS_PVA_AUTO_ADDR_LIST=NO EPICS_PVA_ADDR_LIST=192.168.16.7 bin/linux-x86_64/pvxinfo idadm:calc1
2024-12-10T21:39:09.691273541 ERR pvxs.tcp.io connection to Server 172.30.134.57:5075 closed with socket error 111 : Connection refused
2024-12-10T21:39:11.900122939 ERR pvxs.tcp.io connection to Server 172.30.134.57:5075 closed with socket error 111 : Connection refused

2024-12-10T21:39:14.100491146 ERR pvxs.tcp.io connection to Server 172.30.134.57:5075 closed with socket error 111 : Connection refused
Timeout

> With pvxinfo, instead of setting EPICS_PVA_ADDR_LIST, set EPICS_PVA_NAME_SERVERS=192.168.16.7 to avoid UDP entirely.

EPICS_PVA_AUTO_ADDR_LIST=NO EPICS_PVA_NAME_SERVERS=192.168.16.7 bin/linux-x86_64/pvxinfo idadm:calc

idadm:calc1 from 192.168.16.7:5075
struct "epics:nt/NTScalar:1.0" {
    double value
    struct "alarm_t" {
        int32_t severity
        int32_t status
        string message
    } alarm
    struct {
        int64_t secondsPastEpoch
        int32_t nanoseconds
        int32_t userTag
    } timeStamp
    struct {
        double limitLow
        double limitHigh
        string description
        string units
        int32_t precision
        struct "enum_t" {
            int32_t index
            string[] choices
        } form
    } display
    struct "control_t" {
        double limitLow
        double limitHigh
        double minStep
    } control
    struct "valueAlarm_t" {
        bool active
        double lowAlarmLimit
        double lowWarningLimit
        double highWarningLimit
        double highAlarmLimit
        int32_t lowAlarmSeverity
        int32_t lowWarningSeverity
        int32_t highWarningSeverity
        int32_t highAlarmSeverity
        int8_t hysteresis
    } valueAlarm
}

--> This works !

> If none of these give any hints, could you repeat your test while running a packet capture on the gateway host?
>
> e.g. tshark -i lo -i bond0 -i bond3 -w capture.pcapng
>
> You could then transfer capture.pcapng to another host and filter only the PVA traffic.

I have added a capture "raw-blc.pcapng.gz" here: https://e.pcloud.link/publink/show?code=XZrVHwZyTeMnXkRSs04l29KeBlks8CJlmW7

The attached Wireshark screenshot, in which I have opened the file above, shows in the last line that the source IP of the reply UDP packet is wrong.


I must admit that the routing table on our gateway host is a bit strange regarding the default route. However, I tested everything today on a different system where the default route is via 192.168.16.1, and it still doesn't work; I see a similar problem there. The source IP in the reply UDP packet is not the IP of the interface to which the server part of the gateway is attached.
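
One way to see which source address the kernel would pick for a reply sent from a socket that is not bound to a specific interface is to connect a UDP socket to the destination and read back its local address. This is a generic sketch, not gateway code, and whether the reply in this case really comes from such an unbound socket is an assumption rather than something established here. `kernel_source_address` is an illustrative name.

```python
import socket


def kernel_source_address(destination, port=5076):
    """Return the local address the routing table selects for traffic to destination.

    connect() on a UDP socket sends no packet. It only fixes the peer and lets
    the kernel choose the local address, which getsockname() then reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((destination, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # On the gateway host, substitute the client's address for 127.0.0.1.
    print(kernel_source_address("127.0.0.1"))
```

Run on the gateway host with the client's address as the destination, this shows the address an unbound reply would carry, which can then be compared with 192.168.16.7.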

Thanks,

Goetz
routing-client.txt
routing-gateway.txt
Screenshot_2024-12-10_21-47-03

@mdavidsaver
Member

mdavidsaver commented Dec 23, 2024

Something I should have asked earlier.

Which PVA peers (clients and servers) are running on the gateway host? Which module version(s) are involved?

If you are using P4P older than 4.0.0, please re-test with a newer version.

> I have added a capture "raw-blc.pcapng.gz" ...

Hmmm... The unicast search packets being re-sent as local multicast (packets 62481 and 63761) do not look correct, and in a way that I don't think PVXS is capable of. The "ORIGIN" address is 192.168.16.255 instead of the actual 192.168.16.42. This tells me that some PVA peer using (probably) pvAccessCPP is receiving the unicast.

FYI: the local multicast "hack" is what lets PVA avoid the need for separate UDP ports like CA. It does depend on all peers on a system cooperating.
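
For readers unfamiliar with the mechanism, here is a rough sketch of how a peer might tag a received unicast search before re-sending it to the local multicast group (224.0.0.128:5076 on the loopback interface, as seen in the pvinfo debug output). The byte layout is an assumption from memory of the pvAccess protocol and should be checked against the specification: an 8-byte header (magic 0xCA, version, flags, command, payload size) followed by a 16-byte IPv6-style address. The command code 0x16, the flag value, and the function names are likewise assumptions.

```python
import ipaddress
import struct

PVA_MAGIC = 0xCA
CMD_ORIGIN_TAG = 0x16    # assumed command code, verify against the pvAccess spec
FLAG_BIG_ENDIAN = 0x80   # assumed flag: multi-byte fields are big-endian


def origin_tag(receiving_address, version=2):
    """Build an ORIGIN_TAG message: 8-byte header plus a 16-byte address payload."""
    ipv4 = ipaddress.IPv4Address(receiving_address)
    # IPv4 addresses are carried in IPv4-mapped IPv6 form (::ffff:a.b.c.d).
    payload = b"\x00" * 10 + b"\xff\xff" + ipv4.packed
    header = struct.pack(">BBBBI", PVA_MAGIC, version, FLAG_BIG_ENDIAN,
                         CMD_ORIGIN_TAG, len(payload))
    return header + payload


def forwarded_search(search_packet, receiving_address):
    """Prefix a received unicast search with the tag before re-sending it locally."""
    return origin_tag(receiving_address) + search_packet


if __name__ == "__main__":
    print(origin_tag("192.168.16.7").hex())
```

The tag is meant to tell the other local peers on which interface address the search arrived, which is why an ORIGIN of 192.168.16.255 (a broadcast address) in the capture stands out.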
