Route multicast events (with nl_pid > 0) are ignored #218

Closed
acheronfail opened this issue Jun 26, 2023 · 7 comments · Fixed by #209

@acheronfail
Contributor

acheronfail commented Jun 26, 2023

I'm trying to make something that functions similarly to nl-monitor ipv4-ifaddr with neli, but I'm struggling to get anywhere with it.

This is my current attempt:

let (socket, mut multicast) =
    NlRouter::connect(NlFamily::Route, None, Groups::empty()).unwrap();

socket
    .add_mcast_membership(Groups::new_groups(&[RTNLGRP_IPV4_IFADDR]))
    .unwrap();

match multicast.next() {
    None => unreachable!(),
    Some(response) => {
        dbg!(response).unwrap();
    }
}

As far as I can tell, this should be reporting events any time an ipv4 address on the machine changes, but I get no output at all with this setup. I've been looking at how nl-monitor works, and comparing that to neli and they look very similar here, so I'm not sure what's different...

Any chance you might know where I should look next? 🙏

@acheronfail
Contributor Author

acheronfail commented Jul 1, 2023

Just used strace and I can see that messages are coming through, it's just that multicast.next() never returns.

recvfrom(9, [
    {
        nlmsg_len=80,
        nlmsg_type=RTM_NEWADDR,
        nlmsg_flags=0,
        nlmsg_seq=1688203532,
        nlmsg_pid=8187
    },
    {
        ifa_family=AF_INET,
        ifa_prefixlen=32,
        ifa_flags=IFA_F_PERMANENT,
        ifa_scope=RT_SCOPE_UNIVERSE,
        ifa_index=if_nametoindex("wlan0")
    },
    [
        [
            {
                nla_len=8,
                nla_type=IFA_ADDRESS
            },
            inet_addr("10.0.0.254")
        ],
        [
            {
                nla_len=8,
                nla_type=IFA_LOCAL
            },
            inet_addr("10.0.0.254")
        ],
        [
            {
                nla_len=10,
                nla_type=IFA_LABEL
            },
            "wlan0"
        ],
        [
            {
                nla_len=8,
                nla_type=IFA_FLAGS
            },
            IFA_F_PERMANENT
        ],
        [
            {
                nla_len=20,
                nla_type=IFA_CACHEINFO
            },
            {
                ifa_prefered=4294967295,
                ifa_valid=4294967295,
                cstamp=38685,
                tstamp=38685
            }
        ]
    ]
], 32768, 0, NULL, NULL) = 80

Is this because I need a specific type before neli will parse it and return it? If so, is there a way to always return any message? (Or alternatively, how do I find out the correct neli type?)

@acheronfail
Contributor Author

acheronfail commented Jul 1, 2023

I'll update this with any further findings...

Debugging Tips

  • Alright, I just discovered that I can initialise a logger and use RUST_LOG=trace for more debug information about what neli is doing.
  • I've also built with RUSTFLAGS=-g so debug symbols are included and I can debug/step-through neli with a debugger.

Confusing things...

Findings

My current findings are that the senders collection here is always empty, and the message's pid is non-zero, and so although neli receives and parses the message, it doesn't send it back to the multicast receiver in my example.

This is a combination of nl_pid != 0 and also nl_seq = <some very high number, like 1688206355>.

So, for some reason - messages received here don't have nl_pid = 0. This means that neli doesn't forward those events to the multicast_receiver, because it seems to only forward events with nl_pid = 0 to the multicast_receiver. Any other message received on the socket is simply ignored and dropped.

Is this ignoring of events intended behaviour?
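
The behaviour described in these findings can be modelled in a few lines. This is a sketch of the heuristic as it is described in this comment, not neli's actual source: the `Destination` enum, the `route` function and the shape of the `senders` map are all hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Where a received message ends up in this simplified model.
#[derive(Debug, PartialEq)]
enum Destination {
    /// Delivered to the caller waiting on a request with this sequence number.
    Unicast(u32),
    /// Delivered to the multicast receiver.
    Multicast,
    /// Nobody is waiting for it, so it is discarded.
    Dropped,
}

/// Hypothetical routing heuristic. `senders` maps the sequence numbers of
/// in-flight requests to some handle (here just a label).
fn route(senders: &HashMap<u32, &'static str>, nlmsg_seq: u32, nlmsg_pid: u32) -> Destination {
    if senders.contains_key(&nlmsg_seq) {
        // A request with this sequence number is waiting for a reply.
        Destination::Unicast(nlmsg_seq)
    } else if nlmsg_pid == 0 {
        // Only messages whose header carries nl_pid == 0 count as multicast.
        Destination::Multicast
    } else {
        // Non-zero pid and no matching request: the message goes nowhere.
        Destination::Dropped
    }
}
```

With an empty `senders` map, `route(&senders, 1688203532, 8187)` falls through to `Destination::Dropped` in this model, which matches the symptom of `multicast.next()` never returning for the RTM_NEWADDR message in the strace output.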

@acheronfail
Contributor Author

acheronfail commented Jul 1, 2023

I can confirm that nl-monitor ipv4-ifaddr also receives events on its equivalent of a multicast receiver with nl_pid != 0 and nl_seq = <random high number>. No wait, I was confused! strace -ff nl-monitor ipv4-ifaddr shows that these messages are received with nl_pid == 0 - so something mustn't be set right with neli...

I've changed the title - I think this should be updated to a feature perhaps? (No permissions to change label...)

EDIT:

I created #219 in an attempt to fix this.

@acheronfail acheronfail changed the title Need a little help receiving netlink route multicast messages NlRouter seems to ignore multicast events when nl_pid != 0 Jul 1, 2023
@acheronfail acheronfail changed the title NlRouter seems to ignore multicast events when nl_pid != 0 nl_pid is not 0 when receiving NlFamily::Route multicast events Jul 1, 2023
@acheronfail acheronfail changed the title nl_pid is not 0 when receiving NlFamily::Route multicast events Multicast events with nl_pid > 0 are ignored Jul 1, 2023
@acheronfail
Contributor Author

acheronfail commented Jul 1, 2023

Alright, the whole thing is working (provided my fork that's in #219 is used).

Here's some sample code:

// setup socket for netlink route
let (socket, mut multicast) =
    NlRouter::connect(NlFamily::Route, None, Groups::empty()).unwrap();

// add multicast membership for ipv4-addr updates
socket
    .add_mcast_membership(Groups::new_groups(&[RTNLGRP_IPV4_IFADDR]))
    .unwrap();

// listen for multicast events
// NOTE: currently requires the changes here: https://github.com/jbaublitz/neli/pull/219
type Next = Option<Result<Nlmsghdr<u16, Ifaddrmsg>, RouterError<u16, Ifaddrmsg>>>;
match multicast.next_typed::<u16, Ifaddrmsg>() as Next {
    None => todo!(),
    // we got a multicast message
    Some(response) => {
        // if there are errors on the multicast channel, they'll be here in this result
        let response = response.unwrap();
        // get message payload
        let ifaddr_msg = response.get_payload().unwrap();
        // get a handle to the message's rt attributes
        let rt_attrs_handle = ifaddr_msg.rtattrs().get_attr_handle();
        // get the address attribute
        let addr_attr = rt_attrs_handle.get_attribute(Ifa::Address).unwrap();
        // convert the raw bytes from the attribute into an `Ipv4Addr` struct
        let bytes: &[u8] = addr_attr.rta_payload().as_ref();
        let bytes: &[u8; 4] = bytes.try_into().unwrap();
        let ipv4 = Ipv4Addr::from(*bytes);
        // 🎉 we did it!
        dbg!(ipv4);
    }
}
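
The last step of the sample, turning the attribute payload into an `Ipv4Addr`, can also be written without `unwrap`. The helper below is a small standalone sketch using only the standard library; the name `ipv4_from_payload` is hypothetical and not part of neli.

```rust
use std::net::Ipv4Addr;

/// Convert a raw attribute payload into an IPv4 address.
/// Returns `None` unless the payload is exactly four bytes long.
fn ipv4_from_payload(payload: &[u8]) -> Option<Ipv4Addr> {
    match payload {
        // An IFA_ADDRESS payload for AF_INET is four octets in network order.
        [a, b, c, d] => Some(Ipv4Addr::new(*a, *b, *c, *d)),
        _ => None,
    }
}
```

Passing `&[10, 0, 0, 254]` yields `Some(10.0.0.254)`; any other length yields `None` instead of panicking, which is useful if the same code path ever sees an attribute of a different size.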

I'm leaving this issue open as the tracking issue for ignored multicast events.

@acheronfail acheronfail changed the title Multicast events with nl_pid > 0 are ignored Route multicast events (with nl_pid > 0) are ignored Jul 2, 2023
@acheronfail
Contributor Author

acheronfail commented Jul 2, 2023

Wait - I'm so sorry for all the spam 😅 - after looking at this again I seem to have completely glossed over the fact that nl-monitor receives events with nl_pid == 0 but when I try with neli I get events with nl_pid > 0!

So, the PR I created is probably bogus - these should be multicast events... but why don't the events come through with nl_pid == 0 when I subscribe with neli??? This has got me so confused...

Again, this issue is now more or less a diary of my experience learning about netlink 😅.

I think I was originally right, actually. The thing that confused me is reading strace's output of the recvmsg calls:

recvmsg(3, {
    msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000010},
    msg_namelen=12,
    msg_iov=[{
        iov_base=[
            {nlmsg_len=80, nlmsg_type=RTM_DELADDR, nlmsg_flags=0, nlmsg_seq=1688293504, nlmsg_pid=19010},
            {ifa_family=AF_INET, ifa_prefixlen=32, ifa_flags=IFA_F_PERMANENT, ifa_scope=RT_SCOPE_UNIVERSE, ifa_index=if_nametoindex("wlan0")},
            [
                [{nla_len=8, nla_type=IFA_ADDRESS}, inet_addr("10.0.0.254")],
                [{nla_len=8, nla_type=IFA_LOCAL}, inet_addr("10.0.0.254")],
                [{nla_len=10, nla_type=IFA_LABEL}, "wlan0"],
                [{nla_len=8, nla_type=IFA_FLAGS}, IFA_F_PERMANENT],
                [{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295, ifa_valid=4294967295, cstamp=142503, tstamp=142503}]
            ]
        ],
        iov_len=16384
    }],
    msg_iovlen=1,
    msg_controllen=0,
    msg_flags=0
}, 0) = 80

I was confusing the first part of the recvmsg call...

{msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000010}

...thinking that contained the actual netlink message header - but it doesn't! The actual header is this part:

{nlmsg_len=80, nlmsg_type=RTM_DELADDR, nlmsg_flags=0, nlmsg_seq=1688293504, nlmsg_pid=19010}

So, I believe my previous comments about multicast messages with nl_pid > 0 are correct.
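
To make the distinction concrete: the netlink message header is the first 16 bytes of the received buffer, separate from the socket address that strace prints as msg_name. The following is a minimal sketch that decodes those 16 bytes using only the standard library. The struct mirrors the field names strace prints, while `NlMsgHdr` and `parse_nlmsghdr` are hypothetical names and not neli types.

```rust
/// Fixed 16-byte header at the start of every netlink message.
/// Field names follow what strace prints for the header.
#[derive(Debug, PartialEq)]
struct NlMsgHdr {
    nlmsg_len: u32,
    nlmsg_type: u16,
    nlmsg_flags: u16,
    nlmsg_seq: u32,
    nlmsg_pid: u32,
}

/// Decode the header from the start of a receive buffer.
/// Netlink uses host byte order, hence `from_ne_bytes`.
/// Returns `None` if the buffer holds fewer than 16 bytes.
fn parse_nlmsghdr(buf: &[u8]) -> Option<NlMsgHdr> {
    let b = buf.get(..16)?;
    Some(NlMsgHdr {
        nlmsg_len: u32::from_ne_bytes([b[0], b[1], b[2], b[3]]),
        nlmsg_type: u16::from_ne_bytes([b[4], b[5]]),
        nlmsg_flags: u16::from_ne_bytes([b[6], b[7]]),
        nlmsg_seq: u32::from_ne_bytes([b[8], b[9], b[10], b[11]]),
        nlmsg_pid: u32::from_ne_bytes([b[12], b[13], b[14], b[15]]),
    })
}
```

Feeding it the first 16 bytes of the buffer from the recvmsg call above would give nlmsg_pid = 19010, regardless of the nl_pid=0 that appears in msg_name.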

@jbaublitz
Owner

@acheronfail Can you test #209 and let me know if that resolves the issue? Someone else suggested that I use recvfrom instead of recv to determine whether a message is coming from a netlink multicast group or not. Based on my initial testing, it seems to resolve the problem of heuristics. Can you please confirm that it resolves your issue too?
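
The idea behind using the sender address can be sketched as follows. This is an illustration of the approach, not the code in #209: `SenderAddr`, `Origin`, `group_mask` and `classify` are hypothetical names, and the struct only mirrors the two `sockaddr_nl` fields that strace shows in msg_name. The group number 5 for RTNLGRP_IPV4_IFADDR is consistent with the nl_groups=0x000010 seen in the recvmsg output earlier in this thread.

```rust
/// Sender address as filled in by recvfrom/recvmsg on a netlink socket
/// (only the fields of `sockaddr_nl` that matter here).
#[derive(Debug, Clone, Copy)]
struct SenderAddr {
    nl_pid: u32,
    nl_groups: u32,
}

/// Group number for IPv4 address notifications.
const RTNLGRP_IPV4_IFADDR: u32 = 5;

/// Bit for a 1-based group number in `nl_groups`.
/// Returns 0 for group 0 or for groups that do not fit in 32 bits.
fn group_mask(group: u32) -> u32 {
    group
        .checked_sub(1)
        .and_then(|shift| 1u32.checked_shl(shift))
        .unwrap_or(0)
}

#[derive(Debug, PartialEq)]
enum Origin {
    /// Sent to one or more multicast groups (bitmask included).
    Multicast { groups: u32 },
    /// Unicast from the kernel.
    KernelUnicast,
    /// Unicast from another netlink socket with this port id.
    PeerUnicast { pid: u32 },
}

/// Classify a message by its sender address rather than by the
/// nlmsg_pid field inside the message header.
fn classify(addr: SenderAddr) -> Origin {
    if addr.nl_groups != 0 {
        Origin::Multicast { groups: addr.nl_groups }
    } else if addr.nl_pid == 0 {
        Origin::KernelUnicast
    } else {
        Origin::PeerUnicast { pid: addr.nl_pid }
    }
}
```

Under this scheme the RTM_DELADDR message above is classified as multicast because its sender address has nl_groups=0x10, even though its header carries nlmsg_pid=19010.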

@acheronfail
Contributor Author

Ah yes! Thank you so much, I was going around in circles so many times 😅

I can confirm that works for me!
