This repository has been archived by the owner on Oct 26, 2022. It is now read-only.

Support batch messages #170

Draft
wants to merge 8 commits into master

Conversation

stbuehler
Contributor

This is motivated by netfilter; changes to netfilter must be done through a series of messages (started by NFNL_MSG_BATCH_BEGIN, ended by NFNL_MSG_BATCH_END). The full batch needs to be submitted to the kernel in one write/sendto/..., otherwise the kernel will abort the batch.

@cathay4t
Collaborator

@stbuehler Thanks for the patch. Could you provide test or example code to help me understand how this works at the upper level?

@stbuehler
Contributor Author

Hi. I'm working on some netfilter code; I don't think it's ready to show yet.

But here is a small example. netfilter changes need to be done in a "batch" request; it starts with a BatchBegin message and ends with a BatchEnd message. The kernel wants to read the end message in the same "buffer" as the begin message (I suppose so that userspace can't block the "lock" held during a change). The batch is also locked to a certain netfilter subsystem.

let mut batch = handle.batch(SocketAddr::new(0, 0));
batch.notify(NetfilterMessage {
	header: NetfilterHeader {
		resource_id: (libc::NFNL_SUBSYS_NFTABLES as u16) * 256,
		..Default::default()
	},
	body: NetfilterBody::BatchBegin,
}.request_no_ack());
let c_table_response = batch.request(NetfilterMessage {
	header: NetfilterHeader {
		family: Family::INET,
		..Default::default()
	},
	body: NetfilterBody::NewTable(NftTable {
		name: Some(std::ffi::OsString::from("filter")),
		flags: Some(0),
		..Default::default()
	}),
}.create());
let c_table_set = batch.request(NetfilterMessage {
	header: NetfilterHeader {
		family: Family::INET,
		..Default::default()
	},
	body: NetfilterBody::NewSet(NftSet {
		table: Some(std::ffi::OsString::from("filter")),
		name: Some(std::ffi::OsString::from("allow")),
		key_type: Some(9), // "ether_addr" in userspace nft; kernel doesn't care
		key_len: Some(6),
		id: Some(1),
		flags: Some(libc::NFT_SET_TIMEOUT as u32),
		// userdata: b"\x00\x04\x02\x00\x00\x00",
		..Default::default()
	}),
}.create());
batch.notify(NetfilterMessage {
	header: NetfilterHeader {
		resource_id: (libc::NFNL_SUBSYS_NFTABLES as u16) * 256,
		..Default::default()
	},
	body: NetfilterBody::BatchEnd,
}.request_no_ack());
batch.send()?;


@cathay4t cathay4t left a comment


1. The API changed; please increase the version number.
2. As a lower-level library, a single fault can require hours of debugging when it is used in other tools. Please at least include a unit test case to ensure a batch message is encoded to the expected u8 array and also decoded in the expected way.

netlink-proto/src/protocol/request.rs (outdated review thread, resolved)
@stbuehler stbuehler force-pushed the batch-messages branch 2 times, most recently from 3147601 to 7253106 Compare October 30, 2021 21:07
@stbuehler stbuehler force-pushed the batch-messages branch 2 times, most recently from 08def5c to 0cbe2e0 Compare November 7, 2021 15:00
@little-dude
Owner

FWIW this is a feature I'd love to see supported. Thanks for working on this @stbuehler :)

@stbuehler stbuehler mentioned this pull request Nov 13, 2021
@little-dude
Owner

Hey @stbuehler do you think you'll pursue this PR now that #195 is in? I'd love to see batch messages landing so I might pick it up next week if you're not planning to come back to it.

@stbuehler
Contributor Author

> Hey @stbuehler do you think you'll pursue this PR now that #195 is in? I'd love to see batch messages landing so I might pick it up next week if you're not planning to come back to it.

Yeah, sorry, got... distracted. It's quite late here, so I should probably take another look tomorrow, but let's see how this is looking now. (I should also test whether my netfilter code still works with this.)

@dzamlo dzamlo mentioned this pull request Nov 26, 2021
The definition of the `NLMSG_NEXT` macro in the kernel makes it clear
that netlink messages are padded to a multiple of 4 bytes; this padding
is not part of the `nlmsg_len` field of the message header.

Luckily `nlmsg_len` is usually already a multiple of 4 (as the netlink
attributes are also padded to multiple of 4 bytes, and fixed-length
parts at the beginning of messages are usually aligned too), so the
padding is usually empty.

Anyway, fix this:

1. Make sure encoded messages are padded to a multiple of 4 bytes

Technically the last message in a datagram doesn't need padding (i.e.
the kernel doesn't require it), but it's easier to be consistent.

2. Skip padding in decoder

Similar to the kernel, we ignore missing padding for now. This would be really bad if the buffer were received via a "stream" instead of a "datagram" and might include partial datagrams (i.e. where the padding is read later and is not yet in the buffer).

Netlink sockets are datagram sockets though, so this is fine.

Note: NetlinkAuditCodec might need similar changes, but audit is quite
special.
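The alignment rule described in this commit message can be sketched in a few lines. This is a minimal illustration of the kernel's 4-byte `NLMSG_ALIGN` arithmetic, not code from this crate; the function names are illustrative:

```rust
// Netlink messages are padded to a multiple of NLMSG_ALIGNTO (4) bytes;
// the padding is NOT counted in the nlmsg_len header field.
const NLMSG_ALIGNTO: usize = 4;

// Round `len` up to the next multiple of 4 (equivalent to NLMSG_ALIGN).
fn nlmsg_align(len: usize) -> usize {
    (len + NLMSG_ALIGNTO - 1) & !(NLMSG_ALIGNTO - 1)
}

// Number of padding bytes to append after a message of `len` bytes.
fn nlmsg_padding(len: usize) -> usize {
    nlmsg_align(len) - len
}

fn main() {
    // nlmsg_len is usually already a multiple of 4 (attributes and fixed
    // headers are themselves aligned), so the padding is usually empty.
    assert_eq!(nlmsg_align(16), 16);
    assert_eq!(nlmsg_padding(16), 0);
    // But when it isn't, the encoder must pad and the decoder must skip:
    assert_eq!(nlmsg_align(17), 20);
    assert_eq!(nlmsg_padding(18), 2);
}
```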
- inner `FramedIO` only generic over socket type, not codec / message
  type
- prepare for second `Sink` impl on `NetlinkFramed` with less code
  duplication
Also drop tuple conversions; preparing for a Batch variant.
Prepares for handling of batch messages.
This is motivated by netfilter; changes to netfilter must be done
through a series of messages (started by NFNL_MSG_BATCH_BEGIN, ended by
NFNL_MSG_BATCH_END).  The full batch needs to be submitted to the kernel
in one write/sendto/..., otherwise the kernel will abort the batch.

(And sending a batch without an END message is interpreted as a query to
verify the batch without actually committing it.)
@stbuehler
Contributor Author

First of all: sorry, I didn't work on this for a long time. I wasn't satisfied with how I had split the changes into single commits, and I think it is looking better now.

Sadly I discovered a new design issue (I was working on similar things in Python over the last two weeks, which made me think about it again):
I think the handling of pending_requests is dangerous (#138). Basically all messages to the kernel require NLM_F_REQUEST, and not adding NLM_F_ACK means the pending_requests entry will stay around forever unless the request either returns an error or a "done" response (after possibly other messages). Imho this means you should always set NLM_F_ACK and hope the kernel honors the flag, just to make sure pending_requests is cleaned up.

Now regarding batch messages and pending_requests:

  • The really bad news is: neither NFNL_MSG_BATCH_BEGIN nor NFNL_MSG_BATCH_END respects NLM_F_ACK, i.e. the kernel will never send a message on success. A successful empty batch (i.e. just begin+end) returns no message at all... (and if the batch was aborted due to a failure of an inner message, the begin/end will still not fail...)
  • On the other hand, even a NFNL_MSG_BATCH_END message can fail with an error (for example if it is missing NLM_F_REQUEST), and NFNL_MSG_BATCH_BEGIN can fail due to lots of problems.
  • Some errors on single messages won't abort the full batch; instead the kernel collects all errors (and ACKs) for all messages. But some errors (afaict: all BEGIN failures, OOM, ...) will abort the batch, which means there won't be any ACKs.
  • Afaict an ACK on a single message in the batch doesn't mean the batch actually got committed.

Not sure how to handle this in a sane way. I think it will at least require a manual way to remove entries from pending_requests once we know they won't receive any messages anymore. And perhaps a flag whether a "response" message was the last one in a datagram - because I think we will receive all errors/acks for a batch in a single datagram; once we saw the last response in a datagram that matches a batch, we know we got all responses ("implicit ack").

Another way could be using the same sequence number for all messages in a batch, and collect all responses for it from a single datagram before dropping the pending_requests entry (instead of dropping it on the first ack). It would be a challenge to map errors to single messages of the batch, but otoh that is probably not important anyway - the batch should either succeed or fail completely, at least in a first iteration of the feature :)
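The manual-cleanup idea above could be sketched roughly as follows. This is a hypothetical illustration, not the crate's real API: a pending_requests table keyed by sequence number, where an entry is normally removed when a response arrives, plus a manual `forget` escape hatch for messages (like NFNL_MSG_BATCH_BEGIN/END) that may never be answered at all:

```rust
use std::collections::HashMap;

// Hypothetical sketch of a pending_requests table (names are illustrative).
struct PendingRequests {
    pending: HashMap<u32, String>, // sequence number -> request description
}

impl PendingRequests {
    fn new() -> Self {
        Self { pending: HashMap::new() }
    }

    fn insert(&mut self, seq: u32, what: &str) {
        self.pending.insert(seq, what.to_string());
    }

    // A response (NLMSG_ERROR with code 0 == ACK, a real error, or
    // NLMSG_DONE) arrived for `seq`: the entry can be dropped.
    fn complete(&mut self, seq: u32) -> Option<String> {
        self.pending.remove(&seq)
    }

    // Manual cleanup once we know `seq` will never receive a response
    // (e.g. batch begin/end, which are never ACKed on success).
    fn forget(&mut self, seq: u32) {
        self.pending.remove(&seq);
    }

    fn outstanding(&self) -> usize {
        self.pending.len()
    }
}

fn main() {
    let mut reqs = PendingRequests::new();
    reqs.insert(1, "NFNL_MSG_BATCH_BEGIN"); // never ACKed on success
    reqs.insert(2, "NFT_MSG_NEWTABLE");     // sent with NLM_F_ACK
    reqs.insert(3, "NFNL_MSG_BATCH_END");   // never ACKed on success

    reqs.complete(2); // ACK for the table arrives
    reqs.forget(1);   // begin/end must be cleaned up manually
    reqs.forget(3);
    assert_eq!(reqs.outstanding(), 0);
}
```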

@stbuehler
Contributor Author

Just checked and it doesn't seem like we get the errors in one go; so that is not an option. Next idea: use a dummy request (with a simple response) after a batch: as the kernel handles the messages in order the response to the dummy request marks the end of all errors for the batch. Still need a way to stop waiting for sequence ids so we can detect problems with the BATCH_END message.
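The "dummy request" idea could look something like this. A speculative sketch, assuming the kernel processes the messages of one datagram in order: once any response for the dummy's sequence number is seen, no more responses for the earlier (batch) sequence numbers can arrive, so their pending entries can be dropped. The function names here are made up for illustration:

```rust
// Has the trailing dummy request been answered yet? If so, every error/ACK
// the kernel will ever send for the batch has already been delivered.
fn batch_responses_complete(seen_seqs: &[u32], dummy_seq: u32) -> bool {
    seen_seqs.contains(&dummy_seq)
}

// Batch sequence numbers still worth waiting for. Once the dummy response
// arrived, anything unanswered can be forgotten ("implicit ack" / abort).
fn outstanding_batch_seqs(batch_seqs: &[u32], seen_seqs: &[u32], dummy_seq: u32) -> Vec<u32> {
    if batch_responses_complete(seen_seqs, dummy_seq) {
        Vec::new()
    } else {
        batch_seqs
            .iter()
            .copied()
            .filter(|s| !seen_seqs.contains(s))
            .collect()
    }
}

fn main() {
    let batch = [1, 2, 3]; // sequence numbers used inside the batch
    let dummy = 4;         // dummy request sent right after BATCH_END

    // Only an error for seq 2 arrived so far: keep waiting for 1 and 3.
    assert_eq!(outstanding_batch_seqs(&batch, &[2], dummy), vec![1, 3]);

    // The dummy was answered: nothing else is coming for the batch.
    assert!(outstanding_batch_seqs(&batch, &[2, 4], dummy).is_empty());
}
```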

@cathay4t
Collaborator

cathay4t commented May 7, 2022

@stbuehler This PR has been in draft mode for a while. Is it ready for review?

4 participants