Fixes for LPM trie #4710

kernel-patches-daemon-bpf-rc · 2024-11-27T00:39:43Z

Pull request for series with
subject: Fixes for LPM trie
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=912649

kernel-patches-daemon-bpf-rc · 2024-11-27T00:39:43Z

Upstream branch: 3448ad2
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=912649
version: 2

kernel-patches-daemon-bpf-rc · 2024-11-28T21:48:47Z

Upstream branch: 3448ad2
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=912649
version: 2

kernel-patches-daemon-bpf-rc · 2024-11-29T11:51:50Z

Upstream branch: 3448ad2
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=912649
version: 2

kernel-patches-daemon-bpf-rc · 2024-11-29T12:05:33Z

Upstream branch: 3448ad2
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=912649
version: 2

kernel-patches-daemon-bpf-rc · 2024-11-29T16:23:02Z

Upstream branch: 537a252
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=912649
version: 2

kernel-patches-daemon-bpf-rc · 2024-12-02T16:30:56Z

Upstream branch: 537a252
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=912649
version: 2

kernel-patches-daemon-bpf-rc · 2024-12-02T17:44:40Z

Upstream branch: 537a252
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=912649
version: 2

kernel-patches-daemon-bpf-rc · 2024-12-02T22:13:32Z

Upstream branch: 537a252
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=912649
version: 2

When "node->prefixlen == matchlen" is true, the node is fully matched. If "node->prefixlen == key->prefixlen" is false, the prefix length of the key is greater than the prefix length of the node; otherwise, matchlen would not be equal to node->prefixlen. This in turn implies that the prefix length of the node must be less than max_prefixlen.

Therefore, "node->prefixlen == trie->max_prefixlen" is always false when the check of "node->prefixlen == key->prefixlen" returns false. Remove this unnecessary comparison.

Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
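The argument above can be checked by brute force. The following is a minimal userspace sketch, not kernel code: the function name count_reachable_cases() is hypothetical, and the only assumption it encodes is that matchlen never exceeds either prefix length and that no prefix length exceeds max_prefixlen.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: enumerate every (node prefixlen, key prefixlen,
 * matchlen) triple allowed by the constraints and count how often the
 * removed comparison "node->prefixlen == trie->max_prefixlen" could
 * have been true after the exact-match check failed.
 */
size_t count_reachable_cases(size_t max_prefixlen)
{
	size_t hits = 0;

	for (size_t node = 0; node <= max_prefixlen; node++) {
		for (size_t key = 0; key <= max_prefixlen; key++) {
			/* matchlen is bounded by the shorter prefix */
			size_t limit = node < key ? node : key;

			for (size_t matchlen = 0; matchlen <= limit; matchlen++) {
				if (node != matchlen)
					continue; /* node not fully matched */
				if (node == key)
					continue; /* exact match, handled earlier */
				/* fully matched but not exact: node is shorter */
				assert(node < key);
				if (node == max_prefixlen)
					hits++;
			}
		}
	}
	return hits;
}
```

For any max_prefixlen the function is expected to return 0, which is the claim made in the commit message.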

There is no need to call kfree(im_node) when updating an element fails, because im_node must be NULL. Remove the unnecessary kfree() for im_node.

Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>

Add the currently missing handling for the BPF_EXIST and BPF_NOEXIST flags. These flags can be specified by users and are relevant since LPM trie supports exact matches during update.

Fixes: b95a5c4 ("bpf: add a longest prefix match trie map implementation")
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
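As a rough illustration of what handling these flags means, the sketch below models the decision in userspace. It is not the patch itself: check_update_flags() and update_flags_demo() are hypothetical names, the flag values mirror those in the UAPI header linux/bpf.h, and the error codes (-EEXIST, -ENOENT, -EINVAL) are an assumption based on the conventional semantics of BPF map updates, since the commit message does not spell them out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same values as in linux/bpf.h, redefined here to stay self-contained. */
enum { BPF_ANY = 0, BPF_NOEXIST = 1, BPF_EXIST = 2 };

/*
 * Hypothetical model: decide whether an update may proceed, given the
 * user-supplied flags and whether the search found an exact match.
 */
int check_update_flags(unsigned long long flags, bool exact_match_found)
{
	if (flags > BPF_EXIST)
		return -EINVAL;	/* unknown flag value */
	if (exact_match_found && flags == BPF_NOEXIST)
		return -EEXIST;	/* caller asked for create-only */
	if (!exact_match_found && flags == BPF_EXIST)
		return -ENOENT;	/* caller asked for update-only */
	return 0;
}

/* Usage example covering each flag. */
void update_flags_demo(void)
{
	assert(check_update_flags(BPF_ANY, true) == 0);
	assert(check_update_flags(BPF_ANY, false) == 0);
	assert(check_update_flags(BPF_NOEXIST, true) == -EEXIST);
	assert(check_update_flags(BPF_EXIST, false) == -ENOENT);
}
```

An exact match here means that a node with the same prefix and the same prefix length is already stored in the trie, which is why the flags are meaningful for this map type.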

When an LPM trie is full, in-place updates of existing elements incorrectly return -ENOSPC.

Fix this by deferring the check of trie->n_entries. For new insertions, n_entries must not exceed max_entries. However, in-place updates are allowed even when the trie is full.

Fixes: b95a5c4 ("bpf: add a longest prefix match trie map implementation")
Signed-off-by: Hou Tao <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
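The effect of deferring the check can be sketched with two small userspace functions. Both names (update_check_early() and update_check_deferred()) are hypothetical, and the functions only model the ordering described above rather than reproducing the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Early check: reject before knowing whether the key already exists. */
int update_check_early(bool key_exists, size_t n_entries, size_t max_entries)
{
	(void)key_exists;	/* not known yet at this point */
	if (n_entries >= max_entries)
		return -ENOSPC;
	return 0;
}

/* Deferred check: only a new insertion needs a free slot. */
int update_check_deferred(bool key_exists, size_t n_entries, size_t max_entries)
{
	if (key_exists)
		return 0;	/* in-place update, n_entries unchanged */
	if (n_entries >= max_entries)
		return -ENOSPC;
	return 0;
}

/* Usage example: a full trie with max_entries == 4. */
void full_trie_demo(void)
{
	/* overwriting an existing element fails only with the early check */
	assert(update_check_early(true, 4, 4) == -ENOSPC);
	assert(update_check_deferred(true, 4, 4) == 0);
	/* inserting a new element fails in both cases */
	assert(update_check_deferred(false, 4, 4) == -ENOSPC);
}
```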

trie_get_next_key() uses node->prefixlen == key->prefixlen to identify an exact match. However, this is incorrect: when the target key doesn't fully match the found node (i.e., node->prefixlen != matchlen), the node and the key may still have the same prefixlen. The check returns the expected result when the passed key exists in the trie. However, when a recently-deleted key or a nonexistent key is passed to trie_get_next_key(), it may skip keys and return an incorrect result.

Fix it by using node->prefixlen == matchlen to identify exact matches. When the condition is true after the search, it also implies that node->prefixlen equals key->prefixlen; otherwise, the search would return NULL instead.

Fixes: b471f2f ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
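A concrete pair of prefixes makes the difference between the two comparisons visible. The sketch below is a simplified userspace model: prefix_match_len() is a hypothetical bit-by-bit stand-in for the kernel's prefix matching helper, and the two IPv4 prefixes are made-up example data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of leading bits shared by two prefixes, capped at the shorter one. */
size_t prefix_match_len(const uint8_t *a, size_t a_len,
			const uint8_t *b, size_t b_len)
{
	size_t limit = a_len < b_len ? a_len : b_len;
	size_t i;

	for (i = 0; i < limit; i++) {
		/* most significant bit of each byte comes first */
		uint8_t mask = (uint8_t)(0x80u >> (i % 8));

		if ((a[i / 8] ^ b[i / 8]) & mask)
			break;
	}
	return i;
}

/* Comparison used before the fix. */
bool exact_match_old(size_t node_len, size_t key_len)
{
	return node_len == key_len;
}

/* Comparison used after the fix. */
bool exact_match_fixed(size_t node_len, size_t matchlen)
{
	return node_len == matchlen;
}

void next_key_match_demo(void)
{
	const uint8_t node[4] = {192, 168, 0, 0}; /* 192.168.0.0/24, in the trie */
	const uint8_t key[4] = {192, 168, 1, 0};  /* 192.168.1.0/24, not in the trie */
	size_t matchlen = prefix_match_len(node, 24, key, 24);

	assert(matchlen == 23);				/* differ in the last bit */
	assert(exact_match_old(24, 24));		/* wrongly reports a match */
	assert(!exact_match_fixed(24, matchlen));	/* correctly reports none */
}
```

Both prefixes have a length of 24 bits, so comparing prefix lengths alone cannot tell them apart, while comparing against matchlen can.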

Multiple syzbot warnings have been reported. These warnings are mainly about the lock order between trie->lock and kmalloc()'s internal lock. See report [1] as an example:

  ======================================================
  WARNING: possible circular locking dependency detected
  6.10.0-rc7-syzkaller-00003-g4376e966ecb7 #0 Not tainted
  ------------------------------------------------------
  syz.3.2069/15008 is trying to acquire lock:
  ffff88801544e6d8 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node ...

  but task is already holding lock:
  ffff88802dcc89f8 (&trie->lock){-.-.}-{2:2}, at: trie_update_elem ...

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (&trie->lock){-.-.}-{2:2}:
         __raw_spin_lock_irqsave
         _raw_spin_lock_irqsave+0x3a/0x60
         trie_delete_elem+0xb0/0x820
         ___bpf_prog_run+0x3e51/0xabd0
         __bpf_prog_run32+0xc1/0x100
         bpf_dispatcher_nop_func
         ......
         bpf_trace_run2+0x231/0x590
         __bpf_trace_contention_end+0xca/0x110
         trace_contention_end.constprop.0+0xea/0x170
         __pv_queued_spin_lock_slowpath+0x28e/0xcc0
         pv_queued_spin_lock_slowpath
         queued_spin_lock_slowpath
         queued_spin_lock
         do_raw_spin_lock+0x210/0x2c0
         __raw_spin_lock_irqsave
         _raw_spin_lock_irqsave+0x42/0x60
         __put_partials+0xc3/0x170
         qlink_free
         qlist_free_all+0x4e/0x140
         kasan_quarantine_reduce+0x192/0x1e0
         __kasan_slab_alloc+0x69/0x90
         kasan_slab_alloc
         slab_post_alloc_hook
         slab_alloc_node
         kmem_cache_alloc_node_noprof+0x153/0x310
         __alloc_skb+0x2b1/0x380
         ......

  -> #0 (&n->list_lock){-.-.}-{2:2}:
         check_prev_add
         check_prevs_add
         validate_chain
         __lock_acquire+0x2478/0x3b30
         lock_acquire
         lock_acquire+0x1b1/0x560
         __raw_spin_lock_irqsave
         _raw_spin_lock_irqsave+0x3a/0x60
         get_partial_node.part.0+0x20/0x350
         get_partial_node
         get_partial
         ___slab_alloc+0x65b/0x1870
         __slab_alloc.constprop.0+0x56/0xb0
         __slab_alloc_node
         slab_alloc_node
         __do_kmalloc_node
         __kmalloc_node_noprof+0x35c/0x440
         kmalloc_node_noprof
         bpf_map_kmalloc_node+0x98/0x4a0
         lpm_trie_node_alloc
         trie_update_elem+0x1ef/0xe00
         bpf_map_update_value+0x2c1/0x6c0
         map_update_elem+0x623/0x910
         __sys_bpf+0x90c/0x49a0
         ...

  other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&trie->lock);
                                 lock(&n->list_lock);
                                 lock(&trie->lock);
    lock(&n->list_lock);

   *** DEADLOCK ***

[1]: https://syzkaller.appspot.com/bug?extid=9045c0a3d5a7f1b119f7

A bpf program attached to trace_contention_end() triggers after acquiring &n->list_lock. The program invokes trie_delete_elem(), which then acquires trie->lock. However, it is possible that another process is invoking trie_update_elem(). trie_update_elem() will acquire trie->lock first, then invoke kmalloc_node(). kmalloc_node() may invoke get_partial_node() and try to acquire &n->list_lock (not necessarily the same lock object). Therefore, lockdep warns about the circular locking dependency.

Invoking kmalloc() before acquiring trie->lock could fix the warning. However, since BPF programs can be invoked from any context (e.g., through kprobe/tracepoint/fentry), there may still be lock ordering problems for internal locks in kmalloc() or trie->lock itself.

To eliminate these potential lock ordering problems with kmalloc()'s internal locks, replace kmalloc()/kfree()/kfree_rcu() with equivalent BPF memory allocator APIs that can be invoked in any context. The lock ordering problems with trie->lock (e.g., reentrance) will be handled separately.

Two aspects of this change require explanation:

1. Intermediate and leaf nodes are allocated from the same allocator. The value size of LPM trie is usually small, and using only one allocator reduces the memory overhead of the BPF memory allocator.

2. Nodes are freed before invoking spin_unlock_irqrestore(). Therefore, there is no need to add paired migrate_{disable|enable}() calls for these free operations.

Signed-off-by: Hou Tao <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>

After switching from kmalloc() to the bpf memory allocator, there will be no blocking operation during the update of LPM trie. Therefore, change trie->lock from spinlock_t to raw_spinlock_t to make LPM trie usable in atomic context, even on RT kernels.

The max value of prefixlen is 2048. Therefore, update or deletion operations will find the target after at most 2048 comparisons. In a test case that updates an element after 2048 comparisons on an 8-CPU VM, the average and maximal times for such an update operation are about 210us and 900us, respectively.

Signed-off-by: Hou Tao <[email protected]>

Move test_lpm_map.c to map_tests/ to include LPM trie test cases in the regular test_maps run. Most code remains unchanged, including the use of assert(). The only change is reducing n_lookups from 64K to 512, which decreases test_lpm_map runtime from 37s to 0.7s.

Signed-off-by: Hou Tao <[email protected]>

Add more test cases for LPM trie in test_maps:

1) test_lpm_trie_update_flags
It constructs various use cases for BPF_EXIST and BPF_NOEXIST and checks whether the return value of the update operation is as expected.

2) test_lpm_trie_update_full_maps
It tests the update operations on a full LPM trie map. Adding a new node will fail and overwriting the value of an existing node will succeed.

3) test_lpm_trie_iterate_strs and test_lpm_trie_iterate_ints
These two test cases check whether the iteration through get_next_key is sorted and as expected. They delete the minimal key after each iteration and check whether the next iteration returns the second minimal key. The only difference between the two test cases is that the former saves strings in the LPM trie and the latter saves integers. Without the fix of get_next_key, these two cases will fail as shown below:

  test_lpm_trie_iterate_strs(1091):FAIL:iterate #2 got abc exp abS
  test_lpm_trie_iterate_ints(1142):FAIL:iterate #1 got 0x2 exp 0x1

Signed-off-by: Hou Tao <[email protected]>

kernel-patches-daemon-bpf-rc · 2024-12-03T00:59:24Z

Upstream branch: 537a252
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=912649
version: 2

kernel-patches-daemon-bpf-rc · 2024-12-03T01:44:21Z

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=912649 expired. Closing PR.

kernel-patches-daemon-bpf-rc bot added new V2 bpf V2-ci-pass labels Nov 27, 2024

kernel-patches-daemon-bpf-rc bot force-pushed the bpf_base branch from e9d46d9 to 8170504 Compare November 28, 2024 21:48

kernel-patches-daemon-bpf-rc bot force-pushed the series/910440=>bpf branch from 0f9c7ba to 14126b1 Compare November 28, 2024 21:48

kernel-patches-daemon-bpf-rc bot added V2-ci-fail and removed V2-ci-pass labels Nov 28, 2024

kernel-patches-daemon-bpf-rc bot force-pushed the series/910440=>bpf branch from 14126b1 to eabea7c Compare November 29, 2024 11:51

kernel-patches-daemon-bpf-rc bot force-pushed the series/910440=>bpf branch from eabea7c to d833ebf Compare November 29, 2024 12:05

kernel-patches-daemon-bpf-rc bot added V2-ci-pass and removed V2-ci-fail labels Nov 29, 2024

kernel-patches-daemon-bpf-rc bot force-pushed the bpf_base branch from 8170504 to bf5cb90 Compare November 29, 2024 16:22

kernel-patches-daemon-bpf-rc bot force-pushed the series/910440=>bpf branch from d833ebf to 1832d43 Compare November 29, 2024 16:23

kernel-patches-daemon-bpf-rc bot force-pushed the series/910440=>bpf branch from 1832d43 to d56d904 Compare December 2, 2024 16:31

kernel-patches-daemon-bpf-rc bot force-pushed the series/910440=>bpf branch from d56d904 to 8ccd467 Compare December 2, 2024 17:44

kernel-patches-daemon-bpf-rc bot added V2-ci-fail and removed V2-ci-pass labels Dec 2, 2024

kernel-patches-daemon-bpf-rc bot force-pushed the bpf_base branch from bf5cb90 to f8fc3a6 Compare December 2, 2024 22:12

kernel-patches-daemon-bpf-rc bot force-pushed the series/910440=>bpf branch from 8ccd467 to 0dde750 Compare December 2, 2024 22:13

kernel-patches-daemon-bpf-rc bot added V2-ci-pass and removed V2-ci-fail labels Dec 3, 2024

kernel-patches-daemon-bpf-rc bot force-pushed the bpf_base branch from f8fc3a6 to a55dd53 Compare December 3, 2024 00:58

Hou Tao added 9 commits December 2, 2024 16:59

kernel-patches-daemon-bpf-rc bot force-pushed the series/910440=>bpf branch from 0dde750 to 0595def Compare December 3, 2024 00:59

kernel-patches-daemon-bpf-rc bot added changes-requested and removed new labels Dec 3, 2024

kernel-patches-daemon-bpf-rc bot closed this Dec 3, 2024

kernel-patches-daemon-bpf-rc bot deleted the series/910440=>bpf branch December 5, 2024 09:45
