From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mauricio Vasquez B Subject: [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Date: Fri, 31 Aug 2018 23:25:48 +0200 Message-ID: <153575074884.30050.17670029209466860207.stgit@kernel> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit To: Alexei Starovoitov , Daniel Borkmann , netdev@vger.kernel.org Return-path: Received: from fm1nodo5.polito.it ([130.192.180.18]:44597 "EHLO fm1nodo5.polito.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727240AbeIABfX (ORCPT ); Fri, 31 Aug 2018 21:35:23 -0400 Sender: netdev-owner@vger.kernel.org List-ID: In some applications this is needed have a pool of free elements, like for example the list of free L4 ports in a SNAT. None of the current maps allow to do it as it is not possibleto get an any element without having they key it is associated to. This patchset implements two new kind of eBPF maps: queue and stack. Those maps provide to eBPF programs the peek, push and pop operations, and for userspace applications a new bpf_map_lookup_and_delete_elem() is added. Signed-off-by: Mauricio Vasquez B --- I am sending this as an RFC because there is still an issue I am not sure how to solve. The queue/stack maps have a linked list for saving the nodes, and a preallocation schema based on the pcpu_freelist already implemented and used in the htabmap. Each time an element is pushed into the map, a node from the pcpu_freelist is taken and then added to the linked list. The pop operation takes and *removes* the first node from the linked list, then it uses *call_rcu* to postpose freeing the node, i.e, the node is only returned to the pcpu_freelist when the rcu callback is executed. This is needed because an element returned by the pop() operation should remain valid for the whole duration of the eBPF program. The problem is that elements are not immediately returned to the free list, so in some cases the push operation could fail because there are not free nodes in the pcpu_freelist. The following code snippet exposes that problem. ... /* Push MAP_SIZE elements */ for (i = 0; i < MAP_SIZE; i++) assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0); /* Pop all elements */ for (i = 0; i < MAP_SIZE; i++) assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 && val == vals[i]); // sleep(1) <-- If I put this sleep, everything works. /* Push MAP_SIZE elements */ for (i = 0; i < MAP_SIZE; i++) assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0); ^^^ This fails because there are not available elements in pcpu_freelist ... I think a possible solution is to oversize the pcpu_freelist (no idea by how much, maybe double or, or make it 1.5 time the max elements in the map?) I also have concerns about it, it would waste that memory in many cases and this is also probably that it doesn't solve the issue because that code snippet is puhsing and popping elements too fast, so even if the pcpu_freelist is much large a certain time instant all the elements could be used. Is this really an important issue? Any idea of how to solve it? Thanks, --- Mauricio Vasquez B (4): bpf: add bpf queue and stack maps bpf: restrict use of peek/push/pop Sync uapi/bpf.h to tools/include selftests/bpf: add test cases for queue and stack maps include/linux/bpf.h | 8 include/linux/bpf_types.h | 2 include/uapi/linux/bpf.h | 36 ++ kernel/bpf/Makefile | 2 kernel/bpf/helpers.c | 44 ++ kernel/bpf/queue_stack_maps.c | 353 ++++++++++++++++++++ kernel/bpf/syscall.c | 96 +++++ kernel/bpf/verifier.c | 20 + net/core/filter.c | 6 tools/include/uapi/linux/bpf.h | 36 ++ tools/lib/bpf/bpf.c | 12 + tools/lib/bpf/bpf.h | 1 tools/testing/selftests/bpf/Makefile | 2 tools/testing/selftests/bpf/bpf_helpers.h | 7 tools/testing/selftests/bpf/test_maps.c | 101 ++++++ tools/testing/selftests/bpf/test_progs.c | 99 ++++++ tools/testing/selftests/bpf/test_queue_map.c | 4 tools/testing/selftests/bpf/test_queue_stack_map.h | 59 +++ tools/testing/selftests/bpf/test_stack_map.c | 4 19 files changed, 877 insertions(+), 15 deletions(-) create mode 100644 kernel/bpf/queue_stack_maps.c create mode 100644 tools/testing/selftests/bpf/test_queue_map.c create mode 100644 tools/testing/selftests/bpf/test_queue_stack_map.h create mode 100644 tools/testing/selftests/bpf/test_stack_map.c