* [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps
@ 2018-08-31 21:25 Mauricio Vasquez B
  2018-08-31 21:25 ` [RFC PATCH bpf-next v2 1/4] bpf: add bpf queue and stack maps Mauricio Vasquez B
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Mauricio Vasquez B @ 2018-08-31 21:25 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev

In some applications there is a need for a pool of free elements, for
example the list of free L4 ports in a SNAT.  None of the current maps
allow this, as it is not possible to get an arbitrary element without
knowing the key it is associated with.

This patchset implements two new kinds of eBPF maps: queue and stack.
These maps provide eBPF programs with the peek, push and pop operations, and
for userspace applications a new bpf_map_lookup_and_delete_elem() is added.
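As a rough illustration (not part of the patches), the sketch below shows how a
program might use the proposed helpers for a SNAT-style pool of free ports.  The
helper prototypes follow the uapi comments in patch 1/4; the bpf_map_def layout,
the SEC() macro and the section names are assumptions borrowed from the
selftests conventions.

#include <linux/bpf.h>
#include <linux/pkt_cls.h>

#define SEC(NAME) __attribute__((section(NAME), used))

/* helper prototypes as proposed by this series */
static int (*bpf_map_push_elem)(void *map, const void *value,
				unsigned long long flags) =
	(void *) BPF_FUNC_map_push_elem;
static void *(*bpf_map_pop_elem)(void *map) =
	(void *) BPF_FUNC_map_pop_elem;

struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

/* pool of free L4 ports */
struct bpf_map_def SEC("maps") free_ports = {
	.type		= BPF_MAP_TYPE_QUEUE,
	.key_size	= 0,		/* queue/stack maps take no key */
	.value_size	= sizeof(__u16),
	.max_entries	= 1024,
};

SEC("classifier/snat")
int snat_select_port(struct __sk_buff *skb)
{
	__u16 *port;

	/* take a free port from the pool */
	port = bpf_map_pop_elem(&free_ports);
	if (!port)
		return TC_ACT_SHOT;	/* pool exhausted */

	/* ... rewrite the packet to use *port ... */
	return TC_ACT_OK;
}

SEC("classifier/snat_release")
int snat_release_port(struct __sk_buff *skb)
{
	__u16 port = 1024;	/* hypothetical port recovered from the packet */

	/* return the port to the pool when the mapping is torn down */
	bpf_map_push_elem(&free_ports, &port, 0);
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";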

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>

---

I am sending this as an RFC because there is still an issue I am not sure how
to solve.

The queue/stack maps use a linked list to hold the nodes, plus a
preallocation scheme based on the pcpu_freelist already implemented and used
in the htab map.  Each time an element is pushed into the map, a node is
taken from the pcpu_freelist and then added to the linked list.

The pop operation takes and *removes* the first node from the linked list,
then uses *call_rcu* to postpone freeing the node, i.e., the node is only
returned to the pcpu_freelist when the RCU callback is executed.  This is
needed because an element returned by the pop() operation should remain
valid for the whole duration of the eBPF program.

The problem is that elements are not immediately returned to the free list,
so in some cases the push operation can fail because there are no free nodes
left in the pcpu_freelist.

The following code snippet exposes that problem.

...
	/* Push MAP_SIZE elements */
	for (i = 0; i < MAP_SIZE; i++)
		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);

	/* Pop all elements */
	for (i = 0; i < MAP_SIZE; i++)
		assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 &&
		       val == vals[i]);

  // sleep(1) <-- If I put this sleep, everything works.
	/* Push MAP_SIZE elements */
	for (i = 0; i < MAP_SIZE; i++)
		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
           ^^^
           This fails because there are no available elements in the pcpu_freelist
...

I think a possible solution is to oversize the pcpu_freelist (no idea by how
much, maybe double it, or make it 1.5 times the maximum number of elements in
the map?).  I also have concerns about that: it would waste memory in many
cases, and it probably would not solve the issue anyway, because that code
snippet is pushing and popping elements so fast that, even if the
pcpu_freelist were much larger, at a given instant all the elements could
still be in use.

Is this really an important issue?
Any idea of how to solve it?

Thanks,
---

Mauricio Vasquez B (4):
      bpf: add bpf queue and stack maps
      bpf: restrict use of peek/push/pop
      Sync uapi/bpf.h to tools/include
      selftests/bpf: add test cases for queue and stack maps


 include/linux/bpf.h                                |    8 
 include/linux/bpf_types.h                          |    2 
 include/uapi/linux/bpf.h                           |   36 ++
 kernel/bpf/Makefile                                |    2 
 kernel/bpf/helpers.c                               |   44 ++
 kernel/bpf/queue_stack_maps.c                      |  353 ++++++++++++++++++++
 kernel/bpf/syscall.c                               |   96 +++++
 kernel/bpf/verifier.c                              |   20 +
 net/core/filter.c                                  |    6 
 tools/include/uapi/linux/bpf.h                     |   36 ++
 tools/lib/bpf/bpf.c                                |   12 +
 tools/lib/bpf/bpf.h                                |    1 
 tools/testing/selftests/bpf/Makefile               |    2 
 tools/testing/selftests/bpf/bpf_helpers.h          |    7 
 tools/testing/selftests/bpf/test_maps.c            |  101 ++++++
 tools/testing/selftests/bpf/test_progs.c           |   99 ++++++
 tools/testing/selftests/bpf/test_queue_map.c       |    4 
 tools/testing/selftests/bpf/test_queue_stack_map.h |   59 +++
 tools/testing/selftests/bpf/test_stack_map.c       |    4 
 19 files changed, 877 insertions(+), 15 deletions(-)
 create mode 100644 kernel/bpf/queue_stack_maps.c
 create mode 100644 tools/testing/selftests/bpf/test_queue_map.c
 create mode 100644 tools/testing/selftests/bpf/test_queue_stack_map.h
 create mode 100644 tools/testing/selftests/bpf/test_stack_map.c

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH bpf-next v2 1/4] bpf: add bpf queue and stack maps
  2018-08-31 21:25 [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Mauricio Vasquez B
@ 2018-08-31 21:25 ` Mauricio Vasquez B
  2018-08-31 21:26 ` [RFC PATCH bpf-next v2 2/4] bpf: restrict use of peek/push/pop Mauricio Vasquez B
  2018-09-07  0:13 ` [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Alexei Starovoitov
  2 siblings, 0 replies; 8+ messages in thread
From: Mauricio Vasquez B @ 2018-08-31 21:25 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev

Implement two new kinds of maps that support the peek, push and pop
operations.

A use case for this is to keep track of a pool of elements, like
network ports in a SNAT.
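For reference, a rough user-space sketch (not taken from the selftests) of
creating a queue map and draining it with the new command.  bpf_create_map()
and bpf_map_update_elem() already exist in tools/lib/bpf;
bpf_map_lookup_and_delete_elem() is the wrapper added by this series.
MAP_SIZE and the value type are arbitrary here.

#include <assert.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>

#define MAP_SIZE 32

int main(void)
{
	__u32 vals[MAP_SIZE], val;
	int fd, i;

	/* queue/stack maps take no key, so key_size must be 0 */
	fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 0, sizeof(val), MAP_SIZE, 0);
	assert(fd >= 0);

	/* push: update with a NULL key */
	for (i = 0; i < MAP_SIZE; i++) {
		vals[i] = i;
		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
	}

	/* pop: lookup-and-delete with a NULL key; FIFO order for a queue */
	for (i = 0; i < MAP_SIZE; i++) {
		assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0);
		assert(val == vals[i]);
	}

	return 0;
}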

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
---
 include/linux/bpf.h           |    8 +
 include/linux/bpf_types.h     |    2 
 include/uapi/linux/bpf.h      |   36 ++++
 kernel/bpf/Makefile           |    2 
 kernel/bpf/helpers.c          |   44 +++++
 kernel/bpf/queue_stack_maps.c |  353 +++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c          |   96 +++++++++++
 kernel/bpf/verifier.c         |    6 +
 net/core/filter.c             |    6 +
 9 files changed, 543 insertions(+), 10 deletions(-)
 create mode 100644 kernel/bpf/queue_stack_maps.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 523481a3471b..1d39b9096d9f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -39,6 +39,11 @@ struct bpf_map_ops {
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
 	int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
 	int (*map_delete_elem)(struct bpf_map *map, void *key);
+	void *(*map_lookup_and_delete_elem)(struct bpf_map *map, void *key);
+
+	/* funcs callable from eBPF programs */
+	void *(*map_lookup_or_init_elem)(struct bpf_map *map, void *key,
+					 void *value);
 
 	/* funcs called by prog_array and perf_event_array map */
 	void *(*map_fd_get_ptr)(struct bpf_map *map, struct file *map_file,
@@ -806,6 +811,9 @@ static inline int bpf_fd_reuseport_array_update_elem(struct bpf_map *map,
 extern const struct bpf_func_proto bpf_map_lookup_elem_proto;
 extern const struct bpf_func_proto bpf_map_update_elem_proto;
 extern const struct bpf_func_proto bpf_map_delete_elem_proto;
+extern const struct bpf_func_proto bpf_map_push_elem_proto;
+extern const struct bpf_func_proto bpf_map_pop_elem_proto;
+extern const struct bpf_func_proto bpf_map_peek_elem_proto;
 
 extern const struct bpf_func_proto bpf_get_prandom_u32_proto;
 extern const struct bpf_func_proto bpf_get_smp_processor_id_proto;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index cd26c090e7c0..8d955f11f1cd 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -67,3 +67,5 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, reuseport_array_ops)
 #endif
 #endif
+BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, queue_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 66917a4eba27..0a5b904ba42f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -103,6 +103,7 @@ enum bpf_cmd {
 	BPF_BTF_LOAD,
 	BPF_BTF_GET_FD_BY_ID,
 	BPF_TASK_FD_QUERY,
+	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 };
 
 enum bpf_map_type {
@@ -127,6 +128,8 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_SOCKHASH,
 	BPF_MAP_TYPE_CGROUP_STORAGE,
 	BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
+	BPF_MAP_TYPE_QUEUE,
+	BPF_MAP_TYPE_STACK,
 };
 
 enum bpf_prog_type {
@@ -459,6 +462,28 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
+ * int bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
+ * 	Description
+ * 		Push an element *value* into *map*. *flags* is one of:
+ *
+ * 		**BPF_EXIST**
+ * 		If the queue/stack is full, the oldest element is removed to
+ * 		make room for the new one.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
+ * void *bpf_map_pop_elem(struct bpf_map *map)
+ * 	Description
+ * 		Pop an element from *map*.
+ * 	Return
+ * 		Pointer to the element, or *NULL* if the map is empty.
+ *
+ * void *bpf_map_peek_elem(struct bpf_map *map)
+ * 	Description
+ * 		Return an element from *map* without removing it.
+ * 	Return
+ * 		Pointer to the element, or *NULL* if the map is empty.
+ *
  * int bpf_probe_read(void *dst, u32 size, const void *src)
  * 	Description
  * 		For tracing programs, safely attempt to read *size* bytes from
@@ -786,14 +811,14 @@ union bpf_attr {
  *
  * 			int ret;
  * 			struct bpf_tunnel_key key = {};
- * 			
+ *
  * 			ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
  * 			if (ret < 0)
  * 				return TC_ACT_SHOT;	// drop packet
- * 			
+ *
  * 			if (key.remote_ipv4 != 0x0a000001)
  * 				return TC_ACT_SHOT;	// drop packet
- * 			
+ *
  * 			return TC_ACT_OK;		// accept packet
  *
  * 		This interface can also be used with all encapsulation devices
@@ -2226,7 +2251,10 @@ union bpf_attr {
 	FN(get_current_cgroup_id),	\
 	FN(get_local_storage),		\
 	FN(sk_select_reuseport),	\
-	FN(skb_ancestor_cgroup_id),
+	FN(skb_ancestor_cgroup_id),	\
+	FN(map_push_elem),		\
+	FN(map_pop_elem),		\
+	FN(map_peek_elem),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 0488b8258321..17afae9e65f3 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -3,7 +3,7 @@ obj-y := core.o
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
-obj-$(CONFIG_BPF_SYSCALL) += local_storage.o
+obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_SYSCALL) += btf.o
 ifeq ($(CONFIG_NET),y)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1991466b8327..c1ff567b69f1 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -76,6 +76,50 @@ const struct bpf_func_proto bpf_map_delete_elem_proto = {
 	.arg2_type	= ARG_PTR_TO_MAP_KEY,
 };
 
+BPF_CALL_3(bpf_map_push_elem, struct bpf_map *, map, void *, value, u64, flags)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return map->ops->map_update_elem(map, NULL, value, flags);
+}
+
+const struct bpf_func_proto bpf_map_push_elem_proto = {
+	.func		= bpf_map_push_elem,
+	.gpl_only	= false,
+	.pkt_access	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_MAP_VALUE,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_1(bpf_map_pop_elem, struct bpf_map *, map)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return (unsigned long) map->ops->map_lookup_and_delete_elem(map, NULL);
+}
+
+const struct bpf_func_proto bpf_map_pop_elem_proto = {
+	.func		= bpf_map_pop_elem,
+	.gpl_only	= false,
+	.pkt_access	= true,
+	.ret_type	= RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+};
+
+BPF_CALL_1(bpf_map_peek_elem, struct bpf_map *, map)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return (unsigned long) map->ops->map_lookup_elem(map, NULL);
+}
+
+const struct bpf_func_proto bpf_map_peek_elem_proto = {
+	.func		= bpf_map_peek_elem,
+	.gpl_only	= false,
+	.pkt_access	= true,
+	.ret_type	= RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+};
+
 const struct bpf_func_proto bpf_get_prandom_u32_proto = {
 	.func		= bpf_user_rnd_u32,
 	.gpl_only	= false,
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
new file mode 100644
index 000000000000..97e652bf6df5
--- /dev/null
+++ b/kernel/bpf/queue_stack_maps.c
@@ -0,0 +1,353 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * queue_stack_maps.c: BPF queue and stack maps
+ *
+ * Copyright (c) 2018 Politecnico di Torino
+ */
+#include <linux/bpf.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include "percpu_freelist.h"
+
+#define QUEUE_STACK_CREATE_FLAG_MASK \
+	(BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
+enum queue_type {
+	QUEUE,
+	STACK,
+};
+
+struct bpf_queue {
+	struct bpf_map map;
+	struct list_head head;
+	struct pcpu_freelist freelist;
+	void *nodes;
+	enum queue_type type;
+	raw_spinlock_t lock;
+	atomic_t count;
+	u32 node_size;
+};
+
+struct queue_node {
+	struct pcpu_freelist_node fnode;
+	struct bpf_queue *queue;
+	struct list_head list;
+	struct rcu_head rcu;
+	char element[0] __aligned(8);
+};
+
+static bool queue_map_is_prealloc(struct bpf_queue *queue)
+{
+	return !(queue->map.map_flags & BPF_F_NO_PREALLOC);
+}
+
+/* Called from syscall */
+static int queue_map_alloc_check(union bpf_attr *attr)
+{
+	/* check sanity of attributes */
+	if (attr->max_entries == 0 || attr->key_size != 0 ||
+	    attr->value_size == 0 ||
+	    attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK)
+		return -EINVAL;
+
+	if (attr->value_size > KMALLOC_MAX_SIZE)
+		/* if value_size is bigger, the user space won't be able to
+		 * access the elements.
+		 */
+		return -E2BIG;
+
+	return 0;
+}
+
+static int prealloc_init(struct bpf_queue *queue)
+{
+	u32 node_size = sizeof(struct queue_node) +
+			round_up(queue->map.value_size, 8);
+	u32 num_entries = queue->map.max_entries;
+	int err;
+
+	queue->nodes = bpf_map_area_alloc(node_size * num_entries,
+					  queue->map.numa_node);
+	if (!queue->nodes)
+		return -ENOMEM;
+
+	err = pcpu_freelist_init(&queue->freelist);
+	if (err)
+		goto free_nodes;
+
+	pcpu_freelist_populate(&queue->freelist,
+			       queue->nodes +
+			       offsetof(struct queue_node, fnode),
+			       node_size, num_entries);
+
+	return 0;
+
+free_nodes:
+	bpf_map_area_free(queue->nodes);
+	return err;
+}
+
+static void prealloc_destroy(struct bpf_queue *queue)
+{
+	bpf_map_area_free(queue->nodes);
+	pcpu_freelist_destroy(&queue->freelist);
+}
+
+static struct bpf_map *queue_map_alloc(union bpf_attr *attr)
+{
+	struct bpf_queue *queue;
+	u64 cost = sizeof(*queue);
+	int ret;
+
+	queue = kzalloc(sizeof(*queue), GFP_USER);
+	if (!queue)
+		return ERR_PTR(-ENOMEM);
+
+	bpf_map_init_from_attr(&queue->map, attr);
+
+	queue->node_size = sizeof(struct queue_node) +
+			   round_up(attr->value_size, 8);
+	cost += (u64) attr->max_entries * queue->node_size;
+	if (cost >= U32_MAX - PAGE_SIZE) {
+		ret = -E2BIG;
+		goto free_queue;
+	}
+
+	queue->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+
+	ret = bpf_map_precharge_memlock(queue->map.pages);
+	if (ret)
+		goto free_queue;
+
+	INIT_LIST_HEAD(&queue->head);
+
+	raw_spin_lock_init(&queue->lock);
+
+	if (queue->map.map_type == BPF_MAP_TYPE_QUEUE)
+		queue->type = QUEUE;
+	else if (queue->map.map_type == BPF_MAP_TYPE_STACK)
+		queue->type = STACK;
+
+	if (queue_map_is_prealloc(queue)) {
+		ret = prealloc_init(queue);
+		if (ret)
+			goto free_queue;
+	}
+
+	return &queue->map;
+
+free_queue:
+	kfree(queue);
+	return ERR_PTR(ret);
+}
+
+/* Called when map->refcnt goes to zero, either from workqueue or from syscall */
+static void queue_map_free(struct bpf_map *map)
+{
+	struct bpf_queue *queue = container_of(map, struct bpf_queue, map);
+	struct queue_node *l, *n;
+
+	/* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
+	 * so the programs (can be more than one that used this map) were
+	 * disconnected from events. Wait for outstanding critical sections in
+	 * these programs to complete
+	 */
+	synchronize_rcu();
+
+	/* some of queue_elem_free_rcu() callbacks for elements of this map may
+	 * not have executed. Wait for them.
+	 */
+	rcu_barrier();
+	if (!queue_map_is_prealloc(queue)) {
+		list_for_each_entry_safe(l, n, &queue->head, list) {
+			list_del(&l->list);
+			kfree(l);
+		}
+	} else {
+		prealloc_destroy(queue);
+	}
+	kfree(queue);
+}
+
+static void queue_elem_free_rcu(struct rcu_head *head)
+{
+	struct queue_node *l = container_of(head, struct queue_node, rcu);
+	struct bpf_queue *queue = l->queue;
+
+	/* must increment bpf_prog_active to avoid kprobe+bpf triggering while
+	 * we're calling kfree, otherwise deadlock is possible if kprobes
+	 * are placed somewhere inside of slub
+	 */
+	preempt_disable();
+	__this_cpu_inc(bpf_prog_active);
+	if (queue_map_is_prealloc(queue))
+		pcpu_freelist_push(&queue->freelist, &l->fnode);
+	else
+		kfree(l);
+	__this_cpu_dec(bpf_prog_active);
+	preempt_enable();
+}
+
+static void *__queue_map_lookup(struct bpf_map *map, bool delete)
+{
+	struct bpf_queue *queue = container_of(map, struct bpf_queue, map);
+	unsigned long flags;
+	struct queue_node *node;
+
+	raw_spin_lock_irqsave(&queue->lock, flags);
+
+	if (list_empty(&queue->head)) {
+		raw_spin_unlock_irqrestore(&queue->lock, flags);
+		return NULL;
+	}
+
+	node = list_first_entry(&queue->head, struct queue_node, list);
+
+	if (delete) {
+		if (!queue_map_is_prealloc(queue))
+			atomic_dec(&queue->count);
+
+		list_del(&node->list);
+		call_rcu(&node->rcu, queue_elem_free_rcu);
+	}
+
+	raw_spin_unlock_irqrestore(&queue->lock, flags);
+	return &node->element;
+}
+
+/* Called from syscall or from eBPF program */
+static void *queue_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	return __queue_map_lookup(map, false);
+}
+
+/* Called from syscall or from eBPF program */
+static void *queue_map_lookup_and_delete_elem(struct bpf_map *map, void *key)
+{
+	return __queue_map_lookup(map, true);
+}
+
+static struct queue_node *queue_map_delete_oldest_node(struct bpf_queue *queue)
+{
+	struct queue_node *node = NULL;
+	unsigned long irq_flags;
+
+	raw_spin_lock_irqsave(&queue->lock, irq_flags);
+
+	if (list_empty(&queue->head))
+		goto out;
+
+	switch (queue->type) {
+	case QUEUE:
+		node = list_first_entry(&queue->head, struct queue_node, list);
+		break;
+	case STACK:
+		node = list_last_entry(&queue->head, struct queue_node, list);
+		break;
+	default:
+		goto out;
+	}
+
+	list_del(&node->list);
+out:
+	raw_spin_unlock_irqrestore(&queue->lock, irq_flags);
+	return node;
+}
+
+/* Called from syscall or from eBPF program */
+static int queue_map_update_elem(struct bpf_map *map, void *key, void *value,
+				 u64 flags)
+{
+	struct bpf_queue *queue = container_of(map, struct bpf_queue, map);
+	unsigned long irq_flags;
+	struct queue_node *new;
+	/* BPF_EXIST is used to force making room for a new element in case the
+	 * map is full
+	 */
+	bool replace = (flags & BPF_EXIST);
+
+	/* Check supported flags for queue and stack maps */
+	if (flags & BPF_NOEXIST || flags > BPF_EXIST)
+		return -EINVAL;
+
+again:
+	if (!queue_map_is_prealloc(queue)) {
+		if (atomic_inc_return(&queue->count) > queue->map.max_entries) {
+			atomic_dec(&queue->count);
+			if (!replace)
+				return -E2BIG;
+			new = queue_map_delete_oldest_node(queue);
+			/* It is possible that in the meanwhile the queue/stack
+			 * became empty and there was not an 'oldest' element
+			 * to delete.  In that case, try again
+			 */
+			if (!new)
+				goto again;
+		} else {
+			new = kmalloc_node(queue->node_size,
+					   GFP_ATOMIC | __GFP_NOWARN,
+					   queue->map.numa_node);
+			if (!new) {
+				atomic_dec(&queue->count);
+				return -ENOMEM;
+			}
+		}
+	} else {
+		struct pcpu_freelist_node *l;
+
+		l = pcpu_freelist_pop(&queue->freelist);
+		if (!l) {
+			if (!replace)
+				return -E2BIG;
+			new = queue_map_delete_oldest_node(queue);
+			/* TODO: This should goto again, but that causes an
+			 * infinite loop when elements are not being
+			 * returned to the free list by the
+			 * queue_elem_free_rcu() callback.
+			 */
+			if (!new)
+				return -ENOMEM;
+		} else {
+			new = container_of(l, struct queue_node, fnode);
+		}
+	}
+
+	memcpy(new->element, value, queue->map.value_size);
+	new->queue = queue;
+
+	raw_spin_lock_irqsave(&queue->lock, irq_flags);
+	switch (queue->type) {
+	case QUEUE:
+		list_add_tail(&new->list, &queue->head);
+		break;
+
+	case STACK:
+		list_add(&new->list, &queue->head);
+		break;
+	}
+	raw_spin_unlock_irqrestore(&queue->lock, irq_flags);
+
+	return 0;
+}
+
+/* Called from syscall or from eBPF program */
+static int queue_map_delete_elem(struct bpf_map *map, void *key)
+{
+	return -EINVAL;
+}
+
+/* Called from syscall */
+static int queue_map_get_next_key(struct bpf_map *map, void *key,
+				  void *next_key)
+{
+	return -EINVAL;
+}
+
+const struct bpf_map_ops queue_map_ops = {
+	.map_alloc_check = queue_map_alloc_check,
+	.map_alloc = queue_map_alloc,
+	.map_free = queue_map_free,
+	.map_lookup_elem = queue_map_lookup_elem,
+	.map_lookup_and_delete_elem = queue_map_lookup_and_delete_elem,
+	.map_update_elem = queue_map_update_elem,
+	.map_delete_elem = queue_map_delete_elem,
+	.map_get_next_key = queue_map_get_next_key,
+};
+
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8339d81cba1d..7703795cb4e4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -652,6 +652,17 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 	return -ENOTSUPP;
 }
 
+static void *__bpf_copy_key(void *ukey, u64 key_size)
+{
+	if (key_size)
+		return memdup_user(ukey, key_size);
+
+	if (ukey)
+		return ERR_PTR(-EINVAL);
+
+	return NULL;
+}
+
 /* last field in 'union bpf_attr' used by this command */
 #define BPF_MAP_LOOKUP_ELEM_LAST_FIELD value
 
@@ -679,7 +690,7 @@ static int map_lookup_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	key = memdup_user(ukey, map->key_size);
+	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
 		goto err_put;
@@ -767,7 +778,7 @@ static int map_update_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	key = memdup_user(ukey, map->key_size);
+	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
 		goto err_put;
@@ -865,7 +876,7 @@ static int map_delete_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	key = memdup_user(ukey, map->key_size);
+	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
 		goto err_put;
@@ -917,7 +928,7 @@ static int map_get_next_key(union bpf_attr *attr)
 	}
 
 	if (ukey) {
-		key = memdup_user(ukey, map->key_size);
+		key = __bpf_copy_key(ukey, map->key_size);
 		if (IS_ERR(key)) {
 			err = PTR_ERR(key);
 			goto err_put;
@@ -958,6 +969,80 @@ static int map_get_next_key(union bpf_attr *attr)
 	return err;
 }
 
+#define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD value
+
+static int map_lookup_and_delete_elem(union bpf_attr *attr)
+{
+	void __user *ukey = u64_to_user_ptr(attr->key);
+	void __user *uvalue = u64_to_user_ptr(attr->value);
+	int ufd = attr->map_fd;
+	struct bpf_map *map;
+	void *key, *value, *ptr;
+	u32 value_size;
+	struct fd f;
+	int err;
+
+	if (CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM))
+		return -EINVAL;
+
+	f = fdget(ufd);
+	map = __bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
+	key = __bpf_copy_key(ukey, map->key_size);
+	if (IS_ERR(key)) {
+		err = PTR_ERR(key);
+		goto err_put;
+	}
+
+	value_size = map->value_size;
+
+	err = -ENOMEM;
+	value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
+	if (!value)
+		goto free_key;
+
+	err = -EFAULT;
+	if (copy_from_user(value, uvalue, value_size) != 0)
+		goto free_value;
+
+	/* must increment bpf_prog_active to avoid kprobe+bpf triggering from
+	 * inside bpf map update or delete otherwise deadlocks are possible
+	 */
+	preempt_disable();
+	__this_cpu_inc(bpf_prog_active);
+	rcu_read_lock();
+	ptr = map->ops->map_lookup_and_delete_elem(map, key);
+	if (ptr)
+		memcpy(value, ptr, value_size);
+	rcu_read_unlock();
+	err = ptr ? 0 : -ENOENT;
+	__this_cpu_dec(bpf_prog_active);
+	preempt_enable();
+
+	if (err)
+		goto free_value;
+
+	if (copy_to_user(uvalue, value, value_size) != 0)
+		goto free_value;
+
+	err = 0;
+
+free_value:
+	kfree(value);
+free_key:
+	kfree(key);
+err_put:
+	fdput(f);
+	return err;
+}
+
 static const struct bpf_prog_ops * const bpf_prog_types[] = {
 #define BPF_PROG_TYPE(_id, _name) \
 	[_id] = & _name ## _prog_ops,
@@ -2418,6 +2503,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_TASK_FD_QUERY:
 		err = bpf_task_fd_query(&attr, uattr);
 		break;
+	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
+		err = map_lookup_and_delete_elem(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ca90679a7fe5..5bd67feb2f07 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2035,6 +2035,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			verbose(env, "invalid map_ptr to access map->key\n");
 			return -EACCES;
 		}
+
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->key_size, false,
 					      NULL);
@@ -2454,7 +2455,10 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 	if (func_id != BPF_FUNC_tail_call &&
 	    func_id != BPF_FUNC_map_lookup_elem &&
 	    func_id != BPF_FUNC_map_update_elem &&
-	    func_id != BPF_FUNC_map_delete_elem)
+	    func_id != BPF_FUNC_map_delete_elem &&
+	    func_id != BPF_FUNC_map_push_elem &&
+	    func_id != BPF_FUNC_map_pop_elem &&
+	    func_id != BPF_FUNC_map_peek_elem)
 		return 0;
 
 	if (meta->map_ptr == NULL) {
diff --git a/net/core/filter.c b/net/core/filter.c
index fd423ce3da34..e7860914ffc7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4830,6 +4830,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_map_update_elem_proto;
 	case BPF_FUNC_map_delete_elem:
 		return &bpf_map_delete_elem_proto;
+	case BPF_FUNC_map_push_elem:
+		return &bpf_map_push_elem_proto;
+	case BPF_FUNC_map_pop_elem:
+		return &bpf_map_pop_elem_proto;
+	case BPF_FUNC_map_peek_elem:
+		return &bpf_map_peek_elem_proto;
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_get_smp_processor_id:

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH bpf-next v2 2/4] bpf: restrict use of peek/push/pop
  2018-08-31 21:25 [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Mauricio Vasquez B
  2018-08-31 21:25 ` [RFC PATCH bpf-next v2 1/4] bpf: add bpf queue and stack maps Mauricio Vasquez B
@ 2018-08-31 21:26 ` Mauricio Vasquez B
  2018-09-07  0:13 ` [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Alexei Starovoitov
  2 siblings, 0 replies; 8+ messages in thread
From: Mauricio Vasquez B @ 2018-08-31 21:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev

Restrict the use of peek, push and pop helpers only to queue and stack
maps.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
---
 kernel/bpf/verifier.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5bd67feb2f07..9e177ff4a3b9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2172,6 +2172,13 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (func_id != BPF_FUNC_sk_select_reuseport)
 			goto error;
 		break;
+	case BPF_MAP_TYPE_QUEUE:
+	case BPF_MAP_TYPE_STACK:
+		if (func_id != BPF_FUNC_map_peek_elem &&
+		    func_id != BPF_FUNC_map_pop_elem &&
+		    func_id != BPF_FUNC_map_push_elem)
+			goto error;
+		break;
 	default:
 		break;
 	}
@@ -2227,6 +2234,13 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY)
 			goto error;
 		break;
+	case BPF_FUNC_map_peek_elem:
+	case BPF_FUNC_map_pop_elem:
+	case BPF_FUNC_map_push_elem:
+		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
+		    map->map_type != BPF_MAP_TYPE_STACK)
+			goto error;
+		break;
 	default:
 		break;
 	}

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps
  2018-08-31 21:25 [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Mauricio Vasquez B
  2018-08-31 21:25 ` [RFC PATCH bpf-next v2 1/4] bpf: add bpf queue and stack maps Mauricio Vasquez B
  2018-08-31 21:26 ` [RFC PATCH bpf-next v2 2/4] bpf: restrict use of peek/push/pop Mauricio Vasquez B
@ 2018-09-07  0:13 ` Alexei Starovoitov
  2018-09-07  6:27   ` Joe Stringer
  2018-09-07 20:40   ` Mauricio Vasquez
  2 siblings, 2 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2018-09-07  0:13 UTC (permalink / raw)
  To: Mauricio Vasquez B; +Cc: Alexei Starovoitov, Daniel Borkmann, netdev, joe

On Fri, Aug 31, 2018 at 11:25:48PM +0200, Mauricio Vasquez B wrote:
> In some applications there is a need for a pool of free elements, for
> example the list of free L4 ports in a SNAT.  None of the current maps
> allow this, as it is not possible to get an arbitrary element without
> knowing the key it is associated with.
>
> This patchset implements two new kinds of eBPF maps: queue and stack.
> These maps provide eBPF programs with the peek, push and pop operations, and
> for userspace applications a new bpf_map_lookup_and_delete_elem() is added.
> 
> Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
> 
> ---
> 
> I am sending this as an RFC because there is still an issue I am not sure how
> to solve.
> 
> The queue/stack maps use a linked list to hold the nodes, plus a
> preallocation scheme based on the pcpu_freelist already implemented and used
> in the htab map.  Each time an element is pushed into the map, a node is
> taken from the pcpu_freelist and then added to the linked list.
>
> The pop operation takes and *removes* the first node from the linked list,
> then uses *call_rcu* to postpone freeing the node, i.e., the node is only
> returned to the pcpu_freelist when the RCU callback is executed.  This is
> needed because an element returned by the pop() operation should remain
> valid for the whole duration of the eBPF program.
>
> The problem is that elements are not immediately returned to the free list,
> so in some cases the push operation can fail because there are no free nodes
> left in the pcpu_freelist.
> 
> The following code snippet exposes that problem.
> 
> ...
> 	/* Push MAP_SIZE elements */
> 	for (i = 0; i < MAP_SIZE; i++)
> 		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
> 
> 	/* Pop all elements */
> 	for (i = 0; i < MAP_SIZE; i++)
> 		assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 &&
> 		       val == vals[i]);
> 
>   // sleep(1) <-- If I put this sleep, everything works.
> 	/* Push MAP_SIZE elements */
> 	for (i = 0; i < MAP_SIZE; i++)
> 		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
>            ^^^
>            This fails because there are no available elements in the pcpu_freelist
> ...
> 
> I think a possible solution is to oversize the pcpu_freelist (no idea by how
> much, maybe double it, or make it 1.5 times the maximum number of elements in
> the map?).  I also have concerns about that: it would waste memory in many
> cases, and it probably would not solve the issue anyway, because that code
> snippet is pushing and popping elements so fast that, even if the
> pcpu_freelist were much larger, at a given instant all the elements could
> still be in use.
> 
> Is this really an important issue?
> Any idea of how to solve it?

It is an important issue indeed, and a difficult one to solve.
We have the same issue with the hash map.
If the prog is doing:
value = lookup(key);
delete(key);
// here the prog shouldn't be accessing the value anymore, since the memory
// could have been reused, but value pointer is still valid and points to
// allocated memory

bpf_map_pop_elem() is trying to do lookup_and_delete and preserve
validity of value without races.
With pcpu_freelist I don't think there is a solution.
We can have this queue/stack map without prealloc and use kmalloc/kfree
back and forth. Performance will not be as great, but for your use case,
I suspect, it will be good enough.
The key issue with kmalloc/kfree is unbounded time of rcu callbacks.
If somebody starts doing push/pop for every packet, the rcu subsystem
will struggle and nothing we can do about it.

The only way I could think of to resolve this problem is to reuse
the logic that Joe is working on for socket lookups inside the program.
Joe,
how is that going? Could you repost the latest patches?

In such case the api for stack map will look like:

elem = bpf_map_pop_elem(stack);
// access elem
bpf_map_free_elem(elem);
// here prog is not allowed to access elem and verifier will catch that

elem = bpf_map_alloc_elem(stack);
// populate elem
bpf_map_push_elem(elem);
// here prog is not allowed to access elem and verifier will catch that

Then both pre-allocated elems and kmalloc/kfree will work fine
and no unbounded rcu issues in both cases.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps
  2018-09-07  0:13 ` [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Alexei Starovoitov
@ 2018-09-07  6:27   ` Joe Stringer
  2018-09-07 20:40   ` Mauricio Vasquez
  1 sibling, 0 replies; 8+ messages in thread
From: Joe Stringer @ 2018-09-07  6:27 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: mauricio.vasquez, ast, daniel, netdev, Joe Stringer

On Thu, 6 Sep 2018 at 17:13, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> bpf_map_pop_elem() is trying to do lookup_and_delete and preserve
> validity of value without races.
> With pcpu_freelist I don't think there is a solution.
> We can have this queue/stack map without prealloc and use kmalloc/kfree
> back and forth. Performance will not be as great, but for your use case,
> I suspect, it will be good enough.
> The key issue with kmalloc/kfree is unbounded time of rcu callbacks.
> If somebody starts doing push/pop for every packet, the rcu subsystem
> will struggle and nothing we can do about it.
>
> The only way I could think of to resolve this problem is to reuse
> the logic that Joe is working on for socket lookups inside the program.
> Joe,
> how is that going? Could you repost the latest patches?

I can rebase & send them out. Was just wanting to get a little more testing in.

Cheers,
Joe

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps
  2018-09-07  0:13 ` [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Alexei Starovoitov
  2018-09-07  6:27   ` Joe Stringer
@ 2018-09-07 20:40   ` Mauricio Vasquez
  1 sibling, 0 replies; 8+ messages in thread
From: Mauricio Vasquez @ 2018-09-07 20:40 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Alexei Starovoitov, Daniel Borkmann, netdev, joe



On 09/06/2018 07:13 PM, Alexei Starovoitov wrote:
> On Fri, Aug 31, 2018 at 11:25:48PM +0200, Mauricio Vasquez B wrote:
>> In some applications there is a need for a pool of free elements, for
>> example the list of free L4 ports in a SNAT.  None of the current maps
>> allow this, as it is not possible to get an arbitrary element without
>> knowing the key it is associated with.
>>
>> This patchset implements two new kinds of eBPF maps: queue and stack.
>> These maps provide eBPF programs with the peek, push and pop operations, and
>> for userspace applications a new bpf_map_lookup_and_delete_elem() is added.
>>
>> Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
>>
>> ---
>>
>> I am sending this as an RFC because there is still an issue I am not sure how
>> to solve.
>>
>> The queue/stack maps use a linked list to hold the nodes, plus a
>> preallocation scheme based on the pcpu_freelist already implemented and used
>> in the htab map.  Each time an element is pushed into the map, a node is
>> taken from the pcpu_freelist and then added to the linked list.
>>
>> The pop operation takes and *removes* the first node from the linked list,
>> then uses *call_rcu* to postpone freeing the node, i.e., the node is only
>> returned to the pcpu_freelist when the RCU callback is executed.  This is
>> needed because an element returned by the pop() operation should remain
>> valid for the whole duration of the eBPF program.
>>
>> The problem is that elements are not immediately returned to the free list,
>> so in some cases the push operation can fail because there are no free nodes
>> left in the pcpu_freelist.
>>
>> The following code snippet exposes that problem.
>>
>> ...
>> 	/* Push MAP_SIZE elements */
>> 	for (i = 0; i < MAP_SIZE; i++)
>> 		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
>>
>> 	/* Pop all elements */
>> 	for (i = 0; i < MAP_SIZE; i++)
>> 		assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 &&
>> 		       val == vals[i]);
>>
>>    // sleep(1) <-- If I put this sleep, everything works.
>> 	/* Push MAP_SIZE elements */
>> 	for (i = 0; i < MAP_SIZE; i++)
>> 		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
>>             ^^^
>>             This fails because there are no available elements in the pcpu_freelist
>> ...
>>
>> I think a possible solution is to oversize the pcpu_freelist (no idea by how
>> much, maybe double it, or make it 1.5 times the maximum number of elements in
>> the map?).  I also have concerns about that: it would waste memory in many
>> cases, and it probably would not solve the issue anyway, because that code
>> snippet is pushing and popping elements so fast that, even if the
>> pcpu_freelist were much larger, at a given instant all the elements could
>> still be in use.
>>
>> Is this really an important issue?
>> Any idea of how to solve it?
> It is an important issue indeed, and a difficult one to solve.
> We have the same issue with the hash map.
> If the prog is doing:
> value = lookup(key);
> delete(key);
> // here the prog shouldn't be accessing the value anymore, since the memory
> // could have been reused, but value pointer is still valid and points to
> // allocated memory
Just to note that for the queue map it is a little worse, because there is
no way to mark an element for reuse, so in some cases the pool of free
elements can be exhausted.
> bpf_map_pop_elem() is trying to do lookup_and_delete and preserve
> validity of value without races.
> With pcpu_freelist I don't think there is a solution.
> We can have this queue/stack map without prealloc and use kmalloc/kfree
> back and forth. Performance will not be as great, but for your use case,
> I suspect, it will be good enough.
I agree; for our use case we are not that worried about performance. It is
still in the data plane, but let's say it is not in the "hot" path.
> The key issue with kmalloc/kfree is unbounded time of rcu callbacks.
> If somebody starts doing push/pop for every packet, the rcu subsystem
> will struggle and nothing we can do about it.
>
> The only way I could think of to resolve this problem is to reuse
> the logic that Joe is working on for socket lookups inside the program.
> Joe,
> how is that going? Could you repost the latest patches?
>
> In such case the api for stack map will look like:
>
> elem = bpf_map_pop_elem(stack);
> // access elem
> bpf_map_free_elem(elem);
> // here prog is not allowed to access elem and verifier will catch that
>
> elem = bpf_map_alloc_elem(stack);
> // populate elem
> bpf_map_push_elem(elem);
> // here prog is not allowed to access elem and verifier will catch that
>
> Then both pre-allocated elems and kmalloc/kfree will work fine
> and no unbounded rcu issues in both cases.
>
>

I read Joe's proposal, and using it for this problem looks like a nice
solution.

I think a good trade-off for now would be to go ahead with a queue/stack
map without preallocation support (or maybe include it, always keeping in
mind that this issue has to be solved in the near future) and then, as
separate work, try to use Joe's proposal in the map helpers.

What do you think?

Thanks,
Mauricio.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps
  2018-09-11  1:04 Alexei Starovoitov
@ 2018-09-11 14:48 ` Mauricio Vasquez
  0 siblings, 0 replies; 8+ messages in thread
From: Mauricio Vasquez @ 2018-09-11 14:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Network Development, Joe Stringer



On 09/10/2018 08:04 PM, Alexei Starovoitov wrote:
> On Fri, Sep 7, 2018 at 1:40 PM, Mauricio Vasquez
> <mauricio.vasquez@polito.it> wrote:
>> I read Joe's proposal, and using it for this problem looks like a nice
>> solution.
>>
>> I think a good trade-off for now would be to go ahead with a queue/stack
>> map without preallocation support (or maybe include it, always keeping in
>> mind that this issue has to be solved in the near future) and then, as
>> separate work, try to use Joe's proposal in the map helpers.
>>
>> What do you think?
> the problem with such an approach is that it would mean the
> non-prealloc stack/queue API will be different from the future one
> once the verifier gets smarter.
> The alternative would be to support a by-value API only,
> meaning let stack/queue support value_size of 1, 2, 4 or 8 bytes only.
> Then the bpf_push|pop_elem() helpers will be by-value
> instead of returning a pointer.
> No RCU callback issues, and the implementation on the kernel
> side can be much simpler.
> I think a simple array of value_size elements with head/tail indices
> will be enough.
> Once we have Joe's verifier improvements
> we can add an alloc/free bpf object management facility
> and use an 8-byte stack/queue map as a stack of pointers.
> I think decoupling the memory operations (alloc/free) from
> stack push/pop would be an additional benefit of such a design.
>
I agree, this satisfies our requirements and avoids the RCU issues.
Will spin a v3 implementing it this week.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps
@ 2018-09-11  1:04 Alexei Starovoitov
  2018-09-11 14:48 ` Mauricio Vasquez
  0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2018-09-11  1:04 UTC (permalink / raw)
  To: Mauricio Vasquez
  Cc: Alexei Starovoitov, Daniel Borkmann, Network Development, Joe Stringer

On Fri, Sep 7, 2018 at 1:40 PM, Mauricio Vasquez
<mauricio.vasquez@polito.it> wrote:
>
> I read Joe's proposal, and using it for this problem looks like a nice
> solution.
>
> I think a good trade-off for now would be to go ahead with a queue/stack
> map without preallocation support (or maybe include it, always keeping in
> mind that this issue has to be solved in the near future) and then, as
> separate work, try to use Joe's proposal in the map helpers.
>
> What do you think?

the problem with such an approach is that it would mean the
non-prealloc stack/queue API will be different from the future one
once the verifier gets smarter.
The alternative would be to support a by-value API only,
meaning let stack/queue support value_size of 1, 2, 4 or 8 bytes only.
Then the bpf_push|pop_elem() helpers will be by-value
instead of returning a pointer.
No RCU callback issues, and the implementation on the kernel
side can be much simpler.
I think a simple array of value_size elements with head/tail indices
will be enough.
Once we have Joe's verifier improvements
we can add an alloc/free bpf object management facility
and use an 8-byte stack/queue map as a stack of pointers.
I think decoupling the memory operations (alloc/free) from
stack push/pop would be an additional benefit of such a design.
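For illustration only (this code is not from the thread), the head/tail index
scheme described above could look roughly like the plain C sketch below; the
real map would of course live in the kernel and take the map lock around both
operations, and the names here are made up.

#include <stdbool.h>
#include <string.h>

#define QUEUE_CAP  64	/* power of two, so the index math stays valid */
#define VALUE_SIZE 8	/* by-value: 1, 2, 4 or 8 bytes */

struct byval_queue {
	unsigned int head;	/* next slot to pop */
	unsigned int tail;	/* next slot to push */
	char elems[QUEUE_CAP][VALUE_SIZE];
};

static bool queue_push(struct byval_queue *q, const void *value)
{
	if (q->tail - q->head == QUEUE_CAP)
		return false;				/* full */
	memcpy(q->elems[q->tail % QUEUE_CAP], value, VALUE_SIZE);
	q->tail++;
	return true;
}

static bool queue_pop(struct byval_queue *q, void *value)
{
	if (q->tail == q->head)
		return false;				/* empty */
	memcpy(value, q->elems[q->head % QUEUE_CAP], VALUE_SIZE);
	q->head++;
	return true;
}

A stack variant would pop from the tail side instead; either way no pointer
into map memory escapes to the program, which is what removes the RCU
concerns discussed above.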

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-09-11 19:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-31 21:25 [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Mauricio Vasquez B
2018-08-31 21:25 ` [RFC PATCH bpf-next v2 1/4] bpf: add bpf queue and stack maps Mauricio Vasquez B
2018-08-31 21:26 ` [RFC PATCH bpf-next v2 2/4] bpf: restrict use of peek/push/pop Mauricio Vasquez B
2018-09-07  0:13 ` [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Alexei Starovoitov
2018-09-07  6:27   ` Joe Stringer
2018-09-07 20:40   ` Mauricio Vasquez
2018-09-11  1:04 Alexei Starovoitov
2018-09-11 14:48 ` Mauricio Vasquez
