All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mauricio Vasquez B <mauricio.vasquez@polito.it>
To: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	netdev@vger.kernel.org
Subject: [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps
Date: Fri, 31 Aug 2018 23:25:48 +0200	[thread overview]
Message-ID: <153575074884.30050.17670029209466860207.stgit@kernel> (raw)

In some applications this is needed have a pool of free elements, like for
example the list of free L4 ports in a SNAT.  None of the current maps allow
to do it as it is not possibleto get an any element without having they key
it is associated to.

This patchset implements two new kind of eBPF maps: queue and stack.
Those maps provide to eBPF programs the peek, push and pop operations, and for
userspace applications a new bpf_map_lookup_and_delete_elem() is added.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>

---

I am sending this as an RFC because there is still an issue I am not sure how
to solve.

The queue/stack maps have a linked list for saving the nodes, and a
preallocation schema based on the pcpu_freelist already implemented and used
in the htabmap.  Each time an element is pushed into the map, a node from the
pcpu_freelist is taken and then added to the linked list.

The pop operation takes and *removes* the first node from the linked list, then
it uses *call_rcu* to postpose freeing the node, i.e, the node is only returned
to the pcpu_freelist when the rcu callback is executed.  This is needed because
an element returned by the pop() operation should remain valid for the whole
duration of the eBPF program.

The problem is that elements are not immediately returned to the free list, so
in some cases the push operation could fail because there are not free nodes
in the pcpu_freelist.

The following code snippet exposes that problem.

...
	/* Push MAP_SIZE elements */
	for (i = 0; i < MAP_SIZE; i++)
		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);

	/* Pop all elements */
	for (i = 0; i < MAP_SIZE; i++)
		assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 &&
		       val == vals[i]);

  // sleep(1) <-- If I put this sleep, everything works.
	/* Push MAP_SIZE elements */
	for (i = 0; i < MAP_SIZE; i++)
		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
           ^^^
           This fails because there are not available elements in pcpu_freelist
...

I think a possible solution is to oversize the pcpu_freelist (no idea by how
much, maybe double or, or make it 1.5 time the max elements in the map?)
I also have concerns about it, it would waste that memory in many cases and
this is also probably that it doesn't solve the issue because that code snippet
is puhsing and popping elements too fast, so even if the pcpu_freelist is much
large a certain time instant all the elements could be used.

Is this really an important issue?
Any idea of how to solve it?

Thanks,
---

Mauricio Vasquez B (4):
      bpf: add bpf queue and stack maps
      bpf: restrict use of peek/push/pop
      Sync uapi/bpf.h to tools/include
      selftests/bpf: add test cases for queue and stack maps


 include/linux/bpf.h                                |    8 
 include/linux/bpf_types.h                          |    2 
 include/uapi/linux/bpf.h                           |   36 ++
 kernel/bpf/Makefile                                |    2 
 kernel/bpf/helpers.c                               |   44 ++
 kernel/bpf/queue_stack_maps.c                      |  353 ++++++++++++++++++++
 kernel/bpf/syscall.c                               |   96 +++++
 kernel/bpf/verifier.c                              |   20 +
 net/core/filter.c                                  |    6 
 tools/include/uapi/linux/bpf.h                     |   36 ++
 tools/lib/bpf/bpf.c                                |   12 +
 tools/lib/bpf/bpf.h                                |    1 
 tools/testing/selftests/bpf/Makefile               |    2 
 tools/testing/selftests/bpf/bpf_helpers.h          |    7 
 tools/testing/selftests/bpf/test_maps.c            |  101 ++++++
 tools/testing/selftests/bpf/test_progs.c           |   99 ++++++
 tools/testing/selftests/bpf/test_queue_map.c       |    4 
 tools/testing/selftests/bpf/test_queue_stack_map.h |   59 +++
 tools/testing/selftests/bpf/test_stack_map.c       |    4 
 19 files changed, 877 insertions(+), 15 deletions(-)
 create mode 100644 kernel/bpf/queue_stack_maps.c
 create mode 100644 tools/testing/selftests/bpf/test_queue_map.c
 create mode 100644 tools/testing/selftests/bpf/test_queue_stack_map.h
 create mode 100644 tools/testing/selftests/bpf/test_stack_map.c

             reply	other threads:[~2018-09-01  1:35 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-31 21:25 Mauricio Vasquez B [this message]
2018-08-31 21:25 ` [RFC PATCH bpf-next v2 1/4] bpf: add bpf queue and stack maps Mauricio Vasquez B
2018-08-31 21:26 ` [RFC PATCH bpf-next v2 2/4] bpf: restrict use of peek/push/pop Mauricio Vasquez B
2018-09-07  0:13 ` [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps Alexei Starovoitov
2018-09-07  6:27   ` Joe Stringer
2018-09-07 20:40   ` Mauricio Vasquez
2018-09-11  1:04 Alexei Starovoitov
2018-09-11 14:48 ` Mauricio Vasquez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=153575074884.30050.17670029209466860207.stgit@kernel \
    --to=mauricio.vasquez@polito.it \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.