* [RFC bpf-next v2 0/4] Introduce BPF_MA_REUSE_AFTER_RCU_GP
@ 2023-04-08 14:18 Hou Tao
  2023-04-08 14:18 ` [RFC bpf-next v2 1/4] selftests/bpf: Add benchmark for bpf memory allocator Hou Tao
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Hou Tao @ 2023-04-08 14:18 UTC
  To: bpf, Martin KaFai Lau, Alexei Starovoitov
  Cc: Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Paul E . McKenney, rcu, houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

As discussed in v1, objects freed to the bpf memory allocator may
currently be reused immediately by a new allocation. This introduces a
use-after-bpf-ma-free problem for non-preallocated hash maps and can
make the lookup procedure return incorrect results. Immediate reuse also
makes it harder to introduce new use cases (e.g. qp-trie).

This patch series introduces BPF_MA_REUSE_AFTER_RCU_GP to solve these
problems. With BPF_MA_REUSE_AFTER_RCU_GP, freed objects are reused only
after one RCU grace period and may be freed by the bpf memory allocator
after another RCU-tasks-trace grace period. So bpf programs which care
about the reuse problem can use bpf_rcu_read_{lock,unlock}() to access
these freed objects safely, and for those which don't care, the
use-after-bpf-ma-free is still safe because these objects have not been
freed by the bpf memory allocator.
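
To make the intended object life cycle concrete, here is a minimal
userspace sketch. It is only a toy model and not the code in the
patches: every toy_* name is hypothetical, a grace period is modeled as
a plain counter increment, and each increment is assumed to stand for a
full grace period that started after all earlier frees.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states for this sketch; not taken from the patches. */
enum toy_state {
	TOY_IN_USE,
	TOY_WAIT_RCU_GP,	/* freed by the program, not yet reusable */
	TOY_REUSABLE,		/* one RCU grace period has elapsed */
	TOY_WAIT_TRACE_GP,	/* queued for release, still reusable */
	TOY_RELEASED,		/* given back by the allocator */
};

/* Assumption: one counter bump == one complete grace period. */
struct toy_ma {
	unsigned long rcu_gp;
	unsigned long trace_gp;
};

struct toy_obj {
	enum toy_state state;
	unsigned long snap;	/* counter value when the wait started */
};

/* The program frees the object: it must wait for an RCU grace period. */
static void toy_free(const struct toy_ma *ma, struct toy_obj *obj)
{
	assert(obj->state == TOY_IN_USE);
	obj->state = TOY_WAIT_RCU_GP;
	obj->snap = ma->rcu_gp;
}

/* Move the object forward if the grace period it waits for has elapsed. */
static void toy_advance(const struct toy_ma *ma, struct toy_obj *obj)
{
	if (obj->state == TOY_WAIT_RCU_GP && ma->rcu_gp > obj->snap)
		obj->state = TOY_REUSABLE;
	else if (obj->state == TOY_WAIT_TRACE_GP && ma->trace_gp > obj->snap)
		obj->state = TOY_RELEASED;
}

/* The allocator decides to give a reusable object back (OOM avoidance). */
static bool toy_queue_release(const struct toy_ma *ma, struct toy_obj *obj)
{
	toy_advance(ma, obj);
	if (obj->state != TOY_REUSABLE)
		return false;
	obj->state = TOY_WAIT_TRACE_GP;
	obj->snap = ma->trace_gp;
	return true;
}

/* A new allocation may take the object only once it is reusable. */
static bool toy_try_reuse(const struct toy_ma *ma, struct toy_obj *obj)
{
	toy_advance(ma, obj);
	if (obj->state != TOY_REUSABLE && obj->state != TOY_WAIT_TRACE_GP)
		return false;
	obj->state = TOY_IN_USE;
	return true;
}
```

In this model toy_try_reuse() fails right after toy_free() and succeeds
once rcu_gp has been incremented. An object queued by
toy_queue_release() can still be reused until trace_gp advances.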

The current implementation is far from perfect, but I think it is ready
for some feedback before I put in more effort. The implementation mainly
focuses on how to speed up the transition from freed elements to
reusable elements and tries to reduce the risk of OOM.

To accelerate the transition, it dynamically allocates an rcu_head and
calls call_rcu() in a kworker to do the transition. The frequency of
call_rcu() invocations could be improved by calling call_rcu() in irq
work, but after doing that, I found the RCU grace period increased a lot
and I still could not figure out why. To reduce the risk of OOM, these
reusable elements need to be freed as well, but we can not dynamically
allocate an rcu_head to do that, because an RCU-tasks-trace grace period
is slower than an RCU grace period. So the freeing of reusable elements
works just like the freeing in the normal bpf memory allocator, with one
difference: for a BPF_MA_REUSE_AFTER_RCU_GP bpf ma, the elements being
freed are still available for reuse in unit_alloc(). Please see the
individual patches for more details.

Comments and suggestions are always welcome.

Change Log:
v2:
 * add a benchmark for the bpf memory allocator to compare different
   flavors of the bpf memory allocator.
 * implement BPF_MA_REUSE_AFTER_RCU_GP for bpf memory allocator.
v1: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/

Hou Tao (4):
  selftests/bpf: Add benchmark for bpf memory allocator
  bpf: Factor out a common helper free_all()
  bpf: Pass bitwise flags to bpf_mem_alloc_init()
  bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP

 include/linux/bpf_mem_alloc.h                 |   9 +-
 kernel/bpf/core.c                             |   2 +-
 kernel/bpf/cpumask.c                          |   2 +-
 kernel/bpf/hashtab.c                          |   5 +-
 kernel/bpf/memalloc.c                         | 390 ++++++++++++++++--
 tools/testing/selftests/bpf/Makefile          |   3 +
 tools/testing/selftests/bpf/bench.c           |   4 +
 .../selftests/bpf/benchs/bench_htab_mem.c     | 273 ++++++++++++
 .../selftests/bpf/progs/htab_mem_bench.c      | 145 +++++++
 9 files changed, 785 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_htab_mem.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_mem_bench.c

-- 
2.29.2



Thread overview: 17+ messages
2023-04-08 14:18 [RFC bpf-next v2 0/4] Introduce BPF_MA_REUSE_AFTER_RCU_GP Hou Tao
2023-04-08 14:18 ` [RFC bpf-next v2 1/4] selftests/bpf: Add benchmark for bpf memory allocator Hou Tao
2023-04-22  2:59   ` Alexei Starovoitov
2023-04-23  1:55     ` Hou Tao
2023-04-27  4:20       ` Alexei Starovoitov
2023-04-27 13:46         ` Paul E. McKenney
2023-04-28  6:13           ` Hou Tao
2023-04-28  2:16         ` Hou Tao
2023-04-23  8:03     ` Hou Tao
2023-04-08 14:18 ` [RFC bpf-next v2 2/4] bpf: Factor out a common helper free_all() Hou Tao
2023-04-08 14:18 ` [RFC bpf-next v2 3/4] bpf: Pass bitwise flags to bpf_mem_alloc_init() Hou Tao
2023-04-08 14:18 ` [RFC bpf-next v2 4/4] bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP Hou Tao
2023-04-22  3:12   ` Alexei Starovoitov
2023-04-23  7:41     ` Hou Tao
2023-04-27  4:24       ` Alexei Starovoitov
2023-04-28  2:24         ` Hou Tao
2023-04-21  6:23 ` [RFC bpf-next v2 0/4] " Hou Tao
