[RFC bpf-next v3 0/6] Handle immediate reuse in bpf memory allocator

* [RFC bpf-next v3 0/6] Handle immediate reuse in bpf memory allocator
@ 2023-04-29 10:12 Hou Tao
  2023-04-29 10:12 ` [RFC bpf-next v3 1/6] bpf: Factor out a common helper free_all() Hou Tao
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Hou Tao @ 2023-04-29 10:12 UTC (permalink / raw)
  To: bpf, Martin KaFai Lau, Alexei Starovoitov
  Cc: Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Paul E . McKenney, rcu, houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

As discussed in v1, currently the freed objects in bpf memory allocator
may be reused immediately by the new allocation, it introduces
use-after-bpf-ma-free problem for non-preallocated hash map and makes
lookup procedure return incorrect result. The immediate reuse also makes
introducing new use case more difficult (e.g. qp-trie).

The patch series tries to solve these problems by introducing
BPF_MA_{REUSE|FREE}_AFTER_RCU_GP in bpf memory allocator. For
REUSE_AFTER_GP, the freed objects are reused only after one RCU grace
period and may be freed by bpf memory allocator after another
RCU-tasks-trace grace period. So for bpf programs which care about reuse
problem, these programs can use bpf_rcu_read_{lock,unlock}() to access
these objects safely and for those which doesn't care, there will be
safely use-after-bpf-ma-free because these objects have not been freed
by bpf memory allocator. FREE_AFTER_GP behavior differently. Instead of
making the freed elements being reusable after one RCU GP, it directly
freed these elements back to slab after one RCU GP, so sleepable bpf
program must use bpf_rcu_read_{lock,unlock}() to access elements
allocated from FREE_AFTER_GP bpf memory allocator.

Personally I prefer FREE_AFTER_RCU_GP because its implementation is much
simpler compared with REUSE_AFTER_RCU and its memory usage is also better
than REUSE_AFTER_GP. But its shortcoming is also obvious, so I want to get
some feedback before putting in more effort. As usual, comments and
suggestions are always welcome.

Change Log:
v3:
 * add BPF_MA_FREE_AFTER_RCU_GP bpf memory allocator
 * Update htab memory benchmark
   * move the benchmark patch to the last patch
   * remove array and useless bpf_map_lookup_elem(&array, ...) in bpf
     programs
   * add synchronization between addition CPU and deletion CPU for
     add_del_on_diff_cpu case to prevent unnecessary loop
   * add the benchmark result for "extra call_rcu + bpf ma"

v2: https://lore.kernel.org/bpf/20230408141846.1878768-1-houtao@huaweicloud.com/
 * add a benchmark for bpf memory allocator to compare between different
   flavor of bpf memory allocator.
 * implement BPF_MA_REUSE_AFTER_RCU_GP for bpf memory allocator.
v1: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/

Hou Tao (6):
  bpf: Factor out a common helper free_all()
  bpf: Pass bitwise flags to bpf_mem_alloc_init()
  bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP
  bpf: Introduce BPF_MA_FREE_AFTER_RCU_GP
  bpf: Add two module parameters in htab for memory benchmark
  selftests/bpf: Add benchmark for bpf memory allocator

 include/linux/bpf_mem_alloc.h                 |  10 +-
 kernel/bpf/core.c                             |   2 +-
 kernel/bpf/cpumask.c                          |   2 +-
 kernel/bpf/hashtab.c                          |  43 +-
 kernel/bpf/memalloc.c                         | 529 ++++++++++++++++--
 tools/testing/selftests/bpf/Makefile          |   3 +
 tools/testing/selftests/bpf/bench.c           |   4 +
 .../selftests/bpf/benchs/bench_htab_mem.c     | 352 ++++++++++++
 .../bpf/benchs/run_bench_htab_mem.sh          |  64 +++
 .../selftests/bpf/progs/htab_mem_bench.c      | 135 +++++
 10 files changed, 1090 insertions(+), 54 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_htab_mem.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh
 create mode 100644 tools/testing/selftests/bpf/progs/htab_mem_bench.c

-- 
2.29.2

^ permalink raw reply	[flat|nested] 20+ messages in thread