From: Hou Tao <houtao@huaweicloud.com>
To: bpf@vger.kernel.org
Cc: Martin KaFai Lau <martin.lau@linux.dev>,
	Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>,
	Hao Luo <haoluo@google.com>, Yonghong Song <yhs@fb.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Jiri Olsa <jolsa@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	rcu@vger.kernel.org, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc
Date: Fri, 30 Dec 2022 12:11:45 +0800
Message-ID: <20221230041151.1231169-1-houtao@huaweicloud.com>

From: Hou Tao <houtao1@huawei.com>

Hi,

This patchset tries to fix problems found while checking how the htab
map handles element reuse in the bpf memory allocator. Immediate reuse
of freed elements may lead to two problems in the htab map:

(1) reuse will reinitialize special fields (e.g., bpf_spin_lock) in
    the htab map value, and this may corrupt a lookup that uses the
    BPF_F_LOCK flag, which acquires the bpf_spin_lock during value
    copying. Corrupting the bpf_spin_lock may result in a hard lock-up.
(2) a lookup may get an incorrect map value if the found element is
    freed and then reused.

Because all elements of a htab map have the same type, problem #1 can
be fixed by supporting a ctor in the bpf memory allocator. The ctor
initializes the special fields in a map element only when the element
is newly allocated. If the element is merely reused, it is not
reinitialized.
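
The ctor idea above can be illustrated with a small userspace sketch. It
is not the code from this patchset: struct elem, struct toy_alloc and the
toy_alloc_*() helpers are hypothetical names, and an int stands in for
bpf_spin_lock. It only shows the constructor running on the fresh
allocation path and being skipped on the reuse path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element: free-list linkage, a "special" field, a payload. */
struct elem {
	struct elem *next;	/* only meaningful while on the free list */
	int lock;		/* stands in for bpf_spin_lock */
	int value;
};

/* Hypothetical allocator with one free list and an optional ctor. */
struct toy_alloc {
	struct elem *free_list;
	void (*ctor)(struct elem *e);
	unsigned int ctor_calls;	/* bookkeeping for illustration only */
};

/* Initializes the special field; meant to run once per fresh element. */
static void elem_ctor(struct elem *e)
{
	e->lock = 0;
}

static struct elem *toy_alloc_get(struct toy_alloc *ma)
{
	struct elem *e = ma->free_list;

	if (e) {
		/* Reused element: special fields are left untouched. */
		ma->free_list = e->next;
		return e;
	}

	e = malloc(sizeof(*e));
	if (!e)
		return NULL;
	/* Newly allocated element: run the ctor. */
	if (ma->ctor) {
		ma->ctor(e);
		ma->ctor_calls++;
	}
	return e;
}

static void toy_alloc_put(struct toy_alloc *ma, struct elem *e)
{
	assert(e != NULL);
	/* Push onto the free list without touching e->lock. */
	e->next = ma->free_list;
	ma->free_list = e;
}
```

With this shape, a reader that still holds the "lock" of an element that
was freed and handed out again never sees the lock word reset underneath
it, which is the property problem #1 needs.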

Problem #2 exists for both non-preallocated and preallocated htab maps.
Adding a seq to the htab element, checking for reuse and retrying the
lookup may be a feasible solution, but it would make the lookup API
hard to use, because the user would need to check whether the found
element has been reused and repeat the lookup if so. A simpler solution
is to disable reuse of freed elements and free them only after the
lookup procedure ends.
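
To make the cost of the seq-based alternative concrete, here is a
single-threaded userspace sketch of what a caller would have to do. All
names (struct seq_elem, struct toy_htab, toy_lookup(), the race hook) are
hypothetical and not part of this patchset, and real code would also need
memory barriers around the seq reads, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 4

/* Hypothetical element carrying a sequence number bumped on every reuse. */
struct seq_elem {
	bool in_use;
	unsigned int seq;
	int key;
	int value;
};

struct toy_htab {
	struct seq_elem slots[NR_SLOTS];
};

static struct seq_elem *toy_find(struct toy_htab *t, int key)
{
	for (size_t i = 0; i < NR_SLOTS; i++)
		if (t->slots[i].in_use && t->slots[i].key == key)
			return &t->slots[i];
	return NULL;
}

/* Recycle an element for a new key/value and bump its seq. */
static void toy_reuse(struct seq_elem *e, int key, int value)
{
	e->seq++;
	e->in_use = true;
	e->key = key;
	e->value = value;
}

/* Illustration hook: pretend another CPU recycles slot 0 for key 2. */
static void reuse_slot0_for_key2(struct toy_htab *t)
{
	toy_reuse(&t->slots[0], 2, 20);
}

/*
 * Look up key and copy its value to *out. If the element's seq changed
 * while the value was being read, the element was reused, so retry.
 * race, if non-NULL, is called once between find and read to simulate
 * a concurrent delete plus reuse.
 */
static int toy_lookup(struct toy_htab *t, int key, int *out,
		      void (*race)(struct toy_htab *t))
{
	assert(out != NULL);
	for (;;) {
		struct seq_elem *e = toy_find(t, key);
		unsigned int seq;
		int value;

		if (!e)
			return -1;
		seq = e->seq;
		if (race) {
			race(t);
			race = NULL;
		}
		value = e->value;
		if (e->seq == seq) {
			*out = value;
			return 0;
		}
		/* Element was reused under us: start over. */
	}
}
```

Every reader of a map value would need a loop of this kind, which is why
the cover letter prefers disabling reuse instead.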

To reduce the overhead of calling call_rcu_tasks_trace() for each freed
element, elements are freed in batches. Freed elements are first moved
onto a global per-cpu free list. Once the number of freed elements
reaches a threshold, they are moved into a dynamically allocated object
and freed by a global per-cpu worker through call_rcu_tasks_trace().
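
The batching scheme described above might look roughly like the
following userspace sketch. It is an illustration under assumptions, not
the patch itself: FREE_BATCH_THRESHOLD, struct percpu_free, struct
free_batch and batch_free() are hypothetical names, there is no worker
thread, and deferred_free() is a stand-in for call_rcu_tasks_trace() in
which the "grace period" ends immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FREE_BATCH_THRESHOLD 4	/* hypothetical threshold */

struct node {
	struct node *next;
};

/* Dynamically allocated object that carries one batch of freed nodes. */
struct free_batch {
	struct node *head;
	unsigned int cnt;
};

/* Stand-in for the per-cpu free list plus some counters for illustration. */
struct percpu_free {
	struct node *head;
	unsigned int cnt;
	unsigned int deferred_calls;
	unsigned int freed;
};

/* Stand-in for call_rcu_tasks_trace(): frees the whole batch right away. */
static void deferred_free(struct percpu_free *pf, struct free_batch *b)
{
	struct node *n = b->head;

	while (n) {
		struct node *next = n->next;

		free(n);
		pf->freed++;
		n = next;
	}
	free(b);
	pf->deferred_calls++;
}

/* Queue one freed node; hand off a batch once the threshold is reached. */
static int batch_free(struct percpu_free *pf, struct node *n)
{
	struct free_batch *b;

	assert(n != NULL);
	n->next = pf->head;
	pf->head = n;
	if (++pf->cnt < FREE_BATCH_THRESHOLD)
		return 0;

	b = malloc(sizeof(*b));
	if (!b)
		return -1;	/* out-of-memory fallback is not modelled here */

	/* Move the accumulated nodes into the batch and reset the list. */
	b->head = pf->head;
	b->cnt = pf->cnt;
	pf->head = NULL;
	pf->cnt = 0;
	deferred_free(pf, b);
	return 0;
}
```

The point of the design is that one deferred call covers
FREE_BATCH_THRESHOLD elements instead of one, at the price of an extra
allocation per batch.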

Because this solution has to allocate memory in order to free memory,
if no memory is available the global per-cpu worker will instead call
rcu_barrier_tasks_trace() to wait for the RCU grace period to expire
and then free the elements that have been spliced onto a temporary
list. Elements freed in the meantime will be freed after another round
of rcu_barrier_tasks_trace() if there is still no memory. It may be
necessary to reserve some bpf_ma_free_batch objects to speed up freeing.
The patchset also does not yet consider a slow RCU grace period.
Because the newly allocated memory (i.e. bpf_ma_free_batch) is only
freed after the RCU grace period expires, a slow grace period may lead
to too many bpf_ma_free_batch objects being allocated.
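
The out-of-memory fallback described above can be sketched in userspace
as follows. This is only a model under assumptions: drain_free_list(),
struct drain_stats and the injectable batch_alloc_fn are hypothetical,
wait_for_grace_period() is a counter standing in for
rcu_barrier_tasks_trace(), and the deferred path frees immediately where
the real code would go through call_rcu_tasks_trace().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct fnode {
	struct fnode *next;
};

/* Counters used only to observe which path was taken. */
struct drain_stats {
	unsigned int barrier_waits;
	unsigned int deferred_batches;
	unsigned int freed;
};

/* Injectable allocator so that allocation failure can be simulated. */
typedef void *(*batch_alloc_fn)(size_t size);

static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Stand-in for rcu_barrier_tasks_trace(). */
static void wait_for_grace_period(struct drain_stats *st)
{
	st->barrier_waits++;
}

static void free_list_now(struct fnode *head, struct drain_stats *st)
{
	while (head) {
		struct fnode *next = head->next;

		free(head);
		st->freed++;
		head = next;
	}
}

static void drain_free_list(struct fnode **list, batch_alloc_fn alloc,
			    struct drain_stats *st)
{
	struct fnode *tmp;
	struct fnode **batch;

	assert(list != NULL && alloc != NULL && st != NULL);

	/* Splice everything onto a temporary list. */
	tmp = *list;
	*list = NULL;
	if (!tmp)
		return;

	batch = alloc(sizeof(*batch));
	if (batch) {
		/* Normal path: the batch object carries the list. */
		*batch = tmp;
		st->deferred_batches++;
		free_list_now(*batch, st);
		free(batch);
		return;
	}

	/* No memory for a batch: wait synchronously, then free directly. */
	wait_for_grace_period(st);
	free_list_now(tmp, st);
}
```

Elements freed while the worker is waiting simply accumulate on the list
again and are handled by the next call, which matches the "another round"
behaviour mentioned above.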

After applying BPF_MA_NO_REUSE in htab map, the performance of
"./map_perf_test 4 18 8192" drops from 520K to 330K events per sec on
one CPU (about 37%). This is a big performance degradation, so I hope to
get some feedback on whether the change is necessary and how to better
fix the reuse problem in htab map (globally allocated objects may have
the same problem as htab map). Comments are always welcome.

Regards,
Hou

Hou Tao (6):
  bpf: Support ctor in bpf memory allocator
  bpf: Factor out a common helper free_llist()
  bpf: Pass bitwise flags to bpf_mem_alloc_init()
  bpf: Introduce BPF_MA_NO_REUSE for bpf memory allocator
  bpf: Use BPF_MA_NO_REUSE in htab map
  selftests/bpf: Add test case for element reuse in htab map

 include/linux/bpf_mem_alloc.h                 |  12 +-
 kernel/bpf/core.c                             |   2 +-
 kernel/bpf/hashtab.c                          |  17 +-
 kernel/bpf/memalloc.c                         | 218 ++++++++++++++++--
 .../selftests/bpf/prog_tests/htab_reuse.c     | 111 +++++++++
 .../testing/selftests/bpf/progs/htab_reuse.c  |  19 ++
 6 files changed, 353 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_reuse.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_reuse.c

-- 
2.29.2


Thread overview: 35+ messages
2022-12-30  4:11 Hou Tao [this message]
2022-12-30  4:11 ` [RFC PATCH bpf-next 1/6] bpf: Support ctor in bpf memory allocator Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 2/6] bpf: Factor out a common helper free_llist() Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 3/6] bpf: Pass bitwise flags to bpf_mem_alloc_init() Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 4/6] bpf: Introduce BPF_MA_NO_REUSE for bpf memory allocator Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 5/6] bpf: Use BPF_MA_NO_REUSE in htab map Hou Tao
2022-12-30  4:11 ` [RFC PATCH bpf-next 6/6] selftests/bpf: Add test case for element reuse " Hou Tao
2023-01-01  1:26 ` [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc Alexei Starovoitov
2023-01-01 18:48   ` Yonghong Song
2023-01-03 13:47     ` Hou Tao
2023-01-04  6:10       ` Yonghong Song
2023-01-04  6:30         ` Hou Tao
2023-01-04  7:14           ` Yonghong Song
2023-01-04 18:26             ` Alexei Starovoitov
2023-02-10 16:32               ` Kumar Kartikeya Dwivedi
2023-02-10 21:06                 ` Alexei Starovoitov
2023-02-11  1:09                   ` Hou Tao
2023-02-11 16:33                     ` Alexei Starovoitov
2023-02-11 16:34                       ` Alexei Starovoitov
2023-02-15  1:54                         ` Martin KaFai Lau
2023-02-15  4:02                           ` Hou Tao
2023-02-15  7:22                             ` Martin KaFai Lau
2023-02-16  2:11                               ` Hou Tao
2023-02-16  7:47                                 ` Martin KaFai Lau
2023-02-16  8:18                                   ` Hou Tao
2023-02-16 13:55                         ` Hou Tao
2023-02-16 16:35                           ` Alexei Starovoitov
2023-02-17  1:19                             ` Hou Tao
2023-02-22 19:30                               ` Alexei Starovoitov
2023-02-15  2:35                       ` Hou Tao
2023-02-15  2:42                         ` Alexei Starovoitov
2023-02-15  3:00                           ` Hou Tao
2023-01-03 13:40   ` Hou Tao
2023-01-03 19:38     ` Alexei Starovoitov
2023-01-10  6:26       ` Martin KaFai Lau
