bpf.vger.kernel.org archive mirror
From: Hou Tao <houtao@huaweicloud.com>
To: bpf@vger.kernel.org, Martin KaFai Lau <martin.lau@linux.dev>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>,
	Hao Luo <haoluo@google.com>, Yonghong Song <yhs@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Jiri Olsa <jolsa@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	rcu@vger.kernel.org, houtao1@huawei.com
Subject: [RFC PATCH bpf-next v4 0/3] Handle immediate reuse in bpf memory allocator
Date: Tue,  6 Jun 2023 11:53:07 +0800	[thread overview]
Message-ID: <20230606035310.4026145-1-houtao@huaweicloud.com> (raw)

From: Hou Tao <houtao1@huawei.com>

Hi,

The implementation of v4 is mainly based on suggestions from Alexei [0].
There are still pending problems in the current implementation, as shown
by the benchmark results in patch #3, but a long time has passed since
the posting of v3, so I am posting v4 here for further discussion and
more suggestions.

The first problem is the huge memory usage compared with the bpf memory
allocator that does immediate reuse:

htab-mem-benchmark (reuse-after-RCU-GP):
| name               | loop (k/s)| average memory (MiB)| peak memory (MiB)|
| --                 | --        | --                  | --               |
| no_op              | 1159.18   | 0.99                | 0.99             |
| overwrite          | 11.00     | 2288                | 4109             |
| batch_add_batch_del| 8.86      | 1558                | 2763             |
| add_del_on_diff_cpu| 4.74      | 11.39               | 14.77            |

htab-mem-benchmark (immediate-reuse):
| name               | loop (k/s)| average memory (MiB)| peak memory (MiB)|
| --                 | --        | --                  | --               |
| no_op              | 1160.66   | 0.99                | 1.00             |
| overwrite          | 28.52     | 2.46                | 2.73             |
| batch_add_batch_del| 11.50     | 2.69                | 2.95             |
| add_del_on_diff_cpu| 3.75      | 15.85               | 24.24            |

The direct cause seems to be the slow RCU grace period. During the
benchmark, the elapsed time before the reuse_rcu() callback is invoked
is about 100ms or even more (e.g., 2 seconds). I suspect that the global
per-bpf-ma spin-lock and the irq-work running in the context of the
freeing process increase the overhead of running the bpf program: the
running time of getpgid() increases, context switching slows down, and
the RCU grace period grows [1]. I am still digging into it.

Another problem is the performance degradation compared with immediate
reuse; the output of perf report shows that the per-bpf-ma spin-lock is
the top hotspot:

map_perf_test (reuse-after-RCU-GP)
0:hash_map_perf kmalloc 194677 events per sec

map_perf_test (immediate reuse)
2:hash_map_perf kmalloc 384527 events per sec

The purpose of introducing the per-bpf-ma reusable list is to handle
the case in which allocation and free happen on different CPUs (e.g.,
add_del_on_diff_cpu), while a per-cpu reuse list is enough for the
overwrite and batch_add_batch_del cases. So maybe we could implement a
hybrid of a global reusable list and per-cpu reusable lists, and switch
between the two kinds of list according to the history of allocation
and free frequency.

As usual, suggestions and comments are always welcome.

[0]: https://lore.kernel.org/bpf/20230503184841.6mmvdusr3rxiabmu@MacBook-Pro-6.local
[1]: https://lore.kernel.org/bpf/1b64fc4e-d92e-de2f-4895-2e0c36427425@huaweicloud.com

Change Log:
v4:
 * no kworker (Alexei)
 * Use a global reusable list in bpf memory allocator (Alexei)
 * Remove the BPF_MA_FREE_AFTER_RCU_GP flag and do reuse-after-rcu-gp
   by default in the bpf memory allocator (Alexei)
 * Add benchmark results from map_perf_test (Alexei)

v3: https://lore.kernel.org/bpf/20230429101215.111262-1-houtao@huaweicloud.com/
 * add the BPF_MA_FREE_AFTER_RCU_GP flag to the bpf memory allocator
 * Update htab memory benchmark
   * move the benchmark patch to the last patch
   * remove array and useless bpf_map_lookup_elem(&array, ...) in bpf
     programs
   * add synchronization between addition CPU and deletion CPU for
     add_del_on_diff_cpu case to prevent unnecessary loop
   * add the benchmark result for "extra call_rcu + bpf ma"

v2: https://lore.kernel.org/bpf/20230408141846.1878768-1-houtao@huaweicloud.com/
 * add a benchmark for bpf memory allocator to compare between different
   flavor of bpf memory allocator.
 * implement BPF_MA_REUSE_AFTER_RCU_GP for bpf memory allocator.

v1: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/

Hou Tao (3):
  bpf: Factor out a common helper free_all()
  selftests/bpf: Add benchmark for bpf memory allocator
  bpf: Only reuse after one RCU GP in bpf memory allocator

 include/linux/bpf_mem_alloc.h                 |   4 +
 kernel/bpf/memalloc.c                         | 385 ++++++++++++------
 tools/testing/selftests/bpf/Makefile          |   3 +
 tools/testing/selftests/bpf/bench.c           |   4 +
 .../selftests/bpf/benchs/bench_htab_mem.c     | 352 ++++++++++++++++
 .../bpf/benchs/run_bench_htab_mem.sh          |  42 ++
 .../selftests/bpf/progs/htab_mem_bench.c      | 135 ++++++
 7 files changed, 809 insertions(+), 116 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_htab_mem.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh
 create mode 100644 tools/testing/selftests/bpf/progs/htab_mem_bench.c

-- 
2.29.2


Thread overview: 32+ messages
2023-06-06  3:53 Hou Tao [this message]
2023-06-06  3:53 ` [RFC PATCH bpf-next v4 1/3] bpf: Factor out a common helper free_all() Hou Tao
2023-06-06  3:53 ` [RFC PATCH bpf-next v4 2/3] selftests/bpf: Add benchmark for bpf memory allocator Hou Tao
2023-06-06 21:13   ` Alexei Starovoitov
2023-06-07  1:32     ` Hou Tao
2023-06-06  3:53 ` [RFC PATCH bpf-next v4 3/3] bpf: Only reuse after one RCU GP in " Hou Tao
2023-06-06 12:30 ` [RFC PATCH bpf-next v4 0/3] Handle immediate reuse " Hou Tao
2023-06-06 21:04   ` Alexei Starovoitov
2023-06-07  1:19     ` Hou Tao
2023-06-07  1:39       ` Alexei Starovoitov
2023-06-07  7:56         ` Hou Tao
2023-06-07  8:42     ` Hou Tao
2023-06-07 17:52       ` Alexei Starovoitov
2023-06-07 20:50         ` Alexei Starovoitov
2023-06-07 23:23           ` Alexei Starovoitov
2023-06-07 23:30             ` Paul E. McKenney
2023-06-07 23:50               ` Alexei Starovoitov
2023-06-08  0:13                 ` Paul E. McKenney
2023-06-08  0:34                   ` Alexei Starovoitov
2023-06-08  1:15                     ` Paul E. McKenney
2023-06-08  3:35                     ` Hou Tao
2023-06-08  4:30                       ` Hou Tao
2023-06-08  4:30                       ` Alexei Starovoitov
2023-06-08  1:57             ` Hou Tao
2023-06-08  1:51           ` Hou Tao
2023-06-08  2:55             ` Paul E. McKenney
2023-06-08  3:43               ` Hou Tao
2023-06-08 16:18                 ` Paul E. McKenney
2023-06-09  3:02                   ` Hou Tao
2023-06-09 14:30                     ` Paul E. McKenney
2023-06-12  2:03                       ` Hou Tao
2023-06-12  3:40                         ` Paul E. McKenney
