From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: daniel@iogearbox.net, andrii@kernel.org, void@manifault.com,
	houtao@huaweicloud.com, paulmck@kernel.org
Cc: tj@kernel.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v3 bpf-next 00/13] bpf: Introduce bpf_mem_cache_free_rcu().
Date: Tue, 27 Jun 2023 18:56:21 -0700
Message-ID: <20230628015634.33193-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov <ast@kernel.org>

v2->v3:
- dropped _tail optimization for free_by_rcu_ttrace
- new patch 5 to refactor inc/dec of c->active
- change 'draining' logic in patch 7
- add rcu_barrier in patch 12
- __llist_add -> llist_add(waiting_for_gp_ttrace) in patch 9 to fix a race
- David's Ack in patch 13 and an explanation of why migrate_disable cannot be removed just yet.

v1->v2:
- Fixed a race condition spotted by Hou (patch 7).

v1:

Introduce bpf_mem_cache_free_rcu(), which is similar to kfree_rcu() except that
the objects go through an additional RCU tasks trace grace period before being
freed back into slab.
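
As a usage sketch (a hypothetical caller, not one of the conversions in this
series; 'struct foo' and the helpers around it are made up for illustration,
only the bpf_mem_alloc calls are real):

static struct foo *sketch_new(struct bpf_mem_alloc *ma)
{
        /* 'ma' is assumed to have been set up with bpf_mem_alloc_init()
         * for a fixed object size.
         */
        return bpf_mem_cache_alloc(ma);
}

static void sketch_delete(struct bpf_mem_alloc *ma, struct foo *f)
{
        /* f was unlinked from its map, but sleepable BPF programs may
         * still hold a pointer to it.  Instead of bpf_mem_cache_free(),
         * which allows immediate reuse, defer:
         */
        bpf_mem_cache_free_rcu(ma, f);
        /* f is not reused until a regular RCU GP has passed and only goes
         * back to slab after an additional RCU tasks trace grace period.
         */
}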

Patches 1-9 - a bunch of prep work
Patch 10 - a patch from Paul that exports rcu_request_urgent_qs_task().
Patch 12 - the main bpf_mem_cache_free_rcu patch.
Patch 13 - use it in bpf_cpumask.

bpf_local_storage, bpf_obj_drop, and qp-trie will be other users eventually.

With an additional hack patch to htab that replaces bpf_mem_cache_free() with
bpf_mem_cache_free_rcu(), the benchmark results are:
- map_perf_test 4 8 16348 1000000
  throughput drops from 800k to 600k. Waiting for the RCU GP makes objects cache cold.

- bench htab-mem -a -p 8
  20% drop in performance and a big increase in memory: from 3 MByte to 50 MByte.
  As expected.

- bench htab-mem -a -p 16 --use-case add_del_on_diff_cpu
  Same performance and better memory consumption.
  Before these patches this benchmark would OOM (with or without 'reuse after GP');
  patch 8 addresses the issue.

In the end, the performance drop and additional memory consumption due to _rcu()
were expected and turned out to be within a reasonable margin.
Without Paul's patch 10 the memory consumption in 'bench htab-mem' is in GBytes,
which wouldn't be acceptable.
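
Roughly, the allocator leans on that export like this (an illustrative check,
not the code in patch 12; 'waiting_cnt' and SKETCH_HIGH_WATERMARK are
hypothetical stand-ins for the real bookkeeping): when too much memory sits in
the lists waiting for a grace period, ask RCU to treat the current task as
urgently needing a quiescent state so the GP, and therefore the freeing,
completes sooner.

        if (c->waiting_cnt > SKETCH_HIGH_WATERMARK)
                rcu_request_urgent_qs_task(current);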

Patch 8 is a heuristic to address the 'alloc on one cpu, free on another' issue.
It works well in practice. One can probably construct an artificial benchmark
that makes the heuristic ineffective, but we have to trade off performance,
code complexity, and memory consumption.
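
Schematically the idea is (an illustrative sketch with hypothetical helpers,
not the code in patch 8): each object carries a hint to the cache it was
allocated from, and the free path honors that hint instead of blindly using
the current CPU's cache.

void *sketch_dequeue(struct bpf_mem_cache *c);            /* hypothetical */
void sketch_enqueue(struct bpf_mem_cache *c, void *obj);  /* hypothetical */

struct obj_hdr {
        struct bpf_mem_cache *cache;    /* hint, set at allocation time */
};

static void *sketch_alloc(struct bpf_mem_cache *c)
{
        /* sketch_dequeue()/sketch_enqueue() stand in for the real per-cpu
         * free_llist handling.
         */
        struct obj_hdr *hdr = sketch_dequeue(c);

        if (!hdr)
                return NULL;
        hdr->cache = c;
        return hdr + 1;
}

static void sketch_free(void *ptr)
{
        struct obj_hdr *hdr = (struct obj_hdr *)ptr - 1;

        /* Return the object to the cache that handed it out, so an
         * 'alloc on one cpu, free on another' workload does not keep
         * growing one cpu's cache while the other keeps refilling from
         * slab.
         */
        sketch_enqueue(hdr->cache, hdr);
}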

The life cycle of objects:
alloc: dequeue free_llist
free: enqueue free_llist
free_llist above high watermark -> free_by_rcu_ttrace
free_rcu: enqueue free_by_rcu -> waiting_for_gp
after RCU GP waiting_for_gp -> free_by_rcu_ttrace
free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
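
A simplified, illustrative model of those transitions (not the actual
kernel/bpf/memalloc.c code; per-cpu handling, watermarks, refill/reuse from
free_by_rcu_ttrace, and the guards against re-queuing a busy rcu_head are all
omitted, so only the list hops and the two RCU flavors gating them are visible):

#include <linux/llist.h>
#include <linux/rcupdate.h>
#include <linux/rcupdate_trace.h>
#include <linux/slab.h>

struct sketch_cache {
        struct llist_head free_llist;            /* alloc: dequeue / free: enqueue */
        struct llist_head free_by_rcu;           /* filled by the _rcu() flavor */
        struct llist_head waiting_for_gp;        /* waits for a regular RCU GP */
        struct llist_head free_by_rcu_ttrace;    /* past the regular GP */
        struct llist_head waiting_for_gp_ttrace; /* waits for RCU tasks trace GP */
        struct rcu_head rcu;
        struct rcu_head rcu_ttrace;
};

static void sketch_ttrace_gp_done(struct rcu_head *rcu)
{
        struct sketch_cache *c = container_of(rcu, struct sketch_cache, rcu_ttrace);
        struct llist_node *pos, *t;

        /* waiting_for_gp_ttrace -> slab (the llist_node overlays the start
         * of each free object, so freeing the node frees the object).
         */
        llist_for_each_safe(pos, t, llist_del_all(&c->waiting_for_gp_ttrace))
                kfree(pos);
}

static void sketch_rcu_gp_done(struct rcu_head *rcu)
{
        struct sketch_cache *c = container_of(rcu, struct sketch_cache, rcu);
        struct llist_node *pos, *t;

        /* after RCU GP: waiting_for_gp -> free_by_rcu_ttrace, then
         * free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab once an RCU
         * tasks trace GP has also elapsed.
         */
        llist_for_each_safe(pos, t, llist_del_all(&c->waiting_for_gp))
                llist_add(pos, &c->free_by_rcu_ttrace);
        llist_for_each_safe(pos, t, llist_del_all(&c->free_by_rcu_ttrace))
                llist_add(pos, &c->waiting_for_gp_ttrace);
        call_rcu_tasks_trace(&c->rcu_ttrace, sketch_ttrace_gp_done);
}

static void sketch_free_rcu(struct sketch_cache *c, struct llist_node *obj)
{
        struct llist_node *pos, *t;

        /* free_rcu: enqueue free_by_rcu -> waiting_for_gp */
        llist_add(obj, &c->free_by_rcu);
        llist_for_each_safe(pos, t, llist_del_all(&c->free_by_rcu))
                llist_add(pos, &c->waiting_for_gp);
        call_rcu(&c->rcu, sketch_rcu_gp_done);
}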

Alexei Starovoitov (12):
  bpf: Rename few bpf_mem_alloc fields.
  bpf: Simplify code of destroy_mem_alloc() with kmemdup().
  bpf: Let free_all() return the number of freed elements.
  bpf: Refactor alloc_bulk().
  bpf: Factor out inc/dec of active flag into helpers.
  bpf: Further refactor alloc_bulk().
  bpf: Change bpf_mem_cache draining process.
  bpf: Add a hint to allocated objects.
  bpf: Allow reuse from waiting_for_gp_ttrace list.
  selftests/bpf: Improve test coverage of bpf_mem_alloc.
  bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().
  bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu.

Paul E. McKenney (1):
  rcu: Export rcu_request_urgent_qs_task()

 include/linux/bpf_mem_alloc.h                 |   2 +
 include/linux/rcutiny.h                       |   2 +
 include/linux/rcutree.h                       |   1 +
 kernel/bpf/cpumask.c                          |  20 +-
 kernel/bpf/memalloc.c                         | 341 +++++++++++++-----
 kernel/rcu/rcu.h                              |   2 -
 .../testing/selftests/bpf/progs/linked_list.c |   2 +-
 7 files changed, 261 insertions(+), 109 deletions(-)

-- 
2.34.1


Thread overview: 22+ messages
2023-06-28  1:56 Alexei Starovoitov [this message]
2023-06-28  1:56 ` [PATCH v3 bpf-next 01/13] bpf: Rename few bpf_mem_alloc fields Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 02/13] bpf: Simplify code of destroy_mem_alloc() with kmemdup() Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 03/13] bpf: Let free_all() return the number of freed elements Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 04/13] bpf: Refactor alloc_bulk() Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 05/13] bpf: Factor out inc/dec of active flag into helpers Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 06/13] bpf: Further refactor alloc_bulk() Alexei Starovoitov
     [not found]   ` <ZJxWR9SZ5lya+MN+@corigine.com>
2023-06-28 18:58     ` Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 07/13] bpf: Change bpf_mem_cache draining process Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 08/13] bpf: Add a hint to allocated objects Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 09/13] bpf: Allow reuse from waiting_for_gp_ttrace list Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 10/13] rcu: Export rcu_request_urgent_qs_task() Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 11/13] selftests/bpf: Improve test coverage of bpf_mem_alloc Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 12/13] bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu() Alexei Starovoitov
2023-06-28 17:57   ` Paul E. McKenney
2023-06-29  3:52     ` Alexei Starovoitov
2023-06-29 14:20       ` Paul E. McKenney
2023-06-29  2:24   ` Hou Tao
2023-06-29  3:42     ` Alexei Starovoitov
2023-06-29  4:31       ` Hou Tao
2023-07-06  3:25     ` Alexei Starovoitov
2023-06-28  1:56 ` [PATCH v3 bpf-next 13/13] bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu Alexei Starovoitov
