From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Hou Tao <houtao@huaweicloud.com>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Yonghong Song <yhs@meta.com>, bpf <bpf@vger.kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>,
	Hao Luo <haoluo@google.com>, Yonghong Song <yhs@fb.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Jiri Olsa <jolsa@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	rcu@vger.kernel.org, Hou Tao <houtao1@huawei.com>,
	Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc
Date: Wed, 22 Feb 2023 11:30:41 -0800
Message-ID: <CAADnVQLg+WHzym=SC0KF0uzWw0J7ADjABBdZ9QDepdAT0z7V-g@mail.gmail.com>
In-Reply-To: <2a58c4a8-781f-6d84-e72a-f8b7117762b4@huaweicloud.com>

On Thu, Feb 16, 2023 at 5:19 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 2/17/2023 12:35 AM, Alexei Starovoitov wrote:
> > On Thu, Feb 16, 2023 at 5:55 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >> Besides BPF_REUSE_AFTER_RCU_GP, is BPF_FREE_AFTER_RCU_GP a feasible solution?
> > The idea is for bpf_mem_free to wait normal RCU GP before adding
> > the elements back to the free list and free the elem to global kernel memory
> > only after both rcu and rcu_tasks_trace GPs as it's doing now.
> >
> >> Its downside is that it will force sleepable programs to use
> >> bpf_rcu_read_{lock,unlock}() to access these returned pointers?
> > sleepable can access elems without kptrs/spin_locks
> > even when not using rcu_read_lock, since it's safe, but there is uaf.
> > Some progs might be fine with it.
> > When sleepable needs to avoid uaf they will use bpf_rcu_read_lock.
> Thanks for the explanation of BPF_REUSE_AFTER_RCU_GP. It seems that
> BPF_REUSE_AFTER_RCU_GP may easily incur OOM. Before one RCU GP expires,
> the freed elements are not available to either the bpf ma or the slab
> subsystem. After the RCU GP expires, the freed elements are available
> only to one bpf ma, but there may be too many of them for one bpf ma,
> so part of them will be freed through call_rcu_tasks_trace(), and these
> freed-again elements will not be available to the slab subsystem until
> the tasks trace RCU GP expires. In brief, after one RCU GP, part of the
> freed elements will be reused, but the majority will still be freed
> through call_rcu_tasks_trace(). Because of this doubt, I proposed
> BPF_FREE_AFTER_RCU_GP, which frees these elements directly after one
> RCU GP and forces sleepable programs to use bpf_rcu_read_lock() to
> access them. But that enforcement would break existing sleepable
> programs, so BPF_FREE_AFTER_RCU_GP is still not a good idea. I will
> check whether there is still an OOM risk with BPF_REUSE_AFTER_RCU_GP
> and try to mitigate it if possible (e.g., by sharing the freed elements
> between all bpf ma-s instead of only the bpf ma that freed them).

Since BPF_REUSE_AFTER_RCU_GP is a new thing that will be used
by qptrie map and maybe local storage, there is no sleepable breakage.
If we start using BPF_REUSE_AFTER_RCU_GP for hashmaps with kptrs
and enforce bpf_rcu_read_lock(), this is also ok, since kptrs are unstable.
I prefer to avoid complicating bpf ma with sharing free lists across all ma-s,
unless this is really trivial code that is easy to review.
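
To make the element lifecycle discussed above concrete, here is a rough
userspace sketch of it, not kernel code. Every name in it (toy_ma,
FREE_LIST_CAP, the ELEM_* states, the toy_*() helpers) is made up for
illustration, and the fixed free-list cap is only an assumption standing in
for whatever watermark a real bpf ma would use. It models a freed element
waiting for a normal RCU GP before going back on the owning allocator's free
list, with the surplus waiting for a tasks trace GP before going back to slab.

```c
#include <assert.h>

/* Hypothetical lifecycle states of one element; not kernel identifiers. */
enum elem_state {
	ELEM_IN_USE,
	ELEM_WAIT_RCU_GP,		/* freed, waiting for a normal RCU GP */
	ELEM_ON_FREE_LIST,		/* reusable by the owning allocator only */
	ELEM_WAIT_TASKS_TRACE_GP,	/* surplus, waiting for tasks trace GP */
	ELEM_RETURNED_TO_SLAB,		/* available to the rest of the kernel */
};

#define NR_ELEMS	8
#define FREE_LIST_CAP	2	/* assumed per-allocator free list limit */

struct toy_ma {
	enum elem_state state[NR_ELEMS];
	int on_free_list;
};

/* Free an element: it is neither reusable nor back in slab yet. */
static void toy_free(struct toy_ma *ma, int i)
{
	assert(ma->state[i] == ELEM_IN_USE);
	ma->state[i] = ELEM_WAIT_RCU_GP;
}

/* A normal RCU GP elapsed: refill the free list, queue the surplus. */
static void toy_rcu_gp_elapsed(struct toy_ma *ma)
{
	for (int i = 0; i < NR_ELEMS; i++) {
		if (ma->state[i] != ELEM_WAIT_RCU_GP)
			continue;
		if (ma->on_free_list < FREE_LIST_CAP) {
			ma->state[i] = ELEM_ON_FREE_LIST;
			ma->on_free_list++;
		} else {
			ma->state[i] = ELEM_WAIT_TASKS_TRACE_GP;
		}
	}
}

/* A tasks trace GP elapsed: the surplus finally goes back to slab. */
static void toy_tasks_trace_gp_elapsed(struct toy_ma *ma)
{
	for (int i = 0; i < NR_ELEMS; i++)
		if (ma->state[i] == ELEM_WAIT_TASKS_TRACE_GP)
			ma->state[i] = ELEM_RETURNED_TO_SLAB;
}

/* Count the elements currently in state s. */
static int toy_count(const struct toy_ma *ma, enum elem_state s)
{
	int n = 0;

	for (int i = 0; i < NR_ELEMS; i++)
		if (ma->state[i] == s)
			n++;
	return n;
}
```

With these assumed numbers, freeing all 8 elements and letting one RCU GP
elapse leaves 2 on the free list and 6 still waiting for the tasks trace GP,
which is the window the OOM concern is about.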
