All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Delyan Kratunov <delyank@fb.com>
Cc: "davem@davemloft.net" <davem@davemloft.net>,
	"joannelkoong@gmail.com" <joannelkoong@gmail.com>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"tj@kernel.org" <tj@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	Dave Marchevsky <davemarchevsky@fb.com>,
	"alexei.starovoitov@gmail.com" <alexei.starovoitov@gmail.com>,
	Kernel Team <Kernel-team@fb.com>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator, UAPI in particular
Date: Tue, 30 Aug 2022 02:17:12 +0200	[thread overview]
Message-ID: <CAP01T75LPynUfJmHG=Q6jPZxMOO5T-0TRT3rF6h5Ai=OJ2omBA@mail.gmail.com> (raw)
In-Reply-To: <5254e00f409cff1e0b8655aeb673b8f77dab21fe.camel@fb.com>

On Tue, 30 Aug 2022 at 01:18, Delyan Kratunov <delyank@fb.com> wrote:
>
> > [...]
> >
> > I don't think that assumption will hold. Requiring RCU protection for
> > all local kptrs means allocator cache reuse becomes harder, as
> > elements are stuck in freelist until the next grace period. It
> > necessitates use of larger caches.
> > For some use cases where they can tolerate reuse, it might not be
> > optimal. IMO the allocator should be independent of how the lifetime
> > of elements is managed.
>
> All maps and allocations are already rcu-protected, we're not adding new here. We're
> only relying on this rcu protection (c.f. the chain of call_rcu_task_trace and
> call_rcu in the patchset) to prove that no program can observe an allocator pointer
> that is corrupted or stale.
>
> >
> > That said, even if you assume RCU protection, that still doesn't
> > address the real problem. Yes, you can access the value without
> > worrying about it moving to another map, but consider this case:
> > Prog unloading,
> > populate_used_allocators -> map walks map_values, tries to take
> > reference to local kptr whose backing allocator is A.
> > Loads kptr, meanwhile some other prog sharing access to the map value
> > exchanges (kptr_xchg) another pointer into that field.
> > Now you take reference to allocator A in used_allocators, while actual
> > value in map is for allocator B.
>
> This is fine. The algorithm is conservative, it may keep allocators around longer
> than they're needed. Eventually there will exist a time that this map won't be
> accessible to any program, at which point both allocator A and B will be released.
>
> It is possible to make a more precise algorithm but given that this behavior is only
> really a problem with (likely pinned) long-lived maps, it's imo not worth it for v1.
>
> >
> > So you either have to cmpxchg and then retry if it fails (which is not
> > a wait-free operation, and honestly not great imo), or if you don't
> > handle this:
> > Someone moved an allocated local kptr backed by B into your map, but
> > you don't hold reference to it.
>
> You don't need a reference while something else is holding the allocator alive. The
> references exist to extend the lifetime past the lifetimes of programs that can
> directly reference the allocator.
>
> > That other program may release
> > allocator map -> allocator,
>
> The allocator map cannot be removed while it's in used_maps of the program. On
> program unload, we'll add B to the used_allocators list, as a value from B is in the
> map. Only then will the allocator map be releasable.
>

Ahh, so prog1 and prog2 both share map M, but not allocator map A and
B, so if it swaps in pointer of B into M while prog1 is unloading, it
will take unneeded ref to A (it it sees kptr to A just before swap).
But when prog2 will unload, it will then see that ptr value is from B
so it will also take ref to B in M's used_allocators. Then A stays
alive for a little while longer, but only when this race happens. But
there won't be any correctness problem as both A and B are kept alive.

Makes sense, but IIUC this only takes care of this specific case. It
can also see 'NULL' and miss taking reference to A.

prog1 uses A, M
prog2 uses B, M
A and B are allocator maps, M is normal hashmap, M is shared between both.

prog1 is unloading:
M holds kptr from A.
during walk, before loading field, prog2 xchg it to NULL, M does not
take ref to A. // 1
Right after it is done processing this field during its walk, prog2
now xchg it back in, now M is holding A but did not take ref to A. //
2
prog1 unloads. M's used_allocators list is empty.

M is still kept alive by prog2.
Now prog2 has already moved its result of kptr_xchg back into the map in step 2.
Hence, prog2 execution can terminate, this ends its RCU section.
Allocator A is waiting for all prior RCU readers, eventually it can be freed.
Now prog2 runs again, starts a new RCU section, kptr_xchg this ptr
from M, and tries to free it. Allocator A is already gone.

If this is also taken care of, I'll only be poking at the code next
when you post it, so that I don't waste any more of your time. But
IIUC this can still cause issues, right?

      parent reply	other threads:[~2022-08-30  0:17 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-19 21:42 [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 01/15] bpf: Introduce any context " Alexei Starovoitov
2022-08-20  0:03   ` kernel test robot
2022-08-20  5:29   ` kernel test robot
2022-08-19 21:42 ` [PATCH v3 bpf-next 02/15] bpf: Convert hash map to bpf_mem_alloc Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 03/15] selftests/bpf: Improve test coverage of test_maps Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 04/15] samples/bpf: Reduce syscall overhead in map_perf_test Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 05/15] bpf: Relax the requirement to use preallocated hash maps in tracing progs Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 06/15] bpf: Optimize element count in non-preallocated hash map Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 07/15] bpf: Optimize call_rcu " Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 08/15] bpf: Adjust low/high watermarks in bpf_mem_cache Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 09/15] bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU Alexei Starovoitov
2022-08-24 19:58   ` Kumar Kartikeya Dwivedi
2022-08-25  0:13     ` Alexei Starovoitov
2022-08-25  0:35       ` Joel Fernandes
2022-08-25  0:49         ` Joel Fernandes
2022-08-19 21:42 ` [PATCH v3 bpf-next 10/15] bpf: Add percpu allocation support to bpf_mem_alloc Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 11/15] bpf: Convert percpu hash map to per-cpu bpf_mem_alloc Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 12/15] bpf: Remove tracing program restriction on map types Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs Alexei Starovoitov
2022-08-19 22:21   ` Kumar Kartikeya Dwivedi
2022-08-19 22:43     ` Alexei Starovoitov
2022-08-19 22:56       ` Kumar Kartikeya Dwivedi
2022-08-19 23:01         ` Alexei Starovoitov
2022-08-24 19:49           ` Kumar Kartikeya Dwivedi
2022-08-25  0:08             ` Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 14/15] bpf: Remove prealloc-only restriction for " Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 15/15] bpf: Introduce sysctl kernel.bpf_force_dyn_alloc Alexei Starovoitov
2022-08-24 20:03 ` [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator Kumar Kartikeya Dwivedi
2022-08-25  0:16   ` Alexei Starovoitov
2022-08-25  0:56 ` [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator, UAPI in particular Delyan Kratunov
2022-08-26  4:03   ` Kumar Kartikeya Dwivedi
2022-08-29 21:23     ` Delyan Kratunov
2022-08-29 21:29     ` Delyan Kratunov
2022-08-29 22:07       ` Kumar Kartikeya Dwivedi
2022-08-29 23:18         ` Delyan Kratunov
2022-08-29 23:45           ` Alexei Starovoitov
2022-08-30  0:20             ` Kumar Kartikeya Dwivedi
2022-08-30  0:26               ` Alexei Starovoitov
2022-08-30  0:44                 ` Kumar Kartikeya Dwivedi
2022-08-30  1:05                   ` Alexei Starovoitov
2022-08-30  1:40                     ` Delyan Kratunov
2022-08-30  3:34                       ` Alexei Starovoitov
2022-08-30  5:02                         ` Kumar Kartikeya Dwivedi
2022-08-30  6:03                           ` Alexei Starovoitov
2022-08-30 20:31                             ` Delyan Kratunov
2022-08-31  1:52                               ` Alexei Starovoitov
2022-08-31 17:38                                 ` Delyan Kratunov
2022-08-31 18:57                                   ` Alexei Starovoitov
2022-08-31 20:12                                     ` Kumar Kartikeya Dwivedi
2022-08-31 20:38                                       ` Alexei Starovoitov
2022-08-31 21:02                                     ` Delyan Kratunov
2022-08-31 22:32                                       ` Kumar Kartikeya Dwivedi
2022-09-01  0:41                                         ` Alexei Starovoitov
2022-09-01  3:55                                       ` Alexei Starovoitov
2022-09-01 22:46                                         ` Delyan Kratunov
2022-09-02  0:12                                           ` Alexei Starovoitov
2022-09-02  1:40                                             ` Delyan Kratunov
2022-09-02  3:29                                               ` Alexei Starovoitov
2022-09-04 22:28                                                 ` Kumar Kartikeya Dwivedi
2022-08-30  0:17           ` Kumar Kartikeya Dwivedi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAP01T75LPynUfJmHG=Q6jPZxMOO5T-0TRT3rF6h5Ai=OJ2omBA@mail.gmail.com' \
    --to=memxor@gmail.com \
    --cc=Kernel-team@fb.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=davemarchevsky@fb.com \
    --cc=delyank@fb.com \
    --cc=joannelkoong@gmail.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.