From: Delyan Kratunov <delyank@fb.com>
To: "davem@davemloft.net" <davem@davemloft.net>,
	"alexei.starovoitov@gmail.com" <alexei.starovoitov@gmail.com>
Cc: "tj@kernel.org" <tj@kernel.org>,
	"joannelkoong@gmail.com" <joannelkoong@gmail.com>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"memxor@gmail.com" <memxor@gmail.com>,
	Dave Marchevsky <davemarchevsky@fb.com>,
	Kernel Team <Kernel-team@fb.com>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator, UAPI in particular
Date: Thu, 25 Aug 2022 00:56:30 +0000
Message-ID: <d3f76b27f4e55ec9e400ae8dcaecbb702a4932e8.camel@fb.com>
In-Reply-To: <20220819214232.18784-1-alexei.starovoitov@gmail.com>

Alexei and I spent some time today going back and forth on what the UAPI for this
allocator should look like in a BPF program. To our mutual surprise, the problem
space turned out to be far more complicated than we anticipated.

There are three primary problems we have to solve:
1) Knowing which allocator an object came from, so we can safely reclaim it when
necessary (e.g., freeing a map).
2) Type confusion between local and kernel types (i.e., a program allocating kernel
types and passing them to helpers/kfuncs that don't expect them). This is especially
important because the existing kptr mechanism assumes kernel types everywhere.
3) Allocated object lifetimes, allocator refcounting, etc. It all gets very hairy
when you allow allocated objects in pinned maps.

This is the proposed design that we landed on:

1. Allocators get their own MAP_TYPE_ALLOCATOR, so you can specify initial capacity
at creation time. A value_size > 0 takes the kmem_cache path, probably with
btf_value_type_id enforcement for that path.

2. The helper APIs are just bpf_obj_alloc(bpf_map *, bpf_core_type_id_local(struct
foo)) and bpf_obj_free(void *). Note that obj_free() only takes an object pointer.

3. To avoid mixing BTF type domains, a new type tag (provisionally __kptr_local)
annotates fields that can hold values with verifier type `PTR_TO_BTF_ID |
BTF_ID_LOCAL`. obj_alloc only ever returns these local kptrs and only ever resolves
against program-local BTF (the resolution happens in the verifier; at runtime the
helper only gets an allocation size).
3.1. If eventually we need to pass these objects to kfuncs/helpers, we can introduce
a new bpf_obj_export helper that takes a local kptr (PTR_TO_BTF_ID | BTF_ID_LOCAL) and returns the
corresponding PTR_TO_BTF_ID, after verifying against an allowlist of some kind. This
would be the only place these objects can leak out of bpf land. If there's no runtime
aspect (and there likely wouldn't be), we might consider doing this transparently,
still against an allowlist of types.

4. To ensure the allocator stays alive while objects from it are alive, we must be
able to identify which allocator each __kptr_local pointer came from, and we must
keep the refcount up while any such values are alive. One concern here is that doing
the refcount manipulation in kptr_xchg would be too expensive. The proposed solution
is to:
4.1. Keep a struct bpf_mem_alloc * in the header before the returned object pointer
from bpf_mem_alloc(). This way we never lose track of which bpf_mem_alloc to return
the object to and can simplify the bpf_obj_free() call.
4.2. Track used_allocators in each bpf_map. When unloading a program, we would
walk all maps that the program has access to (that have kptr_local fields), walk each
value and ensure that any allocators not already in the map's used_allocators are
refcount_inc'd and added to the list. Do note that allocators are also kept alive by
their bpf_map wrapper, but after that's gone, used_allocators is the main mechanism.
Once the bpf_map is gone, the allocator cannot be used to allocate new objects; we
can only return objects to it.
4.3. On map free, we walk and obj_free() all the __kptr_local fields, then
refcount_dec all the used_allocators.
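
To make 4.1 concrete, here is a minimal userspace sketch of the header idea. It is
only a model under stated assumptions: `struct mem_alloc`, `union obj_hdr`,
`obj_alloc()`, `obj_owner()` and `obj_free()` are hypothetical names invented for
illustration, and malloc()/free() stand in for the real per-allocator caches. The
point it shows is that, with the owner stored just before the object, the free
path needs nothing but the object pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct bpf_mem_alloc. */
struct mem_alloc {
	long live_objects;	/* objects handed out and not yet returned */
};

/*
 * Header stored immediately before every returned object. The union with
 * max_align_t keeps the object that follows suitably aligned.
 */
union obj_hdr {
	struct mem_alloc *owner;
	max_align_t align;
};

/* Allocate size bytes from ma and remember the owner in the header. */
static void *obj_alloc(struct mem_alloc *ma, size_t size)
{
	union obj_hdr *hdr = malloc(sizeof(*hdr) + size);

	if (!hdr)
		return NULL;
	hdr->owner = ma;
	ma->live_objects++;
	return hdr + 1;		/* object starts right after the header */
}

/* Recover the owning allocator from nothing but the object pointer. */
static struct mem_alloc *obj_owner(void *obj)
{
	return ((union obj_hdr *)obj - 1)->owner;
}

/* Free takes only the object; the header says where it goes back to. */
static void obj_free(void *obj)
{
	union obj_hdr *hdr;

	if (!obj)
		return;
	hdr = (union obj_hdr *)obj - 1;
	assert(hdr->owner->live_objects > 0);
	hdr->owner->live_objects--;
	free(hdr);
}
```

The cost in this model is one pointer (rounded up to the alignment) per object,
which is the trade-off for keeping the free call down to a single argument.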

Overall, we think this handles all the nasty corners - objects escaping into
kfuncs/helpers when they shouldn't, pinned maps containing pointers to allocations,
programs that access multiple allocators still getting deterministic freelist
behavior - while keeping the API and complexity sane. The used_allocators approach
could certainly be less conservative (or even precise), but for a v1 that's probably
overkill.
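
For reference, the conservative used_allocators bookkeeping from 4.2 and 4.3 could
be modeled in userspace roughly like this. Everything here is an assumption made
for illustration: `struct allocator`, `struct map`, `map_track_allocators()` and
`map_free()` are hypothetical names, the fixed-size arrays replace whatever list
the kernel would use, and a plain int replaces refcount_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SLOTS 8	/* stand-ins for __kptr_local fields in map values */
#define MAX_USED  4	/* capacity of the used_allocators list */

struct allocator { int refcount; };

/* Per-object header recording the owning allocator (as in 4.1). */
union obj_hdr { struct allocator *owner; max_align_t align; };

struct map {
	void *slots[MAX_SLOTS];
	struct allocator *used[MAX_USED];
	size_t nr_used;
};

static void *obj_alloc(struct allocator *a, size_t size)
{
	union obj_hdr *hdr = malloc(sizeof(*hdr) + size);

	if (!hdr)
		return NULL;
	hdr->owner = a;
	return hdr + 1;
}

/* 4.2: at program unload, pin every allocator that owns a stored object. */
static int map_track_allocators(struct map *m)
{
	size_t i, j;

	for (i = 0; i < MAX_SLOTS; i++) {
		struct allocator *owner;

		if (!m->slots[i])
			continue;
		owner = ((union obj_hdr *)m->slots[i] - 1)->owner;
		for (j = 0; j < m->nr_used; j++)
			if (m->used[j] == owner)
				break;
		if (j < m->nr_used)
			continue;	/* already tracked, no extra ref */
		if (m->nr_used == MAX_USED)
			return -1;	/* model limitation only */
		owner->refcount++;
		m->used[m->nr_used++] = owner;
	}
	return 0;
}

/* 4.3: on map free, release all objects, then drop the allocator refs. */
static void map_free(struct map *m)
{
	size_t i;

	for (i = 0; i < MAX_SLOTS; i++) {
		if (m->slots[i])
			free((union obj_hdr *)m->slots[i] - 1);
		m->slots[i] = NULL;
	}
	for (i = 0; i < m->nr_used; i++) {
		assert(m->used[i]->refcount > 0);
		m->used[i]->refcount--;
	}
	m->nr_used = 0;
}
```

In this model a map takes at most one reference per allocator no matter how many
objects it holds, which is what makes the scheme conservative rather than precise.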

Please feel free to shoot holes in this design! We tried to capture everything, but
I'd love confirmation that we didn't miss anything.

--Delyan
