All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Dave Marchevsky <davemarchevsky@fb.com>,
	Delyan Kratunov <delyank@fb.com>
Subject: Re: [PATCH RFC bpf-next v1 16/32] bpf: Introduce BPF memory object model
Date: Thu, 8 Sep 2022 04:39:43 +0200	[thread overview]
Message-ID: <CAP01T77-ygt+MvvwzRwo+3kDrk_8sCv-ASGT8qL2PvPjL_11jw@mail.gmail.com> (raw)
In-Reply-To: <20220908003429.wsucvsdcxnkipcja@macbook-pro-4.dhcp.thefacebook.com>

On Thu, 8 Sept 2022 at 02:34, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sun, Sep 04, 2022 at 10:41:29PM +0200, Kumar Kartikeya Dwivedi wrote:
> > Add the concept of a memory object model to BPF verifier.
> >
> > What this means is that there are now some types that are not just plain
> > old data, but require explicit action when they are allocated on a
> > storage, before their lifetime is considered as started and before it is
> > allowed for them to escape the program. The verifier will track state of
> > such fields during the various phases of the object lifetime, where it
> > can be sure about certain invariants.
> >
> > Some inspiration is taken from existing memory object and lifetime
> > models in C and C++ which have stood the test of time. See [0], [1], [2]
> > for more information, to find some similarities. In the future, the
> > separation of storage and object lifetime may be made more stark by
> > allowing to change effective type of storage allocated for a local kptr.
> > For now, that has been left out. It is only possible when verifier
> > understands when the program has exclusive access to storage, and when
> > the object it is hosting is no longer accessible to other CPUs.
> >
> > This can be useful to maintain size-class based freelists inside BPF
> > programs and reuse storage of same size for different types. This would
> > only be safe to allow if verifier can ensure that while storage lifetime
> > has not ended, object lifetime for the current type has. This
> > necessiates separating the two and accomodating a simple model to track
> > object lifetime (composed recursively of more objects whose lifetime
> > is individually tracked).
> >
> > Everytime a BPF program allocates such non-trivial types, it must call a
> > set of constructors on the object to fully begin its lifetime before it
> > can make use of the pointer to this type. If the program does not do so,
> > the verifier will complain and lead to failure in loading of the
> > program.
> >
> > Similarly, when ending the lifetime of such types, it is required to
> > fully destruct the object using a series of destructors for each
> > non-trivial member, before finally freeing the storage the object is
> > making use of.
> >
> > During both the construction and destruction phase, there can be only
> > one program that can own and access such an object, hence their is no
> > need of any explicit synchronization. The single ownership of such
> > objects makes it easy for the verifier to enforce the safety around the
> > beginning and end of the lifetime without resorting to dynamic checks.
> >
> > When there are multiple fields needing construction or destruction, the
> > program must call their constructors in ascending order of the offset of
> > the field.
> >
> > For example, consider the following type (support for such fields will
> > be added in subsequent patches):
> >
> > struct data {
> >       struct bpf_spin_lock lock;
> >       struct bpf_list_head list __contains(struct, foo, node);
> >       int data;
> > };
> >
> > struct data *d = bpf_kptr_alloc(...);
> > if (!d) { ... }
> >
> > Now, the type of d would be PTR_TO_BTF_ID | MEM_TYPE_LOCAL |
> > OBJ_CONSTRUCTING, as it needs two constructor calls (for lock and head),
> > before it can be considered fully initialized and alive.
> >
> > Hence, we must do (in order of field offsets):
> >
> > bpf_spin_lock_init(&d->lock);
> > bpf_list_head_init(&d->list);
>
> All sounds great in theory, but I think it's unnecessary complex at this point.
> There is still a need to __bpf_list_head_init_zeroed as seen in later patches.

This particular call is only because of map values. INIT_LIST_HEAD for
prealloc init or alloc_elem would be costly.
There won't be any concern to do it in check_and_init_map_value, we
zero out the field there already. Nothing else needs this check.

List helpers I am planning to inline, it doesn't make sense to have
two loads/stores inside kfuncs. And then for local kptrs there is no
need to zero init. pop_front/pop_back are even uglier. There you need
NULL check + zero init, _then_ check for list_empty. Same with future
list_splice.

I don't believe list helpers are going to be so infrequent such that
all this might not matter at all.

But fine, I still consider this a fair point. I thought a lot about this too.

It really boils down to: do we really want to always zero init?

What seems more desirable to me is forcing initialization like this,
esp. since memory reuse is going to be the more common case,
and then simply relaxing initialization when we know it comes from
bpf_kptr_zalloc. needs_construction similar to needs_destruction.
We aren't requiring bpf_list_node_fini, same idea there.

Zeroing the entire big struct vs zeroing/initing two fields makes a
huge difference.

> So all this verifier enforced constructors we don't need _today_.
> Zero init of everything works.
> It's the case for list_head, list_node, spin_lock, rb_root, rb_node.
> Pretty much all new data structures will work with zero init
> and all of them need async dtors.
> The verifier cannot help during destruction.
> dtors have to be specified declaratively in a bpf prog for new types

I think about it the other way around.

There actually isn't a need to specify any dtor IMO for custom types.
Just init and free your type inline. Much more familiar to people
doing C already.
Custom types are always just data without special fields, and we know
how to destroy BPF special fields.
Map already knows how to 'destruct' these types, just like it has to
know how to destruct map value.

map value type and local kptr type are similar in that regard. They
are both local types in prog BTF with special fields.
If it can do it for map value, it can do it for local kptr if it finds
it in map (it has to).

To me taking prog reference and setting up per-type dtor is the uglier
solution. It's unnecessary for the user. That then forces you to have
similar semantics like bpf_timer. map_release_uref will be used to
break the reference cycle between map and prog, which is undesirable.
More effort then - to think about some way to alleviate that, or live
with it and compromise.

Later, asynchronous destruction (RCU case where it won't be done
immediately) is just setting reg->states for all fields as
FIELD_STATE_CONSTRUCTED for reg in callback, but type as
OBJ_DESTRUCTING, forcing you to do nothing but unwind and free in that
context.

The real reason to give destruction control to users for local kptrs
is the ability to manage what to do with drained resources.
They might as well splice out their list when freeing a node to a
local list_head, or move it to a map. Same with more cases in the
future (kptr inside kptr).

It shouldn't be invoked on bpf_kptr_free automagically. That is the
job of the language and best suited to that.
Verifier will see BPF ASM after translation from C/C++/Rust/etc., so
for us the destruction at language level appears as the destructing
phase of local kptr in verifier. For maps it's the last resort, where
programs are already gone, so there is nothing left to do but free
stuff.




> and as known kfuncs for list_head/node, rb_root/node.
> There will be unfreed link lists in maps and the later patches handle that
> without OBJ_DESTRUCTING.
> So let's postpone this patch.

  reply	other threads:[~2022-09-08  2:40 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-04 20:41 [PATCH RFC bpf-next v1 00/32] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 01/32] bpf: Add copy_map_value_long to copy to remote percpu memory Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 02/32] bpf: Support kptrs in percpu arraymap Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 03/32] bpf: Add zero_map_value to zero map value with special fields Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 04/32] bpf: Support kptrs in percpu hashmap and percpu LRU hashmap Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 05/32] bpf: Support kptrs in local storage maps Kumar Kartikeya Dwivedi
2022-09-07 19:00   ` Alexei Starovoitov
2022-09-08  2:47     ` Kumar Kartikeya Dwivedi
2022-09-09  5:27   ` Martin KaFai Lau
2022-09-09 11:22     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 06/32] bpf: Annotate data races in bpf_local_storage Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 07/32] bpf: Allow specifying volatile type modifier for kptrs Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 08/32] bpf: Add comment about kptr's PTR_TO_MAP_VALUE handling Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 09/32] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 10/32] bpf: Drop kfunc support from btf_check_func_arg_match Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 11/32] bpf: Support constant scalar arguments for kfuncs Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 12/32] bpf: Teach verifier about non-size constant arguments Kumar Kartikeya Dwivedi
2022-09-07 22:11   ` Alexei Starovoitov
2022-09-08  2:49     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 13/32] bpf: Introduce bpf_list_head support for BPF maps Kumar Kartikeya Dwivedi
2022-09-07 22:46   ` Alexei Starovoitov
2022-09-08  2:58     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 14/32] bpf: Introduce bpf_kptr_alloc helper Kumar Kartikeya Dwivedi
2022-09-07 23:30   ` Alexei Starovoitov
2022-09-08  3:01     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 15/32] bpf: Add helper macro bpf_expr_for_each_reg_in_vstate Kumar Kartikeya Dwivedi
2022-09-07 23:48   ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 16/32] bpf: Introduce BPF memory object model Kumar Kartikeya Dwivedi
2022-09-08  0:34   ` Alexei Starovoitov
2022-09-08  2:39     ` Kumar Kartikeya Dwivedi [this message]
2022-09-08  3:37       ` Alexei Starovoitov
2022-09-08 11:50         ` Kumar Kartikeya Dwivedi
2022-09-08 14:18           ` Alexei Starovoitov
2022-09-08 14:45             ` Kumar Kartikeya Dwivedi
2022-09-08 15:11               ` Alexei Starovoitov
2022-09-08 15:37                 ` Kumar Kartikeya Dwivedi
2022-09-08 15:59                   ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 17/32] bpf: Support bpf_list_node in local kptrs Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 18/32] bpf: Support bpf_spin_lock " Kumar Kartikeya Dwivedi
2022-09-08  0:35   ` Alexei Starovoitov
2022-09-09  8:25     ` Dave Marchevsky
2022-09-09 11:20       ` Kumar Kartikeya Dwivedi
2022-09-09 14:26         ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 19/32] bpf: Support bpf_list_head " Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 20/32] bpf: Introduce bpf_kptr_free helper Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 21/32] bpf: Allow locking bpf_spin_lock global variables Kumar Kartikeya Dwivedi
2022-09-08  0:27   ` Alexei Starovoitov
2022-09-08  0:39     ` Kumar Kartikeya Dwivedi
2022-09-08  0:55       ` Alexei Starovoitov
2022-09-08  1:00     ` Kumar Kartikeya Dwivedi
2022-09-08  1:08       ` Alexei Starovoitov
2022-09-08  1:15         ` Kumar Kartikeya Dwivedi
2022-09-08  2:39           ` Alexei Starovoitov
2022-09-09  8:13   ` Dave Marchevsky
2022-09-09 11:05     ` Kumar Kartikeya Dwivedi
2022-09-09 14:24       ` Alexei Starovoitov
2022-09-09 14:50         ` Kumar Kartikeya Dwivedi
2022-09-09 14:58           ` Alexei Starovoitov
2022-09-09 18:32             ` Andrii Nakryiko
2022-09-09 19:25               ` Alexei Starovoitov
2022-09-09 20:21                 ` Andrii Nakryiko
2022-09-09 20:57                   ` Alexei Starovoitov
2022-09-10  0:21                     ` Andrii Nakryiko
2022-09-11 22:31                       ` Alexei Starovoitov
2022-09-20 20:55                         ` Andrii Nakryiko
2022-10-18  4:06                           ` Andrii Nakryiko
2022-09-09 22:30                 ` Dave Marchevsky
2022-09-09 22:49                   ` Kumar Kartikeya Dwivedi
2022-09-09 22:57                     ` Alexei Starovoitov
2022-09-09 23:04                       ` Kumar Kartikeya Dwivedi
2022-09-09 22:51                   ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 22/32] bpf: Bump BTF_KFUNC_SET_MAX_CNT Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 23/32] bpf: Add single ownership BPF linked list API Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 24/32] bpf: Permit NULL checking pointer with non-zero fixed offset Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 25/32] bpf: Allow storing local kptrs in BPF maps Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 26/32] bpf: Wire up freeing of bpf_list_heads in maps Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 27/32] bpf: Add destructor for bpf_list_head in local kptr Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 28/32] bpf: Remove duplicate PTR_TO_BTF_ID RO check Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 29/32] libbpf: Add support for private BSS map section Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 30/32] selftests/bpf: Add BTF tag macros for local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 31/32] selftests/bpf: Add BPF linked list API tests Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 32/32] selftests/bpf: Add referenced local kptr tests Kumar Kartikeya Dwivedi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP01T77-ygt+MvvwzRwo+3kDrk_8sCv-ASGT8qL2PvPjL_11jw@mail.gmail.com \
    --to=memxor@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=delyank@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.