All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Dave Marchevsky <davemarchevsky@fb.com>,
	Delyan Kratunov <delyank@fb.com>
Subject: Re: [PATCH RFC bpf-next v1 16/32] bpf: Introduce BPF memory object model
Date: Thu, 8 Sep 2022 17:37:27 +0200	[thread overview]
Message-ID: <CAP01T75s+d9Ko7V5dqe94_DbehRv5RXCPGOkjb2CG+wxCe_uvA@mail.gmail.com> (raw)
In-Reply-To: <CAADnVQLc6bWuyknq_ZqLqEyMmkgg3nia6VW7+9MvgDPTOvJ=kQ@mail.gmail.com>

On Thu, 8 Sept 2022 at 17:11, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Sep 8, 2022 at 7:46 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > On Thu, 8 Sept 2022 at 16:18, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Sep 8, 2022 at 4:50 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > > >
> > > > I slept over this. I think I can get behind this idea of implicit
> > > > ctor/dtor. We might have open coded construction/destruction later if
> > > > we want.
> > > >
> > > > I am however thinking of naming these helpers:
> > > > bpf_kptr_new
> > > > bpf_kptr_delete
> > > > to make it clear it does a little more than just allocating the type.
> > > > The open coded cases can later derive their allocation from the more
> > > > bare bones bpf_kptr_alloc instead in the future.
> > >
> > > New names make complete sense. Good idea.
> > >
> > > > The main reason to have open coded-ness was being able to 'manage'
> > > > resources once visibility reduces to current CPU (bpf_refcount_put,
> > > > single ownership after xchg, etc.). Even with RCU, we won't allow
> > > > touching the BPF special fields without refcount. bpf_spin_lock is
> > > > different, as it protects more than just bpf special fields.
> > > >
> > > > But one can still splice or kptr_xchg before passing to bpf_kptr_free
> > > > to do that. bpf_kptr_free is basically cleaning up whatever is left by
> > > > then, forcefully. In the future, we might even be able to do elision
> > > > of implicit dtors based on the seen data flow (splicing in single
> > > > ownership implies list is empty, any other op will undo that, etc.) if
> > > > there are big structs with too many fields. Can also support that in
> > > > open coded cases.
> > >
> > > Right.
> > >
> > > >
> > > > What I want to think about more is whether we should still force
> > > > calling bpf_refcount_set vs always setting it to 1.
> > > >
> > > > I know we don't agree about whether list_add in shared mode should
> > > > take ref vs transfer ref. I'm leaning towards transfer since that will
> > > > be most intuitive. It then works the same way in both cases, single
> > > > ownership only transfers the sole reference you have, so you lose
> > > > access, but in shared you may have more than one. If you have just one
> > > > you will still lose access.
> > > >
> > > > It will be odd for list_add to consume it in one case and not the
> > > > other. People should already be fully conscious of how they are
> > > > managing the lifetime of their object.
> > > >
> > > > It then seems better to require users to set the initial refcount
> > > > themselves. When doing the initial linking it can be very cheap.
> > > > Later get/put/inc are always available.
> > > >
> > > > But forcing it to be called is going to be much simpler than this patch.
> > >
> > > I'm not convinced yet :)
> > > Pls hold on implementing one way or another.
> > > Let's land the single ownership case for locks, lists,
> > > rbtrees, allocators. That's plenty of patches.
> > > Then we can start a deeper discussion into the shared case.
> > > Whether it will be different in terms of 'lose access after list_add'
> > > is not critical to decide now. It can change in the future too.
> > >
> >
> > Right, I'm not implementing it yet. There's a lot of work left to even
> > finish single ownership structures, then lots of testing.
> > But it's helpful to keep thinking about future use cases while working
> > on the current stuff, just to make sure we're not
> > digging ourselves into a design hole.
> >
> > We have the option to undo damage here, since this is all
> > experimental, but there's still an expectation that the API is not
> > broken at whim. That wouldn't be very useful for users.
>
> imo this part is minor.
> The whole lock + list_or_rbtree in a single allocation
> restriction bothers me a lot more.
> We will find out for sure only when wwe have a prototype
> of lock + list + rbtree and let folks who requested it
> actually code things.
>

Sure.
But when I look in the kernel, I often see data and its lock often
allocated together.
The lock is just there to serialize access to the data structure. It
might as well not be if it's a bpf_llist_head, or only optionally.
It might be an entirely different way of serializing access (local_t +
percpu list).
But usually having both together is also great for locality.
Different use cases have different needs, the simple and common cases
are often well served by having both together.

Not every use case needs both list and/or rbtree. Some require access
to only one at a time.
e.g. We might reserve struct rb_node in xdp_frame, allowing struct
bpf_rb_tree __contains(xdp_frame, rb_node)
or struct bpf_rb_tree __contains(sk_buff, rb_node), unifying the
queueing primitives for XDP and TC.
It makes sense to make simple cases faster and simpler to use.

This is why I eventually plan to add a RCU based hash table using
these single ownership lists in selftests, at least as a showcase it
can serve a 'real world' use case.

Dave's dynamic lock checks are conceptually not very different from
the verifier's perspective. A bpf_spin_lock * vs bpf_spin_lock
protecting the list. Indirection allows it to assume a dynamic value
at runtime. Some checks can still be done statically, and some where
the pointer's indeterminism hinders static analysis can be offloaded
to runtime. Different tradeoffs.

> > > The other reason to do implicit inits and ref count sets is to
> >
> > I am not contesting implicit construction.
> > Other lists already work with zero initialization so list_head seems
> > more of an exception.
> > But it's done for good reasons to avoid extra NULL checks
> > unnecessarily, and make the implementation of list helpers more
> > efficient and simple at the same time.
> >
> > > avoid fighting llvm.
> > > obj = bpf_kptr_new();
> > > obj->var1 = 1;
> > > some_func(&obj->var2);
> > > In many cases the compiler is allowed to sink stores.
> > > If there are two calls that "init" two different fields
> > > the compiler is allowed to change the order as well
> > > even if it doesn't see the body of the function and the function is
> > > marked as __pure. Technically initializers as pure functions.
> >
> > But bpf_refcount_set won't be marked __pure, neither am I proposing to
> > allow direct stores to 'set' it.
> > I'm not a compiler expert by any means, but AFAIK it should not be
> > doing such reordering for functions otherwise.
> > What if the function inside has a memory barrier? That would
> > completely screw up things.
> > It's going to have external linkage, so I don't think it can assume
> > anything about side effects or not. So IMO this is not a good point.
>
> The pure attribute tells the compiler that the function doesn't
> have side effects. We even use it in the kernel code base.
> Sooner or later we'll start using it in bpf too.
> Things like memcmp is a primary example.
> I have to correct myself though. refcount_set shouldn't be
> considered pure.
>
> > Unless you're talking about some new way of inlining such helpers from
> > the compiler side that doesn't exist yet.
> >
> > > The verifier and llvm already "fight" a lot.
> > > We gotta be very careful in the verifier and not assume
> > > that the code stays as written in C.
> >
> > So will these implicit zero stores be done when we enter != NULL
> > branch, or lazily on first access (helper arg, load, store)?
>
> Whichever way is faster and still safe.
> I assumed that we'd have to zero them after successful alloc.
>
> > This is the flip side: rewritings insns to add stores to local kptr
> > can only happen after the NULL check, in the != NULL branch, at that
> > point we cannot assume R1-R5 are free for use, so complicated field
> > initialization will be uglier to do implicitly (e.g. if it involves
> > calling functions etc.).
> > There are pros and cons for both.
>
> Are you expecting the verifier to insert zero inits
> as actual insns after 'call bpf_kptr_new' insn ?
> Hmm. I imagined bpf_kptr_new helper will do it.
> Just a simple loop that is inverse of zero_map_value().
> Not the fastest thing, but good enough to start and can be
> optimized later.
> The verifier can insert ST insn too in !null branch,
> since that insn only needs one register and it's known
> in that branch.
> It's questionable that a bunch of ST insns will be faster
> than a zero_map_value-like loop.

I would definitely think a bunch of direct 0 stores would be faster.
zero_map_value will access some kind of offset array in memory and
then read from it and loop over to do the stores.
Does seem like more work to do, even if it's hot in cache those are
unneeded extra accesses for information we know statically.
So I'll most likely emit a bunch of ST insns zeroing it out in v1.

  reply	other threads:[~2022-09-08 15:38 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-04 20:41 [PATCH RFC bpf-next v1 00/32] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 01/32] bpf: Add copy_map_value_long to copy to remote percpu memory Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 02/32] bpf: Support kptrs in percpu arraymap Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 03/32] bpf: Add zero_map_value to zero map value with special fields Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 04/32] bpf: Support kptrs in percpu hashmap and percpu LRU hashmap Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 05/32] bpf: Support kptrs in local storage maps Kumar Kartikeya Dwivedi
2022-09-07 19:00   ` Alexei Starovoitov
2022-09-08  2:47     ` Kumar Kartikeya Dwivedi
2022-09-09  5:27   ` Martin KaFai Lau
2022-09-09 11:22     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 06/32] bpf: Annotate data races in bpf_local_storage Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 07/32] bpf: Allow specifying volatile type modifier for kptrs Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 08/32] bpf: Add comment about kptr's PTR_TO_MAP_VALUE handling Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 09/32] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 10/32] bpf: Drop kfunc support from btf_check_func_arg_match Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 11/32] bpf: Support constant scalar arguments for kfuncs Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 12/32] bpf: Teach verifier about non-size constant arguments Kumar Kartikeya Dwivedi
2022-09-07 22:11   ` Alexei Starovoitov
2022-09-08  2:49     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 13/32] bpf: Introduce bpf_list_head support for BPF maps Kumar Kartikeya Dwivedi
2022-09-07 22:46   ` Alexei Starovoitov
2022-09-08  2:58     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 14/32] bpf: Introduce bpf_kptr_alloc helper Kumar Kartikeya Dwivedi
2022-09-07 23:30   ` Alexei Starovoitov
2022-09-08  3:01     ` Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 15/32] bpf: Add helper macro bpf_expr_for_each_reg_in_vstate Kumar Kartikeya Dwivedi
2022-09-07 23:48   ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 16/32] bpf: Introduce BPF memory object model Kumar Kartikeya Dwivedi
2022-09-08  0:34   ` Alexei Starovoitov
2022-09-08  2:39     ` Kumar Kartikeya Dwivedi
2022-09-08  3:37       ` Alexei Starovoitov
2022-09-08 11:50         ` Kumar Kartikeya Dwivedi
2022-09-08 14:18           ` Alexei Starovoitov
2022-09-08 14:45             ` Kumar Kartikeya Dwivedi
2022-09-08 15:11               ` Alexei Starovoitov
2022-09-08 15:37                 ` Kumar Kartikeya Dwivedi [this message]
2022-09-08 15:59                   ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 17/32] bpf: Support bpf_list_node in local kptrs Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 18/32] bpf: Support bpf_spin_lock " Kumar Kartikeya Dwivedi
2022-09-08  0:35   ` Alexei Starovoitov
2022-09-09  8:25     ` Dave Marchevsky
2022-09-09 11:20       ` Kumar Kartikeya Dwivedi
2022-09-09 14:26         ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 19/32] bpf: Support bpf_list_head " Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 20/32] bpf: Introduce bpf_kptr_free helper Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 21/32] bpf: Allow locking bpf_spin_lock global variables Kumar Kartikeya Dwivedi
2022-09-08  0:27   ` Alexei Starovoitov
2022-09-08  0:39     ` Kumar Kartikeya Dwivedi
2022-09-08  0:55       ` Alexei Starovoitov
2022-09-08  1:00     ` Kumar Kartikeya Dwivedi
2022-09-08  1:08       ` Alexei Starovoitov
2022-09-08  1:15         ` Kumar Kartikeya Dwivedi
2022-09-08  2:39           ` Alexei Starovoitov
2022-09-09  8:13   ` Dave Marchevsky
2022-09-09 11:05     ` Kumar Kartikeya Dwivedi
2022-09-09 14:24       ` Alexei Starovoitov
2022-09-09 14:50         ` Kumar Kartikeya Dwivedi
2022-09-09 14:58           ` Alexei Starovoitov
2022-09-09 18:32             ` Andrii Nakryiko
2022-09-09 19:25               ` Alexei Starovoitov
2022-09-09 20:21                 ` Andrii Nakryiko
2022-09-09 20:57                   ` Alexei Starovoitov
2022-09-10  0:21                     ` Andrii Nakryiko
2022-09-11 22:31                       ` Alexei Starovoitov
2022-09-20 20:55                         ` Andrii Nakryiko
2022-10-18  4:06                           ` Andrii Nakryiko
2022-09-09 22:30                 ` Dave Marchevsky
2022-09-09 22:49                   ` Kumar Kartikeya Dwivedi
2022-09-09 22:57                     ` Alexei Starovoitov
2022-09-09 23:04                       ` Kumar Kartikeya Dwivedi
2022-09-09 22:51                   ` Alexei Starovoitov
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 22/32] bpf: Bump BTF_KFUNC_SET_MAX_CNT Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 23/32] bpf: Add single ownership BPF linked list API Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 24/32] bpf: Permit NULL checking pointer with non-zero fixed offset Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 25/32] bpf: Allow storing local kptrs in BPF maps Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 26/32] bpf: Wire up freeing of bpf_list_heads in maps Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 27/32] bpf: Add destructor for bpf_list_head in local kptr Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 28/32] bpf: Remove duplicate PTR_TO_BTF_ID RO check Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 29/32] libbpf: Add support for private BSS map section Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 30/32] selftests/bpf: Add BTF tag macros for local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 31/32] selftests/bpf: Add BPF linked list API tests Kumar Kartikeya Dwivedi
2022-09-04 20:41 ` [PATCH RFC bpf-next v1 32/32] selftests/bpf: Add referenced local kptr tests Kumar Kartikeya Dwivedi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP01T75s+d9Ko7V5dqe94_DbehRv5RXCPGOkjb2CG+wxCe_uvA@mail.gmail.com \
    --to=memxor@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=delyank@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.