bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <ast@fb.com>,
	Dave Marchevsky <davemarchevsky@fb.com>,
	bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kernel Team <kernel-team@fb.com>, Tejun Heo <tj@kernel.org>
Subject: Re: [RFC PATCH bpf-next 05/11] bpf: Add bpf_spin_lock member to rbtree
Date: Wed, 10 Aug 2022 15:06:15 -0700	[thread overview]
Message-ID: <CAADnVQ+1SE8EVMEuM=6fbjkA63Lv-OqzMCrgAkg5dS_mb5g6bg@mail.gmail.com> (raw)
In-Reply-To: <CAP01T75nt69=jgGPGXYXHSGc5EDHejgLQpyY8TMeUy2U4Prxvg@mail.gmail.com>

On Wed, Aug 10, 2022 at 2:47 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Just to continue brainstorming: Comments on this?
>
> Instead of a rbtree map, you have a struct bpf_rbtree global variable
> which works like a rbtree. To associate a lock with multiple
> bpf_rbtree, you do clang style thread safety annotation in the bpf
> program:
>
> #define __guarded_by(lock) __attribute__((btf_type_tag("guarded_by:" #lock))
>
> struct bpf_spin_lock shared_lock;
> struct bpf_rbtree rbtree1 __guarded_by(shared_lock);
> struct bpf_rbtree rbtree2 __guarded_by(shared_lock);
>
> guarded_by tag is mandatory for the rbtree. Verifier ensures
> shared_lock spin lock is held whenever rbtree1 or rbtree2 is being
> accessed, and whitelists add/remove helpers inside the critical
> section.

We've been discussing exactly btf_type_tag annotation
to associate rbtree with a lock :)
Thanks for bringing it up as well.
Great that we're aligned.

> I don't think associating locks to rbtree dynamically is a hard
> requirement for your use case? But if you need that, you may probably
> also allocate sets of rbtree that are part of the same lock "class"
> dynamically using bpf_kptr_alloc, and do similar annotation for the
> struct being allocated.
> struct rbtree_set {
>   struct bpf_spin_lock lock;
>   struct bpf_rbtree rbtree1 __guarded_by(lock);
>   struct bpf_rbtree rbtree2 __guarded_by(lock);
> };
> struct rbtree_set *s = bpf_kptr_alloc(sizeof(*s), btf_local_type_id(*s));
> // Just stash the pointer somewhere with kptr_xchg
> On bpf_kptr_free, the verifier knows this is not a "trivial" struct,
> so inserts destructor calls for bpf_rbtree fields during program
> fixup.
>
> The main insight is that lock and rbtree are part of the same
> allocation (map value for global case, ptr_to_btf_id for dynamic case)
> so the locked state can be bound to the reg state in the verifier.

It works nicely in the static case, but ffwd a bit.
We might need an rbtree of rbtrees.
Pretty much like map-in-map that we have today for hash tables.

And the rbtree_node might be a part of an rbtree while
being chained into a separate link list.

We need a lock to protect operations on both rbtree and link list.
Then we might need to create an rbtree dynamically for each cgroup.
And store a pointer to rb_root in cgroup local storage.
Maybe allocating a lock + rb_root + linklist_head as one
allocation will not be too restrictive.

> Then we can also make this new rbtree API use kfuncs instead of UAPI
> helpers, to get some field experience before baking it in.

+1 to that.

> Any opinions? Any brainos or deficiencies in the scheme above?

It would be great if we can do the lock checks statically.
Dynamic locks means that rbtree_add/erase and in the future
link list insert/remove might fail which is horrible from
programmer perspective.
We've been thinking about the "abort" concept for such cases.
When helpers detect an unsafe condition like correct lock is not
taken, they can abort execution of itself, the bpf program
and prevent the program from executing in the future.
Conceptually it sounds great and will solve all kinds of ugliness,
but it's not clear yet how to implement such abort mechanism
which would mean stack unwind and calling of destructors for kptr-s,
refcnt decrements for acquired objects like sockets, etc.

  reply	other threads:[~2022-08-10 22:06 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-22 18:34 [RFC PATCH bpf-next 00/11] bpf: Introduce rbtree map Dave Marchevsky
2022-07-22 18:34 ` [RFC PATCH bpf-next 01/11] bpf: Pull repeated reg access bounds check into helper fn Dave Marchevsky
2022-07-22 18:34 ` [RFC PATCH bpf-next 02/11] bpf: Add verifier support for custom callback return range Dave Marchevsky
2022-07-22 18:34 ` [RFC PATCH bpf-next 03/11] bpf: Add rb_node_off to bpf_map Dave Marchevsky
2022-08-01 22:19   ` Alexei Starovoitov
2022-07-22 18:34 ` [RFC PATCH bpf-next 04/11] bpf: Add rbtree map Dave Marchevsky
2022-08-01 21:49   ` Alexei Starovoitov
2022-07-22 18:34 ` [RFC PATCH bpf-next 05/11] bpf: Add bpf_spin_lock member to rbtree Dave Marchevsky
2022-08-01 22:17   ` Alexei Starovoitov
2022-08-02 13:59     ` Kumar Kartikeya Dwivedi
2022-08-02 15:30       ` Alexei Starovoitov
2022-08-10 21:46     ` Kumar Kartikeya Dwivedi
2022-08-10 22:06       ` Alexei Starovoitov [this message]
2022-08-10 23:16         ` Kumar Kartikeya Dwivedi
2022-08-15  5:33       ` Yonghong Song
2022-08-15  5:37         ` Kumar Kartikeya Dwivedi
2022-07-22 18:34 ` [RFC PATCH bpf-next 06/11] bpf: Add bpf_rbtree_{lock,unlock} helpers Dave Marchevsky
2022-08-01 21:58   ` Alexei Starovoitov
2022-07-22 18:34 ` [RFC PATCH bpf-next 07/11] bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find} Dave Marchevsky
2022-07-22 18:34 ` [RFC PATCH bpf-next 08/11] bpf: Add OBJ_NON_OWNING_REF type flag Dave Marchevsky
2022-08-01 22:41   ` Alexei Starovoitov
2022-07-22 18:34 ` [RFC PATCH bpf-next 09/11] bpf: Add CONDITIONAL_RELEASE " Dave Marchevsky
2022-08-01 22:23   ` Alexei Starovoitov
2022-07-22 18:34 ` [RFC PATCH bpf-next 10/11] bpf: Introduce PTR_ITER and PTR_ITER_END type flags Dave Marchevsky
2022-07-29 16:31   ` Tejun Heo
2022-08-01 22:44   ` Alexei Starovoitov
2022-08-02 13:05     ` Kumar Kartikeya Dwivedi
2022-08-02 15:10       ` Alexei Starovoitov
2022-08-10 17:56     ` Dave Marchevsky
2022-07-22 18:34 ` [RFC PATCH bpf-next 11/11] selftests/bpf: Add rbtree map tests Dave Marchevsky
2022-07-28  7:18   ` Yonghong Song
2022-08-10 17:48     ` Dave Marchevsky
2022-07-28  7:04 ` [RFC PATCH bpf-next 00/11] bpf: Introduce rbtree map Yonghong Song
2022-08-10 17:54   ` Dave Marchevsky
2022-08-01 21:27 ` Alexei Starovoitov
2022-08-10 18:11   ` Dave Marchevsky
2022-08-02 22:02 ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAADnVQ+1SE8EVMEuM=6fbjkA63Lv-OqzMCrgAkg5dS_mb5g6bg@mail.gmail.com' \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=kernel-team@fb.com \
    --cc=memxor@gmail.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).