All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Dave Marchevsky <davemarchevsky@meta.com>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Dave Marchevsky <davemarchevsky@fb.com>,
	bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kernel Team <kernel-team@fb.com>, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
Date: Thu, 8 Dec 2022 16:39:39 -0800	[thread overview]
Message-ID: <20221209003939.tsgkghhwznj44agl@macbook-pro-6.dhcp.thefacebook.com> (raw)
In-Reply-To: <3e6af95f-1d8b-aaf9-7e65-002b8fff19b6@meta.com>

On Thu, Dec 08, 2022 at 06:35:24PM -0500, Dave Marchevsky wrote:
> > 
> > Here is the proposal with one new field 'active_lock_id':
> > 
> > first = bpf_rbtree_first(root) KF_RET_NULL
> >   check_reg_allocation_locked() checks that root->reg->id == cur->active_lock.id
> >   R0 = PTR_TO_BTF_ID|MEM_ALLOC|PTR_MAYBE_NULL ref_obj_id = 0;
> >   R0->active_lock_id = root->reg->id
> >   R0->id = ++env->id_gen; which will be cleared after !NULL check inside prog.
> > 
> > same way we can add rb_find, rb_find_first,
> > but not rb_next, rb_prev, since they don't have 'root' argument.
> > 
> > bpf_rbtree_add(root, node, cb); KF_RELEASE.
> >   needs to see PTR_TO_BTF_ID|MEM_ALLOC node->ref_obj_id > 0
> >   check_reg_allocation_locked() checks that root->reg->id == cur->active_lock.id
> >   calls release_reference(node->ref_obj_id)
> >   converts 'node' to PTR_TO_BTF_ID|MEM_ALLOC ref_obj_id = 0;
> >   node->active_lock_id = root->reg->id
> > 
> > 'node' is equivalent to 'first'. They both point to some element
> > inside rbtree and valid inside spin_locked region.
> > It's ok to read|write to both under lock.
> > 
> > removed_node = bpf_rbtree_remove(root, node); KF_ACQUIRE|KF_RET_NULL
> >   need to see PTR_TO_BTF_ID|MEM_ALLOC node->ref_obj_id = 0; and 
> >   usual check_reg_allocation_locked(root)
> >   R0 = PTR_TO_BTF_ID|MEM_ALLOC|MAYBE_NULL
> >   R0->ref_obj_id = R0->id = acquire_reference_state();
> >   R0->active_lock_id should stay 0
> >   mark_reg_unknown(node)
> > 
> > bpf_spin_unlock(lock);
> >   checks lock->id == cur->active_lock.id
> >   for all regs in state 
> >     if (reg->active_lock_id == lock->id)
> >        mark_reg_unknown(reg)
> 
> OK, so sounds like a few more points of agreement, regardless of whether
> we go the runtime checking route or the other one:
> 
>   * We're tossing 'full untrusted' for now. non-owning references will not be
>     allowed to escape critical section. They'll be clobbered w/
>     mark_reg_unknown.

agree

>     * No pressing need to make bpf_obj_drop callable from critical section.
>       As a result no owning or non-owning ref access can page fault.

agree

> 
>   * When spin_lock is unlocked, verifier needs to know about all non-owning
>     references so that it can clobber them. Current implementation -
>     ref_obj_id + release_on_unlock - is bad for a number of reasons, should
>     be replaced with something that doesn't use ref_obj_id or reg->id.
>     * Specific better approach was proposed above: new field + keep track
>       of lock and datastructure identity.

yes

> 
> Differences in proposed approaches:
> 
> "Type System checks + invalidation on 'destructive' rbtree ops"
> 
>   * This approach tries to prevent aliasing problems by invalidating
>     non-owning refs after 'destructive' rbtree ops - like rbtree_remove -
>     in addition to invalidation on spin_unlock
> 
>   * Type system guarantees invariants:
>     * "if it's an owning ref, the node is guaranteed to not be in an rbtree"
>     * "if it's a non-owning ref, the node is guaranteed to be in an rbtree"
> 
>   * Downside: mass non-owning ref invalidation on rbtree_remove will make some
>     programs that logically don't have aliasing problem will be rejected by
>     verifier. Will affect usability depending on how bad this is.

yes.

> 
> 
> "Runtime checks + spin_unlock invalidation only"
> 
>   * This approach allows for the possibility of aliasing problem. As a result
>     the invariants guaranteed in point 2 above don't necessarily hold.
>     * Helpers that add or remove need to account for possibility that the node
>       they're operating on has already been added / removed. Need to check this
>       at runtime and nop if so.

Only 'remove' needs to check.
'add' is operating on 'owning ref'. It cannot fail.
Some future 'add_here(root, owning_node_to_add, nonowning_location)'
may need to fail.

> 
>   * non-owning refs are only invalidated on spin_unlock.
>     * As a result, usability issues of previous approach don't happen here.
> 
>   * Downside: Need to do runtime checks, some additional verifier complexity
>     to deal with "runtime check failed" case due to prev approach's invariant
>     not holding
> 
> Conversion of non-owning refs to 'untrusted' at a invalidation point (unlock
> or remove) can be added to either approach (maybe - at least it was specifically
> discussed for "runtime checks"). Such untrusted refs, by virtue of being
> PTR_UNTRUSTED, can fault, and aren't accepted by rbtree_{add, remove} as input.

correct.

> For the "type system" approach this might ameliorate some of the usability
> issues. For the "runtime checks" approach it would only be useful to let
> such refs escape spin_unlock.

the prog can do bpf_rdonly_cast() even after mark_unknown.

> But we're not going to do non-owning -> 'untrusted' for now, just listing for
> completeness.

right, because of bpf_rdonly_cast availability.

> The distance between what I have now and "type system" approach is smaller
> than "runtime checks" approach. And to get from "type system" to "runtime
> checks" I'd need to:
> 
>   * Remove 'destructive op' invalidation points
>   * Add runtime checks to rbtree_{add,remove}
>   * Add verifier handling of runtime check failure possibility
> 
> Of which only the first point is getting rid of something added for the
> "type system" approach, and won't be much work relative to all the refactoring
> and other improvements that are common between the two approaches.
> 
> So for V2 I will do the "type system + invalidation on 'destructive' ops"
> approach as it'll take less time. This'll get eyes on common improvements
> faster. Then can do a "runtime checks" v3 and we can compare usability of both
> on same base.

Sure, if you think cleanup on rbtree_remove is faster to implement
then definitely go for it.
I was imagining the other way around, but it's fine. Happy to be wrong.
I'm not seeing though how you gonna do that cleanup.
Another id-like field?
Before doing all coding could you post a proposal in the format that I did above?
imo it's much easier to think through in that form instead of analyzing the src code.

      reply	other threads:[~2022-12-09  0:39 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record Dave Marchevsky
2022-12-07 16:41   ` Kumar Kartikeya Dwivedi
2022-12-07 18:34     ` Dave Marchevsky
2022-12-07 18:59       ` Alexei Starovoitov
2022-12-07 20:38         ` Dave Marchevsky
2022-12-07 22:46           ` Alexei Starovoitov
2022-12-07 23:42             ` Dave Marchevsky
2022-12-07 19:03       ` Kumar Kartikeya Dwivedi
2022-12-06 23:09 ` [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails Dave Marchevsky
2022-12-07  1:32   ` Alexei Starovoitov
2022-12-07 16:49   ` Kumar Kartikeya Dwivedi
2022-12-07 19:05     ` Alexei Starovoitov
2022-12-17  8:59       ` Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 03/13] bpf: Minor refactor of ref_set_release_on_unlock Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 04/13] bpf: rename list_head -> datastructure_head in field info types Dave Marchevsky
2022-12-07  1:41   ` Alexei Starovoitov
2022-12-07 18:52     ` Dave Marchevsky
2022-12-07 19:01       ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
2022-12-07  1:48   ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
2022-12-07 14:20   ` kernel test robot
2022-12-06 23:09 ` [PATCH bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
2022-12-07  1:51   ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
2022-12-07  2:01   ` Alexei Starovoitov
2022-12-17  8:49     ` Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
2022-12-07  2:18   ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0 Dave Marchevsky
2022-12-07  2:39   ` Alexei Starovoitov
2022-12-07  6:46     ` Dave Marchevsky
2022-12-07 18:06       ` Alexei Starovoitov
2022-12-07 23:39         ` Dave Marchevsky
2022-12-08  0:47           ` Alexei Starovoitov
2022-12-08  8:50             ` Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 11/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 12/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type Dave Marchevsky
2022-12-06 23:10 ` [PATCH bpf-next 13/13] selftests/bpf: Add rbtree selftests Dave Marchevsky
2022-12-07  2:50 ` [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure patchwork-bot+netdevbpf
2022-12-07 19:36 ` Kumar Kartikeya Dwivedi
2022-12-07 22:28   ` Dave Marchevsky
2022-12-07 23:06     ` Alexei Starovoitov
2022-12-08  1:18       ` Dave Marchevsky
2022-12-08  3:51         ` Alexei Starovoitov
2022-12-08  8:28           ` Dave Marchevsky
2022-12-08 12:57             ` Kumar Kartikeya Dwivedi
2022-12-08 20:36               ` Alexei Starovoitov
2022-12-08 23:35                 ` Dave Marchevsky
2022-12-09  0:39                   ` Alexei Starovoitov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221209003939.tsgkghhwznj44agl@macbook-pro-6.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=davemarchevsky@meta.com \
    --cc=kernel-team@fb.com \
    --cc=memxor@gmail.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.