From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: bpf@vger.kernel.org, andrii@kernel.org, ast@kernel.org,
daniel@iogearbox.net
Subject: Re: [PATCH v1 bpf-next 4/5] bpf: Add bpf_dynptr_clone
Date: Mon, 17 Apr 2023 16:46:13 -0700 [thread overview]
Message-ID: <CAEf4BzZPUmcSAPwtgVRebCDWccaU6EC3yHt99Asm=akoewbBEA@mail.gmail.com> (raw)
In-Reply-To: <CAJnrk1b+FsAUHneWdyarMs6kwd8CBNFi9s7n38KXQ2uF+NkvTw@mail.gmail.com>
On Thu, Apr 13, 2023 at 11:03 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Wed, Apr 12, 2023 at 3:12 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sat, Apr 8, 2023 at 8:34 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > The cloned dynptr will point to the same data as its parent dynptr,
> > > with the same type, offset, size and read-only properties.
> > >
> > > Any writes to a dynptr will be reflected across all instances
> > > (by 'instance', this means any dynptrs that point to the same
> > > underlying data).
> > >
> > > Please note that data slice and dynptr invalidations will affect all
> > > instances as well. For example, if bpf_dynptr_write() is called on an
> > > skb-type dynptr, all data slices of dynptr instances to that skb
> > > will be invalidated as well (eg data slices of any clones, parents,
> > > grandparents, ...). Another example is if a ringbuf dynptr is submitted,
> > > any instance of that dynptr will be invalidated.
> > >
> > > Changing the view of the dynptr (eg advancing the offset or
> > > trimming the size) will only affect that dynptr and not affect any
> > > other instances.
> > >
> > > One example use case where cloning may be helpful is for hashing or
> > > iterating through dynptr data. Cloning will allow the user to maintain
> > > the original view of the dynptr for future use, while also allowing
> > > views to smaller subsets of the data after the offset is advanced or the
> > > size is trimmed.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > ---
> > > kernel/bpf/helpers.c | 14 +++++
> > > kernel/bpf/verifier.c | 125 +++++++++++++++++++++++++++++++++++++-----
> > > 2 files changed, 126 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index bac4c6fe49f0..108f3bcfa6da 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > > @@ -2351,6 +2351,19 @@ __bpf_kfunc __u32 bpf_dynptr_get_offset(const struct bpf_dynptr_kern *ptr)
> > > return ptr->offset;
> > > }
> > >
> > > +__bpf_kfunc int bpf_dynptr_clone(struct bpf_dynptr_kern *ptr,
> > > + struct bpf_dynptr_kern *clone__uninit)
> >
> > I think most of uses for bpf_dynptr_clone() would be to get a partial
> > view (like you mentioned above, to, e.g., do a hashing of a part of
> > the memory range). So it feels it would be best UX if clone would
> > allow you to define a new range in one go. So e.g., to create a
> > "sub-dynptr" for range of bytes [10, 30), it should be enough to:
> >
> > struct bpf_dynptr orig_ptr, new_ptr;
> >
> > bpf_dynptr_clone(&orig_ptr, 10, 30, &new_ptr);
> >
> > Instead of:
> >
> > bpf_dynptr_clone(&orig_ptr, &new_ptr);
> > bpf_dynptr_advance(&new_ptr, 10);
> > bpf_dynptr_trim(&new_ptr, bpf_dynptr_get_size(&orig_ptr) - 30);
> >
>
> I don't think we can do this, because bpf_dynptr_clone() can't be
> allowed to fail (which could happen if we took in a range and the
> range was invalid). When the clone is of a reference-counted dynptr
> (eg a ringbuf-type dynptr), the verifier must treat the clone as a
> legit dynptr even if it's invalid, since it can't know ahead of time
> whether the clone call will succeed at runtime. That means if the
> invalid clone dynptr is then passed into a reference-releasing
> function, the verifier will release the reference, but the actual
> reference won't be released at runtime (since the clone dynptr is
> invalid), which would leak the reference. An example is something
> like:
>
> // invalid range is passed, error is returned and new_ptr is invalid
> bpf_dynptr_clone(&ringbuf_ptr, 9999999, 9999999, &new_ptr);
> // this releases the reference and invalidates both new_ptr and ringbuf_ptr
> bpf_ringbuf_discard_dynptr(&new_ptr, 0);
>
> At runtime, bpf_ringbuf_discard_dynptr() will be a no-op since new_ptr
> is invalid - the ringbuf record still needs to be submitted/discarded,
> but the verifier will think this already happened
Ah, tricky, good point. But ok, I guess with the bpf_dynptr_adjust()
proposal in the other email this would be ok:

bpf_dynptr_clone(...); /* always succeeds */
bpf_dynptr_adjust(&new_ptr, 10, 30); /* could fail to adjust, but dynptr is left valid */

Right?
>
> >
> > This, btw, shows the awkwardness of the bpf_dynptr_trim() approach.
> >
> > If someone really wanted an exact full-sized copy, it's trivial:
> >
> > bpf_dynptr_clone(&orig_ptr, 0, bpf_dynptr_get_size(&orig_ptr), &new_ptr);
> >
> >
> > BTW, let's rename bpf_dynptr_get_size -> bpf_dynptr_size()? That
> > "get_" is a sore in the eye, IMO.
>
> Will do!
> >
> >
> > > +{
> > > + if (!ptr->data) {
> > > + bpf_dynptr_set_null(clone__uninit);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + memcpy(clone__uninit, ptr, sizeof(*clone__uninit));
> >
> > why memcpy instead of `*clone__uninit = *ptr`?
> >
> No good reason :) I'll change this for v2
> > > +
> > > + return 0;
> > > +}
> > > +
> > > __bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
> > > {
> > > return obj;
> > > @@ -2429,6 +2442,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> > > BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> > > BTF_ID_FLAGS(func, bpf_dynptr_get_size)
> > > BTF_ID_FLAGS(func, bpf_dynptr_get_offset)
> > > +BTF_ID_FLAGS(func, bpf_dynptr_clone)
> > > BTF_SET8_END(common_btf_ids)
> > >
> > > static const struct btf_kfunc_id_set common_kfunc_set = {
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 3660b573048a..804cb50050f9 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -302,6 +302,7 @@ struct bpf_kfunc_call_arg_meta {
> > > struct {
> > > enum bpf_dynptr_type type;
> > > u32 id;
> > > + u32 ref_obj_id;
> > > } initialized_dynptr;
> > > struct {
> > > u8 spi;
> > > @@ -963,24 +964,15 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
> > > return 0;
> > > }
> > >
> > > -static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > > +static void invalidate_dynptr(struct bpf_verifier_env *env, struct bpf_func_state *state, int spi)
> > > {
> > > - struct bpf_func_state *state = func(env, reg);
> > > - int spi, i;
> > > -
> > > - spi = dynptr_get_spi(env, reg);
> > > - if (spi < 0)
> > > - return spi;
> > > + int i;
> > >
> > > for (i = 0; i < BPF_REG_SIZE; i++) {
> > > state->stack[spi].slot_type[i] = STACK_INVALID;
> > > state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> > > }
> > >
> > > - /* Invalidate any slices associated with this dynptr */
> > > - if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type))
> > > - WARN_ON_ONCE(release_reference(env, state->stack[spi].spilled_ptr.ref_obj_id));
> > > -
> > > __mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
> > > __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
> > >
> > > @@ -1007,6 +999,51 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
> > > */
> > > state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > > state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > > +}
> > > +
> > > +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > > +{
> > > + struct bpf_func_state *state = func(env, reg);
> > > + int spi;
> > > +
> > > + spi = dynptr_get_spi(env, reg);
> > > + if (spi < 0)
> > > + return spi;
> > > +
> > > + if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
> > > + int ref_obj_id = state->stack[spi].spilled_ptr.ref_obj_id;
> > > + int i;
> > > +
> > > + /* If the dynptr has a ref_obj_id, then we need to invaldiate
> >
> > typo: invalidate
> >
> > > + * two things:
> > > + *
> > > + * 1) Any dynptrs with a matching ref_obj_id (clones)
> > > + * 2) Any slices associated with the ref_obj_id
> >
> > I think this is where this dynptr_id confusion comes from. The rule
> > should be "any slices derived from this dynptr". But as mentioned on
> > another thread, it's a separate topic which we can address later.
> >
> If there's a parent and a clone and slices derived from the parent and
> slices derived from the clone, if the clone is invalidated then we
> need to invalidate slices associated with the parent as well. So
> shouldn't it be "any slices associated with the ref obj id" not "any
> slices derived from this dynptr"? (also just a note, parent/clone
> slices will share the same ref obj id and the same dynptr id, so
> checking against either does the same thing)
So, we have a ringbuf dynptr with ref_obj_id=1, id=2, ok? We clone it,
clone gets ref_obj_id=1, id=3. If either original dynptr or clone
dynptr is released due to bpf_ringbuf_discard_dynptr(), we invalidate
all the dynptrs with ref_obj_id=1. During invalidation of each dynptr,
we invalidate all the slices with ref_obj_id==dynptr's id. So we'll
invalidate slices derived from dynptr with id=2 (original dynptr), and
then all the slices derived from dynptr with id=3?
>
> > > + */
> > > +
> > > + /* Invalidate any slices associated with this dynptr */
> > > + WARN_ON_ONCE(release_reference(env, ref_obj_id));
> > > +
> > > + /* Invalidate any dynptr clones */
> > > + for (i = 1; i < state->allocated_stack / BPF_REG_SIZE; i++) {
> > > + if (state->stack[i].spilled_ptr.ref_obj_id == ref_obj_id) {
> >
> > nit: invert if condition and continue to reduce nestedness of the rest
> > the loop body
>
> Ooh good idea
> >
> > > + /* it should always be the case that if the ref obj id
> > > + * matches then the stack slot also belongs to a
> > > + * dynptr
> > > + */
> > > + if (state->stack[i].slot_type[0] != STACK_DYNPTR) {
> > > + verbose(env, "verifier internal error: misconfigured ref_obj_id\n");
> > > + return -EFAULT;
> > > + }
> > > + if (state->stack[i].spilled_ptr.dynptr.first_slot)
> > > + invalidate_dynptr(env, state, i);
> > > + }
> > > + }
> > > +
> > > + return 0;
> > > + }
> > > +
> > > + invalidate_dynptr(env, state, spi);
> > >
> > > return 0;
> > > }
> > > @@ -6967,6 +7004,50 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
> > > return 0;
> > > }
> > >
> > > +static int handle_dynptr_clone(struct bpf_verifier_env *env, enum bpf_arg_type arg_type,
> > > + int regno, int insn_idx, struct bpf_kfunc_call_arg_meta *meta)
> > > +{
> > > + struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> > > + struct bpf_reg_state *first_reg_state, *second_reg_state;
> > > + struct bpf_func_state *state = func(env, reg);
> > > + enum bpf_dynptr_type dynptr_type = meta->initialized_dynptr.type;
> > > + int err, spi, ref_obj_id;
> > > +
> > > + if (!dynptr_type) {
> > > + verbose(env, "verifier internal error: no dynptr type for bpf_dynptr_clone\n");
> > > + return -EFAULT;
> > > + }
> > > + arg_type |= get_dynptr_type_flag(dynptr_type);
> >
> >
> > what about MEM_RDONLY and MEM_UNINIT flags, do we need to derive and add them?
>
> I don't think we need MEM_UNINIT because we can't clone an
> uninitialized dynptr. But yes, definitely MEM_RDONLY. I missed that, I
> will add it in in v2
>
> >
> > > +
> > > + err = process_dynptr_func(env, regno, insn_idx, arg_type);
> > > + if (err < 0)
> > > + return err;
> > > +
> > > + spi = dynptr_get_spi(env, reg);
> > > + if (spi < 0)
> > > + return spi;
> > > +
> > > + first_reg_state = &state->stack[spi].spilled_ptr;
> > > + second_reg_state = &state->stack[spi - 1].spilled_ptr;
> > > + ref_obj_id = first_reg_state->ref_obj_id;
> > > +
> > > + /* reassign the clone the same dynptr id as the original */
> > > + __mark_dynptr_reg(first_reg_state, dynptr_type, true, meta->initialized_dynptr.id);
> > > + __mark_dynptr_reg(second_reg_state, dynptr_type, false, meta->initialized_dynptr.id);
> >
> > I'm not sure why clone should have the same dynptr_id? Isn't it a new
> > instance of a dynptr? I get preserving ref_obj_id (if refcounted), but
> > why reusing dynptr_id?..
> >
> I think we need to also copy over the dynptr id because in the case of
> a non-reference counted dynptr, if the parent (or clone) is
> invalidated (eg overwriting bytes of the dynptr on the stack), we must
> also invalidate the slices of the clone (or parent)
yep, right now we'll have to do that because we have dynptr_id. But if
we get rid of it and stick to ref_obj_id and id, then clone would need
to get a new id, but keep ref_obj_id, right?
> >
> > > +
> > > + if (meta->initialized_dynptr.ref_obj_id) {
> > > + /* release the new ref obj id assigned during process_dynptr_func */
> > > + err = release_reference_state(cur_func(env), ref_obj_id);
> > > + if (err)
> > > + return err;
> >
> > ugh... this is not good to create reference just to release. If we
> > need to reuse logic, let's reuse the logic without parts that
> > shouldn't happen. Please check if we can split process_dynptr_func()
> > appropriately to allow this.
>
> I'll change this for v2. I think the simplest approach would be having
> mark_stack_slots_dynptr() take in a boolean or something that'll
> indicate whether it should acquire a new reference state or not
> >
> > > + /* reassign the clone the same ref obj id as the original */
> > > + first_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
> > > + second_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> >
> > [...]
Thread overview: 22+ messages
2023-04-09 3:34 [PATCH v1 bpf-next 0/5] Dynptr convenience helpers Joanne Koong
2023-04-09 3:34 ` [PATCH v1 bpf-next 1/5] bpf: Add bpf_dynptr_trim and bpf_dynptr_advance Joanne Koong
2023-04-12 21:46 ` Andrii Nakryiko
2023-04-14 5:15 ` Joanne Koong
2023-04-17 23:35 ` Andrii Nakryiko
2023-04-19 6:22 ` Joanne Koong
2023-04-19 16:30 ` Andrii Nakryiko
2023-04-09 3:34 ` [PATCH v1 bpf-next 2/5] bpf: Add bpf_dynptr_is_null and bpf_dynptr_is_rdonly Joanne Koong
2023-04-12 21:50 ` Andrii Nakryiko
2023-04-20 6:45 ` Joanne Koong
2023-04-09 3:34 ` [PATCH v1 bpf-next 3/5] bpf: Add bpf_dynptr_get_size and bpf_dynptr_get_offset Joanne Koong
2023-04-12 21:52 ` Andrii Nakryiko
2023-04-14 5:17 ` Joanne Koong
2023-04-09 3:34 ` [PATCH v1 bpf-next 4/5] bpf: Add bpf_dynptr_clone Joanne Koong
2023-04-12 22:12 ` Andrii Nakryiko
2023-04-14 6:02 ` Joanne Koong
2023-04-17 23:46 ` Andrii Nakryiko [this message]
2023-04-19 6:56 ` Joanne Koong
2023-04-19 16:34 ` Andrii Nakryiko
2023-04-17 18:53 ` kernel test robot
2023-04-09 3:34 ` [PATCH v1 bpf-next 5/5] selftests/bpf: add tests for dynptr convenience helpers Joanne Koong
2023-04-12 21:48 ` [PATCH v1 bpf-next 0/5] Dynptr " Andrii Nakryiko