Re: [PATCH v13 bpf-next 09/10] bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr

From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"Joanne Koong" <joannelkoong@gmail.com>,
	bpf <bpf@vger.kernel.org>,
	"Martin KaFai Lau" <martin.lau@kernel.org>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Network Development" <netdev@vger.kernel.org>,
	"Toke Høiland-Jørgensen" <toke@kernel.org>
Subject: Re: [PATCH v13 bpf-next 09/10] bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr
Date: Fri, 10 Mar 2023 22:54:27 +0100	[thread overview]
Message-ID: <20230310215427.ncvhj2xqvjss4uj6@apollo> (raw)
In-Reply-To: <CAEf4BzYo-8ckyi-aogvW9HijNh+Z81CE__mWtmVJtCzuY+oECA@mail.gmail.com>

On Fri, Mar 10, 2023 at 10:29:45PM CET, Andrii Nakryiko wrote:
> On Fri, Mar 10, 2023 at 1:15 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Mar 07, 2023 at 04:01:28PM -0800, Andrii Nakryiko wrote:
> > > > > >
> > > > > > I agree this is simpler, but I'm not sure it will work properly. Verifier won't
> > > > > > know when the lifetime of the buffer ends, so if we disallow spills until its
> > > > > > written over it's going to be a pain for users.
> > > > > >
> > > > > > Something like:
> > > > > >
> > > > > > for (...) {
> > > > > >         char buf[64];
> > > > > >         bpf_dynptr_slice_rdwr(..., buf, 64);
> > > > > >         ...
> > > > > > }
> > > > > >
> > > > > > .. and then compiler decides to spill something where buf was located on stack
> > > > > > outside the for loop. The verifier can't know when buf goes out of scope to
> > > > > > unpoison the slots.
> > > > >
> > > > > You're saying the "verifier doesn't know when buf ...".
> > > > > The same applies to the compiler. It has no visibility
> > > > > into what bpf_dynptr_slice_rdwr is doing.
> > > >
> > > > That is true, it can't assume anything about the side effects. But I am talking
> > > > about the point in the program when the buffer object no longer lives. Use of
> > > > the escaped pointer to such an object any longer is UB. The compiler is well
> > > > within its rights to reuse its stack storage at that point, including for
> > > > spilling registers. Which is why "outside the for loop" in my earlier reply.
> > > >
> > > > > So it never spills into a declared C array
> > > > > as I tried to explain in the previous reply.
> > > > > Spill/fill slots are always invisible to C.
> > > > > (unless of course you do pointer arithmetic asm style)
> > > >
> > > > When the declared array's lifetime ends, it can.
> > > > https://godbolt.org/z/Ez7v4xfnv
> > > >
> > > > The 2nd call to bar as part of unrolled loop happens with fp-8, then it calls
> > > > baz, spills r0 to fp-8, and calls bar again with fp-8.
> >
> > Right. If user writes such program and does explicit store of spillable
> > pointer into a stack.
> > I was talking about compiler generated spill/fill and I still believe
> > that compiler will not be reusing variable's stack memory for them.
> >
> > > >
> > > > If such a stack slot is STACK_POISON, verifier will reject this program.
> >
> > Yes and I think it's an ok trade-off.
> > The user has to specifically code such program to hit this issue.
> > I don't think we will see this in practice.
> > If we do we can consider a more complex fix.
>
> I was just debugging (a completely unrelated) issue where two
> completely independent functions, with different local variables, were
> reusing the same stack slots just because of them being inlined in
> parent functions. So stack reuse happens all the time, unfortunately.
> It's not always obvious or malicious.
>
> >
> > > >
> > > > >
> > > > > > > > +       *(void **)eth = (void *)0xdeadbeef;
> > > > > > > > +       ctx = *(void **)buffer;
> > > > > > > > +       eth_proto = eth->eth_proto + ctx->len;
> > > > > > > >         if (eth_proto == bpf_htons(ETH_P_IP))
> > > > > > > >                 err = process_packet(&ptr, eth, nh_off, false, ctx);
> > > > > > > >
> > > > > > > > I think the proper fix is to treat it as a separate return type distinct from
> > > > > > > > PTR_TO_MEM like PTR_TO_MEM_OR_PKT (or handle PTR_TO_MEM | DYNPTR_* specially),
> > > > > > > > fork verifier state whenever there is a write, so that one path verifies it as
> > > > > > > > PTR_TO_PACKET, while another as PTR_TO_STACK (if buffer was a stack ptr). I
> > > > > > > > think for the rest it's not a problem, but there are allow_ptr_leak checks
> > > > > > > > applied to PTR_TO_STACK and PTR_TO_MAP_VALUE, so that needs to be rechecked.
> > > > > > > > Then we ensure that program is safe in either path.
> > > > > > > >
> > > > > > > > Also we need to fix regsafe to not consider other PTR_TO_MEMs equivalent to such
> > > > > > > > a pointer. We could also fork verifier states on return, to verify either path
> > > > > > > > separately right from the point following the call instruction.
> > > > > > >
> > > > > > > This is too complex imo.
> > > > > >
> > > > > > A better way to phrase this is to verify with R0 = PTR_TO_PACKET in one path,
> > > > > > and push_stack with R0 = buffer's reg->type + size set to len in the other path
> > > > > > for exploration later. In terms of verifier infra everything is there already,
> > > > > > it just needs to analyze both cases which fall into the regular code handling
> > > > > > the reg->type's. Probably then no adjustments to regsafe are needed either. It's
> > > > > > like exploring branch instructions.
> > > > >
> > > > > I still don't like it. There is no reason to go a complex path
> > > > > when much simpler suffices.
> > >
> > > This issue you are discussing is the reason we don't support
> > > bpf_dynptr_from_mem() taking PTR_TO_STACK (which is a pity, but we
> > > postponed it initially).
> > >
> > > I've been thinking about something along the lines of STACK_POISON,
> > > but remembering associated id/ref_obj_id. When ref is released, turn
> > > STACK_POISON to STACK_MISC. If it's bpf_dynptr_slice_rdrw() or
> > > bpf_dynptr_from_mem(), which don't have ref_obj_id, they still have ID
> > > associated with returned pointer, so can we somehow incorporate that?
> >
> > There is dynptr_id in PTR_TO_MEM that is used by destroy_if_dynptr_stack_slot(),
> > but I don't see how we can use it to help this case.
> > imo plain STACK_POISON that is overwriteable by STACK_MISC/STACK_ZERO
> > should be good enough in practice.
>
> That's basically what I'm proposing, except when this overwrite
> happens we have to go and invalidate all the PTR_TO_MEM references
> that are pointing to that stack slot. E.g., in the below case
> (assuming we allow LOCAL dynptr to be constructed from stack)
>
> char buf[256], *p;
> struct bpf_dynptr dptr;
>
> bpf_dynptr_from_mem(buf, buf+256, &dptr);
>
> p = bpf_dynptr_data(&dptr, 128, 16); /* get 16-byte slice into buf, at
> offset 128 */
>
> /* buf[128] through buf[128+16] are STACK_POISON */
>
> buf[128] = 123;
>
> So here is where the problem happens. Should we invalidate just p
> here? Or entire dptr? Haven't thought much about details, but
> something like that. It was getting messy when we started to think
> about this with Joanne.
>

I think there's also the option (for your particular case) to conservatively
mark the entire range a dynptr pointing to stack represents as STACK_MISC
whenever a *write* happens (through bpf_dynptr_write or pointers obtained using
bpf_dynptr_data). We do know exact memory start and length when creating the
dynptr, right?

If somebody tries to be funky, e.g. by doing a spill and then trying to
overwrite its value, the entire range becomes STACK_MISC, so reload would just
mark the reg as unknown. You can be a bit smarter when you know the exact start
and length of stack memory e.g. bpf_dynptr_data pointer points to, but I'm
unsure that will be needed.

Otherwise things work as normal, users can spill stuff to the stack if they
wants, and as long as they are not writing through the dynptr again, we don't
remark the entire range STACK_MISC. If that was the last use of dynptr and it
went out of scope, things work normally. If not, the dynptr and its buffer
should still be in scope so it won't be the compiler doing something funny
spilling stuff into it, only the user.

Due to STACK_INVALID complications over-eager remarking as STACK_MISC might only
make sense for privileged programs, but otherwise I think this is ok.

Am I missing something important?