Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc

From: Amery Hung <ameryhung@gmail.com>
To: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, toke@redhat.com,
	jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
	xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com,
	netdev@vger.kernel.org, Kui-Feng Lee <thinker.li@gmail.com>
Subject: Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc
Date: Fri, 9 Feb 2024 12:14:55 -0800
Message-ID: <CAMB2axMg1RQOaOA+5bvh234YK98o7vMmm5B4+VT__kS1=Tcqyw@mail.gmail.com>
In-Reply-To: <8a2e9cf6-ef36-4ba8-bb95-fb592bdce5db@linux.dev>

On Thu, Feb 1, 2024 at 5:47 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/31/24 8:23 AM, Amery Hung wrote:
> >>> 1. Passing a referenced kptr into a bpf program, which will also need
> >>> to be released, or exchanged into maps or allocated objects.
> >> "enqueue" should be the one to consider here:
> >>
> >> struct Qdisc_ops {
> >>          /* ... */
> >>          int                     (*enqueue)(struct sk_buff *skb,
> >>                                             struct Qdisc *sch,
> >>                                             struct sk_buff **to_free);
> >>
> >> };
> >>
> >> The verifier only marks the skb as a trusted kptr but does not mark its
> >> reg->ref_obj_id. Take a look at btf_ctx_access(). In particular:
> >>
> >>          if (prog_args_trusted(prog))
> >>                  info->reg_type |= PTR_TRUSTED;
> >>
> >> The verifier does not know the skb ownership is passed into the ".enqueue" ops
> >> and does not know the bpf prog needs to release it or store it in a map.
> >>
> >> The verifier tracks the reference state when a KF_ACQUIRE kfunc is called (just
> >> an example, not saying we need to use KF_ACQUIRE kfunc). Take a look at
> >> acquire_reference_state() which is the useful one here.
> >>
> >> Whenever the verifier is loading the ".enqueue" bpf_prog, the verifier can
> >> always acquire_reference_state() for the "struct sk_buff *skb" argument.
> >>
> >> Take a look at a recent RFC:
> >> https://lore.kernel.org/bpf/20240122212217.1391878-1-thinker.li@gmail.com/
> >> which tags the arguments of an ops (e.g. ".enqueue" here). That RFC patch
> >> marks an argument that could be NULL by appending "__nullable" to the
> >> argument name. The verifier then enforces that the bpf prog checks for NULL first.
> >>
> >> A similar idea can be used here but with a different tag (for example,
> >> "__must_release", admittedly not a good name). While the RFC patch is
> >> in progress, for now, maybe hardcode it for the ".enqueue" ops in
> >> check_struct_ops_btf_id() and always acquire_reference_state() for the skb. This
> >> part can be adjusted later once the RFC patch is in shape.
> >>
> > Makes sense. One more thing to consider here is that .enqueue is
> > actually a reference acquiring and releasing function at the same
> > time. Assuming ctx written to by a struct_ops program can be seen by
> > the kernel, another new tag for the "to_free" argument will still be
> > needed so that the verifier can recognize when the skb is written to
> > "to_free".
>
> I don't think "to_free" needs special tagging. I was thinking the
> "bpf_qdisc_drop" kfunc could be a KF_RELEASE. Ideally, it should be like
>
> __bpf_kfunc int bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
>                                struct sk_buff **to_free)
> {
>         return qdisc_drop(skb, sch, to_free);
> }
>
> However, I don't think the verifier supports pointer to pointer now, meaning
> "struct sk_buff **to_free" does not work.
>
> If the ptr indirection spinning in my head is sound, one possible solution to
> unblock the qdisc work is to introduce:
>
> struct bpf_sk_buff_ptr {
>         struct sk_buff *skb;
> };
>
> and the bpf_qdisc_drop kfunc:
>
> __bpf_kfunc int bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
>                                 struct bpf_sk_buff_ptr *to_free_list)
>
> and the enqueue prog:
>
> SEC("struct_ops/enqueue")
> int BPF_PROG(test_enqueue, struct sk_buff *skb,
>               struct Qdisc *sch,
>               struct bpf_sk_buff_ptr *to_free_list)
> {
>         return bpf_qdisc_drop(skb, sch, to_free_list);
> }
>
> and the ".is_valid_access" needs to change the btf_type from "struct sk_buff **"
> to "struct bpf_sk_buff_ptr *", which is similar to how bpf_tcp_ca.c changes
> the "struct sock *" type to the "struct tcp_sock *" type.
>
> I have the compiler-tested idea here:
> https://git.kernel.org/pub/scm/linux/kernel/git/martin.lau/bpf-next.git/log/?h=qdisc-ideas
>
>
> >
> >> Then one more thing is to track when the struct_ops bpf prog actually reads
> >> the value of the skb pointer. One thing worth mentioning here, e.g. a
> >> struct_ops prog for enqueue:
> >>
> >> SEC("struct_ops")
> >> int BPF_PROG(bpf_dropall_enqueue, struct sk_buff *skb, struct Qdisc *sch,
> >>               struct sk_buff **to_free)
> >> {
> >>          return bpf_qdisc_drop(skb, sch, to_free);
> >> }
> >>
> >> Take a look at the BPF_PROG macro: the bpf prog gets a pointer to an array
> >> of __u64 as its only argument. The skb is actually in ctx[0], sch is in
> >> ctx[1], etc. When ctx[0] is read to get the skb pointer (e.g. r1 = ctx[0]),
> >> btf_ctx_access() marks the reg_type as PTR_TRUSTED. It also needs to initialize
> >> the reg->ref_obj_id with the id obtained earlier from acquire_reference_state()
> >> during check_struct_ops_btf_id() somehow.
>
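
For anyone following the ctx[0]/ctx[1] point above, here is a minimal
user-space sketch of the unpacking that BPF_PROG performs. It is only a
model: the struct definitions are hypothetical stand-ins for the kernel
types, and enqueue_body()/enqueue_entry() are illustrative names, not
anything from libbpf or the patch set.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; only the pointers matter here. */
struct sk_buff { int len; };
struct Qdisc { int qlen; };

/* What the program body sees once the arguments have been unpacked. */
static int enqueue_body(struct sk_buff *skb, struct Qdisc *sch,
			struct sk_buff **to_free)
{
	(void)to_free;		/* unused in this sketch */
	sch->qlen++;
	return skb->len;
}

/*
 * Roughly what the macro-generated entry point does: it receives a single
 * array of 64-bit slots and casts each slot back to the typed argument.
 * The skb load from ctx[0] is the read the verifier would need to tie to
 * the reference id.
 */
static int enqueue_entry(unsigned long long *ctx)
{
	struct sk_buff *skb = (struct sk_buff *)(uintptr_t)ctx[0];
	struct Qdisc *sch = (struct Qdisc *)(uintptr_t)ctx[1];
	struct sk_buff **to_free = (struct sk_buff **)(uintptr_t)ctx[2];

	return enqueue_body(skb, sch, to_free);
}
```

A caller would fill a three-slot array with the addresses of an skb, a
Qdisc and a to_free pointer, then pass it to enqueue_entry().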

I appreciate the idea. The pointer indirection works without problems.
I now have a working FIFO bpf qdisc using struct_ops. I will explore
how the other parts of qdisc work with struct_ops.
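
To spell out the indirection for the list, here is a minimal user-space
sketch. It is not the code from the qdisc-ideas branch or from my tree:
the sk_buff and Qdisc definitions are cut-down stand-ins, and the
model_* names are made up for illustration. It assumes the drop path
counts the drop and chains the skb onto the head of the to_free list.

```c
#include <stddef.h>

#define NET_XMIT_DROP 0x01

/* Cut-down stand-ins for the kernel structs (assumption, not the real layout). */
struct sk_buff { struct sk_buff *next; int id; };
struct Qdisc { unsigned int drops; };

/* Wrapper from the quoted mail: "pointer to struct holding a pointer". */
struct bpf_sk_buff_ptr {
	struct sk_buff *skb;
};

/* Model of the drop path: count the drop, push skb onto the to_free list. */
static int model_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
			    struct sk_buff **to_free)
{
	sch->drops++;
	skb->next = *to_free;
	*to_free = skb;
	return NET_XMIT_DROP;
}

/*
 * Model of the kfunc: it takes the wrapper and forwards the address of its
 * only member, which is the "struct sk_buff **" the drop path expects.
 */
static int model_bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
				struct bpf_sk_buff_ptr *to_free_list)
{
	return model_qdisc_drop(skb, sch, &to_free_list->skb);
}
```

Dropping two skbs in a row leaves the second one at the head of the
list with its next pointer referring to the first.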

Thanks,
Amery