From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, toke@redhat.com,
jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com,
netdev@vger.kernel.org, Kui-Feng Lee <thinker.li@gmail.com>
Subject: Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc
Date: Thu, 1 Feb 2024 17:47:44 -0800 [thread overview]
Message-ID: <8a2e9cf6-ef36-4ba8-bb95-fb592bdce5db@linux.dev> (raw)
In-Reply-To: <CAMB2axOdeE5dPeFGvgM5QVd9a47srtvDFZd1VUYjSarNJC=T_w@mail.gmail.com>
On 1/31/24 8:23 AM, Amery Hung wrote:
>>> 1. Passing a referenced kptr into a bpf program, which will also need
>>> to be released, or exchanged into maps or allocated objects.
>> "enqueue" should be the one considering here:
>>
>> struct Qdisc_ops {
>> /* ... */
>> int (*enqueue)(struct sk_buff *skb,
>> struct Qdisc *sch,
>> struct sk_buff **to_free);
>>
>> };
>>
>> The verifier only marks the skb as a trusted kptr but does not mark its
>> reg->ref_obj_id. Take a look at btf_ctx_access(). In particular:
>>
>> if (prog_args_trusted(prog))
>> info->reg_type |= PTR_TRUSTED;
>>
>> The verifier does not know the skb ownership is passed into the ".enqueue" ops
>> and does not know the bpf prog needs to release it or store it in a map.
>>
>> The verifier tracks the reference state when a KF_ACQUIRE kfunc is called (just
>> an example, not saying we need to use KF_ACQUIRE kfunc). Take a look at
>> acquire_reference_state() which is the useful one here.
>>
>> Whenever the verifier is loading the ".enqueue" bpf_prog, the verifier can
>> always acquire_reference_state() for the "struct sk_buff *skb" argument.
>>
>> Take a look at a recent RFC:
>> https://lore.kernel.org/bpf/20240122212217.1391878-1-thinker.li@gmail.com/
>> which is tagging the argument of an ops (e.g. ".enqueue" here). That RFC patch
>> is tagging the argument could be NULL by appending "__nullable" to the argument
>> name. The verifier will enforce that the bpf prog must check for NULL first.
>>
>> The similar idea can be used here but with a different tagging (for example,
>> "__must_release", admittedly not a good name). While the RFC patch is
>> in-progress, for now, may be hardcode for the ".enqueue" ops in
>> check_struct_ops_btf_id() and always acquire_reference_state() for the skb. This
>> part can be adjusted later once the RFC patch will be in shape.
>>
> Make sense. One more thing to consider here is that .enqueue is
> actually a reference acquiring and releasing function at the same
> time. Assuming ctx written to by a struct_ops program can be seen by
> the kernel, another new tag for the "to_free" argument will still be
> needed so that the verifier can recognize when writing skb to
> "to_free".
I don't think "to_free" needs special tagging. I was thinking the
"bpf_qdisc_drop" kfunc could be a KF_RELEASE. Ideally, it should be like
__bpf_kfunc int bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
return qdisc_drop(skb, sch, to_free);
}
However, I don't think the verifier supports pointer to pointer now. Meaning
"struct sk_buff **to_free" does not work.
If the ptr indirection spinning in my head is sound, one possible solution to
unblock the qdisc work is to introduce:
struct bpf_sk_buff_ptr {
struct sk_buff *skb;
};
and the bpf_qdisc_drop kfunc:
__bpf_kfunc int bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
struct bpf_sk_buff_ptr *to_free_list)
and the enqueue prog:
SEC("struct_ops/enqueue")
int BPF_PROG(test_enqueue, struct sk_buff *skb,
struct Qdisc *sch,
struct bpf_sk_buff_ptr *to_free_list)
{
return bpf_qdisc_drop(skb, sch, to_free_list);
}
and the ".is_valid_access" needs to change the btf_type from "struct sk_buff **"
to "struct bpf_sk_buff_ptr *" which is sort of similar to the bpf_tcp_ca.c that
is changing the "struct sock *" type to the "struct tcp_sock *" type.
I have the compiler-tested idea here:
https://git.kernel.org/pub/scm/linux/kernel/git/martin.lau/bpf-next.git/log/?h=qdisc-ideas
>
>> Then one more thing is to track when the struct_ops bpf prog is actually reading
>> the value of the skb pointer. One thing is worth to mention here, e.g. a
>> struct_ops prog for enqueue:
>>
>> SEC("struct_ops")
>> int BPF_PROG(bpf_dropall_enqueue, struct sk_buff *skb, struct Qdisc *sch,
>> struct sk_buff **to_free)
>> {
>> return bpf_qdisc_drop(skb, sch, to_free);
>> }
>>
>> Take a look at the BPF_PROG macro, the bpf prog is getting a pointer to an array
>> of __u64 as the only argument. The skb is actually in ctx[0], sch is in
>> ctx[1]...etc. When ctx[0] is read to get the skb pointer (e.g. r1 = ctx[0]),
>> btf_ctx_access() marks the reg_type to PTR_TRUSTED. It needs to also initialize
>> the reg->ref_obj_id by the id obtained earlier from acquire_reference_state()
>> during check_struct_ops_btf_id() somehow.
next prev parent reply other threads:[~2024-02-02 1:47 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-17 21:56 [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 1/8] " Amery Hung
2024-01-23 23:51 ` Martin KaFai Lau
2024-01-24 5:22 ` Amery Hung
2024-01-26 2:22 ` Martin KaFai Lau
2024-01-27 1:17 ` Amery Hung
2024-01-30 6:39 ` Martin KaFai Lau
2024-01-30 17:49 ` Kui-Feng Lee
2024-01-31 1:01 ` Martin KaFai Lau
2024-01-31 16:49 ` Kui-Feng Lee
2024-01-31 16:59 ` Amery Hung
2024-01-31 16:23 ` Amery Hung
2024-02-02 1:47 ` Martin KaFai Lau [this message]
2024-02-09 20:14 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 2/8] net_sched: Add kfuncs for working with skb Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 3/8] net_sched: Introduce kfunc bpf_skb_tc_classify() Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 4/8] net_sched: Add reset program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 5/8] net_sched: Add init program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 6/8] tools/libbpf: Add support for BPF_PROG_TYPE_QDISC Amery Hung
2024-01-23 0:17 ` Andrii Nakryiko
2024-01-23 19:40 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 7/8] samples/bpf: Add an example of bpf fq qdisc Amery Hung
2024-01-24 10:29 ` Daniel Borkmann
2024-01-26 19:49 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 8/8] samples/bpf: Add an example of bpf netem qdisc Amery Hung
2024-01-23 21:13 ` [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Stanislav Fomichev
2024-01-24 10:10 ` Daniel Borkmann
2024-01-24 12:09 ` Jamal Hadi Salim
2024-01-24 13:07 ` Daniel Borkmann
2024-01-24 14:11 ` Jamal Hadi Salim
2024-01-24 15:26 ` Daniel Borkmann
2024-01-24 21:26 ` Amery Hung
2024-01-25 11:57 ` Daniel Borkmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8a2e9cf6-ef36-4ba8-bb95-fb592bdce5db@linux.dev \
--to=martin.lau@linux.dev \
--cc=ameryhung@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=netdev@vger.kernel.org \
--cc=sdf@google.com \
--cc=thinker.li@gmail.com \
--cc=toke@redhat.com \
--cc=xiyou.wangcong@gmail.com \
--cc=yangpeihao@sjtu.edu.cn \
--cc=yepeilin.cs@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).