All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Networking <netdev@vger.kernel.org>
Subject: Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
Date: Thu, 22 Apr 2021 14:46:10 +0200	[thread overview]
Message-ID: <8735vih4h9.fsf@toke.dk> (raw)
In-Reply-To: <ab19b29f-f7a7-5f4b-001f-28d57c6c203f@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 4/22/21 11:08 AM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>> On 4/21/21 9:48 PM, Toke Høiland-Jørgensen wrote:
>>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>>> On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
>>>>> <memxor@gmail.com> wrote:
>>> [...]
>>>>>> ---
>>>>>>    tools/lib/bpf/libbpf.h   |  44 ++++++
>>>>>>    tools/lib/bpf/libbpf.map |   3 +
>>>>>>    tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
>>>>>>    3 files changed, 360 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>>>>> index bec4e6a6e31d..b4ed6a41ea70 100644
>>>>>> --- a/tools/lib/bpf/libbpf.h
>>>>>> +++ b/tools/lib/bpf/libbpf.h
>>>>>> @@ -16,6 +16,8 @@
>>>>>>    #include <stdbool.h>
>>>>>>    #include <sys/types.h>  // for size_t
>>>>>>    #include <linux/bpf.h>
>>>>>> +#include <linux/pkt_sched.h>
>>>>>> +#include <linux/tc_act/tc_bpf.h>
>>>>>
>>>>> apart from those unused macros below, are these needed in public API header?
>>>>>
>>>>>>    #include "libbpf_common.h"
>>>>>>
>>>>>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>>>>>    LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>>>>>    LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>>>>>>
>>>>>> +/* Convenience macros for the clsact attach hooks */
>>>>>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
>>>>>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>>>>>
>>>>> these seem to be used only internally, why exposing them in public
>>>>> API?
>>>>
>>>> No they're "aliases" for when you want to attach the filter directly to
>>>> the interface (and thus install the clsact qdisc as the root). You can
>>>> also use the filter with an existing qdisc (most commonly HTB), in which
>>>> case you need to specify the qdisc handle as the root. We have a few
>>>> examples of this use case:
>>>>
>>>> https://github.com/xdp-project/bpf-examples/tree/master/traffic-pacing-edt
>>>> and
>>>> https://github.com/xdp-project/xdp-cpumap-tc
>>>
>>> I'm a bit puzzled, could you elaborate on your use case on why you wouldn't
>>> use the tc egress hook for those especially given it's guaranteed to run
>>> outside of root qdisc lock?
>> 
>> Jesper can correct me if I'm wrong, but I think the first one of the
>> links above is basically his implementation of just that EDT-based
>> shaper. And it works reasonably well, except you don't get the nice
>> per-flow scheduling and sparse flow prioritisation like in FQ-CoDel
>> unless you implement that yourself in BPF when you set the timestamps
>> (and that is by no means trivial to implement).
>> 
>> So if you want to use any of the features of the existing qdiscs (I have
>> also been suggesting to people that they use tc_bpf if they want to
>> customise sch_cake's notion of flows or shaping tiers), you need to be
>> able to attach the filter to an existing qdisc. Sure, this means you're
>> still stuck behind the qdisc lock, but for some applications that is
>> fine (not everything is a data centre, some devices don't have that many
>> CPUs anyway; and as the second example above shows, you can get around
>> the qdisc lock by some clever use of partitioning of flows using
>> cpumap).
>> 
>> So what this boils down to is, we should keep the 'parent' parameter not
>> just as an egress/ingress enum, but also as a field the user can fill
>> out. I'm fine with moving the latter into the opts struct, though, so
>> maybe the function parameter can be an enum like:
>> 
>> enum bpf_tc_attach_point {
>>    BPF_TC_CLSACT_INGRESS,
>>    BPF_TC_CLSACT_EGRESS,
>>    BPF_TC_QDISC_PARENT
>> };
>> 
>> where if you set the last one you have to fill in the parent in opts?
>
> Fair enough, I still think this is a bit backwards and should be discouraged
> given the constraints, but if you have an actual need for it ... I'd rather
> simplify API naming, the fact that it's clsact is an implementation detail
> and shouldn't matter to a user, like:
>
> enum bpf_tc_attach_point {
> 	BPF_TC_INGRESS,
> 	BPF_TC_EGRESS,
> 	BPF_TC_CUSTOM_PARENT,
> };
>
> For BPF_TC_INGRESS and BPF_TC_EGRESS, I would enforce opts parent parameter
> to be /zero/ from the API.

OK, SGTM :)

-Toke


  reply	other threads:[~2021-04-22 12:46 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-20 19:37 [PATCH bpf-next v3 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
2021-04-20 19:37 ` [PATCH bpf-next v3 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
2021-04-21  6:37   ` Yonghong Song
2021-04-20 19:37 ` [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
2021-04-21  6:58   ` Yonghong Song
2021-04-21 17:06     ` Kumar Kartikeya Dwivedi
2021-04-22  1:56       ` Yonghong Song
2021-04-21 18:19   ` Andrii Nakryiko
2021-04-21 19:48     ` Toke Høiland-Jørgensen
2021-04-21 23:14       ` Daniel Borkmann
2021-04-22  9:08         ` Toke Høiland-Jørgensen
2021-04-22 11:51           ` Daniel Borkmann
2021-04-22 12:46             ` Toke Høiland-Jørgensen [this message]
2021-04-21 22:59   ` Daniel Borkmann
2021-04-21 23:08     ` Kumar Kartikeya Dwivedi
2021-04-21 23:21       ` Daniel Borkmann
2021-04-21 23:30         ` Kumar Kartikeya Dwivedi
2021-04-21 23:41           ` Daniel Borkmann
2021-04-22  9:47             ` Shaun Crampton
2021-04-22 11:26               ` Daniel Borkmann
2021-04-22  3:43     ` Andrii Nakryiko
2021-04-22 15:35       ` Daniel Borkmann
2021-04-22 18:28         ` Andrii Nakryiko
2021-04-20 19:37 ` [PATCH bpf-next v3 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
2021-04-21  7:01   ` Yonghong Song
2021-04-21 18:24   ` Andrii Nakryiko
2021-04-21 19:56     ` Kumar Kartikeya Dwivedi
2021-04-21 20:38       ` Toke Høiland-Jørgensen
2021-04-21 22:41         ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8735vih4h9.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.