All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Martin KaFai Lau <kafai@fb.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Cong Wang <cong.wang@bytedance.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Jiri Pirko <jiri@resnulli.us>
Subject: Re: [RFC Patch net-next] net_sched: introduce eBPF based Qdisc
Date: Mon, 06 Sep 2021 13:45:38 +0200	[thread overview]
Message-ID: <87fsuiq659.fsf@toke.dk> (raw)
In-Reply-To: <CAM_iQpUhmYBvu7p_jdiYxxPLqMmo3EFfRPfEsciCypUpM58UnQ@mail.gmail.com>

Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Wed, Sep 1, 2021 at 3:42 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> John Fastabend <john.fastabend@gmail.com> writes:
>>
>> > Cong Wang wrote:
>> >> On Tue, Aug 24, 2021 at 4:47 PM Martin KaFai Lau <kafai@fb.com> wrote:
>> >> > Please explain more on this.  What is currently missing
>> >> > to make qdisc in struct_ops possible?
>> >>
>> >> I think you misunderstand this point. The reason why I avoid it is
>> >> _not_ anything is missing, quite oppositely, it is because it requires
>> >> a lot of work to implement a Qdisc with struct_ops approach, literally
>> >> all those struct Qdisc_ops (not to mention struct Qdisc_class_ops).
>> >> WIth current approach, programmers only need to implement two
>> >> eBPF programs (enqueue and dequeue).
>> >>
>> >> Thanks.
>> >
>> > Another idea. Rather than work with qdisc objects which creates all
>> > these issues with how to work with existing interfaces, filters, etc.
>> > Why not create an sk_buff map? Then this can be used from the existing
>> > egress/ingress hooks independent of the actual qdisc being used.
>>
>> I agree. In fact, I'm working on doing just this for XDP, and I see no
>> reason why the map type couldn't be reused for skbs as well. Doing it
>> this way has a couple of benefits:
>
> I do see a lot of reasons, for starters, struct skb_buff is very different
> from struct xdp_buff, any specialized map can not be reused. I guess you
> are using a generic one, how do you handle the refcnt at least for skb?

Well, you can't keep XDP frames and skbs in the same map instance, but
you can create a map type that can be instantiated to hold either type
and otherwise keep the same semantics. The map can just inc/dec the
refcnt as skbs are added/removed from it.

>> - It leaves more flexibility to BPF: want a simple FIFO queue? just
>>   implement that with a single queue map. Or do you want to build a full
>>   hierarchical queueing structure? Just instantiate as many queue maps
>>   as you need to achieve this. Etc.
>
> Please give an example without a queue. ;) Queue is too simple, show us
> something more useful please. How do you plan to re-implement EDT with
> just queues?

I'm using 'queue' as a shorthand for any queueing/scheduling algorithm
implementable by a qdisc. We need to cover them all, obviously, not just
FIFO queues (in fact I think we should actively be discouraging those,
but that's a different story :) )

For EDT it would be something like:

- On enqueue, stick frames into the map with a rank corresponding to
  their transmission time (the map implements the PIFO queue, just like
  your patch).

- (re-)arm a BPF timer to fire at the time of the next transmission
  event, and have that timer trigger interface TX.

The first bit is straight-forward, and that last bit needs a new helper
or something like it. For qdiscs I guess we could just expose
qdisc_watchdog()? For XDP we'd need something new...


>> - The behaviour is defined entirely by BPF program behaviour, and does
>>   not require setting up a qdisc hierarchy in addition to writing BPF
>>   code.
>
> I have no idea why you call this a benefit, because my goal is to
> replace Qdisc's, not to replace any other things. You know there are
> plenty of Qdisc's which are not implemented in Linux kernel.

It's a benefit because it means you can keep everything together. I.e.,
you don't need to *both* write BPF code implementing your qdisc, *and* a
setup script to build the qdisc hierarchy. That simplifies deployment.

I suppose we could support inserting BPF qdiscs into a qdisc hierarchy
as well if needed. I don't personally see much use for that, but if
there's a use case, sure, why not?

>> - It should be possible to structure the hooks in a way that allows
>>   reusing queueing algorithm implementations between the qdisc and XDP
>>   layers.
>
> XDP has no skb but xdp_buff, no? And again, why only queues?

See above :)

-Toke


  reply	other threads:[~2021-09-06 11:45 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-21  1:02 [RFC Patch net-next] net_sched: introduce eBPF based Qdisc Cong Wang
2021-08-24 23:47 ` Martin KaFai Lau
2021-09-01  4:39   ` Cong Wang
2021-09-01  5:45     ` John Fastabend
2021-09-01 10:42       ` Toke Høiland-Jørgensen
2021-09-01 17:45         ` Martin KaFai Lau
2021-09-01 18:03           ` Alexei Starovoitov
2021-09-02 16:57           ` Toke Høiland-Jørgensen
2021-09-02 20:40             ` John Fastabend
2021-09-02 22:27               ` Toke Høiland-Jørgensen
2021-09-02 23:35                 ` Martin KaFai Lau
2021-09-03 14:44                   ` Toke Høiland-Jørgensen
2021-09-03 15:33                     ` Jamal Hadi Salim
2021-09-10  6:55                     ` Martin KaFai Lau
2021-09-10 11:31                       ` Toke Høiland-Jørgensen
2021-09-04  1:09           ` Cong Wang
2021-09-17  4:19             ` Martin KaFai Lau
2021-09-04  1:30         ` Cong Wang
2021-09-06 11:45           ` Toke Høiland-Jørgensen [this message]
2021-09-04  1:05       ` Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fsuiq659.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.