All of lore.kernel.org
 help / color / mirror / Atom feed
From: Cong Wang <xiyou.wangcong@gmail.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: "Martin KaFai Lau" <kafai@fb.com>,
	"Linux Kernel Network Developers" <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Cong Wang" <cong.wang@bytedance.com>,
	"Jamal Hadi Salim" <jhs@mojatatu.com>,
	"Jiri Pirko" <jiri@resnulli.us>
Subject: Re: [RFC Patch net-next] net_sched: introduce eBPF based Qdisc
Date: Fri, 3 Sep 2021 18:05:55 -0700	[thread overview]
Message-ID: <CAM_iQpUSJ=QRdjwo-0DG4O1MRkhu8k1yQQx2KM_UhPp=bkeOeA@mail.gmail.com> (raw)
In-Reply-To: <612f137f4dc5c_152fe20891@john-XPS-13-9370.notmuch>

On Tue, Aug 31, 2021 at 10:45 PM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Cong Wang wrote:
> > On Tue, Aug 24, 2021 at 4:47 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > Please explain more on this.  What is currently missing
> > > to make qdisc in struct_ops possible?
> >
> > I think you misunderstand this point. The reason why I avoid it is
> > _not_ anything is missing, quite oppositely, it is because it requires
> > a lot of work to implement a Qdisc with struct_ops approach, literally
> > all those struct Qdisc_ops (not to mention struct Qdisc_class_ops).
> > WIth current approach, programmers only need to implement two
> > eBPF programs (enqueue and dequeue).
> >
> > Thanks.
>
> Another idea. Rather than work with qdisc objects which creates all
> these issues with how to work with existing interfaces, filters, etc.
> Why not create an sk_buff map? Then this can be used from the existing
> egress/ingress hooks independent of the actual qdisc being used.

Because it is pointless to expose them to user-space in this context.
For example, what is the point of dropping a packet from user-space
in the context of Qdisc?

And I don't think there is a way for user-space to read those skb's inside
a map, which makes it more pointless.

>
> You mention skb should not be exposed to userspace? Why? Whats the
> reason for this? Anyways we can make kernel only maps if we want or
> scrub the data before passing it to userspace. We do this already in
> some cases.

I am not aware of any kernel-only map. For starters, we can't create
a map from kernel-space.

>
> IMO it seems cleaner and more general to allow sk_buffs
> to be stored in maps and pulled back out later for enqueue/dequeue.

Which exact map are you referring to? The queue map? It would only
provide FIFO. We want to give users as much freedom to order the
skbs as we can, I doubt hashmap could offer such freedom.

>
> I think one trick might be how to trigger the dequeue event on
> transition from stopped to running net_device or other events like
> this, but that could be solved with another program attached to
> those events to kick the dequeue logic.

I think we can still use current enqueue/dequeue eBPF program
except we need to transfer skb ownership and storage to map.

Thanks.

      parent reply	other threads:[~2021-09-04  1:06 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-21  1:02 [RFC Patch net-next] net_sched: introduce eBPF based Qdisc Cong Wang
2021-08-24 23:47 ` Martin KaFai Lau
2021-09-01  4:39   ` Cong Wang
2021-09-01  5:45     ` John Fastabend
2021-09-01 10:42       ` Toke Høiland-Jørgensen
2021-09-01 17:45         ` Martin KaFai Lau
2021-09-01 18:03           ` Alexei Starovoitov
2021-09-02 16:57           ` Toke Høiland-Jørgensen
2021-09-02 20:40             ` John Fastabend
2021-09-02 22:27               ` Toke Høiland-Jørgensen
2021-09-02 23:35                 ` Martin KaFai Lau
2021-09-03 14:44                   ` Toke Høiland-Jørgensen
2021-09-03 15:33                     ` Jamal Hadi Salim
2021-09-10  6:55                     ` Martin KaFai Lau
2021-09-10 11:31                       ` Toke Høiland-Jørgensen
2021-09-04  1:09           ` Cong Wang
2021-09-17  4:19             ` Martin KaFai Lau
2021-09-04  1:30         ` Cong Wang
2021-09-06 11:45           ` Toke Høiland-Jørgensen
2021-09-04  1:05       ` Cong Wang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAM_iQpUSJ=QRdjwo-0DG4O1MRkhu8k1yQQx2KM_UhPp=bkeOeA@mail.gmail.com' \
    --to=xiyou.wangcong@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.