All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Zvi Effron <zeffron@riotgames.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Lorenz Bauer <lmb@cloudflare.com>,
	Lorenzo Bianconi <lbianconi@redhat.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>
Subject: Re: Redux: Backwards compatibility for XDP multi-buff
Date: Thu, 23 Sep 2021 20:45:29 +0200	[thread overview]
Message-ID: <87bl4jyvue.fsf@toke.dk> (raw)
In-Reply-To: <CAC1LvL2ZFHqqD4jkXdRNY0K-Sm-adb8OpQVcfv--aaQ+Z4j0EQ@mail.gmail.com>

Zvi Effron <zeffron@riotgames.com> writes:

> On Wed, Sep 22, 2021 at 1:01 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Jakub Kicinski <kuba@kernel.org> writes:
>>
>> > On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote:
>> >> >> Neither of those are desirable outcomes, I think; and if we add a
>> >> >> separate "XDP multi-buff" switch, we might as well make it system-wide?
>> >> >
>> >> > If we have an internal flag 'this driver supports multi-buf xdp' cannot we
>> >> > make xdp_redirect to linearize in case the packet is being redirected
>> >> > to non multi-buf aware driver (potentially with corresponding non mb aware xdp
>> >> > progs attached) from mb aware driver?
>> >>
>> >> Hmm, the assumption that XDP frames take up at most one page has been
>> >> fundamental from the start of XDP. So what does linearise mean in this
>> >> context? If we get a 9k packet, should we dynamically allocate a
>> >> multi-page chunk of contiguous memory and copy the frame into that, or
>> >> were you thinking something else?
>> >
>> > My $.02 would be to not care about redirect at all.
>> >
>> > It's not like the user experience with redirect is anywhere close
>> > to amazing right now. Besides (with the exception of SW devices which
>> > will likely gain mb support quickly) mixed-HW setups are very rare.
>> > If the source of the redirect supports mb so will likely the target.
>>
>> It's not about device support it's about XDP program support: If I run
>> an MB-aware XDP program on a physical interface and redirect the (MB)
>> frame into a container, and there's an XDP program running inside that
>> container that isn't MB-aware, bugs will ensue. Doesn't matter if the
>> veth driver itself supports MB...
>>
>> We could leave that as a "don't do that, then" kind of thing, but that
>> was what we were proposing (as the "do nothing" option) and got some
>> pushback on, hence why we're having this conversation :)
>>
>> -Toke
>>
>
> I hadn't even considered the case of redirecting to a veth pair on the same
> system. I'm assuming from your statement that the buffers are passed directly
> to the ingress inside the container and don't go through the sort of egress
> process they would if leaving the system? And I'm assuming that's as an
> optimization?

Yeah, if we redirect an XDP frame to a veth, the peer will get the same
xdp_frame, without ever building an SKB.

> I'm not sure that makes a difference, though. It's not about whether the
> driver's code is mb-capable, it's about whether the driver _as currently
> configured_ could generate multiple buffers. If it can, then only an mb-aware
> program should be able to be attached to it (and tail called from whatever's
> attached to it). If it can't, then there should be no way to have multiple
> buffers come to it.
>
> So in the situation you've described, either the veth driver should be in a
> state where it coalesces the multiple buffers into one, fragmenting the frame
> if necessary or drops the frame, or the program attached inside the container
> would need to be mb-aware. I'm assuming with the veth driver as written, this
> might mean that all programs attached to the veth driver would need to be
> mb-aware, which is obviously undesirable.

Hmm, I guess that as long as mb-frames only show up for large MTUs, the
MTU of the veth device would be a limiting factor just like for physical
devices, so we could just apply the same logic there. Not sure why I
didn't consider that before :/

> All of which significantly adds to the complexity to support mb-aware, so maybe
> this could be developed later? Initially we could have a sysctl toggling the
> state 0 single-buffer only, 1 multibuffer allowed. Then later we _could_ add a
> state for dynamic control once all XDP supporting drivers support the necessary
> dynamic functionality (if ever). At that point we'd have actual experience with
> the sysctl and could see how much of a burden having static control is.
>
> I may have been misinterpreting your use case though, and you were talking
> about the XDP program running on the egress side of the redirect? Is that what
> you were talking about case?

No I was talking about exactly what you outlined above. Although longer
term, I also think we can use XDP mb as a way to avoid having to
linearise SKBs when running XDP on them in veth (and for generic XDP) :)

-Toke


  reply	other threads:[~2021-09-23 18:45 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-21 16:06 Redux: Backwards compatibility for XDP multi-buff Toke Høiland-Jørgensen
2021-09-21 17:31 ` Zvi Effron
2021-09-21 18:22   ` Toke Høiland-Jørgensen
2021-09-21 19:17     ` Zvi Effron
2021-09-21 22:14       ` Toke Høiland-Jørgensen
2021-09-21 23:10         ` Zvi Effron
2021-09-22 20:13           ` Toke Høiland-Jørgensen
2021-09-21 20:12     ` Alexei Starovoitov
2021-09-21 22:20       ` Toke Høiland-Jørgensen
2021-09-21 22:51         ` Jakub Kicinski
2021-09-22 20:01           ` Toke Høiland-Jørgensen
2021-09-22 21:23             ` Zvi Effron
2021-09-23 18:45               ` Toke Høiland-Jørgensen [this message]
2021-09-23 13:46             ` Jakub Kicinski
2021-09-27 12:43               ` Jesper Dangaard Brouer
2021-09-21 22:54 ` Jakub Kicinski
2021-09-22 20:02   ` Toke Høiland-Jørgensen
2021-09-22 21:11     ` Zvi Effron
2021-09-23 19:00       ` Toke Høiland-Jørgensen
2021-09-23 10:33 ` Lorenz Bauer
2021-09-23 12:59   ` Toke Høiland-Jørgensen
2021-09-24 10:18     ` Lorenz Bauer
2021-09-24 17:55       ` Zvi Effron
2021-09-24 19:38       ` Toke Høiland-Jørgensen
2021-09-28  8:47         ` Lorenz Bauer
2021-09-28 13:43           ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bl4jyvue.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=lbianconi@redhat.com \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=zeffron@riotgames.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.