From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Zvi Effron <zeffron@riotgames.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>,
	Lorenzo Bianconi <lbianconi@redhat.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	netdev@vger.kernel.org, bpf <bpf@vger.kernel.org>
Subject: Re: Redux: Backwards compatibility for XDP multi-buff
Date: Tue, 21 Sep 2021 20:22:53 +0200
Message-ID: <87ilyt3i0y.fsf@toke.dk>
In-Reply-To: <CAC1LvL1xgFMjjE+3wHH79_9rumwjNqDAS2Yg2NpSvmewHsYScA@mail.gmail.com>

Zvi Effron <zeffron@riotgames.com> writes:

> On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Hi Lorenz (Cc. the other people who participated in today's discussion)
>>
>> Following our discussion at the LPC session today, I dug up my previous
>> summary of the issue and some possible solutions[0]. Seems no one
>> actually replied last time, which is why we went with the "do nothing"
>> approach, I suppose. I'm including the full text of the original email
>> below; please take a look, and let's see if we can converge on a
>> consensus here.
>>
>> First off, a problem description: If an existing XDP program is exposed
>> to an xdp_buff that is really a multi-buffer, while it will continue to
>> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
>> the packet it'll only see part of the payload and not be aware of that
>> fact, and if it's calculating the packet length, that will also be
>> wrong (only counting the first fragment).
>>
>> So what to do about this? First of all, to do anything about it, XDP
>> programs need to be able to declare themselves "multi-buffer aware" (but
>> see point 1 below). We could try to auto-detect it in the verifier by
>> which helpers the program is using, but since existing programs could be
>> perfectly happy to just keep running, it probably needs to be something
>> the program communicates explicitly. One option is to use the
>> expected_attach_type to encode this; programs can then declare it in the
>> source by section name, or the userspace loader can set the type for
>> existing programs if needed.
>>
>> With this, the kernel will know if a given XDP program is multi-buff
>> aware and can decide what to do with that information. For this we came
>> up with basically three options:
>>
>> 1. Do nothing. This would make it up to users / sysadmins to avoid
>>    anything breaking by manually making sure to not enable multi-buffer
>>    support while loading any XDP programs that will malfunction if
>>    presented with an mb frame. This will probably break in interesting
>>    ways, but it's nice and simple from an implementation PoV. With this
>>    we don't need the declaration discussed above either.
>>
>> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>>    the program doesn't understand it. This is relatively simple to
>>    implement, but it also makes for difficult-to-understand issues (why
>>    are my packets suddenly being dropped?), and it will incur runtime
>>    overhead.
>>
>> 3. Reject loading of programs that are not MB-aware when running in an
>>    MB-enabled mode. This would make things break in more obvious ways,
>>    and still allow a userspace loader to declare a program "MB-aware" to
>>    force it to run if necessary. The problem then becomes at what level
>>    to block this?
>>
>
> I think there's another potential problem with this as well: what happens to
> already loaded programs that are not MB-aware? Are they forcibly unloaded?

I'd say probably the opposite: You can't toggle whatever switch we end
up with if there are any non-MB-aware programs (you'd have to unload
them first)...
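
To make that ordering concrete, here is a rough user-space sketch of the
rule. Everything in it is hypothetical: the struct and function names are
made up for illustration, and nothing like this exists in the kernel today.
It only models the two checks: loading a non-MB-aware program fails while
the switch is on, and flipping the switch on fails while any non-MB-aware
program is still loaded.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct xdp_prog_model {
	bool mb_aware;          /* program declared itself multi-buffer aware */
};

struct xdp_mb_state {
	bool mb_enabled;        /* the system-wide switch under discussion */
	size_t nr_legacy_progs; /* loaded programs that are NOT MB-aware */
};

/* Load: refuse non-MB-aware programs while the switch is on. */
static int model_prog_load(struct xdp_mb_state *st,
			   const struct xdp_prog_model *prog)
{
	if (st->mb_enabled && !prog->mb_aware)
		return -EINVAL;
	if (!prog->mb_aware)
		st->nr_legacy_progs++;
	return 0;
}

/* Unload: only non-MB-aware programs are counted, so only they are dropped. */
static void model_prog_unload(struct xdp_mb_state *st,
			      const struct xdp_prog_model *prog)
{
	if (!prog->mb_aware && st->nr_legacy_progs > 0)
		st->nr_legacy_progs--;
}

/* Toggle: enabling is refused until every non-MB-aware program is gone. */
static int model_set_mb(struct xdp_mb_state *st, bool enable)
{
	if (enable && st->nr_legacy_progs > 0)
		return -EBUSY;
	st->mb_enabled = enable;
	return 0;
}
```

With this model, nothing is ever force-unloaded: the admin gets an error
from the toggle and has to unload the offending programs first.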

>>    Doing this at the driver level is not enough: while a particular
>>    driver knows if it's running in multi-buff mode, we can't know for
>>    sure if a particular XDP program is multi-buff aware at attach time:
>>    it could be tail-calling other programs, or redirecting packets to
>>    another interface where it will be processed by a non-MB aware
>>    program.
>>
>>    So another option is to make it a global toggle: e.g., create a new
>>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
>>    program that doesn't support multi-buffer mode, and if it's unset,
>>    disable multi-buffer mode in all drivers. This will make it explicit
>>    when the multi-buffer mode is used, and prevent any accidental subtle
>>    malfunction of existing XDP programs. The drawback is that it's a
>>    mode switch, so more configuration complexity.
>>
>
> Could we combine the last two bits here into a global toggle that doesn't
> require a sysctl? If any driver is put into multi-buffer mode, then the system
> switches to requiring all programs be multi-buffer? When the last multi-buffer
> enabled driver switches out of multi-buffer, remove the system-wide
> restriction?

Well, the trouble here is that we don't necessarily have an explicit
"multi-buf mode" for devices. For instance, you could raise the MTU of a
device without it necessarily involving any XDP multi-buffer stuff (if
you're not running XDP on that device). So if we did turn "raising the
MTU" into such a mode switch, we would end up blocking any MTU changes
if any XDP programs are loaded. Or having an MTU change cause a
force-unload of all XDP programs.

Neither of those is a desirable outcome, I think; and if we add a
separate "XDP multi-buff" switch, we might as well make it system-wide?
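
As a sketch of the first of those outcomes, assume a made-up policy where
"MTU above some single-buffer limit" is itself the mode switch. The names
and the 3500-byte limit below are assumptions picked for illustration, not
values from any driver. Note that the device being reconfigured does not
need to run XDP at all for the change to be refused.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code. The limit is an assumed value. */
#define MODEL_SINGLE_BUF_MAX_MTU 3500

struct model_dev {
	unsigned int mtu;
	bool has_xdp_prog;      /* deliberately ignored by the policy below */
};

struct model_system {
	/* non-MB-aware XDP programs loaded anywhere on the system */
	unsigned int nr_legacy_xdp_progs;
};

/*
 * MTU change under the "raising the MTU is the mode switch" policy:
 * any non-MB-aware program on the system blocks a large MTU on every
 * device, including devices with no XDP program attached.
 */
static int model_change_mtu(const struct model_system *sys,
			    struct model_dev *dev, unsigned int new_mtu)
{
	bool needs_mb = new_mtu > MODEL_SINGLE_BUF_MAX_MTU;

	if (needs_mb && sys->nr_legacy_xdp_progs > 0)
		return -EBUSY;
	dev->mtu = new_mtu;
	return 0;
}
```

The alternative outcome would replace the -EBUSY return with unloading
every counted program, which is the force-unload case.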

> Regarding my above question, if non-MB-aware XDP programs are not forcibly
> unloaded, then a global toggle is also insufficient. An existing non-MB-aware
> XDP program would still need to be rejected at attach time by the
> driver.

See above.

-Toke

