bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Zvi Effron <zeffron@riotgames.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>,
	Lorenzo Bianconi <lbianconi@redhat.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	netdev@vger.kernel.org, bpf <bpf@vger.kernel.org>
Subject: Re: Redux: Backwards compatibility for XDP multi-buff
Date: Wed, 22 Sep 2021 22:13:51 +0200	[thread overview]
Message-ID: <87h7ec1i80.fsf@toke.dk> (raw)
In-Reply-To: <CAC1LvL1VArVCN4DoEDBReSPsALFtdpYVLVzzzA4wWa4DDYzCUw@mail.gmail.com>

Zvi Effron <zeffron@riotgames.com> writes:

> On Tue, Sep 21, 2021 at 3:14 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Zvi Effron <zeffron@riotgames.com> writes:
>>
>> > On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Zvi Effron <zeffron@riotgames.com> writes:
>> >>
>> >> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >>
>> >> >> Hi Lorenz (Cc. the other people who participated in today's discussion)
>> >> >>
>> >> >> Following our discussion at the LPC session today, I dug up my previous
>> >> >> summary of the issue and some possible solutions[0]. Seems no on
>> >> >> actually replied last time, which is why we went with the "do nothing"
>> >> >> approach, I suppose. I'm including the full text of the original email
>> >> >> below; please take a look, and let's see if we can converge on a
>> >> >> consensus here.
>> >> >>
>> >> >> First off, a problem description: If an existing XDP program is exposed
>> >> >> to an xdp_buff that is really a multi-buffer, while it will continue to
>> >> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
>> >> >> the packet it'll only see part of the payload and not be aware of that
>> >> >> fact, and if it's calculating the packet length, that will also only be
>> >> >> wrong (only counting the first fragment).
>> >> >>
>> >> >> So what to do about this? First of all, to do anything about it, XDP
>> >> >> programs need to be able to declare themselves "multi-buffer aware" (but
>> >> >> see point 1 below). We could try to auto-detect it in the verifier by
>> >> >> which helpers the program is using, but since existing programs could be
>> >> >> perfectly happy to just keep running, it probably needs to be something
>> >> >> the program communicates explicitly. One option is to use the
>> >> >> expected_attach_type to encode this; programs can then declare it in the
>> >> >> source by section name, or the userspace loader can set the type for
>> >> >> existing programs if needed.
>> >> >>
>> >> >> With this, the kernel will know if a given XDP program is multi-buff
>> >> >> aware and can decide what to do with that information. For this we came
>> >> >> up with basically three options:
>> >> >>
>> >> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
>> >> >>    anything breaking by manually making sure to not enable multi-buffer
>> >> >>    support while loading any XDP programs that will malfunction if
>> >> >>    presented with an mb frame. This will probably break in interesting
>> >> >>    ways, but it's nice and simple from an implementation PoV. With this
>> >> >>    we don't need the declaration discussed above either.
>> >> >>
>> >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>> >> >>    the program doesn't understand it. This is relatively simple to
>> >> >>    implement, but it also makes for difficult-to-understand issues (why
>> >> >>    are my packets suddenly being dropped?), and it will incur runtime
>> >> >>    overhead.
>> >> >>
>> >> >> 3. Reject loading of programs that are not MB-aware when running in an
>> >> >>    MB-enabled mode. This would make things break in more obvious ways,
>> >> >>    and still allow a userspace loader to declare a program "MB-aware" to
>> >> >>    force it to run if necessary. The problem then becomes at what level
>> >> >>    to block this?
>> >> >>
>> >> >
>> >> > I think there's another potential problem with this as well: what happens to
>> >> > already loaded programs that are not MB-aware? Are they forcibly unloaded?
>> >>
>> >> I'd say probably the opposite: You can't toggle whatever switch we end
>> >> up with if there are any non-MB-aware programs (you'd have to unload
>> >> them first)...
>> >>
>> >
>> > How would we communicate that issue? dmesg? I'm not very familiar with
>> > how sysctl change failure causes are communicated to users, so this
>> > might be a solved problem, but if I run `sysctl -w net.xdp.multibuffer
>> > 1` (or whatever ends up actually being the toggle) to active
>> > multi-buffer, and it fails because there's a loaded non-aware program,
>> > that seems like a potential for a lot of administrator pain.
>>
>> Hmm, good question. Document that this only fails if there's a
>> non-mb-aware XDP program loaded? Or use some other mechanism with better
>> feedback?
>>
>> >> >>    Doing this at the driver level is not enough: while a particular
>> >> >>    driver knows if it's running in multi-buff mode, we can't know for
>> >> >>    sure if a particular XDP program is multi-buff aware at attach time:
>> >> >>    it could be tail-calling other programs, or redirecting packets to
>> >> >>    another interface where it will be processed by a non-MB aware
>> >> >>    program.
>> >> >>
>> >> >>    So another option is to make it a global toggle: e.g., create a new
>> >> >>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
>> >> >>    program that doesn't support multi-buffer mode, and if it's unset,
>> >> >>    disable multi-buffer mode in all drivers. This will make it explicit
>> >> >>    when the multi-buffer mode is used, and prevent any accidental subtle
>> >> >>    malfunction of existing XDP programs. The drawback is that it's a
>> >> >>    mode switch, so more configuration complexity.
>> >> >>
>> >> >
>> >> > Could we combine the last two bits here into a global toggle that doesn't
>> >> > require a sysctl? If any driver is put into multi-buffer mode, then the system
>> >> > switches to requiring all programs be multi-buffer? When the last multi-buffer
>> >> > enabled driver switches out of multi-buffer, remove the system-wide
>> >> > restriction?
>> >>
>> >> Well, the trouble here is that we don't necessarily have an explicit
>> >> "multi-buf mode" for devices. For instance, you could raise the MTU of a
>> >> device without it necessarily involving any XDP multi-buffer stuff (if
>> >> you're not running XDP on that device). So if we did turn "raising the
>> >> MTU" into such a mode switch, we would end up blocking any MTU changes
>> >> if any XDP programs are loaded. Or having an MTU change cause a
>> >> force-unload of all XDP programs.
>> >
>> > Maybe I missed something then, but you had stated that "while a
>> > particular driver knows if it's running in multi-buff mode" so I
>> > assumed that the driver would be able to tell when to toggle the mode
>> > on.
>>
>> Well, a driver knows when it is attaching an XDP program whether it (the
>> driver) is configured in a way such that this XDP program could
>> encounter a multi-buf.
>
> I know drivers sometimes reconfigure themselves when an XDP program is
> attached, but is there any information provided by the attach (other than that
> an XDP program is attaching) that they use to make configuration decisions
> during that reconfiguration?
>
> Without modifying the driver to intentionally configure itself differently
> based on whether or not the program is mb-aware (which is believe is currently
> not the case for any driver), won't the configuration of a driver be identical
> post XDP attach regardless of whether or not the program is mb-aware or not?
>
> I was thinking the driver would make it's mb-aware determination (and refcount
> adjustments) when its configuration changes for any reason that could
> potentially affect mb-aware status (mostly MTU adjustments, I suspect).
>
>>
>> > I had been thinking that when a driver turned multi-buffer off, it
>> > could trigger a check of all drivers, but that also seems like it
>> > could just be a global refcount of all the drivers that have requested
>> > multi-buffer mode. When a driver enables multi-buffer for itself, it
>> > increments the refcount, and when it disables, it decrements. A
>> > non-zero count means the system is in multi-buffer mode.
>>
>> I guess we could do a refcount-type thing when an multi-buf XDP program
>> is first attached (as per above). But I think it may be easier to just
>> do it at load-time, then, so it doesn't have to be in the driver, but
>> the BPF core could just enforce it.
>>
>> This would basically amount to a rule saying "you can't mix mb-aware and
>> non-mb-aware programs", and the first type to be loaded determines which
>> mode the system is in. This would be fairly simple to implement and
>> enforce, I suppose. The drawback is that it's potentially racy in the
>> order programs are loaded...
>>
>
> Accepting or rejecting at load time would definitely simplify things a bit. But
> I think the raciness is worse than just based on the first program to load. If
> we're doing refcounting at attach/detach time, then I can load an mb-aware and
> an mb-unaware program before attaching anything. What do I do when I attach one
> of them? The other would be in violation.
>
> If instead of making the determination at attach time, we make it at load time,
> I think it'd be better to go back to the sysctl controlling it, and simply not
> allow changing the sysctl if any XDP program at all is loaded, as opposed to
> if a non-aware program is installed.
>
> Then we're back to the sysctl controlling whether or not mb-aware is required.
> We stil have a communication to the administrator problem, but it's simplified
> a bit from "some loaded program doesn't comply" and having to track down which
> one to "there is an XDP program installed".

Right, that does simplify things. But if we encode the "mb-aware"
property in the program type (or sub-type, AKA expected_attach_type)
discovering which program is blocking the toggle should be fairly
simple, no?

>> -Toke
>>
>
> Side note: how do extension programs fit into this? An extension program that's
> going to freplace a function in an XDP program (that receives the context)
> would also need to mb-aware or not, but not all extension programs can attach
> to such functions, and we wouldn't want those programs to be impacted. Is this
> as simple as marking every XDP program and every extension program that takes
> an XDP context parameter as needing to be marked as mb-aware?

Hmm, that's also a good question. I was mentally assuming that freplace
programs could be limited by target type, which would mean that just
encoding the "mb-aware" property in attach type would be enough. But I
see that this is not, in fact, the case. Right now expected_attach_type
is unpused for freplace programs, so we could use that if the kernel is
to enforce this. Or we could make it up to the userspace loader to
ensure this by whatever mechanism it wants. For libxdp that would mean
setting the type of the dispatcher program based on the component
program and refusing to mix those, for instance...

-Toke


  reply	other threads:[~2021-09-22 20:13 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-21 16:06 Redux: Backwards compatibility for XDP multi-buff Toke Høiland-Jørgensen
2021-09-21 17:31 ` Zvi Effron
2021-09-21 18:22   ` Toke Høiland-Jørgensen
2021-09-21 19:17     ` Zvi Effron
2021-09-21 22:14       ` Toke Høiland-Jørgensen
2021-09-21 23:10         ` Zvi Effron
2021-09-22 20:13           ` Toke Høiland-Jørgensen [this message]
2021-09-21 20:12     ` Alexei Starovoitov
2021-09-21 22:20       ` Toke Høiland-Jørgensen
2021-09-21 22:51         ` Jakub Kicinski
2021-09-22 20:01           ` Toke Høiland-Jørgensen
2021-09-22 21:23             ` Zvi Effron
2021-09-23 18:45               ` Toke Høiland-Jørgensen
2021-09-23 13:46             ` Jakub Kicinski
2021-09-27 12:43               ` Jesper Dangaard Brouer
2021-09-21 22:54 ` Jakub Kicinski
2021-09-22 20:02   ` Toke Høiland-Jørgensen
2021-09-22 21:11     ` Zvi Effron
2021-09-23 19:00       ` Toke Høiland-Jørgensen
2021-09-23 10:33 ` Lorenz Bauer
2021-09-23 12:59   ` Toke Høiland-Jørgensen
2021-09-24 10:18     ` Lorenz Bauer
2021-09-24 17:55       ` Zvi Effron
2021-09-24 19:38       ` Toke Høiland-Jørgensen
2021-09-28  8:47         ` Lorenz Bauer
2021-09-28 13:43           ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h7ec1i80.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=lbianconi@redhat.com \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=zeffron@riotgames.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).