* Redux: Backwards compatibility for XDP multi-buff
@ 2021-09-21 16:06 Toke Høiland-Jørgensen
  2021-09-21 17:31 ` Zvi Effron
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-21 16:06 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf

Hi Lorenz (Cc. the other people who participated in today's discussion)

Following our discussion at the LPC session today, I dug up my previous
summary of the issue and some possible solutions[0]. It seems no one
actually replied last time, which is why we went with the "do nothing"
approach, I suppose. I'm including the full text of the original email
below; please take a look, and let's see if we can converge on a
consensus here.

First off, a problem description: If an existing XDP program is exposed
to an xdp_buff that is really a multi-buffer, while it will continue to
run, it may end up with subtle and hard-to-debug bugs: If it's parsing
the packet it'll only see part of the payload and not be aware of that
fact, and if it's calculating the packet length, the result will simply
be wrong (it will count only the first fragment).

So what to do about this? First of all, to do anything about it, XDP
programs need to be able to declare themselves "multi-buffer aware" (but
see point 1 below). We could try to auto-detect it in the verifier by
which helpers the program is using, but since existing programs could be
perfectly happy to just keep running, it probably needs to be something
the program communicates explicitly. One option is to use the
expected_attach_type to encode this; programs can then declare it in the
source by section name, or the userspace loader can set the type for
existing programs if needed.
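
As a sketch of that declaration mechanism (all names here are hypothetical,
purely to illustrate the idea, since nothing is decided yet):

```python
# Hypothetical sketch of how a userspace loader could decide what
# expected_attach_type to set at load time. The section name "xdp_mb" and
# the XDP_MB value are made up for illustration.
XDP_MB = 1  # hypothetical expected_attach_type value meaning "mb-aware"

def resolve_attach_type(section_name: str, force_mb: bool = False) -> int:
    """Programs declare mb-awareness via section name; the loader can
    override this for existing programs it knows to be safe."""
    if section_name == "xdp_mb" or force_mb:
        return XDP_MB
    return 0  # legacy XDP program: no declaration
```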

With this, the kernel will know if a given XDP program is multi-buff
aware and can decide what to do with that information. For this we came
up with basically three options:

1. Do nothing. This would make it up to users / sysadmins to avoid
   anything breaking by manually making sure to not enable multi-buffer
   support while loading any XDP programs that will malfunction if
   presented with an mb frame. This will probably break in interesting
   ways, but it's nice and simple from an implementation PoV. With this
   we don't need the declaration discussed above either.

2. Add a check at runtime and drop the frames if they are mb-enabled and
   the program doesn't understand it. This is relatively simple to
   implement, but it also makes for difficult-to-understand issues (why
   are my packets suddenly being dropped?), and it will incur runtime
   overhead.

3. Reject loading of programs that are not MB-aware when running in an
   MB-enabled mode. This would make things break in more obvious ways,
   and still allow a userspace loader to declare a program "MB-aware" to
   force it to run if necessary. The problem then becomes at what level
   to block this?

   Doing this at the driver level is not enough: while a particular
   driver knows if it's running in multi-buff mode, we can't know for
   sure if a particular XDP program is multi-buff aware at attach time:
   it could be tail-calling other programs, or redirecting packets to
   another interface where it will be processed by a non-MB aware
   program.

   So another option is to make it a global toggle: e.g., create a new
   sysctl to enable multi-buffer. If this is set, reject loading any XDP
   program that doesn't support multi-buffer mode, and if it's unset,
   disable multi-buffer mode in all drivers. This will make it explicit
   when the multi-buffer mode is used, and prevent any accidental subtle
   malfunction of existing XDP programs. The drawback is that it's a
   mode switch, so more configuration complexity.
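
For concreteness, option 2 above amounts to a per-packet guard along these
lines (a Python sketch of the control flow, not kernel code; the field
names are illustrative):

```python
from dataclasses import dataclass

XDP_DROP, XDP_PASS = 1, 2  # verdict values as in the XDP uapi

@dataclass
class Frame:
    nr_frags: int = 0  # 0 means a linear, single-buffer frame

@dataclass
class Prog:
    mb_aware: bool = False

def run_xdp(prog: Prog, frame: Frame) -> int:
    # Option 2: drop multi-buffer frames before a legacy program sees them.
    if frame.nr_frags > 0 and not prog.mb_aware:
        return XDP_DROP
    return XDP_PASS  # stand-in for actually running the program
```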

None of these options are ideal, of course, but I hope the above
explanation at least makes sense. If anyone has any better ideas (or can
spot any flaws in the reasoning above) please don't hesitate to let us
know!

-Toke

[0] https://lore.kernel.org/r/8735srxglb.fsf@toke.dk



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 16:06 Redux: Backwards compatibility for XDP multi-buff Toke Høiland-Jørgensen
@ 2021-09-21 17:31 ` Zvi Effron
  2021-09-21 18:22   ` Toke Høiland-Jørgensen
  2021-09-21 22:54 ` Jakub Kicinski
  2021-09-23 10:33 ` Lorenz Bauer
  2 siblings, 1 reply; 26+ messages in thread
From: Zvi Effron @ 2021-09-21 17:31 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	netdev, bpf

On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Hi Lorenz (Cc. the other people who participated in today's discussion)
>
> Following our discussion at the LPC session today, I dug up my previous
> summary of the issue and some possible solutions[0]. It seems no one
> actually replied last time, which is why we went with the "do nothing"
> approach, I suppose. I'm including the full text of the original email
> below; please take a look, and let's see if we can converge on a
> consensus here.
>
> First off, a problem description: If an existing XDP program is exposed
> to an xdp_buff that is really a multi-buffer, while it will continue to
> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
> the packet it'll only see part of the payload and not be aware of that
> fact, and if it's calculating the packet length, the result will simply
> be wrong (it will count only the first fragment).
>
> So what to do about this? First of all, to do anything about it, XDP
> programs need to be able to declare themselves "multi-buffer aware" (but
> see point 1 below). We could try to auto-detect it in the verifier by
> which helpers the program is using, but since existing programs could be
> perfectly happy to just keep running, it probably needs to be something
> the program communicates explicitly. One option is to use the
> expected_attach_type to encode this; programs can then declare it in the
> source by section name, or the userspace loader can set the type for
> existing programs if needed.
>
> With this, the kernel will know if a given XDP program is multi-buff
> aware and can decide what to do with that information. For this we came
> up with basically three options:
>
> 1. Do nothing. This would make it up to users / sysadmins to avoid
>    anything breaking by manually making sure to not enable multi-buffer
>    support while loading any XDP programs that will malfunction if
>    presented with an mb frame. This will probably break in interesting
>    ways, but it's nice and simple from an implementation PoV. With this
>    we don't need the declaration discussed above either.
>
> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>    the program doesn't understand it. This is relatively simple to
>    implement, but it also makes for difficult-to-understand issues (why
>    are my packets suddenly being dropped?), and it will incur runtime
>    overhead.
>
> 3. Reject loading of programs that are not MB-aware when running in an
>    MB-enabled mode. This would make things break in more obvious ways,
>    and still allow a userspace loader to declare a program "MB-aware" to
>    force it to run if necessary. The problem then becomes at what level
>    to block this?
>

I think there's another potential problem with this as well: what happens to
already loaded programs that are not MB-aware? Are they forcibly unloaded?

>    Doing this at the driver level is not enough: while a particular
>    driver knows if it's running in multi-buff mode, we can't know for
>    sure if a particular XDP program is multi-buff aware at attach time:
>    it could be tail-calling other programs, or redirecting packets to
>    another interface where it will be processed by a non-MB aware
>    program.
>
>    So another option is to make it a global toggle: e.g., create a new
>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
>    program that doesn't support multi-buffer mode, and if it's unset,
>    disable multi-buffer mode in all drivers. This will make it explicit
>    when the multi-buffer mode is used, and prevent any accidental subtle
>    malfunction of existing XDP programs. The drawback is that it's a
>    mode switch, so more configuration complexity.
>

Could we combine the last two bits here into a global toggle that doesn't
require a sysctl? If any driver is put into multi-buffer mode, then the system
switches to requiring all programs be multi-buffer? When the last multi-buffer
enabled driver switches out of multi-buffer, remove the system-wide
restriction?

Regarding my above question, if non-MB-aware XDP programs are not forcibly
unloaded, then a global toggle is also insufficient. An existing non-MB-aware
XDP program would still need to be rejected at attach time by the driver.

> None of these options are ideal, of course, but I hope the above
> explanation at least makes sense. If anyone has any better ideas (or can
> spot any flaws in the reasoning above) please don't hesitate to let us
> know!
>
> -Toke
>
> [0] https://lore.kernel.org/r/8735srxglb.fsf@toke.dk
>


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 17:31 ` Zvi Effron
@ 2021-09-21 18:22   ` Toke Høiland-Jørgensen
  2021-09-21 19:17     ` Zvi Effron
  2021-09-21 20:12     ` Alexei Starovoitov
  0 siblings, 2 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-21 18:22 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	netdev, bpf

Zvi Effron <zeffron@riotgames.com> writes:

> On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Hi Lorenz (Cc. the other people who participated in today's discussion)
>>
>> Following our discussion at the LPC session today, I dug up my previous
>> summary of the issue and some possible solutions[0]. It seems no one
>> actually replied last time, which is why we went with the "do nothing"
>> approach, I suppose. I'm including the full text of the original email
>> below; please take a look, and let's see if we can converge on a
>> consensus here.
>>
>> First off, a problem description: If an existing XDP program is exposed
>> to an xdp_buff that is really a multi-buffer, while it will continue to
>> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
>> the packet it'll only see part of the payload and not be aware of that
>> fact, and if it's calculating the packet length, the result will simply
>> be wrong (it will count only the first fragment).
>>
>> So what to do about this? First of all, to do anything about it, XDP
>> programs need to be able to declare themselves "multi-buffer aware" (but
>> see point 1 below). We could try to auto-detect it in the verifier by
>> which helpers the program is using, but since existing programs could be
>> perfectly happy to just keep running, it probably needs to be something
>> the program communicates explicitly. One option is to use the
>> expected_attach_type to encode this; programs can then declare it in the
>> source by section name, or the userspace loader can set the type for
>> existing programs if needed.
>>
>> With this, the kernel will know if a given XDP program is multi-buff
>> aware and can decide what to do with that information. For this we came
>> up with basically three options:
>>
>> 1. Do nothing. This would make it up to users / sysadmins to avoid
>>    anything breaking by manually making sure to not enable multi-buffer
>>    support while loading any XDP programs that will malfunction if
>>    presented with an mb frame. This will probably break in interesting
>>    ways, but it's nice and simple from an implementation PoV. With this
>>    we don't need the declaration discussed above either.
>>
>> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>>    the program doesn't understand it. This is relatively simple to
>>    implement, but it also makes for difficult-to-understand issues (why
>>    are my packets suddenly being dropped?), and it will incur runtime
>>    overhead.
>>
>> 3. Reject loading of programs that are not MB-aware when running in an
>>    MB-enabled mode. This would make things break in more obvious ways,
>>    and still allow a userspace loader to declare a program "MB-aware" to
>>    force it to run if necessary. The problem then becomes at what level
>>    to block this?
>>
>
> I think there's another potential problem with this as well: what happens to
> already loaded programs that are not MB-aware? Are they forcibly unloaded?

I'd say probably the opposite: You can't toggle whatever switch we end
up with if there are any non-MB-aware programs (you'd have to unload
them first)...
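
In other words, the sysctl write itself would be the gate; roughly this
(a Python sketch, purely illustrative of the semantics):

```python
from dataclasses import dataclass

EBUSY = 16

@dataclass
class Prog:
    mb_aware: bool

def set_mb_toggle(enable: bool, loaded_progs: list) -> int:
    # Flipping the global multi-buffer switch on fails while any
    # non-mb-aware XDP program is still loaded; unload those first.
    if enable and any(not p.mb_aware for p in loaded_progs):
        return -EBUSY
    # ... set the global flag and (re)configure drivers ...
    return 0
```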

>>    Doing this at the driver level is not enough: while a particular
>>    driver knows if it's running in multi-buff mode, we can't know for
>>    sure if a particular XDP program is multi-buff aware at attach time:
>>    it could be tail-calling other programs, or redirecting packets to
>>    another interface where it will be processed by a non-MB aware
>>    program.
>>
>>    So another option is to make it a global toggle: e.g., create a new
>>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
>>    program that doesn't support multi-buffer mode, and if it's unset,
>>    disable multi-buffer mode in all drivers. This will make it explicit
>>    when the multi-buffer mode is used, and prevent any accidental subtle
>>    malfunction of existing XDP programs. The drawback is that it's a
>>    mode switch, so more configuration complexity.
>>
>
> Could we combine the last two bits here into a global toggle that doesn't
> require a sysctl? If any driver is put into multi-buffer mode, then the system
> switches to requiring all programs be multi-buffer? When the last multi-buffer
> enabled driver switches out of multi-buffer, remove the system-wide
> restriction?

Well, the trouble here is that we don't necessarily have an explicit
"multi-buf mode" for devices. For instance, you could raise the MTU of a
device without it necessarily involving any XDP multi-buffer stuff (if
you're not running XDP on that device). So if we did turn "raising the
MTU" into such a mode switch, we would end up blocking any MTU changes
if any XDP programs are loaded. Or having an MTU change cause a
force-unload of all XDP programs.

Neither of those are desirable outcomes, I think; and if we add a
separate "XDP multi-buff" switch, we might as well make it system-wide?

> Regarding my above question, if non-MB-aware XDP programs are not forcibly
> unloaded, then a global toggle is also insufficient. An existing non-MB-aware
> XDP program would still need to be rejected at attach time by the
> driver.

See above.

-Toke



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 18:22   ` Toke Høiland-Jørgensen
@ 2021-09-21 19:17     ` Zvi Effron
  2021-09-21 22:14       ` Toke Høiland-Jørgensen
  2021-09-21 20:12     ` Alexei Starovoitov
  1 sibling, 1 reply; 26+ messages in thread
From: Zvi Effron @ 2021-09-21 19:17 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	netdev, bpf

On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Zvi Effron <zeffron@riotgames.com> writes:
>
> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Hi Lorenz (Cc. the other people who participated in today's discussion)
> >>
> >> Following our discussion at the LPC session today, I dug up my previous
> >> summary of the issue and some possible solutions[0]. It seems no one
> >> actually replied last time, which is why we went with the "do nothing"
> >> approach, I suppose. I'm including the full text of the original email
> >> below; please take a look, and let's see if we can converge on a
> >> consensus here.
> >>
> >> First off, a problem description: If an existing XDP program is exposed
> >> to an xdp_buff that is really a multi-buffer, while it will continue to
> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
> >> the packet it'll only see part of the payload and not be aware of that
> >> fact, and if it's calculating the packet length, the result will simply
> >> be wrong (it will count only the first fragment).
> >>
> >> So what to do about this? First of all, to do anything about it, XDP
> >> programs need to be able to declare themselves "multi-buffer aware" (but
> >> see point 1 below). We could try to auto-detect it in the verifier by
> >> which helpers the program is using, but since existing programs could be
> >> perfectly happy to just keep running, it probably needs to be something
> >> the program communicates explicitly. One option is to use the
> >> expected_attach_type to encode this; programs can then declare it in the
> >> source by section name, or the userspace loader can set the type for
> >> existing programs if needed.
> >>
> >> With this, the kernel will know if a given XDP program is multi-buff
> >> aware and can decide what to do with that information. For this we came
> >> up with basically three options:
> >>
> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
> >>    anything breaking by manually making sure to not enable multi-buffer
> >>    support while loading any XDP programs that will malfunction if
> >>    presented with an mb frame. This will probably break in interesting
> >>    ways, but it's nice and simple from an implementation PoV. With this
> >>    we don't need the declaration discussed above either.
> >>
> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
> >>    the program doesn't understand it. This is relatively simple to
> >>    implement, but it also makes for difficult-to-understand issues (why
> >>    are my packets suddenly being dropped?), and it will incur runtime
> >>    overhead.
> >>
> >> 3. Reject loading of programs that are not MB-aware when running in an
> >>    MB-enabled mode. This would make things break in more obvious ways,
> >>    and still allow a userspace loader to declare a program "MB-aware" to
> >>    force it to run if necessary. The problem then becomes at what level
> >>    to block this?
> >>
> >
> > I think there's another potential problem with this as well: what happens to
> > already loaded programs that are not MB-aware? Are they forcibly unloaded?
>
> I'd say probably the opposite: You can't toggle whatever switch we end
> up with if there are any non-MB-aware programs (you'd have to unload
> them first)...
>

How would we communicate that issue? dmesg? I'm not very familiar with how
sysctl change failure causes are communicated to users, so this might be a
solved problem, but if I run `sysctl -w net.xdp.multibuffer=1` (or whatever
ends up actually being the toggle) to activate multi-buffer, and it fails because
there's a loaded non-aware program, that seems like a potential for a lot of
administrator pain.

> >>    Doing this at the driver level is not enough: while a particular
> >>    driver knows if it's running in multi-buff mode, we can't know for
> >>    sure if a particular XDP program is multi-buff aware at attach time:
> >>    it could be tail-calling other programs, or redirecting packets to
> >>    another interface where it will be processed by a non-MB aware
> >>    program.
> >>
> >>    So another option is to make it a global toggle: e.g., create a new
> >>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
> >>    program that doesn't support multi-buffer mode, and if it's unset,
> >>    disable multi-buffer mode in all drivers. This will make it explicit
> >>    when the multi-buffer mode is used, and prevent any accidental subtle
> >>    malfunction of existing XDP programs. The drawback is that it's a
> >>    mode switch, so more configuration complexity.
> >>
> >
> > Could we combine the last two bits here into a global toggle that doesn't
> > require a sysctl? If any driver is put into multi-buffer mode, then the system
> > switches to requiring all programs be multi-buffer? When the last multi-buffer
> > enabled driver switches out of multi-buffer, remove the system-wide
> > restriction?
>
> Well, the trouble here is that we don't necessarily have an explicit
> "multi-buf mode" for devices. For instance, you could raise the MTU of a
> device without it necessarily involving any XDP multi-buffer stuff (if
> you're not running XDP on that device). So if we did turn "raising the
> MTU" into such a mode switch, we would end up blocking any MTU changes
> if any XDP programs are loaded. Or having an MTU change cause a
> force-unload of all XDP programs.

Maybe I missed something then, but you had stated that "while a particular
driver knows if it's running in multi-buff mode" so I assumed that the driver
would be able to tell when to toggle the mode on.

I had been thinking that when a driver turned multi-buffer off, it could
trigger a check of all drivers, but that also seems like it could just be a
global refcount of all the drivers that have requested multi-buffer mode. When
a driver enables multi-buffer for itself, it increments the refcount, and when
it disables, it decrements. A non-zero count means the system is in
multi-buffer mode.

Obviously this is more complex than just requiring the administrator to enable
the system-wide mode.
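
The refcount idea, sketched out (illustrative Python; the kernel side would
of course need proper locking):

```python
class MbMode:
    """Sketch of the driver-driven toggle: the system is in multi-buffer
    mode whenever at least one driver has requested it."""

    def __init__(self):
        self._refcount = 0

    def driver_enable_mb(self):
        self._refcount += 1

    def driver_disable_mb(self):
        assert self._refcount > 0, "unbalanced disable"
        self._refcount -= 1

    def in_mb_mode(self) -> bool:
        return self._refcount > 0
```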

>
> Neither of those are desirable outcomes, I think; and if we add a
> separate "XDP multi-buff" switch, we might as well make it system-wide?
>

> > Regarding my above question, if non-MB-aware XDP programs are not forcibly
> > unloaded, then a global toggle is also insufficient. An existing non-MB-aware
> > XDP program would still need to be rejected at attach time by the
> > driver.
>
> See above.
>
> -Toke
>

--Zvi


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 18:22   ` Toke Høiland-Jørgensen
  2021-09-21 19:17     ` Zvi Effron
@ 2021-09-21 20:12     ` Alexei Starovoitov
  2021-09-21 22:20       ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2021-09-21 20:12 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Zvi Effron, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann,
	John Fastabend, Network Development, bpf

On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Zvi Effron <zeffron@riotgames.com> writes:
>
> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Hi Lorenz (Cc. the other people who participated in today's discussion)
> >>
> >> Following our discussion at the LPC session today, I dug up my previous
> >> summary of the issue and some possible solutions[0]. It seems no one
> >> actually replied last time, which is why we went with the "do nothing"
> >> approach, I suppose. I'm including the full text of the original email
> >> below; please take a look, and let's see if we can converge on a
> >> consensus here.
> >>
> >> First off, a problem description: If an existing XDP program is exposed
> >> to an xdp_buff that is really a multi-buffer, while it will continue to
> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
> >> the packet it'll only see part of the payload and not be aware of that
> >> fact, and if it's calculating the packet length, the result will simply
> >> be wrong (it will count only the first fragment).
> >>
> >> So what to do about this? First of all, to do anything about it, XDP
> >> programs need to be able to declare themselves "multi-buffer aware" (but
> >> see point 1 below). We could try to auto-detect it in the verifier by
> >> which helpers the program is using, but since existing programs could be
> >> perfectly happy to just keep running, it probably needs to be something
> >> the program communicates explicitly. One option is to use the
> >> expected_attach_type to encode this; programs can then declare it in the
> >> source by section name, or the userspace loader can set the type for
> >> existing programs if needed.
> >>
> >> With this, the kernel will know if a given XDP program is multi-buff
> >> aware and can decide what to do with that information. For this we came
> >> up with basically three options:
> >>
> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
> >>    anything breaking by manually making sure to not enable multi-buffer
> >>    support while loading any XDP programs that will malfunction if
> >>    presented with an mb frame. This will probably break in interesting
> >>    ways, but it's nice and simple from an implementation PoV. With this
> >>    we don't need the declaration discussed above either.
> >>
> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
> >>    the program doesn't understand it. This is relatively simple to
> >>    implement, but it also makes for difficult-to-understand issues (why
> >>    are my packets suddenly being dropped?), and it will incur runtime
> >>    overhead.
> >>
> >> 3. Reject loading of programs that are not MB-aware when running in an
> >>    MB-enabled mode. This would make things break in more obvious ways,
> >>    and still allow a userspace loader to declare a program "MB-aware" to
> >>    force it to run if necessary. The problem then becomes at what level
> >>    to block this?
> >>
> >
> > I think there's another potential problem with this as well: what happens to
> > already loaded programs that are not MB-aware? Are they forcibly unloaded?
>
> I'd say probably the opposite: You can't toggle whatever switch we end
> up with if there are any non-MB-aware programs (you'd have to unload
> them first)...
>
> >>    Doing this at the driver level is not enough: while a particular
> >>    driver knows if it's running in multi-buff mode, we can't know for
> >>    sure if a particular XDP program is multi-buff aware at attach time:
> >>    it could be tail-calling other programs, or redirecting packets to
> >>    another interface where it will be processed by a non-MB aware
> >>    program.
> >>
> >>    So another option is to make it a global toggle: e.g., create a new
> >>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
> >>    program that doesn't support multi-buffer mode, and if it's unset,
> >>    disable multi-buffer mode in all drivers. This will make it explicit
> >>    when the multi-buffer mode is used, and prevent any accidental subtle
> >>    malfunction of existing XDP programs. The drawback is that it's a
> >>    mode switch, so more configuration complexity.
> >>
> >
> > Could we combine the last two bits here into a global toggle that doesn't
> > require a sysctl? If any driver is put into multi-buffer mode, then the system
> > switches to requiring all programs be multi-buffer? When the last multi-buffer
> > enabled driver switches out of multi-buffer, remove the system-wide
> > restriction?
>
> Well, the trouble here is that we don't necessarily have an explicit
> "multi-buf mode" for devices. For instance, you could raise the MTU of a
> device without it necessarily involving any XDP multi-buffer stuff (if
> you're not running XDP on that device). So if we did turn "raising the
> MTU" into such a mode switch, we would end up blocking any MTU changes
> if any XDP programs are loaded. Or having an MTU change cause a
> force-unload of all XDP programs.

An MTU change that bumps the driver into multi-buf mode, or enabling
header split (which also bumps it into multi-buf mode), probably
shouldn't be allowed while a non-mb-aware xdp prog is attached.
That would be the simplest and least surprising behavior.
A force unload could cause security issues.
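
A sketch of that check at the point where the MTU (or header-split
setting) changes; the threshold and field names are illustrative only,
since the single-buffer limit is per driver:

```python
from dataclasses import dataclass

EBUSY = 16
SINGLE_BUF_MAX_MTU = 3498  # illustrative; the real limit is per driver

@dataclass
class Prog:
    mb_aware: bool

@dataclass
class Dev:
    mtu: int
    xdp_prog: Prog = None

def dev_set_mtu(dev: Dev, new_mtu: int) -> int:
    # Refuse a config change that would push the driver into multi-buf
    # mode while a non-mb-aware XDP program is attached.
    needs_mb = new_mtu > SINGLE_BUF_MAX_MTU
    if needs_mb and dev.xdp_prog is not None and not dev.xdp_prog.mb_aware:
        return -EBUSY
    dev.mtu = new_mtu
    return 0
```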

> Neither of those are desirable outcomes, I think; and if we add a
> separate "XDP multi-buff" switch, we might as well make it system-wide?

If we have an internal flag 'this driver supports multi-buf xdp', can't we
make xdp_redirect linearize the packet in case it is being redirected
from an mb-aware driver to a non-multi-buf-aware driver (potentially with
corresponding non-mb-aware xdp progs attached)?
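
Something like the following in the redirect path, sketched in Python
(illustrative; the real code would operate on xdp_frame fragments, and
linearizing implies a copy, so it has a real cost):

```python
def redirect(frame: dict, target_is_mb_aware: bool) -> dict:
    # If the target driver can't handle fragments, pull them all into
    # one linear buffer before handing the frame over.
    if frame["frags"] and not target_is_mb_aware:
        frame["data"] = b"".join([frame["data"], *frame["frags"]])
        frame["frags"] = []
    return frame
```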


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 19:17     ` Zvi Effron
@ 2021-09-21 22:14       ` Toke Høiland-Jørgensen
  2021-09-21 23:10         ` Zvi Effron
  0 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-21 22:14 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	netdev, bpf

Zvi Effron <zeffron@riotgames.com> writes:

> On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Zvi Effron <zeffron@riotgames.com> writes:
>>
>> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Hi Lorenz (Cc. the other people who participated in today's discussion)
>> >>
>> >> Following our discussion at the LPC session today, I dug up my previous
>> >> summary of the issue and some possible solutions[0]. It seems no one
>> >> actually replied last time, which is why we went with the "do nothing"
>> >> approach, I suppose. I'm including the full text of the original email
>> >> below; please take a look, and let's see if we can converge on a
>> >> consensus here.
>> >>
>> >> First off, a problem description: If an existing XDP program is exposed
>> >> to an xdp_buff that is really a multi-buffer, while it will continue to
>> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
>> >> the packet it'll only see part of the payload and not be aware of that
>> >> fact, and if it's calculating the packet length, the result will simply
>> >> be wrong (it will count only the first fragment).
>> >>
>> >> So what to do about this? First of all, to do anything about it, XDP
>> >> programs need to be able to declare themselves "multi-buffer aware" (but
>> >> see point 1 below). We could try to auto-detect it in the verifier by
>> >> which helpers the program is using, but since existing programs could be
>> >> perfectly happy to just keep running, it probably needs to be something
>> >> the program communicates explicitly. One option is to use the
>> >> expected_attach_type to encode this; programs can then declare it in the
>> >> source by section name, or the userspace loader can set the type for
>> >> existing programs if needed.
>> >>
>> >> With this, the kernel will know if a given XDP program is multi-buff
>> >> aware and can decide what to do with that information. For this we came
>> >> up with basically three options:
>> >>
>> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
>> >>    anything breaking by manually making sure to not enable multi-buffer
>> >>    support while loading any XDP programs that will malfunction if
>> >>    presented with an mb frame. This will probably break in interesting
>> >>    ways, but it's nice and simple from an implementation PoV. With this
>> >>    we don't need the declaration discussed above either.
>> >>
>> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>> >>    the program doesn't understand it. This is relatively simple to
>> >>    implement, but it also makes for difficult-to-understand issues (why
>> >>    are my packets suddenly being dropped?), and it will incur runtime
>> >>    overhead.
>> >>
>> >> 3. Reject loading of programs that are not MB-aware when running in an
>> >>    MB-enabled mode. This would make things break in more obvious ways,
>> >>    and still allow a userspace loader to declare a program "MB-aware" to
>> >>    force it to run if necessary. The problem then becomes at what level
>> >>    to block this?
>> >>
>> >
>> > I think there's another potential problem with this as well: what happens to
>> > already loaded programs that are not MB-aware? Are they forcibly unloaded?
>>
>> I'd say probably the opposite: You can't toggle whatever switch we end
>> up with if there are any non-MB-aware programs (you'd have to unload
>> them first)...
>>
>
> How would we communicate that issue? dmesg? I'm not very familiar with
> how sysctl change failure causes are communicated to users, so this
> might be a solved problem, but if I run `sysctl -w net.xdp.multibuffer=1`
> (or whatever ends up actually being the toggle) to activate
> multi-buffer, and it fails because there's a loaded non-aware program,
> that seems like a potential for a lot of administrator pain.

Hmm, good question. Document that this only fails if there's a
non-mb-aware XDP program loaded? Or use some other mechanism with better
feedback?

>> >>    Doing this at the driver level is not enough: while a particular
>> >>    driver knows if it's running in multi-buff mode, we can't know for
>> >>    sure if a particular XDP program is multi-buff aware at attach time:
>> >>    it could be tail-calling other programs, or redirecting packets to
>> >>    another interface where it will be processed by a non-MB aware
>> >>    program.
>> >>
>> >>    So another option is to make it a global toggle: e.g., create a new
>> >>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
>> >>    program that doesn't support multi-buffer mode, and if it's unset,
>> >>    disable multi-buffer mode in all drivers. This will make it explicit
>> >>    when the multi-buffer mode is used, and prevent any accidental subtle
>> >>    malfunction of existing XDP programs. The drawback is that it's a
>> >>    mode switch, so more configuration complexity.
>> >>
>> >
>> > Could we combine the last two bits here into a global toggle that doesn't
>> > require a sysctl? If any driver is put into multi-buffer mode, then the system
>> > switches to requiring all programs be multi-buffer? When the last multi-buffer
>> > enabled driver switches out of multi-buffer, remove the system-wide
>> > restriction?
>>
>> Well, the trouble here is that we don't necessarily have an explicit
>> "multi-buf mode" for devices. For instance, you could raise the MTU of a
>> device without it necessarily involving any XDP multi-buffer stuff (if
>> you're not running XDP on that device). So if we did turn "raising the
>> MTU" into such a mode switch, we would end up blocking any MTU changes
>> if any XDP programs are loaded. Or having an MTU change cause a
>> force-unload of all XDP programs.
>
> Maybe I missed something then, but you had stated that "while a
> particular driver knows if it's running in multi-buff mode" so I
> assumed that the driver would be able to tell when to toggle the mode
> on.

Well, a driver knows when it is attaching an XDP program whether it (the
driver) is configured in a way such that this XDP program could
encounter a multi-buf.

> I had been thinking that when a driver turned multi-buffer off, it
> could trigger a check of all drivers, but that also seems like it
> could just be a global refcount of all the drivers that have requested
> multi-buffer mode. When a driver enables multi-buffer for itself, it
> increments the refcount, and when it disables, it decrements. A
> non-zero count means the system is in multi-buffer mode.

I guess we could do a refcount-type thing when a multi-buf XDP program
is first attached (as per above). But I think it may be easier to just
do it at load-time, then, so it doesn't have to be in the driver, but
the BPF core could just enforce it.

This would basically amount to a rule saying "you can't mix mb-aware and
non-mb-aware programs", and the first type to be loaded determines which
mode the system is in. This would be fairly simple to implement and
enforce, I suppose. The drawback is that it's potentially racy in the
order programs are loaded...
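To make that rule concrete, here is a rough sketch of what the BPF core could enforce at load/unload time (counters and function names are hypothetical):

```c
#include <errno.h>

/* Hypothetical load-time enforcement of "don't mix mb-aware and
 * non-mb-aware XDP programs": two counters in the BPF core, where the
 * first program loaded determines the system-wide mode. */
static int mb_aware_loaded;
static int legacy_loaded;

static int xdp_prog_load(int mb_aware)
{
	/* Reject mixing: whichever flavour was loaded first wins. */
	if (mb_aware && legacy_loaded > 0)
		return -EBUSY;
	if (!mb_aware && mb_aware_loaded > 0)
		return -EBUSY;

	if (mb_aware)
		mb_aware_loaded++;
	else
		legacy_loaded++;
	return 0;
}

static void xdp_prog_unload(int mb_aware)
{
	if (mb_aware)
		mb_aware_loaded--;
	else
		legacy_loaded--;
}
```

The race is visible right in the sketch: which flavour happens to load first decides which mode the system ends up in.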

-Toke


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 20:12     ` Alexei Starovoitov
@ 2021-09-21 22:20       ` Toke Høiland-Jørgensen
  2021-09-21 22:51         ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-21 22:20 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Zvi Effron, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann,
	John Fastabend, Network Development, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Zvi Effron <zeffron@riotgames.com> writes:
>>
>> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Hi Lorenz (Cc. the other people who participated in today's discussion)
>> >>
>> >> Following our discussion at the LPC session today, I dug up my previous
>> >> summary of the issue and some possible solutions[0]. Seems no one
>> >> actually replied last time, which is why we went with the "do nothing"
>> >> approach, I suppose. I'm including the full text of the original email
>> >> below; please take a look, and let's see if we can converge on a
>> >> consensus here.
>> >>
>> >> First off, a problem description: If an existing XDP program is exposed
>> >> to an xdp_buff that is really a multi-buffer, while it will continue to
>> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
>> >> the packet it'll only see part of the payload and not be aware of that
>> >> fact, and if it's calculating the packet length, the result will also be
>> >> wrong (only counting the first fragment).
>> >>
>> >> So what to do about this? First of all, to do anything about it, XDP
>> >> programs need to be able to declare themselves "multi-buffer aware" (but
>> >> see point 1 below). We could try to auto-detect it in the verifier by
>> >> which helpers the program is using, but since existing programs could be
>> >> perfectly happy to just keep running, it probably needs to be something
>> >> the program communicates explicitly. One option is to use the
>> >> expected_attach_type to encode this; programs can then declare it in the
>> >> source by section name, or the userspace loader can set the type for
>> >> existing programs if needed.
>> >>
>> >> With this, the kernel will know if a given XDP program is multi-buff
>> >> aware and can decide what to do with that information. For this we came
>> >> up with basically three options:
>> >>
>> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
>> >>    anything breaking by manually making sure to not enable multi-buffer
>> >>    support while loading any XDP programs that will malfunction if
>> >>    presented with an mb frame. This will probably break in interesting
>> >>    ways, but it's nice and simple from an implementation PoV. With this
>> >>    we don't need the declaration discussed above either.
>> >>
>> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>> >>    the program doesn't understand it. This is relatively simple to
>> >>    implement, but it also makes for difficult-to-understand issues (why
>> >>    are my packets suddenly being dropped?), and it will incur runtime
>> >>    overhead.
>> >>
>> >> 3. Reject loading of programs that are not MB-aware when running in an
>> >>    MB-enabled mode. This would make things break in more obvious ways,
>> >>    and still allow a userspace loader to declare a program "MB-aware" to
>> >>    force it to run if necessary. The problem then becomes at what level
>> >>    to block this?
>> >>
>> >
>> > I think there's another potential problem with this as well: what happens to
>> > already loaded programs that are not MB-aware? Are they forcibly unloaded?
>>
>> I'd say probably the opposite: You can't toggle whatever switch we end
>> up with if there are any non-MB-aware programs (you'd have to unload
>> them first)...
>>
>> >>    Doing this at the driver level is not enough: while a particular
>> >>    driver knows if it's running in multi-buff mode, we can't know for
>> >>    sure if a particular XDP program is multi-buff aware at attach time:
>> >>    it could be tail-calling other programs, or redirecting packets to
>> >>    another interface where it will be processed by a non-MB aware
>> >>    program.
>> >>
>> >>    So another option is to make it a global toggle: e.g., create a new
>> >>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
>> >>    program that doesn't support multi-buffer mode, and if it's unset,
>> >>    disable multi-buffer mode in all drivers. This will make it explicit
>> >>    when the multi-buffer mode is used, and prevent any accidental subtle
>> >>    malfunction of existing XDP programs. The drawback is that it's a
>> >>    mode switch, so more configuration complexity.
>> >>
>> >
>> > Could we combine the last two bits here into a global toggle that doesn't
>> > require a sysctl? If any driver is put into multi-buffer mode, then the system
>> > switches to requiring all programs be multi-buffer? When the last multi-buffer
>> > enabled driver switches out of multi-buffer, remove the system-wide
>> > restriction?
>>
>> Well, the trouble here is that we don't necessarily have an explicit
>> "multi-buf mode" for devices. For instance, you could raise the MTU of a
>> device without it necessarily involving any XDP multi-buffer stuff (if
>> you're not running XDP on that device). So if we did turn "raising the
>> MTU" into such a mode switch, we would end up blocking any MTU changes
>> if any XDP programs are loaded. Or having an MTU change cause a
>> force-unload of all XDP programs.
>
> MTU change that bumps driver into multi-buf mode or enable
> the header split that also bumps it into multi-buf mode
> probably shouldn't be allowed when non-mb aware xdp prog is attached.
> That would be the simplest and least surprising behavior.
> Force unload could cause security issues.

Yeah, I agree, force-unload is not a good solution - I mostly mentioned
it for completeness.

I think it would be fine for a driver to disallow config changes if a
non-mb-aware program is loaded; that basically amounts to the checks
drivers do already if an XDP program is loaded, so we could just keep
those around for the non-mb case.
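As a sketch, the extra condition in a driver's ndo_change_mtu() implementation would look something like this (user-space simulation; struct layout and names are made up for illustration):

```c
#include <errno.h>

/* Hypothetical driver state; real drivers keep similar fields in
 * their private structs. */
struct hyp_netdev {
	int xdp_prog_attached;
	int xdp_prog_mb_aware;
	int max_single_buf_mtu;	/* largest MTU that fits in one buffer */
	int mtu;
};

/* The extra condition an ndo_change_mtu() implementation could add:
 * refuse an MTU that would force multi-buffer frames while a
 * non-mb-aware XDP program is attached.  This mirrors the MTU checks
 * drivers already do today when any XDP program is attached. */
static int hyp_change_mtu(struct hyp_netdev *dev, int new_mtu)
{
	if (dev->xdp_prog_attached && !dev->xdp_prog_mb_aware &&
	    new_mtu > dev->max_single_buf_mtu)
		return -EINVAL;
	dev->mtu = new_mtu;
	return 0;
}
```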

>> Neither of those are desirable outcomes, I think; and if we add a
>> separate "XDP multi-buff" switch, we might as well make it system-wide?
>
> If we have an internal flag 'this driver supports multi-buf xdp' cannot we
> make xdp_redirect to linearize in case the packet is being redirected
> to non multi-buf aware driver (potentially with corresponding non mb aware xdp
> progs attached) from mb aware driver?

Hmm, the assumption that XDP frames take up at most one page has been
fundamental from the start of XDP. So what does linearise mean in this
context? If we get a 9k packet, should we dynamically allocate a
multi-page chunk of contiguous memory and copy the frame into that, or
were you thinking something else?
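For reference, the closest existing analogue would be skb_linearize(): allocate one contiguous buffer of the total frame length and copy every fragment into it. A simplified user-space sketch of what that would mean for a multi-buf frame (layout and names are made up for illustration):

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a multi-buffer frame: a head buffer plus an
 * array of fragments, loosely modelled on skb_shared_info frags. */
struct hyp_frag {
	const void *data;
	size_t len;
};

struct hyp_mb_frame {
	const void *head;
	size_t head_len;
	struct hyp_frag frags[8];
	int nr_frags;
};

/* Linearize in the skb_linearize() sense: allocate one contiguous
 * buffer for the whole packet and copy every fragment into it.  For a
 * 9k frame this is exactly the multi-page contiguous allocation (and
 * per-packet copy cost) in question. */
static void *hyp_linearize(const struct hyp_mb_frame *f, size_t *out_len)
{
	size_t total = f->head_len;
	char *buf, *p;
	int i;

	for (i = 0; i < f->nr_frags; i++)
		total += f->frags[i].len;

	buf = malloc(total);
	if (!buf)
		return NULL;

	p = buf;
	memcpy(p, f->head, f->head_len);
	p += f->head_len;
	for (i = 0; i < f->nr_frags; i++) {
		memcpy(p, f->frags[i].data, f->frags[i].len);
		p += f->frags[i].len;
	}
	*out_len = total;
	return buf;
}
```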

-Toke



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 22:20       ` Toke Høiland-Jørgensen
@ 2021-09-21 22:51         ` Jakub Kicinski
  2021-09-22 20:01           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2021-09-21 22:51 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Zvi Effron, Lorenz Bauer, Lorenzo Bianconi,
	Daniel Borkmann, John Fastabend, Network Development, bpf

On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote:
> >> Neither of those are desirable outcomes, I think; and if we add a
> >> separate "XDP multi-buff" switch, we might as well make it system-wide?  
> >
> > If we have an internal flag 'this driver supports multi-buf xdp' cannot we
> > make xdp_redirect to linearize in case the packet is being redirected
> > to non multi-buf aware driver (potentially with corresponding non mb aware xdp
> > progs attached) from mb aware driver?  
> 
> Hmm, the assumption that XDP frames take up at most one page has been
> fundamental from the start of XDP. So what does linearise mean in this
> context? If we get a 9k packet, should we dynamically allocate a
> multi-page chunk of contiguous memory and copy the frame into that, or
> were you thinking something else?

My $.02 would be to not care about redirect at all.

It's not like the user experience with redirect is anywhere close 
to amazing right now. Besides (with the exception of SW devices which
will likely gain mb support quickly) mixed-HW setups are very rare.
If the source of the redirect supports mb so will likely the target.


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 16:06 Redux: Backwards compatibility for XDP multi-buff Toke Høiland-Jørgensen
  2021-09-21 17:31 ` Zvi Effron
@ 2021-09-21 22:54 ` Jakub Kicinski
  2021-09-22 20:02   ` Toke Høiland-Jørgensen
  2021-09-23 10:33 ` Lorenz Bauer
  2 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2021-09-21 22:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	netdev, bpf

On Tue, 21 Sep 2021 18:06:35 +0200 Toke Høiland-Jørgensen wrote:
> 1. Do nothing. This would make it up to users / sysadmins to avoid
>    anything breaking by manually making sure to not enable multi-buffer
>    support while loading any XDP programs that will malfunction if
>    presented with an mb frame. This will probably break in interesting
>    ways, but it's nice and simple from an implementation PoV. With this
>    we don't need the declaration discussed above either.
> 
> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>    the program doesn't understand it. This is relatively simple to
>    implement, but it also makes for difficult-to-understand issues (why
>    are my packets suddenly being dropped?), and it will incur runtime
>    overhead.
> 
> 3. Reject loading of programs that are not MB-aware when running in an
>    MB-enabled mode. This would make things break in more obvious ways,
>    and still allow a userspace loader to declare a program "MB-aware" to
>    force it to run if necessary. The problem then becomes at what level
>    to block this?
> 
>    Doing this at the driver level is not enough: while a particular
>    driver knows if it's running in multi-buff mode, we can't know for
>    sure if a particular XDP program is multi-buff aware at attach time:
>    it could be tail-calling other programs, or redirecting packets to
>    another interface where it will be processed by a non-MB aware
>    program.
> 
>    So another option is to make it a global toggle: e.g., create a new
>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
>    program that doesn't support multi-buffer mode, and if it's unset,
>    disable multi-buffer mode in all drivers. This will make it explicit
>    when the multi-buffer mode is used, and prevent any accidental subtle
>    malfunction of existing XDP programs. The drawback is that it's a
>    mode switch, so more configuration complexity.

4. Add new program type, XDP_MB. Do not allow mixing of XDP vs XDP_MB
   thru tail calls.

IMHO that's very simple and covers majority of use cases.
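For context, the kernel already pins a BPF_MAP_TYPE_PROG_ARRAY to the program type of the first program inserted and rejects later mismatches, so the no-mixing rule for tail calls would fall out of that existing ownership check. A rough user-space sketch of the rule extended to an XDP vs. XDP_MB split (types and names are illustrative only):

```c
#include <errno.h>

enum hyp_prog_type {
	HYP_PROG_UNSPEC,
	HYP_PROG_XDP,
	HYP_PROG_XDP_MB,
};

/* Simplified prog-array map: remember which program type "owns" the
 * map and refuse to mix types.  This is what would prevent an XDP
 * program from tail-calling into an XDP_MB program (or vice versa). */
struct hyp_prog_array {
	enum hyp_prog_type owner_type;
};

static int hyp_prog_array_update(struct hyp_prog_array *map,
				 enum hyp_prog_type prog_type)
{
	if (map->owner_type == HYP_PROG_UNSPEC)
		map->owner_type = prog_type;	/* first insert pins the type */
	else if (map->owner_type != prog_type)
		return -EINVAL;	/* no XDP <-> XDP_MB mixing via tail calls */
	return 0;
}
```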


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 22:14       ` Toke Høiland-Jørgensen
@ 2021-09-21 23:10         ` Zvi Effron
  2021-09-22 20:13           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 26+ messages in thread
From: Zvi Effron @ 2021-09-21 23:10 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	netdev, bpf

On Tue, Sep 21, 2021 at 3:14 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Zvi Effron <zeffron@riotgames.com> writes:
>
> > On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Zvi Effron <zeffron@riotgames.com> writes:
> >>
> >> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Hi Lorenz (Cc. the other people who participated in today's discussion)
> >> >>
> >> >> Following our discussion at the LPC session today, I dug up my previous
> >> >> summary of the issue and some possible solutions[0]. Seems no one
> >> >> actually replied last time, which is why we went with the "do nothing"
> >> >> approach, I suppose. I'm including the full text of the original email
> >> >> below; please take a look, and let's see if we can converge on a
> >> >> consensus here.
> >> >>
> >> >> First off, a problem description: If an existing XDP program is exposed
> >> >> to an xdp_buff that is really a multi-buffer, while it will continue to
> >> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
> >> >> the packet it'll only see part of the payload and not be aware of that
> >> >> fact, and if it's calculating the packet length, the result will also be
> >> >> wrong (only counting the first fragment).
> >> >>
> >> >> So what to do about this? First of all, to do anything about it, XDP
> >> >> programs need to be able to declare themselves "multi-buffer aware" (but
> >> >> see point 1 below). We could try to auto-detect it in the verifier by
> >> >> which helpers the program is using, but since existing programs could be
> >> >> perfectly happy to just keep running, it probably needs to be something
> >> >> the program communicates explicitly. One option is to use the
> >> >> expected_attach_type to encode this; programs can then declare it in the
> >> >> source by section name, or the userspace loader can set the type for
> >> >> existing programs if needed.
> >> >>
> >> >> With this, the kernel will know if a given XDP program is multi-buff
> >> >> aware and can decide what to do with that information. For this we came
> >> >> up with basically three options:
> >> >>
> >> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
> >> >>    anything breaking by manually making sure to not enable multi-buffer
> >> >>    support while loading any XDP programs that will malfunction if
> >> >>    presented with an mb frame. This will probably break in interesting
> >> >>    ways, but it's nice and simple from an implementation PoV. With this
> >> >>    we don't need the declaration discussed above either.
> >> >>
> >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
> >> >>    the program doesn't understand it. This is relatively simple to
> >> >>    implement, but it also makes for difficult-to-understand issues (why
> >> >>    are my packets suddenly being dropped?), and it will incur runtime
> >> >>    overhead.
> >> >>
> >> >> 3. Reject loading of programs that are not MB-aware when running in an
> >> >>    MB-enabled mode. This would make things break in more obvious ways,
> >> >>    and still allow a userspace loader to declare a program "MB-aware" to
> >> >>    force it to run if necessary. The problem then becomes at what level
> >> >>    to block this?
> >> >>
> >> >
> >> > I think there's another potential problem with this as well: what happens to
> >> > already loaded programs that are not MB-aware? Are they forcibly unloaded?
> >>
> >> I'd say probably the opposite: You can't toggle whatever switch we end
> >> up with if there are any non-MB-aware programs (you'd have to unload
> >> them first)...
> >>
> >
> > How would we communicate that issue? dmesg? I'm not very familiar with
> > how sysctl change failure causes are communicated to users, so this
> > might be a solved problem, but if I run `sysctl -w net.xdp.multibuffer=1`
> > (or whatever ends up actually being the toggle) to activate
> > multi-buffer, and it fails because there's a loaded non-aware program,
> > that seems like a potential for a lot of administrator pain.
>
> Hmm, good question. Document that this only fails if there's a
> non-mb-aware XDP program loaded? Or use some other mechanism with better
> feedback?
>
> >> >>    Doing this at the driver level is not enough: while a particular
> >> >>    driver knows if it's running in multi-buff mode, we can't know for
> >> >>    sure if a particular XDP program is multi-buff aware at attach time:
> >> >>    it could be tail-calling other programs, or redirecting packets to
> >> >>    another interface where it will be processed by a non-MB aware
> >> >>    program.
> >> >>
> >> >>    So another option is to make it a global toggle: e.g., create a new
> >> >>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
> >> >>    program that doesn't support multi-buffer mode, and if it's unset,
> >> >>    disable multi-buffer mode in all drivers. This will make it explicit
> >> >>    when the multi-buffer mode is used, and prevent any accidental subtle
> >> >>    malfunction of existing XDP programs. The drawback is that it's a
> >> >>    mode switch, so more configuration complexity.
> >> >>
> >> >
> >> > Could we combine the last two bits here into a global toggle that doesn't
> >> > require a sysctl? If any driver is put into multi-buffer mode, then the system
> >> > switches to requiring all programs be multi-buffer? When the last multi-buffer
> >> > enabled driver switches out of multi-buffer, remove the system-wide
> >> > restriction?
> >>
> >> Well, the trouble here is that we don't necessarily have an explicit
> >> "multi-buf mode" for devices. For instance, you could raise the MTU of a
> >> device without it necessarily involving any XDP multi-buffer stuff (if
> >> you're not running XDP on that device). So if we did turn "raising the
> >> MTU" into such a mode switch, we would end up blocking any MTU changes
> >> if any XDP programs are loaded. Or having an MTU change cause a
> >> force-unload of all XDP programs.
> >
> > Maybe I missed something then, but you had stated that "while a
> > particular driver knows if it's running in multi-buff mode" so I
> > assumed that the driver would be able to tell when to toggle the mode
> > on.
>
> Well, a driver knows when it is attaching an XDP program whether it (the
> driver) is configured in a way such that this XDP program could
> encounter a multi-buf.

I know drivers sometimes reconfigure themselves when an XDP program is
attached, but is there any information provided by the attach (other than that
an XDP program is attaching) that they use to make configuration decisions
during that reconfiguration?

Without modifying the driver to intentionally configure itself differently
based on whether or not the program is mb-aware (which I believe is currently
not the case for any driver), won't the configuration of a driver be identical
post XDP attach regardless of whether the program is mb-aware?

I was thinking the driver would make its mb-aware determination (and refcount
adjustments) when its configuration changes for any reason that could
potentially affect mb-aware status (mostly MTU adjustments, I suspect).

>
> > I had been thinking that when a driver turned multi-buffer off, it
> > could trigger a check of all drivers, but that also seems like it
> > could just be a global refcount of all the drivers that have requested
> > multi-buffer mode. When a driver enables multi-buffer for itself, it
> > increments the refcount, and when it disables, it decrements. A
> > non-zero count means the system is in multi-buffer mode.
>
> I guess we could do a refcount-type thing when a multi-buf XDP program
> is first attached (as per above). But I think it may be easier to just
> do it at load-time, then, so it doesn't have to be in the driver, but
> the BPF core could just enforce it.
>
> This would basically amount to a rule saying "you can't mix mb-aware and
> non-mb-aware programs", and the first type to be loaded determines which
> mode the system is in. This would be fairly simple to implement and
> enforce, I suppose. The drawback is that it's potentially racy in the
> order programs are loaded...
>

Accepting or rejecting at load time would definitely simplify things a bit. But
I think the raciness is worse than just based on the first program to load. If
we're doing refcounting at attach/detach time, then I can load an mb-aware and
an mb-unaware program before attaching anything. What do I do when I attach one
of them? The other would be in violation.

If instead of making the determination at attach time, we make it at load time,
I think it'd be better to go back to the sysctl controlling it, and simply not
allow changing the sysctl if any XDP program at all is loaded, as opposed to
if a non-aware program is installed.

Then we're back to the sysctl controlling whether or not mb-aware is required.
We still have a communication-to-the-administrator problem, but it's simplified
a bit from "some loaded program doesn't comply" and having to track down which
one to "there is an XDP program installed".

> -Toke
>

Side note: how do extension programs fit into this? An extension program that's
going to freplace a function in an XDP program (that receives the context)
would also need to be mb-aware or not, but not all extension programs can
attach to such functions, and we wouldn't want those programs to be impacted.
Is it as simple as requiring every XDP program, and every extension program
that takes an XDP context parameter, to be marked as mb-aware?

--Zvi

On Tue, Sep 21, 2021 at 3:14 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Zvi Effron <zeffron@riotgames.com> writes:
>
> > On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Zvi Effron <zeffron@riotgames.com> writes:
> >>
> >> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Hi Lorenz (Cc. the other people who participated in today's discussion)
> >> >>
> >> >> Following our discussion at the LPC session today, I dug up my previous
> >> >> summary of the issue and some possible solutions[0]. Seems no on
> >> >> actually replied last time, which is why we went with the "do nothing"
> >> >> approach, I suppose. I'm including the full text of the original email
> >> >> below; please take a look, and let's see if we can converge on a
> >> >> consensus here.
> >> >>
> >> >> First off, a problem description: If an existing XDP program is exposed
> >> >> to an xdp_buff that is really a multi-buffer, while it will continue to
> >> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
> >> >> the packet it'll only see part of the payload and not be aware of that
> >> >> fact, and if it's calculating the packet length, that will also only be
> >> >> wrong (only counting the first fragment).
> >> >>
> >> >> So what to do about this? First of all, to do anything about it, XDP
> >> >> programs need to be able to declare themselves "multi-buffer aware" (but
> >> >> see point 1 below). We could try to auto-detect it in the verifier by
> >> >> which helpers the program is using, but since existing programs could be
> >> >> perfectly happy to just keep running, it probably needs to be something
> >> >> the program communicates explicitly. One option is to use the
> >> >> expected_attach_type to encode this; programs can then declare it in the
> >> >> source by section name, or the userspace loader can set the type for
> >> >> existing programs if needed.
> >> >>
> >> >> With this, the kernel will know if a given XDP program is multi-buff
> >> >> aware and can decide what to do with that information. For this we came
> >> >> up with basically three options:
> >> >>
> >> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
> >> >>    anything breaking by manually making sure to not enable multi-buffer
> >> >>    support while loading any XDP programs that will malfunction if
> >> >>    presented with an mb frame. This will probably break in interesting
> >> >>    ways, but it's nice and simple from an implementation PoV. With this
> >> >>    we don't need the declaration discussed above either.
> >> >>
> >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
> >> >>    the program doesn't understand it. This is relatively simple to
> >> >>    implement, but it also makes for difficult-to-understand issues (why
> >> >>    are my packets suddenly being dropped?), and it will incur runtime
> >> >>    overhead.
> >> >>
> >> >> 3. Reject loading of programs that are not MB-aware when running in an
> >> >>    MB-enabled mode. This would make things break in more obvious ways,
> >> >>    and still allow a userspace loader to declare a program "MB-aware" to
> >> >>    force it to run if necessary. The problem then becomes at what level
> >> >>    to block this?
> >> >>
> >> >
> >> > I think there's another potential problem with this as well: what happens to
> >> > already loaded programs that are not MB-aware? Are they forcibly unloaded?
> >>
> >> I'd say probably the opposite: You can't toggle whatever switch we end
> >> up with if there are any non-MB-aware programs (you'd have to unload
> >> them first)...
> >>
> >
> > How would we communicate that issue? dmesg? I'm not very familiar with
> > how sysctl change failure causes are communicated to users, so this
> > might be a solved problem, but if I run `sysctl -w net.xdp.multibuffer
> > 1` (or whatever ends up actually being the toggle) to active
> > multi-buffer, and it fails because there's a loaded non-aware program,
> > that seems like a potential for a lot of administrator pain.
>
> Hmm, good question. Document that this only fails if there's a
> non-mb-aware XDP program loaded? Or use some other mechanism with better
> feedback?
>
> >> >>    Doing this at the driver level is not enough: while a particular
> >> >>    driver knows if it's running in multi-buff mode, we can't know for
> >> >>    sure if a particular XDP program is multi-buff aware at attach time:
> >> >>    it could be tail-calling other programs, or redirecting packets to
> >> >>    another interface where it will be processed by a non-MB aware
> >> >>    program.
> >> >>
> >> >>    So another option is to make it a global toggle: e.g., create a new
> >> >>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
> >> >>    program that doesn't support multi-buffer mode, and if it's unset,
> >> >>    disable multi-buffer mode in all drivers. This will make it explicit
> >> >>    when the multi-buffer mode is used, and prevent any accidental subtle
> >> >>    malfunction of existing XDP programs. The drawback is that it's a
> >> >>    mode switch, so more configuration complexity.
> >> >>
> >> >
> >> > Could we combine the last two bits here into a global toggle that doesn't
> >> > require a sysctl? If any driver is put into multi-buffer mode, then the system
> >> > switches to requiring all programs be multi-buffer? When the last multi-buffer
> >> > enabled driver switches out of multi-buffer, remove the system-wide
> >> > restriction?
> >>
> >> Well, the trouble here is that we don't necessarily have an explicit
> >> "multi-buf mode" for devices. For instance, you could raise the MTU of a
> >> device without it necessarily involving any XDP multi-buffer stuff (if
> >> you're not running XDP on that device). So if we did turn "raising the
> >> MTU" into such a mode switch, we would end up blocking any MTU changes
> >> if any XDP programs are loaded. Or having an MTU change cause a
> >> force-unload of all XDP programs.
> >
> > Maybe I missed something then, but you had stated that "while a
> > particular driver knows if it's running in multi-buff mode" so I
> > assumed that the driver would be able to tell when to toggle the mode
> > on.
>
> Well, a driver knows when it is attaching an XDP program whether it (the
> driver) is configured in a way such that this XDP program could
> encounter a multi-buf.
>
> > I had been thinking that when a driver turned multi-buffer off, it
> > could trigger a check of all drivers, but that also seems like it
> > could just be a global refcount of all the drivers that have requested
> > multi-buffer mode. When a driver enables multi-buffer for itself, it
> > increments the refcount, and when it disables, it decrements. A
> > non-zero count means the system is in multi-buffer mode.
>
> I guess we could do a refcount-type thing when a multi-buf XDP program
> is first attached (as per above). But I think it may be easier to just
> do it at load-time, then, so it doesn't have to be in the driver, but
> the BPF core could just enforce it.
>
> This would basically amount to a rule saying "you can't mix mb-aware and
> non-mb-aware programs", and the first type to be loaded determines which
> mode the system is in. This would be fairly simple to implement and
> enforce, I suppose. The drawback is that it's potentially racy in the
> order programs are loaded...
>
> -Toke
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 22:51         ` Jakub Kicinski
@ 2021-09-22 20:01           ` Toke Høiland-Jørgensen
  2021-09-22 21:23             ` Zvi Effron
  2021-09-23 13:46             ` Jakub Kicinski
  0 siblings, 2 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-22 20:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Zvi Effron, Lorenz Bauer, Lorenzo Bianconi,
	Daniel Borkmann, John Fastabend, Network Development, bpf

Jakub Kicinski <kuba@kernel.org> writes:

> On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote:
>> >> Neither of those are desirable outcomes, I think; and if we add a
>> >> separate "XDP multi-buff" switch, we might as well make it system-wide?  
>> >
>> > If we have an internal flag 'this driver supports multi-buf xdp' cannot we
>> > make xdp_redirect to linearize in case the packet is being redirected
>> > to non multi-buf aware driver (potentially with corresponding non mb aware xdp
>> > progs attached) from mb aware driver?  
>> 
>> Hmm, the assumption that XDP frames take up at most one page has been
>> fundamental from the start of XDP. So what does linearise mean in this
>> context? If we get a 9k packet, should we dynamically allocate a
>> multi-page chunk of contiguous memory and copy the frame into that, or
>> were you thinking something else?
>
> My $.02 would be to not care about redirect at all.
>
> It's not like the user experience with redirect is anywhere close 
> to amazing right now. Besides (with the exception of SW devices which
> will likely gain mb support quickly) mixed-HW setups are very rare.
> If the source of the redirect supports mb so will likely the target.

It's not about device support, it's about XDP program support: If I run
an MB-aware XDP program on a physical interface and redirect the (MB)
frame into a container, and there's an XDP program running inside that
container that isn't MB-aware, bugs will ensue. Doesn't matter if the
veth driver itself supports MB...

We could leave that as a "don't do that, then" kind of thing, but that
was what we were proposing (as the "do nothing" option) and got some
pushback on, hence why we're having this conversation :)

-Toke



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 22:54 ` Jakub Kicinski
@ 2021-09-22 20:02   ` Toke Høiland-Jørgensen
  2021-09-22 21:11     ` Zvi Effron
  0 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-22 20:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	netdev, bpf

Jakub Kicinski <kuba@kernel.org> writes:

> On Tue, 21 Sep 2021 18:06:35 +0200 Toke Høiland-Jørgensen wrote:
>> 1. Do nothing. This would make it up to users / sysadmins to avoid
>>    anything breaking by manually making sure to not enable multi-buffer
>>    support while loading any XDP programs that will malfunction if
>>    presented with an mb frame. This will probably break in interesting
>>    ways, but it's nice and simple from an implementation PoV. With this
>>    we don't need the declaration discussed above either.
>> 
>> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>>    the program doesn't understand it. This is relatively simple to
>>    implement, but it also makes for difficult-to-understand issues (why
>>    are my packets suddenly being dropped?), and it will incur runtime
>>    overhead.
>> 
>> 3. Reject loading of programs that are not MB-aware when running in an
>>    MB-enabled mode. This would make things break in more obvious ways,
>>    and still allow a userspace loader to declare a program "MB-aware" to
>>    force it to run if necessary. The problem then becomes at what level
>>    to block this?
>> 
>>    Doing this at the driver level is not enough: while a particular
>>    driver knows if it's running in multi-buff mode, we can't know for
>>    sure if a particular XDP program is multi-buff aware at attach time:
>>    it could be tail-calling other programs, or redirecting packets to
>>    another interface where it will be processed by a non-MB aware
>>    program.
>> 
>>    So another option is to make it a global toggle: e.g., create a new
>>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
>>    program that doesn't support multi-buffer mode, and if it's unset,
>>    disable multi-buffer mode in all drivers. This will make it explicit
>>    when the multi-buffer mode is used, and prevent any accidental subtle
>>    malfunction of existing XDP programs. The drawback is that it's a
>>    mode switch, so more configuration complexity.
>
> 4. Add new program type, XDP_MB. Do not allow mixing of XDP vs XDP_MB
>    thru tail calls.
>
> IMHO that's very simple and covers majority of use cases.

Using the program type (or maybe the expected_attach_type) was how I was
imagining we'd encode the "I am MB aware" flag, yes. I hadn't actually
considered that this could be used to also restrict tail call/freplace
attachment, but that's a good point. So this leaves just the redirect
issue, then, see my other reply.
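
To make the tail-call restriction concrete, here is a rough userspace sketch of the check the kernel could apply when a program is inserted into a prog-array map. Everything here is hypothetical (the enum, struct, and function names are made up for illustration); the real enforcement would key off prog->type or expected_attach_type:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical program kinds; the real kernel would distinguish these
 * via the program type or expected_attach_type. */
enum xdp_prog_kind { XDP_LEGACY, XDP_MB };

struct xdp_prog { enum xdp_prog_kind kind; };

/* Reject inserting a program into a tail-call (prog array) map whose
 * owner has a different mb-awareness than the new entry, i.e. no
 * mixing of XDP and XDP_MB through tail calls. */
static int prog_array_check(const struct xdp_prog *owner,
                            const struct xdp_prog *newprog)
{
    if (owner->kind != newprog->kind)
        return -EINVAL;
    return 0;
}
```

The same shape would apply to freplace attachment: compare the target's kind against the extension's kind and refuse a mismatch.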

-Toke



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 23:10         ` Zvi Effron
@ 2021-09-22 20:13           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-22 20:13 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	netdev, bpf

Zvi Effron <zeffron@riotgames.com> writes:

> On Tue, Sep 21, 2021 at 3:14 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Zvi Effron <zeffron@riotgames.com> writes:
>>
>> > On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Zvi Effron <zeffron@riotgames.com> writes:
>> >>
>> >> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >>
>> >> >> Hi Lorenz (Cc. the other people who participated in today's discussion)
>> >> >>
>> >> >> Following our discussion at the LPC session today, I dug up my previous
>> >> >> summary of the issue and some possible solutions[0]. Seems no one
>> >> >> actually replied last time, which is why we went with the "do nothing"
>> >> >> approach, I suppose. I'm including the full text of the original email
>> >> >> below; please take a look, and let's see if we can converge on a
>> >> >> consensus here.
>> >> >>
>> >> >> First off, a problem description: If an existing XDP program is exposed
>> >> >> to an xdp_buff that is really a multi-buffer, while it will continue to
>> >> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
>> >> >> the packet it'll only see part of the payload and not be aware of that
>> >> >> fact, and if it's calculating the packet length, that will also be
>> >> >> wrong (only counting the first fragment).
>> >> >>
>> >> >> So what to do about this? First of all, to do anything about it, XDP
>> >> >> programs need to be able to declare themselves "multi-buffer aware" (but
>> >> >> see point 1 below). We could try to auto-detect it in the verifier by
>> >> >> which helpers the program is using, but since existing programs could be
>> >> >> perfectly happy to just keep running, it probably needs to be something
>> >> >> the program communicates explicitly. One option is to use the
>> >> >> expected_attach_type to encode this; programs can then declare it in the
>> >> >> source by section name, or the userspace loader can set the type for
>> >> >> existing programs if needed.
>> >> >>
>> >> >> With this, the kernel will know if a given XDP program is multi-buff
>> >> >> aware and can decide what to do with that information. For this we came
>> >> >> up with basically three options:
>> >> >>
>> >> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
>> >> >>    anything breaking by manually making sure to not enable multi-buffer
>> >> >>    support while loading any XDP programs that will malfunction if
>> >> >>    presented with an mb frame. This will probably break in interesting
>> >> >>    ways, but it's nice and simple from an implementation PoV. With this
>> >> >>    we don't need the declaration discussed above either.
>> >> >>
>> >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>> >> >>    the program doesn't understand it. This is relatively simple to
>> >> >>    implement, but it also makes for difficult-to-understand issues (why
>> >> >>    are my packets suddenly being dropped?), and it will incur runtime
>> >> >>    overhead.
>> >> >>
>> >> >> 3. Reject loading of programs that are not MB-aware when running in an
>> >> >>    MB-enabled mode. This would make things break in more obvious ways,
>> >> >>    and still allow a userspace loader to declare a program "MB-aware" to
>> >> >>    force it to run if necessary. The problem then becomes at what level
>> >> >>    to block this?
>> >> >>
>> >> >
>> >> > I think there's another potential problem with this as well: what happens to
>> >> > already loaded programs that are not MB-aware? Are they forcibly unloaded?
>> >>
>> >> I'd say probably the opposite: You can't toggle whatever switch we end
>> >> up with if there are any non-MB-aware programs (you'd have to unload
>> >> them first)...
>> >>
>> >
>> > How would we communicate that issue? dmesg? I'm not very familiar with
>> > how sysctl change failure causes are communicated to users, so this
>> > might be a solved problem, but if I run `sysctl -w net.xdp.multibuffer
>> > 1` (or whatever ends up actually being the toggle) to active
>> > multi-buffer, and it fails because there's a loaded non-aware program,
>> > that seems like a potential for a lot of administrator pain.
>>
>> Hmm, good question. Document that this only fails if there's a
>> non-mb-aware XDP program loaded? Or use some other mechanism with better
>> feedback?
>>
>> >> >>    Doing this at the driver level is not enough: while a particular
>> >> >>    driver knows if it's running in multi-buff mode, we can't know for
>> >> >>    sure if a particular XDP program is multi-buff aware at attach time:
>> >> >>    it could be tail-calling other programs, or redirecting packets to
>> >> >>    another interface where it will be processed by a non-MB aware
>> >> >>    program.
>> >> >>
>> >> >>    So another option is to make it a global toggle: e.g., create a new
>> >> >>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
>> >> >>    program that doesn't support multi-buffer mode, and if it's unset,
>> >> >>    disable multi-buffer mode in all drivers. This will make it explicit
>> >> >>    when the multi-buffer mode is used, and prevent any accidental subtle
>> >> >>    malfunction of existing XDP programs. The drawback is that it's a
>> >> >>    mode switch, so more configuration complexity.
>> >> >>
>> >> >
>> >> > Could we combine the last two bits here into a global toggle that doesn't
>> >> > require a sysctl? If any driver is put into multi-buffer mode, then the system
>> >> > switches to requiring all programs be multi-buffer? When the last multi-buffer
>> >> > enabled driver switches out of multi-buffer, remove the system-wide
>> >> > restriction?
>> >>
>> >> Well, the trouble here is that we don't necessarily have an explicit
>> >> "multi-buf mode" for devices. For instance, you could raise the MTU of a
>> >> device without it necessarily involving any XDP multi-buffer stuff (if
>> >> you're not running XDP on that device). So if we did turn "raising the
>> >> MTU" into such a mode switch, we would end up blocking any MTU changes
>> >> if any XDP programs are loaded. Or having an MTU change cause a
>> >> force-unload of all XDP programs.
>> >
>> > Maybe I missed something then, but you had stated that "while a
>> > particular driver knows if it's running in multi-buff mode" so I
>> > assumed that the driver would be able to tell when to toggle the mode
>> > on.
>>
>> Well, a driver knows when it is attaching an XDP program whether it (the
>> driver) is configured in a way such that this XDP program could
>> encounter a multi-buf.
>
> I know drivers sometimes reconfigure themselves when an XDP program is
> attached, but is there any information provided by the attach (other than that
> an XDP program is attaching) that they use to make configuration decisions
> during that reconfiguration?
>
> Without modifying the driver to intentionally configure itself differently
> based on whether or not the program is mb-aware (which I believe is currently
> not the case for any driver), won't the configuration of a driver be identical
> post XDP attach regardless of whether or not the program is mb-aware?
>
> I was thinking the driver would make its mb-aware determination (and refcount
> adjustments) when its configuration changes for any reason that could
> potentially affect mb-aware status (mostly MTU adjustments, I suspect).
>
>>
>> > I had been thinking that when a driver turned multi-buffer off, it
>> > could trigger a check of all drivers, but that also seems like it
>> > could just be a global refcount of all the drivers that have requested
>> > multi-buffer mode. When a driver enables multi-buffer for itself, it
>> > increments the refcount, and when it disables, it decrements. A
>> > non-zero count means the system is in multi-buffer mode.
>>
>> I guess we could do a refcount-type thing when a multi-buf XDP program
>> is first attached (as per above). But I think it may be easier to just
>> do it at load-time, then, so it doesn't have to be in the driver, but
>> the BPF core could just enforce it.
>>
>> This would basically amount to a rule saying "you can't mix mb-aware and
>> non-mb-aware programs", and the first type to be loaded determines which
>> mode the system is in. This would be fairly simple to implement and
>> enforce, I suppose. The drawback is that it's potentially racy in the
>> order programs are loaded...
>>
>
> Accepting or rejecting at load time would definitely simplify things a bit. But
> I think the raciness is worse than just based on the first program to load. If
> we're doing refcounting at attach/detach time, then I can load an mb-aware and
> an mb-unaware program before attaching anything. What do I do when I attach one
> of them? The other would be in violation.
>
> If instead of making the determination at attach time, we make it at load time,
> I think it'd be better to go back to the sysctl controlling it, and simply not
> allow changing the sysctl if any XDP program at all is loaded, as opposed to
> if a non-aware program is installed.
>
> Then we're back to the sysctl controlling whether or not mb-aware is required.
> We still have a communication to the administrator problem, but it's simplified
> a bit from "some loaded program doesn't comply" and having to track down which
> one to "there is an XDP program installed".

Right, that does simplify things. But if we encode the "mb-aware"
property in the program type (or sub-type, AKA expected_attach_type),
discovering which program is blocking the toggle should be fairly
simple, no?

>> -Toke
>>
>
> Side note: how do extension programs fit into this? An extension program that's
> going to freplace a function in an XDP program (that receives the context)
would also need to be mb-aware or not, but not all extension programs can attach
to such functions, and we wouldn't want those programs to be impacted. Is this
as simple as requiring every XDP program and every extension program that takes
an XDP context parameter to be marked as mb-aware?

Hmm, that's also a good question. I was mentally assuming that freplace
programs could be limited by target type, which would mean that just
encoding the "mb-aware" property in attach type would be enough. But I
see that this is not, in fact, the case. Right now expected_attach_type
is unused for freplace programs, so we could use that if the kernel is
to enforce this. Or we could make it up to the userspace loader to
ensure this by whatever mechanism it wants. For libxdp that would mean
setting the type of the dispatcher program based on the component
program and refusing to mix those, for instance...

-Toke



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-22 20:02   ` Toke Høiland-Jørgensen
@ 2021-09-22 21:11     ` Zvi Effron
  2021-09-23 19:00       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 26+ messages in thread
From: Zvi Effron @ 2021-09-22 21:11 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann,
	John Fastabend, netdev, bpf

On Wed, Sep 22, 2021 at 1:03 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Jakub Kicinski <kuba@kernel.org> writes:
>
> > On Tue, 21 Sep 2021 18:06:35 +0200 Toke Høiland-Jørgensen wrote:
> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
> >> anything breaking by manually making sure to not enable multi-buffer
> >> support while loading any XDP programs that will malfunction if
> >> presented with an mb frame. This will probably break in interesting
> >> ways, but it's nice and simple from an implementation PoV. With this
> >> we don't need the declaration discussed above either.
> >>
> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
> >> the program doesn't understand it. This is relatively simple to
> >> implement, but it also makes for difficult-to-understand issues (why
> >> are my packets suddenly being dropped?), and it will incur runtime
> >> overhead.
> >>
> >> 3. Reject loading of programs that are not MB-aware when running in an
> >> MB-enabled mode. This would make things break in more obvious ways,
> >> and still allow a userspace loader to declare a program "MB-aware" to
> >> force it to run if necessary. The problem then becomes at what level
> >> to block this?
> >>
> >> Doing this at the driver level is not enough: while a particular
> >> driver knows if it's running in multi-buff mode, we can't know for
> >> sure if a particular XDP program is multi-buff aware at attach time:
> >> it could be tail-calling other programs, or redirecting packets to
> >> another interface where it will be processed by a non-MB aware
> >> program.
> >>
> >> So another option is to make it a global toggle: e.g., create a new
> >> sysctl to enable multi-buffer. If this is set, reject loading any XDP
> >> program that doesn't support multi-buffer mode, and if it's unset,
> >> disable multi-buffer mode in all drivers. This will make it explicit
> >> when the multi-buffer mode is used, and prevent any accidental subtle
> >> malfunction of existing XDP programs. The drawback is that it's a
> >> mode switch, so more configuration complexity.
> >
> > 4. Add new program type, XDP_MB. Do not allow mixing of XDP vs XDP_MB
> > thru tail calls.
> >
> > IMHO that's very simple and covers majority of use cases.
>
> Using the program type (or maybe the expected_attach_type) was how I was
> imagining we'd encode the "I am MB aware" flag, yes. I hadn't actually
> considered that this could be used to also restrict tail call/freplace
> attachment, but that's a good point. So this leaves just the redirect
> issue, then, see my other reply.
>

I really like this approach as well, but before we commit to it, how likely
are we to encounter this type of situation (where we need to indicate whether
an XDP program supports a new capability) again in the future? And if we do,
are we willing to require that all programs supporting that new feature are
also mb-aware? Essentially, the suboptimal case I'm envisioning is needing to
have XDP_MB, XDP_MB_NEW_THING, XDP_NEW_THING, and XDP all as program types. That
leads to an exponential explosion in program types.

Also, every time we add a program type to encode a feature (instead of a truly
new type), we're essentially forcing a recompilation (and redeployment) of all
existing programs that take advantage of the libbpf section name program
typing. (I'm sure there are tools that can rename a section in an ELF file
without recompilation, but recompilation seems the simplest way to correct the
ELF files for most people.)

If we think this is a one-off, it's probably fine, but if we think it'll happen
again, is it worth it to find a solution that will work for future cases now,
instead of having XDP, XDP_MB, and then having to find a solution for future
cases?
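
One alternative to a type per feature combination would be a capability bitmask declared at load time, so each new feature adds a bit rather than a new program type. A minimal sketch, with made-up flag and function names (the real kernel's load-time attribute layout differs):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability bits a program declares at load time. */
#define XDP_CAP_MB        (1u << 0)
#define XDP_CAP_NEW_THING (1u << 1)

/* Attach succeeds only if the program declares every capability the
 * device, as currently configured, requires. Adding a feature later
 * means adding a bit, not doubling the set of program types. */
static int xdp_attach_check(uint32_t prog_caps, uint32_t dev_required)
{
    if ((prog_caps & dev_required) != dev_required)
        return -EOPNOTSUPP;
    return 0;
}
```

A loader could still expose the bits via section-name suffixes, but renaming a section (or setting a flag) is a much smaller ask than recompiling against a new program type.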

--Zvi

> -Toke
>


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-22 20:01           ` Toke Høiland-Jørgensen
@ 2021-09-22 21:23             ` Zvi Effron
  2021-09-23 18:45               ` Toke Høiland-Jørgensen
  2021-09-23 13:46             ` Jakub Kicinski
  1 sibling, 1 reply; 26+ messages in thread
From: Zvi Effron @ 2021-09-22 21:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, Alexei Starovoitov, Lorenz Bauer,
	Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	Network Development, bpf

On Wed, Sep 22, 2021 at 1:01 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Jakub Kicinski <kuba@kernel.org> writes:
>
> > On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote:
> >> >> Neither of those are desirable outcomes, I think; and if we add a
> >> >> separate "XDP multi-buff" switch, we might as well make it system-wide?
> >> >
> >> > If we have an internal flag 'this driver supports multi-buf xdp' cannot we
> >> > make xdp_redirect to linearize in case the packet is being redirected
> >> > to non multi-buf aware driver (potentially with corresponding non mb aware xdp
> >> > progs attached) from mb aware driver?
> >>
> >> Hmm, the assumption that XDP frames take up at most one page has been
> >> fundamental from the start of XDP. So what does linearise mean in this
> >> context? If we get a 9k packet, should we dynamically allocate a
> >> multi-page chunk of contiguous memory and copy the frame into that, or
> >> were you thinking something else?
> >
> > My $.02 would be to not care about redirect at all.
> >
> > It's not like the user experience with redirect is anywhere close
> > to amazing right now. Besides (with the exception of SW devices which
> > will likely gain mb support quickly) mixed-HW setups are very rare.
> > If the source of the redirect supports mb so will likely the target.
>
> It's not about device support, it's about XDP program support: If I run
> an MB-aware XDP program on a physical interface and redirect the (MB)
> frame into a container, and there's an XDP program running inside that
> container that isn't MB-aware, bugs will ensue. Doesn't matter if the
> veth driver itself supports MB...
>
> We could leave that as a "don't do that, then" kind of thing, but that
> was what we were proposing (as the "do nothing" option) and got some
> pushback on, hence why we're having this conversation :)
>
> -Toke
>

I hadn't even considered the case of redirecting to a veth pair on the same
system. I'm assuming from your statement that the buffers are passed directly
to the ingress inside the container and don't go through the sort of egress
process they would if leaving the system? And I'm assuming that's an
optimization?

I'm not sure that makes a difference, though. It's not about whether the
driver's code is mb-capable, it's about whether the driver _as currently
configured_ could generate multiple buffers. If it can, then only an mb-aware
program should be able to be attached to it (and tail called from whatever's
attached to it). If it can't, then there should be no way to have multiple
buffers come to it.

So in the situation you've described, either the veth driver should be in a
state where it coalesces the multiple buffers into one, fragmenting the frame
if necessary or drops the frame, or the program attached inside the container
would need to be mb-aware. I'm assuming with the veth driver as written, this
might mean that all programs attached to the veth driver would need to be
mb-aware, which is obviously undesirable.

All of which significantly adds to the complexity to support mb-aware, so maybe
this could be developed later? Initially we could have a sysctl toggling the
state: 0 = single-buffer only, 1 = multi-buffer allowed. Then later we _could_ add a
state for dynamic control once all XDP supporting drivers support the necessary
dynamic functionality (if ever). At that point we'd have actual experience with
the sysctl and could see how much of a burden having static control is.
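
To illustrate the load-time variant of that sysctl, here is a sketch of the two checks it implies. All names are hypothetical and the globals stand in for kernel state; the point is that the toggle can only flip while nothing is loaded, and the current value gates loads:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int sysctl_xdp_multibuf;  /* 0 = single-buffer only, 1 = mb allowed */
static int nr_loaded_xdp_progs;  /* stand-in for a global count of loaded progs */

/* The toggle is refused outright while any XDP program is loaded, which
 * gives the administrator the simple message "an XDP program is installed". */
static int set_multibuf_sysctl(int val)
{
    if (nr_loaded_xdp_progs > 0)
        return -EBUSY;
    sysctl_xdp_multibuf = val;
    return 0;
}

/* With the toggle on, only mb-aware programs may load at all. */
static int load_xdp_prog(bool mb_aware)
{
    if (sysctl_xdp_multibuf && !mb_aware)
        return -EOPNOTSUPP;
    nr_loaded_xdp_progs++;
    return 0;
}
```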

I may have been misinterpreting your use case though, and you were talking
about the XDP program running on the egress side of the redirect? Is that the
case you were talking about?

--Zvi


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-21 16:06 Redux: Backwards compatibility for XDP multi-buff Toke Høiland-Jørgensen
  2021-09-21 17:31 ` Zvi Effron
  2021-09-21 22:54 ` Jakub Kicinski
@ 2021-09-23 10:33 ` Lorenz Bauer
  2021-09-23 12:59   ` Toke Høiland-Jørgensen
  2 siblings, 1 reply; 26+ messages in thread
From: Lorenz Bauer @ 2021-09-23 10:33 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf

On Tue, 21 Sept 2021 at 17:06, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Hi Lorenz (Cc. the other people who participated in today's discussion)
>
> Following our discussion at the LPC session today, I dug up my previous
> summary of the issue and some possible solutions[0]. Seems no one
> actually replied last time, which is why we went with the "do nothing"
> approach, I suppose. I'm including the full text of the original email
> below; please take a look, and let's see if we can converge on a
> consensus here.

Hi Toke,

Thanks for looping me in again. A bit of context on what XDP at
Cloudflare looks like:

* We have a chain of XDP programs attached to a real network device.
This implements DDoS protection and L4 load balancing. This is
maintained by the team I am on.
* We have hundreds of network namespaces with veth that have XDP
attached to them. Traffic is routed from the root namespace into
these. This is maintained by the Magic Transit team, see this talk
from last year's LPC [1].

I'll try to summarise what I've picked up from the thread and add my
own 2c. Options being considered:

1. Make sure mb-aware and mb-unaware programs don't mix.

This could either be in the form of a sysctl or a dynamic property
similar to a refcount. We'd need to discern mb-aware from mb-unaware
somehow, most easily via a new program type. This means recompiling
existing programs (but then we expect that to be necessary anyways).
We'd also have to be able to indicate "mb-awareness" for freplace
programs.

The implementation complexity seems OK, but operator UX is not good:
it's not possible to slowly migrate a system to mb-awareness, it has
to happen in one fell swoop. This would be really problematic for us,
since we already have multiple teams writing and deploying XDP
independently of each other. This number is only going to grow. It
seems there will also be trickiness around redirecting into different
devices? Not something we do today, but it's kind of an obvious
optimization to start redirecting into network namespaces from XDP
instead of relying on routing.

2. Add a compatibility shim for mb-unaware programs receiving an mb frame.

We'd still need a way to indicate "MB-OK", but it could be a piece of
metadata on a bpf_prog. Whatever code dispatches to an XDP program
would have to include a prologue that linearises the xdp_buff if
necessary which implies allocating memory. I don't know how hard it is
to implement this. There is also the question of freplace: do we
extend linearising to them, or do they have to support MB?

You raised an interesting point: couldn't we hit programs that can't
handle data_end - data being above a certain length? I think we (=
Cloudflare) actually have one of those, since we in some cases need to
traverse the entire buffer to calculate a checksum (we encapsulate
UDPv4 in IPv6, don't ask). Turns out it's actually really hard to
calculate the checksum on a variable length packet in BPF so we've had
to introduce limits. However, this case isn't too important: we made
this choice consciously, knowing that MTU changes would break it.
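
For reference, this is roughly the shape such a bounded checksum ends up with. Plain userspace C here rather than BPF, and the cap value is made up; in BPF it is the fixed upper bound on the loop that keeps the verifier happy, and it is exactly what an mb frame would blow past:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CSUM_BYTES 1500  /* hypothetical verifier-friendly cap */

/* RFC 1071 ones'-complement checksum over at most MAX_CSUM_BYTES;
 * longer packets can't be handled and the caller must bail out. */
static int csum_fold(const uint8_t *data, size_t len, uint16_t *out)
{
    uint32_t sum = 0;

    if (len > MAX_CSUM_BYTES)
        return -1;  /* the consciously-introduced limit */
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(data[i] << 8 | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;  /* pad trailing odd byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);   /* fold carries */
    *out = (uint16_t)~sum;
    return 0;
}
```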

Other than that I like this option a lot: mb-aware and mb-unaware
programs can co-exist, at the cost of performance. This allows
gradually migrating to our stack so that it can handle jumbo frames.
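
A rough sketch of what such a dispatch-time prologue could do, using a mocked-up fragment layout (the real xdp_buff/skb_shared_info details differ, and this is exactly the allocation-and-copy cost the thread worries about):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mock multi-buffer frame: a head segment plus trailing fragments. */
struct frag { const void *data; size_t len; };
struct mb_frame {
    const void *head;
    size_t head_len;
    struct frag frags[4];
    int nr_frags;
};

/* Copy head + all fragments into one contiguous allocation so a
 * non-mb-aware program sees a linear buffer. Returns NULL on OOM. */
static void *linearize(const struct mb_frame *f, size_t *out_len)
{
    size_t total = f->head_len;
    for (int i = 0; i < f->nr_frags; i++)
        total += f->frags[i].len;

    char *buf = malloc(total);
    if (!buf)
        return NULL;

    memcpy(buf, f->head, f->head_len);
    size_t off = f->head_len;
    for (int i = 0; i < f->nr_frags; i++) {
        memcpy(buf + off, f->frags[i].data, f->frags[i].len);
        off += f->frags[i].len;
    }
    *out_len = total;
    return buf;
}
```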

3. Make non-linearity invisible to the BPF program

Something I've wished for often is that I didn't have to deal with
nonlinearity at all, based on my experience with cls_redirect [2].
It's really hard to write a BPF program that handles non-linear skb,
especially when you have to call adjust_head, etc. which invalidates
packet buffers. This is probably impossible, but maybe someone has a
crazy idea? :)

Lorenz

1: https://youtu.be/UkvxPyIJAko?t=10057
2: https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/bpf/progs/test_cls_redirect.c

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-23 10:33 ` Lorenz Bauer
@ 2021-09-23 12:59   ` Toke Høiland-Jørgensen
  2021-09-24 10:18     ` Lorenz Bauer
  0 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-23 12:59 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf

Lorenz Bauer <lmb@cloudflare.com> writes:

> On Tue, 21 Sept 2021 at 17:06, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Hi Lorenz (Cc. the other people who participated in today's discussion)
>>
>> Following our discussion at the LPC session today, I dug up my previous
>> summary of the issue and some possible solutions[0]. Seems no one
>> actually replied last time, which is why we went with the "do nothing"
>> approach, I suppose. I'm including the full text of the original email
>> below; please take a look, and let's see if we can converge on a
>> consensus here.
>
> Hi Toke,
>
> Thanks for looping me in again. A bit of context what XDP at
> Cloudflare looks like:
>
> * We have a chain of XDP programs attached to a real network device.
> This implements DDoS protection and L4 load balancing. This is
> maintained by the team I am on.
> * We have hundreds of network namespaces with veth that have XDP
> attached to them. Traffic is routed from the root namespace into
> these. This is maintained by the Magic Transit team, see this talk
> from last year's LPC [1]
> I'll try to summarise what I've picked up from the thread and add my
> own 2c. Options being considered:
>
> 1. Make sure mb-aware and mb-unaware programs don't mix.

I think I would rather state this as "make sure mb-unaware programs
never encounter an mb frame". The programs can mix just fine in a
single-buffer world since mb-aware programs are backwards compatible.
All the multibuf helpers will also do the right thing even if there's
only a single buffer in a given packet.

> This could either be in the form of a sysctl or a dynamic property
> similar to a refcount. We'd need to discern mb-aware from mb-unaware
> somehow, most easily via a new program type. This means recompiling
> existing programs (but then we expect that to be necessary anyways).

Small nit here: while in most cases this property will probably be set
by recompilation, the loader can override it as well. So if you have a
non-mb-aware program that you know won't break in an mb-setting (because
it doesn't care about packet length, etc), your loader could just mark
it as 'mb-aware'.

Command-line loaders like xdp-loader and 'ip' would probably need a
manual override switch to do this for the case where you have an old XDP
program that you still want to run even though you've enabled MB mode.

> We'd also have to be able to indicate "mb-awareness" for freplace
> programs.
>
> The implementation complexity seems OK, but operator UX is not good:
> it's not possible to slowly migrate a system to mb-awareness, it has
> to happen in one fell swoop.

I don't think it has to be quite that bleak :)

Specifically, there is no reason to block mb-aware programs from loading
even when the multi-buffer mode is disabled. So a migration plan would
look something like:

1. Start out with the mb-sysctl toggled off. This will make the system
   behave like it does today, i.e., XDP programs won't load on
   interfaces with large MTUs.

2. Start porting all your XDP programs to make them mb-aware, and switch
   their program type as you do. In many cases this is just a matter of
   checking that the programs don't care about packet length. While this
   is ongoing you will have a mix of mb-aware and non-mb-aware programs
   running, but there will be no actual mb frames.

3. Once all your programs have been ported and marked as such, flip the
   sysctl. This will make the system start refusing to load any XDP
   programs that are not mb-aware.

4. Actually raise the MTU of your interfaces :)

> 2. Add a compatibility shim for mb-unaware programs receiving an mb frame.
>
> We'd still need a way to indicate "MB-OK", but it could be a piece of
> metadata on a bpf_prog. Whatever code dispatches to an XDP program
> would have to include a prologue that linearises the xdp_buff if
> necessary which implies allocating memory. I don't know how hard it is
> to implement this.

I think it would be somewhat non-trivial, and more importantly would
absolutely slaughter performance. And if you're using XDP, presumably
you care about that, so I'm not sure we're doing anyone any favours by
implementing such a compatibility layer?

> There is also the question of freplace: do we extend linearising to
> them, or do they have to support MB?

Well, today freplace programs just inherit the type of whatever they are
attaching to, so we'd have to go out of our way to block this. I think
logically it would be up to whatever loader is attaching freplace
programs, because that's the one that knows the semantics. E.g.,
something like libxdp that uses freplace to chain load programs would
have to make sure that the dispatcher program is only marked as mb-aware
if *all* the constituent programs are.

> You raised an interesting point: couldn't we hit programs that can't
> handle data_end - data being above a certain length? I think we (=
> Cloudflare) actually have one of those, since we in some cases need to
> traverse the entire buffer to calculate a checksum (we encapsulate
> UDPv4 in IPv6, don't ask). Turns out it's actually really hard to
> calculate the checksum on a variable length packet in BPF so we've had
> to introduce limits. However, this case isn't too important: we made
> this choice consciously, knowing that MTU changes would break it.

Yeah, for this I think you're on your own ;)

> Other than that I like this option a lot: mb-aware and mb-unaware
> programs can co-exist, at the cost of performance. This allows
> gradually migrating to our stack so that it can handle jumbo frames.

See above re: coexisting.

> 3. Make non-linearity invisible to the BPF program
>
> Something I've wished for often is that I didn't have to deal with
> nonlinearity at all, based on my experience with cls_redirect [2].
> It's really hard to write a BPF program that handles non-linear skb,
> especially when you have to call adjust_head, etc. which invalidates
> packet buffers. This is probably impossible, but maybe someone has a
> crazy idea? :)

With the other helpers that we started discussing, I don't think you
have to? I.e., with an xdp_load_bytes() or an xdp_data_pointer()-type
helper that works across fragment boundaries I think you'd be fine, no?

-Toke



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-22 20:01           ` Toke Høiland-Jørgensen
  2021-09-22 21:23             ` Zvi Effron
@ 2021-09-23 13:46             ` Jakub Kicinski
  2021-09-27 12:43               ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2021-09-23 13:46 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Zvi Effron, Lorenz Bauer, Lorenzo Bianconi,
	Daniel Borkmann, John Fastabend, Network Development, bpf

On Wed, 22 Sep 2021 22:01:17 +0200 Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> >> Hmm, the assumption that XDP frames take up at most one page has been
> >> fundamental from the start of XDP. So what does linearise mean in this
> >> context? If we get a 9k packet, should we dynamically allocate a
> >> multi-page chunk of contiguous memory and copy the frame into that, or
> >> were you thinking something else?  
> >
> > My $.02 would be to not care about redirect at all.
> >
> > It's not like the user experience with redirect is anywhere close 
> > to amazing right now. Besides (with the exception of SW devices which
> > will likely gain mb support quickly) mixed-HW setups are very rare.
> > If the source of the redirect supports mb so will likely the target.  
> 
> It's not about device support it's about XDP program support: If I run
> an MB-aware XDP program on a physical interface and redirect the (MB)
> frame into a container, and there's an XDP program running inside that
> container that isn't MB-aware, bugs will ensue. Doesn't matter if the
> veth driver itself supports MB...

Ah, I see now.

> We could leave that as a "don't do that, then" kind of thing, but that
> was what we were proposing (as the "do nothing" option) and got some
> pushback on, hence why we're having this conversation :)

Let me make a general statement that we can't build large systems
without division of responsibilities. Device specific logic has to
be left to the driver. It's up to veth to apply its extra rules.

As Zvi said, tho, this can be left for later. IMHO initial patches can
drop all mb frames on redirect in the core, so we can make progress.


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-22 21:23             ` Zvi Effron
@ 2021-09-23 18:45               ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-23 18:45 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Jakub Kicinski, Alexei Starovoitov, Lorenz Bauer,
	Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	Network Development, bpf

Zvi Effron <zeffron@riotgames.com> writes:

> On Wed, Sep 22, 2021 at 1:01 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Jakub Kicinski <kuba@kernel.org> writes:
>>
>> > On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote:
>> >> >> Neither of those are desirable outcomes, I think; and if we add a
>> >> >> separate "XDP multi-buff" switch, we might as well make it system-wide?
>> >> >
>> >> > If we have an internal flag 'this driver supports multi-buf xdp' cannot we
>> >> > make xdp_redirect to linearize in case the packet is being redirected
>> >> > to non multi-buf aware driver (potentially with corresponding non mb aware xdp
>> >> > progs attached) from mb aware driver?
>> >>
>> >> Hmm, the assumption that XDP frames take up at most one page has been
>> >> fundamental from the start of XDP. So what does linearise mean in this
>> >> context? If we get a 9k packet, should we dynamically allocate a
>> >> multi-page chunk of contiguous memory and copy the frame into that, or
>> >> were you thinking something else?
>> >
>> > My $.02 would be to not care about redirect at all.
>> >
>> > It's not like the user experience with redirect is anywhere close
>> > to amazing right now. Besides (with the exception of SW devices which
>> > will likely gain mb support quickly) mixed-HW setups are very rare.
>> > If the source of the redirect supports mb so will likely the target.
>>
>> It's not about device support it's about XDP program support: If I run
>> an MB-aware XDP program on a physical interface and redirect the (MB)
>> frame into a container, and there's an XDP program running inside that
>> container that isn't MB-aware, bugs will ensue. Doesn't matter if the
>> veth driver itself supports MB...
>>
>> We could leave that as a "don't do that, then" kind of thing, but that
>> was what we were proposing (as the "do nothing" option) and got some
>> pushback on, hence why we're having this conversation :)
>>
>> -Toke
>>
>
> I hadn't even considered the case of redirecting to a veth pair on the same
> system. I'm assuming from your statement that the buffers are passed directly
> to the ingress inside the container and don't go through the sort of egress
> process they would if leaving the system? And I'm assuming that's an
> optimization?

Yeah, if we redirect an XDP frame to a veth, the peer will get the same
xdp_frame, without ever building an SKB.

> I'm not sure that makes a difference, though. It's not about whether the
> driver's code is mb-capable, it's about whether the driver _as currently
> configured_ could generate multiple buffers. If it can, then only an mb-aware
> program should be able to be attached to it (and tail called from whatever's
> attached to it). If it can't, then there should be no way to have multiple
> buffers come to it.
>
> So in the situation you've described, either the veth driver should be in a
> state where it coalesces the multiple buffers into one, fragmenting the frame
> if necessary or drops the frame, or the program attached inside the container
> would need to be mb-aware. I'm assuming with the veth driver as written, this
> might mean that all programs attached to the veth driver would need to be
> mb-aware, which is obviously undesirable.

Hmm, I guess that as long as mb-frames only show up for large MTUs, the
MTU of the veth device would be a limiting factor just like for physical
devices, so we could just apply the same logic there. Not sure why I
didn't consider that before :/

> All of which significantly adds to the complexity to support mb-aware, so maybe
> this could be developed later? Initially we could have a sysctl toggling the
> state: 0 = single-buffer only, 1 = multi-buffer allowed. Then later we _could_ add a
> state for dynamic control once all XDP supporting drivers support the necessary
> dynamic functionality (if ever). At that point we'd have actual experience with
> the sysctl and could see how much of a burden having static control is.
>
> I may have been misinterpreting your use case though, and you were talking
> about the XDP program running on the egress side of the redirect? Is that
> the case you were talking about?

No I was talking about exactly what you outlined above. Although longer
term, I also think we can use XDP mb as a way to avoid having to
linearise SKBs when running XDP on them in veth (and for generic XDP) :)

-Toke



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-22 21:11     ` Zvi Effron
@ 2021-09-23 19:00       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-23 19:00 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Jakub Kicinski, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann,
	John Fastabend, netdev, bpf

Zvi Effron <zeffron@riotgames.com> writes:

> On Wed, Sep 22, 2021 at 1:03 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Jakub Kicinski <kuba@kernel.org> writes:
>>
>> > On Tue, 21 Sep 2021 18:06:35 +0200 Toke Høiland-Jørgensen wrote:
>> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
>> >> anything breaking by manually making sure to not enable multi-buffer
>> >> support while loading any XDP programs that will malfunction if
>> >> presented with an mb frame. This will probably break in interesting
>> >> ways, but it's nice and simple from an implementation PoV. With this
>> >> we don't need the declaration discussed above either.
>> >>
>> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>> >> the program doesn't understand it. This is relatively simple to
>> >> implement, but it also makes for difficult-to-understand issues (why
>> >> are my packets suddenly being dropped?), and it will incur runtime
>> >> overhead.
>> >>
>> >> 3. Reject loading of programs that are not MB-aware when running in an
>> >> MB-enabled mode. This would make things break in more obvious ways,
>> >> and still allow a userspace loader to declare a program "MB-aware" to
>> >> force it to run if necessary. The problem then becomes at what level
>> >> to block this?
>> >>
>> >> Doing this at the driver level is not enough: while a particular
>> >> driver knows if it's running in multi-buff mode, we can't know for
>> >> sure if a particular XDP program is multi-buff aware at attach time:
>> >> it could be tail-calling other programs, or redirecting packets to
>> >> another interface where it will be processed by a non-MB aware
>> >> program.
>> >>
>> >> So another option is to make it a global toggle: e.g., create a new
>> >> sysctl to enable multi-buffer. If this is set, reject loading any XDP
>> >> program that doesn't support multi-buffer mode, and if it's unset,
>> >> disable multi-buffer mode in all drivers. This will make it explicit
>> >> when the multi-buffer mode is used, and prevent any accidental subtle
>> >> malfunction of existing XDP programs. The drawback is that it's a
>> >> mode switch, so more configuration complexity.
>> >
>> > 4. Add new program type, XDP_MB. Do not allow mixing of XDP vs XDP_MB
>> > thru tail calls.
>> >
>> > IMHO that's very simple and covers majority of use cases.
>>
>> Using the program type (or maybe the expected_attach_type) was how I was
>> imagining we'd encode the "I am MB aware" flag, yes. I hadn't actually
>> considered that this could be used to also restrict tail call/freplace
>> attachment, but that's a good point. So this leaves just the redirect
>> issue, then, see my other reply.
>>
>
> I really like this approach as well, but before we commit to it, how likely
> are we to encounter this type of situation (where we need to indicate whether
> an XDP program supports a new capability) again in the future. And if we do,
> are we willing to require that all programs supporting that new feature are
> also mb-aware? Essentially, the suboptimal case I'm envisioning is needing to
> have XDP_MB, XDP_MB_NEW_THING, XDP_NEW_THING, and XDP all as program types. That
> leads to exponential explosion in program types.

Hmm, that's a good point. Maybe it would be better to communicate it via
a flag; we do have a prog_flags attribute on BPF_PROG_LOAD we could use.

> Also, every time we add a program type to encode a feature (instead of a truly
> new type), we're essentially forcing a recompilation (and redeployment) of all
> existing programs that take advantage of the libbpf section name program
> typing. (I'm sure there are tools that can rename a section in an ELF file
> without recompilation, but recompilation seems the simplest way to correct the
> ELF files for most people.)

Isn't this true regardless of how we encode it? I mean when we add a new
feature that needs an explicit support declaration, programs need to
declare that they support it somehow. No matter how it's actually
encoded, this will either entail recompiling the program, or having the
loader override the value at load-time.

For instance, say we do use a flag instead of a new prog type, how would
a BPF program set that flag? Defining a new section type in libbpf would
be an obvious answer (i.e., SEC("xdp") sets prog_type=xdp, and
SEC("xdp_mb") sets prog_type=xdp, flags=XDP_MB).

> If we think this is a one-off, it's probably fine, but if we think
> it'll happen again, is it worth it to find a solution that will work
> for future cases now, instead of having XDP, XDP_MB, and then having
> to find a solution for future cases?

Well, really the right solution is a general "XDP feature flags"
capability. There was some work done on this, but unfortunately it
stalled out. Not sure it's feasible to resurrect that as a prerequisite
for landing xdp_mb, though, so I guess we'll need a one-off thing here :/

-Toke



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-23 12:59   ` Toke Høiland-Jørgensen
@ 2021-09-24 10:18     ` Lorenz Bauer
  2021-09-24 17:55       ` Zvi Effron
  2021-09-24 19:38       ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 26+ messages in thread
From: Lorenz Bauer @ 2021-09-24 10:18 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf

On Thu, 23 Sept 2021 at 13:59, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> I don't think it has to be quite that bleak :)
>
> Specifically, there is no reason to block mb-aware programs from loading
> even when the multi-buffer mode is disabled. So a migration plan would
> look something like:

...

> 2. Start porting all your XDP programs to make them mb-aware, and switch
>    their program type as you do. In many cases this is just a matter of
>    checking that the programs don't care about packet length. [...]

Porting is only easy if we are guaranteed that the first PAGE_SIZE
bytes (or whatever the current limit is) are available via ->data
without trickery. Otherwise we have to convert all direct packet
access to the new API, whatever that ends up being. It seemed to me
like you were saying there is no such guarantee, and it could be
driver dependent, which is the worst possible outcome imo. This is the
status quo for TC classifiers, which is a great source of hard to
diagnose bugs.

> 3. Once all your programs have been ported and marked as such, flip the
>    sysctl. This will make the system start refusing to load any XDP
>    programs that are not mb-aware.

By this you mean reboot the system and early in boot change the
sysctl? That could work I guess.

> > 2. Add a compatibility shim for mb-unaware programs receiving an mb frame.
> >
> > We'd still need a way to indicate "MB-OK", but it could be a piece of
> > metadata on a bpf_prog. Whatever code dispatches to an XDP program
> > would have to include a prologue that linearises the xdp_buff if
> > necessary which implies allocating memory. I don't know how hard it is
> > to implement this.
>
> I think it would be somewhat non-trivial, and more importantly would
> absolutely slaughter performance. And if you're using XDP, presumably
> you care about that, so I'm not sure we're doing anyone any favours by
> implementing such a compatibility layer?

I see your point: having a single non-mb-aware program trash
performance is bad for marketing. Better to not let people bump the
MTU in that case.

> > 3. Make non-linearity invisible to the BPF program
> >
> > Something I've wished for often is that I didn't have to deal with
> > nonlinearity at all, based on my experience with cls_redirect [2].
> > It's really hard to write a BPF program that handles non-linear skb,
> > especially when you have to call adjust_head, etc. which invalidates
> > packet buffers. This is probably impossible, but maybe someone has a
> > crazy idea? :)
>
> With the other helpers that we started discussing, I don't think you
> have to? I.e., with an xdp_load_bytes() or an xdp_data_pointer()-type
> helper that works across fragment boundaries I think you'd be fine, no?

I'll take a look!

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-24 10:18     ` Lorenz Bauer
@ 2021-09-24 17:55       ` Zvi Effron
  2021-09-24 19:38       ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 26+ messages in thread
From: Zvi Effron @ 2021-09-24 17:55 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Toke Høiland-Jørgensen, Lorenzo Bianconi,
	Daniel Borkmann, John Fastabend, Networking, bpf

On Fri, Sep 24, 2021 at 3:19 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Thu, 23 Sept 2021 at 13:59, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > I don't think it has to be quite that bleak :)
> >
> > Specifically, there is no reason to block mb-aware programs from loading
> > even when the multi-buffer mode is disabled. So a migration plan would
> > look something like:
>
> ...
>
> > 2. Start porting all your XDP programs to make them mb-aware, and switch
> > their program type as you do. In many cases this is just a matter of
> > checking that the programs don't care about packet length. [...]
>
> Porting is only easy if we are guaranteed that the first PAGE_SIZE
> bytes (or whatever the current limit is) are available via ->data
> without trickery. Otherwise we have to convert all direct packet
> access to the new API, whatever that ends up being. It seemed to me
> like you were saying there is no such guarantee, and it could be
> driver dependent, which is the worst possible outcome imo. This is the
> status quo for TC classifiers, which is a great source of hard to
> diagnose bugs.
>
> > 3. Once all your programs have been ported and marked as such, flip the
> > sysctl. This will make the system start refusing to load any XDP
> > programs that are not mb-aware.
>
> By this you mean reboot the system and early in boot change the
> sysctl? That could work I guess.
>

I don't think a reboot is required. Just unloading all mb-unaware XDP
programs should allow adjusting the sysctl via the normal methods
(i.e. sysctl -w). Even when rebooting, it should just need an entry in
/etc/sysctl.conf, nothing necessarily early in the boot process. The
sysctl would only need to be set before the first load of an mb-unaware
XDP program, which, if everything is ported correctly, would never
happen.

> > > 2. Add a compatibility shim for mb-unaware programs receiving an mb frame.
> > >
> > > We'd still need a way to indicate "MB-OK", but it could be a piece of
> > > metadata on a bpf_prog. Whatever code dispatches to an XDP program
> > > would have to include a prologue that linearises the xdp_buff if
> > > necessary which implies allocating memory. I don't know how hard it is
> > > to implement this.
> >
> > I think it would be somewhat non-trivial, and more importantly would
> > absolutely slaughter performance. And if you're using XDP, presumably
> > you care about that, so I'm not sure we're doing anyone any favours by
> > implementing such a compatibility layer?
>
> I see your point: having a single non-mb-aware program trash
> performance is bad for marketing. Better to not let people bump the
> MTU in that case.
>
> > > 3. Make non-linearity invisible to the BPF program
> > >
> > > Something I've wished for often is that I didn't have to deal with
> > > nonlinearity at all, based on my experience with cls_redirect [2].
> > > It's really hard to write a BPF program that handles non-linear skb,
> > > especially when you have to call adjust_head, etc. which invalidates
> > > packet buffers. This is probably impossible, but maybe someone has a
> > > crazy idea? :)
> >
> > With the other helpers that we started discussing, I don't think you
> > have to? I.e., with an xdp_load_bytes() or an xdp_data_pointer()-type
> > helper that works across fragment boundaries I think you'd be fine, no?
>
> I'll take a look!
>
> Lorenz
>
> --
> Lorenz Bauer | Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
>
> www.cloudflare.com


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-24 10:18     ` Lorenz Bauer
  2021-09-24 17:55       ` Zvi Effron
@ 2021-09-24 19:38       ` Toke Høiland-Jørgensen
  2021-09-28  8:47         ` Lorenz Bauer
  1 sibling, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-24 19:38 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf

Lorenz Bauer <lmb@cloudflare.com> writes:

> On Thu, 23 Sept 2021 at 13:59, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> I don't think it has to be quite that bleak :)
>>
>> Specifically, there is no reason to block mb-aware programs from loading
>> even when the multi-buffer mode is disabled. So a migration plan would
>> look something like:
>
> ...
>
>> 2. Start porting all your XDP programs to make them mb-aware, and switch
>>    their program type as you do. In many cases this is just a matter of
>>    checking that the programs don't care about packet length. [...]
>
> Porting is only easy if we are guaranteed that the first PAGE_SIZE
> bytes (or whatever the current limit is) are available via ->data
> without trickery. Otherwise we have to convert all direct packet
> access to the new API, whatever that ends up being. It seemed to me
> like you were saying there is no such guarantee, and it could be
> driver dependent, which is the worst possible outcome imo. This is the
> status quo for TC classifiers, which is a great source of hard to
> diagnose bugs.

Well, for the changes we're proposing now it will certainly be the case
that the first PAGE_SIZE will always be present. But once we have the
capability, I would expect people would want to do more with it, so we
can't really guarantee this in the future. We could require that any
other use be opt-in at the driver level, I suppose, but not sure if that
would be enough?

-Toke



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-23 13:46             ` Jakub Kicinski
@ 2021-09-27 12:43               ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 26+ messages in thread
From: Jesper Dangaard Brouer @ 2021-09-27 12:43 UTC (permalink / raw)
  To: Jakub Kicinski, Toke Høiland-Jørgensen
  Cc: brouer, Alexei Starovoitov, Zvi Effron, Lorenz Bauer,
	Lorenzo Bianconi, Daniel Borkmann, John Fastabend,
	Network Development, bpf


On 23/09/2021 15.46, Jakub Kicinski wrote:
> Let me make a general statement that we can't build large systems
> without division of responsibilities. Device specific logic has to
> be left to the driver. It's up to veth to apply its extra rules.
> 
> As Zvi said, tho, this can be left for later. IMHO initial patches can
> drop all mb frames on redirect in the core, so we can make progress.

I agree that the *initial* patchset for REDIRECT (the function calling
ndo_xdp_xmit) can drop MB frames in the core when the driver's
ndo_xdp_xmit doesn't support them.
BUT only so we can make progress.

As soon as possible (preferably in the same kernel release) a follow-up
patchset should make a serious attempt at updating all drivers'
ndo_xdp_xmit to support multi-buffer (MB).

As MB uses the same/compatible layout as skb_shared_info, it should be
a fairly small effort to update drivers for *transmitting* multi-buff,
since drivers already support this for normal SKBs.

It will be a larger effort to support XDP multi-buff on RX, as often
the RX loop needs to be adjusted/reordered to "delay" calling the XDP
prog until all fragments have been "collected", and the MTU check
adjusted.

This implies that the driver's net_device xdp_features needs two
feature bits for the MB feature.
IMHO it makes sense to disentangle RX and TX, as we should allow MB
frames to be TX-redirected out of a driver that has an old non-MB-aware
XDP prog loaded.

--Jesper



* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-24 19:38       ` Toke Høiland-Jørgensen
@ 2021-09-28  8:47         ` Lorenz Bauer
  2021-09-28 13:43           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 26+ messages in thread
From: Lorenz Bauer @ 2021-09-28  8:47 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf

On Fri, 24 Sept 2021 at 20:38, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Porting is only easy if we are guaranteed that the first PAGE_SIZE
> > bytes (or whatever the current limit is) are available via ->data
> > without trickery. Otherwise we have to convert all direct packet
> > access to the new API, whatever that ends up being. It seemed to me
> > like you were saying there is no such guarantee, and it could be
> > driver dependent, which is the worst possible outcome imo. This is the
> > status quo for TC classifiers, which is a great source of hard to
> > diagnose bugs.
>
> Well, for the changes we're proposing now it will certainly be the case
> that the first PAGE_SIZE will always be present. But once we have the
> capability, I would expect people would want to do more with it, so we
> can't really guarantee this in the future. We could require that any
> other use be opt-in at the driver level, I suppose, but not sure if that
> would be enough?

I'm not sure what you mean by "opt-in at driver level"? Make smaller
initial fragments a feature on the driver? We shouldn't let drivers
dictate the semantics of a program type, it defeats the purpose of the
context abstraction. We're using XDP precisely because we don't want
to deal with vendor specific network stack bypass, etc. I would prefer
not specifying the first fragment size over the driver knob;
unfortunately that invalidates your assumption that porting is going to
be trivial.

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: Redux: Backwards compatibility for XDP multi-buff
  2021-09-28  8:47         ` Lorenz Bauer
@ 2021-09-28 13:43           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-28 13:43 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf

Lorenz Bauer <lmb@cloudflare.com> writes:

> On Fri, 24 Sept 2021 at 20:38, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >
>> > Porting is only easy if we are guaranteed that the first PAGE_SIZE
>> > bytes (or whatever the current limit is) are available via ->data
>> > without trickery. Otherwise we have to convert all direct packet
>> > access to the new API, whatever that ends up being. It seemed to me
>> > like you were saying there is no such guarantee, and it could be
>> > driver dependent, which is the worst possible outcome imo. This is the
>> > status quo for TC classifiers, which is a great source of hard to
>> > diagnose bugs.
>>
>> Well, for the changes we're proposing now it will certainly be the case
>> that the first PAGE_SIZE will always be present. But once we have the
>> capability, I would expect people would want to do more with it, so we
>> can't really guarantee this in the future. We could require that any
>> other use be opt-in at the driver level, I suppose, but not sure if that
>> would be enough?
>
> I'm not sure what you mean by "opt-in at driver level"? Make smaller
> initial fragments a feature on the driver? We shouldn't let drivers
> dictate the semantics of a program type, it defeats the purpose of the
> context abstraction. We're using XDP precisely because we don't want
> to deal with vendor specific network stack bypass, etc. I would prefer
> not specifying the first fragment size over the driver knob,

I didn't mean that every driver should get to just define their own
semantics; rather the opposite: If we do end up adding support for
anything other than "first page of data is in the first fragment", we
should define separate semantics for this and make it an explicit knob
to turn on such a mode.

> unfortunately it invalidates your assumption that porting is going to
> be trivial.

Well, I did say "for some programs" ;)

But OK, point taken. It would be great if you could chime in on the
helper discussion[0]. I.e., which helper API would make porting simplest
for you?

-Toke

[0] https://lore.kernel.org/r/20210920110216.4c54c9a3@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com




Thread overview: 26+ messages
2021-09-21 16:06 Redux: Backwards compatibility for XDP multi-buff Toke Høiland-Jørgensen
2021-09-21 17:31 ` Zvi Effron
2021-09-21 18:22   ` Toke Høiland-Jørgensen
2021-09-21 19:17     ` Zvi Effron
2021-09-21 22:14       ` Toke Høiland-Jørgensen
2021-09-21 23:10         ` Zvi Effron
2021-09-22 20:13           ` Toke Høiland-Jørgensen
2021-09-21 20:12     ` Alexei Starovoitov
2021-09-21 22:20       ` Toke Høiland-Jørgensen
2021-09-21 22:51         ` Jakub Kicinski
2021-09-22 20:01           ` Toke Høiland-Jørgensen
2021-09-22 21:23             ` Zvi Effron
2021-09-23 18:45               ` Toke Høiland-Jørgensen
2021-09-23 13:46             ` Jakub Kicinski
2021-09-27 12:43               ` Jesper Dangaard Brouer
2021-09-21 22:54 ` Jakub Kicinski
2021-09-22 20:02   ` Toke Høiland-Jørgensen
2021-09-22 21:11     ` Zvi Effron
2021-09-23 19:00       ` Toke Høiland-Jørgensen
2021-09-23 10:33 ` Lorenz Bauer
2021-09-23 12:59   ` Toke Høiland-Jørgensen
2021-09-24 10:18     ` Lorenz Bauer
2021-09-24 17:55       ` Zvi Effron
2021-09-24 19:38       ` Toke Høiland-Jørgensen
2021-09-28  8:47         ` Lorenz Bauer
2021-09-28 13:43           ` Toke Høiland-Jørgensen
