* Redux: Backwards compatibility for XDP multi-buff
@ 2021-09-21 16:06 Toke Høiland-Jørgensen
  2021-09-21 17:31 ` Zvi Effron
  ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread

From: Toke Høiland-Jørgensen @ 2021-09-21 16:06 UTC (permalink / raw)
To: Lorenz Bauer
Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf

Hi Lorenz (Cc. the other people who participated in today's discussion)

Following our discussion at the LPC session today, I dug up my previous
summary of the issue and some possible solutions[0]. Seems no one
actually replied last time, which is why we went with the "do nothing"
approach, I suppose. I'm including the full text of the original email
below; please take a look, and let's see if we can converge on a
consensus here.

First off, a problem description: If an existing XDP program is exposed
to an xdp_buff that is really a multi-buffer, while it will continue to
run, it may end up with subtle and hard-to-debug bugs: If it's parsing
the packet it'll only see part of the payload and not be aware of that
fact, and if it's calculating the packet length, that will also be
wrong (only counting the first fragment).

So what to do about this? First of all, to do anything about it, XDP
programs need to be able to declare themselves "multi-buffer aware" (but
see point 1 below). We could try to auto-detect it in the verifier by
which helpers the program is using, but since existing programs could be
perfectly happy to just keep running, it probably needs to be something
the program communicates explicitly. One option is to use the
expected_attach_type to encode this; programs can then declare it in the
source by section name, or the userspace loader can set the type for
existing programs if needed.

With this, the kernel will know if a given XDP program is multi-buff
aware and can decide what to do with that information. For this we came
up with basically three options:

1. Do nothing. This would make it up to users / sysadmins to avoid
   anything breaking by manually making sure to not enable multi-buffer
   support while loading any XDP programs that will malfunction if
   presented with an mb frame. This will probably break in interesting
   ways, but it's nice and simple from an implementation PoV. With this
   we don't need the declaration discussed above either.

2. Add a check at runtime and drop the frames if they are mb-enabled and
   the program doesn't understand it. This is relatively simple to
   implement, but it also makes for difficult-to-understand issues (why
   are my packets suddenly being dropped?), and it will incur runtime
   overhead.

3. Reject loading of programs that are not MB-aware when running in an
   MB-enabled mode. This would make things break in more obvious ways,
   and still allow a userspace loader to declare a program "MB-aware" to
   force it to run if necessary. The problem then becomes at what level
   to block this?

   Doing this at the driver level is not enough: while a particular
   driver knows if it's running in multi-buff mode, we can't know for
   sure if a particular XDP program is multi-buff aware at attach time:
   it could be tail-calling other programs, or redirecting packets to
   another interface where it will be processed by a non-MB aware
   program.

   So another option is to make it a global toggle: e.g., create a new
   sysctl to enable multi-buffer. If this is set, reject loading any XDP
   program that doesn't support multi-buffer mode, and if it's unset,
   disable multi-buffer mode in all drivers. This will make it explicit
   when the multi-buffer mode is used, and prevent any accidental subtle
   malfunction of existing XDP programs. The drawback is that it's a
   mode switch, so more configuration complexity.

None of these options are ideal, of course, but I hope the above
explanation at least makes sense. If anyone has any better ideas (or can
spot any flaws in the reasoning above) please don't hesitate to let us
know!
-Toke [0] https://lore.kernel.org/r/8735srxglb.fsf@toke.dk ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 16:06 Redux: Backwards compatibility for XDP multi-buff Toke Høiland-Jørgensen @ 2021-09-21 17:31 ` Zvi Effron 2021-09-21 18:22 ` Toke Høiland-Jørgensen 2021-09-21 22:54 ` Jakub Kicinski 2021-09-23 10:33 ` Lorenz Bauer 2 siblings, 1 reply; 26+ messages in thread From: Zvi Effron @ 2021-09-21 17:31 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Hi Lorenz (Cc. the other people who participated in today's discussion) > > Following our discussion at the LPC session today, I dug up my previous > summary of the issue and some possible solutions[0]. Seems no on > actually replied last time, which is why we went with the "do nothing" > approach, I suppose. I'm including the full text of the original email > below; please take a look, and let's see if we can converge on a > consensus here. > > First off, a problem description: If an existing XDP program is exposed > to an xdp_buff that is really a multi-buffer, while it will continue to > run, it may end up with subtle and hard-to-debug bugs: If it's parsing > the packet it'll only see part of the payload and not be aware of that > fact, and if it's calculating the packet length, that will also only be > wrong (only counting the first fragment). > > So what to do about this? First of all, to do anything about it, XDP > programs need to be able to declare themselves "multi-buffer aware" (but > see point 1 below). We could try to auto-detect it in the verifier by > which helpers the program is using, but since existing programs could be > perfectly happy to just keep running, it probably needs to be something > the program communicates explicitly. 
One option is to use the > expected_attach_type to encode this; programs can then declare it in the > source by section name, or the userspace loader can set the type for > existing programs if needed. > > With this, the kernel will know if a given XDP program is multi-buff > aware and can decide what to do with that information. For this we came > up with basically three options: > > 1. Do nothing. This would make it up to users / sysadmins to avoid > anything breaking by manually making sure to not enable multi-buffer > support while loading any XDP programs that will malfunction if > presented with an mb frame. This will probably break in interesting > ways, but it's nice and simple from an implementation PoV. With this > we don't need the declaration discussed above either. > > 2. Add a check at runtime and drop the frames if they are mb-enabled and > the program doesn't understand it. This is relatively simple to > implement, but it also makes for difficult-to-understand issues (why > are my packets suddenly being dropped?), and it will incur runtime > overhead. > > 3. Reject loading of programs that are not MB-aware when running in an > MB-enabled mode. This would make things break in more obvious ways, > and still allow a userspace loader to declare a program "MB-aware" to > force it to run if necessary. The problem then becomes at what level > to block this? > I think there's another potential problem with this as well: what happens to already loaded programs that are not MB-aware? Are they forcibly unloaded? > Doing this at the driver level is not enough: while a particular > driver knows if it's running in multi-buff mode, we can't know for > sure if a particular XDP program is multi-buff aware at attach time: > it could be tail-calling other programs, or redirecting packets to > another interface where it will be processed by a non-MB aware > program. > > So another option is to make it a global toggle: e.g., create a new > sysctl to enable multi-buffer. 
If this is set, reject loading any XDP > program that doesn't support multi-buffer mode, and if it's unset, > disable multi-buffer mode in all drivers. This will make it explicit > when the multi-buffer mode is used, and prevent any accidental subtle > malfunction of existing XDP programs. The drawback is that it's a > mode switch, so more configuration complexity. > Could we combine the last two bits here into a global toggle that doesn't require a sysctl? If any driver is put into multi-buffer mode, then the system switches to requiring all programs be multi-buffer? When the last multi-buffer enabled driver switches out of multi-buffer, remove the system-wide restriction? Regarding my above question, if non-MB-aware XDP programs are not forcibly unloaded, then a global toggle is also insufficient. An existing non-MB-aware XDP program would still need to be rejected at attach time by the driver. > None of these options are ideal, of course, but I hope the above explanation at least makes sense. If anyone has any better ideas (or can spot any flaws in the reasoning above) please don't hesitate to let us know! > > -Toke > > [0] https://lore.kernel.org/r/8735srxglb.fsf@toke.dk > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 17:31 ` Zvi Effron @ 2021-09-21 18:22 ` Toke Høiland-Jørgensen 2021-09-21 19:17 ` Zvi Effron 2021-09-21 20:12 ` Alexei Starovoitov 0 siblings, 2 replies; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-21 18:22 UTC (permalink / raw) To: Zvi Effron Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf Zvi Effron <zeffron@riotgames.com> writes: > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Hi Lorenz (Cc. the other people who participated in today's discussion) >> >> Following our discussion at the LPC session today, I dug up my previous >> summary of the issue and some possible solutions[0]. Seems no on >> actually replied last time, which is why we went with the "do nothing" >> approach, I suppose. I'm including the full text of the original email >> below; please take a look, and let's see if we can converge on a >> consensus here. >> >> First off, a problem description: If an existing XDP program is exposed >> to an xdp_buff that is really a multi-buffer, while it will continue to >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing >> the packet it'll only see part of the payload and not be aware of that >> fact, and if it's calculating the packet length, that will also only be >> wrong (only counting the first fragment). >> >> So what to do about this? First of all, to do anything about it, XDP >> programs need to be able to declare themselves "multi-buffer aware" (but >> see point 1 below). We could try to auto-detect it in the verifier by >> which helpers the program is using, but since existing programs could be >> perfectly happy to just keep running, it probably needs to be something >> the program communicates explicitly. 
One option is to use the >> expected_attach_type to encode this; programs can then declare it in the >> source by section name, or the userspace loader can set the type for >> existing programs if needed. >> >> With this, the kernel will know if a given XDP program is multi-buff >> aware and can decide what to do with that information. For this we came >> up with basically three options: >> >> 1. Do nothing. This would make it up to users / sysadmins to avoid >> anything breaking by manually making sure to not enable multi-buffer >> support while loading any XDP programs that will malfunction if >> presented with an mb frame. This will probably break in interesting >> ways, but it's nice and simple from an implementation PoV. With this >> we don't need the declaration discussed above either. >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and >> the program doesn't understand it. This is relatively simple to >> implement, but it also makes for difficult-to-understand issues (why >> are my packets suddenly being dropped?), and it will incur runtime >> overhead. >> >> 3. Reject loading of programs that are not MB-aware when running in an >> MB-enabled mode. This would make things break in more obvious ways, >> and still allow a userspace loader to declare a program "MB-aware" to >> force it to run if necessary. The problem then becomes at what level >> to block this? >> > > I think there's another potential problem with this as well: what happens to > already loaded programs that are not MB-aware? Are they forcibly unloaded? I'd say probably the opposite: You can't toggle whatever switch we end up with if there are any non-MB-aware programs (you'd have to unload them first)... 
>> Doing this at the driver level is not enough: while a particular >> driver knows if it's running in multi-buff mode, we can't know for >> sure if a particular XDP program is multi-buff aware at attach time: >> it could be tail-calling other programs, or redirecting packets to >> another interface where it will be processed by a non-MB aware >> program. >> >> So another option is to make it a global toggle: e.g., create a new >> sysctl to enable multi-buffer. If this is set, reject loading any XDP >> program that doesn't support multi-buffer mode, and if it's unset, >> disable multi-buffer mode in all drivers. This will make it explicit >> when the multi-buffer mode is used, and prevent any accidental subtle >> malfunction of existing XDP programs. The drawback is that it's a >> mode switch, so more configuration complexity. >> > > Could we combine the last two bits here into a global toggle that doesn't > require a sysctl? If any driver is put into multi-buffer mode, then the system > switches to requiring all programs be multi-buffer? When the last multi-buffer > enabled driver switches out of multi-buffer, remove the system-wide > restriction? Well, the trouble here is that we don't necessarily have an explicit "multi-buf mode" for devices. For instance, you could raise the MTU of a device without it necessarily involving any XDP multi-buffer stuff (if you're not running XDP on that device). So if we did turn "raising the MTU" into such a mode switch, we would end up blocking any MTU changes if any XDP programs are loaded. Or having an MTU change cause a force-unload of all XDP programs. Neither of those are desirable outcomes, I think; and if we add a separate "XDP multi-buff" switch, we might as well make it system-wide? > Regarding my above question, if non-MB-aware XDP programs are not forcibly > unloaded, then a global toggle is also insufficient. An existing non-MB-aware > XDP program would still beed to be rejected at attach time by the > driver. 
See above. -Toke ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 18:22 ` Toke Høiland-Jørgensen @ 2021-09-21 19:17 ` Zvi Effron 2021-09-21 22:14 ` Toke Høiland-Jørgensen 2021-09-21 20:12 ` Alexei Starovoitov 1 sibling, 1 reply; 26+ messages in thread From: Zvi Effron @ 2021-09-21 19:17 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Zvi Effron <zeffron@riotgames.com> writes: > > > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> > >> Hi Lorenz (Cc. the other people who participated in today's discussion) > >> > >> Following our discussion at the LPC session today, I dug up my previous > >> summary of the issue and some possible solutions[0]. Seems no on > >> actually replied last time, which is why we went with the "do nothing" > >> approach, I suppose. I'm including the full text of the original email > >> below; please take a look, and let's see if we can converge on a > >> consensus here. > >> > >> First off, a problem description: If an existing XDP program is exposed > >> to an xdp_buff that is really a multi-buffer, while it will continue to > >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing > >> the packet it'll only see part of the payload and not be aware of that > >> fact, and if it's calculating the packet length, that will also only be > >> wrong (only counting the first fragment). > >> > >> So what to do about this? First of all, to do anything about it, XDP > >> programs need to be able to declare themselves "multi-buffer aware" (but > >> see point 1 below). We could try to auto-detect it in the verifier by > >> which helpers the program is using, but since existing programs could be > >> perfectly happy to just keep running, it probably needs to be something > >> the program communicates explicitly. 
One option is to use the > >> expected_attach_type to encode this; programs can then declare it in the > >> source by section name, or the userspace loader can set the type for > >> existing programs if needed. > >> > >> With this, the kernel will know if a given XDP program is multi-buff > >> aware and can decide what to do with that information. For this we came > >> up with basically three options: > >> > >> 1. Do nothing. This would make it up to users / sysadmins to avoid > >> anything breaking by manually making sure to not enable multi-buffer > >> support while loading any XDP programs that will malfunction if > >> presented with an mb frame. This will probably break in interesting > >> ways, but it's nice and simple from an implementation PoV. With this > >> we don't need the declaration discussed above either. > >> > >> 2. Add a check at runtime and drop the frames if they are mb-enabled and > >> the program doesn't understand it. This is relatively simple to > >> implement, but it also makes for difficult-to-understand issues (why > >> are my packets suddenly being dropped?), and it will incur runtime > >> overhead. > >> > >> 3. Reject loading of programs that are not MB-aware when running in an > >> MB-enabled mode. This would make things break in more obvious ways, > >> and still allow a userspace loader to declare a program "MB-aware" to > >> force it to run if necessary. The problem then becomes at what level > >> to block this? > >> > > > > I think there's another potential problem with this as well: what happens to > > already loaded programs that are not MB-aware? Are they forcibly unloaded? > > I'd say probably the opposite: You can't toggle whatever switch we end > up with if there are any non-MB-aware programs (you'd have to unload > them first)... > How would we communicate that issue? dmesg? 
I'm not very familiar with how sysctl change failure causes are communicated to users, so this might be a solved problem, but if I run `sysctl -w net.xdp.multibuffer=1` (or whatever ends up actually being the toggle) to activate multi-buffer, and it fails because there's a loaded non-aware program, that seems like a potential for a lot of administrator pain. > >> Doing this at the driver level is not enough: while a particular > >> driver knows if it's running in multi-buff mode, we can't know for > >> sure if a particular XDP program is multi-buff aware at attach time: > >> it could be tail-calling other programs, or redirecting packets to > >> another interface where it will be processed by a non-MB aware > >> program. > >> > >> So another option is to make it a global toggle: e.g., create a new > >> sysctl to enable multi-buffer. If this is set, reject loading any XDP > >> program that doesn't support multi-buffer mode, and if it's unset, > >> disable multi-buffer mode in all drivers. This will make it explicit > >> when the multi-buffer mode is used, and prevent any accidental subtle > >> malfunction of existing XDP programs. The drawback is that it's a > >> mode switch, so more configuration complexity. > >> > > > > Could we combine the last two bits here into a global toggle that doesn't > > require a sysctl? If any driver is put into multi-buffer mode, then the system > > switches to requiring all programs be multi-buffer? When the last multi-buffer > > enabled driver switches out of multi-buffer, remove the system-wide > > restriction? > > Well, the trouble here is that we don't necessarily have an explicit > "multi-buf mode" for devices. For instance, you could raise the MTU of a > device without it necessarily involving any XDP multi-buffer stuff (if > you're not running XDP on that device). So if we did turn "raising the > MTU" into such a mode switch, we would end up blocking any MTU changes > if any XDP programs are loaded.
Or having an MTU change cause a > force-unload of all XDP programs. Maybe I missed something then, but you had stated that "while a particular driver knows if it's running in multi-buff mode" so I assumed that the driver would be able to tell when to toggle the mode on. I had been thinking that when a driver turned multi-buffer off, it could trigger a check of all drivers, but that also seems like it could just be a global refcount of all the drivers that have requested multi-buffer mode. When a driver enables multi-buffer for itself, it increments the refcount, and when it disables, it decrements. A non-zero count means the system is in multi-buffer mode. Obviously this is more complex than just requiring the administrator to enable the system-wide mode. > > Neither of those are desirable outcomes, I think; and if we add a > separate "XDP multi-buff" switch, we might as well make it system-wide? > > > Regarding my above question, if non-MB-aware XDP programs are not forcibly > > unloaded, then a global toggle is also insufficient. An existing non-MB-aware > > XDP program would still beed to be rejected at attach time by the > > driver. > > See above. > > -Toke > --Zvi ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 19:17 ` Zvi Effron @ 2021-09-21 22:14 ` Toke Høiland-Jørgensen 2021-09-21 23:10 ` Zvi Effron 0 siblings, 1 reply; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-21 22:14 UTC (permalink / raw) To: Zvi Effron Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf Zvi Effron <zeffron@riotgames.com> writes: > On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Zvi Effron <zeffron@riotgames.com> writes: >> >> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> >> >> Hi Lorenz (Cc. the other people who participated in today's discussion) >> >> >> >> Following our discussion at the LPC session today, I dug up my previous >> >> summary of the issue and some possible solutions[0]. Seems no on >> >> actually replied last time, which is why we went with the "do nothing" >> >> approach, I suppose. I'm including the full text of the original email >> >> below; please take a look, and let's see if we can converge on a >> >> consensus here. >> >> >> >> First off, a problem description: If an existing XDP program is exposed >> >> to an xdp_buff that is really a multi-buffer, while it will continue to >> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing >> >> the packet it'll only see part of the payload and not be aware of that >> >> fact, and if it's calculating the packet length, that will also only be >> >> wrong (only counting the first fragment). >> >> >> >> So what to do about this? First of all, to do anything about it, XDP >> >> programs need to be able to declare themselves "multi-buffer aware" (but >> >> see point 1 below). We could try to auto-detect it in the verifier by >> >> which helpers the program is using, but since existing programs could be >> >> perfectly happy to just keep running, it probably needs to be something >> >> the program communicates explicitly. 
One option is to use the >> >> expected_attach_type to encode this; programs can then declare it in the >> >> source by section name, or the userspace loader can set the type for >> >> existing programs if needed. >> >> >> >> With this, the kernel will know if a given XDP program is multi-buff >> >> aware and can decide what to do with that information. For this we came >> >> up with basically three options: >> >> >> >> 1. Do nothing. This would make it up to users / sysadmins to avoid >> >> anything breaking by manually making sure to not enable multi-buffer >> >> support while loading any XDP programs that will malfunction if >> >> presented with an mb frame. This will probably break in interesting >> >> ways, but it's nice and simple from an implementation PoV. With this >> >> we don't need the declaration discussed above either. >> >> >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and >> >> the program doesn't understand it. This is relatively simple to >> >> implement, but it also makes for difficult-to-understand issues (why >> >> are my packets suddenly being dropped?), and it will incur runtime >> >> overhead. >> >> >> >> 3. Reject loading of programs that are not MB-aware when running in an >> >> MB-enabled mode. This would make things break in more obvious ways, >> >> and still allow a userspace loader to declare a program "MB-aware" to >> >> force it to run if necessary. The problem then becomes at what level >> >> to block this? >> >> >> > >> > I think there's another potential problem with this as well: what happens to >> > already loaded programs that are not MB-aware? Are they forcibly unloaded? >> >> I'd say probably the opposite: You can't toggle whatever switch we end >> up with if there are any non-MB-aware programs (you'd have to unload >> them first)... >> > > How would we communicate that issue? dmesg? 
I'm not very familiar with > how sysctl change failure causes are communicated to users, so this > might be a solved problem, but if I run `sysctl -w net.xdp.multibuffer > 1` (or whatever ends up actually being the toggle) to active > multi-buffer, and it fails because there's a loaded non-aware program, > that seems like a potential for a lot of administrator pain. Hmm, good question. Document that this only fails if there's a non-mb-aware XDP program loaded? Or use some other mechanism with better feedback? >> >> Doing this at the driver level is not enough: while a particular >> >> driver knows if it's running in multi-buff mode, we can't know for >> >> sure if a particular XDP program is multi-buff aware at attach time: >> >> it could be tail-calling other programs, or redirecting packets to >> >> another interface where it will be processed by a non-MB aware >> >> program. >> >> >> >> So another option is to make it a global toggle: e.g., create a new >> >> sysctl to enable multi-buffer. If this is set, reject loading any XDP >> >> program that doesn't support multi-buffer mode, and if it's unset, >> >> disable multi-buffer mode in all drivers. This will make it explicit >> >> when the multi-buffer mode is used, and prevent any accidental subtle >> >> malfunction of existing XDP programs. The drawback is that it's a >> >> mode switch, so more configuration complexity. >> >> >> > >> > Could we combine the last two bits here into a global toggle that doesn't >> > require a sysctl? If any driver is put into multi-buffer mode, then the system >> > switches to requiring all programs be multi-buffer? When the last multi-buffer >> > enabled driver switches out of multi-buffer, remove the system-wide >> > restriction? >> >> Well, the trouble here is that we don't necessarily have an explicit >> "multi-buf mode" for devices. 
For instance, you could raise the MTU of a >> device without it necessarily involving any XDP multi-buffer stuff (if >> you're not running XDP on that device). So if we did turn "raising the >> MTU" into such a mode switch, we would end up blocking any MTU changes >> if any XDP programs are loaded. Or having an MTU change cause a >> force-unload of all XDP programs. > > Maybe I missed something then, but you had stated that "while a > particular driver knows if it's running in multi-buff mode" so I > assumed that the driver would be able to tell when to toggle the mode > on. Well, a driver knows when it is attaching an XDP program whether it (the driver) is configured in a way such that this XDP program could encounter a multi-buf. > I had been thinking that when a driver turned multi-buffer off, it > could trigger a check of all drivers, but that also seems like it > could just be a global refcount of all the drivers that have requested > multi-buffer mode. When a driver enables multi-buffer for itself, it > increments the refcount, and when it disables, it decrements. A > non-zero count means the system is in multi-buffer mode. I guess we could do a refcount-type thing when a multi-buf XDP program is first attached (as per above). But I think it may be easier to just do it at load-time, then, so it doesn't have to be in the driver, but the BPF core could just enforce it. This would basically amount to a rule saying "you can't mix mb-aware and non-mb-aware programs", and the first type to be loaded determines which mode the system is in. This would be fairly simple to implement and enforce, I suppose. The drawback is that it's potentially racy in the order programs are loaded... -Toke ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 22:14 ` Toke Høiland-Jørgensen @ 2021-09-21 23:10 ` Zvi Effron 2021-09-22 20:13 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 26+ messages in thread From: Zvi Effron @ 2021-09-21 23:10 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf On Tue, Sep 21, 2021 at 3:14 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Zvi Effron <zeffron@riotgames.com> writes: > > > On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> > >> Zvi Effron <zeffron@riotgames.com> writes: > >> > >> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> >> > >> >> Hi Lorenz (Cc. the other people who participated in today's discussion) > >> >> > >> >> Following our discussion at the LPC session today, I dug up my previous > >> >> summary of the issue and some possible solutions[0]. Seems no on > >> >> actually replied last time, which is why we went with the "do nothing" > >> >> approach, I suppose. I'm including the full text of the original email > >> >> below; please take a look, and let's see if we can converge on a > >> >> consensus here. > >> >> > >> >> First off, a problem description: If an existing XDP program is exposed > >> >> to an xdp_buff that is really a multi-buffer, while it will continue to > >> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing > >> >> the packet it'll only see part of the payload and not be aware of that > >> >> fact, and if it's calculating the packet length, that will also only be > >> >> wrong (only counting the first fragment). > >> >> > >> >> So what to do about this? First of all, to do anything about it, XDP > >> >> programs need to be able to declare themselves "multi-buffer aware" (but > >> >> see point 1 below). 
We could try to auto-detect it in the verifier by > >> >> which helpers the program is using, but since existing programs could be > >> >> perfectly happy to just keep running, it probably needs to be something > >> >> the program communicates explicitly. One option is to use the > >> >> expected_attach_type to encode this; programs can then declare it in the > >> >> source by section name, or the userspace loader can set the type for > >> >> existing programs if needed. > >> >> > >> >> With this, the kernel will know if a given XDP program is multi-buff > >> >> aware and can decide what to do with that information. For this we came > >> >> up with basically three options: > >> >> > >> >> 1. Do nothing. This would make it up to users / sysadmins to avoid > >> >> anything breaking by manually making sure to not enable multi-buffer > >> >> support while loading any XDP programs that will malfunction if > >> >> presented with an mb frame. This will probably break in interesting > >> >> ways, but it's nice and simple from an implementation PoV. With this > >> >> we don't need the declaration discussed above either. > >> >> > >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and > >> >> the program doesn't understand it. This is relatively simple to > >> >> implement, but it also makes for difficult-to-understand issues (why > >> >> are my packets suddenly being dropped?), and it will incur runtime > >> >> overhead. > >> >> > >> >> 3. Reject loading of programs that are not MB-aware when running in an > >> >> MB-enabled mode. This would make things break in more obvious ways, > >> >> and still allow a userspace loader to declare a program "MB-aware" to > >> >> force it to run if necessary. The problem then becomes at what level > >> >> to block this? > >> >> > >> > > >> > I think there's another potential problem with this as well: what happens to > >> > already loaded programs that are not MB-aware? Are they forcibly unloaded? 
> >> > >> I'd say probably the opposite: You can't toggle whatever switch we end > >> up with if there are any non-MB-aware programs (you'd have to unload > >> them first)... > >> > > > > How would we communicate that issue? dmesg? I'm not very familiar with > > how sysctl change failure causes are communicated to users, so this > > might be a solved problem, but if I run `sysctl -w net.xdp.multibuffer > > 1` (or whatever ends up actually being the toggle) to active > > multi-buffer, and it fails because there's a loaded non-aware program, > > that seems like a potential for a lot of administrator pain. > > Hmm, good question. Document that this only fails if there's a > non-mb-aware XDP program loaded? Or use some other mechanism with better > feedback? > > >> >> Doing this at the driver level is not enough: while a particular > >> >> driver knows if it's running in multi-buff mode, we can't know for > >> >> sure if a particular XDP program is multi-buff aware at attach time: > >> >> it could be tail-calling other programs, or redirecting packets to > >> >> another interface where it will be processed by a non-MB aware > >> >> program. > >> >> > >> >> So another option is to make it a global toggle: e.g., create a new > >> >> sysctl to enable multi-buffer. If this is set, reject loading any XDP > >> >> program that doesn't support multi-buffer mode, and if it's unset, > >> >> disable multi-buffer mode in all drivers. This will make it explicit > >> >> when the multi-buffer mode is used, and prevent any accidental subtle > >> >> malfunction of existing XDP programs. The drawback is that it's a > >> >> mode switch, so more configuration complexity. > >> >> > >> > > >> > Could we combine the last two bits here into a global toggle that doesn't > >> > require a sysctl? If any driver is put into multi-buffer mode, then the system > >> > switches to requiring all programs be multi-buffer? 
When the last multi-buffer > >> > enabled driver switches out of multi-buffer, remove the system-wide > >> > restriction? > >> > >> Well, the trouble here is that we don't necessarily have an explicit > >> "multi-buf mode" for devices. For instance, you could raise the MTU of a > >> device without it necessarily involving any XDP multi-buffer stuff (if > >> you're not running XDP on that device). So if we did turn "raising the > >> MTU" into such a mode switch, we would end up blocking any MTU changes > >> if any XDP programs are loaded. Or having an MTU change cause a > >> force-unload of all XDP programs. > > > > Maybe I missed something then, but you had stated that "while a > > particular driver knows if it's running in multi-buff mode" so I > > assumed that the driver would be able to tell when to toggle the mode > > on. > > Well, a driver knows when it is attaching an XDP program whether it (the > driver) is configured in a way such that this XDP program could > encounter a multi-buf.

I know drivers sometimes reconfigure themselves when an XDP program is attached, but is there any information provided by the attach (other than that an XDP program is attaching) that they use to make configuration decisions during that reconfiguration?

Without modifying the driver to intentionally configure itself differently based on whether or not the program is mb-aware (which I believe is currently not the case for any driver), won't the configuration of a driver be identical post XDP attach, regardless of whether the program is mb-aware?

I was thinking the driver would make its mb-aware determination (and refcount adjustments) when its configuration changes for any reason that could potentially affect mb-aware status (mostly MTU adjustments, I suspect).
> > > I had been thinking that when a driver turned multi-buffer off, it > > could trigger a check of all drivers, but that also seems like it > > could just be a global refcount of all the drivers that have requested > > multi-buffer mode. When a driver enables multi-buffer for itself, it > > increments the refcount, and when it disables, it decrements. A > > non-zero count means the system is in multi-buffer mode. > > I guess we could do a refcount-type thing when a multi-buf XDP program > is first attached (as per above). But I think it may be easier to just > do it at load-time, then, so it doesn't have to be in the driver, but > the BPF core could just enforce it. > > This would basically amount to a rule saying "you can't mix mb-aware and > non-mb-aware programs", and the first type to be loaded determines which > mode the system is in. This would be fairly simple to implement and > enforce, I suppose. The drawback is that it's potentially racy in the > order programs are loaded... >

Accepting or rejecting at load time would definitely simplify things a bit. But I think the raciness is worse than just depending on the first program to load. If we're doing refcounting at attach/detach time, then I can load an mb-aware and an mb-unaware program before attaching anything. What do I do when I attach one of them? The other would be in violation.

If instead of making the determination at attach time, we make it at load time, I think it'd be better to go back to the sysctl controlling it, and simply not allow changing the sysctl if any XDP program at all is loaded, as opposed to only if a non-aware program is installed.

Then we're back to the sysctl controlling whether or not mb-aware is required. We still have a communication-to-the-administrator problem, but it's simplified a bit from "some loaded program doesn't comply" (and having to track down which one) to "there is an XDP program installed".

> -Toke >

Side note: how do extension programs fit into this?
An extension program that's going to freplace a function in an XDP program (that receives the context) would also need to be mb-aware or not, but not all extension programs can attach to such functions, and we wouldn't want those programs to be impacted. Is this as simple as marking every XDP program and every extension program that takes an XDP context parameter as needing to be marked as mb-aware?

--Zvi

^ permalink raw reply [flat|nested] 26+ messages in thread
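[Editor's note: the global refcount idea discussed in the message above can be sketched in a few lines. This is a hypothetical userspace model, not kernel code: every name here is invented, the counter would be an atomic_t in the kernel, and the error would be a proper errno.]

```c
#include <assert.h>

/* Hypothetical sketch of the global "multi-buf mode" refcount discussed
 * above: each driver entering a multi-buf-capable configuration takes a
 * reference, and the BPF core refuses to load non-mb-aware XDP programs
 * while any references are held. All names are invented for illustration. */

static int xdp_mb_mode_refs; /* would be an atomic_t in the kernel */

static void xdp_mb_mode_get(void)    { xdp_mb_mode_refs++; }
static void xdp_mb_mode_put(void)    { xdp_mb_mode_refs--; }
static int  xdp_mb_mode_active(void) { return xdp_mb_mode_refs > 0; }

/* Load-time check the BPF core could apply to an XDP program:
 * reject non-mb-aware programs while the system is in mb mode. */
static int xdp_check_prog_load(int prog_is_mb_aware)
{
	if (xdp_mb_mode_active() && !prog_is_mb_aware)
		return -1; /* would be -EOPNOTSUPP or similar */
	return 0;
}
```

[The sketch only captures the counting; as discussed above, whether the check runs at load time or attach time changes which combinations can race.]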
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 23:10 ` Zvi Effron @ 2021-09-22 20:13 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-22 20:13 UTC (permalink / raw) To: Zvi Effron Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf Zvi Effron <zeffron@riotgames.com> writes: > On Tue, Sep 21, 2021 at 3:14 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Zvi Effron <zeffron@riotgames.com> writes: >> >> > On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> >> >> Zvi Effron <zeffron@riotgames.com> writes: >> >> >> >> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> >> >> >> >> Hi Lorenz (Cc. the other people who participated in today's discussion) >> >> >> >> >> >> Following our discussion at the LPC session today, I dug up my previous >> >> >> summary of the issue and some possible solutions[0]. Seems no on >> >> >> actually replied last time, which is why we went with the "do nothing" >> >> >> approach, I suppose. I'm including the full text of the original email >> >> >> below; please take a look, and let's see if we can converge on a >> >> >> consensus here. >> >> >> >> >> >> First off, a problem description: If an existing XDP program is exposed >> >> >> to an xdp_buff that is really a multi-buffer, while it will continue to >> >> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing >> >> >> the packet it'll only see part of the payload and not be aware of that >> >> >> fact, and if it's calculating the packet length, that will also only be >> >> >> wrong (only counting the first fragment). >> >> >> >> >> >> So what to do about this? First of all, to do anything about it, XDP >> >> >> programs need to be able to declare themselves "multi-buffer aware" (but >> >> >> see point 1 below). 
We could try to auto-detect it in the verifier by >> >> >> which helpers the program is using, but since existing programs could be >> >> >> perfectly happy to just keep running, it probably needs to be something >> >> >> the program communicates explicitly. One option is to use the >> >> >> expected_attach_type to encode this; programs can then declare it in the >> >> >> source by section name, or the userspace loader can set the type for >> >> >> existing programs if needed. >> >> >> >> >> >> With this, the kernel will know if a given XDP program is multi-buff >> >> >> aware and can decide what to do with that information. For this we came >> >> >> up with basically three options: >> >> >> >> >> >> 1. Do nothing. This would make it up to users / sysadmins to avoid >> >> >> anything breaking by manually making sure to not enable multi-buffer >> >> >> support while loading any XDP programs that will malfunction if >> >> >> presented with an mb frame. This will probably break in interesting >> >> >> ways, but it's nice and simple from an implementation PoV. With this >> >> >> we don't need the declaration discussed above either. >> >> >> >> >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and >> >> >> the program doesn't understand it. This is relatively simple to >> >> >> implement, but it also makes for difficult-to-understand issues (why >> >> >> are my packets suddenly being dropped?), and it will incur runtime >> >> >> overhead. >> >> >> >> >> >> 3. Reject loading of programs that are not MB-aware when running in an >> >> >> MB-enabled mode. This would make things break in more obvious ways, >> >> >> and still allow a userspace loader to declare a program "MB-aware" to >> >> >> force it to run if necessary. The problem then becomes at what level >> >> >> to block this? >> >> >> >> >> > >> >> > I think there's another potential problem with this as well: what happens to >> >> > already loaded programs that are not MB-aware? 
Are they forcibly unloaded? >> >> >> >> I'd say probably the opposite: You can't toggle whatever switch we end >> >> up with if there are any non-MB-aware programs (you'd have to unload >> >> them first)... >> >> >> > >> > How would we communicate that issue? dmesg? I'm not very familiar with >> > how sysctl change failure causes are communicated to users, so this >> > might be a solved problem, but if I run `sysctl -w net.xdp.multibuffer >> > 1` (or whatever ends up actually being the toggle) to active >> > multi-buffer, and it fails because there's a loaded non-aware program, >> > that seems like a potential for a lot of administrator pain. >> >> Hmm, good question. Document that this only fails if there's a >> non-mb-aware XDP program loaded? Or use some other mechanism with better >> feedback? >> >> >> >> Doing this at the driver level is not enough: while a particular >> >> >> driver knows if it's running in multi-buff mode, we can't know for >> >> >> sure if a particular XDP program is multi-buff aware at attach time: >> >> >> it could be tail-calling other programs, or redirecting packets to >> >> >> another interface where it will be processed by a non-MB aware >> >> >> program. >> >> >> >> >> >> So another option is to make it a global toggle: e.g., create a new >> >> >> sysctl to enable multi-buffer. If this is set, reject loading any XDP >> >> >> program that doesn't support multi-buffer mode, and if it's unset, >> >> >> disable multi-buffer mode in all drivers. This will make it explicit >> >> >> when the multi-buffer mode is used, and prevent any accidental subtle >> >> >> malfunction of existing XDP programs. The drawback is that it's a >> >> >> mode switch, so more configuration complexity. >> >> >> >> >> > >> >> > Could we combine the last two bits here into a global toggle that doesn't >> >> > require a sysctl? If any driver is put into multi-buffer mode, then the system >> >> > switches to requiring all programs be multi-buffer? 
When the last multi-buffer >> >> > enabled driver switches out of multi-buffer, remove the system-wide >> >> > restriction? >> >> >> >> Well, the trouble here is that we don't necessarily have an explicit >> >> "multi-buf mode" for devices. For instance, you could raise the MTU of a >> >> device without it necessarily involving any XDP multi-buffer stuff (if >> >> you're not running XDP on that device). So if we did turn "raising the >> >> MTU" into such a mode switch, we would end up blocking any MTU changes >> >> if any XDP programs are loaded. Or having an MTU change cause a >> >> force-unload of all XDP programs. >> > >> > Maybe I missed something then, but you had stated that "while a >> > particular driver knows if it's running in multi-buff mode" so I >> > assumed that the driver would be able to tell when to toggle the mode >> > on. >> >> Well, a driver knows when it is attaching an XDP program whether it (the >> driver) is configured in a way such that this XDP program could >> encounter a multi-buf. > > I know drivers sometimes reconfigure themselves when an XDP program is > attached, but is there any information provided by the attach (other than that > an XDP program is attaching) that they use to make configuration decisions > during that reconfiguration? > > Without modifying the driver to intentionally configure itself differently > based on whether or not the program is mb-aware (which is believe is currently > not the case for any driver), won't the configuration of a driver be identical > post XDP attach regardless of whether or not the program is mb-aware or not? > > I was thinking the driver would make it's mb-aware determination (and refcount > adjustments) when its configuration changes for any reason that could > potentially affect mb-aware status (mostly MTU adjustments, I suspect). 
> >> >> > I had been thinking that when a driver turned multi-buffer off, it >> > could trigger a check of all drivers, but that also seems like it >> > could just be a global refcount of all the drivers that have requested >> > multi-buffer mode. When a driver enables multi-buffer for itself, it >> > increments the refcount, and when it disables, it decrements. A >> > non-zero count means the system is in multi-buffer mode. >> >> I guess we could do a refcount-type thing when an multi-buf XDP program >> is first attached (as per above). But I think it may be easier to just >> do it at load-time, then, so it doesn't have to be in the driver, but >> the BPF core could just enforce it. >> >> This would basically amount to a rule saying "you can't mix mb-aware and >> non-mb-aware programs", and the first type to be loaded determines which >> mode the system is in. This would be fairly simple to implement and >> enforce, I suppose. The drawback is that it's potentially racy in the >> order programs are loaded... >> > > Accepting or rejecting at load time would definitely simplify things a bit. But > I think the raciness is worse than just based on the first program to load. If > we're doing refcounting at attach/detach time, then I can load an mb-aware and > an mb-unaware program before attaching anything. What do I do when I attach one > of them? The other would be in violation. > > If instead of making the determination at attach time, we make it at load time, > I think it'd be better to go back to the sysctl controlling it, and simply not > allow changing the sysctl if any XDP program at all is loaded, as opposed to > if a non-aware program is installed. > > Then we're back to the sysctl controlling whether or not mb-aware is required. > We stil have a communication to the administrator problem, but it's simplified > a bit from "some loaded program doesn't comply" and having to track down which > one to "there is an XDP program installed". 
Right, that does simplify things. But if we encode the "mb-aware" property in the program type (or sub-type, AKA expected_attach_type), discovering which program is blocking the toggle should be fairly simple, no?

>> -Toke >>

> > Side note: how do extension programs fit into this? An extension program that's > going to freplace a function in an XDP program (that receives the context) > would also need to be mb-aware or not, but not all extension programs can attach > to such functions, and we wouldn't want those programs to be impacted. Is this > as simple as marking every XDP program and every extension program that takes > an XDP context parameter as needing to be marked as mb-aware?

Hmm, that's also a good question. I was mentally assuming that freplace programs could be limited by target type, which would mean that just encoding the "mb-aware" property in attach type would be enough. But I see that this is not, in fact, the case.

Right now expected_attach_type is unused for freplace programs, so we could use that if the kernel is to enforce this. Or we could make it up to the userspace loader to ensure this by whatever mechanism it wants. For libxdp that would mean setting the type of the dispatcher program based on the component program and refusing to mix those, for instance...

-Toke

^ permalink raw reply [flat|nested] 26+ messages in thread
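[Editor's note: the freplace rule being discussed in the message above could look something like the following check, assuming mb-awareness ends up encoded in expected_attach_type. Purely illustrative; the enum values and function are hypothetical, not an existing kernel API.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of mb-awareness in the (currently unused for
 * freplace) expected_attach_type, as suggested above. */
enum xdp_mb_attach_type {
	XDP_ATTACH_LEGACY,	/* program is not multi-buf aware */
	XDP_ATTACH_MB,		/* program is multi-buf aware */
};

/* An extension program only needs to match the target's mb-awareness
 * if the function it replaces actually receives the XDP context;
 * other freplace targets would be unaffected, per the side note above. */
static bool freplace_attach_ok(bool target_fn_takes_xdp_ctx,
			       enum xdp_mb_attach_type target,
			       enum xdp_mb_attach_type ext)
{
	if (!target_fn_takes_xdp_ctx)
		return true;
	return target == ext;
}
```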
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 18:22 ` Toke Høiland-Jørgensen 2021-09-21 19:17 ` Zvi Effron @ 2021-09-21 20:12 ` Alexei Starovoitov 2021-09-21 22:20 ` Toke Høiland-Jørgensen 1 sibling, 1 reply; 26+ messages in thread From: Alexei Starovoitov @ 2021-09-21 20:12 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Zvi Effron, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Network Development, bpf On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Zvi Effron <zeffron@riotgames.com> writes: > > > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> > >> Hi Lorenz (Cc. the other people who participated in today's discussion) > >> > >> Following our discussion at the LPC session today, I dug up my previous > >> summary of the issue and some possible solutions[0]. Seems no on > >> actually replied last time, which is why we went with the "do nothing" > >> approach, I suppose. I'm including the full text of the original email > >> below; please take a look, and let's see if we can converge on a > >> consensus here. > >> > >> First off, a problem description: If an existing XDP program is exposed > >> to an xdp_buff that is really a multi-buffer, while it will continue to > >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing > >> the packet it'll only see part of the payload and not be aware of that > >> fact, and if it's calculating the packet length, that will also only be > >> wrong (only counting the first fragment). > >> > >> So what to do about this? First of all, to do anything about it, XDP > >> programs need to be able to declare themselves "multi-buffer aware" (but > >> see point 1 below). 
We could try to auto-detect it in the verifier by > >> which helpers the program is using, but since existing programs could be > >> perfectly happy to just keep running, it probably needs to be something > >> the program communicates explicitly. One option is to use the > >> expected_attach_type to encode this; programs can then declare it in the > >> source by section name, or the userspace loader can set the type for > >> existing programs if needed. > >> > >> With this, the kernel will know if a given XDP program is multi-buff > >> aware and can decide what to do with that information. For this we came > >> up with basically three options: > >> > >> 1. Do nothing. This would make it up to users / sysadmins to avoid > >> anything breaking by manually making sure to not enable multi-buffer > >> support while loading any XDP programs that will malfunction if > >> presented with an mb frame. This will probably break in interesting > >> ways, but it's nice and simple from an implementation PoV. With this > >> we don't need the declaration discussed above either. > >> > >> 2. Add a check at runtime and drop the frames if they are mb-enabled and > >> the program doesn't understand it. This is relatively simple to > >> implement, but it also makes for difficult-to-understand issues (why > >> are my packets suddenly being dropped?), and it will incur runtime > >> overhead. > >> > >> 3. Reject loading of programs that are not MB-aware when running in an > >> MB-enabled mode. This would make things break in more obvious ways, > >> and still allow a userspace loader to declare a program "MB-aware" to > >> force it to run if necessary. The problem then becomes at what level > >> to block this? > >> > > > > I think there's another potential problem with this as well: what happens to > > already loaded programs that are not MB-aware? Are they forcibly unloaded? 
> > I'd say probably the opposite: You can't toggle whatever switch we end > up with if there are any non-MB-aware programs (you'd have to unload > them first)... > > >> Doing this at the driver level is not enough: while a particular > >> driver knows if it's running in multi-buff mode, we can't know for > >> sure if a particular XDP program is multi-buff aware at attach time: > >> it could be tail-calling other programs, or redirecting packets to > >> another interface where it will be processed by a non-MB aware > >> program. > >> > >> So another option is to make it a global toggle: e.g., create a new > >> sysctl to enable multi-buffer. If this is set, reject loading any XDP > >> program that doesn't support multi-buffer mode, and if it's unset, > >> disable multi-buffer mode in all drivers. This will make it explicit > >> when the multi-buffer mode is used, and prevent any accidental subtle > >> malfunction of existing XDP programs. The drawback is that it's a > >> mode switch, so more configuration complexity. > >> > > > > Could we combine the last two bits here into a global toggle that doesn't > > require a sysctl? If any driver is put into multi-buffer mode, then the system > > switches to requiring all programs be multi-buffer? When the last multi-buffer > > enabled driver switches out of multi-buffer, remove the system-wide > > restriction? > > Well, the trouble here is that we don't necessarily have an explicit > "multi-buf mode" for devices. For instance, you could raise the MTU of a > device without it necessarily involving any XDP multi-buffer stuff (if > you're not running XDP on that device). So if we did turn "raising the > MTU" into such a mode switch, we would end up blocking any MTU changes > if any XDP programs are loaded. Or having an MTU change cause a > force-unload of all XDP programs. 
An MTU change that bumps the driver into multi-buf mode, or enabling header split (which also bumps it into multi-buf mode), probably shouldn't be allowed when a non-mb-aware xdp prog is attached. That would be the simplest and least surprising behavior. Force unload could cause security issues.

> Neither of those are desirable outcomes, I think; and if we add a > separate "XDP multi-buff" switch, we might as well make it system-wide?

If we have an internal flag 'this driver supports multi-buf xdp', can't we make xdp_redirect linearize the packet in case it is being redirected to a non-multi-buf-aware driver (potentially with corresponding non-mb-aware xdp progs attached) from an mb-aware driver?

^ permalink raw reply [flat|nested] 26+ messages in thread
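[Editor's note: the linearize-on-redirect idea in the message above amounts to copying all fragments into one linear buffer before handing the frame to a non-mb-aware consumer. A simplified userspace sketch, not the kernel's xdp_buff/skb machinery; the struct and function names are invented.]

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for one fragment of a multi-buf frame. */
struct mb_frag {
	const unsigned char *data;
	size_t len;
};

/* Copy all fragments into one linear buffer, as xdp_redirect might do
 * before handing a frame to a non-multi-buf-aware driver or program.
 * Returns the total length, or -1 if the frame doesn't fit (in which
 * case the kernel would presumably have to drop it). */
static long xdp_linearize(unsigned char *dst, size_t dst_len,
			  const struct mb_frag *frags, int nr_frags)
{
	size_t off = 0;
	int i;

	for (i = 0; i < nr_frags; i++) {
		if (frags[i].len > dst_len - off)
			return -1;
		memcpy(dst + off, frags[i].data, frags[i].len);
		off += frags[i].len;
	}
	return (long)off;
}
```

[Whether the copy cost is acceptable on the redirect fast path is exactly the kind of trade-off the thread is weighing.]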
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 20:12 ` Alexei Starovoitov @ 2021-09-21 22:20 ` Toke Høiland-Jørgensen 2021-09-21 22:51 ` Jakub Kicinski 0 siblings, 1 reply; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-21 22:20 UTC (permalink / raw) To: Alexei Starovoitov Cc: Zvi Effron, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Network Development, bpf Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Zvi Effron <zeffron@riotgames.com> writes: >> >> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> >> >> Hi Lorenz (Cc. the other people who participated in today's discussion) >> >> >> >> Following our discussion at the LPC session today, I dug up my previous >> >> summary of the issue and some possible solutions[0]. Seems no on >> >> actually replied last time, which is why we went with the "do nothing" >> >> approach, I suppose. I'm including the full text of the original email >> >> below; please take a look, and let's see if we can converge on a >> >> consensus here. >> >> >> >> First off, a problem description: If an existing XDP program is exposed >> >> to an xdp_buff that is really a multi-buffer, while it will continue to >> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing >> >> the packet it'll only see part of the payload and not be aware of that >> >> fact, and if it's calculating the packet length, that will also only be >> >> wrong (only counting the first fragment). >> >> >> >> So what to do about this? First of all, to do anything about it, XDP >> >> programs need to be able to declare themselves "multi-buffer aware" (but >> >> see point 1 below). 
We could try to auto-detect it in the verifier by >> >> which helpers the program is using, but since existing programs could be >> >> perfectly happy to just keep running, it probably needs to be something >> >> the program communicates explicitly. One option is to use the >> >> expected_attach_type to encode this; programs can then declare it in the >> >> source by section name, or the userspace loader can set the type for >> >> existing programs if needed. >> >> >> >> With this, the kernel will know if a given XDP program is multi-buff >> >> aware and can decide what to do with that information. For this we came >> >> up with basically three options: >> >> >> >> 1. Do nothing. This would make it up to users / sysadmins to avoid >> >> anything breaking by manually making sure to not enable multi-buffer >> >> support while loading any XDP programs that will malfunction if >> >> presented with an mb frame. This will probably break in interesting >> >> ways, but it's nice and simple from an implementation PoV. With this >> >> we don't need the declaration discussed above either. >> >> >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and >> >> the program doesn't understand it. This is relatively simple to >> >> implement, but it also makes for difficult-to-understand issues (why >> >> are my packets suddenly being dropped?), and it will incur runtime >> >> overhead. >> >> >> >> 3. Reject loading of programs that are not MB-aware when running in an >> >> MB-enabled mode. This would make things break in more obvious ways, >> >> and still allow a userspace loader to declare a program "MB-aware" to >> >> force it to run if necessary. The problem then becomes at what level >> >> to block this? >> >> >> > >> > I think there's another potential problem with this as well: what happens to >> > already loaded programs that are not MB-aware? Are they forcibly unloaded? 
>> >> I'd say probably the opposite: You can't toggle whatever switch we end >> up with if there are any non-MB-aware programs (you'd have to unload >> them first)... >> >> >> Doing this at the driver level is not enough: while a particular >> >> driver knows if it's running in multi-buff mode, we can't know for >> >> sure if a particular XDP program is multi-buff aware at attach time: >> >> it could be tail-calling other programs, or redirecting packets to >> >> another interface where it will be processed by a non-MB aware >> >> program. >> >> >> >> So another option is to make it a global toggle: e.g., create a new >> >> sysctl to enable multi-buffer. If this is set, reject loading any XDP >> >> program that doesn't support multi-buffer mode, and if it's unset, >> >> disable multi-buffer mode in all drivers. This will make it explicit >> >> when the multi-buffer mode is used, and prevent any accidental subtle >> >> malfunction of existing XDP programs. The drawback is that it's a >> >> mode switch, so more configuration complexity. >> >> >> > >> > Could we combine the last two bits here into a global toggle that doesn't >> > require a sysctl? If any driver is put into multi-buffer mode, then the system >> > switches to requiring all programs be multi-buffer? When the last multi-buffer >> > enabled driver switches out of multi-buffer, remove the system-wide >> > restriction? >> >> Well, the trouble here is that we don't necessarily have an explicit >> "multi-buf mode" for devices. For instance, you could raise the MTU of a >> device without it necessarily involving any XDP multi-buffer stuff (if >> you're not running XDP on that device). So if we did turn "raising the >> MTU" into such a mode switch, we would end up blocking any MTU changes >> if any XDP programs are loaded. Or having an MTU change cause a >> force-unload of all XDP programs. 
> > MTU change that bumps driver into multi-buf mode or enable > the header split that also bumps it into multi-buf mode > probably shouldn't be allowed when non-mb aware xdp prog is attached. > That would be the simplest and least surprising behavior. > Force unload could cause security issues. Yeah, I agree, force-unload is not a good solution - I mostly mentioned it for completeness. I think it would be fine for a driver to disallow config changes if a non-mb-aware program is loaded; that basically amounts to the checks drivers do already if an XDP-program is loaded, so we could just keep those around for the non-mb-case. >> Neither of those are desirable outcomes, I think; and if we add a >> separate "XDP multi-buff" switch, we might as well make it system-wide? > > If we have an internal flag 'this driver supports multi-buf xdp' cannot we > make xdp_redirect to linearize in case the packet is being redirected > to non multi-buf aware driver (potentially with corresponding non mb aware xdp > progs attached) from mb aware driver? Hmm, the assumption that XDP frames take up at most one page has been fundamental from the start of XDP. So what does linearise mean in this context? If we get a 9k packet, should we dynamically allocate a multi-page chunk of contiguous memory and copy the frame into that, or were you thinking something else? -Toke ^ permalink raw reply [flat|nested] 26+ messages in thread
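As a userspace model of what "linearise" would have to mean here (all names invented; this is not kernel code), the redirect path would allocate one contiguous, possibly multi-page buffer and copy the head plus every fragment into it, which is exactly the allocation-plus-copy cost Toke is asking about for a 9k frame:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal model of a multi-buffer frame: a head area plus fragments. */
struct frag { const unsigned char *data; size_t len; };

struct mb_frame {
    const unsigned char *head;
    size_t head_len;
    const struct frag *frags;
    size_t nr_frags;
};

/* "Linearise": allocate one contiguous buffer big enough for the whole
 * packet and memcpy head and fragments into it, in order. */
static unsigned char *linearize(const struct mb_frame *f, size_t *out_len)
{
    size_t total = f->head_len, off;

    for (size_t i = 0; i < f->nr_frags; i++)
        total += f->frags[i].len;

    unsigned char *buf = malloc(total);  /* contiguous, possibly multi-page */
    if (!buf)
        return NULL;

    memcpy(buf, f->head, f->head_len);
    off = f->head_len;
    for (size_t i = 0; i < f->nr_frags; i++) {
        memcpy(buf + off, f->frags[i].data, f->frags[i].len);
        off += f->frags[i].len;
    }
    *out_len = total;
    return buf;
}
```

For a 9000-byte frame this means a >2-page contiguous allocation plus a full copy on every redirect, which is why the question matters.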
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 22:20 ` Toke Høiland-Jørgensen @ 2021-09-21 22:51 ` Jakub Kicinski 2021-09-22 20:01 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 26+ messages in thread From: Jakub Kicinski @ 2021-09-21 22:51 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Alexei Starovoitov, Zvi Effron, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Network Development, bpf On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote: > >> Neither of those are desirable outcomes, I think; and if we add a > >> separate "XDP multi-buff" switch, we might as well make it system-wide? > > > > If we have an internal flag 'this driver supports multi-buf xdp' cannot we > > make xdp_redirect to linearize in case the packet is being redirected > > to non multi-buf aware driver (potentially with corresponding non mb aware xdp > > progs attached) from mb aware driver? > > Hmm, the assumption that XDP frames take up at most one page has been > fundamental from the start of XDP. So what does linearise mean in this > context? If we get a 9k packet, should we dynamically allocate a > multi-page chunk of contiguous memory and copy the frame into that, or > were you thinking something else? My $.02 would be to not care about redirect at all. It's not like the user experience with redirect is anywhere close to amazing right now. Besides (with the exception of SW devices which will likely gain mb support quickly) mixed-HW setups are very rare. If the source of the redirect supports mb so will likely the target. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 22:51 ` Jakub Kicinski @ 2021-09-22 20:01 ` Toke Høiland-Jørgensen 2021-09-22 21:23 ` Zvi Effron 2021-09-23 13:46 ` Jakub Kicinski 0 siblings, 2 replies; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-22 20:01 UTC (permalink / raw) To: Jakub Kicinski Cc: Alexei Starovoitov, Zvi Effron, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Network Development, bpf Jakub Kicinski <kuba@kernel.org> writes: > On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote: >> >> Neither of those are desirable outcomes, I think; and if we add a >> >> separate "XDP multi-buff" switch, we might as well make it system-wide? >> > >> > If we have an internal flag 'this driver supports multi-buf xdp' cannot we >> > make xdp_redirect to linearize in case the packet is being redirected >> > to non multi-buf aware driver (potentially with corresponding non mb aware xdp >> > progs attached) from mb aware driver? >> >> Hmm, the assumption that XDP frames take up at most one page has been >> fundamental from the start of XDP. So what does linearise mean in this >> context? If we get a 9k packet, should we dynamically allocate a >> multi-page chunk of contiguous memory and copy the frame into that, or >> were you thinking something else? > > My $.02 would be to not care about redirect at all. > > It's not like the user experience with redirect is anywhere close > to amazing right now. Besides (with the exception of SW devices which > will likely gain mb support quickly) mixed-HW setups are very rare. > If the source of the redirect supports mb so will likely the target. It's not about device support it's about XDP program support: If I run an MB-aware XDP program on a physical interface and redirect the (MB) frame into a container, and there's an XDP program running inside that container that isn't MB-aware, bugs will ensue. Doesn't matter if the veth driver itself supports MB... 
We could leave that as a "don't do that, then" kind of thing, but that was what we were proposing (as the "do nothing" option) and got some pushback on, hence why we're having this conversation :) -Toke ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-22 20:01 ` Toke Høiland-Jørgensen @ 2021-09-22 21:23 ` Zvi Effron 2021-09-23 18:45 ` Toke Høiland-Jørgensen 2021-09-23 13:46 ` Jakub Kicinski 1 sibling, 1 reply; 26+ messages in thread From: Zvi Effron @ 2021-09-22 21:23 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Jakub Kicinski, Alexei Starovoitov, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Network Development, bpf On Wed, Sep 22, 2021 at 1:01 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Jakub Kicinski <kuba@kernel.org> writes: > > > On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote: > >> >> Neither of those are desirable outcomes, I think; and if we add a > >> >> separate "XDP multi-buff" switch, we might as well make it system-wide? > >> > > >> > If we have an internal flag 'this driver supports multi-buf xdp' cannot we > >> > make xdp_redirect to linearize in case the packet is being redirected > >> > to non multi-buf aware driver (potentially with corresponding non mb aware xdp > >> > progs attached) from mb aware driver? > >> > >> Hmm, the assumption that XDP frames take up at most one page has been > >> fundamental from the start of XDP. So what does linearise mean in this > >> context? If we get a 9k packet, should we dynamically allocate a > >> multi-page chunk of contiguous memory and copy the frame into that, or > >> were you thinking something else? > > > > My $.02 would be to not care about redirect at all. > > > > It's not like the user experience with redirect is anywhere close > > to amazing right now. Besides (with the exception of SW devices which > > will likely gain mb support quickly) mixed-HW setups are very rare. > > If the source of the redirect supports mb so will likely the target. 
> > It's not about device support it's about XDP program support: If I run > an MB-aware XDP program on a physical interface and redirect the (MB) > frame into a container, and there's an XDP program running inside that > container that isn't MB-aware, bugs will ensue. Doesn't matter if the > veth driver itself supports MB... > > We could leave that as a "don't do that, then" kind of thing, but that > was what we were proposing (as the "do nothing" option) and got some > pushback on, hence why we're having this conversation :) > > -Toke > I hadn't even considered the case of redirecting to a veth pair on the same system. I'm assuming from your statement that the buffers are passed directly to the ingress inside the container and don't go through the sort of egress process they would if leaving the system? And I'm assuming that's as an optimization? I'm not sure that makes a difference, though. It's not about whether the driver's code is mb-capable, it's about whether the driver _as currently configured_ could generate multiple buffers. If it can, then only an mb-aware program should be able to be attached to it (and tail called from whatever's attached to it). If it can't, then there should be no way to have multiple buffers come to it. So in the situation you've described, either the veth driver should be in a state where it coalesces the multiple buffers into one, fragmenting the frame if necessary, or drops the frame, or the program attached inside the container would need to be mb-aware. I'm assuming with the veth driver as written, this might mean that all programs attached to the veth driver would need to be mb-aware, which is obviously undesirable. All of which significantly adds to the complexity to support mb-aware, so maybe this could be developed later? Initially we could have a sysctl toggling the state: 0 = single-buffer only, 1 = multi-buffer allowed. 
Then later we _could_ add a state for dynamic control once all XDP-supporting drivers support the necessary dynamic functionality (if ever). At that point we'd have actual experience with the sysctl and could see how much of a burden having static control is. I may have been misinterpreting your use case though, and you were talking about the XDP program running on the egress side of the redirect? Is that the case you were talking about? --Zvi ^ permalink raw reply [flat|nested] 26+ messages in thread
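The static sysctl Zvi describes can be modeled as a simple load-time gate (names hypothetical, not an actual kernel interface): with the toggle on, only mb-aware programs may load, so no legacy program can ever meet an mb frame; with it off, drivers stay in single-buffer mode and anything may load.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global sysctl state (values as Zvi proposes). */
enum xdp_mb_sysctl { XDP_MB_OFF = 0, XDP_MB_ON = 1 };

/* Decide at BPF_PROG_LOAD time whether an XDP program may load. */
static bool xdp_prog_load_allowed(enum xdp_mb_sysctl state, bool prog_mb_aware)
{
    if (state == XDP_MB_ON)
        return prog_mb_aware;  /* reject non-mb-aware programs */
    return true;               /* single-buffer mode: legacy progs are safe */
}
```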
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-22 21:23 ` Zvi Effron @ 2021-09-23 18:45 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-23 18:45 UTC (permalink / raw) To: Zvi Effron Cc: Jakub Kicinski, Alexei Starovoitov, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Network Development, bpf Zvi Effron <zeffron@riotgames.com> writes: > On Wed, Sep 22, 2021 at 1:01 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Jakub Kicinski <kuba@kernel.org> writes: >> >> > On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote: >> >> >> Neither of those are desirable outcomes, I think; and if we add a >> >> >> separate "XDP multi-buff" switch, we might as well make it system-wide? >> >> > >> >> > If we have an internal flag 'this driver supports multi-buf xdp' cannot we >> >> > make xdp_redirect to linearize in case the packet is being redirected >> >> > to non multi-buf aware driver (potentially with corresponding non mb aware xdp >> >> > progs attached) from mb aware driver? >> >> >> >> Hmm, the assumption that XDP frames take up at most one page has been >> >> fundamental from the start of XDP. So what does linearise mean in this >> >> context? If we get a 9k packet, should we dynamically allocate a >> >> multi-page chunk of contiguous memory and copy the frame into that, or >> >> were you thinking something else? >> > >> > My $.02 would be to not care about redirect at all. >> > >> > It's not like the user experience with redirect is anywhere close >> > to amazing right now. Besides (with the exception of SW devices which >> > will likely gain mb support quickly) mixed-HW setups are very rare. >> > If the source of the redirect supports mb so will likely the target. 
>> >> It's not about device support it's about XDP program support: If I run >> an MB-aware XDP program on a physical interface and redirect the (MB) >> frame into a container, and there's an XDP program running inside that >> container that isn't MB-aware, bugs will ensue. Doesn't matter if the >> veth driver itself supports MB... >> >> We could leave that as a "don't do that, then" kind of thing, but that >> was what we were proposing (as the "do nothing" option) and got some >> pushback on, hence why we're having this conversation :) >> >> -Toke >> > > I hadn't even considered the case of redirecting to a veth pair on the same > system. I'm assuming from your statement that the buffers are passed directly > to the ingress inside the container and don't go through the sort of egress > process they would if leaving the system? And I'm assuming that's as an > optimization? Yeah, if we redirect an XDP frame to a veth, the peer will get the same xdp_frame, without ever building an SKB. > I'm not sure that makes a difference, though. It's not about whether the > driver's code is mb-capable, it's about whether the driver _as currently > configured_ could generate multiple buffers. If it can, then only an mb-aware > program should be able to be attached to it (and tail called from whatever's > attached to it). If it can't, then there should be no way to have multiple > buffers come to it. > > So in the situation you've described, either the veth driver should be in a > state where it coalesces the multiple buffers into one, fragmenting the frame > if necessary or drops the frame, or the program attached inside the container > would need to be mb-aware. I'm assuming with the veth driver as written, this > might mean that all programs attached to the veth driver would need to be > mb-aware, which is obviously undesirable. 
Hmm, I guess that as long as mb-frames only show up for large MTUs, the MTU of the veth device would be a limiting factor just like for physical devices, so we could just apply the same logic there. Not sure why I didn't consider that before :/ > All of which significantly adds to the complexity to support mb-aware, so maybe > this could be developed later? Initially we could have a sysctl toggling the > state 0 single-buffer only, 1 multibuffer allowed. Then later we _could_ add a > state for dynamic control once all XDP supporting drivers support the necessary > dynamic functionality (if ever). At that point we'd have actual experience with > the sysctl and could see how much of a burden having static control is. > > I may have been misinterpreting your use case though, and you were talking > about the XDP program running on the egress side of the redirect? Is that what > you were talking about case? No I was talking about exactly what you outlined above. Although longer term, I also think we can use XDP mb as a way to avoid having to linearise SKBs when running XDP on them in veth (and for generic XDP) :) -Toke ^ permalink raw reply [flat|nested] 26+ messages in thread
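The "MTU as the limiting factor" rule Toke lands on can be sketched as a single-buffer-fit check that applies equally to veth and physical NICs. The constants below are illustrative ballpark figures, not exact kernel values:

```c
#include <assert.h>
#include <stdbool.h>

#define XDP_PAGE_SIZE  4096  /* illustrative page size */
#define XDP_HEADROOM    256  /* reserved headroom before packet data */
#define XDP_TAILROOM    320  /* rough room reserved for shared info */

/* A device can only hand XDP a multi-buffer frame if its configured
 * MTU no longer fits in a single buffer; below that, even an mb-capable
 * driver produces single-buffer frames, so legacy programs are safe. */
static bool dev_needs_mb_xdp(int mtu, int hard_header_len)
{
    int frame = hard_header_len + mtu;
    return XDP_HEADROOM + frame + XDP_TAILROOM > XDP_PAGE_SIZE;
}
```

With this rule, a veth at MTU 1500 never needs mb-aware programs, and raising it to 9000 triggers the same gating a physical jumbo-frame device would.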
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-22 20:01 ` Toke Høiland-Jørgensen 2021-09-22 21:23 ` Zvi Effron @ 2021-09-23 13:46 ` Jakub Kicinski 2021-09-27 12:43 ` Jesper Dangaard Brouer 1 sibling, 1 reply; 26+ messages in thread From: Jakub Kicinski @ 2021-09-23 13:46 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Alexei Starovoitov, Zvi Effron, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Network Development, bpf On Wed, 22 Sep 2021 22:01:17 +0200 Toke Høiland-Jørgensen wrote: > Jakub Kicinski <kuba@kernel.org> writes: > >> Hmm, the assumption that XDP frames take up at most one page has been > >> fundamental from the start of XDP. So what does linearise mean in this > >> context? If we get a 9k packet, should we dynamically allocate a > >> multi-page chunk of contiguous memory and copy the frame into that, or > >> were you thinking something else? > > > > My $.02 would be to not care about redirect at all. > > > > It's not like the user experience with redirect is anywhere close > > to amazing right now. Besides (with the exception of SW devices which > > will likely gain mb support quickly) mixed-HW setups are very rare. > > If the source of the redirect supports mb so will likely the target. > > It's not about device support it's about XDP program support: If I run > an MB-aware XDP program on a physical interface and redirect the (MB) > frame into a container, and there's an XDP program running inside that > container that isn't MB-aware, bugs will ensue. Doesn't matter if the > veth driver itself supports MB... Ah, I see now. > We could leave that as a "don't do that, then" kind of thing, but that > was what we were proposing (as the "do nothing" option) and got some > pushback on, hence why we're having this conversation :) Let me make a general statement that we can't build large systems without division of responsibilities. Device specific logic has to be left to the driver. 
It's up to veth to apply its extra rules. As Zvi said, tho, this can be left for later. IMHO initial patches can drop all mb frames on redirect in the core, so we can make progress. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-23 13:46 ` Jakub Kicinski @ 2021-09-27 12:43 ` Jesper Dangaard Brouer 0 siblings, 0 replies; 26+ messages in thread From: Jesper Dangaard Brouer @ 2021-09-27 12:43 UTC (permalink / raw) To: Jakub Kicinski, Toke Høiland-Jørgensen Cc: brouer, Alexei Starovoitov, Zvi Effron, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Network Development, bpf On 23/09/2021 15.46, Jakub Kicinski wrote: > Let me make a general statement that we can't build large systems > without division of responsibilities. Device specific logic has to > be left to the driver. It's up to veth to apply its extra rules. > > As Zvi said, tho, this can be left for later. IMHO initial patches can > drop all mb frames on redirect in the core, so we can make progress. I agree that the *initial* patchset for REDIRECT (func calling ndo_xdp_xmit) can drop MB frames on redirect in the core, when the driver's ndo_xdp_xmit doesn't support this. BUT only so we can make progress. As soon as possible (preferably in the same kernel release) a follow-up patchset should make a serious attempt at updating all drivers' ndo_xdp_xmit to support multi-buffer (MB). As MB uses the same/compatible layout as skb_shared_info, it should be a fairly small effort to update drivers for *transmitting* multi-buff, as drivers already support this for normal SKBs. It will be a larger effort to support XDP multi-buff on RX, as often the RX-loop needs to be adjusted/reordered to "delay" calling the XDP-prog until all fragments have been "collected". And the MTU check adjusted. This implies that the driver's net_device xdp_features needs two feature bits for this MB feature. IMHO it makes sense to detangle RX and TX, as IMHO we should allow MB frames to be TX-redirected out of a driver that has an old non-MB-aware XDP-prog loaded. --Jesper ^ permalink raw reply [flat|nested] 26+ messages in thread
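Jesper's RX/TX split, combined with Jakub's interim "drop mb frames on redirect in the core" rule, could be modeled with two per-driver feature bits (flag names invented for illustration; they are not the actual xdp_features bits):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-driver capability bits, as Jesper proposes:
 * RX (building mb frames) and TX (ndo_xdp_xmit accepting them)
 * are detangled so one can land before the other. */
#define XDP_F_MB_RX  (1u << 0)  /* driver can build mb frames on RX  */
#define XDP_F_MB_TX  (1u << 1)  /* ndo_xdp_xmit accepts mb frames    */

/* Core redirect check: single-buffer frames always pass; mb frames
 * pass only if the target driver can transmit them, else drop. */
static bool xdp_redirect_frame_ok(unsigned int target_features, bool frame_is_mb)
{
    if (!frame_is_mb)
        return true;
    return (target_features & XDP_F_MB_TX) != 0;  /* else: drop in core */
}
```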
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 16:06 Redux: Backwards compatibility for XDP multi-buff Toke Høiland-Jørgensen 2021-09-21 17:31 ` Zvi Effron @ 2021-09-21 22:54 ` Jakub Kicinski 2021-09-22 20:02 ` Toke Høiland-Jørgensen 2021-09-23 10:33 ` Lorenz Bauer 2 siblings, 1 reply; 26+ messages in thread From: Jakub Kicinski @ 2021-09-21 22:54 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf On Tue, 21 Sep 2021 18:06:35 +0200 Toke Høiland-Jørgensen wrote: > 1. Do nothing. This would make it up to users / sysadmins to avoid > anything breaking by manually making sure to not enable multi-buffer > support while loading any XDP programs that will malfunction if > presented with an mb frame. This will probably break in interesting > ways, but it's nice and simple from an implementation PoV. With this > we don't need the declaration discussed above either. > > 2. Add a check at runtime and drop the frames if they are mb-enabled and > the program doesn't understand it. This is relatively simple to > implement, but it also makes for difficult-to-understand issues (why > are my packets suddenly being dropped?), and it will incur runtime > overhead. > > 3. Reject loading of programs that are not MB-aware when running in an > MB-enabled mode. This would make things break in more obvious ways, > and still allow a userspace loader to declare a program "MB-aware" to > force it to run if necessary. The problem then becomes at what level > to block this? > > Doing this at the driver level is not enough: while a particular > driver knows if it's running in multi-buff mode, we can't know for > sure if a particular XDP program is multi-buff aware at attach time: > it could be tail-calling other programs, or redirecting packets to > another interface where it will be processed by a non-MB aware > program. 
> > So another option is to make it a global toggle: e.g., create a new > sysctl to enable multi-buffer. If this is set, reject loading any XDP > program that doesn't support multi-buffer mode, and if it's unset, > disable multi-buffer mode in all drivers. This will make it explicit > when the multi-buffer mode is used, and prevent any accidental subtle > malfunction of existing XDP programs. The drawback is that it's a > mode switch, so more configuration complexity. 4. Add new program type, XDP_MB. Do not allow mixing of XDP vs XDP_MB thru tail calls. IMHO that's very simple and covers majority of use cases. ^ permalink raw reply [flat|nested] 26+ messages in thread
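Jakub's option 4 amounts to a type-compatibility check on tail-call maps: a program may only be inserted into a prog-array whose owner type matches, so XDP and XDP_MB programs can never mix. A minimal model (enum values hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical program types; a real implementation might instead use
 * expected_attach_type, but the check is the same shape. */
enum prog_type { PROG_XDP, PROG_XDP_MB };

/* Reject inserting a program into a tail-call map owned by the other
 * type, so non-mb-aware code can never be tail-called on an mb frame. */
static bool tail_call_insert_allowed(enum prog_type map_owner,
                                     enum prog_type prog)
{
    return map_owner == prog;  /* no mixing XDP vs XDP_MB */
}
```

This is analogous to how the kernel already pins a prog-array map to the program type of its first user.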
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 22:54 ` Jakub Kicinski @ 2021-09-22 20:02 ` Toke Høiland-Jørgensen 2021-09-22 21:11 ` Zvi Effron 0 siblings, 1 reply; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-22 20:02 UTC (permalink / raw) To: Jakub Kicinski Cc: Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf Jakub Kicinski <kuba@kernel.org> writes: > On Tue, 21 Sep 2021 18:06:35 +0200 Toke Høiland-Jørgensen wrote: >> 1. Do nothing. This would make it up to users / sysadmins to avoid >> anything breaking by manually making sure to not enable multi-buffer >> support while loading any XDP programs that will malfunction if >> presented with an mb frame. This will probably break in interesting >> ways, but it's nice and simple from an implementation PoV. With this >> we don't need the declaration discussed above either. >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and >> the program doesn't understand it. This is relatively simple to >> implement, but it also makes for difficult-to-understand issues (why >> are my packets suddenly being dropped?), and it will incur runtime >> overhead. >> >> 3. Reject loading of programs that are not MB-aware when running in an >> MB-enabled mode. This would make things break in more obvious ways, >> and still allow a userspace loader to declare a program "MB-aware" to >> force it to run if necessary. The problem then becomes at what level >> to block this? >> >> Doing this at the driver level is not enough: while a particular >> driver knows if it's running in multi-buff mode, we can't know for >> sure if a particular XDP program is multi-buff aware at attach time: >> it could be tail-calling other programs, or redirecting packets to >> another interface where it will be processed by a non-MB aware >> program. >> >> So another option is to make it a global toggle: e.g., create a new >> sysctl to enable multi-buffer. 
If this is set, reject loading any XDP >> program that doesn't support multi-buffer mode, and if it's unset, >> disable multi-buffer mode in all drivers. This will make it explicit >> when the multi-buffer mode is used, and prevent any accidental subtle >> malfunction of existing XDP programs. The drawback is that it's a >> mode switch, so more configuration complexity. > > 4. Add new program type, XDP_MB. Do not allow mixing of XDP vs XDP_MB > thru tail calls. > > IMHO that's very simple and covers majority of use cases. Using the program type (or maybe the expected_attach_type) was how I was imagining we'd encode the "I am MB aware" flag, yes. I hadn't actually considered that this could be used to also restrict tail call/freplace attachment, but that's a good point. So this leaves just the redirect issue, then, see my other reply. -Toke ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-22 20:02 ` Toke Høiland-Jørgensen @ 2021-09-22 21:11 ` Zvi Effron 2021-09-23 19:00 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 26+ messages in thread From: Zvi Effron @ 2021-09-22 21:11 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Jakub Kicinski, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf On Wed, Sep 22, 2021 at 1:03 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Jakub Kicinski <kuba@kernel.org> writes: > > > On Tue, 21 Sep 2021 18:06:35 +0200 Toke Høiland-Jørgensen wrote: > >> 1. Do nothing. This would make it up to users / sysadmins to avoid > >> anything breaking by manually making sure to not enable multi-buffer > >> support while loading any XDP programs that will malfunction if > >> presented with an mb frame. This will probably break in interesting > >> ways, but it's nice and simple from an implementation PoV. With this > >> we don't need the declaration discussed above either. > >> > >> 2. Add a check at runtime and drop the frames if they are mb-enabled and > >> the program doesn't understand it. This is relatively simple to > >> implement, but it also makes for difficult-to-understand issues (why > >> are my packets suddenly being dropped?), and it will incur runtime > >> overhead. > >> > >> 3. Reject loading of programs that are not MB-aware when running in an > >> MB-enabled mode. This would make things break in more obvious ways, > >> and still allow a userspace loader to declare a program "MB-aware" to > >> force it to run if necessary. The problem then becomes at what level > >> to block this? 
> >> > >> Doing this at the driver level is not enough: while a particular > >> driver knows if it's running in multi-buff mode, we can't know for > >> sure if a particular XDP program is multi-buff aware at attach time: > >> it could be tail-calling other programs, or redirecting packets to > >> another interface where it will be processed by a non-MB aware > >> program. > >> > >> So another option is to make it a global toggle: e.g., create a new > >> sysctl to enable multi-buffer. If this is set, reject loading any XDP > >> program that doesn't support multi-buffer mode, and if it's unset, > >> disable multi-buffer mode in all drivers. This will make it explicit > >> when the multi-buffer mode is used, and prevent any accidental subtle > >> malfunction of existing XDP programs. The drawback is that it's a > >> mode switch, so more configuration complexity. > > > > 4. Add new program type, XDP_MB. Do not allow mixing of XDP vs XDP_MB > > thru tail calls. > > > > IMHO that's very simple and covers majority of use cases. > > Using the program type (or maybe the expected_attach_type) was how I was > imagining we'd encode the "I am MB aware" flag, yes. I hadn't actually > considered that this could be used to also restrict tail call/freplace > attachment, but that's a good point. So this leaves just the redirect > issue, then, see my other reply. > I really like this approach as well, but before we commit to it, how likely are we to encounter this type of situation (where we need to indicate whether an XDP program supports a new capability) again in the future? And if we do, are we willing to require that all programs supporting that new feature are also mb-aware? Essentially, the suboptimal case I'm envisioning is needing to have XDP_MB, XDP_MB_NEW_THING, XDP_NEW_THING, and XDP all as program types. That leads to exponential explosion in program types. 
Also, every time we add a program type to encode a feature (instead of a truly new type), we're essentially forcing a recompilation (and redeployment) of all existing programs that take advantage of the libbpf section name program typing. (I'm sure there are tools that can rename a section in an ELF file without recompilation, but recompilation seems the simplest way to correct the ELF files for most people.) If we think this is a one-off, it's probably fine, but if we think it'll happen again, is it worth it to find a solution that will work for future cases now, instead of having XDP, XDP_MB, and then having to find a solution for future cases? --Zvi > -Toke > ^ permalink raw reply [flat|nested] 26+ messages in thread
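The kind of future-proof solution Zvi is asking about could look like one program type plus per-program capability bits, so n future features cost n flag bits rather than 2^n program types. A sketch (all names invented; this is not an existing kernel interface):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-program capability flags declared at load time. */
#define XDP_CAP_MB        (1u << 0)  /* multi-buffer aware            */
#define XDP_CAP_NEW_THING (1u << 1)  /* some hypothetical future feature */

/* The kernel computes which capabilities the environment requires
 * (e.g. mb mode enabled) and rejects programs that don't cover them. */
static bool xdp_caps_ok(unsigned int required, unsigned int declared)
{
    return (required & ~declared) == 0;  /* program covers every need */
}
```

Adding a third feature then means one more bit and no change to this check, instead of doubling the set of program types.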
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-22 21:11 ` Zvi Effron @ 2021-09-23 19:00 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-23 19:00 UTC (permalink / raw) To: Zvi Effron Cc: Jakub Kicinski, Lorenz Bauer, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, netdev, bpf Zvi Effron <zeffron@riotgames.com> writes: > On Wed, Sep 22, 2021 at 1:03 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Jakub Kicinski <kuba@kernel.org> writes: >> >> > On Tue, 21 Sep 2021 18:06:35 +0200 Toke Høiland-Jørgensen wrote: >> >> 1. Do nothing. This would make it up to users / sysadmins to avoid >> >> anything breaking by manually making sure to not enable multi-buffer >> >> support while loading any XDP programs that will malfunction if >> >> presented with an mb frame. This will probably break in interesting >> >> ways, but it's nice and simple from an implementation PoV. With this >> >> we don't need the declaration discussed above either. >> >> >> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and >> >> the program doesn't understand it. This is relatively simple to >> >> implement, but it also makes for difficult-to-understand issues (why >> >> are my packets suddenly being dropped?), and it will incur runtime >> >> overhead. >> >> >> >> 3. Reject loading of programs that are not MB-aware when running in an >> >> MB-enabled mode. This would make things break in more obvious ways, >> >> and still allow a userspace loader to declare a program "MB-aware" to >> >> force it to run if necessary. The problem then becomes at what level >> >> to block this? 
>> >> >> >> Doing this at the driver level is not enough: while a particular >> >> driver knows if it's running in multi-buff mode, we can't know for >> >> sure if a particular XDP program is multi-buff aware at attach time: >> >> it could be tail-calling other programs, or redirecting packets to >> >> another interface where it will be processed by a non-MB aware >> >> program. >> >> >> >> So another option is to make it a global toggle: e.g., create a new >> >> sysctl to enable multi-buffer. If this is set, reject loading any XDP >> >> program that doesn't support multi-buffer mode, and if it's unset, >> >> disable multi-buffer mode in all drivers. This will make it explicit >> >> when the multi-buffer mode is used, and prevent any accidental subtle >> >> malfunction of existing XDP programs. The drawback is that it's a >> >> mode switch, so more configuration complexity. >> > >> > 4. Add new program type, XDP_MB. Do not allow mixing of XDP vs XDP_MB >> > thru tail calls. >> > >> > IMHO that's very simple and covers majority of use cases. >> >> Using the program type (or maybe the expected_attach_type) was how I was >> imagining we'd encode the "I am MB aware" flag, yes. I hadn't actually >> considered that this could be used to also restrict tail call/freplace >> attachment, but that's a good point. So this leaves just the redirect >> issue, then, see my other reply. >> > > I really like this apporoach as well, but before we commit to it, how likely > are we to encounter this type of situation (where we need to indicate whether > an XDP program supports a new capability) again in the future. And if we do, > are we willing to require that all programs supporting that new feature are > also mb-aware? Essentially, the suboptimal case I'm envisioning is needing to > have XDP_MB, XD_MB_NEW_THING, XDP_NEW_THING, and XDP all as program types. That > leads to exponential explosion in program types. Hmm, that's a good point. 
Maybe it would be better to communicate it via a flag; we do have a prog_flags attribute on BPF_PROG_LOAD we could use. > Also, every time we add a program type to encode a feature (instead of a truly > new type), we're essentially forcing a recompilation (and redeployment) of all > existing programs that take advantage of the libbpf section name program > typing. (I'm sure there are tools that can rename a section in an ELF file > without recompilation, but recompilation seems the simplest way to correct the > ELF files for most people.) Isn't this true regardless of how we encode it? I mean when we add a new feature that needs an explicit support declaration, programs need to declare that they support it somehow. No matter how it's actually encoded, this will either entail recompiling the program, or having the loader override the value at load-time. For instance, say we do use a flag instead of a new prog type, how would a BPF program set that flag? Defining a new section type in libbpf would be an obvious answer (i.e., SEC("xdp") sets prog_type=xdp, and SEC("xdp_mb") sets prog_type=xdp, flags=XDP_MB). > If we think this is a one-off, it's probably fine, but if we think > it'll happen again, is it worth it to find a solution that will work > for future cases now, instead of having XDP, XDP_MB, and then having > to find a solution for future cases? Well, really the right solution is a general "XDP feature flags" capability. There was some work done on this, but unfortunately it stalled out. Not sure it's feasible to resurrect that as a prerequisite for landing xdp_mb, though, so I guess we'll need a one-off thing here :/ -Toke ^ permalink raw reply [flat|nested] 26+ messages in thread
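The SEC("xdp_mb") idea above could look like this on the loader side: both section names map to the same program type and differ only in the flags passed at load time. BPF_F_XDP_MB and its bit value are hypothetical here, sketched from the discussion rather than taken from an existing UAPI:

```c
#include <stdint.h>
#include <string.h>

#define BPF_F_XDP_MB (1u << 4)  /* hypothetical prog_flags bit for BPF_PROG_LOAD */

/*
 * A libbpf-style loader mapping section names to load attributes:
 * SEC("xdp") -> prog_type=xdp, no flags; SEC("xdp_mb") -> same type,
 * mb flag set. New features become flags, not new program types.
 */
static uint32_t section_to_flags(const char *sec)
{
    if (strcmp(sec, "xdp_mb") == 0)
        return BPF_F_XDP_MB;
    return 0;  /* plain SEC("xdp") */
}
```

A loader could also apply the flag to an unmodified ELF, which is the "userspace loader overrides it" path mentioned earlier in the thread.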
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-21 16:06 Redux: Backwards compatibility for XDP multi-buff Toke Høiland-Jørgensen 2021-09-21 17:31 ` Zvi Effron 2021-09-21 22:54 ` Jakub Kicinski @ 2021-09-23 10:33 ` Lorenz Bauer 2021-09-23 12:59 ` Toke Høiland-Jørgensen 2 siblings, 1 reply; 26+ messages in thread From: Lorenz Bauer @ 2021-09-23 10:33 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf On Tue, 21 Sept 2021 at 17:06, Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Hi Lorenz (Cc. the other people who participated in today's discussion) > > Following our discussion at the LPC session today, I dug up my previous > summary of the issue and some possible solutions[0]. Seems no on > actually replied last time, which is why we went with the "do nothing" > approach, I suppose. I'm including the full text of the original email > below; please take a look, and let's see if we can converge on a > consensus here. Hi Toke, Thanks for looping me in again. A bit of context on what XDP at Cloudflare looks like: * We have a chain of XDP programs attached to a real network device. This implements DDoS protection and L4 load balancing. This is maintained by the team I am on. * We have hundreds of network namespaces with veth that have XDP attached to them. Traffic is routed from the root namespace into these. This is maintained by the Magic Transit team, see this talk from last year's LPC [1] I'll try to summarise what I've picked up from the thread and add my own 2c. Options being considered: 1. Make sure mb-aware and mb-unaware programs don't mix. This could either be in the form of a sysctl or a dynamic property similar to a refcount. We'd need to discern mb-aware from mb-unaware somehow, most easily via a new program type. This means recompiling existing programs (but then we expect that to be necessary anyway).
We'd also have to be able to indicate "mb-awareness" for freplace programs. The implementation complexity seems OK, but operator UX is not good: it's not possible to slowly migrate a system to mb-awareness, it has to happen in one fell swoop. This would be really problematic for us, since we already have multiple teams writing and deploying XDP independently of each other. This number is only going to grow. It seems there will also be trickiness around redirecting into different devices? Not something we do today, but it's kind of an obvious optimization to start redirecting into network namespaces from XDP instead of relying on routing. 2. Add a compatibility shim for mb-unaware programs receiving an mb frame. We'd still need a way to indicate "MB-OK", but it could be a piece of metadata on a bpf_prog. Whatever code dispatches to an XDP program would have to include a prologue that linearises the xdp_buff if necessary which implies allocating memory. I don't know how hard it is to implement this. There is also the question of freplace: do we extend linearising to them, or do they have to support MB? You raised an interesting point: couldn't we hit programs that can't handle data_end - data being above a certain length? I think we (= Cloudflare) actually have one of those, since we in some cases need to traverse the entire buffer to calculate a checksum (we encapsulate UDPv4 in IPv6, don't ask). Turns out it's actually really hard to calculate the checksum on a variable length packet in BPF so we've had to introduce limits. However, this case isn't too important: we made this choice consciously, knowing that MTU changes would break it. Other than that I like this option a lot: mb-aware and mb-unaware programs can co-exist, at the cost of performance. This allows gradually migrating to our stack so that it can handle jumbo frames. 3. 
Make non-linearity invisible to the BPF program Something I've wished for often is that I didn't have to deal with nonlinearity at all, based on my experience with cls_redirect [2]. It's really hard to write a BPF program that handles non-linear skb, especially when you have to call adjust_head, etc. which invalidates packet buffers. This is probably impossible, but maybe someone has a crazy idea? :) Lorenz 1: https://youtu.be/UkvxPyIJAko?t=10057 2: https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/bpf/progs/test_cls_redirect.c -- Lorenz Bauer | Systems Engineer 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK www.cloudflare.com ^ permalink raw reply [flat|nested] 26+ messages in thread
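On the checksum pain point above: the hard part is carrying the ones'-complement fold across fragment boundaries, including an odd trailing byte whose 16-bit partner lives in the next fragment. A userspace sketch of that walk (RFC 1071 style); the struct frag layout is illustrative, not the kernel's representation:

```c
#include <stddef.h>
#include <stdint.h>

struct frag { const uint8_t *data; size_t len; };

/* Internet checksum folded across multiple fragments. The `odd` flag
 * carries a dangling high byte from one fragment into the next, so the
 * result is independent of where the fragment boundaries fall. */
static uint16_t csum_frags(const struct frag *frags, size_t nfrags)
{
    uint32_t sum = 0;
    int odd = 0;

    for (size_t i = 0; i < nfrags; i++) {
        const uint8_t *p = frags[i].data;
        size_t len = frags[i].len;

        if (odd && len) {          /* pair the leftover high byte */
            sum += *p++;
            len--;
            odd = 0;
        }
        while (len >= 2) {
            sum += ((uint32_t)p[0] << 8) | p[1];
            p += 2;
            len -= 2;
        }
        if (len) {                 /* odd trailing byte: high half of a word */
            sum += (uint32_t)p[0] << 8;
            odd = 1;
        }
    }
    while (sum >> 16)              /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Demo: the same five payload bytes, linear vs. split into three frags. */
static uint16_t demo_csum_linear(void)
{
    static const uint8_t pkt[] = { 1, 2, 3, 4, 5 };
    struct frag f[] = { { pkt, 5 } };
    return csum_frags(f, 1);
}

static uint16_t demo_csum_split(void)
{
    static const uint8_t a[] = { 1, 2 }, b[] = { 3 }, c[] = { 4, 5 };
    struct frag f[] = { { a, 2 }, { b, 1 }, { c, 2 } };
    return csum_frags(f, 3);
}
```

The verifier-unfriendly part in actual BPF is bounding the loop, which is exactly why a variable-length packet forces the limits mentioned above; the arithmetic itself is the same.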
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-23 10:33 ` Lorenz Bauer @ 2021-09-23 12:59 ` Toke Høiland-Jørgensen 2021-09-24 10:18 ` Lorenz Bauer 0 siblings, 1 reply; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-23 12:59 UTC (permalink / raw) To: Lorenz Bauer Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf Lorenz Bauer <lmb@cloudflare.com> writes: > On Tue, 21 Sept 2021 at 17:06, Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Hi Lorenz (Cc. the other people who participated in today's discussion) >> >> Following our discussion at the LPC session today, I dug up my previous >> summary of the issue and some possible solutions[0]. Seems no on >> actually replied last time, which is why we went with the "do nothing" >> approach, I suppose. I'm including the full text of the original email >> below; please take a look, and let's see if we can converge on a >> consensus here. > > Hi Toke, > > Thanks for looping me in again. A bit of context what XDP at > Cloudflare looks like: > > * We have a chain of XDP programs attached to a real network device. > This implements DDoS protection and L4 load balancing. This is > maintained by the team I am on. > * We have hundreds of network namespaces with veth that have XDP > attached to them. Traffic is routed from the root namespace into > these. This is maintained by the Magic Transit team, see this talk > from last year's LPC [1] > I'll try to summarise what I've picked up from the thread and add my > own 2c. Options being considered: > > 1. Make sure mb-aware and mb-unaware programs don't mix. I think I would rather state this as "make sure mb-unaware programs never encounter an mb frame". The programs can mix just fine in a single-buffer world since mb-aware programs are backwards compatible. All the multibuf helpers will also do the right thing even if there's only a single buffer in a given packet. 
> This could either be in the form of a sysctl or a dynamic property > similar to a refcount. We'd need to discern mb-aware from mb-unaware > somehow, most easily via a new program type. This means recompiling > existing programs (but then we expect that to be necessary anyways). Small nit here: while in most cases this property will probably be set by recompilation, the loader can override it as well. So if you have a non-mb-aware program that you know won't break in an mb-setting (because it doesn't care about packet length, etc), your loader could just mark it as 'mb-aware'. Command-line loaders like xdp-loader and 'ip' would probably need a manual override switch to do this for the case where you have an old XDP program that you still want to run even though you've enabled MB mode. > We'd also have to be able to indicate "mb-awareness" for freplace > programs. > > The implementation complexity seems OK, but operator UX is not good: > it's not possible to slowly migrate a system to mb-awareness, it has > to happen in one fell swoop. I don't think it has to be quite that bleak :) Specifically, there is no reason to block mb-aware programs from loading even when the multi-buffer mode is disabled. So a migration plan would look something like: 1. Start out with the mb-sysctl toggled off. This will make the system behave like it does today, i.e., XDP programs won't load on interfaces with large MTUs. 2. Start porting all your XDP programs to make them mb-aware, and switch their program type as you do. In many cases this is just a matter of checking that the programs don't care about packet length. While this is ongoing you will have a mix of mb-aware and non-mb-aware programs running, but there will be no actual mb frames. 3. Once all your programs have been ported and marked as such, flip the sysctl. This will make the system start refusing to load any XDP programs that are not mb-aware. 4. Actually raise the MTU of your interfaces :) > 2. 
Add a compatibility shim for mb-unaware programs receiving an mb frame. > > We'd still need a way to indicate "MB-OK", but it could be a piece of > metadata on a bpf_prog. Whatever code dispatches to an XDP program > would have to include a prologue that linearises the xdp_buff if > necessary which implies allocating memory. I don't know how hard it is > to implement this. I think it would be somewhat non-trivial, and more importantly would absolutely slaughter performance. And if you're using XDP, presumably you care about that, so I'm not sure we're doing anyone any favours by implementing such a compatibility layer? > There is also the question of freplace: do we extend linearising to > them, or do they have to support MB? Well, today freplace programs just inherit the type of whatever they are attaching to, so we'd have to go out of our way to block this. I think logically it would be up to whatever loader is attaching freplace programs, because that's the one that knows the semantics. E.g., something like libxdp that uses freplace to chain load programs would have to make sure that the dispatcher program is only marked as mb-aware if *all* the constituent programs are. > You raised an interesting point: couldn't we hit programs that can't > handle data_end - data being above a certain length? I think we (= > Cloudflare) actually have one of those, since we in some cases need to > traverse the entire buffer to calculate a checksum (we encapsulate > UDPv4 in IPv6, don't ask). Turns out it's actually really hard to > calculate the checksum on a variable length packet in BPF so we've had > to introduce limits. However, this case isn't too important: we made > this choice consciously, knowing that MTU changes would break it. Yeah, for this I think you're on your own ;) > Other than that I like this option a lot: mb-aware and mb-unaware > programs can co-exist, at the cost of performance. 
This allows > gradually migrating to our stack so that it can handle jumbo frames. See above re: coexisting. > 3. Make non-linearity invisible to the BPF program > > Something I've wished for often is that I didn't have to deal with > nonlinearity at all, based on my experience with cls_redirect [2]. > It's really hard to write a BPF program that handles non-linear skb, > especially when you have to call adjust_head, etc. which invalidates > packet buffers. This is probably impossible, but maybe someone has a > crazy idea? :) With the other helpers that we started discussing, I don't think you have to? I.e., with an xdp_load_bytes() or an xdp_data_pointer()-type helper that works across fragment boundaries I think you'd be fine, no? -Toke ^ permalink raw reply [flat|nested] 26+ messages in thread
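The libxdp rule stated above (the dispatcher may be marked mb-aware only if *all* of the constituent programs are) is a plain conjunction over the chain; one legacy program taints the whole dispatcher. A minimal sketch, with the struct standing in for whatever per-program metadata a chain loader keeps:

```c
#include <stdbool.h>
#include <stddef.h>

struct prog { const char *name; bool mb_aware; };

/* The dispatcher inherits the weakest guarantee in the chain. */
static bool dispatcher_mb_aware(const struct prog *progs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!progs[i].mb_aware)
            return false;
    return true;
}

static bool demo_all_aware(void)
{
    struct prog chain[] = { { "ddos", true }, { "lb", true } };
    return dispatcher_mb_aware(chain, 2);
}

static bool demo_one_legacy(void)
{
    struct prog chain[] = { { "ddos", true }, { "legacy", false } };
    return dispatcher_mb_aware(chain, 2);
}
```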
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-23 12:59 ` Toke Høiland-Jørgensen @ 2021-09-24 10:18 ` Lorenz Bauer 2021-09-24 17:55 ` Zvi Effron 2021-09-24 19:38 ` Toke Høiland-Jørgensen 0 siblings, 2 replies; 26+ messages in thread From: Lorenz Bauer @ 2021-09-24 10:18 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf On Thu, 23 Sept 2021 at 13:59, Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > I don't think it has to be quite that bleak :) > > Specifically, there is no reason to block mb-aware programs from loading > even when the multi-buffer mode is disabled. So a migration plan would > look something like: ... > 2. Start porting all your XDP programs to make them mb-aware, and switch > their program type as you do. In many cases this is just a matter of > checking that the programs don't care about packet length. [...] Porting is only easy if we are guaranteed that the first PAGE_SIZE bytes (or whatever the current limit is) are available via ->data without trickery. Otherwise we have to convert all direct packet access to the new API, whatever that ends up being. It seemed to me like you were saying there is no such guarantee, and it could be driver dependent, which is the worst possible outcome imo. This is the status quo for TC classifiers, which is a great source of hard to diagnose bugs. > 3. Once all your programs have been ported and marked as such, flip the > sysctl. This will make the system start refusing to load any XDP > programs that are not mb-aware. By this you mean reboot the system and early in boot change the sysctl? That could work I guess. > > 2. Add a compatibility shim for mb-unaware programs receiving an mb frame. > > > > We'd still need a way to indicate "MB-OK", but it could be a piece of > > metadata on a bpf_prog. 
Whatever code dispatches to an XDP program > > would have to include a prologue that linearises the xdp_buff if > > necessary which implies allocating memory. I don't know how hard it is > > to implement this. > > I think it would be somewhat non-trivial, and more importantly would > absolutely slaughter performance. And if you're using XDP, presumably > you care about that, so I'm not sure we're doing anyone any favours by > implementing such a compatibility layer? I see your point: having a single non-mb-aware program trash performance is bad for marketing. Better to not let people bump the MTU in that case. > > 3. Make non-linearity invisible to the BPF program > > > > Something I've wished for often is that I didn't have to deal with > > nonlinearity at all, based on my experience with cls_redirect [2]. > > It's really hard to write a BPF program that handles non-linear skb, > > especially when you have to call adjust_head, etc. which invalidates > > packet buffers. This is probably impossible, but maybe someone has a > > crazy idea? :) > > With the other helpers that we started discussing, I don't think you > have to? I.e., with an xdp_load_bytes() or an xdp_data_pointer()-type > helper that works across fragment boundaries I think you'd be fine, no? I'll take a look! Lorenz -- Lorenz Bauer | Systems Engineer 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK www.cloudflare.com ^ permalink raw reply [flat|nested] 26+ messages in thread
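The xdp_load_bytes()-style helper mentioned just above can be modelled in userspace to show the semantics a ported program would rely on: copy a byte range starting at a packet offset, walking fragments as needed. The helper name and behaviour follow the proposal in the thread, not a settled kernel API:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct frag { const uint8_t *data; size_t len; };

/* Sketch of an xdp_load_bytes()-like copy across fragment boundaries. */
static int load_bytes(const struct frag *frags, size_t nfrags,
                      size_t off, void *dst, size_t len)
{
    uint8_t *out = dst;

    for (size_t i = 0; i < nfrags && len; i++) {
        if (off >= frags[i].len) {   /* skip fragments before the offset */
            off -= frags[i].len;
            continue;
        }
        size_t chunk = frags[i].len - off;
        if (chunk > len)
            chunk = len;
        memcpy(out, frags[i].data + off, chunk);
        out += chunk;
        len -= chunk;
        off = 0;
    }
    return len ? -EINVAL : 0;        /* ran past the end of the packet */
}

/* Demo: five payload bytes "abcde" split as "abc" + "de". */
static int demo_cross_boundary_ok(void)
{
    static const uint8_t a[] = { 'a', 'b', 'c' }, b[] = { 'd', 'e' };
    struct frag f[] = { { a, 3 }, { b, 2 } };
    uint8_t buf[3];

    if (load_bytes(f, 2, 1, buf, 3) != 0)
        return 0;
    return memcmp(buf, "bcd", 3) == 0;
}

static int demo_past_end_fails(void)
{
    static const uint8_t a[] = { 'a', 'b', 'c' }, b[] = { 'd', 'e' };
    struct frag f[] = { { a, 3 }, { b, 2 } };
    uint8_t buf[3];

    return load_bytes(f, 2, 3, buf, 3) == -EINVAL;
}
```

A program written against this copy-based API never sees fragment boundaries at all, which is what makes porting mechanical even without a first-fragment guarantee.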
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-24 10:18 ` Lorenz Bauer @ 2021-09-24 17:55 ` Zvi Effron 2021-09-24 19:38 ` Toke Høiland-Jørgensen 1 sibling, 0 replies; 26+ messages in thread From: Zvi Effron @ 2021-09-24 17:55 UTC (permalink / raw) To: Lorenz Bauer Cc: Toke Høiland-Jørgensen, Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf On Fri, Sep 24, 2021 at 3:19 AM Lorenz Bauer <lmb@cloudflare.com> wrote: > > On Thu, 23 Sept 2021 at 13:59, Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > I don't think it has to be quite that bleak :) > > > > Specifically, there is no reason to block mb-aware programs from loading > > even when the multi-buffer mode is disabled. So a migration plan would > > look something like: > > ... > > > 2. Start porting all your XDP programs to make them mb-aware, and switch > > their program type as you do. In many cases this is just a matter of > > checking that the programs don't care about packet length. [...] > > Porting is only easy if we are guaranteed that the first PAGE_SIZE > bytes (or whatever the current limit is) are available via ->data > without trickery. Otherwise we have to convert all direct packet > access to the new API, whatever that ends up being. It seemed to me > like you were saying there is no such guarantee, and it could be > driver dependent, which is the worst possible outcome imo. This is the > status quo for TC classifiers, which is a great source of hard to > diagnose bugs. > > > 3. Once all your programs have been ported and marked as such, flip the > > sysctl. This will make the system start refusing to load any XDP > > programs that are not mb-aware. > > By this you mean reboot the system and early in boot change the > sysctl? That could work I guess. > I don't think a reboot is required. Just an unload of all mb-unaware XDP programs to allow adjusting the sysctl via normal sysctl changing methods (i.e. sysctl -w). 
Even if rebooting, it should just need an entry in /etc/sysctl.conf, nothing necessarily early in the boot process. The sysctl would only need to be set before the first load of an mb-unaware XDP program, which, if everything is ported correctly, would never happen. > > > 2. Add a compatibility shim for mb-unaware programs receiving an mb frame. > > > > > > We'd still need a way to indicate "MB-OK", but it could be a piece of > > > metadata on a bpf_prog. Whatever code dispatches to an XDP program > > > would have to include a prologue that linearises the xdp_buff if > > > necessary which implies allocating memory. I don't know how hard it is > > > to implement this. > > > > I think it would be somewhat non-trivial, and more importantly would > > absolutely slaughter performance. And if you're using XDP, presumably > > you care about that, so I'm not sure we're doing anyone any favours by > > implementing such a compatibility layer? > > I see your point: having a single non-mb-aware program trash > performance is bad for marketing. Better to not let people bump the > MTU in that case. > > > > 3. Make non-linearity invisible to the BPF program > > > > > > Something I've wished for often is that I didn't have to deal with > > > nonlinearity at all, based on my experience with cls_redirect [2]. > > > It's really hard to write a BPF program that handles non-linear skb, > > > especially when you have to call adjust_head, etc. which invalidates > > > packet buffers. This is probably impossible, but maybe someone has a > > > crazy idea? :) > > > > With the other helpers that we started discussing, I don't think you > > have to? I.e., with an xdp_load_bytes() or an xdp_data_pointer()-type > > helper that works across fragment boundaries I think you'd be fine, no? > > I'll take a look! 
> > Lorenz > > -- > Lorenz Bauer | Systems Engineer > 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK > > www.cloudflare.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-24 10:18 ` Lorenz Bauer 2021-09-24 17:55 ` Zvi Effron @ 2021-09-24 19:38 ` Toke Høiland-Jørgensen 2021-09-28 8:47 ` Lorenz Bauer 1 sibling, 1 reply; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-24 19:38 UTC (permalink / raw) To: Lorenz Bauer Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf Lorenz Bauer <lmb@cloudflare.com> writes: > On Thu, 23 Sept 2021 at 13:59, Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> I don't think it has to be quite that bleak :) >> >> Specifically, there is no reason to block mb-aware programs from loading >> even when the multi-buffer mode is disabled. So a migration plan would >> look something like: > > ... > >> 2. Start porting all your XDP programs to make them mb-aware, and switch >> their program type as you do. In many cases this is just a matter of >> checking that the programs don't care about packet length. [...] > > Porting is only easy if we are guaranteed that the first PAGE_SIZE > bytes (or whatever the current limit is) are available via ->data > without trickery. Otherwise we have to convert all direct packet > access to the new API, whatever that ends up being. It seemed to me > like you were saying there is no such guarantee, and it could be > driver dependent, which is the worst possible outcome imo. This is the > status quo for TC classifiers, which is a great source of hard to > diagnose bugs. Well, for the changes we're proposing now it will certainly be the case that the first PAGE_SIZE will always be present. But once we have the capability, I would expect people would want to do more with it, so we can't really guarantee this in the future. We could require that any other use be opt-in at the driver level, I suppose, but not sure if that would be enough? -Toke ^ permalink raw reply [flat|nested] 26+ messages in thread
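One defensive pattern for porting under the uncertain first-fragment guarantee discussed above: bounds-check whether the headers the program needs fit inside the linear part before using direct data/data_end access, and fall back to a copying helper otherwise. A sketch of that decision, with the guarantee size as an explicit assumption rather than a kernel-defined constant:

```c
#include <stdbool.h>
#include <stddef.h>

#define FIRST_FRAG_LEN 4096  /* assumed guarantee; the thread leaves this open */

/* Direct packet access is only safe while the parse offset stays inside
 * the first fragment; past that, a ported program must switch to a
 * copying helper such as the proposed xdp_load_bytes(). */
static bool can_use_direct_access(size_t parse_off, size_t hdr_len,
                                  size_t first_frag_len)
{
    return parse_off + hdr_len <= first_frag_len;
}
```

If the guarantee ever shrinks below PAGE_SIZE, only `first_frag_len` changes; the parsing code keeps the same shape.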
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-24 19:38 ` Toke Høiland-Jørgensen @ 2021-09-28 8:47 ` Lorenz Bauer 2021-09-28 13:43 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 26+ messages in thread From: Lorenz Bauer @ 2021-09-28 8:47 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf On Fri, 24 Sept 2021 at 20:38, Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > Porting is only easy if we are guaranteed that the first PAGE_SIZE > > bytes (or whatever the current limit is) are available via ->data > > without trickery. Otherwise we have to convert all direct packet > > access to the new API, whatever that ends up being. It seemed to me > > like you were saying there is no such guarantee, and it could be > > driver dependent, which is the worst possible outcome imo. This is the > > status quo for TC classifiers, which is a great source of hard to > > diagnose bugs. > > Well, for the changes we're proposing now it will certainly be the case > that the first PAGE_SIZE will always be present. But once we have the > capability, I would expect people would want to do more with it, so we > can't really guarantee this in the future. We could require that any > other use be opt-in at the driver level, I suppose, but not sure if that > would be enough? I'm not sure what you mean by "opt-in at driver level"? Make smaller initial fragments a feature on the driver? We shouldn't let drivers dictate the semantics of a program type, it defeats the purpose of the context abstraction. We're using XDP precisely because we don't want to deal with vendor specific network stack bypass, etc. I would prefer not specifying the first fragment size over the driver knob, unfortunately it invalidates your assumption that porting is going to be trivial. 
Lorenz -- Lorenz Bauer | Systems Engineer 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK www.cloudflare.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Redux: Backwards compatibility for XDP multi-buff 2021-09-28 8:47 ` Lorenz Bauer @ 2021-09-28 13:43 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 26+ messages in thread From: Toke Høiland-Jørgensen @ 2021-09-28 13:43 UTC (permalink / raw) To: Lorenz Bauer Cc: Lorenzo Bianconi, Daniel Borkmann, John Fastabend, Networking, bpf Lorenz Bauer <lmb@cloudflare.com> writes: > On Fri, 24 Sept 2021 at 20:38, Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> > >> > Porting is only easy if we are guaranteed that the first PAGE_SIZE >> > bytes (or whatever the current limit is) are available via ->data >> > without trickery. Otherwise we have to convert all direct packet >> > access to the new API, whatever that ends up being. It seemed to me >> > like you were saying there is no such guarantee, and it could be >> > driver dependent, which is the worst possible outcome imo. This is the >> > status quo for TC classifiers, which is a great source of hard to >> > diagnose bugs. >> >> Well, for the changes we're proposing now it will certainly be the case >> that the first PAGE_SIZE will always be present. But once we have the >> capability, I would expect people would want to do more with it, so we >> can't really guarantee this in the future. We could require that any >> other use be opt-in at the driver level, I suppose, but not sure if that >> would be enough? > > I'm not sure what you mean by "opt-in at driver level"? Make smaller > initial fragments a feature on the driver? We shouldn't let drivers > dictate the semantics of a program type, it defeats the purpose of the > context abstraction. We're using XDP precisely because we don't want > to deal with vendor specific network stack bypass, etc. 
I would prefer > not specifying the first fragment size over the driver knob, I didn't mean that every driver should get to just define their own semantics; rather the opposite: If we do end up adding support for anything other than "first page of data is in the first fragment", we should define separate semantics for this and make it an explicit knob to turn on such a mode. > unfortunately it invalidates your assumption that porting is going to > be trivial. Well, I did say "for some programs" ;) But OK, point taken. It would be great if you could chime in on the helper discussion[0]. I.e., which helper API would make porting simplest for you? -Toke [0] https://lore.kernel.org/r/20210920110216.4c54c9a3@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2021-09-28 13:43 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-09-21 16:06 Redux: Backwards compatibility for XDP multi-buff Toke Høiland-Jørgensen 2021-09-21 17:31 ` Zvi Effron 2021-09-21 18:22 ` Toke Høiland-Jørgensen 2021-09-21 19:17 ` Zvi Effron 2021-09-21 22:14 ` Toke Høiland-Jørgensen 2021-09-21 23:10 ` Zvi Effron 2021-09-22 20:13 ` Toke Høiland-Jørgensen 2021-09-21 20:12 ` Alexei Starovoitov 2021-09-21 22:20 ` Toke Høiland-Jørgensen 2021-09-21 22:51 ` Jakub Kicinski 2021-09-22 20:01 ` Toke Høiland-Jørgensen 2021-09-22 21:23 ` Zvi Effron 2021-09-23 18:45 ` Toke Høiland-Jørgensen 2021-09-23 13:46 ` Jakub Kicinski 2021-09-27 12:43 ` Jesper Dangaard Brouer 2021-09-21 22:54 ` Jakub Kicinski 2021-09-22 20:02 ` Toke Høiland-Jørgensen 2021-09-22 21:11 ` Zvi Effron 2021-09-23 19:00 ` Toke Høiland-Jørgensen 2021-09-23 10:33 ` Lorenz Bauer 2021-09-23 12:59 ` Toke Høiland-Jørgensen 2021-09-24 10:18 ` Lorenz Bauer 2021-09-24 17:55 ` Zvi Effron 2021-09-24 19:38 ` Toke Høiland-Jørgensen 2021-09-28 8:47 ` Lorenz Bauer 2021-09-28 13:43 ` Toke Høiland-Jørgensen