* [RFC net-next] net: tc: flow indirect framework issue @ 2022-04-13 5:52 Mattias Forsblad 2022-04-13 7:05 ` Baowen Zheng 0 siblings, 1 reply; 8+ messages in thread From: Mattias Forsblad @ 2022-04-13 5:52 UTC (permalink / raw) To: netdev Cc: roid, vladbu, Eli Cohen, Jiri Pirko, Pablo Neira Ayuso, Baowen Zheng, Vladimir Oltean, Tobias Waldekranz, Mattias Forsblad Hello all, I'm currently working to get offloading of tc rules (clsact/matchall/drop) on a bridge offloaded to HW. The patch series is here: https://lore.kernel.org/netdev/20220411131619.43js6owwkalcdwwa@skbuf/T/#m07bff9e205e9ac03d15a4e758b4129235da88aba However I'm having some trouble with it. More specific in the limitations section in the link above, quote: Limitations If there is tc rules on a bridge and all the ports leave the bridge and then joins the bridge again, the indirect framwork doesn't seem to reoffload them at join. The tc rules need to be torn down and re-added. This seems to be because of limitations in the tc framework. The same issue can bee seen it you have a bridge with no ports and then adds a tc rule, like so: tc qdisc add dev br0 clsact tc filter add dev br0 ingress pref 1 proto all matchall action drop And then adds a port to that bridge ip link set dev swp0 master br0 <---- flow_indr_dev_register() bc this I'm seeing the callback(TC_SETUP_BLOCK) from flow_indr_dev_register() but I'm not getting any callbacks that I've added via flow_block_cb_add() Do you maybe have some idea why I'm seeing this behavior? Am i doing something wrong or is it a known issue or something else? Best regards, Mattias Forsblad mattias.forsblad@gmail.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [RFC net-next] net: tc: flow indirect framework issue 2022-04-13 5:52 [RFC net-next] net: tc: flow indirect framework issue Mattias Forsblad @ 2022-04-13 7:05 ` Baowen Zheng 2022-04-13 9:07 ` Vladimir Oltean 0 siblings, 1 reply; 8+ messages in thread From: Baowen Zheng @ 2022-04-13 7:05 UTC (permalink / raw) To: Mattias Forsblad, netdev Cc: roid, vladbu, Eli Cohen, Jiri Pirko, Pablo Neira Ayuso, Vladimir Oltean, Tobias Waldekranz On April 13, 2022 1:53 PM, Mattias wrote: >Hello all, > >I'm currently working to get offloading of tc rules (clsact/matchall/drop) on a >bridge offloaded to HW. The patch series is here: > >https://lore.kernel.org/netdev/20220411131619.43js6owwkalcdwwa@skbuf/ >T/#m07bff9e205e9ac03d15a4e758b4129235da88aba > >However I'm having some trouble with it. More specific in the limitations >section in the link above, quote: > >Limitations >If there is tc rules on a bridge and all the ports leave the bridge and then joins >the bridge again, the indirect framwork doesn't seem to reoffload them at >join. The tc rules need to be torn down and re-added. This seems to be >because of limitations in the tc framework. > >The same issue can bee seen it you have a bridge with no ports and then adds >a tc rule, like so: > >tc qdisc add dev br0 clsact >tc filter add dev br0 ingress pref 1 proto all matchall action drop > >And then adds a port to that bridge >ip link set dev swp0 master br0 <---- flow_indr_dev_register() bc this Hi Mattias, in my understand, your problem may because you register your callback here. I am not sure why you choose to register your hook here(after the interface is added to the bridge to just trigger the callback in necessary case.) Then your function is called and add your cb to the tcf_block. But since the matchall filter has been created so you can not get your callback triggered. Maybe you can try to regist your callback in your module load stage I think your callback will be triggered, or change the command order as: tc qdisc add dev br0 clsact ip link set dev swp0 master br0 tc filter add dev br0 ingress pref 1 proto all matchall action drop I am not sure whether it will take effect. > >I'm seeing the callback(TC_SETUP_BLOCK) from flow_indr_dev_register() but >I'm not getting any callbacks that I've added via flow_block_cb_add() > >Do you maybe have some idea why I'm seeing this behavior? >Am i doing something wrong or is it a known issue or something else? > >Best regards, > >Mattias Forsblad >mattias.forsblad@gmail.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC net-next] net: tc: flow indirect framework issue 2022-04-13 7:05 ` Baowen Zheng @ 2022-04-13 9:07 ` Vladimir Oltean 2022-04-13 12:24 ` Mattias Forsblad 0 siblings, 1 reply; 8+ messages in thread From: Vladimir Oltean @ 2022-04-13 9:07 UTC (permalink / raw) To: Baowen Zheng Cc: Mattias Forsblad, netdev, roid, vladbu, Eli Cohen, Jiri Pirko, Pablo Neira Ayuso, Tobias Waldekranz Hi Baowen, On Wed, Apr 13, 2022 at 07:05:39AM +0000, Baowen Zheng wrote: > Hi Mattias, in my understand, your problem may because you register > your callback here. I am not sure why you choose to register your hook > here(after the interface is added to the bridge to just trigger the > callback in necessary case.) > Then your function is called and add your cb to the tcf_block. But > since the matchall filter has been created so you can not get your > callback triggered. The bridge device has a certain lifetime, and the physical port device has a different and completely independent lifetime. So the port device may join the bridge when the bridge already has some filters on its ingress chain. Even with the indirect flow block callback registered at module load stage as you suggest, the port driver needs to have a reason to intercept a tc filter on a certain bridge. And when the port isn't a member (yet) of that bridge, it has no reason to. Of course, you could make the port driver speculatively monitor all tc filters installed on all bridges in the system (related or unrelated to ports belonging to this driver), just to not miss callbacks which may be needed later. But that is quite suboptimal. Mattias' question comes from the fact that there is already some logic in flow_indr_dev_register() to replay missed flow block binding events, added by Eli Cohen in commit 74fc4f828769 ("net: Fix offloading indirect devices dependency on qdisc order creation"). That logic works, but it replays only the binding, not the actual filters, which again, would be necessary. > Maybe you can try to regist your callback in your module load stage I > think your callback will be triggered, or change the command order as: > tc qdisc add dev br0 clsact > ip link set dev swp0 master br0 > tc filter add dev br0 ingress pref 1 proto all matchall action drop > I am not sure whether it will take effect. I think the idea is to make the given command order work, not to change it. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC net-next] net: tc: flow indirect framework issue 2022-04-13 9:07 ` Vladimir Oltean @ 2022-04-13 12:24 ` Mattias Forsblad 2022-04-13 13:36 ` Pablo Neira Ayuso 0 siblings, 1 reply; 8+ messages in thread From: Mattias Forsblad @ 2022-04-13 12:24 UTC (permalink / raw) To: Vladimir Oltean, Baowen Zheng Cc: netdev, roid, vladbu, Eli Cohen, Jiri Pirko, Pablo Neira Ayuso, Tobias Waldekranz On 2022-04-13 11:07, Vladimir Oltean wrote: > Hi Baowen, > > On Wed, Apr 13, 2022 at 07:05:39AM +0000, Baowen Zheng wrote: >> Hi Mattias, in my understand, your problem may because you register >> your callback here. I am not sure why you choose to register your hook >> here(after the interface is added to the bridge to just trigger the >> callback in necessary case.) > >> Then your function is called and add your cb to the tcf_block. But >> since the matchall filter has been created so you can not get your >> callback triggered. > > The bridge device has a certain lifetime, and the physical port device > has a different and completely independent lifetime. So the port device > may join the bridge when the bridge already has some filters on its > ingress chain. > > Even with the indirect flow block callback registered at module load > stage as you suggest, the port driver needs to have a reason to > intercept a tc filter on a certain bridge. And when the port isn't a > member (yet) of that bridge, it has no reason to. > > Of course, you could make the port driver speculatively monitor all tc > filters installed on all bridges in the system (related or unrelated to > ports belonging to this driver), just to not miss callbacks which may be > needed later. But that is quite suboptimal. > > Mattias' question comes from the fact that there is already some logic > in flow_indr_dev_register() to replay missed flow block binding events, > added by Eli Cohen in commit 74fc4f828769 ("net: Fix offloading indirect > devices dependency on qdisc order creation"). That logic works, but it > replays only the binding, not the actual filters, which again, would be > necessary. > >> Maybe you can try to regist your callback in your module load stage I >> think your callback will be triggered, or change the command order as: >> tc qdisc add dev br0 clsact >> ip link set dev swp0 master br0 >> tc filter add dev br0 ingress pref 1 proto all matchall action drop >> I am not sure whether it will take effect. > > I think the idea is to make the given command order work, not to change it. Re-ordering the tc commands doesn't solve the issue when all ports leave the bridge, which will lead to flow_indr_dev_unregister() and later re-joins the bridge (flow_indr_dev_register()). We'll need filter replay for this. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC net-next] net: tc: flow indirect framework issue 2022-04-13 12:24 ` Mattias Forsblad @ 2022-04-13 13:36 ` Pablo Neira Ayuso 2022-04-13 14:15 ` Vladimir Oltean 2022-04-14 8:57 ` Jakub Kicinski 0 siblings, 2 replies; 8+ messages in thread From: Pablo Neira Ayuso @ 2022-04-13 13:36 UTC (permalink / raw) To: Mattias Forsblad Cc: Vladimir Oltean, Baowen Zheng, netdev, roid, vladbu, Eli Cohen, Jiri Pirko, Tobias Waldekranz On Wed, Apr 13, 2022 at 02:24:38PM +0200, Mattias Forsblad wrote: > On 2022-04-13 11:07, Vladimir Oltean wrote: > > Hi Baowen, > > > > On Wed, Apr 13, 2022 at 07:05:39AM +0000, Baowen Zheng wrote: [...] > > Mattias' question comes from the fact that there is already some logic > > in flow_indr_dev_register() to replay missed flow block binding events, > > added by Eli Cohen in commit 74fc4f828769 ("net: Fix offloading indirect > > devices dependency on qdisc order creation"). That logic works, but it > > replays only the binding, not the actual filters, which again, would be > > necessary. A bit of a long email... This commit 74fc4f828769 handles this scenario: 1) eth0 is gone (module removal) 2) vxlan0 device is still in place, tc ingress also contains rules for vxlan0. 3) eth0 is reloaded. A bit of background: tc ingress removes rules for eth0 if eth0 is gone (I am refering to software rules, in general). In this model, the tc ingress rules are attached to the device, and if the device eth0 is gone, those rules are also gone and, then, once this device eth0 comes back, the user has to the tc ingress rules software for eth0 again. There is no replay mechanism for tc ingress rules in this case. IIRC, Eli's patch re-adds the flow block for vxlan0 because he got a bug report that says that after reloading the driver module and eth0 comes back, rules for tc vxlan0 were not hardware offloaded. The indirect flow block infrastructure is tracking devices such as vxlan0 that the given driver *might* be able to hardware offload. But from the control plane (user) perspective, this detail is hidden. To me, the problem is that there is no way from the control plane to relate vxlan0 with the real device that performs the hardware offload. There is also no flag for the user to request "please hardware offload vxlan0 tc ingress rules". Instead, the flow indirect block infrastructure performs the hardware offload "transparently" to the user. I think some people believe doing things fully transparent is good, at the cost of adding more kernel complexity and hiding details that are relevant to the user (such as if hardware offload is enabled for vxlan0 and what is the real device that is actually being used for the vxlan0 to be offloaded). So, there are no flags when setting up the vxlan0 device for the user to say: "I would like to hardware offload vxlan0", and going slightly further there is not "please attach this vxlan0 device to eth0 for hardware offload". Any real device could be potentially used to offload vxlan0, the user does not know which one is actually used. Exposing this information is a bit more work on top of the user, but: 1) it will be transparent: the control plane shows that the vxlan0 is hardware offloaded. Then if eth0 is gone, vxlan0 tc ingress can be removed too, because it depends on eth0. 2) The control plane validates if hardware offload for vxlan0. If this is not possible, display an error to the user: "sorry, I cannot offload vxlan0 on eth0 for reason X". Since this is not exposed to the control plane, the existing infrastructure follows a snooping scheme, but tracking devices that might be able to hardware offload. There is no obvious way to relate vxlan0 with the real device (eth0) that is actually performing the hardware offloading. Does replay make sense for vxlan0 when the user has to manually reload rules for eth0? So why vxlan0 rules need to be transparently replayed but eth0 rules need to be manually reloaded in tc ingress? > >> Maybe you can try to regist your callback in your module load stage I > >> think your callback will be triggered, or change the command order as: > >> tc qdisc add dev br0 clsact > >> ip link set dev swp0 master br0 > >> tc filter add dev br0 ingress pref 1 proto all matchall action drop > >> I am not sure whether it will take effect. > > > > I think the idea is to make the given command order work, not to change it. > > Re-ordering the tc commands doesn't solve the issue when all ports leave the > bridge, which will lead to flow_indr_dev_unregister() and later re-joins > the bridge (flow_indr_dev_register()). We'll need filter replay for this. Existing drivers call flow_indr_dev_register() from module init, so they start tracking any device that might be offloaded since the beginning, see below. Mattias Forsblad said: >tc qdisc add dev br0 clsact >tc filter add dev br0 ingress pref 1 proto all matchall action drop > >And then adds a port to that bridge >ip link set dev swp0 master br0 <---- flow_indr_dev_register() bc this Regarding your issue: Why does it call flow_indr_dev_register() here? Most drivers call flow_indr_dev_register() much earlier, when swp0 becomes available. Then, tc qdisc add dev br0 clsact will trigger the indirect flow block path to reach your driver. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC net-next] net: tc: flow indirect framework issue 2022-04-13 13:36 ` Pablo Neira Ayuso @ 2022-04-13 14:15 ` Vladimir Oltean 2022-04-14 8:57 ` Jakub Kicinski 1 sibling, 0 replies; 8+ messages in thread From: Vladimir Oltean @ 2022-04-13 14:15 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: Mattias Forsblad, Baowen Zheng, netdev, roid, vladbu, Eli Cohen, Jiri Pirko, Tobias Waldekranz Hi Pablo, On Wed, Apr 13, 2022 at 03:36:32PM +0200, Pablo Neira Ayuso wrote: > On Wed, Apr 13, 2022 at 02:24:38PM +0200, Mattias Forsblad wrote: > > On 2022-04-13 11:07, Vladimir Oltean wrote: > > > Hi Baowen, > > > > > > On Wed, Apr 13, 2022 at 07:05:39AM +0000, Baowen Zheng wrote: > [...] > > > Mattias' question comes from the fact that there is already some logic > > > in flow_indr_dev_register() to replay missed flow block binding events, > > > added by Eli Cohen in commit 74fc4f828769 ("net: Fix offloading indirect > > > devices dependency on qdisc order creation"). That logic works, but it > > > replays only the binding, not the actual filters, which again, would be > > > necessary. > > A bit of a long email... > > This commit 74fc4f828769 handles this scenario: > > 1) eth0 is gone (module removal) > 2) vxlan0 device is still in place, tc ingress also contains rules for > vxlan0. > 3) eth0 is reloaded. > > A bit of background: tc ingress removes rules for eth0 if eth0 is > gone (I am refering to software rules, in general). In this model, the > tc ingress rules are attached to the device, and if the device eth0 is > gone, those rules are also gone and, then, once this device eth0 comes > back, the user has to the tc ingress rules software for eth0 again. > There is no replay mechanism for tc ingress rules in this case. I don't understand the mechanics of this side note, sorry (maybe also because I don't know the mechanics of VXLAN / VXLAN offload in mlxsw). Presumably, tc filters on the ingress of vxlan0 would not just vanish if some random other net device in the system unregistered. Those rules would have to be in some way related to eth0, like "mirred egress redirect eth0" or something like that. In that case, yes I agree it is fully to be expected for this rule to disappear when eth0 disappears. But that makes the example also not applicable to what Mattias is trying to achieve, except maybe to illustrate that the indirect flow block framework isn't adequate for it. > IIRC, Eli's patch re-adds the flow block for vxlan0 because he got a > bug report that says that after reloading the driver module and eth0 > comes back, rules for tc vxlan0 were not hardware offloaded. > > The indirect flow block infrastructure is tracking devices such as > vxlan0 that the given driver *might* be able to hardware offload. > But from the control plane (user) perspective, this detail is hidden. > To me, the problem is that there is no way from the control plane to > relate vxlan0 with the real device that performs the hardware offload. > There is also no flag for the user to request "please hardware offload > vxlan0 tc ingress rules". Instead, the flow indirect block > infrastructure performs the hardware offload "transparently" to the user. It would be difficult to draw a full relationship between your VXLAN example and what Mattias is trying to do ("matchall action drop" on bridge device). Since the bridge is an upper device with multiple lowers, some lowers may monitor this tc rule and do something about it, while other lowers may not. What do you report to user space in that case, "partially offloaded"? Do you report the full list of offloaders? > I think some people believe doing things fully transparent is good, at > the cost of adding more kernel complexity and hiding details that are > relevant to the user (such as if hardware offload is enabled for > vxlan0 and what is the real device that is actually being used for the > vxlan0 to be offloaded). Assuming that this is what you're hinting at: when a DSA port leaves a bridge, the "matchall action drop" rule that was added by the user on the bridge should automagically disappear, because its offloaders disappeared? But maybe you liked that rule anyway... maybe the bridge had 4 DSA ports, and 50 non-DSA ports, and that rule applies with or without DSA. > So, there are no flags when setting up the vxlan0 device for the user > to say: "I would like to hardware offload vxlan0", and going slightly > further there is not "please attach this vxlan0 device to eth0 for > hardware offload". Any real device could be potentially used to > offload vxlan0, the user does not know which one is actually used. > > Exposing this information is a bit more work on top of the user, but: > > 1) it will be transparent: the control plane shows that the vxlan0 is > hardware offloaded. Then if eth0 is gone, vxlan0 tc ingress can be > removed too, because it depends on eth0. > > 2) The control plane validates if hardware offload for vxlan0. If this > is not possible, display an error to the user: "sorry, I cannot > offload vxlan0 on eth0 for reason X". > > Since this is not exposed to the control plane, the existing > infrastructure follows a snooping scheme, but tracking devices that > might be able to hardware offload. > > There is no obvious way to relate vxlan0 with the real device > (eth0) that is actually performing the hardware offloading. > > Does replay make sense for vxlan0 when the user has to manually > reload rules for eth0? So why vxlan0 rules need to be transparently > replayed but eth0 rules need to be manually reloaded in tc ingress? Yes, filter replay for vxlan0 makes perfect sense to the extent that the list of filters to reoffload would be empty in the particular case you give. So nothing changes. But the filters that you have to react on do not always involve you (snooper) as a net device. > > >> Maybe you can try to regist your callback in your module load stage I > > >> think your callback will be triggered, or change the command order as: > > >> tc qdisc add dev br0 clsact > > >> ip link set dev swp0 master br0 > > >> tc filter add dev br0 ingress pref 1 proto all matchall action drop > > >> I am not sure whether it will take effect. > > > > > > I think the idea is to make the given command order work, not to change it. > > > > Re-ordering the tc commands doesn't solve the issue when all ports leave the > > bridge, which will lead to flow_indr_dev_unregister() and later re-joins > > the bridge (flow_indr_dev_register()). We'll need filter replay for this. > > Existing drivers call flow_indr_dev_register() from module init, so > they start tracking any device that might be offloaded since the > beginning, see below. There's a certain level of absurdity to the idea that, on a kernel compiled with CONFIG_NET_DSA=y, the DSA module would track and save state for every tc filter added on the ingress of any bridge in the system, *regardless* of whether any DSA switch has probed, or will even probe in that system. Distribution kernels come with CONFIG_NET_DSA enabled by default. > Mattias Forsblad said: > >tc qdisc add dev br0 clsact > >tc filter add dev br0 ingress pref 1 proto all matchall action drop > > > >And then adds a port to that bridge > >ip link set dev swp0 master br0 <---- flow_indr_dev_register() bc this > > Regarding your issue: Why does it call flow_indr_dev_register() here? > Most drivers call flow_indr_dev_register() much earlier, when swp0 > becomes available. Then, tc qdisc add dev br0 clsact will trigger the > indirect flow block path to reach your driver. So my main takeaway is that what Mattias is trying to do is fundamentally inadequate to the very limited use case that flow_indr_dev_register() was intended for. The obvious alternative is for the bridge driver itself to bind to its own flow block, watch for the same tc filters, and re-notify them via other mechanisms to drivers which are interested. The advantage would be that the filters have the same lifetime as the bridge device itself, and the bridge device is notified of port joins/leaves, so Mattias could avoid missing filters very cleanly. That would create some bloat of its own, namely in the bridge driver this time, but in the longer term, I think the tc filters on the bridge device itself may be interesting to more than one single offloading driver, so the cost of this bloat may amortize. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC net-next] net: tc: flow indirect framework issue 2022-04-13 13:36 ` Pablo Neira Ayuso 2022-04-13 14:15 ` Vladimir Oltean @ 2022-04-14 8:57 ` Jakub Kicinski 2022-04-25 7:52 ` Mattias Forsblad 1 sibling, 1 reply; 8+ messages in thread From: Jakub Kicinski @ 2022-04-14 8:57 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: Mattias Forsblad, Vladimir Oltean, Baowen Zheng, netdev, roid, vladbu, Eli Cohen, Jiri Pirko, Tobias Waldekranz On Wed, 13 Apr 2022 15:36:32 +0200 Pablo Neira Ayuso wrote: > A bit of a long email... > > This commit 74fc4f828769 handles this scenario: > > 1) eth0 is gone (module removal) > 2) vxlan0 device is still in place, tc ingress also contains rules for > vxlan0. > 3) eth0 is reloaded. > > A bit of background: tc ingress removes rules for eth0 if eth0 is > gone (I am refering to software rules, in general). In this model, the > tc ingress rules are attached to the device, and if the device eth0 is > gone, those rules are also gone and, then, once this device eth0 comes > back, the user has to the tc ingress rules software for eth0 again. > There is no replay mechanism for tc ingress rules in this case. > > IIRC, Eli's patch re-adds the flow block for vxlan0 because he got a > bug report that says that after reloading the driver module and eth0 > comes back, rules for tc vxlan0 were not hardware offloaded. > > The indirect flow block infrastructure is tracking devices such as > vxlan0 that the given driver *might* be able to hardware offload. > But from the control plane (user) perspective, this detail is hidden. > To me, the problem is that there is no way from the control plane to > relate vxlan0 with the real device that performs the hardware offload. > There is also no flag for the user to request "please hardware offload > vxlan0 tc ingress rules". Instead, the flow indirect block > infrastructure performs the hardware offload "transparently" to the user. TBH I don't understand why indirect infra is important. Mattias said he gets a replay of the block bind. So it's the replay of rules that's broken. Whether the block bind came from indir infra or the block is shared and got bound to a new dev is not important. > I think some people believe doing things fully transparent is good, at > the cost of adding more kernel complexity and hiding details that are > relevant to the user (such as if hardware offload is enabled for > vxlan0 and what is the real device that is actually being used for the > vxlan0 to be offloaded). > > So, there are no flags when setting up the vxlan0 device for the user > to say: "I would like to hardware offload vxlan0", and going slightly > further there is not "please attach this vxlan0 device to eth0 for > hardware offload". Any real device could be potentially used to > offload vxlan0, the user does not know which one is actually used. > > Exposing this information is a bit more work on top of the user, but: > > 1) it will be transparent: the control plane shows that the vxlan0 is > hardware offloaded. Then if eth0 is gone, vxlan0 tc ingress can be > removed too, because it depends on eth0. > > 2) The control plane validates if hardware offload for vxlan0. If this > is not possible, display an error to the user: "sorry, I cannot > offload vxlan0 on eth0 for reason X". > > Since this is not exposed to the control plane, the existing > infrastructure follows a snooping scheme, but tracking devices that > might be able to hardware offload. > > There is no obvious way to relate vxlan0 with the real device > (eth0) that is actually performing the hardware offloading. Let's not over-complicate things, Mattias just needs replay to work. 90% sure it worked when we did the work back in the day with John H, before the nft rewrite etc. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC net-next] net: tc: flow indirect framework issue 2022-04-14 8:57 ` Jakub Kicinski @ 2022-04-25 7:52 ` Mattias Forsblad 0 siblings, 0 replies; 8+ messages in thread From: Mattias Forsblad @ 2022-04-25 7:52 UTC (permalink / raw) To: Jakub Kicinski, Pablo Neira Ayuso Cc: Vladimir Oltean, Baowen Zheng, netdev, roid, vladbu, Eli Cohen, Jiri Pirko, Tobias Waldekranz On 2022-04-14 10:57, Jakub Kicinski wrote: >> I think some people believe doing things fully transparent is good, at >> the cost of adding more kernel complexity and hiding details that are >> relevant to the user (such as if hardware offload is enabled for >> vxlan0 and what is the real device that is actually being used for the >> vxlan0 to be offloaded). >> >> So, there are no flags when setting up the vxlan0 device for the user >> to say: "I would like to hardware offload vxlan0", and going slightly >> further there is not "please attach this vxlan0 device to eth0 for >> hardware offload". Any real device could be potentially used to >> offload vxlan0, the user does not know which one is actually used. >> >> Exposing this information is a bit more work on top of the user, but: >> >> 1) it will be transparent: the control plane shows that the vxlan0 is >> hardware offloaded. Then if eth0 is gone, vxlan0 tc ingress can be >> removed too, because it depends on eth0. >> >> 2) The control plane validates if hardware offload for vxlan0. If this >> is not possible, display an error to the user: "sorry, I cannot >> offload vxlan0 on eth0 for reason X". >> >> Since this is not exposed to the control plane, the existing >> infrastructure follows a snooping scheme, but tracking devices that >> might be able to hardware offload. >> >> There is no obvious way to relate vxlan0 with the real device >> (eth0) that is actually performing the hardware offloading. > > Let's not over-complicate things, Mattias just needs replay to work. > 90% sure it worked when we did the work back in the day with John H, > before the nft rewrite etc. To me the first thing to determine is how flow_indr_dev_register should work? With only a superficial knowledge of tc I'd seem to me that if we have a function called tcf_action_reoffload_cb and tc has all the information about current blocks/filters/rules it should really reoffload those. The other way would mean bookkeeping the same information at multiple places. It also means restrictions on which sequence one should setup a network topology. Would we like it that way? ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-04-25 7:53 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-04-13 5:52 [RFC net-next] net: tc: flow indirect framework issue Mattias Forsblad 2022-04-13 7:05 ` Baowen Zheng 2022-04-13 9:07 ` Vladimir Oltean 2022-04-13 12:24 ` Mattias Forsblad 2022-04-13 13:36 ` Pablo Neira Ayuso 2022-04-13 14:15 ` Vladimir Oltean 2022-04-14 8:57 ` Jakub Kicinski 2022-04-25 7:52 ` Mattias Forsblad
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.