All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC net-next] net: tc: flow indirect framework issue
@ 2022-04-13  5:52 Mattias Forsblad
  2022-04-13  7:05 ` Baowen Zheng
  0 siblings, 1 reply; 8+ messages in thread
From: Mattias Forsblad @ 2022-04-13  5:52 UTC (permalink / raw)
  To: netdev
  Cc: roid, vladbu, Eli Cohen, Jiri Pirko, Pablo Neira Ayuso,
	Baowen Zheng, Vladimir Oltean, Tobias Waldekranz,
	Mattias Forsblad

Hello all,

I'm currently working to get offloading of tc rules (clsact/matchall/drop) 
on a bridge offloaded to HW. The patch series is here:

https://lore.kernel.org/netdev/20220411131619.43js6owwkalcdwwa@skbuf/T/#m07bff9e205e9ac03d15a4e758b4129235da88aba

However I'm having some trouble with it. More specific in the limitations section
in the link above, quote:

Limitations
If there is tc rules on a bridge and all the ports leave the bridge
and then joins the bridge again, the indirect framwork doesn't seem
to reoffload them at join. The tc rules need to be torn down and
re-added. This seems to be because of limitations in the tc
framework.

The same issue can bee seen it you have a bridge with no ports
and then adds a tc rule, like so:

tc qdisc add dev br0 clsact
tc filter add dev br0 ingress pref 1 proto all matchall action drop

And then adds a port to that bridge
ip link set dev swp0 master br0   <---- flow_indr_dev_register() bc this

I'm seeing the callback(TC_SETUP_BLOCK) from flow_indr_dev_register()
but I'm not getting any callbacks that I've added via flow_block_cb_add()

Do you maybe have some idea why I'm seeing this behavior?
Am i doing something wrong or is it a known issue or something else?

Best regards,

Mattias Forsblad
mattias.forsblad@gmail.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [RFC net-next] net: tc: flow indirect framework issue
  2022-04-13  5:52 [RFC net-next] net: tc: flow indirect framework issue Mattias Forsblad
@ 2022-04-13  7:05 ` Baowen Zheng
  2022-04-13  9:07   ` Vladimir Oltean
  0 siblings, 1 reply; 8+ messages in thread
From: Baowen Zheng @ 2022-04-13  7:05 UTC (permalink / raw)
  To: Mattias Forsblad, netdev
  Cc: roid, vladbu, Eli Cohen, Jiri Pirko, Pablo Neira Ayuso,
	Vladimir Oltean, Tobias Waldekranz

On April 13, 2022 1:53 PM, Mattias wrote:
>Hello all,
>
>I'm currently working to get offloading of tc rules (clsact/matchall/drop) on a
>bridge offloaded to HW. The patch series is here:
>
>https://lore.kernel.org/netdev/20220411131619.43js6owwkalcdwwa@skbuf/
>T/#m07bff9e205e9ac03d15a4e758b4129235da88aba
>
>However I'm having some trouble with it. More specific in the limitations
>section in the link above, quote:
>
>Limitations
>If there is tc rules on a bridge and all the ports leave the bridge and then joins
>the bridge again, the indirect framwork doesn't seem to reoffload them at
>join. The tc rules need to be torn down and re-added. This seems to be
>because of limitations in the tc framework.
>
>The same issue can bee seen it you have a bridge with no ports and then adds
>a tc rule, like so:
>
>tc qdisc add dev br0 clsact
>tc filter add dev br0 ingress pref 1 proto all matchall action drop
>
>And then adds a port to that bridge
>ip link set dev swp0 master br0   <---- flow_indr_dev_register() bc this
Hi Mattias, in my understand, your problem may because you register your callback here. I am not sure why you choose to register your hook here(after the interface is added to the bridge to just trigger the callback in necessary case.)
Then your function is called and add your cb to the tcf_block. But since the matchall filter has been created so you can not get your callback triggered. 

Maybe you can try to regist your callback in your module load stage I think your callback will be triggered, or change the command order as: 
tc qdisc add dev br0 clsact
ip link set dev swp0 master br0
tc filter add dev br0 ingress pref 1 proto all matchall action drop
I am not sure whether it will take effect.
>
>I'm seeing the callback(TC_SETUP_BLOCK) from flow_indr_dev_register() but
>I'm not getting any callbacks that I've added via flow_block_cb_add()
>
>Do you maybe have some idea why I'm seeing this behavior?
>Am i doing something wrong or is it a known issue or something else?
>
>Best regards,
>
>Mattias Forsblad
>mattias.forsblad@gmail.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next] net: tc: flow indirect framework issue
  2022-04-13  7:05 ` Baowen Zheng
@ 2022-04-13  9:07   ` Vladimir Oltean
  2022-04-13 12:24     ` Mattias Forsblad
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Oltean @ 2022-04-13  9:07 UTC (permalink / raw)
  To: Baowen Zheng
  Cc: Mattias Forsblad, netdev, roid, vladbu, Eli Cohen, Jiri Pirko,
	Pablo Neira Ayuso, Tobias Waldekranz

Hi Baowen,

On Wed, Apr 13, 2022 at 07:05:39AM +0000, Baowen Zheng wrote:
> Hi Mattias, in my understand, your problem may because you register
> your callback here. I am not sure why you choose to register your hook
> here(after the interface is added to the bridge to just trigger the
> callback in necessary case.)

> Then your function is called and add your cb to the tcf_block. But
> since the matchall filter has been created so you can not get your
> callback triggered. 

The bridge device has a certain lifetime, and the physical port device
has a different and completely independent lifetime. So the port device
may join the bridge when the bridge already has some filters on its
ingress chain.

Even with the indirect flow block callback registered at module load
stage as you suggest, the port driver needs to have a reason to
intercept a tc filter on a certain bridge. And when the port isn't a
member (yet) of that bridge, it has no reason to.

Of course, you could make the port driver speculatively monitor all tc
filters installed on all bridges in the system (related or unrelated to
ports belonging to this driver), just to not miss callbacks which may be
needed later. But that is quite suboptimal.

Mattias' question comes from the fact that there is already some logic
in flow_indr_dev_register() to replay missed flow block binding events,
added by Eli Cohen in commit 74fc4f828769 ("net: Fix offloading indirect
devices dependency on qdisc order creation"). That logic works, but it
replays only the binding, not the actual filters, which again, would be
necessary.

> Maybe you can try to regist your callback in your module load stage I
> think your callback will be triggered, or change the command order as: 
> tc qdisc add dev br0 clsact
> ip link set dev swp0 master br0
> tc filter add dev br0 ingress pref 1 proto all matchall action drop
> I am not sure whether it will take effect.

I think the idea is to make the given command order work, not to change it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next] net: tc: flow indirect framework issue
  2022-04-13  9:07   ` Vladimir Oltean
@ 2022-04-13 12:24     ` Mattias Forsblad
  2022-04-13 13:36       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 8+ messages in thread
From: Mattias Forsblad @ 2022-04-13 12:24 UTC (permalink / raw)
  To: Vladimir Oltean, Baowen Zheng
  Cc: netdev, roid, vladbu, Eli Cohen, Jiri Pirko, Pablo Neira Ayuso,
	Tobias Waldekranz

On 2022-04-13 11:07, Vladimir Oltean wrote:
> Hi Baowen,
> 
> On Wed, Apr 13, 2022 at 07:05:39AM +0000, Baowen Zheng wrote:
>> Hi Mattias, in my understand, your problem may because you register
>> your callback here. I am not sure why you choose to register your hook
>> here(after the interface is added to the bridge to just trigger the
>> callback in necessary case.)
> 
>> Then your function is called and add your cb to the tcf_block. But
>> since the matchall filter has been created so you can not get your
>> callback triggered. 
> 
> The bridge device has a certain lifetime, and the physical port device
> has a different and completely independent lifetime. So the port device
> may join the bridge when the bridge already has some filters on its
> ingress chain.
> 
> Even with the indirect flow block callback registered at module load
> stage as you suggest, the port driver needs to have a reason to
> intercept a tc filter on a certain bridge. And when the port isn't a
> member (yet) of that bridge, it has no reason to.
> 
> Of course, you could make the port driver speculatively monitor all tc
> filters installed on all bridges in the system (related or unrelated to
> ports belonging to this driver), just to not miss callbacks which may be
> needed later. But that is quite suboptimal.
> 
> Mattias' question comes from the fact that there is already some logic
> in flow_indr_dev_register() to replay missed flow block binding events,
> added by Eli Cohen in commit 74fc4f828769 ("net: Fix offloading indirect
> devices dependency on qdisc order creation"). That logic works, but it
> replays only the binding, not the actual filters, which again, would be
> necessary.
> 
>> Maybe you can try to regist your callback in your module load stage I
>> think your callback will be triggered, or change the command order as: 
>> tc qdisc add dev br0 clsact
>> ip link set dev swp0 master br0
>> tc filter add dev br0 ingress pref 1 proto all matchall action drop
>> I am not sure whether it will take effect.
> 
> I think the idea is to make the given command order work, not to change it.

Re-ordering the tc commands doesn't solve the issue when all ports leave the
bridge, which will lead to flow_indr_dev_unregister() and later re-joins
the bridge (flow_indr_dev_register()). We'll need filter replay for this.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next] net: tc: flow indirect framework issue
  2022-04-13 12:24     ` Mattias Forsblad
@ 2022-04-13 13:36       ` Pablo Neira Ayuso
  2022-04-13 14:15         ` Vladimir Oltean
  2022-04-14  8:57         ` Jakub Kicinski
  0 siblings, 2 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2022-04-13 13:36 UTC (permalink / raw)
  To: Mattias Forsblad
  Cc: Vladimir Oltean, Baowen Zheng, netdev, roid, vladbu, Eli Cohen,
	Jiri Pirko, Tobias Waldekranz

On Wed, Apr 13, 2022 at 02:24:38PM +0200, Mattias Forsblad wrote:
> On 2022-04-13 11:07, Vladimir Oltean wrote:
> > Hi Baowen,
> > 
> > On Wed, Apr 13, 2022 at 07:05:39AM +0000, Baowen Zheng wrote:
[...]
> > Mattias' question comes from the fact that there is already some logic
> > in flow_indr_dev_register() to replay missed flow block binding events,
> > added by Eli Cohen in commit 74fc4f828769 ("net: Fix offloading indirect
> > devices dependency on qdisc order creation"). That logic works, but it
> > replays only the binding, not the actual filters, which again, would be
> > necessary.

A bit of a long email...

This commit 74fc4f828769 handles this scenario:

1) eth0 is gone (module removal)
2) vxlan0 device is still in place, tc ingress also contains rules for
   vxlan0.
3) eth0 is reloaded.

A bit of background: tc ingress removes rules for eth0 if eth0 is
gone (I am refering to software rules, in general). In this model, the
tc ingress rules are attached to the device, and if the device eth0 is
gone, those rules are also gone and, then, once this device eth0 comes
back, the user has to the tc ingress rules software for eth0 again.
There is no replay mechanism for tc ingress rules in this case.

IIRC, Eli's patch re-adds the flow block for vxlan0 because he got a
bug report that says that after reloading the driver module and eth0
comes back, rules for tc vxlan0 were not hardware offloaded.

The indirect flow block infrastructure is tracking devices such as
vxlan0 that the given driver *might* be able to hardware offload.
But from the control plane (user) perspective, this detail is hidden.
To me, the problem is that there is no way from the control plane to
relate vxlan0 with the real device that performs the hardware offload.
There is also no flag for the user to request "please hardware offload
vxlan0 tc ingress rules". Instead, the flow indirect block
infrastructure performs the hardware offload "transparently" to the user.

I think some people believe doing things fully transparent is good, at
the cost of adding more kernel complexity and hiding details that are
relevant to the user (such as if hardware offload is enabled for
vxlan0 and what is the real device that is actually being used for the
vxlan0 to be offloaded).

So, there are no flags when setting up the vxlan0 device for the user
to say: "I would like to hardware offload vxlan0", and going slightly
further there is not "please attach this vxlan0 device to eth0 for
hardware offload". Any real device could be potentially used to
offload vxlan0, the user does not know which one is actually used.

Exposing this information is a bit more work on top of the user, but:

1) it will be transparent: the control plane shows that the vxlan0 is
   hardware offloaded. Then if eth0 is gone, vxlan0 tc ingress can be
   removed too, because it depends on eth0.

2) The control plane validates if hardware offload for vxlan0. If this
   is not possible, display an error to the user: "sorry, I cannot
   offload vxlan0 on eth0 for reason X".

Since this is not exposed to the control plane, the existing
infrastructure follows a snooping scheme, but tracking devices that
might be able to hardware offload.

There is no obvious way to relate vxlan0 with the real device
(eth0) that is actually performing the hardware offloading.

Does replay make sense for vxlan0 when the user has to manually
reload rules for eth0? So why vxlan0 rules need to be transparently
replayed but eth0 rules need to be manually reloaded in tc ingress?

> >> Maybe you can try to regist your callback in your module load stage I
> >> think your callback will be triggered, or change the command order as: 
> >> tc qdisc add dev br0 clsact
> >> ip link set dev swp0 master br0
> >> tc filter add dev br0 ingress pref 1 proto all matchall action drop
> >> I am not sure whether it will take effect.
> > 
> > I think the idea is to make the given command order work, not to change it.
> 
> Re-ordering the tc commands doesn't solve the issue when all ports leave the
> bridge, which will lead to flow_indr_dev_unregister() and later re-joins
> the bridge (flow_indr_dev_register()). We'll need filter replay for this.

Existing drivers call flow_indr_dev_register() from module init, so
they start tracking any device that might be offloaded since the
beginning, see below.

Mattias Forsblad said:
>tc qdisc add dev br0 clsact
>tc filter add dev br0 ingress pref 1 proto all matchall action drop
>
>And then adds a port to that bridge
>ip link set dev swp0 master br0   <---- flow_indr_dev_register() bc this

Regarding your issue: Why does it call flow_indr_dev_register() here?
Most drivers call flow_indr_dev_register() much earlier, when swp0
becomes available.  Then, tc qdisc add dev br0 clsact will trigger the
indirect flow block path to reach your driver.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next] net: tc: flow indirect framework issue
  2022-04-13 13:36       ` Pablo Neira Ayuso
@ 2022-04-13 14:15         ` Vladimir Oltean
  2022-04-14  8:57         ` Jakub Kicinski
  1 sibling, 0 replies; 8+ messages in thread
From: Vladimir Oltean @ 2022-04-13 14:15 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Mattias Forsblad, Baowen Zheng, netdev, roid, vladbu, Eli Cohen,
	Jiri Pirko, Tobias Waldekranz

Hi Pablo,

On Wed, Apr 13, 2022 at 03:36:32PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Apr 13, 2022 at 02:24:38PM +0200, Mattias Forsblad wrote:
> > On 2022-04-13 11:07, Vladimir Oltean wrote:
> > > Hi Baowen,
> > > 
> > > On Wed, Apr 13, 2022 at 07:05:39AM +0000, Baowen Zheng wrote:
> [...]
> > > Mattias' question comes from the fact that there is already some logic
> > > in flow_indr_dev_register() to replay missed flow block binding events,
> > > added by Eli Cohen in commit 74fc4f828769 ("net: Fix offloading indirect
> > > devices dependency on qdisc order creation"). That logic works, but it
> > > replays only the binding, not the actual filters, which again, would be
> > > necessary.
> 
> A bit of a long email...
> 
> This commit 74fc4f828769 handles this scenario:
> 
> 1) eth0 is gone (module removal)
> 2) vxlan0 device is still in place, tc ingress also contains rules for
>    vxlan0.
> 3) eth0 is reloaded.
> 
> A bit of background: tc ingress removes rules for eth0 if eth0 is
> gone (I am refering to software rules, in general). In this model, the
> tc ingress rules are attached to the device, and if the device eth0 is
> gone, those rules are also gone and, then, once this device eth0 comes
> back, the user has to the tc ingress rules software for eth0 again.
> There is no replay mechanism for tc ingress rules in this case.

I don't understand the mechanics of this side note, sorry (maybe also
because I don't know the mechanics of VXLAN / VXLAN offload in mlxsw).

Presumably, tc filters on the ingress of vxlan0 would not just vanish if
some random other net device in the system unregistered. Those rules
would have to be in some way related to eth0, like "mirred egress
redirect eth0" or something like that. In that case, yes I agree it is
fully to be expected for this rule to disappear when eth0 disappears.
But that makes the example also not applicable to what Mattias is trying
to achieve, except maybe to illustrate that the indirect flow block
framework isn't adequate for it.

> IIRC, Eli's patch re-adds the flow block for vxlan0 because he got a
> bug report that says that after reloading the driver module and eth0
> comes back, rules for tc vxlan0 were not hardware offloaded.
> 
> The indirect flow block infrastructure is tracking devices such as
> vxlan0 that the given driver *might* be able to hardware offload.
> But from the control plane (user) perspective, this detail is hidden.
> To me, the problem is that there is no way from the control plane to
> relate vxlan0 with the real device that performs the hardware offload.
> There is also no flag for the user to request "please hardware offload
> vxlan0 tc ingress rules". Instead, the flow indirect block
> infrastructure performs the hardware offload "transparently" to the user.

It would be difficult to draw a full relationship between your VXLAN
example and what Mattias is trying to do ("matchall action drop" on
bridge device). Since the bridge is an upper device with multiple
lowers, some lowers may monitor this tc rule and do something about it,
while other lowers may not. What do you report to user space in that
case, "partially offloaded"? Do you report the full list of offloaders?

> I think some people believe doing things fully transparent is good, at
> the cost of adding more kernel complexity and hiding details that are
> relevant to the user (such as if hardware offload is enabled for
> vxlan0 and what is the real device that is actually being used for the
> vxlan0 to be offloaded).

Assuming that this is what you're hinting at: when a DSA port leaves a
bridge, the "matchall action drop" rule that was added by the user on
the bridge should automagically disappear, because its offloaders
disappeared?

But maybe you liked that rule anyway... maybe the bridge had 4 DSA
ports, and 50 non-DSA ports, and that rule applies with or without DSA.

> So, there are no flags when setting up the vxlan0 device for the user
> to say: "I would like to hardware offload vxlan0", and going slightly
> further there is not "please attach this vxlan0 device to eth0 for
> hardware offload". Any real device could be potentially used to
> offload vxlan0, the user does not know which one is actually used.
> 
> Exposing this information is a bit more work on top of the user, but:
> 
> 1) it will be transparent: the control plane shows that the vxlan0 is
>    hardware offloaded. Then if eth0 is gone, vxlan0 tc ingress can be
>    removed too, because it depends on eth0.
> 
> 2) The control plane validates if hardware offload for vxlan0. If this
>    is not possible, display an error to the user: "sorry, I cannot
>    offload vxlan0 on eth0 for reason X".
> 
> Since this is not exposed to the control plane, the existing
> infrastructure follows a snooping scheme, but tracking devices that
> might be able to hardware offload.
> 
> There is no obvious way to relate vxlan0 with the real device
> (eth0) that is actually performing the hardware offloading.
> 
> Does replay make sense for vxlan0 when the user has to manually
> reload rules for eth0? So why vxlan0 rules need to be transparently
> replayed but eth0 rules need to be manually reloaded in tc ingress?

Yes, filter replay for vxlan0 makes perfect sense to the extent that the
list of filters to reoffload would be empty in the particular case you
give. So nothing changes. But the filters that you have to react on do
not always involve you (snooper) as a net device.

> > >> Maybe you can try to regist your callback in your module load stage I
> > >> think your callback will be triggered, or change the command order as: 
> > >> tc qdisc add dev br0 clsact
> > >> ip link set dev swp0 master br0
> > >> tc filter add dev br0 ingress pref 1 proto all matchall action drop
> > >> I am not sure whether it will take effect.
> > > 
> > > I think the idea is to make the given command order work, not to change it.
> > 
> > Re-ordering the tc commands doesn't solve the issue when all ports leave the
> > bridge, which will lead to flow_indr_dev_unregister() and later re-joins
> > the bridge (flow_indr_dev_register()). We'll need filter replay for this.
> 
> Existing drivers call flow_indr_dev_register() from module init, so
> they start tracking any device that might be offloaded since the
> beginning, see below.

There's a certain level of absurdity to the idea that, on a kernel
compiled with CONFIG_NET_DSA=y, the DSA module would track and save
state for every tc filter added on the ingress of any bridge in the
system, *regardless* of whether any DSA switch has probed, or will even
probe in that system. Distribution kernels come with CONFIG_NET_DSA
enabled by default.

> Mattias Forsblad said:
> >tc qdisc add dev br0 clsact
> >tc filter add dev br0 ingress pref 1 proto all matchall action drop
> >
> >And then adds a port to that bridge
> >ip link set dev swp0 master br0   <---- flow_indr_dev_register() bc this
> 
> Regarding your issue: Why does it call flow_indr_dev_register() here?
> Most drivers call flow_indr_dev_register() much earlier, when swp0
> becomes available.  Then, tc qdisc add dev br0 clsact will trigger the
> indirect flow block path to reach your driver.

So my main takeaway is that what Mattias is trying to do is
fundamentally inadequate to the very limited use case that
flow_indr_dev_register() was intended for.

The obvious alternative is for the bridge driver itself to bind to its
own flow block, watch for the same tc filters, and re-notify them via
other mechanisms to drivers which are interested. The advantage would be
that the filters have the same lifetime as the bridge device itself, and
the bridge device is notified of port joins/leaves, so Mattias could
avoid missing filters very cleanly.

That would create some bloat of its own, namely in the bridge driver
this time, but in the longer term, I think the tc filters on the bridge
device itself may be interesting to more than one single offloading
driver, so the cost of this bloat may amortize.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next] net: tc: flow indirect framework issue
  2022-04-13 13:36       ` Pablo Neira Ayuso
  2022-04-13 14:15         ` Vladimir Oltean
@ 2022-04-14  8:57         ` Jakub Kicinski
  2022-04-25  7:52           ` Mattias Forsblad
  1 sibling, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2022-04-14  8:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Mattias Forsblad, Vladimir Oltean, Baowen Zheng, netdev, roid,
	vladbu, Eli Cohen, Jiri Pirko, Tobias Waldekranz

On Wed, 13 Apr 2022 15:36:32 +0200 Pablo Neira Ayuso wrote:
> A bit of a long email...
> 
> This commit 74fc4f828769 handles this scenario:
> 
> 1) eth0 is gone (module removal)
> 2) vxlan0 device is still in place, tc ingress also contains rules for
>    vxlan0.
> 3) eth0 is reloaded.
> 
> A bit of background: tc ingress removes rules for eth0 if eth0 is
> gone (I am refering to software rules, in general). In this model, the
> tc ingress rules are attached to the device, and if the device eth0 is
> gone, those rules are also gone and, then, once this device eth0 comes
> back, the user has to the tc ingress rules software for eth0 again.
> There is no replay mechanism for tc ingress rules in this case.
> 
> IIRC, Eli's patch re-adds the flow block for vxlan0 because he got a
> bug report that says that after reloading the driver module and eth0
> comes back, rules for tc vxlan0 were not hardware offloaded.
> 
> The indirect flow block infrastructure is tracking devices such as
> vxlan0 that the given driver *might* be able to hardware offload.
> But from the control plane (user) perspective, this detail is hidden.
> To me, the problem is that there is no way from the control plane to
> relate vxlan0 with the real device that performs the hardware offload.
> There is also no flag for the user to request "please hardware offload
> vxlan0 tc ingress rules". Instead, the flow indirect block
> infrastructure performs the hardware offload "transparently" to the user.

TBH I don't understand why indirect infra is important. Mattias said he
gets a replay of the block bind. So it's the replay of rules that's
broken. Whether the block bind came from indir infra or the block is
shared and got bound to a new dev is not important.

> I think some people believe doing things fully transparent is good, at
> the cost of adding more kernel complexity and hiding details that are
> relevant to the user (such as if hardware offload is enabled for
> vxlan0 and what is the real device that is actually being used for the
> vxlan0 to be offloaded).
> 
> So, there are no flags when setting up the vxlan0 device for the user
> to say: "I would like to hardware offload vxlan0", and going slightly
> further there is not "please attach this vxlan0 device to eth0 for
> hardware offload". Any real device could be potentially used to
> offload vxlan0, the user does not know which one is actually used.
> 
> Exposing this information is a bit more work on top of the user, but:
> 
> 1) it will be transparent: the control plane shows that the vxlan0 is
>    hardware offloaded. Then if eth0 is gone, vxlan0 tc ingress can be
>    removed too, because it depends on eth0.
> 
> 2) The control plane validates if hardware offload for vxlan0. If this
>    is not possible, display an error to the user: "sorry, I cannot
>    offload vxlan0 on eth0 for reason X".
> 
> Since this is not exposed to the control plane, the existing
> infrastructure follows a snooping scheme, but tracking devices that
> might be able to hardware offload.
> 
> There is no obvious way to relate vxlan0 with the real device
> (eth0) that is actually performing the hardware offloading.

Let's not over-complicate things, Mattias just needs replay to work.
90% sure it worked when we did the work back in the day with John H,
before the nft rewrite etc.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net-next] net: tc: flow indirect framework issue
  2022-04-14  8:57         ` Jakub Kicinski
@ 2022-04-25  7:52           ` Mattias Forsblad
  0 siblings, 0 replies; 8+ messages in thread
From: Mattias Forsblad @ 2022-04-25  7:52 UTC (permalink / raw)
  To: Jakub Kicinski, Pablo Neira Ayuso
  Cc: Vladimir Oltean, Baowen Zheng, netdev, roid, vladbu, Eli Cohen,
	Jiri Pirko, Tobias Waldekranz

On 2022-04-14 10:57, Jakub Kicinski wrote:
>> I think some people believe doing things fully transparent is good, at
>> the cost of adding more kernel complexity and hiding details that are
>> relevant to the user (such as if hardware offload is enabled for
>> vxlan0 and what is the real device that is actually being used for the
>> vxlan0 to be offloaded).
>>
>> So, there are no flags when setting up the vxlan0 device for the user
>> to say: "I would like to hardware offload vxlan0", and going slightly
>> further there is not "please attach this vxlan0 device to eth0 for
>> hardware offload". Any real device could be potentially used to
>> offload vxlan0, the user does not know which one is actually used.
>>
>> Exposing this information is a bit more work on top of the user, but:
>>
>> 1) it will be transparent: the control plane shows that the vxlan0 is
>>    hardware offloaded. Then if eth0 is gone, vxlan0 tc ingress can be
>>    removed too, because it depends on eth0.
>>
>> 2) The control plane validates if hardware offload for vxlan0. If this
>>    is not possible, display an error to the user: "sorry, I cannot
>>    offload vxlan0 on eth0 for reason X".
>>
>> Since this is not exposed to the control plane, the existing
>> infrastructure follows a snooping scheme, but tracking devices that
>> might be able to hardware offload.
>>
>> There is no obvious way to relate vxlan0 with the real device
>> (eth0) that is actually performing the hardware offloading.
> 
> Let's not over-complicate things, Mattias just needs replay to work.
> 90% sure it worked when we did the work back in the day with John H,
> before the nft rewrite etc.

To me the first thing to determine is how flow_indr_dev_register should work?
With only a superficial knowledge of tc I'd seem to me that if we
have a function called tcf_action_reoffload_cb and tc has all the information
about current blocks/filters/rules it should really reoffload those. The other way
would mean bookkeeping the same information at multiple places. It also
means restrictions on which sequence one should setup a network topology.
Would we like it that way?
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-04-25  7:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-13  5:52 [RFC net-next] net: tc: flow indirect framework issue Mattias Forsblad
2022-04-13  7:05 ` Baowen Zheng
2022-04-13  9:07   ` Vladimir Oltean
2022-04-13 12:24     ` Mattias Forsblad
2022-04-13 13:36       ` Pablo Neira Ayuso
2022-04-13 14:15         ` Vladimir Oltean
2022-04-14  8:57         ` Jakub Kicinski
2022-04-25  7:52           ` Mattias Forsblad

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.