* [PATCH net] net: dsa: fix flow dissection on Tx path
@ 2019-12-05 10:02 Alexander Lobakin
  2019-12-05 12:58 ` Andrew Lunn
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Alexander Lobakin @ 2019-12-05 10:02 UTC (permalink / raw)
  To: David S. Miller
  Cc: Muciri Gatimu, Shashidhar Lakkavalli, John Crispin, Andrew Lunn,
	Vivien Didelot, Florian Fainelli, Stanislav Fomichev,
	Daniel Borkmann, Song Liu, Alexei Starovoitov, Matteo Croce,
	Jakub Sitnicki, Eric Dumazet, Paul Blakey, Yoshiki Komachi,
	Alexander Lobakin, netdev, linux-kernel

Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
ability to override protocol and network offset during flow dissection
for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
in order to fix skb hashing for RPS on Rx path.

However, skb_get_hash() and the added code can be invoked not only on
the Rx path, but also on the Tx path if we have a multi-queue device and:
 - the kernel is running on a UP system or
 - XPS is not configured.

The call stack in these two cases will look like: dev_queue_xmit() ->
__dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
skb_tx_hash() -> skb_get_hash().

The problem is that skbs queued for Tx have both network offset and
correct protocol already set up even after inserting a CPU tag by DSA
tagger, so calling tag_ops->flow_dissect() on this path actually only
breaks flow dissection and hashing.

This can be observed by adding debug prints just before and right after
tag_ops->flow_dissect() call to the related block of code:

Before the patch:

Rx path (RPS):

[   19.240001] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
[   19.244271] tag_ops->flow_dissect()
[   19.247811] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */

[   19.215435] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
[   19.219746] tag_ops->flow_dissect()
[   19.223241] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */

[   18.654057] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
[   18.658332] tag_ops->flow_dissect()
[   18.661826] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */

Tx path (UP system):

[   18.759560] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
[   18.763933] tag_ops->flow_dissect()
[   18.767485] Tx: proto: 0x920b, nhoff: 34	/* junk */

[   22.800020] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
[   22.804392] tag_ops->flow_dissect()
[   22.807921] Tx: proto: 0x920b, nhoff: 34	/* junk */

[   16.898342] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
[   16.902705] tag_ops->flow_dissect()
[   16.906227] Tx: proto: 0x920b, nhoff: 34	/* junk */

After:

Rx path (RPS):

[   16.520993] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
[   16.525260] tag_ops->flow_dissect()
[   16.528808] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */

[   15.484807] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
[   15.490417] tag_ops->flow_dissect()
[   15.495223] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */

[   17.134621] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
[   17.138895] tag_ops->flow_dissect()
[   17.142388] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */

Tx path (UP system):

[   15.499558] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */

[   20.664689] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */

[   18.565782] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */

In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
to prevent code from calling tag_ops->flow_dissect() on Tx.
I also decided to initialize the 'offset' variable so tagger callbacks
can now safely leave it untouched without causing chaos.

Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection")
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
---
 net/core/flow_dissector.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 69395b804709..d524a693e00f 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -969,9 +969,10 @@ bool __skb_flow_dissect(const struct net *net,
 		nhoff = skb_network_offset(skb);
 		hlen = skb_headlen(skb);
 #if IS_ENABLED(CONFIG_NET_DSA)
-		if (unlikely(skb->dev && netdev_uses_dsa(skb->dev))) {
+		if (unlikely(skb->dev && netdev_uses_dsa(skb->dev) &&
+			     proto == htons(ETH_P_XDSA))) {
 			const struct dsa_device_ops *ops;
-			int offset;
+			int offset = 0;
 
 			ops = skb->dev->dsa_ptr->tag_ops;
 			if (ops->flow_dissect &&
-- 
2.24.0



* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-05 10:02 [PATCH net] net: dsa: fix flow dissection on Tx path Alexander Lobakin
@ 2019-12-05 12:58 ` Andrew Lunn
  2019-12-05 13:34   ` Alexander Lobakin
  2019-12-06  3:28 ` Florian Fainelli
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Andrew Lunn @ 2019-12-05 12:58 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Muciri Gatimu, Shashidhar Lakkavalli,
	John Crispin, Vivien Didelot, Florian Fainelli,
	Stanislav Fomichev, Daniel Borkmann, Song Liu,
	Alexei Starovoitov, Matteo Croce, Jakub Sitnicki, Eric Dumazet,
	Paul Blakey, Yoshiki Komachi, netdev, linux-kernel

On Thu, Dec 05, 2019 at 01:02:35PM +0300, Alexander Lobakin wrote:
> Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
> ability to override protocol and network offset during flow dissection
> for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
> in order to fix skb hashing for RPS on Rx path.
> 
> However, skb_get_hash() and the added code can be invoked not only on
> the Rx path, but also on the Tx path if we have a multi-queue device and:
>  - the kernel is running on a UP system or
>  - XPS is not configured.
> 
> The call stack in these two cases will look like: dev_queue_xmit() ->
> __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
> skb_tx_hash() -> skb_get_hash().
> 
> The problem is that skbs queued for Tx have both network offset and
> correct protocol already set up even after inserting a CPU tag by DSA
> tagger, so calling tag_ops->flow_dissect() on this path actually only
> breaks flow dissection and hashing.

Hi Alexander

What i'm missing here is an explanation why the flow dissector is
called here if the protocol is already set? It suggests there is a
case when the protocol is not correctly set, and we do need to look
into the frame?

     Andrew


* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-05 12:58 ` Andrew Lunn
@ 2019-12-05 13:34   ` Alexander Lobakin
  2019-12-05 14:01     ` Andrew Lunn
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Lobakin @ 2019-12-05 13:34 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S. Miller, Muciri Gatimu, Shashidhar Lakkavalli,
	John Crispin, Vivien Didelot, Florian Fainelli,
	Stanislav Fomichev, Daniel Borkmann, Song Liu,
	Alexei Starovoitov, Matteo Croce, Jakub Sitnicki, Eric Dumazet,
	Paul Blakey, Yoshiki Komachi, netdev, linux-kernel

Andrew Lunn wrote 05.12.2019 15:58:
> On Thu, Dec 05, 2019 at 01:02:35PM +0300, Alexander Lobakin wrote:
>> Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
>> ability to override protocol and network offset during flow dissection
>> for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
>> in order to fix skb hashing for RPS on Rx path.
>> 
>> However, skb_get_hash() and the added code can be invoked not only on
>> the Rx path, but also on the Tx path if we have a multi-queue device and:
>>  - the kernel is running on a UP system or
>>  - XPS is not configured.
>> 
>> The call stack in these two cases will look like: dev_queue_xmit() ->
>> __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
>> skb_tx_hash() -> skb_get_hash().
>> 
>> The problem is that skbs queued for Tx have both network offset and
>> correct protocol already set up even after inserting a CPU tag by DSA
>> tagger, so calling tag_ops->flow_dissect() on this path actually only
>> breaks flow dissection and hashing.
> 
> Hi Alexander

Hi,

> What i'm missing here is an explanation why the flow dissector is
> called here if the protocol is already set? It suggests there is a
> case when the protocol is not correctly set, and we do need to look
> into the frame?

If we have a device with multiple Tx queues, but XPS is not configured
or the kernel is running on a uniprocessor system, then the networking
core selects the Tx queue based on the flow, to utilize as many Tx queues
as possible without breaking frame ordering.
This selection happens in net/core/dev.c:skb_tx_hash() as:

reciprocal_scale(skb_get_hash(skb), qcount)

where 'qcount' is the total number of Tx queues on the network device.

If skb has not been hashed prior to this line, then skb_get_hash() will
call flow dissector to generate a new hash. That's why flow dissection
can occur on Tx path.
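
The queue selection described above can be tried out in user space. The
sketch below copies the multiply-and-shift arithmetic that the kernel's
reciprocal_scale() helper uses; pick_tx_queue() is a hypothetical wrapper
added only for illustration, and it ignores traffic classes and queue
offsets that the real skb_tx_hash() also handles.

```c
#include <stdint.h>

/*
 * User-space copy of the arithmetic behind reciprocal_scale():
 * maps a 32-bit value onto the range [0, ep_ro) using a multiply
 * and a shift instead of a modulo.
 */
static inline uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/*
 * Hypothetical helper (not a kernel function): choose a Tx queue for
 * a given flow hash on a device with 'qcount' Tx queues.
 */
static inline uint16_t pick_tx_queue(uint32_t flow_hash, uint16_t qcount)
{
	return (uint16_t)reciprocal_scale(flow_hash, qcount);
}
```

Because the result depends only on the flow hash, all frames of one flow
land on the same queue. That is also why a hash computed from a wrongly
dissected frame leads to a wrong queue choice.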

>      Andrew

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ


* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-05 13:34   ` Alexander Lobakin
@ 2019-12-05 14:01     ` Andrew Lunn
  2019-12-05 14:58       ` Alexander Lobakin
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Lunn @ 2019-12-05 14:01 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Muciri Gatimu, Shashidhar Lakkavalli,
	John Crispin, Vivien Didelot, Florian Fainelli,
	Stanislav Fomichev, Daniel Borkmann, Song Liu,
	Alexei Starovoitov, Matteo Croce, Jakub Sitnicki, Eric Dumazet,
	Paul Blakey, Yoshiki Komachi, netdev, linux-kernel

> Hi,
> 
> > What i'm missing here is an explanation why the flow dissector is
> > called here if the protocol is already set? It suggests there is a
> > case when the protocol is not correctly set, and we do need to look
> > into the frame?
> 
> If we have a device with multiple Tx queues, but XPS is not configured
> or the kernel is running on a uniprocessor system, then the networking
> core selects the Tx queue based on the flow, to utilize as many Tx queues
> as possible without breaking frame ordering.
> This selection happens in net/core/dev.c:skb_tx_hash() as:
> 
> reciprocal_scale(skb_get_hash(skb), qcount)
> 
> where 'qcount' is the total number of Tx queues on the network device.
> 
> If skb has not been hashed prior to this line, then skb_get_hash() will
> call flow dissector to generate a new hash. That's why flow dissection
> can occur on Tx path.


Hi Alexander

So it looks like you are now skipping this hash, which in your
testing gives better results, because the protocol is already set
correctly. But are there cases when the protocol is not set correctly,
where we really do need to look into the frame?

How about when an outer header has just been removed? The frame was
received on a GRE tunnel, the GRE header has just been removed, and
now the frame is on its way out? Is the protocol still GRE, and we
should look into the frame to determine if it is IPv4, ARP etc?

Your patch looks to improve things for the cases you have tested, but
i'm wondering if there are other use cases where we really do need to
look into the frame? In which case, your fix is doing the wrong thing.
Should we be extending the tagger to handle the TX case as well as the
RX case?

   Andrew


* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-05 14:01     ` Andrew Lunn
@ 2019-12-05 14:58       ` Alexander Lobakin
  2019-12-06  3:32         ` Florian Fainelli
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Lobakin @ 2019-12-05 14:58 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S. Miller, Muciri Gatimu, Shashidhar Lakkavalli,
	John Crispin, Vivien Didelot, Florian Fainelli,
	Stanislav Fomichev, Daniel Borkmann, Song Liu,
	Alexei Starovoitov, Matteo Croce, Jakub Sitnicki, Eric Dumazet,
	Paul Blakey, Yoshiki Komachi, netdev, linux-kernel

Andrew Lunn wrote 05.12.2019 17:01:
>> Hi,
>> 
>> > What i'm missing here is an explanation why the flow dissector is
>> > called here if the protocol is already set? It suggests there is a
>> > case when the protocol is not correctly set, and we do need to look
>> > into the frame?
>> 
>> If we have a device with multiple Tx queues, but XPS is not configured
>> or the kernel is running on a uniprocessor system, then the networking
>> core selects the Tx queue based on the flow, to utilize as many Tx queues
>> as possible without breaking frame ordering.
>> This selection happens in net/core/dev.c:skb_tx_hash() as:
>> 
>> reciprocal_scale(skb_get_hash(skb), qcount)
>> 
>> where 'qcount' is the total number of Tx queues on the network device.
>> 
>> If skb has not been hashed prior to this line, then skb_get_hash() 
>> will
>> call flow dissector to generate a new hash. That's why flow dissection
>> can occur on Tx path.
> 
> 
> Hi Alexander
> 
> So it looks like you are now skipping this hash, which in your
> testing gives better results, because the protocol is already set
> correctly. But are there cases when the protocol is not set correctly,
> where we really do need to look into the frame?

Actually no, I'm not skipping the hashing entirely, I'm only skipping
the tag_ops->flow_dissect() call (a helper that only alters the network
offset and replaces the fake ETH_P_XDSA with the actual protocol) on the
Tx path, because there it only breaks flow dissection. All skbs are still
processed and hashed by the generic code that follows that call.
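
The effect of the added check can be modelled in user space. The sketch
below is not the kernel code: struct dissect_state, toy_tagger_flow_dissect()
and toy_dsa_fixup() are hypothetical names, the protocol is kept in host
byte order for simplicity (the kernel compares against htons(ETH_P_XDSA)),
and the toy tagger always reports IPv4 behind an 8-byte tag, matching the
Rx log lines in the patch description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same numeric values as the kernel's ETH_P_IP and ETH_P_XDSA. */
#define TOY_ETH_P_IP   0x0800
#define TOY_ETH_P_XDSA 0x00F8

/* Hypothetical stand-in for the 'proto' and 'nhoff' locals. */
struct dissect_state {
	uint16_t proto;	/* host byte order in this model */
	int nhoff;
};

/* Toy tagger callback: reports the real protocol and the tag length. */
static void toy_tagger_flow_dissect(uint16_t *proto, int *offset)
{
	*proto = TOY_ETH_P_IP;
	*offset = 8;
}

/*
 * Model of the patched block: the tagger callback runs only when the
 * frame still carries the placeholder ETH_P_XDSA protocol, i.e. on Rx.
 * Frames queued for Tx already have a real protocol and are left alone.
 */
static struct dissect_state toy_dsa_fixup(struct dissect_state st,
					  bool uses_dsa)
{
	if (uses_dsa && st.proto == TOY_ETH_P_XDSA) {
		int offset = 0;

		toy_tagger_flow_dissect(&st.proto, &offset);
		st.nhoff += offset;
	}
	return st;	/* generic dissection would continue from here */
}
```

Feeding in { TOY_ETH_P_XDSA, 0 } yields IPv4 at offset 8, while
{ TOY_ETH_P_IP, 26 } passes through unchanged, which mirrors the "After"
logs for the Rx and Tx paths.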

> How about when an outer header has just been removed? The frame was
> received on a GRE tunnel, the GRE header has just been removed, and
> now the frame is on its way out? Is the protocol still GRE, and we
> should look into the frame to determine if it is IPv4, ARP etc?
> 
> Your patch looks to improve things for the cases you have tested, but
> i'm wondering if there are other use cases where we really do need to
> look into the frame? In which case, your fix is doing the wrong thing.
> Should we be extending the tagger to handle the TX case as well as the
> RX case?

We really have two options: don't call tag_ops->flow_dissect() on Tx
(this patch), or extend tagger callbacks to handle the Tx path too. I
used each of these for several months and couldn't find cases where the
first one was worse than the second.
I mean, there _might_ be such cases in theory, and if they appear we
should extend our taggers. But for now I don't see the need to do this,
as the generic flow dissection logic works as expected after this patch
and is completely broken without it.
And remember that the order is reversed on Tx: all skbs are first
queued on the slave netdevice and only then on the master/CPU port.

It would be nice to see what other people think about it anyways.

>    Andrew

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ


* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-05 10:02 [PATCH net] net: dsa: fix flow dissection on Tx path Alexander Lobakin
  2019-12-05 12:58 ` Andrew Lunn
@ 2019-12-06  3:28 ` Florian Fainelli
  2019-12-06 15:06   ` Alexander Lobakin
  2019-12-06 19:32 ` Rainer Sickinger
  2019-12-07  4:19 ` David Miller
  3 siblings, 1 reply; 13+ messages in thread
From: Florian Fainelli @ 2019-12-06  3:28 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller
  Cc: Muciri Gatimu, Shashidhar Lakkavalli, John Crispin, Andrew Lunn,
	Vivien Didelot, Stanislav Fomichev, Daniel Borkmann, Song Liu,
	Alexei Starovoitov, Matteo Croce, Jakub Sitnicki, Eric Dumazet,
	Paul Blakey, Yoshiki Komachi, netdev, linux-kernel



On 12/5/2019 2:02 AM, Alexander Lobakin wrote:
> Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
> ability to override protocol and network offset during flow dissection
> for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
> in order to fix skb hashing for RPS on Rx path.
> 
> However, skb_get_hash() and the added code can be invoked not only on
> the Rx path, but also on the Tx path if we have a multi-queue device and:
>  - the kernel is running on a UP system or
>  - XPS is not configured.
> 
> The call stack in these two cases will look like: dev_queue_xmit() ->
> __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
> skb_tx_hash() -> skb_get_hash().
> 
> The problem is that skbs queued for Tx have both network offset and
> correct protocol already set up even after inserting a CPU tag by DSA
> tagger, so calling tag_ops->flow_dissect() on this path actually only
> breaks flow dissection and hashing.
> 
> This can be observed by adding debug prints just before and right after
> tag_ops->flow_dissect() call to the related block of code:
> 
> Before the patch:
> 
> Rx path (RPS):
> 
> [   19.240001] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
> [   19.244271] tag_ops->flow_dissect()
> [   19.247811] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
> 
> [   19.215435] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
> [   19.219746] tag_ops->flow_dissect()
> [   19.223241] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
> 
> [   18.654057] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
> [   18.658332] tag_ops->flow_dissect()
> [   18.661826] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
> 
> Tx path (UP system):
> 
> [   18.759560] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
> [   18.763933] tag_ops->flow_dissect()
> [   18.767485] Tx: proto: 0x920b, nhoff: 34	/* junk */
> 
> [   22.800020] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
> [   22.804392] tag_ops->flow_dissect()
> [   22.807921] Tx: proto: 0x920b, nhoff: 34	/* junk */
> 
> [   16.898342] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
> [   16.902705] tag_ops->flow_dissect()
> [   16.906227] Tx: proto: 0x920b, nhoff: 34	/* junk */
> 
> After:
> 
> Rx path (RPS):
> 
> [   16.520993] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
> [   16.525260] tag_ops->flow_dissect()
> [   16.528808] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
> 
> [   15.484807] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
> [   15.490417] tag_ops->flow_dissect()
> [   15.495223] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
> 
> [   17.134621] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
> [   17.138895] tag_ops->flow_dissect()
> [   17.142388] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
> 
> Tx path (UP system):
> 
> [   15.499558] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
> 
> [   20.664689] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
> 
> [   18.565782] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
> 
> In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
> to prevent code from calling tag_ops->flow_dissect() on Tx.
> I also decided to initialize the 'offset' variable so tagger callbacks
> can now safely leave it untouched without causing chaos.
> 
> Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection")
> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian


* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-05 14:58       ` Alexander Lobakin
@ 2019-12-06  3:32         ` Florian Fainelli
  2019-12-06  7:37           ` Alexander Lobakin
  0 siblings, 1 reply; 13+ messages in thread
From: Florian Fainelli @ 2019-12-06  3:32 UTC (permalink / raw)
  To: Alexander Lobakin, Andrew Lunn
  Cc: David S. Miller, Muciri Gatimu, Shashidhar Lakkavalli,
	John Crispin, Vivien Didelot, Stanislav Fomichev,
	Daniel Borkmann, Song Liu, Alexei Starovoitov, Matteo Croce,
	Jakub Sitnicki, Eric Dumazet, Paul Blakey, Yoshiki Komachi,
	netdev, linux-kernel



On 12/5/2019 6:58 AM, Alexander Lobakin wrote:
> Andrew Lunn wrote 05.12.2019 17:01:
>>> Hi,
>>>
>>> > What i'm missing here is an explanation why the flow dissector is
>>> > called here if the protocol is already set? It suggests there is a
>>> > case when the protocol is not correctly set, and we do need to look
>>> > into the frame?
>>>
>>> If we have a device with multiple Tx queues, but XPS is not configured
>>> or the kernel is running on a uniprocessor system, then the networking
>>> core selects the Tx queue based on the flow, to utilize as many Tx
>>> queues as possible without breaking frame ordering.
>>> This selection happens in net/core/dev.c:skb_tx_hash() as:
>>>
>>> reciprocal_scale(skb_get_hash(skb), qcount)
>>>
>>> where 'qcount' is the total number of Tx queues on the network device.
>>>
>>> If skb has not been hashed prior to this line, then skb_get_hash() will
>>> call flow dissector to generate a new hash. That's why flow dissection
>>> can occur on Tx path.
>>
>>
>> Hi Alexander
>>
>> So it looks like you are now skipping this hash, which in your
>> testing gives better results, because the protocol is already set
>> correctly. But are there cases when the protocol is not set correctly,
>> where we really do need to look into the frame?
> 
> Actually no, I'm not skipping the hashing entirely, I'm only skipping
> the tag_ops->flow_dissect() call (a helper that only alters the network
> offset and replaces the fake ETH_P_XDSA with the actual protocol) on the
> Tx path, because there it only breaks flow dissection. All skbs are still
> processed and hashed by the generic code that follows that call.
> 
>> How about when an outer header has just been removed? The frame was
>> received on a GRE tunnel, the GRE header has just been removed, and
>> now the frame is on its way out? Is the protocol still GRE, and we
>> should look into the frame to determine if it is IPv4, ARP etc?
>>
>> Your patch looks to improve things for the cases you have tested, but
>> i'm wondering if there are other use cases where we really do need to
>> look into the frame? In which case, your fix is doing the wrong thing.
>> Should we be extending the tagger to handle the TX case as well as the
>> RX case?
> 
> We really have two options: don't call tag_ops->flow_dissect() on Tx
> (this patch), or extend tagger callbacks to handle the Tx path too. I
> used each of these for several months and couldn't find cases where the
> first one was worse than the second.
> I mean, there _might_ be such cases in theory, and if they appear we
> should extend our taggers. But for now I don't see the need to do this,
> as the generic flow dissection logic works as expected after this patch
> and is completely broken without it.
> And remember that the order is reversed on Tx: all skbs are first
> queued on the slave netdevice and only then on the master/CPU port.
> 
> It would be nice to see what other people think about it anyways.

Your patch seems appropriate to me and quite frankly I am not sure why
flow dissection on RX is done at the DSA master device level, where we
have not parsed the DSA tag yet, instead of being done at the DSA slave
network device level. It seems to me that if the DSA master has N RX
queues, we should be creating the DSA slave devices with the same amount
of RX queues and perform RPS there against a standard Ethernet frame
(sans DSA tag).

For TX the story is a little different because we can have multiqueue
DSA slave network devices in order to steer traffic towards particular
switch queues and we could do XPS there that way.

What do you think?
-- 
Florian


* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-06  3:32         ` Florian Fainelli
@ 2019-12-06  7:37           ` Alexander Lobakin
  2019-12-06 18:05             ` Florian Fainelli
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Lobakin @ 2019-12-06  7:37 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Andrew Lunn, David S. Miller, Muciri Gatimu,
	Shashidhar Lakkavalli, John Crispin, Vivien Didelot,
	Stanislav Fomichev, Daniel Borkmann, Song Liu,
	Alexei Starovoitov, Matteo Croce, Jakub Sitnicki, Eric Dumazet,
	Paul Blakey, Yoshiki Komachi, netdev, linux-kernel

Florian Fainelli wrote 06.12.2019 06:32:
> On 12/5/2019 6:58 AM, Alexander Lobakin wrote:
>> Andrew Lunn wrote 05.12.2019 17:01:
>>>> Hi,
>>>> 
>>>> > What i'm missing here is an explanation why the flow dissector is
>>>> > called here if the protocol is already set? It suggests there is a
>>>> > case when the protocol is not correctly set, and we do need to look
>>>> > into the frame?
>>>> 
>>>> If we have a device with multiple Tx queues, but XPS is not configured
>>>> or the kernel is running on a uniprocessor system, then the networking
>>>> core selects the Tx queue based on the flow, to utilize as many Tx
>>>> queues as possible without breaking frame ordering.
>>>> This selection happens in net/core/dev.c:skb_tx_hash() as:
>>>> 
>>>> reciprocal_scale(skb_get_hash(skb), qcount)
>>>> 
>>>> where 'qcount' is the total number of Tx queues on the network 
>>>> device.
>>>> 
>>>> If skb has not been hashed prior to this line, then skb_get_hash() 
>>>> will
>>>> call flow dissector to generate a new hash. That's why flow 
>>>> dissection
>>>> can occur on Tx path.
>>> 
>>> 
>>> Hi Alexander
>>> 
>>> So it looks like you are now skipping this hash, which in your
>>> testing gives better results, because the protocol is already set
>>> correctly. But are there cases when the protocol is not set correctly,
>>> where we really do need to look into the frame?
>> 
>> Actually no, I'm not skipping the hashing entirely, I'm only skipping
>> the tag_ops->flow_dissect() call (a helper that only alters the network
>> offset and replaces the fake ETH_P_XDSA with the actual protocol) on the
>> Tx path, because there it only breaks flow dissection. All skbs are still
>> processed and hashed by the generic code that follows that call.
>> 
>>> How about when an outer header has just been removed? The frame was
>>> received on a GRE tunnel, the GRE header has just been removed, and
>>> now the frame is on its way out? Is the protocol still GRE, and we
>>> should look into the frame to determine if it is IPv4, ARP etc?
>>> 
>>> Your patch looks to improve things for the cases you have tested, but
>>> i'm wondering if there are other use cases where we really do need to
>>> look into the frame? In which case, your fix is doing the wrong 
>>> thing.
>>> Should we be extending the tagger to handle the TX case as well as 
>>> the
>>> RX case?
>> 
>> We really have two options: don't call tag_ops->flow_dissect() on Tx
>> (this patch), or extend tagger callbacks to handle the Tx path too. I
>> used each of these for several months and couldn't find cases where the
>> first one was worse than the second.
>> I mean, there _might_ be such cases in theory, and if they appear we
>> should extend our taggers. But for now I don't see the need to do this,
>> as the generic flow dissection logic works as expected after this patch
>> and is completely broken without it.
>> And remember that the order is reversed on Tx: all skbs are first
>> queued on the slave netdevice and only then on the master/CPU port.
>> 
>> It would be nice to see what other people think about it anyways.
> 
> Your patch seems appropriate to me and quite frankly I am not sure why
> flow dissection on RX is done at the DSA master device level, where we
> have not parsed the DSA tag yet, instead of being done at the DSA slave
> network device level. It seems to me that if the DSA master has N RX
> queues, we should be creating the DSA slave devices with the same 
> amount
> of RX queues and perform RPS there against a standard Ethernet frame
> (sans DSA tag).
> 
> For TX the story is a little different because we can have multiqueue
> DSA slave network devices in order to steer traffic towards particular
> switch queues and we could do XPS there that way.
> 
> What do you think?

Hi Florian,

First of all, thank you for the "Reviewed-by"!

I agree with you that all the network stack processing should be
performed on standard frames without CPU tags and on corresponding
slave netdevices. So I think we really should think about extending
DSA core code to create slaves with at least as many Rx queues as
the master device has. With this done we could remove the .flow_dissect()
callback from DSA taggers entirely and simplify traffic flow.

Also, if we get back to Tx processing, the number of Tx queues on slaves
should ideally be equal to the number of queues on the switch itself.
Maybe we should then apply this rule to Rx queues too, i.e. create
slaves with the number of Rx queues that the switch has?

(for example, I'm currently working with switches that have 8 Rxqs
and 8 Txqs, but their Ethernet controllers / CPU ports have only 4/4)

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ


* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-06  3:28 ` Florian Fainelli
@ 2019-12-06 15:06   ` Alexander Lobakin
  0 siblings, 0 replies; 13+ messages in thread
From: Alexander Lobakin @ 2019-12-06 15:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: Florian Fainelli, Muciri Gatimu, Shashidhar Lakkavalli,
	John Crispin, Andrew Lunn, Vivien Didelot, Stanislav Fomichev,
	Daniel Borkmann, Song Liu, Alexei Starovoitov, Matteo Croce,
	Jakub Sitnicki, Eric Dumazet, Paul Blakey, Yoshiki Komachi,
	netdev, linux-kernel

Florian Fainelli wrote 06.12.2019 06:28:
> On 12/5/2019 2:02 AM, Alexander Lobakin wrote:
>> Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
>> ability to override protocol and network offset during flow dissection
>> for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
>> in order to fix skb hashing for RPS on Rx path.
>> 
>> However, skb_get_hash() and the added code can be invoked not only on
>> the Rx path, but also on the Tx path if we have a multi-queue device and:
>>  - the kernel is running on a UP system or
>>  - XPS is not configured.
>> 
>> The call stack in these two cases will look like: dev_queue_xmit() ->
>> __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
>> skb_tx_hash() -> skb_get_hash().
>> 
>> The problem is that skbs queued for Tx have both network offset and
>> correct protocol already set up even after inserting a CPU tag by DSA
>> tagger, so calling tag_ops->flow_dissect() on this path actually only
>> breaks flow dissection and hashing.
>> 
>> This can be observed by adding debug prints just before and right 
>> after
>> tag_ops->flow_dissect() call to the related block of code:
>> 
>> Before the patch:
>> 
>> Rx path (RPS):
>> 
>> [   19.240001] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
>> [   19.244271] tag_ops->flow_dissect()
>> [   19.247811] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
>> 
>> [   19.215435] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
>> [   19.219746] tag_ops->flow_dissect()
>> [   19.223241] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
>> 
>> [   18.654057] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
>> [   18.658332] tag_ops->flow_dissect()
>> [   18.661826] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
>> 
>> Tx path (UP system):
>> 
>> [   18.759560] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
>> [   18.763933] tag_ops->flow_dissect()
>> [   18.767485] Tx: proto: 0x920b, nhoff: 34	/* junk */
>> 
>> [   22.800020] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
>> [   22.804392] tag_ops->flow_dissect()
>> [   22.807921] Tx: proto: 0x920b, nhoff: 34	/* junk */
>> 
>> [   16.898342] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
>> [   16.902705] tag_ops->flow_dissect()
>> [   16.906227] Tx: proto: 0x920b, nhoff: 34	/* junk */
>> 
>> After:
>> 
>> Rx path (RPS):
>> 
>> [   16.520993] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
>> [   16.525260] tag_ops->flow_dissect()
>> [   16.528808] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
>> 
>> [   15.484807] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
>> [   15.490417] tag_ops->flow_dissect()
>> [   15.495223] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
>> 
>> [   17.134621] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
>> [   17.138895] tag_ops->flow_dissect()
>> [   17.142388] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
>> 
>> Tx path (UP system):
>> 
>> [   15.499558] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
>> 
>> [   20.664689] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
>> 
>> [   18.565782] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
>> 
>> In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
>> to prevent code from calling tag_ops->flow_dissect() on Tx.
>> I also decided to initialize 'offset' variable so tagger callbacks can
>> now safely leave it untouched without provoking a chaos.
>> 
>> Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection")
>> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
> 
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

So Dave, if I understand correctly, you can pick this up into your
fixes tree.

There will be further work on the DSA Rx path, but that is a subject
for upcoming Linux release cycles.

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ

* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-06  7:37           ` Alexander Lobakin
@ 2019-12-06 18:05             ` Florian Fainelli
  0 siblings, 0 replies; 13+ messages in thread
From: Florian Fainelli @ 2019-12-06 18:05 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Andrew Lunn, David S. Miller, Muciri Gatimu,
	Shashidhar Lakkavalli, John Crispin, Vivien Didelot,
	Stanislav Fomichev, Daniel Borkmann, Song Liu,
	Alexei Starovoitov, Matteo Croce, Jakub Sitnicki, Eric Dumazet,
	Paul Blakey, Yoshiki Komachi, netdev, linux-kernel

On 12/5/19 11:37 PM, Alexander Lobakin wrote:
> Florian Fainelli wrote 06.12.2019 06:32:
>> On 12/5/2019 6:58 AM, Alexander Lobakin wrote:
>>> Andrew Lunn wrote 05.12.2019 17:01:
>>>>> Hi,
>>>>>
>>>>> > What i'm missing here is an explanation why the flow dissector is
>>>>> > called here if the protocol is already set? It suggests there is a
>>>>> > case when the protocol is not correctly set, and we do need to look
>>>>> > into the frame?
>>>>>
>>>>> If we have a device with multiple Tx queues, but XPS is not configured
>>>>> or the kernel is running on a uniprocessor system, then the networking
>>>>> core selects a Tx queue based on the flow, to utilize as many Tx queues
>>>>> as possible without breaking frame order.
>>>>> This selection happens in net/core/dev.c:skb_tx_hash() as:
>>>>>
>>>>> reciprocal_scale(skb_get_hash(skb), qcount)
>>>>>
>>>>> where 'qcount' is the total number of Tx queues on the network device.
>>>>>
>>>>> If skb has not been hashed prior to this line, then skb_get_hash() will
>>>>> call flow dissector to generate a new hash. That's why flow dissection
>>>>> can occur on Tx path.
>>>>
>>>>
>>>> Hi Alexander
>>>>
>>>> So it looks like you are now skipping this hash. Which, in your
>>>> testing, gives better results, because the protocol is already set
>>>> correctly. But are there cases when the protocol is not set correctly,
>>>> and we really do need to look into the frame?
>>>
>>> Actually no, I'm not skipping the entire hashing, I'm only skipping
>>> the tag_ops->flow_dissect() call (a helper that only alters the network
>>> offset and replaces the fake ETH_P_XDSA with the actual protocol) on the
>>> Tx path, because it only breaks the flow dissection logic. All skbs are
>>> still processed and hashed by the generic code that follows that call.
>>>
>>>> How about when an outer header has just been removed? The frame was
>>>> received on a GRE tunnel, the GRE header has just been removed, and
>>>> now the frame is on its way out? Is the protocol still GRE, and we
>>>> should look into the frame to determine if it is IPv4, ARP etc?
>>>>
>>>> Your patch looks to improve things for the cases you have tested, but
>>>> i'm wondering if there are other use cases where we really do need to
>>>> look into the frame? In which case, your fix is doing the wrong thing.
>>>> Should we be extending the tagger to handle the TX case as well as the
>>>> RX case?
>>>
>>> We really have two options: don't call tag_ops->flow_dissect() on Tx
>>> (this patch), or extend tagger callbacks to handle Tx path too. I used
>>> each of these for several months and couldn't find cases
>>> where the first one was worse than the second.
>>> I mean, there _might_ be such cases in theory, and if they appear
>>> we should extend our taggers. But for now I don't see the necessity to
>>> do this, as the generic flow dissection logic works as expected after
>>> this patch and is completely broken without it.
>>> And remember that we have the reverse logic on Tx: all skbs are
>>> first queued on the slave netdevice and only then on the master/CPU port.
>>>
>>> It would be nice to see what other people think about it anyways.
>>
>> Your patch seems appropriate to me and quite frankly I am not sure why
>> flow dissection on RX is done at the DSA master device level, where we
>> have not parsed the DSA tag yet, instead of being done at the DSA slave
>> network device level. It seems to me that if the DSA master has N RX
>> queues, we should be creating the DSA slave devices with the same number
>> of RX queues and perform RPS there against a standard Ethernet frame
>> (sans DSA tag).
>>
>> For TX the story is a little different because we can have multiqueue
>> DSA slave network devices in order to steer traffic towards particular
>> switch queues and we could do XPS there that way.
>>
>> What do you think?
> 
> Hi Florian,
> 
> First of all, thank you for the "Reviewed-by"!
> 
> I agree with you that all the network stack processing should be
> performed on standard frames without CPU tags and on corresponding
> slave netdevices. So I think we really should think about extending
> DSA core code to create slaves with at least as many Rx queues as
> the master device has. With this done we could remove the .flow_dissect()
> callback from DSA taggers entirely and simplify traffic flow.

Indeed.

> 
> Also, if we get back to Tx processing, the number of Tx queues on slaves
> should ideally be equal to the number of queues on the switch itself.
> Maybe we should then apply this rule to Rx queues too, i.e. create
> slaves with the number of Rx queues that the switch has?

Yes, I would offer the same configuration knob we have today with TX
queues for RX queues.

> 
> (for example, I'm currently working with switches that have 8 Rxqs
> and 8 Txqs, but their Ethernet controllers / CPU ports have only 4/4)

Yes, that is not uncommon, unfortunately. We have a similar set-up with
BCM7278, which has 16 TX queues for its DSA master and 4 switch
ports with 8 TX queues per port. What I did is basically clamp the
number of DSA slave device TX queues to have a 1:1 mapping, and that
seems acceptable to the users :)
-- 
Florian
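
A minimal user-space sketch of the Tx queue selection quoted earlier in
this message, assuming only the arithmetic of reciprocal_scale() as it is
applied to the flow hash in skb_tx_hash(). The helper name pick_tx_queue()
is hypothetical and stands in for the surrounding kernel logic, which is
not reproduced here:

```c
#include <stdint.h>

/*
 * Same arithmetic as the kernel's reciprocal_scale(): maps a 32-bit
 * value onto the range [0, ep_ro) with a multiply and a shift instead
 * of a division.
 */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/*
 * Hypothetical stand-in for the queue choice made in skb_tx_hash():
 * 'flow_hash' plays the role of skb_get_hash(skb) and 'qcount' is the
 * total number of Tx queues on the network device.
 */
static uint32_t pick_tx_queue(uint32_t flow_hash, uint32_t qcount)
{
	return reciprocal_scale(flow_hash, qcount);
}
```

The queue index depends only on the flow hash, which is why a broken
dissection on the Tx path directly affects which queue a frame lands on.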

* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-05 10:02 [PATCH net] net: dsa: fix flow dissection on Tx path Alexander Lobakin
  2019-12-05 12:58 ` Andrew Lunn
  2019-12-06  3:28 ` Florian Fainelli
@ 2019-12-06 19:32 ` Rainer Sickinger
  2019-12-07  4:19 ` David Miller
  3 siblings, 0 replies; 13+ messages in thread
From: Rainer Sickinger @ 2019-12-06 19:32 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Muciri Gatimu, Shashidhar Lakkavalli,
	John Crispin, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Stanislav Fomichev, Daniel Borkmann, Song Liu,
	Alexei Starovoitov, Matteo Croce, Jakub Sitnicki, Eric Dumazet,
	Paul Blakey, Yoshiki Komachi, netdev, lkml

That is a really great improvement!

On Thu, Dec 5, 2019 at 11:04, Alexander Lobakin <alobakin@dlink.ru> wrote:
>
> Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
> ability to override protocol and network offset during flow dissection
> for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
> in order to fix skb hashing for RPS on Rx path.
>
> However, skb_get_hash() and the added code can be invoked not only on
> Rx, but also on Tx path if we have a multi-queued device and:
>  - kernel is running on UP system or
>  - XPS is not configured.
>
> The call stack in these two cases will be like: dev_queue_xmit() ->
> __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
> skb_tx_hash() -> skb_get_hash().
>
> The problem is that skbs queued for Tx have both network offset and
> correct protocol already set up even after inserting a CPU tag by DSA
> tagger, so calling tag_ops->flow_dissect() on this path actually only
> breaks flow dissection and hashing.
>
> This can be observed by adding debug prints just before and right after
> tag_ops->flow_dissect() call to the related block of code:
>
> Before the patch:
>
> Rx path (RPS):
>
> [   19.240001] Rx: proto: 0x00f8, nhoff: 0      /* ETH_P_XDSA */
> [   19.244271] tag_ops->flow_dissect()
> [   19.247811] Rx: proto: 0x0800, nhoff: 8      /* ETH_P_IP */
>
> [   19.215435] Rx: proto: 0x00f8, nhoff: 0      /* ETH_P_XDSA */
> [   19.219746] tag_ops->flow_dissect()
> [   19.223241] Rx: proto: 0x0806, nhoff: 8      /* ETH_P_ARP */
>
> [   18.654057] Rx: proto: 0x00f8, nhoff: 0      /* ETH_P_XDSA */
> [   18.658332] tag_ops->flow_dissect()
> [   18.661826] Rx: proto: 0x8100, nhoff: 8      /* ETH_P_8021Q */
>
> Tx path (UP system):
>
> [   18.759560] Tx: proto: 0x0800, nhoff: 26     /* ETH_P_IP */
> [   18.763933] tag_ops->flow_dissect()
> [   18.767485] Tx: proto: 0x920b, nhoff: 34     /* junk */
>
> [   22.800020] Tx: proto: 0x0806, nhoff: 26     /* ETH_P_ARP */
> [   22.804392] tag_ops->flow_dissect()
> [   22.807921] Tx: proto: 0x920b, nhoff: 34     /* junk */
>
> [   16.898342] Tx: proto: 0x86dd, nhoff: 26     /* ETH_P_IPV6 */
> [   16.902705] tag_ops->flow_dissect()
> [   16.906227] Tx: proto: 0x920b, nhoff: 34     /* junk */
>
> After:
>
> Rx path (RPS):
>
> [   16.520993] Rx: proto: 0x00f8, nhoff: 0      /* ETH_P_XDSA */
> [   16.525260] tag_ops->flow_dissect()
> [   16.528808] Rx: proto: 0x0800, nhoff: 8      /* ETH_P_IP */
>
> [   15.484807] Rx: proto: 0x00f8, nhoff: 0      /* ETH_P_XDSA */
> [   15.490417] tag_ops->flow_dissect()
> [   15.495223] Rx: proto: 0x0806, nhoff: 8      /* ETH_P_ARP */
>
> [   17.134621] Rx: proto: 0x00f8, nhoff: 0      /* ETH_P_XDSA */
> [   17.138895] tag_ops->flow_dissect()
> [   17.142388] Rx: proto: 0x8100, nhoff: 8      /* ETH_P_8021Q */
>
> Tx path (UP system):
>
> [   15.499558] Tx: proto: 0x0800, nhoff: 26     /* ETH_P_IP */
>
> [   20.664689] Tx: proto: 0x0806, nhoff: 26     /* ETH_P_ARP */
>
> [   18.565782] Tx: proto: 0x86dd, nhoff: 26     /* ETH_P_IPV6 */
>
> In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
> to prevent code from calling tag_ops->flow_dissect() on Tx.
> I also decided to initialize 'offset' variable so tagger callbacks can
> now safely leave it untouched without provoking a chaos.
>
> Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection")
> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
> ---
>  net/core/flow_dissector.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 69395b804709..d524a693e00f 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -969,9 +969,10 @@ bool __skb_flow_dissect(const struct net *net,
>                 nhoff = skb_network_offset(skb);
>                 hlen = skb_headlen(skb);
>  #if IS_ENABLED(CONFIG_NET_DSA)
> -               if (unlikely(skb->dev && netdev_uses_dsa(skb->dev))) {
> +               if (unlikely(skb->dev && netdev_uses_dsa(skb->dev) &&
> +                            proto == htons(ETH_P_XDSA))) {
>                         const struct dsa_device_ops *ops;
> -                       int offset;
> +                       int offset = 0;
>
>                         ops = skb->dev->dsa_ptr->tag_ops;
>                         if (ops->flow_dissect &&
> --
> 2.24.0
>
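
The effect of the quoted hunk can be sketched in user space as follows.
This is a simplified model under stated assumptions, not the kernel code:
struct dissect_state, tagger_dissect_fn, example_tagger_dissect() and
dsa_fixup() are hypothetical names, the protocol is kept in host byte
order for brevity, and the 'hlen' adjustment is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_IP   0x0800
#define ETH_P_XDSA 0x00F8 /* placeholder protocol seen on DSA master Rx */

/* Hypothetical, simplified stand-in for the dissector's local state. */
struct dissect_state {
	uint16_t proto; /* host byte order here, for simplicity */
	int nhoff;      /* network header offset */
};

/* Hypothetical tagger callback type: returns 0 on success. */
typedef int (*tagger_dissect_fn)(uint16_t *proto, int *offset);

/* Example tagger: skips an 8-byte tag and reports IPv4 behind it. */
static int example_tagger_dissect(uint16_t *proto, int *offset)
{
	*proto = ETH_P_IP;
	*offset = 8;
	return 0;
}

/*
 * Models the patched condition: the tagger callback runs only when the
 * device uses DSA *and* the protocol is still ETH_P_XDSA, which is the
 * Rx case. On Tx the protocol is already the real one, so nothing is
 * touched.
 */
static void dsa_fixup(struct dissect_state *st, bool dev_uses_dsa,
		      tagger_dissect_fn dissect)
{
	if (dev_uses_dsa && st->proto == ETH_P_XDSA) {
		int offset = 0; /* safe default if the callback leaves it alone */

		if (dissect && !dissect(&st->proto, &offset))
			st->nhoff += offset;
	}
}
```

With this model, an Rx state of (ETH_P_XDSA, nhoff 0) becomes
(ETH_P_IP, nhoff 8), while a Tx state of (ETH_P_IP, nhoff 26) is left
unchanged, matching the "After" debug output quoted above.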

* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-05 10:02 [PATCH net] net: dsa: fix flow dissection on Tx path Alexander Lobakin
                   ` (2 preceding siblings ...)
  2019-12-06 19:32 ` Rainer Sickinger
@ 2019-12-07  4:19 ` David Miller
  2019-12-07  8:10   ` Alexander Lobakin
  3 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2019-12-07  4:19 UTC (permalink / raw)
  To: alobakin
  Cc: muciri, shashidhar.lakkavalli, john, andrew, vivien.didelot,
	f.fainelli, sdf, daniel, songliubraving, ast, mcroce, jakub,
	edumazet, paulb, komachi.yoshiki, netdev, linux-kernel

From: Alexander Lobakin <alobakin@dlink.ru>
Date: Thu,  5 Dec 2019 13:02:35 +0300

> Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
> ability to override protocol and network offset during flow dissection
> for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
> in order to fix skb hashing for RPS on Rx path.
> 
> However, skb_get_hash() and the added code can be invoked not only on
> Rx, but also on Tx path if we have a multi-queued device and:
>  - kernel is running on UP system or
>  - XPS is not configured.
> 
> The call stack in these two cases will be like: dev_queue_xmit() ->
> __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
> skb_tx_hash() -> skb_get_hash().
> 
> The problem is that skbs queued for Tx have both network offset and
> correct protocol already set up even after inserting a CPU tag by DSA
> tagger, so calling tag_ops->flow_dissect() on this path actually only
> breaks flow dissection and hashing.
> 
> This can be observed by adding debug prints just before and right after
> tag_ops->flow_dissect() call to the related block of code:
  ...
> In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
> to prevent code from calling tag_ops->flow_dissect() on Tx.
> I also decided to initialize 'offset' variable so tagger callbacks can
> now safely leave it untouched without provoking a chaos.
> 
> Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection")
> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>

Applied and queued up for -stable.


* Re: [PATCH net] net: dsa: fix flow dissection on Tx path
  2019-12-07  4:19 ` David Miller
@ 2019-12-07  8:10   ` Alexander Lobakin
  0 siblings, 0 replies; 13+ messages in thread
From: Alexander Lobakin @ 2019-12-07  8:10 UTC (permalink / raw)
  To: David Miller
  Cc: rainersickinger.official, shashidhar.lakkavalli, john, andrew,
	vivien.didelot, f.fainelli, sdf, daniel, songliubraving, ast,
	mcroce, jakub, edumazet, paulb, komachi.yoshiki, netdev,
	linux-kernel

David Miller wrote 07.12.2019 07:19:
> From: Alexander Lobakin <alobakin@dlink.ru>
> Date: Thu,  5 Dec 2019 13:02:35 +0300
> 
>> Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
>> ability to override protocol and network offset during flow dissection
>> for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
>> in order to fix skb hashing for RPS on Rx path.
>> 
>> However, skb_get_hash() and the added code can be invoked not only on
>> Rx, but also on Tx path if we have a multi-queued device and:
>>  - kernel is running on UP system or
>>  - XPS is not configured.
>> 
>> The call stack in these two cases will be like: dev_queue_xmit() ->
>> __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
>> skb_tx_hash() -> skb_get_hash().
>> 
>> The problem is that skbs queued for Tx have both network offset and
>> correct protocol already set up even after inserting a CPU tag by DSA
>> tagger, so calling tag_ops->flow_dissect() on this path actually only
>> breaks flow dissection and hashing.
>> 
>> This can be observed by adding debug prints just before and right after
>> tag_ops->flow_dissect() call to the related block of code:
>   ...
>> In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
>> to prevent code from calling tag_ops->flow_dissect() on Tx.
>> I also decided to initialize 'offset' variable so tagger callbacks can
>> now safely leave it untouched without provoking a chaos.
>> 
>> Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection")
>> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
> 
> Applied and queued up for -stable.

David, Andrew, Florian, Rainer,
Thank you!

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ

end of thread, other threads:[~2019-12-07  8:11 UTC | newest]

Thread overview: 13+ messages
2019-12-05 10:02 [PATCH net] net: dsa: fix flow dissection on Tx path Alexander Lobakin
2019-12-05 12:58 ` Andrew Lunn
2019-12-05 13:34   ` Alexander Lobakin
2019-12-05 14:01     ` Andrew Lunn
2019-12-05 14:58       ` Alexander Lobakin
2019-12-06  3:32         ` Florian Fainelli
2019-12-06  7:37           ` Alexander Lobakin
2019-12-06 18:05             ` Florian Fainelli
2019-12-06  3:28 ` Florian Fainelli
2019-12-06 15:06   ` Alexander Lobakin
2019-12-06 19:32 ` Rainer Sickinger
2019-12-07  4:19 ` David Miller
2019-12-07  8:10   ` Alexander Lobakin
