* [RFC 1/2] net-next: fix DSA flow_disection
@ 2017-06-20  8:06 John Crispin
  2017-06-20  8:06 ` [RFC 2/2] net-next: mt7530: add nh and proto offsets to the ops struct John Crispin
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: John Crispin @ 2017-06-20  8:06 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S . Miller,
	Sean Wang
  Cc: netdev, John Crispin

RPS and probably other kernel features are currently broken on some if not
all DSA devices. The root cause of this is that skb_hash will call the
flow_dissector. At this point the skb still contains the magic switch header
and the skb->protocol field is not set up to the correct 802.3 value yet.
By the time the tag specific code is called, removing the header and
properly setting the protocol, an invalid hash is already set. In the case
of the mt7530 this will result in all flows always having the same hash.

The patch adds 2 new fields to the dsa_switch_ops allowing the
flow_dissector to use them in order to be able to create the real hash of
the connection.

Signed-off-by: John Crispin <john@phrozen.org>
---
 include/net/dsa.h         |  6 ++++++
 net/core/flow_dissector.c | 12 ++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 58969b9a090c..8b0e8eca3c28 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -442,6 +442,12 @@ struct dsa_switch_ops {
 					 int port, struct net_device *br);
 	void	(*crosschip_bridge_leave)(struct dsa_switch *ds, int sw_index,
 					  int port, struct net_device *br);
+
+	/*
+	 * Network header and 802.3 protocol offsets
+	 */
+	int	hash_nh_off;
+	int	hash_proto_off;
 };
 
 struct dsa_switch_driver {
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index fc5fc4594c90..da45bdf57408 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -4,6 +4,7 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/if_vlan.h>
+#include <net/dsa.h>
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/gre.h>
@@ -440,6 +441,17 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 			 skb->vlan_proto : skb->protocol;
 		nhoff = skb_network_offset(skb);
 		hlen = skb_headlen(skb);
+
+		if (unlikely(netdev_uses_dsa(skb->dev))) {
+			const struct dsa_switch_ops *ops;
+			u8 *p = (u8 *) data;
+
+			ops = skb->dev->dsa_ptr->ds[0]->ops;
+			if (ops->hash_proto_off)
+				proto = (u16) p[ops->hash_proto_off];
+			hlen -= ops->hash_nh_off;
+			nhoff += ops->hash_nh_off;
+		}
 	}
 
 	/* It is ensured by skb_flow_dissector_init() that control key will
-- 
2.11.0
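
A minimal userspace sketch of the adjustment made in the flow_dissector.c
hunk above, for illustration only. struct hash_offsets, read_be16() and
dsa_adjust() are hypothetical names that do not exist in the kernel, and
proto is held in host byte order here, whereas the kernel keeps it as a
__be16. The sketch reads the EtherType as two big-endian bytes, because
indexing a u8 pointer yields a single byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch adds to dsa_switch_ops. */
struct hash_offsets {
	int hash_nh_off;	/* bytes to skip to reach the network header */
	int hash_proto_off;	/* byte offset of the real EtherType, 0 = unset */
};

/* Read a 16-bit big-endian value without assuming any alignment. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Apply the same three adjustments as the hunk: override proto if an
 * offset is given, then shift nhoff forward and shrink hlen by the
 * network header offset.
 */
static void dsa_adjust(const uint8_t *data, const struct hash_offsets *ops,
		       uint16_t *proto, int *nhoff, int *hlen)
{
	assert(ops->hash_nh_off <= *hlen);

	if (ops->hash_proto_off)
		*proto = read_be16(data + ops->hash_proto_off);
	*hlen -= ops->hash_nh_off;
	*nhoff += ops->hash_nh_off;
}
```

With the MT7530 values from patch 2/2 (network header offset 4, protocol
offset 2), nhoff advances by 4 and hlen shrinks by 4.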

* [RFC 2/2] net-next: mt7530: add nh and proto offsets to the ops struct
  2017-06-20  8:06 [RFC 1/2] net-next: fix DSA flow_disection John Crispin
@ 2017-06-20  8:06 ` John Crispin
  2017-06-20 13:54   ` Andrew Lunn
  2017-06-20 10:17 ` [RFC 1/2] net-next: fix DSA flow_disection Sergei Shtylyov
  2017-06-20 14:01 ` Andrew Lunn
  2 siblings, 1 reply; 12+ messages in thread
From: John Crispin @ 2017-06-20  8:06 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S . Miller,
	Sean Wang
  Cc: netdev, John Crispin

The MT7530 inserts a 4-byte magic header in between the 802.3 address and
protocol fields. The patch defines the offsets of this header such that the
flow_dissector can properly parse the packet, which allows hashing to
function properly.

Signed-off-by: John Crispin <john@phrozen.org>
---
 drivers/net/dsa/mt7530.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 1e46418a3b74..b5385e554601 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -1019,6 +1019,8 @@ static struct dsa_switch_ops mt7530_switch_ops = {
 	.port_fdb_add		= mt7530_port_fdb_add,
 	.port_fdb_del		= mt7530_port_fdb_del,
 	.port_fdb_dump		= mt7530_port_fdb_dump,
+	.hash_nh_off		= 4,
+	.hash_proto_off		= 2,
 };
 
 static int
-- 
2.11.0
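
A small sketch of how the two constants above could be derived from the
frame layout described in the commit message. It assumes the dissector's
data pointer has already been advanced past a plain 14-byte Ethernet
header, which is not stated in the patch. struct mtk_tagged_frame,
HASH_PROTO_OFF, HASH_NH_OFF and mtk_real_ethertype() are hypothetical
names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN	6
#define ETH_HLEN	14	/* dst + src + 2-byte type field */
#define MTK_TAG_LEN	4	/* 4-byte switch header, per the commit message */

/* Tagged frame: the tag sits between the addresses and the EtherType. */
struct mtk_tagged_frame {
	uint8_t dst[ETH_ALEN];
	uint8_t src[ETH_ALEN];
	uint8_t tag[MTK_TAG_LEN];	/* tag contents are not modelled here */
	uint8_t ethertype[2];		/* real EtherType, big-endian */
	uint8_t payload[];		/* network header starts here */
};

/* Offsets relative to the end of a plain Ethernet header (assumption). */
enum {
	HASH_PROTO_OFF = (int)offsetof(struct mtk_tagged_frame, ethertype) - ETH_HLEN,
	HASH_NH_OFF = (int)offsetof(struct mtk_tagged_frame, payload) - ETH_HLEN,
};

/* Return the real EtherType of a tagged frame in host byte order. */
static uint16_t mtk_real_ethertype(const uint8_t *frame, size_t len)
{
	const uint8_t *data = frame + ETH_HLEN;

	assert(len >= (size_t)(ETH_HLEN + HASH_NH_OFF));
	return (uint16_t)((data[HASH_PROTO_OFF] << 8) | data[HASH_PROTO_OFF + 1]);
}
```

Under that assumption the layout gives a protocol offset of 2 and a
network header offset of 4, matching the values in the hunk.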

* Re: [RFC 1/2] net-next: fix DSA flow_disection
  2017-06-20  8:06 [RFC 1/2] net-next: fix DSA flow_disection John Crispin
  2017-06-20  8:06 ` [RFC 2/2] net-next: mt7530: add nh and proto offsets to the ops struct John Crispin
@ 2017-06-20 10:17 ` Sergei Shtylyov
  2017-06-20 14:01 ` Andrew Lunn
  2 siblings, 0 replies; 12+ messages in thread
From: Sergei Shtylyov @ 2017-06-20 10:17 UTC (permalink / raw)
  To: John Crispin, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S . Miller, Sean Wang
  Cc: netdev

Hello!

On 6/20/2017 11:06 AM, John Crispin wrote:

> RPS and probably other kernel features are currently broken on some if not
> all DSA devices. The root cause of this that skb_hash will call the

   "Is" missing between "this" and "that"?

> flow_disector. At this point the skb still contains the magic switch header

   Dissector?

> and the skb->protocol field is not set up to the correct 802.3 value yet.
> by the time the tag specific code is called, removing the header and
> properly setting the protocol an invalid hash is already set. In the case
> of the mt7530 this will result in all flows always having the same hash.
>
> The patch adds 2 new fields to the dsa_switch_ops allowing the
> flow_disector to use them in order to be able to create the real hash of

   Again.

> the connection.
>
> Signed-off-by: John Crispin <john@phrozen.org>
[...]
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index fc5fc4594c90..da45bdf57408 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
[...]
> @@ -440,6 +441,17 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>  			 skb->vlan_proto : skb->protocol;
>  		nhoff = skb_network_offset(skb);
>  		hlen = skb_headlen(skb);
> +
> +		if (unlikely(netdev_uses_dsa(skb->dev))) {
> +			const struct dsa_switch_ops *ops;
> +			u8 *p = (u8 *) data;

     Didn't checkpatch.pl complain about space after (u8 *)?

> +
> +			ops = skb->dev->dsa_ptr->ds[0]->ops;
> +			if (ops->hash_proto_off)
> +				proto = (u16) p[ops->hash_proto_off];

    Again, didn't it?

[...]

MBR, Sergei

* Re: [RFC 2/2] net-next: mt7530: add nh and proto offsets to the ops struct
  2017-06-20  8:06 ` [RFC 2/2] net-next: mt7530: add nh and proto offsets to the ops struct John Crispin
@ 2017-06-20 13:54   ` Andrew Lunn
  2017-06-20 17:27     ` John Crispin
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2017-06-20 13:54 UTC (permalink / raw)
  To: John Crispin
  Cc: Vivien Didelot, Florian Fainelli, David S . Miller, Sean Wang, netdev

On Tue, Jun 20, 2017 at 10:06:55AM +0200, John Crispin wrote:
> The MT7530 inserts the 4 magic header in between the 802.3 address and
> protocol field. The patch defines these header such that the flow_disector
> can properly parse the packet and thus allows hashing to function properly.

This is to do with tagging, not the switch driver. The Marvell switch
driver can be used with two different tagging protocols.

So I would put these fields in the dsa_device_ops.

   Andrew
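
A rough sketch of the point above: if the offsets belong to the tagging
protocol, two instances of the same switch driver can report different
offsets depending on the tagger in use. Everything here is hypothetical,
including struct tagger_hash_info, struct switch_instance,
nh_offset_for() and the offset values, which are placeholders and not
taken from any datasheet or from the kernel's dsa_device_ops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-tagger record; the kernel's dsa_device_ops differs. */
struct tagger_hash_info {
	const char *name;
	int hash_nh_off;
	int hash_proto_off;
};

/* Placeholder offsets chosen for illustration only. */
static const struct tagger_hash_info tag_a = { "tag-a", 4, 2 };
static const struct tagger_hash_info tag_b = { "tag-b", 8, 6 };

/* A switch instance pairs one driver with whichever tagger is in use. */
struct switch_instance {
	const char *driver;
	const struct tagger_hash_info *tagger;
};

/* The dissector would ask the tagger, not the driver, for the offset. */
static int nh_offset_for(const struct switch_instance *sw)
{
	assert(sw != NULL && sw->tagger != NULL);
	return sw->tagger->hash_nh_off;
}
```

Keeping the offsets with the tagger means the switch driver does not have
to know which tag format is active.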

* Re: [RFC 1/2] net-next: fix DSA flow_disection
  2017-06-20  8:06 [RFC 1/2] net-next: fix DSA flow_disection John Crispin
  2017-06-20  8:06 ` [RFC 2/2] net-next: mt7530: add nh and proto offsets to the ops struct John Crispin
  2017-06-20 10:17 ` [RFC 1/2] net-next: fix DSA flow_disection Sergei Shtylyov
@ 2017-06-20 14:01 ` Andrew Lunn
  2017-06-20 17:30   ` Florian Fainelli
  2017-06-20 17:37   ` John Crispin
  2 siblings, 2 replies; 12+ messages in thread
From: Andrew Lunn @ 2017-06-20 14:01 UTC (permalink / raw)
  To: John Crispin
  Cc: Vivien Didelot, Florian Fainelli, David S . Miller, Sean Wang, netdev

On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
> RPS and probably other kernel features are currently broken on some if not
> all DSA devices. The root cause of this that skb_hash will call the
> flow_disector.

Hi John

What is the call path when the flow_disector is called? I'm wondering
if we can defer this, and call it later, after the tag code has
removed the header.

	Andrew

* Re: [RFC 2/2] net-next: mt7530: add nh and proto offsets to the ops struct
  2017-06-20 13:54   ` Andrew Lunn
@ 2017-06-20 17:27     ` John Crispin
  2017-06-20 18:02       ` Andrew Lunn
  0 siblings, 1 reply; 12+ messages in thread
From: John Crispin @ 2017-06-20 17:27 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vivien Didelot, Florian Fainelli, David S . Miller, Sean Wang, netdev



On 20/06/17 15:54, Andrew Lunn wrote:
> On Tue, Jun 20, 2017 at 10:06:55AM +0200, John Crispin wrote:
>> The MT7530 inserts the 4 magic header in between the 802.3 address and
>> protocol field. The patch defines these header such that the flow_disector
>> can properly parse the packet and thus allows hashing to function properly.
> This is to do with tagging, not the switch driver. The Marvell switch
> driver can be used with two different tagging protocols.
>
> So i would put these fields in the dsa_device_ops.
>
>     Andrew
Hi Andrew,

I originally did so, but struct dsa_device_ops is defined inside
net/dsa/dsa_priv.h, so flow_dissector.c would need a

#include "../dsa/dsa_priv.h"

I was not sure if this is OK or if we would need to move the struct
definition to include/net/dsa.h in that case.

     John

* Re: [RFC 1/2] net-next: fix DSA flow_disection
  2017-06-20 14:01 ` Andrew Lunn
@ 2017-06-20 17:30   ` Florian Fainelli
  2017-06-20 17:38     ` John Crispin
  2017-06-20 17:37   ` John Crispin
  1 sibling, 1 reply; 12+ messages in thread
From: Florian Fainelli @ 2017-06-20 17:30 UTC (permalink / raw)
  To: Andrew Lunn, John Crispin
  Cc: Vivien Didelot, David S . Miller, Sean Wang, netdev

On 06/20/2017 07:01 AM, Andrew Lunn wrote:
> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>> RPS and probably other kernel features are currently broken on some if not
>> all DSA devices. The root cause of this that skb_hash will call the
>> flow_disector.
> 
> Hi John
> 
> What is the call path when the flow_disector is called? I'm wondering
> if we can defer this, and call it later, after the tag code has
> removed the header.

Would not you usually want to configure RPS at the DSA network device
level where the switch tag has already been popped and you are
processing a regular Ethernet frame at that point?
-- 
Florian

* Re: [RFC 1/2] net-next: fix DSA flow_disection
  2017-06-20 14:01 ` Andrew Lunn
  2017-06-20 17:30   ` Florian Fainelli
@ 2017-06-20 17:37   ` John Crispin
  2017-06-20 21:52     ` Andrew Lunn
  1 sibling, 1 reply; 12+ messages in thread
From: John Crispin @ 2017-06-20 17:37 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vivien Didelot, Florian Fainelli, David S . Miller, Sean Wang, netdev



On 20/06/17 16:01, Andrew Lunn wrote:
> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>> RPS and probably other kernel features are currently broken on some if not
>> all DSA devices. The root cause of this that skb_hash will call the
>> flow_disector.
> Hi John
>
> What is the call path when the flow_disector is called? I'm wondering
> if we can defer this, and call it later, after the tag code has
> removed the header.
>
> 	Andrew

Hi Andrew,

The Ethernet driver receives the frame and passes it down the line.
Eventually it ends up inside netif_receive_skb_internal(), where it gets
added to the backlog. At this point get_rps_cpu() is called. Inside
get_rps_cpu(), skb_get_hash() is called, which uses the flow dissector,
which is broken for DSA devices. get_rps_cpu() will always compute the
same hash for all flows, so the frame is always added to the backlog on
the same core. Once inside the backlog, the frame will traverse the DSA
layer, end up inside the tag driver, and be passed to the slave device
for further processing, keeping its bad flow hash for its whole life
cycle.

In theory we could reset the hash inside the tag driver but ideally the 
whole life cycle of the frame should happen on the same core to avoid 
possible reordering issues. In addition RPS is broken until the frame 
reaches the tag driver. In the case of the mediatek mt7623 we only have 
1 RX IRQ and in the worst case the RPS of the frame while still inside 
ethX will happen on the same core as where we handle IRQs. This will 
increase the IRQ latency and reduce the free cpu time, thus reducing 
maximum throughput. I did test resetting the hash inside the tag driver. 
Calculating the correct hash from the start did yield a huge performance 
difference however, at least on mt7623. We are talking about 30% extra 
max throughput. This might not be such a big problem if the SoC has a 
multi queue ethernet core but on mt7623 it does make a huge difference 
if we can use RPS to delegate all frame processing away from the core 
handling the IRQs.

     John
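
A small model of why a constant hash pins every flow to one core. The
scaling arithmetic follows the kernel's reciprocal_scale() helper; that
get_rps_cpu() picks its CPU slot exactly this way is treated as an
assumption here, and scale_to_range() and pick_cpu() are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit value onto [0, n), same arithmetic as reciprocal_scale(). */
static uint32_t scale_to_range(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}

/* Pick a CPU slot from a flow hash; ncpus models the RPS CPU map length. */
static uint32_t pick_cpu(uint32_t flow_hash, uint32_t ncpus)
{
	assert(ncpus > 0);
	return scale_to_range(flow_hash, ncpus);
}
```

With four CPUs, the hashes 0x10000000, 0x50000000, 0x90000000 and
0xd0000000 land on slots 0, 1, 2 and 3, while a hash that is identical
for every flow always lands on the same slot.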

* Re: [RFC 1/2] net-next: fix DSA flow_disection
  2017-06-20 17:30   ` Florian Fainelli
@ 2017-06-20 17:38     ` John Crispin
  0 siblings, 0 replies; 12+ messages in thread
From: John Crispin @ 2017-06-20 17:38 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn
  Cc: Vivien Didelot, David S . Miller, Sean Wang, netdev



On 20/06/17 19:30, Florian Fainelli wrote:
> On 06/20/2017 07:01 AM, Andrew Lunn wrote:
>> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>>> RPS and probably other kernel features are currently broken on some if not
>>> all DSA devices. The root cause of this that skb_hash will call the
>>> flow_disector.
>> Hi John
>>
>> What is the call path when the flow_disector is called? I'm wondering
>> if we can defer this, and call it later, after the tag code has
>> removed the header.
> Would not you usually want to configure RPS at the DSA network device
> level where the switch tag has already been popped and you are
> processing a regular Ethernet frame at that point?
Hi Florian,

As explained in my mail to Andrew, you really want to be able to set up
RPS for all devices in the chain to free up the core handling IRQs.

     John

* Re: [RFC 2/2] net-next: mt7530: add nh and proto offsets to the ops struct
  2017-06-20 17:27     ` John Crispin
@ 2017-06-20 18:02       ` Andrew Lunn
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Lunn @ 2017-06-20 18:02 UTC (permalink / raw)
  To: John Crispin
  Cc: Vivien Didelot, Florian Fainelli, David S . Miller, Sean Wang, netdev

> #include "../dsa/dsa_priv.h"
> 
> I was not sure if this is ok or if we would need to move the struct
> definition to include/net/dsa.h in that case

Hi John

Please move the structure.

       Andrew

* Re: [RFC 1/2] net-next: fix DSA flow_disection
  2017-06-20 17:37   ` John Crispin
@ 2017-06-20 21:52     ` Andrew Lunn
  2017-06-21  4:33       ` John Crispin
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2017-06-20 21:52 UTC (permalink / raw)
  To: John Crispin
  Cc: Vivien Didelot, Florian Fainelli, David S . Miller, Sean Wang, netdev

> On Tue, Jun 20, 2017 at 07:37:35PM +0200, John Crispin wrote:
> 
> 
> On 20/06/17 16:01, Andrew Lunn wrote:
> >On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
> >>RPS and probably other kernel features are currently broken on some if not
> >>all DSA devices. The root cause of this that skb_hash will call the
> >>flow_disector.
> >Hi John
> >
> >What is the call path when the flow_disector is called? I'm wondering
> >if we can defer this, and call it later, after the tag code has
> >removed the header.
> >
> >	Andrew

Hi John

I follow your logic of doing the hash early.

Is there any value in including the DSA header in the hash? That might
allow frames from different ingress ports to be spread over CPUs?

      Andrew

* Re: [RFC 1/2] net-next: fix DSA flow_disection
  2017-06-20 21:52     ` Andrew Lunn
@ 2017-06-21  4:33       ` John Crispin
  0 siblings, 0 replies; 12+ messages in thread
From: John Crispin @ 2017-06-21  4:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vivien Didelot, Florian Fainelli, David S . Miller, Sean Wang, netdev



On 20/06/17 23:52, Andrew Lunn wrote:
>> On Tue, Jun 20, 2017 at 07:37:35PM +0200, John Crispin wrote:
>>
>>
>> On 20/06/17 16:01, Andrew Lunn wrote:
>>> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>>>> RPS and probably other kernel features are currently broken on some if not
>>>> all DSA devices. The root cause of this that skb_hash will call the
>>>> flow_disector.
>>> Hi John
>>>
>>> What is the call path when the flow_disector is called? I'm wondering
>>> if we can defer this, and call it later, after the tag code has
>>> removed the header.
>>>
>>> 	Andrew
> Hi John
>
> I follow your logic of doing the hash early
>
> Is there any value in including the DSA header in the hash? That might
> allow frames from different ingress ports to be spread over CPUs?
>
>        Andrew
Hi Andrew,

Adding the DSA header won't make any difference and would still require a
patch to the flow dissector.

     John
