linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations
@ 2019-07-22 23:37 Rasmus Villemoes
  2019-07-23 13:40 ` Andrew Lunn
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Rasmus Villemoes @ 2019-07-22 23:37 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller
  Cc: Rasmus Villemoes, netdev, linux-kernel

We have an ERPS (Ethernet Ring Protection Switching) setup involving
mv88e6250 switches which we're in the process of switching to a BSP
based on the mainline driver. Breaking any link in the ring works as
expected, with the ring reconfiguring itself quickly and traffic
continuing with almost no noticable drops. However, when plugging back
the cable, we see 5+ second stalls.

This has been tracked down to the userspace application in charge of
the protocol missing a few CCM messages on the good link (the one that
was not unplugged), causing it to broadcast a "signal fail". That
message eventually reaches its link partner, which responds by
blocking the port. Meanwhile, the first node has continued to block
the port with the just plugged-in cable, breaking the network. And the
reason for those missing CCM messages has in turn been tracked down to
the VTU apparently being too busy servicing load/purge operations that
the normal lookups are delayed.

Initial state, the link between C and D is blocked in software.

     _____________________
    /                     \
   |                       |
   A ----- B ----- C *---- D

Unplug the cable between C and D.

     _____________________
    /                     \
   |                       |
   A ----- B ----- C *   * D

Reestablish the link between C and D.
     _____________________
    /                     \
   |                       |
   A ----- B ----- C *---- D

Somehow, enough VTU/ATU operations happen inside C that prevents
the application from receving the CCM messages from B in a timely
manner, so a Signal Fail message is sent by C. When B receives
that, it responds by blocking its port.

     _____________________
    /                     \
   |                       |
   A ----- B *---* C *---- D

Very shortly after this, the signal fail condition clears on the
BC link (some CCM messages finally make it through), so C
unblocks the port. However, a guard timer inside B prevents it
from removing the blocking before 5 seconds have elapsed.

It is not unlikely that our userspace ERPS implementation could be
smarter and/or is simply buggy. However, this patch fixes the symptoms
we see, and is a small optimization that should not break anything
(knock wood). The idea is simply to avoid doing an VTU load of an
entry identical to the one already present. To do that, we need to
know whether mv88e6xxx_vtu_get() actually found an existing entry, or
has just prepared a struct mv88e6xxx_vtu_entry for us to load. To that
end, let vlan->valid be an output parameter. The other two callers of
mv88e6xxx_vtu_get() are not affected by this patch since they pass
new=false.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 6b17cd961d06..2e500428670c 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1423,7 +1423,6 @@ static int mv88e6xxx_vtu_get(struct mv88e6xxx_chip *chip, u16 vid,
 
 		/* Initialize a fresh VLAN entry */
 		memset(entry, 0, sizeof(*entry));
-		entry->valid = true;
 		entry->vid = vid;
 
 		/* Exclude all ports */
@@ -1618,6 +1617,9 @@ static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
 	if (err)
 		return err;
 
+	if (vlan.valid && vlan.member[port] == member)
+		return 0;
+	vlan.valid = true;
 	vlan.member[port] = member;
 
 	err = mv88e6xxx_vtu_loadpurge(chip, &vlan);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations
  2019-07-22 23:37 [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations Rasmus Villemoes
@ 2019-07-23 13:40 ` Andrew Lunn
  2019-07-23 14:09   ` Rasmus Villemoes
  2019-07-27 20:14 ` David Miller
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Andrew Lunn @ 2019-07-23 13:40 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Vivien Didelot, Florian Fainelli, David S. Miller,
	Rasmus Villemoes, netdev, linux-kernel

On Mon, Jul 22, 2019 at 11:37:26PM +0000, Rasmus Villemoes wrote:
> We have an ERPS (Ethernet Ring Protection Switching) setup involving
> mv88e6250 switches which we're in the process of switching to a BSP
> based on the mainline driver. Breaking any link in the ring works as
> expected, with the ring reconfiguring itself quickly and traffic
> continuing with almost no noticable drops. However, when plugging back
> the cable, we see 5+ second stalls.

Hi Rasmus

I would prefer Vivien reviews this patch. But he is away at the
moment. Are you O.K. to wait a few days?

	Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations
  2019-07-23 13:40 ` Andrew Lunn
@ 2019-07-23 14:09   ` Rasmus Villemoes
  0 siblings, 0 replies; 6+ messages in thread
From: Rasmus Villemoes @ 2019-07-23 14:09 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vivien Didelot, Florian Fainelli, David S. Miller,
	Rasmus Villemoes, netdev, linux-kernel

On 23/07/2019 15.40, Andrew Lunn wrote:
> On Mon, Jul 22, 2019 at 11:37:26PM +0000, Rasmus Villemoes wrote:
>> We have an ERPS (Ethernet Ring Protection Switching) setup involving
>> mv88e6250 switches which we're in the process of switching to a BSP
>> based on the mainline driver. Breaking any link in the ring works as
>> expected, with the ring reconfiguring itself quickly and traffic
>> continuing with almost no noticable drops. However, when plugging back
>> the cable, we see 5+ second stalls.
> 
> Hi Rasmus
> 
> I would prefer Vivien reviews this patch. But he is away at the
> moment. Are you O.K. to wait a few days?

Sure, no rush.

Rasmus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations
  2019-07-22 23:37 [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations Rasmus Villemoes
  2019-07-23 13:40 ` Andrew Lunn
@ 2019-07-27 20:14 ` David Miller
  2019-07-29 17:29 ` David Miller
  2019-07-29 22:46 ` Vivien Didelot
  3 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2019-07-27 20:14 UTC (permalink / raw)
  To: rasmus.villemoes
  Cc: andrew, vivien.didelot, f.fainelli, Rasmus.Villemoes, netdev,
	linux-kernel


Reminder that we need a review from Vivien on this.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations
  2019-07-22 23:37 [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations Rasmus Villemoes
  2019-07-23 13:40 ` Andrew Lunn
  2019-07-27 20:14 ` David Miller
@ 2019-07-29 17:29 ` David Miller
  2019-07-29 22:46 ` Vivien Didelot
  3 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2019-07-29 17:29 UTC (permalink / raw)
  To: rasmus.villemoes
  Cc: andrew, vivien.didelot, f.fainelli, Rasmus.Villemoes, netdev,
	linux-kernel


Andrew, I really can't wait forever to get this patch reviewed.

It's been an entire week already :-(

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations
  2019-07-22 23:37 [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations Rasmus Villemoes
                   ` (2 preceding siblings ...)
  2019-07-29 17:29 ` David Miller
@ 2019-07-29 22:46 ` Vivien Didelot
  3 siblings, 0 replies; 6+ messages in thread
From: Vivien Didelot @ 2019-07-29 22:46 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Rasmus Villemoes,
	netdev, linux-kernel

Hi Rasmus,

On Mon, 22 Jul 2019 23:37:26 +0000, Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
> We have an ERPS (Ethernet Ring Protection Switching) setup involving
> mv88e6250 switches which we're in the process of switching to a BSP
> based on the mainline driver. Breaking any link in the ring works as
> expected, with the ring reconfiguring itself quickly and traffic
> continuing with almost no noticable drops. However, when plugging back
> the cable, we see 5+ second stalls.
> 
> This has been tracked down to the userspace application in charge of
> the protocol missing a few CCM messages on the good link (the one that
> was not unplugged), causing it to broadcast a "signal fail". That
> message eventually reaches its link partner, which responds by
> blocking the port. Meanwhile, the first node has continued to block
> the port with the just plugged-in cable, breaking the network. And the
> reason for those missing CCM messages has in turn been tracked down to
> the VTU apparently being too busy servicing load/purge operations that
> the normal lookups are delayed.
> 
> Initial state, the link between C and D is blocked in software.
> 
>      _____________________
>     /                     \
>    |                       |
>    A ----- B ----- C *---- D
> 
> Unplug the cable between C and D.
> 
>      _____________________
>     /                     \
>    |                       |
>    A ----- B ----- C *   * D
> 
> Reestablish the link between C and D.
>      _____________________
>     /                     \
>    |                       |
>    A ----- B ----- C *---- D
> 
> Somehow, enough VTU/ATU operations happen inside C that prevents
> the application from receving the CCM messages from B in a timely
> manner, so a Signal Fail message is sent by C. When B receives
> that, it responds by blocking its port.
> 
>      _____________________
>     /                     \
>    |                       |
>    A ----- B *---* C *---- D
> 
> Very shortly after this, the signal fail condition clears on the
> BC link (some CCM messages finally make it through), so C
> unblocks the port. However, a guard timer inside B prevents it
> from removing the blocking before 5 seconds have elapsed.
> 
> It is not unlikely that our userspace ERPS implementation could be
> smarter and/or is simply buggy. However, this patch fixes the symptoms
> we see, and is a small optimization that should not break anything
> (knock wood). The idea is simply to avoid doing an VTU load of an
> entry identical to the one already present. To do that, we need to
> know whether mv88e6xxx_vtu_get() actually found an existing entry, or
> has just prepared a struct mv88e6xxx_vtu_entry for us to load. To that
> end, let vlan->valid be an output parameter. The other two callers of
> mv88e6xxx_vtu_get() are not affected by this patch since they pass
> new=false.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 6b17cd961d06..2e500428670c 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -1423,7 +1423,6 @@ static int mv88e6xxx_vtu_get(struct mv88e6xxx_chip *chip, u16 vid,
>  
>  		/* Initialize a fresh VLAN entry */
>  		memset(entry, 0, sizeof(*entry));
> -		entry->valid = true;
>  		entry->vid = vid;
>  
>  		/* Exclude all ports */
> @@ -1618,6 +1617,9 @@ static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
>  	if (err)
>  		return err;
>  
> +	if (vlan.valid && vlan.member[port] == member)
> +		return 0;
> +	vlan.valid = true;
>  	vlan.member[port] = member;
>  
>  	err = mv88e6xxx_vtu_loadpurge(chip, &vlan);

I'd prefer not to use the mv88e6xxx_vtu_entry structure for output
parameters, this can be confusing. As you correctly mentioned, this
initialization is only done for _mv88e6xxx_port_vlan_add, so I'll
prepare a patch which gets rid of the boolean parameter and move that
logic up.


Thanks,
Vivien

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-07-29 22:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-22 23:37 [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations Rasmus Villemoes
2019-07-23 13:40 ` Andrew Lunn
2019-07-23 14:09   ` Rasmus Villemoes
2019-07-27 20:14 ` David Miller
2019-07-29 17:29 ` David Miller
2019-07-29 22:46 ` Vivien Didelot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).