* [PATCH] sky2: flow control off
From: Stephen Hemminger @ 2007-02-02 23:34 UTC
  To: Linus Torvalds, Jeff Garzik; +Cc: netdev, linux-kernel

Turn flow control off for sky2. When flow control is on, the transmitter
may get randomly stuck. Perhaps there is a hardware problem, but until
Marvell provides errata information for a workaround, it should default to off.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
---
 drivers/net/sky2.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 822dd0b..a31dea5 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -3263,7 +3263,7 @@ #endif
 
 	/* Auto speed and flow control */
 	sky2->autoneg = AUTONEG_ENABLE;
-	sky2->flow_mode = FC_BOTH;
+	sky2->flow_mode = FC_NONE;
 
 	sky2->duplex = -1;
 	sky2->speed = -1;
-- 
1.4.1



* Re: [PATCH] sky2: flow control off
From: Willy Tarreau @ 2007-02-03 22:32 UTC
  To: Stephen Hemminger; +Cc: Linus Torvalds, Jeff Garzik, netdev, linux-kernel

Hi Stephen,

On Fri, Feb 02, 2007 at 03:34:25PM -0800, Stephen Hemminger wrote:
> Turn flow control off for sky2. When flow control is on, the transmitter
> may get randomly stuck. Perhaps there is a hardware problem, but until
> Marvell provides errata information for a workaround, it should default to off.

Are you aware of any way to reproduce this problem, or at least to
increase the chances of it occurring? It looks much like a problem I've been
experiencing with Marvell's driver with a 88E8053 chip on 2.4. Doing
"ethtool -r" proved useful to reset the transceiver in production.

When trying to reproduce the problem, I noticed corrupted and duplicated
frames in tcpdump captures when packets between 147 and 496 bytes were
received then re-routed via the same NIC towards a 100 Mbps machine on
the same switch, a Gig Dlink doing flow control. Yes, I know, there are
a lot of variables! But changing the target machine or the packet
sizes was enough to stop the corruption. I could not get the transmitter
stuck as I sometimes have in production, but I think that both problems
are related. The fact that a comparable problem is encountered with your
driver really makes me suspect a hardware bug :-/

BTW, switching to your driver (sky2-1.5), I could not reproduce the
corruption problem anymore, but I've not yet put it in production
to check if it fixes the transceiver problem. Anyway, it might be
interesting for users who encounter this problem to try an old version
of the driver, just in case they detect a difference.

regards,
Willy



* Re: [PATCH] sky2: flow control off
From: Stephen Hemminger @ 2007-02-05 17:22 UTC
  To: Willy Tarreau; +Cc: netdev

Here is what I saw.

The transmitter on the Marvell Yukon II (88E8053) hangs when doing transmit flow
control under load.  There appears to be a bug or race condition that
causes the MAC to stop transmitting data.

There are two drivers for the Yukon II device on Linux. SysKonnect/Marvell
has one called sk98lin, which is downloadable from syskonnect.de, and I wrote
one called sky2 that is part of the standard Linux kernel. This problem
is reproducible with the sky2 driver only; the sk98lin driver has a watchdog
routine that resets the hardware periodically, so it masks the problem.

The failure mode occurs only after several minutes of sustained activity
in a situation where PAUSE frames would be received. In my testing I used

  server == 1000mbit  ===> switch --- 100mbit ---> client

Server was a Mac Mini (88E8053) running Linux 2.6.20-rc7 and client was a
Sony Vaio (88E8036) laptop.  The server was running NFS in kernel
and the client was doing a large copy. The server was using UDP to cause
large amounts of 802.3x PAUSE frames. The problem is not as reproducible with
TCP tests because TCP congestion control avoids overrunning the switch.
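
To give a feel for why a 1000mbit sender feeding a 100mbit port triggers
PAUSE frames so quickly, here is a rough arithmetic sketch in C. The function
name and the buffer size used in the example are assumptions made for
illustration; the switch's real buffer size is not known from this test.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in microseconds until a switch buffer of buffer_bytes overflows when
 * traffic arrives at in_mbps and leaves at out_mbps. Returns UINT64_MAX if
 * the egress port keeps up and the buffer never fills.
 * Hypothetical helper: all inputs are illustrative, not measured values.
 */
static uint64_t buffer_fill_time_us(uint64_t buffer_bytes,
				    uint32_t in_mbps, uint32_t out_mbps)
{
	/* Guard against overflow in the bytes-to-bits conversion below. */
	assert(buffer_bytes <= UINT64_MAX / 8u);

	if (in_mbps <= out_mbps)
		return UINT64_MAX;

	/* 1 Mbit/s is one bit per microsecond, so the surplus is in bits/us. */
	uint64_t surplus_bits_per_us = (uint64_t)in_mbps - out_mbps;

	return (buffer_bytes * 8u) / surplus_bits_per_us;
}
```

With an assumed 128 KiB of buffering, buffer_fill_time_us(131072, 1000, 100)
comes out at roughly 1.2 milliseconds, so an unthrottled UDP sender would
keep the switch sending PAUSE frames almost continuously.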

When failure occurs:
     * packets continue to be received and passed up the stack
     * GMAC status register indicates the pause state
     * transmit packets continue to be transferred by the DMA into the RAM buffer
     * when the RAM buffer fills, no more packets are DMA'd
     * when the transmit queue in the driver fills, it gets a watchdog timeout
     * switch appears to get confused and other ports hang as well.
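
The sequence above (MAC stuck in pause, RAM buffer fills, driver queue fills,
watchdog fires) can be mimicked with a toy model. This is only a sketch of
the back-pressure chain, not driver code: the struct, the function names, and
all sizes are hypothetical and have nothing to do with real sky2 values.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: sizes and names are hypothetical, not sky2 values. */
struct tx_model {
	int ram_used, ram_size;     /* packets held in the on-chip RAM buffer */
	int queue_used, queue_size; /* packets held in the driver transmit queue */
	int stalled_ticks;          /* consecutive ticks with a full driver queue */
	bool mac_paused;            /* MAC stuck in the pause state */
};

/* Advance one tick; returns true when the watchdog timeout would fire. */
static bool tx_model_tick(struct tx_model *m, int watchdog_ticks)
{
	assert(m && watchdog_ticks > 0);

	/* The MAC sends one packet from RAM unless it is stuck paused. */
	if (!m->mac_paused && m->ram_used > 0)
		m->ram_used--;

	/* DMA keeps moving packets into RAM while there is room. */
	if (m->queue_used > 0 && m->ram_used < m->ram_size) {
		m->queue_used--;
		m->ram_used++;
	}

	/* The stack offers one new packet per tick. */
	if (m->queue_used < m->queue_size)
		m->queue_used++;

	/* A queue that stays full means nothing is completing. */
	if (m->queue_used == m->queue_size)
		m->stalled_ticks++;
	else
		m->stalled_ticks = 0;

	return m->stalled_ticks >= watchdog_ticks;
}

/* Run until the watchdog fires or max_ticks elapse; returns the tick, or -1. */
static int ticks_until_watchdog(struct tx_model *m, int watchdog_ticks,
				int max_ticks)
{
	for (int t = 1; t <= max_ticks; t++)
		if (tx_model_tick(m, watchdog_ticks))
			return t;
	return -1;
}
```

In this model, with a 4-packet RAM buffer, a 3-packet queue and a 5-tick
watchdog, the timeout fires at tick 11 when mac_paused is set and never fires
when it is clear. Note that receive is not modelled at all, which matches the
observation that packets continue to be received during the hang.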

During development of the sky2 driver a similar problem was observed on
receive if the receive DMA buffer was not 8-byte aligned.  For performance
reasons, Linux drivers usually offset the Rx buffer by 2 bytes so that
the TCP/IP headers are aligned for faster CPU access.  If the sky2 Rx
buffer was offset, then the receiver DMA would occasionally hang. The
workaround for receive was to align the receive buffer on a quad-word
boundary.
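
The conflict between the usual 2-byte offset and the 8-byte DMA requirement
can be shown with a small standalone sketch. This is not the sky2 code; the
macro and function names are hypothetical and only illustrate the address
arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from sky2.c. */
#define RX_DMA_ALIGN 8u	/* quad-word boundary the receiver DMA wants */
#define IP_ALIGN_PAD 2u	/* usual offset that aligns the IP header */

/* Round addr up to the next multiple of align (must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* True if addr satisfies the 8-byte rule for an Rx DMA start address. */
static bool is_dma_aligned(uintptr_t addr)
{
	return (addr & (RX_DMA_ALIGN - 1)) == 0;
}
```

For an allocation at 0x1000, is_dma_aligned(0x1000 + IP_ALIGN_PAD) is false,
which is the offset case that hung, while align_up() of any address always
passes the check. The cost of the workaround is that the IP header is no
longer 4-byte aligned for the CPU.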

This problem appears to be flow control related because after disabling
flow control, no errors occurred in a 48 hour test run.

There probably are other races and hangs that are related. I don't
consider all the hangs eliminated yet.


-- 
Stephen Hemminger <shemminger@linux-foundation.org>


* Re: [PATCH] sky2: flow control off
From: Willy Tarreau @ 2007-02-05 18:42 UTC
  To: Stephen Hemminger; +Cc: netdev

Hi Stephen,

First, thanks for this detailed explanation.

On Mon, Feb 05, 2007 at 09:22:53AM -0800, Stephen Hemminger wrote:
> Here is what I saw.
> 
> The transmitter on the Marvell Yukon II (88E8053) hangs when doing transmit flow
> control under load.  There appears to be a bug or race condition that
> causes the MAC to stop transmitting data.
> 
> There are two drivers for the Yukon II device on Linux. SysKonnect/Marvell
> has one called sk98lin, which is downloadable from syskonnect.de, and I wrote
> one called sky2 that is part of the standard Linux kernel. This problem
> is reproducible with the sky2 driver only; the sk98lin driver has a watchdog
> routine that resets the hardware periodically, so it masks the problem.
> 
> The failure mode occurs only after several minutes of sustained activity
> in a situation where PAUSE frames would be received. In my testing I used
> 
>   server == 1000mbit  ===> switch --- 100mbit ---> client
> 
> Server was a Mac Mini (88E8053) running Linux 2.6.20-rc7 and client was a
> Sony Vaio (88E8036) laptop.  The server was running NFS in kernel
> and the client was doing a large copy. The server was using UDP to cause
> large amounts of 802.3x PAUSE frames. The problem is not as reproducible with
> TCP tests because TCP congestion control avoids overrunning the switch.

I encountered *exactly* this problem with a one-leg firewall equipped with an
88E8053 attached to a 1000 Mbps switch, itself hosting 100 Mbps stations,
but with sk98lin (2.4). Running tcpdump on the firewall, I noticed duplicated
and corrupted frames. On a lab setup, I could only reproduce the duplicated
and corrupted frames, not the Tx hangs, by sending high UDP traffic on the
port to a 100 Mbps host. Sending to 1000 Mbps hosts never triggered the
problem, hence my conclusions about flow control too. What I found interesting
is that using a very old version of the sky2 driver which I had with me
(sky2 v0.5), I could not trigger the problem anymore. But right now, I realize
that this version of the driver did not support flow control yet, which would
be consistent with your observations:

# ethtool -i eth0
driver: sky2
version: 0.5
firmware-version: N/A
bus-info: 01:00.0

# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:  on
RX:             off
TX:             off

> When failure occurs:
>      * packets continue to be received and passed up the stack
>      * GMAC status register indicates the pause state
>      * transmit packets continue to be transferred by the DMA into the RAM buffer
>      * when the RAM buffer fills, no more packets are DMA'd
>      * when the transmit queue in the driver fills, it gets a watchdog timeout
>      * switch appears to get confused and other ports hang as well.
> 
> During development of the sky2 driver a similar problem was observed on
> receive if the receive DMA buffer was not 8-byte aligned.  For performance
> reasons, Linux drivers usually offset the Rx buffer by 2 bytes so that
> the TCP/IP headers are aligned for faster CPU access.  If the sky2 Rx
> buffer was offset, then the receiver DMA would occasionally hang. The
> workaround for receive was to align the receive buffer on a quad-word
> boundary.
> 
> This problem appears to be flow control related because after disabling
> flow control, no errors occurred in a 48 hour test run.

No problem here with the old driver without flow control either. I can try
to disable it right here on my setup with sk98lin, and test again. I did not
know that sk98lin had a watchdog; it could explain why sometimes the
system entered a strange state (packets taking *seconds* to be forwarded).

Anyway, I'm more and more convinced that there are hardware bugs. It is
not normal at all that both the original syskonnect driver and your fresh
new code show such similar problems!

> There probably are other races and hangs that are related. I don't
> consider all the hangs eliminated yet.

Well, at least you have a more maintainable driver than the
previous one, so you will eventually manage to fix all problems ;-)

Best regards,
Willy



* Re: [PATCH] sky2: flow control off
From: Jeff Garzik @ 2007-02-07  0:18 UTC
  To: Stephen Hemminger; +Cc: Linus Torvalds, netdev, linux-kernel, Andrew Morton

Stephen Hemminger wrote:
> Turn flow control off for sky2. When flow control is on, the transmitter
> may get randomly stuck. Perhaps there is a hardware problem, but until
> Marvell provides errata information for a workaround, it should default to off.
> 
> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
> ---
>  drivers/net/sky2.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index 822dd0b..a31dea5 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -3263,7 +3263,7 @@ #endif
>  
>  	/* Auto speed and flow control */
>  	sky2->autoneg = AUTONEG_ENABLE;
> -	sky2->flow_mode = FC_BOTH;
> +	sky2->flow_mode = FC_NONE;

I ACK the patch... conditional on some -mm style testing and user ACKs.

Logic:  if there were no downsides to disabling flow control globally,
the world's networks would have already done so.  Flow control can be
quite helpful, so while I understand the errata argument, I also want
to understand the full effect of this tiny patch.

	Jeff


* Re: [PATCH] sky2: flow control off
From: Stephen Hemminger @ 2007-02-07  3:57 UTC
  To: Jeff Garzik
  Cc: Stephen Hemminger, Linus Torvalds, netdev, linux-kernel, Andrew Morton

On Tue, 06 Feb 2007 19:18:07 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> Stephen Hemminger wrote:
> > Turn flow control off for sky2. When flow control is on, the transmitter
> > may get randomly stuck. Perhaps there is a hardware problem, but until
> > Marvell provides errata information for a workaround, it should default to off.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
> > ---
> >  drivers/net/sky2.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> > index 822dd0b..a31dea5 100644
> > --- a/drivers/net/sky2.c
> > +++ b/drivers/net/sky2.c
> > @@ -3263,7 +3263,7 @@ #endif
> >  
> >  	/* Auto speed and flow control */
> >  	sky2->autoneg = AUTONEG_ENABLE;
> > -	sky2->flow_mode = FC_BOTH;
> > +	sky2->flow_mode = FC_NONE;
> 
> I ACK the patch... conditional on some -mm style testing and user ACKs.
> 
> Logic:  if there were no downsides to disabling flow control globally,
> the world's networks would have already done so.  Flow control can be
> quite helpful, so while I understand the errata argument, I also want
> to understand the full effect of this tiny patch.
> 

Actually, the E1000 had it off until recently. The downside is that if
a gigabit system sends through a switch to a 100mbit port while
using a stupid protocol like NFS over UDP, then the packet
burst is sure to get truncated, so the 8K fragmented UDP datagram
never gets through.
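
To put a number on the "8K fragmented UDP" case, here is a rough sketch of
the fragment count. It assumes plain IPv4 with a 20-byte header and no
options, and a standard 1500-byte Ethernet MTU; the function name is made up
for illustration.

```c
#include <assert.h>

/* Assumed sizes: IPv4 header without options, and the UDP header. */
#define IPV4_HDR_LEN 20u
#define UDP_HDR_LEN   8u

/*
 * Number of IPv4 packets needed to carry one UDP datagram with the given
 * payload size over a link with the given MTU (1 means no fragmentation).
 * Hypothetical helper for illustration only.
 */
static unsigned int udp_fragment_count(unsigned int udp_payload,
				       unsigned int mtu)
{
	assert(mtu >= IPV4_HDR_LEN + 8u);

	/* Every fragment except the last carries a multiple of 8 bytes. */
	unsigned int per_frag = ((mtu - IPV4_HDR_LEN) / 8u) * 8u;
	unsigned int total = udp_payload + UDP_HDR_LEN;

	return (total + per_frag - 1u) / per_frag;
}
```

udp_fragment_count(8192, 1500) gives 6, so each 8K write goes out as a
back-to-back burst of six frames. Losing any one of them at the switch means
the whole datagram is dropped at reassembly, which is why the burst never
gets through once the 100mbit port's buffering is exceeded.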
