linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v1 0/3] forcedeth: fix device lock-up for dual-port NICs
@ 2012-08-25  3:22 David Decotigny
  2012-08-25  3:22 ` [PATCH net-next v1 1/3] forcedeth: fix buffer overflow David Decotigny
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: David Decotigny @ 2012-08-25  3:22 UTC (permalink / raw)
  To: Ayaz Abdulla, netdev, linux-kernel
  Cc: David S. Miller, Eric Dumazet, Joe Perches, David Decotigny

On a dual port MCP55 10de:0373 (rev a3) NIC with both ports connected,
we identified a configuration that does freeze the whole NIC: having
autoneg & TX pause turned on while one port is physically connected
but interface is down (eg. eth1) eventually causes the whole NIC to
freeze (eth1 and... eth0). This triggers TX timeouts on the UP
interface and, more generally, an unreachable network.

In order to avoid the bug, all we have to do is make sure not to
configure TX pause on the hardware while NIC is down. This is what the
2nd patch of the series does (details included).

And, in case the NIC is in a bad state at reboot (should not happen
anymore thanks to patch above), third patch basically always makes
sure to fix the NIC when module is loaded.

I could only test this with a MCP55 10de:0373 (rev a3) PCI device on a
x86_64 host.

Any feedback on these patches welcome! In particular, please let me
know if this should not apply to other hardware.


############################################
# Patch Set Summary:

David Decotigny (3):
  forcedeth: fix buffer overflow
  forcedeth: fix TX timeout caused by TX pause on down link
  forcedeth: prevent TX timeouts after reboot

 drivers/net/ethernet/nvidia/forcedeth.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

-- 
1.7.10.2.5.g20d7bc9


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net-next v1 1/3] forcedeth: fix buffer overflow
  2012-08-25  3:22 [PATCH net-next v1 0/3] forcedeth: fix device lock-up for dual-port NICs David Decotigny
@ 2012-08-25  3:22 ` David Decotigny
  2012-08-25  3:22 ` [PATCH net-next v1 2/3] forcedeth: fix TX timeout caused by TX pause on down link David Decotigny
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: David Decotigny @ 2012-08-25  3:22 UTC (permalink / raw)
  To: Ayaz Abdulla, netdev, linux-kernel
  Cc: David S. Miller, Eric Dumazet, Joe Perches, David Decotigny

Found by manual code inspection.

Tested: compile, reboot, ethtool -d ethX


Signed-off-by: David Decotigny <decot@googlers.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index f45def0..51d19d8 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -4435,7 +4435,7 @@ static void nv_get_regs(struct net_device *dev, struct ethtool_regs *regs, void
 
 	regs->version = FORCEDETH_REGS_VER;
 	spin_lock_irq(&np->lock);
-	for (i = 0; i <= np->register_size/sizeof(u32); i++)
+	for (i = 0; i < np->register_size/sizeof(u32); i++)
 		rbuf[i] = readl(base + i*sizeof(u32));
 	spin_unlock_irq(&np->lock);
 }
-- 
1.7.10.2.5.g20d7bc9


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net-next v1 2/3] forcedeth: fix TX timeout caused by TX pause on down link
  2012-08-25  3:22 [PATCH net-next v1 0/3] forcedeth: fix device lock-up for dual-port NICs David Decotigny
  2012-08-25  3:22 ` [PATCH net-next v1 1/3] forcedeth: fix buffer overflow David Decotigny
@ 2012-08-25  3:22 ` David Decotigny
  2012-08-25  3:22 ` [PATCH net-next v1 3/3] forcedeth: prevent TX timeouts after reboot David Decotigny
  2012-08-30 17:05 ` [PATCH net-next v1 0/3] forcedeth: fix device lock-up for dual-port NICs David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: David Decotigny @ 2012-08-25  3:22 UTC (permalink / raw)
  To: Ayaz Abdulla, netdev, linux-kernel
  Cc: David S. Miller, Eric Dumazet, Joe Perches, David Decotigny

On some dual-port forcedeth devices such as MCP55 10de:0373 (rev a3),
when autoneg & TX pause are enabled while port is connected but
interface is down, the NIC will eventually freeze (TX timeouts,
network unreachable).

This patch ensures that TX pause is not configured in hardware when
interface is down. The TX pause request will be honored when interface
is later configured.

Tested:
 - hardware is MCP55 device id 10de:0373 (rev a3), dual-port
 - eth0 connected and UP, eth1 connected but DOWN
 - without this patch, following sequence would brick NIC:
      ifconfig eth0 down
      ifconfig eth1 up
      ifconfig eth1 down
      ethtool -A eth1 autoneg off rx on tx off
      ifconfig eth1 up
      ifconfig eth1 down
      ethtool -A eth1 autoneg on rx on tx on
      ifconfig eth1 up
      ifconfig eth1 down
      ifup eth0
      sleep 120  # or longer
      ethtool eth1
   Just in case, sequence to un-brick:
      ifconfig eth0 down
      ethtool -A eth1 autoneg off rx on tx off
      ifconfig eth1 up
      ifconfig eth1 down
      ifup eth0
 - with this patch: no TX timeout after "bricking" sequence above

Details:
 - The following register accesses have been identified as the ones
   causing the NIC to freeze in "bricking" sequence above:
    - write NVREG_TX_PAUSEFRAME_ENABLE_V1 to eth1's register NvRegTxPauseFrame
    - write NVREG_MISC1_PAUSE_TX | NVREG_MISC1_FORCE to eth1's register NvRegMisc1
    - write 0 to eth1's register NvRegTransmitterControl
   This is what this patch avoids.



Signed-off-by: David Decotigny <decot@googlers.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 51d19d8..8b82457 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -3409,7 +3409,7 @@ set_speed:
 
 	pause_flags = 0;
 	/* setup pause frame */
-	if (np->duplex != 0) {
+	if (netif_running(dev) && (np->duplex != 0)) {
 		if (np->autoneg && np->pause_flags & NV_PAUSEFRAME_AUTONEG) {
 			adv_pause = adv & (ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM);
 			lpa_pause = lpa & (LPA_PAUSE_CAP | LPA_PAUSE_ASYM);
@@ -5455,6 +5455,7 @@ static int nv_close(struct net_device *dev)
 
 	netif_stop_queue(dev);
 	spin_lock_irq(&np->lock);
+	nv_update_pause(dev, 0); /* otherwise stop_tx bricks NIC */
 	nv_stop_rxtx(dev);
 	nv_txrx_reset(dev);
 
-- 
1.7.10.2.5.g20d7bc9


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net-next v1 3/3] forcedeth: prevent TX timeouts after reboot
  2012-08-25  3:22 [PATCH net-next v1 0/3] forcedeth: fix device lock-up for dual-port NICs David Decotigny
  2012-08-25  3:22 ` [PATCH net-next v1 1/3] forcedeth: fix buffer overflow David Decotigny
  2012-08-25  3:22 ` [PATCH net-next v1 2/3] forcedeth: fix TX timeout caused by TX pause on down link David Decotigny
@ 2012-08-25  3:22 ` David Decotigny
  2012-08-30 17:05 ` [PATCH net-next v1 0/3] forcedeth: fix device lock-up for dual-port NICs David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: David Decotigny @ 2012-08-25  3:22 UTC (permalink / raw)
  To: Ayaz Abdulla, netdev, linux-kernel
  Cc: David S. Miller, Eric Dumazet, Joe Perches, David Decotigny

This complements patch "net-forcedeth: fix TX timeout caused by TX
pause on down link" which ensures that a lock-up sequence is not sent
to the NIC. Present patch ensures that if a NIC is already locked-up,
the driver will recover from it when initializing the device.

It does the equivalent of the following recovery sequence:
 - write NVREG_TX_PAUSEFRAME_ENABLE_V1 to eth1's register
   NvRegTxPauseFrame
 - write NVREG_XMITCTL_START to eth1's register
   NvRegTransmitterControl
 - write 0 to eth1's register NvRegTransmitterControl
(this is at the heart of the "unbricking" sequence mentioned in patch
 "net-forcedeth: fix TX timeout caused by TX pause on down link")

Tested:
 - hardware is MCP55 device id 10de:0373 (rev a3), dual-port
 - reboot a kernel without any of patches mentioned
 - freeze the NIC (details on description for commit "net-forcedeth:
   fix TX timeout caused by TX pause on down link")
 - wait 5mn until ping hangs & TX timeout in dmesg
 - reboot on kernel with present patch
 - host is immediatly operational, no TX timeout



Signed-off-by: David Decotigny <decot@googlers.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 8b82457..edd6221 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -5905,11 +5905,18 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 		goto out_error;
 	}
 
+	netif_carrier_off(dev);
+
+	/* Some NICs freeze when TX pause is enabled while NIC is
+	 * down, and this stays across warm reboots. The sequence
+	 * below should be enough to recover from that state. */
+	nv_update_pause(dev, 0);
+	nv_start_tx(dev);
+	nv_stop_tx(dev);
+
 	if (id->driver_data & DEV_HAS_VLAN)
 		nv_vlan_mode(dev, dev->features);
 
-	netif_carrier_off(dev);
-
 	dev_info(&pci_dev->dev, "ifname %s, PHY OUI 0x%x @ %d, addr %pM\n",
 		 dev->name, np->phy_oui, np->phyaddr, dev->dev_addr);
 
-- 
1.7.10.2.5.g20d7bc9


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next v1 0/3] forcedeth: fix device lock-up for dual-port NICs
  2012-08-25  3:22 [PATCH net-next v1 0/3] forcedeth: fix device lock-up for dual-port NICs David Decotigny
                   ` (2 preceding siblings ...)
  2012-08-25  3:22 ` [PATCH net-next v1 3/3] forcedeth: prevent TX timeouts after reboot David Decotigny
@ 2012-08-30 17:05 ` David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2012-08-30 17:05 UTC (permalink / raw)
  To: decot; +Cc: aabdulla, netdev, linux-kernel, edumazet, joe

From: David Decotigny <decot@googlers.com>
Date: Fri, 24 Aug 2012 20:22:50 -0700

> On a dual port MCP55 10de:0373 (rev a3) NIC with both ports connected,
> we identified a configuration that does freeze the whole NIC: having
> autoneg & TX pause turned on while one port is physically connected
> but interface is down (eg. eth1) eventually causes the whole NIC to
> freeze (eth1 and... eth0). This triggers TX timeouts on the UP
> interface and, more generally, an unreachable network.
> 
> In order to avoid the bug, all we have to do is make sure not to
> configure TX pause on the hardware while NIC is down. This is what the
> 2nd patch of the series does (details included).
> 
> And, in case the NIC is in a bad state at reboot (should not happen
> anymore thanks to patch above), third patch basically always makes
> sure to fix the NIC when module is loaded.
> 
> I could only test this with a MCP55 10de:0373 (rev a3) PCI device on a
> x86_64 host.

All applied to net-next, thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-08-30 17:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-25  3:22 [PATCH net-next v1 0/3] forcedeth: fix device lock-up for dual-port NICs David Decotigny
2012-08-25  3:22 ` [PATCH net-next v1 1/3] forcedeth: fix buffer overflow David Decotigny
2012-08-25  3:22 ` [PATCH net-next v1 2/3] forcedeth: fix TX timeout caused by TX pause on down link David Decotigny
2012-08-25  3:22 ` [PATCH net-next v1 3/3] forcedeth: prevent TX timeouts after reboot David Decotigny
2012-08-30 17:05 ` [PATCH net-next v1 0/3] forcedeth: fix device lock-up for dual-port NICs David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).