linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* via_rhine kernel crashes in 2.6.32
@ 2009-12-19 11:12 Andrey Rahmatullin
  2009-12-20 20:03 ` Andrey Rahmatullin
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Rahmatullin @ 2009-12-19 11:12 UTC (permalink / raw)
  To: linux-kernel, Roger Luethi

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

Hello.

Recently I've started to experience kernel crashes under heavy network
load. It may be caused by switching to .32-rc0, but as .30 crashed too, it
may be caused by switching from Vuze to Deluge (maybe it uses the network
more actively). Unfortunately, most crashes can't be detected with
netconsole, and they usually don't fit into 1280x1024 (I think the kernel
prints 2-3 or more backtraces of random processes like hostapd and scmpc
and then completely locks up). Here are two screenshots that look like
complete one-screen traces:
http://wrar.name/temp/P1010714.JPG - this is from .32-rc0
(v2.6.32-7123-g75b0803)
http://wrar.name/temp/P1010711.JPG - this is from 2.6.30 (stock ALT Linux)

Of course I have lots of "eth0: Transmit timed out" and some "WARNING: at
net/sched/sch_generic.c:255 dev_watchdog" before crash, as described at
http://bugzilla.kernel.org/show_bug.cgi?id=11663

-- 
WBR, wRAR (ALT Linux Team)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-19 11:12 via_rhine kernel crashes in 2.6.32 Andrey Rahmatullin
@ 2009-12-20 20:03 ` Andrey Rahmatullin
  2009-12-21 12:03   ` Christian Kujau
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Rahmatullin @ 2009-12-20 20:03 UTC (permalink / raw)
  To: linux-kernel, Roger Luethi

[-- Attachment #1: Type: text/plain, Size: 159 bytes --]

Here are the proper screenshots from 2.6.33-rc1:
http://wrar.name/temp/P1010724.JPG
http://wrar.name/temp/P1010725.JPG


-- 
WBR, wRAR (ALT Linux Team)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-20 20:03 ` Andrey Rahmatullin
@ 2009-12-21 12:03   ` Christian Kujau
  2009-12-21 12:36     ` Andrey Rahmatullin
  2009-12-21 18:18     ` Andrey Rahmatullin
  0 siblings, 2 replies; 16+ messages in thread
From: Christian Kujau @ 2009-12-21 12:03 UTC (permalink / raw)
  To: Andrey Rahmatullin; +Cc: LKML, Roger Luethi, netdev

[netdev Cc'ed]

On Mon, 21 Dec 2009 at 01:03, Andrey Rahmatullin wrote:
> Here are the proper screenshots from 2.6.33-rc1:
> http://wrar.name/temp/P1010724.JPG
> http://wrar.name/temp/P1010725.JPG

Wow, I had to rotate my laptop here. In you first posting [0] you said you 
"upgraded to .32" but had crashes in .30 too. What was the last working 
kernel?

In the .jpgs one can see something along the lines of 
schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could 
find touching these things [1] altogether is rather old, from 2.6.24. Is 
2.6.29 working for you? 

Christian.

[0] http://lkml.org/lkml/2009/12/19/36
[1] http://lists.openwall.net/netdev/2007/08/28/13
-- 
BOFH excuse #19:

floating point processor overflow

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-21 12:03   ` Christian Kujau
@ 2009-12-21 12:36     ` Andrey Rahmatullin
  2009-12-21 18:18     ` Andrey Rahmatullin
  1 sibling, 0 replies; 16+ messages in thread
From: Andrey Rahmatullin @ 2009-12-21 12:36 UTC (permalink / raw)
  To: Christian Kujau; +Cc: LKML, Roger Luethi, netdev

[-- Attachment #1: Type: text/plain, Size: 641 bytes --]

On Mon, Dec 21, 2009 at 04:03:06AM -0800, Christian Kujau wrote:
> Wow, I had to rotate my laptop here. In you first posting [0] you said you 
> "upgraded to .32" but had crashes in .30 too. What was the last working 
> kernel?
It was 2.6.31, but as I've already said I'm not sure it was because of
kernel version alone.

> In the .jpgs one can see something along the lines of 
> schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could 
> find touching these things [1] altogether is rather old, from 2.6.24. Is 
> 2.6.29 working for you? 
Well, I can try some old versions.

-- 
WBR, wRAR (ALT Linux Team)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-21 12:03   ` Christian Kujau
  2009-12-21 12:36     ` Andrey Rahmatullin
@ 2009-12-21 18:18     ` Andrey Rahmatullin
  2009-12-21 19:32       ` Christian Kujau
  1 sibling, 1 reply; 16+ messages in thread
From: Andrey Rahmatullin @ 2009-12-21 18:18 UTC (permalink / raw)
  To: Christian Kujau; +Cc: LKML, Roger Luethi, netdev

[-- Attachment #1: Type: text/plain, Size: 1057 bytes --]

On Mon, Dec 21, 2009 at 04:03:06AM -0800, Christian Kujau wrote:
> In the .jpgs one can see something along the lines of 
> schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could 
> find touching these things [1] altogether is rather old, from 2.6.24. Is 
> 2.6.29 working for you? 
I've installed 2.6.27, the earliest kernel supported by udev 149. I
started to download two large files to my second machine (maybe 700 or
1000 Kbyte/s combined). Nothing happened. I stopped the downloads and
started deluged. The kernel showed "netdev watchdog timeout", but nothing
else happened. deluged opened something like 100 TCP connections and
started to upload some data at ~20 Kbyte/s. Nothing happened. I resumed
one of downloads, waited for some time, nothing happened. I resumed the
second download, the kernel crashed into an endless stream of backtraces
(did 2.6.27 support pause_on_oops?), containing "whatever from the idle
thread", or smth like that, which was also in other crash logs.

-- 
WBR, wRAR (ALT Linux Team)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-21 18:18     ` Andrey Rahmatullin
@ 2009-12-21 19:32       ` Christian Kujau
  2009-12-22 12:32         ` Jarek Poplawski
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Kujau @ 2009-12-21 19:32 UTC (permalink / raw)
  To: Andrey Rahmatullin; +Cc: LKML, Roger Luethi, netdev

On Mon, 21 Dec 2009 at 23:18, Andrey Rahmatullin wrote:
> I've installed 2.6.27, the earliest kernel supported by udev 149. I
[...]
> one of downloads, waited for some time, nothing happened. I resumed the
> second download, the kernel crashed into an endless stream of backtraces
> (did 2.6.27 support pause_on_oops?), containing "whatever from the idle
> thread", or smth like that, which was also in other crash logs.

So, 2.6.27 crashed as well. Was the backtrace similar to those on 2.6.30? 
I know it's a long shot, but since you seem to be able to reproduce this 
pretty reliably, can you try 2.6.23? Or at least something before 
bea3348eef27e6044b6161fd04c3152215f96411 [0]?

Christian.

[0] I'm *really* guessing here, if some netdev guru has some better
    understanding of the backtraces Andrey sent, please step forward.
-- 
BOFH excuse #433:

error: one bad user found in front of screen

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-21 19:32       ` Christian Kujau
@ 2009-12-22 12:32         ` Jarek Poplawski
  2009-12-22 13:21           ` Jarek Poplawski
  0 siblings, 1 reply; 16+ messages in thread
From: Jarek Poplawski @ 2009-12-22 12:32 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev

On 21-12-2009 20:32, Christian Kujau wrote:
> On Mon, 21 Dec 2009 at 23:18, Andrey Rahmatullin wrote:
>> I've installed 2.6.27, the earliest kernel supported by udev 149. I
> [...]
>> one of downloads, waited for some time, nothing happened. I resumed the
>> second download, the kernel crashed into an endless stream of backtraces
>> (did 2.6.27 support pause_on_oops?), containing "whatever from the idle
>> thread", or smth like that, which was also in other crash logs.
> 
> So, 2.6.27 crashed as well. Was the backtrace similar to those on 2.6.30? 
> I know it's a long shot, but since you seem to be able to reproduce this 
> pretty reliably, can you try 2.6.23? Or at least something before 
> bea3348eef27e6044b6161fd04c3152215f96411 [0]?
> 
> Christian.
> 
> [0] I'm *really* guessing here, if some netdev guru has some better
>     understanding of the backtraces Andrey sent, please step forward.

It looks like napi_disable() should be illegal in ndo_tx_timeout().
Here is a patch which moves most of the timeout work to a workqueue,
similarly to tg3 etc. It should prevent at least one of reported
bugs. Alas I can't even check-compile it at the moment, so let me
know on any problems.

Jarek P.

---

 drivers/net/via-rhine.c |   41 ++++++++++++++++++++++++++++-------------
 1 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..125406b 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
 #include <linux/ethtool.h>
 #include <linux/crc32.h>
 #include <linux/bitops.h>
+#include <linux/workqueue.h>
 #include <asm/processor.h>	/* Processor type for cache alignment. */
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
 	struct net_device *dev;
 	struct napi_struct napi;
 	spinlock_t lock;
+	struct work_struct reset_task;
 
 	/* Frequently used values: keep some adjacent for cache effect. */
 	u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
 static int  mdio_read(struct net_device *dev, int phy_id, int location);
 static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
 static int  rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
 static void rhine_tx_timeout(struct net_device *dev);
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
 	dev->irq = pdev->irq;
 
 	spin_lock_init(&rp->lock);
+	INIT_WORK(&rp->reset_task, rhine_reset_task);
+
 	rp->mii_if.dev = dev;
 	rp->mii_if.mdio_read = mdio_read;
 	rp->mii_if.mdio_write = mdio_write;
@@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev)
 	return 0;
 }
 
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
 {
-	struct rhine_private *rp = netdev_priv(dev);
-	void __iomem *ioaddr = rp->base;
-
-	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
-	       "%4.4x, resetting...\n",
-	       dev->name, ioread16(ioaddr + IntrStatus),
-	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+	struct rhine_private *rp = container_of(work, struct rhine_private,
+					        reset_task);
+	struct net_device *dev = rp->dev;
 
 	/* protect against concurrent rx interrupts */
 	disable_irq(rp->pdev->irq);
 
 	napi_disable(&rp->napi);
 
-	spin_lock(&rp->lock);
+	spin_lock_irq(&rp->lock);
 
 	/* clear all descriptors */
 	free_tbufs(dev);
@@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev)
 	rhine_chip_reset(dev);
 	init_registers(dev);
 
-	spin_unlock(&rp->lock);
+	spin_unlock_irq(&rp->lock);
 	enable_irq(rp->pdev->irq);
 
 	dev->trans_start = jiffies;
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
 	netif_wake_queue(dev);
 }
 
+static void rhine_tx_timeout(struct net_device *dev)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	void __iomem *ioaddr = rp->base;
+
+	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+	       "%4.4x, resetting...\n",
+	       dev->name, ioread16(ioaddr + IntrStatus),
+	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+	schedule_work(&rp->reset_task);
+}
+
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev)
 {
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
 	struct rhine_private *rp = netdev_priv(dev);
 	void __iomem *ioaddr = rp->base;
 
-	spin_lock_irq(&rp->lock);
-
-	netif_stop_queue(dev);
 	napi_disable(&rp->napi);
+	cancel_work_sync(&rp->reset_task);
+	netif_stop_queue(dev);
+
+	spin_lock_irq(&rp->lock);
 
 	if (debug > 1)
 		printk(KERN_DEBUG "%s: Shutting down ethercard, "

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-22 12:32         ` Jarek Poplawski
@ 2009-12-22 13:21           ` Jarek Poplawski
  2009-12-22 13:38             ` Jarek Poplawski
  0 siblings, 1 reply; 16+ messages in thread
From: Jarek Poplawski @ 2009-12-22 13:21 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev

On Tue, Dec 22, 2009 at 12:32:11PM +0000, Jarek Poplawski wrote:
> It looks like napi_disable() should be illegal in ndo_tx_timeout().
> Here is a patch which moves most of the timeout work to a workqueue,
> similarly to tg3 etc. It should prevent at least one of reported
> bugs. Alas I can't even check-compile it at the moment, so let me
> know on any problems.

It seems I needlessly changed locking btw, so here it is again.

Jarek P.

--- (take 2)

 drivers/net/via-rhine.c |   37 ++++++++++++++++++++++++++-----------
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..15a4063 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
 #include <linux/ethtool.h>
 #include <linux/crc32.h>
 #include <linux/bitops.h>
+#include <linux/workqueue.h>
 #include <asm/processor.h>	/* Processor type for cache alignment. */
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
 	struct net_device *dev;
 	struct napi_struct napi;
 	spinlock_t lock;
+	struct work_struct reset_task;
 
 	/* Frequently used values: keep some adjacent for cache effect. */
 	u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
 static int  mdio_read(struct net_device *dev, int phy_id, int location);
 static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
 static int  rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
 static void rhine_tx_timeout(struct net_device *dev);
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
 	dev->irq = pdev->irq;
 
 	spin_lock_init(&rp->lock);
+	INIT_WORK(&rp->reset_task, rhine_reset_task);
+
 	rp->mii_if.dev = dev;
 	rp->mii_if.mdio_read = mdio_read;
 	rp->mii_if.mdio_write = mdio_write;
@@ -1179,15 +1184,11 @@ static int rhine_open(struct net_device *dev)
 	return 0;
 }
 
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
 {
-	struct rhine_private *rp = netdev_priv(dev);
-	void __iomem *ioaddr = rp->base;
-
-	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
-	       "%4.4x, resetting...\n",
-	       dev->name, ioread16(ioaddr + IntrStatus),
-	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+	struct rhine_private *rp = container_of(work, struct rhine_private,
+					        reset_task);
+	struct net_device *dev = rp->dev;
 
 	/* protect against concurrent rx interrupts */
 	disable_irq(rp->pdev->irq);
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
 	netif_wake_queue(dev);
 }
 
+static void rhine_tx_timeout(struct net_device *dev)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	void __iomem *ioaddr = rp->base;
+
+	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+	       "%4.4x, resetting...\n",
+	       dev->name, ioread16(ioaddr + IntrStatus),
+	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+	schedule_work(&rp->reset_task);
+}
+
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev)
 {
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
 	struct rhine_private *rp = netdev_priv(dev);
 	void __iomem *ioaddr = rp->base;
 
-	spin_lock_irq(&rp->lock);
-
-	netif_stop_queue(dev);
 	napi_disable(&rp->napi);
+	cancel_work_sync(&rp->reset_task);
+	netif_stop_queue(dev);
+
+	spin_lock_irq(&rp->lock);
 
 	if (debug > 1)
 		printk(KERN_DEBUG "%s: Shutting down ethercard, "

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-22 13:21           ` Jarek Poplawski
@ 2009-12-22 13:38             ` Jarek Poplawski
  2009-12-22 15:00               ` Andrey Rahmatullin
  0 siblings, 1 reply; 16+ messages in thread
From: Jarek Poplawski @ 2009-12-22 13:38 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev

On Tue, Dec 22, 2009 at 01:21:07PM +0000, Jarek Poplawski wrote:
> On Tue, Dec 22, 2009 at 12:32:11PM +0000, Jarek Poplawski wrote:
> > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > Here is a patch which moves most of the timeout work to a workqueue,
> > similarly to tg3 etc. It should prevent at least one of reported
> > bugs. Alas I can't even check-compile it at the moment, so let me
> > know on any problems.
> 
> It seems I needlessly changed locking btw, so here it is again.

Hmm... On the other hand, it definitely needs at least _bh now...

Sorry,
Jarek P.
 
--- (take 3)

 drivers/net/via-rhine.c |   41 ++++++++++++++++++++++++++++-------------
 1 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..125406b 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
 #include <linux/ethtool.h>
 #include <linux/crc32.h>
 #include <linux/bitops.h>
+#include <linux/workqueue.h>
 #include <asm/processor.h>	/* Processor type for cache alignment. */
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
 	struct net_device *dev;
 	struct napi_struct napi;
 	spinlock_t lock;
+	struct work_struct reset_task;
 
 	/* Frequently used values: keep some adjacent for cache effect. */
 	u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
 static int  mdio_read(struct net_device *dev, int phy_id, int location);
 static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
 static int  rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
 static void rhine_tx_timeout(struct net_device *dev);
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
 	dev->irq = pdev->irq;
 
 	spin_lock_init(&rp->lock);
+	INIT_WORK(&rp->reset_task, rhine_reset_task);
+
 	rp->mii_if.dev = dev;
 	rp->mii_if.mdio_read = mdio_read;
 	rp->mii_if.mdio_write = mdio_write;
@@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev)
 	return 0;
 }
 
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
 {
-	struct rhine_private *rp = netdev_priv(dev);
-	void __iomem *ioaddr = rp->base;
-
-	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
-	       "%4.4x, resetting...\n",
-	       dev->name, ioread16(ioaddr + IntrStatus),
-	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+	struct rhine_private *rp = container_of(work, struct rhine_private,
+					        reset_task);
+	struct net_device *dev = rp->dev;
 
 	/* protect against concurrent rx interrupts */
 	disable_irq(rp->pdev->irq);
 
 	napi_disable(&rp->napi);
 
-	spin_lock(&rp->lock);
+	spin_lock_bh(&rp->lock);
 
 	/* clear all descriptors */
 	free_tbufs(dev);
@@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev)
 	rhine_chip_reset(dev);
 	init_registers(dev);
 
-	spin_unlock(&rp->lock);
+	spin_unlock_bh(&rp->lock);
 	enable_irq(rp->pdev->irq);
 
 	dev->trans_start = jiffies;
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
 	netif_wake_queue(dev);
 }
 
+static void rhine_tx_timeout(struct net_device *dev)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	void __iomem *ioaddr = rp->base;
+
+	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+	       "%4.4x, resetting...\n",
+	       dev->name, ioread16(ioaddr + IntrStatus),
+	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+	schedule_work(&rp->reset_task);
+}
+
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev)
 {
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
 	struct rhine_private *rp = netdev_priv(dev);
 	void __iomem *ioaddr = rp->base;
 
-	spin_lock_irq(&rp->lock);
-
-	netif_stop_queue(dev);
 	napi_disable(&rp->napi);
+	cancel_work_sync(&rp->reset_task);
+	netif_stop_queue(dev);
+
+	spin_lock_irq(&rp->lock);
 
 	if (debug > 1)
 		printk(KERN_DEBUG "%s: Shutting down ethercard, "

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-22 13:38             ` Jarek Poplawski
@ 2009-12-22 15:00               ` Andrey Rahmatullin
  2009-12-22 15:26                 ` Roger Luethi
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Rahmatullin @ 2009-12-22 15:00 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Christian Kujau, LKML, Roger Luethi, netdev

[-- Attachment #1: Type: text/plain, Size: 645 bytes --]

On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
> > > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > > Here is a patch which moves most of the timeout work to a workqueue,
> > > similarly to tg3 etc. It should prevent at least one of reported
> > > bugs. Alas I can't even check-compile it at the moment, so let me
> > > know on any problems.
> > It seems I needlessly changed locking btw, so here it is again.
> Hmm... On the other hand, it definitely needs at least _bh now...
I've tried this patch. There are lots of "Transmit timed out", but no
crashes.

-- 
WBR, wRAR (ALT Linux Team)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-22 15:00               ` Andrey Rahmatullin
@ 2009-12-22 15:26                 ` Roger Luethi
  2009-12-22 17:36                   ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
  2009-12-23  9:52                   ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
  0 siblings, 2 replies; 16+ messages in thread
From: Roger Luethi @ 2009-12-22 15:26 UTC (permalink / raw)
  To: Andrey Rahmatullin; +Cc: Jarek Poplawski, Christian Kujau, LKML, netdev

On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote:
> On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
> > > > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > > > Here is a patch which moves most of the timeout work to a workqueue,
> > > > similarly to tg3 etc. It should prevent at least one of reported
> > > > bugs. Alas I can't even check-compile it at the moment, so let me
> > > > know on any problems.
> > > It seems I needlessly changed locking btw, so here it is again.
> > Hmm... On the other hand, it definitely needs at least _bh now...
> I've tried this patch. There are lots of "Transmit timed out", but no
> crashes.

ACK. Looks like you guys tracked down the crashing and fixed it (thanks!).
I suspect we shouldn't have to reset due to timeouts that often, but that's
another story.

Roger

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] net/via-rhine: Fix scheduling while atomic bugs
  2009-12-22 15:26                 ` Roger Luethi
@ 2009-12-22 17:36                   ` Jarek Poplawski
  2009-12-24  5:54                     ` David Miller
  2009-12-23  9:52                   ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
  1 sibling, 1 reply; 16+ messages in thread
From: Jarek Poplawski @ 2009-12-22 17:36 UTC (permalink / raw)
  To: David Miller
  Cc: Roger Luethi, Andrey Rahmatullin, Christian Kujau, LKML, netdev

On Tue, Dec 22, 2009 at 04:26:59PM +0100, Roger Luethi wrote:
> On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote:
> > On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
> > > > > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > > > > Here is a patch which moves most of the timeout work to a workqueue,
> > > > > similarly to tg3 etc. It should prevent at least one of reported
> > > > > bugs. Alas I can't even check-compile it at the moment, so let me
> > > > > know on any problems.
> > > > It seems I needlessly changed locking btw, so here it is again.
> > > Hmm... On the other hand, it definitely needs at least _bh now...
> > I've tried this patch. There are lots of "Transmit timed out", but no
> > crashes.
> 
> ACK. Looks like you guys tracked down the crashing and fixed it (thanks!).
> I suspect we shouldn't have to reset due to timeouts that often, but that's
> another story.
> 
> Roger

Thanks everybody,
Jarek P.
------------------->

There are BUGs "scheduling while atomic" triggered by the timer
rhine_tx_timeout(). They are caused by calling napi_disable() (with
msleep()). This patch fixes it by moving most of the timer content to
the workqueue function (similarly to other drivers, like tg3), with
spin_lock() changed to BH version.

Additionally, there is spin_lock_irq() moved in rhine_close() to
exclude napi_disable() etc., also tg3's way.

Reported-by: Andrey Rahmatullin <wrar@altlinux.org>
Tested-by: Andrey Rahmatullin <wrar@altlinux.org>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Roger Luethi <rl@hellgate.ch>
---

 drivers/net/via-rhine.c |   41 ++++++++++++++++++++++++++++-------------
 1 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..125406b 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
 #include <linux/ethtool.h>
 #include <linux/crc32.h>
 #include <linux/bitops.h>
+#include <linux/workqueue.h>
 #include <asm/processor.h>	/* Processor type for cache alignment. */
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
 	struct net_device *dev;
 	struct napi_struct napi;
 	spinlock_t lock;
+	struct work_struct reset_task;
 
 	/* Frequently used values: keep some adjacent for cache effect. */
 	u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
 static int  mdio_read(struct net_device *dev, int phy_id, int location);
 static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
 static int  rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
 static void rhine_tx_timeout(struct net_device *dev);
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
 	dev->irq = pdev->irq;
 
 	spin_lock_init(&rp->lock);
+	INIT_WORK(&rp->reset_task, rhine_reset_task);
+
 	rp->mii_if.dev = dev;
 	rp->mii_if.mdio_read = mdio_read;
 	rp->mii_if.mdio_write = mdio_write;
@@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev)
 	return 0;
 }
 
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
 {
-	struct rhine_private *rp = netdev_priv(dev);
-	void __iomem *ioaddr = rp->base;
-
-	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
-	       "%4.4x, resetting...\n",
-	       dev->name, ioread16(ioaddr + IntrStatus),
-	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+	struct rhine_private *rp = container_of(work, struct rhine_private,
+						reset_task);
+	struct net_device *dev = rp->dev;
 
 	/* protect against concurrent rx interrupts */
 	disable_irq(rp->pdev->irq);
 
 	napi_disable(&rp->napi);
 
-	spin_lock(&rp->lock);
+	spin_lock_bh(&rp->lock);
 
 	/* clear all descriptors */
 	free_tbufs(dev);
@@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev)
 	rhine_chip_reset(dev);
 	init_registers(dev);
 
-	spin_unlock(&rp->lock);
+	spin_unlock_bh(&rp->lock);
 	enable_irq(rp->pdev->irq);
 
 	dev->trans_start = jiffies;
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
 	netif_wake_queue(dev);
 }
 
+static void rhine_tx_timeout(struct net_device *dev)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	void __iomem *ioaddr = rp->base;
+
+	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+	       "%4.4x, resetting...\n",
+	       dev->name, ioread16(ioaddr + IntrStatus),
+	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+	schedule_work(&rp->reset_task);
+}
+
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev)
 {
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
 	struct rhine_private *rp = netdev_priv(dev);
 	void __iomem *ioaddr = rp->base;
 
-	spin_lock_irq(&rp->lock);
-
-	netif_stop_queue(dev);
 	napi_disable(&rp->napi);
+	cancel_work_sync(&rp->reset_task);
+	netif_stop_queue(dev);
+
+	spin_lock_irq(&rp->lock);
 
 	if (debug > 1)
 		printk(KERN_DEBUG "%s: Shutting down ethercard, "



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-22 15:26                 ` Roger Luethi
  2009-12-22 17:36                   ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
@ 2009-12-23  9:52                   ` Jarek Poplawski
  2009-12-23 16:21                     ` Andrey Rahmatullin
  1 sibling, 1 reply; 16+ messages in thread
From: Jarek Poplawski @ 2009-12-23  9:52 UTC (permalink / raw)
  To: Roger Luethi; +Cc: Andrey Rahmatullin, Christian Kujau, LKML, netdev

On 22-12-2009 16:26, Roger Luethi wrote:
> On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote:
>> On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
>>>>> It looks like napi_disable() should be illegal in ndo_tx_timeout().
>>>>> Here is a patch which moves most of the timeout work to a workqueue,
>>>>> similarly to tg3 etc. It should prevent at least one of reported
>>>>> bugs. Alas I can't even check-compile it at the moment, so let me
>>>>> know on any problems.
>>>> It seems I needlessly changed locking btw, so here it is again.
>>> Hmm... On the other hand, it definitely needs at least _bh now...
>> I've tried this patch. There are lots of "Transmit timed out", but no
>> crashes.
> 
> ACK. Looks like you guys tracked down the crashing and fixed it (thanks!).
> I suspect we shouldn't have to reset due to timeouts that often, but that's
> another story.

BTW, it seems a change in 2.6.31 might trigger these timeouts more
often than before. Andrey, could you try if this matters here?

Thanks,
Jarek P.

--- (on top of net-2.6 with the previous "Fix scheduling..." patch)

diff -Nurp a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
--- a/drivers/net/via-rhine.c	2009-12-23 09:28:25.000000000 +0000
+++ b/drivers/net/via-rhine.c	2009-12-23 09:33:57.000000000 +0000
@@ -1226,6 +1226,7 @@ static void rhine_tx_timeout(struct net_
 	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
 
 	schedule_work(&rp->reset_task);
+	netdev_get_tx_queue(dev, 0)->trans_start = jiffies;
 }
 
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-23  9:52                   ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
@ 2009-12-23 16:21                     ` Andrey Rahmatullin
  2009-12-23 16:30                       ` Jarek Poplawski
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Rahmatullin @ 2009-12-23 16:21 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Roger Luethi, Christian Kujau, LKML, netdev

On Wed, Dec 23, 2009 at 09:52:03AM +0000, Jarek Poplawski wrote:
> BTW, it seems a change in 2.6.31 might trigger these timeouts more
> often than before. Andrey, could you try if this matters here?
They appear once per 4 seconds with or without this patch.

-- 
WBR, wRAR (ALT Linux Team)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-23 16:21                     ` Andrey Rahmatullin
@ 2009-12-23 16:30                       ` Jarek Poplawski
  0 siblings, 0 replies; 16+ messages in thread
From: Jarek Poplawski @ 2009-12-23 16:30 UTC (permalink / raw)
  To: Andrey Rahmatullin; +Cc: Roger Luethi, Christian Kujau, LKML, netdev

Andrey Rahmatullin wrote, On 12/23/2009 05:21 PM:

> On Wed, Dec 23, 2009 at 09:52:03AM +0000, Jarek Poplawski wrote:
>> BTW, it seems a change in 2.6.31 might trigger these timeouts more
>> often than before. Andrey, could you try if this matters here?
> They appear once per 4 seconds with or without this patch.
> 

OK, thanks for checking this.

Jarek P.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net/via-rhine: Fix scheduling while atomic bugs
  2009-12-22 17:36                   ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
@ 2009-12-24  5:54                     ` David Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2009-12-24  5:54 UTC (permalink / raw)
  To: jarkao2; +Cc: rl, wrar, lists, linux-kernel, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 22 Dec 2009 18:36:42 +0100

> There are BUGs "scheduling while atomic" triggered by the timer
> rhine_tx_timeout(). They are caused by calling napi_disable() (with
> msleep()). This patch fixes it by moving most of the timer content to
> the workqueue function (similarly to other drivers, like tg3), with
> spin_lock() changed to BH version.
> 
> Additionally, there is spin_lock_irq() moved in rhine_close() to
> exclude napi_disable() etc., also tg3's way.
> 
> Reported-by: Andrey Rahmatullin <wrar@altlinux.org>
> Tested-by: Andrey Rahmatullin <wrar@altlinux.org>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Applied, thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2009-12-24  5:54 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-19 11:12 via_rhine kernel crashes in 2.6.32 Andrey Rahmatullin
2009-12-20 20:03 ` Andrey Rahmatullin
2009-12-21 12:03   ` Christian Kujau
2009-12-21 12:36     ` Andrey Rahmatullin
2009-12-21 18:18     ` Andrey Rahmatullin
2009-12-21 19:32       ` Christian Kujau
2009-12-22 12:32         ` Jarek Poplawski
2009-12-22 13:21           ` Jarek Poplawski
2009-12-22 13:38             ` Jarek Poplawski
2009-12-22 15:00               ` Andrey Rahmatullin
2009-12-22 15:26                 ` Roger Luethi
2009-12-22 17:36                   ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
2009-12-24  5:54                     ` David Miller
2009-12-23  9:52                   ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
2009-12-23 16:21                     ` Andrey Rahmatullin
2009-12-23 16:30                       ` Jarek Poplawski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).