* 3c59x: shared interrupt problem
@ 2009-03-09 22:42 Gerhard Pircher
  2009-03-09 23:49 ` Stephen Hemminger
  2009-03-27  7:59 ` Steffen Klassert
  0 siblings, 2 replies; 16+ messages in thread
From: Gerhard Pircher @ 2009-03-09 22:42 UTC (permalink / raw)
  To: netdev

Hi!

Large network transfers fail on my machine (with kernel versions newer
than v2.6.26) with the kernel oops below. eth0 (3c59x driver) normally
shares its IRQ line with 3 OHCI USB ports (IRQ 7), as the excerpt of
/proc/interrupts shows. Removing USB support from the kernel makes it
work again. I wasn't able to do a full git bisect run yet, as v2.6.27
didn't produce a bootable kernel image for my machine. The machine is
an AmigaOne PowerPC G4 with an onboard 3c920 network chip.

Any idea?

best regards,

Gerhard

PS: Please put me on CC:, as I'm not subscribed to this mailing list.

/proc/interrupts:
           CPU0
  1:       1648   i8259     Level     i8042
  5:          0   i8259     Level     uhci_hcd:usb4, uhci_hcd:usb5
  6:          4   i8259     Level     floppy
  7:     236520   i8259     Level     ohci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3, eth0
  8:          2   i8259     Level     rtc0
  9:          0   i8259     Level     eth2
 12:        117   i8259     Level     i8042
 14:       8277   i8259     Level     ide0
 15:      17559   i8259     Level     ide1
BAD:          1

Kernel log:
Badness at net/sched/sch_generic.c:226
NIP: c0250118 LR: c0250118 CTR: c0013020
REGS: efffde90 TRAP: 0700   Not tainted  (2.6.29-rc6)
MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 42024024  XER: 00000000
TASK = c03915a0[0] 'swapper' THREAD: c03b2000
GPR00: c0250118 efffdf40 c03915a0 00000035 00008a62 ffffffff ffffffff 00000000 
GPR08: 00000000 c03c0000 00008a62 c0393104 22024042 00000000 0ffd5900 0080044c 
GPR16: 00000001 ffffffff 00000000 007ffc00 0ffd3158 0f0689b0 0ffff220 007ffbc0 
GPR24: 00000000 00000000 0000000a 00000004 efffc000 c024ffb0 00000100 ef847000 
NIP [c0250118] dev_watchdog+0x168/0x244
LR [c0250118] dev_watchdog+0x168/0x244
Call Trace:
[efffdf40] [c0250118] dev_watchdog+0x168/0x244 (unreliable)
[efffdfa0] [c002f564] run_timer_softirq+0x12c/0x1b4
[efffdfd0] [c002ab0c] __do_softirq+0x6c/0x108
[efffdff0] [c0011ef0] call_do_softirq+0x14/0x24
[c03b3e90] [c0006c30] do_softirq+0x64/0x88
[c03b3eb0] [c002a968] irq_exit+0x38/0x7c
[c03b3ec0] [c000f634] timer_interrupt+0x138/0x150
[c03b3ee0] [c0012bd4] ret_from_except+0x0/0x14
--- Exception: 901 at cpu_idle+0xa4/0xec
    LR = cpu_idle+0x98/0xec
[c03b3fa0] [c0009f38] cpu_idle+0x4c/0xec (unreliable)
[c03b3fb0] [c0297214] __got2_end+0x58/0x68
[c03b3fc0] [c03637e4] start_kernel+0x28c/0x2a0
[c03b3ff0] [0000380c] 0x380c
Instruction dump:
80099d6c 2f800000 40be0038 38810008 7fe3fb78 38a00040 4bfee811 7fe4fb78 
7c651b78 3c60c034 3863f264 4bdd6005 <0fe00000> 38000001 3d20c03c 90099d6c 
eth0: transmit timed out, tx_status 00 status e601.
  diagnostics: net 0ccc media 8880 dma 0000003a fifo 0000
eth0: Interrupt posted but not delivered -- IRQ blocked by another device?
  Flags; bus-master 1, dirty 16(0) current 16(0)
  Transmit list 00000000 vs. f101a200.
  0: @f101a200  length 80000156 status 00010156
  1: @f101a2a0  length 80000156 status 00010156
  2: @f101a340  length 80000156 status 00010156
  3: @f101a3e0  length 80000156 status 00010156
  4: @f101a480  length 80000156 status 00010156
  5: @f101a520  length 80000156 status 00010156
  6: @f101a5c0  length 80000156 status 00010156
  7: @f101a660  length 80000156 status 00010156
  8: @f101a700  length 8000003c status 0001003c
  9: @f101a7a0  length 8000003c status 0001003c
  10: @f101a840  length 8000003c status 0001003c
  11: @f101a8e0  length 8000003c status 0001003c
  12: @f101a980  length 8000003c status 0001003c
  13: @f101aa20  length 8000003c status 0001003c
  14: @f101aac0  length 80000036 status 80010036
  15: @f101ab60  length 800000f5 status 8c0100f5
eth0: Resetting the Tx ring pointer.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 3c59x: shared interrupt problem
  2009-03-09 22:42 3c59x: shared interrupt problem Gerhard Pircher
@ 2009-03-09 23:49 ` Stephen Hemminger
  2009-03-10  8:16   ` Steffen Klassert
  2009-03-27  7:59 ` Steffen Klassert
  1 sibling, 1 reply; 16+ messages in thread
From: Stephen Hemminger @ 2009-03-09 23:49 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: netdev

On Mon, 09 Mar 2009 23:42:53 +0100
"Gerhard Pircher" <gerhard_pircher@gmx.net> wrote:

> Hi!
> 
> Large network transfers fail on my machine (with kernel versions newer
> than v2.6.26) with the kernel oops below. eth0 (3c59x driver) normally
> shares its IRQ line with 3 OHCI USB ports (IRQ 7), as the excerpt of
> /proc/interrupts shows. Removing USB support from the kernel makes it
> work again. I wasn't able to do a full git bisect run yet, as v2.6.27
> didn't produce a bootable kernel image for my machine. The machine is
> an AmigaOne PowerPC G4 with an onboard 3c920 network chip.
> 
> Any idea?

Does this help? It looks like boomerang_interrupt() was not handling
shared IRQs correctly.

--- a/drivers/net/3c59x.c	2009-03-09 16:07:13.372670015 -0700
+++ b/drivers/net/3c59x.c	2009-03-09 16:08:50.214357441 -0700
@@ -2301,6 +2301,7 @@ boomerang_interrupt(int irq, void *dev_i
 	void __iomem *ioaddr;
 	int status;
 	int work_done = max_interrupt_work;
+	int handled = 0;
 
 	ioaddr = vp->ioaddr;
 
@@ -2323,6 +2324,7 @@ boomerang_interrupt(int irq, void *dev_i
 			printk(KERN_DEBUG "boomerang_interrupt(1): status = 0xffff\n");
 		goto handler_exit;
 	}
+	handled = 1;
 
 	if (status & IntReq) {
 		status |= vp->deferred;
@@ -2417,7 +2419,7 @@ boomerang_interrupt(int irq, void *dev_i
 			   dev->name, status);
 handler_exit:
 	spin_unlock(&vp->lock);
-	return IRQ_HANDLED;
+	return IRQ_RETVAL(handled);
 }
 
 static int vortex_rx(struct net_device *dev)
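
The effect of the return value on a shared line can be modelled in user
space. The sketch below is not driver code: every model_ name is
hypothetical, and the dispatch loop only approximates the idea that each
handler on a shared line is asked in turn and that "nobody cared" is the
case where none of them claims the interrupt. It shows why a handler that
always reports "handled" hides an unclaimed interrupt.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the two handler results discussed in the patch (hypothetical names). */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

struct model_dev {
    unsigned int status;    /* simulated interrupt status register */
    unsigned int latch_bit; /* bit meaning "this device raised the IRQ" */
};

/* Claims the interrupt only if this device actually raised it. */
static enum model_irqreturn honest_handler(struct model_dev *dev)
{
    if ((dev->status & dev->latch_bit) == 0)
        return MODEL_IRQ_NONE;          /* shared line: not ours */
    dev->status &= ~dev->latch_bit;     /* acknowledge our interrupt */
    return MODEL_IRQ_HANDLED;
}

/* Claims every interrupt, whether or not the device raised it. */
static enum model_irqreturn always_handled_handler(struct model_dev *dev)
{
    dev->status &= ~dev->latch_bit;
    return MODEL_IRQ_HANDLED;
}

typedef enum model_irqreturn (*model_handler_t)(struct model_dev *);

struct model_action {
    model_handler_t handler;
    struct model_dev *dev;
};

/*
 * Ask every handler on the shared line. Returns 1 if at least one
 * handler claimed the interrupt, 0 if none did ("nobody cared").
 */
static int model_dispatch(const struct model_action *actions, size_t n)
{
    int claimed = 0;
    size_t i;

    assert(actions != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (actions[i].handler(actions[i].dev) == MODEL_IRQ_HANDLED)
            claimed = 1;
    }
    return claimed;
}
```

With two honest handlers and no device asserting its latch bit,
model_dispatch() returns 0. Swapping in always_handled_handler for one of
them makes it return 1 even though no device raised the interrupt.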



* Re: 3c59x: shared interrupt problem
  2009-03-09 23:49 ` Stephen Hemminger
@ 2009-03-10  8:16   ` Steffen Klassert
  2009-03-10 21:55     ` Andrew Morton
       [not found]     ` <20090310090053.322240@gmx.net>
  0 siblings, 2 replies; 16+ messages in thread
From: Steffen Klassert @ 2009-03-10  8:16 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Gerhard Pircher, Andrew Morton, netdev

On Mon, Mar 09, 2009 at 04:49:27PM -0700, Stephen Hemminger wrote:
> On Mon, 09 Mar 2009 23:42:53 +0100
> "Gerhard Pircher" <gerhard_pircher@gmx.net> wrote:
> 
> > Hi!
> > 
> > Large network transfers fail on my machine (with kernel versions newer
> > than v2.6.26) with the kernel oops below. eth0 (3c59x driver) normally
> > shares its IRQ line with 3 OHCI USB ports (IRQ 7), as the excerpt of
> > /proc/interrupts shows. Removing USB support from the kernel makes it
> > work again. I wasn't able to do a full git bisect run yet, as v2.6.27
> > didn't produce a bootable kernel image for my machine. The machine is
> > an AmigaOne PowerPC G4 with an onboard 3c920 network chip.
> > 
> > Any idea?
> 
> Does this help, it looks like boomerang_interrupt was not doing
> shared irq stuff correctly.
> 
> --- a/drivers/net/3c59x.c	2009-03-09 16:07:13.372670015 -0700
> +++ b/drivers/net/3c59x.c	2009-03-09 16:08:50.214357441 -0700
> @@ -2301,6 +2301,7 @@ boomerang_interrupt(int irq, void *dev_i
>  	void __iomem *ioaddr;
>  	int status;
>  	int work_done = max_interrupt_work;
> +	int handled = 0;
>  
>  	ioaddr = vp->ioaddr;
>  
> @@ -2323,6 +2324,7 @@ boomerang_interrupt(int irq, void *dev_i
>  			printk(KERN_DEBUG "boomerang_interrupt(1): status = 0xffff\n");
>  		goto handler_exit;
>  	}
> +	handled = 1;
>  
>  	if (status & IntReq) {
>  		status |= vp->deferred;
> @@ -2417,7 +2419,7 @@ boomerang_interrupt(int irq, void *dev_i
>  			   dev->name, status);
>  handler_exit:
>  	spin_unlock(&vp->lock);
> -	return IRQ_HANDLED;
> +	return IRQ_RETVAL(handled);
>  }
>  
>  static int vortex_rx(struct net_device *dev)
> 

This basically reverts a patch from akpm (BitKeeper cset 1.1046.95.8).
That patch was a workaround for lots of "nobody cared" warnings generated
by boomerang_interrupt().
I added Andrew to the Cc; perhaps he can remember some details on this.

Steffen


* Re: 3c59x: shared interrupt problem
  2009-03-10  8:16   ` Steffen Klassert
@ 2009-03-10 21:55     ` Andrew Morton
  2009-03-11 11:38       ` Steffen Klassert
       [not found]     ` <20090310090053.322240@gmx.net>
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2009-03-10 21:55 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: shemminger, gerhard_pircher, netdev

On Tue, 10 Mar 2009 09:16:28 +0100
Steffen Klassert <klassert@mathematik.tu-chemnitz.de> wrote:

> On Mon, Mar 09, 2009 at 04:49:27PM -0700, Stephen Hemminger wrote:
> > On Mon, 09 Mar 2009 23:42:53 +0100
> > "Gerhard Pircher" <gerhard_pircher@gmx.net> wrote:
> > 
> > > Hi!
> > > 
> > > Large network transfers fail on my machine (with kernel versions newer
> > > than v2.6.26) with the kernel oops below. eth0 (3c59x driver) normally
> > > shares its IRQ line with 3 OHCI USB ports (IRQ 7), as the excerpt of
> > > /proc/interrupts shows. Removing USB support from the kernel makes it
> > > work again. I wasn't able to do a full git bisect run yet, as v2.6.27
> > > didn't produce a bootable kernel image for my machine. The machine is
> > > an AmigaOne PowerPC G4 with an onboard 3c920 network chip.
> > > 
> > > Any idea?
> > 
> > Does this help, it looks like boomerang_interrupt was not doing
> > shared irq stuff correctly.
> > 
> > --- a/drivers/net/3c59x.c	2009-03-09 16:07:13.372670015 -0700
> > +++ b/drivers/net/3c59x.c	2009-03-09 16:08:50.214357441 -0700
> > @@ -2301,6 +2301,7 @@ boomerang_interrupt(int irq, void *dev_i
> >  	void __iomem *ioaddr;
> >  	int status;
> >  	int work_done = max_interrupt_work;
> > +	int handled = 0;
> >  
> >  	ioaddr = vp->ioaddr;
> >  
> > @@ -2323,6 +2324,7 @@ boomerang_interrupt(int irq, void *dev_i
> >  			printk(KERN_DEBUG "boomerang_interrupt(1): status = 0xffff\n");
> >  		goto handler_exit;
> >  	}
> > +	handled = 1;
> >  
> >  	if (status & IntReq) {
> >  		status |= vp->deferred;
> > @@ -2417,7 +2419,7 @@ boomerang_interrupt(int irq, void *dev_i
> >  			   dev->name, status);
> >  handler_exit:
> >  	spin_unlock(&vp->lock);
> > -	return IRQ_HANDLED;
> > +	return IRQ_RETVAL(handled);
> >  }
> >  
> >  static int vortex_rx(struct net_device *dev)
> > 
> 
> This basically reverts a patch from akpm (bitkeeper cset 1.1046.95.8)
> This patch was to workaround lots of "nobody cared" warnings generated by
> boomerang_interrupt(). 
> I added Andrew to the Cc, perhaps he can remember some details on this.
> 

Beats me.  Do you have a full copy of that patch, including changelog?

Thanks.


* Re: 3c59x: shared interrupt problem
       [not found]     ` <20090310090053.322240@gmx.net>
@ 2009-03-11 11:31       ` Steffen Klassert
  0 siblings, 0 replies; 16+ messages in thread
From: Steffen Klassert @ 2009-03-11 11:31 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: shemminger, netdev, akpm

On Tue, Mar 10, 2009 at 10:00:53AM +0100, Gerhard Pircher wrote:
> > 
> > This basically reverts a patch from akpm (bitkeeper cset 1.1046.95.8)
> > This patch was to workaround lots of "nobody cared" warnings generated
> > by boomerang_interrupt().
> > I added Andrew to the Cc, perhaps he can remember some details on this.
> I'm afraid this patch didn't fix the problem. I'm using scp to copy a big
> ISO file from my PC to the AmigaOne and the network transfer still stalls.
> I made a photo from the kernel oops printed out during shutdown.

Your photo shows exactly the kind of "nobody cared" warning that Andrew
wanted to get rid of with his patch.

> BTW: shouldn't the driver use vortex_interrupt() to handle interrupts for
> a 3c920?
> 

Which ISR the driver uses depends on the content of your NIC's EEPROM.
If your NIC is full bus master capable, boomerang_interrupt() is used;
if not, vortex_interrupt() is used. As far as I know, the 3c920 is of the
"tornado" type, so it should be full bus master capable.
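
The selection rule described above can be sketched as follows. This is an
illustration, not the driver's code: model_nic, the plain
full_bus_master_rx flag, and model_pick_isr() are assumed names, and how
the driver actually derives the capability from the EEPROM is not shown
in this thread.

```c
#include <assert.h>

/* Stand-ins for the two interrupt service routines named above. */
enum model_isr { MODEL_VORTEX_ISR, MODEL_BOOMERANG_ISR };

struct model_nic {
    int full_bus_master_rx; /* nonzero if the (simulated) EEPROM reports it */
};

/* Pick the ISR from the bus-master capability, as the reply describes. */
static enum model_isr model_pick_isr(const struct model_nic *nic)
{
    assert(nic != 0);
    return nic->full_bus_master_rx ? MODEL_BOOMERANG_ISR : MODEL_VORTEX_ISR;
}
```

Under this model, a card that reports full bus mastering (as the 3c920 is
expected to) selects MODEL_BOOMERANG_ISR, and any other card selects
MODEL_VORTEX_ISR.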

Your first report showed dnComplete as a pending interrupt source.
Since the driver uses tx_interrupt_mitigation, this happens only when the
tx ring is full. Could you please test the patch below? This disables
tx_interrupt_mitigation, so the dnComplete interrupt will be triggered
with every packet.

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index b2563d3..c45c400 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -60,7 +60,7 @@ static int watchdog = 5000;
  * of possible Tx stalls if the system is blocking interrupts
  * somewhere else.  Undefine this to disable.
  */
-#define tx_interrupt_mitigation 1
+#define tx_interrupt_mitigation 0
 
 /* Put out somewhat more debugging messages. (0: no msg, 1 minimal .. 6). */
 #define vortex_debug debug
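
The comment in the patch context says "Undefine this to disable". That
wording matters if the macro is tested with #if defined(...), because in
that case defining it to 0 leaves the feature enabled. Whether the driver
tests the macro that way is not shown in this thread, so the following is
only a sketch of the C preprocessor behaviour, using a hypothetical macro
named model_mitigation.

```c
#include <assert.h>

#define model_mitigation 0 /* hypothetical macro, defined but set to 0 */

/* Tests only whether the macro exists: true even when its value is 0. */
static int model_enabled_by_defined(void)
{
#if defined(model_mitigation)
    return 1;
#else
    return 0;
#endif
}

/* Tests the macro's value: false when it is defined to 0. */
static int model_enabled_by_value(void)
{
#if model_mitigation
    return 1;
#else
    return 0;
#endif
}

/* The two checks disagree for a macro that is defined to 0. */
static void model_demo(void)
{
    assert(model_enabled_by_defined() == 1);
    assert(model_enabled_by_value() == 0);
}
```

If the driver uses the first form, removing the #define line (rather than
setting it to 0) would be the way to turn the feature off for a test.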


* Re: 3c59x: shared interrupt problem
  2009-03-10 21:55     ` Andrew Morton
@ 2009-03-11 11:38       ` Steffen Klassert
  2009-03-13 22:51         ` David Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Steffen Klassert @ 2009-03-11 11:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: shemminger, gerhard_pircher, netdev

On Tue, Mar 10, 2009 at 02:55:42PM -0700, Andrew Morton wrote:
> > 
> > This basically reverts a patch from akpm (bitkeeper cset 1.1046.95.8)
> > This patch was to workaround lots of "nobody cared" warnings generated by
> > boomerang_interrupt(). 
> > I added Andrew to the Cc, perhaps he can remember some details on this.
> > 
> 
> Beats me.  Do you have a full copy of that patch, including changelog?
> 

It was this one:

#### ChangeSet ####
2003-05-19 10:27:49-07:00, akpm@digeo.com 
  [PATCH] 3c59x irqreturn fix
  
  Apparently boomerang_interrupt() is generating lots of "nobody cared"
  warnings - one per packet it seems.  Frankly, I don't have a clue why.
  
  These are ancient cards and the driver is otherwise stable, so just
  change it to return IRQ_HANDLED and move on...

==== drivers/net/3c59x.c ====
2003-05-17 14:09:34-07:00, akpm@digeo.com +2 -7
  3c59x irqreturn fix

--- 1.34/drivers/net/3c59x.c	2003-04-20 22:41:08 -07:00
+++ 1.35/drivers/net/3c59x.c	2003-05-17 14:09:34 -07:00
@@ -2321,7 +2321,6 @@ boomerang_interrupt(int irq, void *dev_i
 	long ioaddr;
 	int status;
 	int work_done = max_interrupt_work;
-	int handled;
 
 	ioaddr = dev->base_addr;
 
@@ -2336,18 +2335,14 @@ boomerang_interrupt(int irq, void *dev_i
 	if (vortex_debug > 6)
 		printk(KERN_DEBUG "boomerang_interrupt. status=0x%4x\n", status);
 
-	if ((status & IntLatch) == 0) {
-		handled = 0;
+	if ((status & IntLatch) == 0)
 		goto handler_exit;		/* No interrupt: shared IRQs can cause this */
-	}
 
 	if (status == 0xffff) {		/* h/w no longer present (hotplug)? */
 		if (vortex_debug > 1)
 			printk(KERN_DEBUG "boomerang_interrupt(1): status = 0xffff\n");
-		handled = 0;
 		goto handler_exit;
 	}
-	handled = 1;
 
 	if (status & IntReq) {
 		status |= vp->deferred;
@@ -2442,7 +2437,7 @@ boomerang_interrupt(int irq, void *dev_i
 			   dev->name, status);
 handler_exit:
 	spin_unlock(&vp->lock);
-	return IRQ_RETVAL(handled);
+	return IRQ_HANDLED;
 }
 
 static int vortex_rx(struct net_device *dev)


* Re: 3c59x: shared interrupt problem
  2009-03-11 11:38       ` Steffen Klassert
@ 2009-03-13 22:51         ` David Miller
  2009-03-14 14:08           ` Steffen Klassert
  0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2009-03-13 22:51 UTC (permalink / raw)
  To: klassert; +Cc: akpm, shemminger, gerhard_pircher, netdev

From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
Date: Wed, 11 Mar 2009 12:38:15 +0100

> On Tue, Mar 10, 2009 at 02:55:42PM -0700, Andrew Morton wrote:
> > > 
> > > This basically reverts a patch from akpm (bitkeeper cset 1.1046.95.8)
> > > This patch was to workaround lots of "nobody cared" warnings generated by
> > > boomerang_interrupt(). 
> > > I added Andrew to the Cc, perhaps he can remember some details on this.
> > > 
> > 
> > Beats me.  Do you have a full copy of that patch, including changelog?
> > 
> 
> It was this one:
> 
> #### ChangeSet ####
> 2003-05-19 10:27:49-07:00, akpm@digeo.com 
>   [PATCH] 3c59x irqreturn fix
>   
>   Apparently boomerang_interrupt() is generating lots of "nobody cared"
>   warnings - one per packet it seems.  Frankly, I don't have a clue why.
>   
>   These are ancient cards and the driver is otherwise stable, so just
>   change it to return IRQ_HANDLED and move on...

So basically it's a band-aid because we didn't investigate why
this happens.

I think we should put the change in, and then look into things
properly if users report this issue again.

The code there right now is just completely wrong when the
3c59x interrupt is shared with another device.


* Re: 3c59x: shared interrupt problem
  2009-03-13 22:51         ` David Miller
@ 2009-03-14 14:08           ` Steffen Klassert
  2009-03-14 18:40             ` David Miller
  2009-03-17  9:37             ` Gerhard Pircher
  0 siblings, 2 replies; 16+ messages in thread
From: Steffen Klassert @ 2009-03-14 14:08 UTC (permalink / raw)
  To: David Miller; +Cc: akpm, shemminger, gerhard_pircher, netdev

On Fri, Mar 13, 2009 at 03:51:16PM -0700, David Miller wrote:
> > #### ChangeSet ####
> > 2003-05-19 10:27:49-07:00, akpm@digeo.com 
> >   [PATCH] 3c59x irqreturn fix
> >   
> >   Apparently boomerang_interrupt() is generating lots of "nobody cared"
> >   warnings - one per packet it seems.  Frankly, I don't have a clue why.
> >   
> >   These are ancient cards and the driver is otherwise stable, so just
> >   change it to return IRQ_HANDLED and move on...
> 
> So basically it's a band-aid because we didn't investigate why
> this happens.
> 
> I think we should put the change in, and then look into things
> properly if users report this issue again.
> 

Gerhard reported at least one of these "nobody cared" messages at shutdown
after applying this change. He wanted to provide us with further information
about this issue next week. It would be best to put it in together with a fix.
So I would suggest waiting; perhaps we can fix it with his information.


> The code there right now is just completely wrong when the
> 3c59x interrupt is shared with another device.

Indeed.


* Re: 3c59x: shared interrupt problem
  2009-03-14 14:08           ` Steffen Klassert
@ 2009-03-14 18:40             ` David Miller
  2009-03-17  9:37             ` Gerhard Pircher
  1 sibling, 0 replies; 16+ messages in thread
From: David Miller @ 2009-03-14 18:40 UTC (permalink / raw)
  To: klassert; +Cc: akpm, shemminger, gerhard_pircher, netdev

From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
Date: Sat, 14 Mar 2009 15:08:41 +0100

> Gerhard reported at least one of these "nobody cared" messages at shutdown
> after applying this change. He wanted to provide us with further informations
> about this issue next week. Best would be to put it in together with a fix.
> So I would suggest to wait, perhaps we can fix it with his informations. 

Fair enough.


* Re: 3c59x: shared interrupt problem
  2009-03-14 14:08           ` Steffen Klassert
  2009-03-14 18:40             ` David Miller
@ 2009-03-17  9:37             ` Gerhard Pircher
  1 sibling, 0 replies; 16+ messages in thread
From: Gerhard Pircher @ 2009-03-17  9:37 UTC (permalink / raw)
  To: Steffen Klassert, davem; +Cc: netdev, shemminger, akpm


-------- Original Message --------
> Date: Sat, 14 Mar 2009 15:08:41 +0100
> From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
> To: David Miller <davem@davemloft.net>
> CC: akpm@linux-foundation.org, shemminger@vyatta.com, gerhard_pircher@gmx.net, netdev@vger.kernel.org
> Subject: Re: 3c59x: shared interrupt problem

> On Fri, Mar 13, 2009 at 03:51:16PM -0700, David Miller wrote:
> > > #### ChangeSet ####
> > > 2003-05-19 10:27:49-07:00, akpm@digeo.com 
> > >   [PATCH] 3c59x irqreturn fix
> > >   
> > >   Apparently boomerang_interrupt() is generating lots of "nobody
> > >   cared" warnings - one per packet it seems.  Frankly, I don't have
> > >   a clue why.
> > >   
> > >   These are ancient cards and the driver is otherwise stable, so
> > >   just change it to return IRQ_HANDLED and move on...
> > 
> > So basically it's a band-aid because we didn't investigate why
> > this happens.
> > 
> > I think we should put the change in, and then look into things
> > properly if users report this issue again.
> > 
> 
> Gerhard reported at least one of these "nobody cared" messages at
> shutdown after applying this change. He wanted to provide us with
> further informations about this issue next week. Best would be to put
> it in together with a fix.
> So I would suggest to wait, perhaps we can fix it with his
> informations.
Okay, here's a small status update in order to show that I'm really
doing something. :)
Increasing the value of vortex_debug doesn't really help. It slows
down the network transfer too much to trigger the bug. Does somebody
know where to insert some printks in the driver to get a useful debug
output?
Bisecting shows that even v2.6.26-rc1 fails. But I have a v2.6.27-rc7
image that works fine!? Looks like I have to put more effort in
bisecting.

Gerhard

* Re: 3c59x: shared interrupt problem
  2009-03-09 22:42 3c59x: shared interrupt problem Gerhard Pircher
  2009-03-09 23:49 ` Stephen Hemminger
@ 2009-03-27  7:59 ` Steffen Klassert
  2009-03-28 14:17   ` Gerhard Pircher
  2009-04-21 18:36   ` Gerhard Pircher
  1 sibling, 2 replies; 16+ messages in thread
From: Steffen Klassert @ 2009-03-27  7:59 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: netdev

On Mon, Mar 09, 2009 at 11:42:53PM +0100, Gerhard Pircher wrote:
> 
> Kernel log:
> Badness at net/sched/sch_generic.c:226
> NIP: c0250118 LR: c0250118 CTR: c0013020
> REGS: efffde90 TRAP: 0700   Not tainted  (2.6.29-rc6)
> MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 42024024  XER: 00000000
> TASK = c03915a0[0] 'swapper' THREAD: c03b2000
> GPR00: c0250118 efffdf40 c03915a0 00000035 00008a62 ffffffff ffffffff 00000000 
> GPR08: 00000000 c03c0000 00008a62 c0393104 22024042 00000000 0ffd5900 0080044c 
> GPR16: 00000001 ffffffff 00000000 007ffc00 0ffd3158 0f0689b0 0ffff220 007ffbc0 
> GPR24: 00000000 00000000 0000000a 00000004 efffc000 c024ffb0 00000100 ef847000 
> NIP [c0250118] dev_watchdog+0x168/0x244
> LR [c0250118] dev_watchdog+0x168/0x244
> Call Trace:
> [efffdf40] [c0250118] dev_watchdog+0x168/0x244 (unreliable)
> [efffdfa0] [c002f564] run_timer_softirq+0x12c/0x1b4
> [efffdfd0] [c002ab0c] __do_softirq+0x6c/0x108
> [efffdff0] [c0011ef0] call_do_softirq+0x14/0x24
> [c03b3e90] [c0006c30] do_softirq+0x64/0x88
> [c03b3eb0] [c002a968] irq_exit+0x38/0x7c
> [c03b3ec0] [c000f634] timer_interrupt+0x138/0x150
> [c03b3ee0] [c0012bd4] ret_from_except+0x0/0x14
> --- Exception: 901 at cpu_idle+0xa4/0xec
>     LR = cpu_idle+0x98/0xec
> [c03b3fa0] [c0009f38] cpu_idle+0x4c/0xec (unreliable)
> [c03b3fb0] [c0297214] __got2_end+0x58/0x68
> [c03b3fc0] [c03637e4] start_kernel+0x28c/0x2a0
> [c03b3ff0] [0000380c] 0x380c
> Instruction dump:
> 80099d6c 2f800000 40be0038 38810008 7fe3fb78 38a00040 4bfee811 7fe4fb78 
> 7c651b78 3c60c034 3863f264 4bdd6005 <0fe00000> 38000001 3d20c03c 90099d6c 
> eth0: transmit timed out, tx_status 00 status e601.
>   diagnostics: net 0ccc media 8880 dma 0000003a fifo 0000
> eth0: Interrupt posted but not delivered -- IRQ blocked by another device?
>   Flags; bus-master 1, dirty 16(0) current 16(0)
>   Transmit list 00000000 vs. f101a200.
>   0: @f101a200  length 80000156 status 00010156
>   1: @f101a2a0  length 80000156 status 00010156
>   2: @f101a340  length 80000156 status 00010156
>   3: @f101a3e0  length 80000156 status 00010156
>   4: @f101a480  length 80000156 status 00010156
>   5: @f101a520  length 80000156 status 00010156
>   6: @f101a5c0  length 80000156 status 00010156
>   7: @f101a660  length 80000156 status 00010156
>   8: @f101a700  length 8000003c status 0001003c
>   9: @f101a7a0  length 8000003c status 0001003c
>   10: @f101a840  length 8000003c status 0001003c
>   11: @f101a8e0  length 8000003c status 0001003c
>   12: @f101a980  length 8000003c status 0001003c
>   13: @f101aa20  length 8000003c status 0001003c
>   14: @f101aac0  length 80000036 status 80010036
>   15: @f101ab60  length 800000f5 status 8c0100f5
> eth0: Resetting the Tx ring pointer.
> 

Do you always see these messages when your network hangs, and does the
network recover after such a hang? Could you please send the output of
'tc -s qdisc show' after a network hang?


* Re: 3c59x: shared interrupt problem
  2009-03-27  7:59 ` Steffen Klassert
@ 2009-03-28 14:17   ` Gerhard Pircher
  2009-04-21 18:36   ` Gerhard Pircher
  1 sibling, 0 replies; 16+ messages in thread
From: Gerhard Pircher @ 2009-03-28 14:17 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev


-------- Original Message --------
> Date: Fri, 27 Mar 2009 08:59:37 +0100
> From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> CC: netdev@vger.kernel.org
> Subject: Re: 3c59x: shared interrupt problem

> On Mon, Mar 09, 2009 at 11:42:53PM +0100, Gerhard Pircher wrote:
> > 
> > Kernel log:
> > Badness at net/sched/sch_generic.c:226
> > NIP: c0250118 LR: c0250118 CTR: c0013020
> > REGS: efffde90 TRAP: 0700   Not tainted  (2.6.29-rc6)
> > MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 42024024  XER: 00000000
> > TASK = c03915a0[0] 'swapper' THREAD: c03b2000
> > GPR00: c0250118 efffdf40 c03915a0 00000035 00008a62 ffffffff ffffffff 00000000 
> > GPR08: 00000000 c03c0000 00008a62 c0393104 22024042 00000000 0ffd5900 0080044c 
> > GPR16: 00000001 ffffffff 00000000 007ffc00 0ffd3158 0f0689b0 0ffff220 007ffbc0 
> > GPR24: 00000000 00000000 0000000a 00000004 efffc000 c024ffb0 00000100 ef847000 
> > NIP [c0250118] dev_watchdog+0x168/0x244
> > LR [c0250118] dev_watchdog+0x168/0x244
> > Call Trace:
> > [efffdf40] [c0250118] dev_watchdog+0x168/0x244 (unreliable)
> > [efffdfa0] [c002f564] run_timer_softirq+0x12c/0x1b4
> > [efffdfd0] [c002ab0c] __do_softirq+0x6c/0x108
> > [efffdff0] [c0011ef0] call_do_softirq+0x14/0x24
> > [c03b3e90] [c0006c30] do_softirq+0x64/0x88
> > [c03b3eb0] [c002a968] irq_exit+0x38/0x7c
> > [c03b3ec0] [c000f634] timer_interrupt+0x138/0x150
> > [c03b3ee0] [c0012bd4] ret_from_except+0x0/0x14
> > --- Exception: 901 at cpu_idle+0xa4/0xec
> >     LR = cpu_idle+0x98/0xec
> > [c03b3fa0] [c0009f38] cpu_idle+0x4c/0xec (unreliable)
> > [c03b3fb0] [c0297214] __got2_end+0x58/0x68
> > [c03b3fc0] [c03637e4] start_kernel+0x28c/0x2a0
> > [c03b3ff0] [0000380c] 0x380c
> > Instruction dump:
> > 80099d6c 2f800000 40be0038 38810008 7fe3fb78 38a00040 4bfee811 7fe4fb78 
> > 7c651b78 3c60c034 3863f264 4bdd6005 <0fe00000> 38000001 3d20c03c 90099d6c 
> > eth0: transmit timed out, tx_status 00 status e601.
> >   diagnostics: net 0ccc media 8880 dma 0000003a fifo 0000
> > eth0: Interrupt posted but not delivered -- IRQ blocked by another device?
> >   Flags; bus-master 1, dirty 16(0) current 16(0)
> >   Transmit list 00000000 vs. f101a200.
> >   0: @f101a200  length 80000156 status 00010156
> >   1: @f101a2a0  length 80000156 status 00010156
> >   2: @f101a340  length 80000156 status 00010156
> >   3: @f101a3e0  length 80000156 status 00010156
> >   4: @f101a480  length 80000156 status 00010156
> >   5: @f101a520  length 80000156 status 00010156
> >   6: @f101a5c0  length 80000156 status 00010156
> >   7: @f101a660  length 80000156 status 00010156
> >   8: @f101a700  length 8000003c status 0001003c
> >   9: @f101a7a0  length 8000003c status 0001003c
> >   10: @f101a840  length 8000003c status 0001003c
> >   11: @f101a8e0  length 8000003c status 0001003c
> >   12: @f101a980  length 8000003c status 0001003c
> >   13: @f101aa20  length 8000003c status 0001003c
> >   14: @f101aac0  length 80000036 status 80010036
> >   15: @f101ab60  length 800000f5 status 8c0100f5
> > eth0: Resetting the Tx ring pointer.
> > 
> 
> Do you see these messages always when your network hangs and does the
> network recover after such a hang? Could you please send the output of
> 'tc -s qdisc show' after a network hang?
IIRC I only got this message once during shutdown. Normally only
"IRQ 7 nobody cared" messages with a stacktrace of the interrupt
handlers are printed out with newer kernels (>=2.6.26) (see the
screenshots I made). Older kernels don't print out any messages
at all. Also the network never recovers after a bad interrupt is
reported in /proc/interrupts.
I'm far away from my machine for the next three weeks, so I can't
send you the output until then.
So far I have narrowed the problem down to kernel versions v2.6.19
through 2.6.23-rc9. Bisecting is getting harder now, because either
arch/ppc doesn't work anymore for PPC32 or my platform patches for
arch/powerpc do not apply.

regards,

Gerhard
-- 
--
-- Dipl. Ing. (FH) Gerhard Pircher
-- E-mail : gerhard_pircher@gmx.net
--


* Re: 3c59x: shared interrupt problem
  2009-03-27  7:59 ` Steffen Klassert
  2009-03-28 14:17   ` Gerhard Pircher
@ 2009-04-21 18:36   ` Gerhard Pircher
  1 sibling, 0 replies; 16+ messages in thread
From: Gerhard Pircher @ 2009-04-21 18:36 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev


-------- Original Message --------
> Date: Fri, 27 Mar 2009 08:59:37 +0100
> From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> CC: netdev@vger.kernel.org
> Subject: Re: 3c59x: shared interrupt problem

> Do you see these messages always when your network hangs and does the
> network recover after such a hang? Could you please send the output of
> 'tc -s qdisc show' after a network hang?
Sorry for the delay! Here's the output of tc after a network hang.

qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 4633909 bytes 64766 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 

regards,

Gerhard

* Re: 3c59x: shared interrupt problem
  2009-03-12 14:39 ` Steffen Klassert
@ 2009-03-12 15:12   ` Gerhard Pircher
  0 siblings, 0 replies; 16+ messages in thread
From: Gerhard Pircher @ 2009-03-12 15:12 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: shemminger, netdev, akpm


-------- Original Message --------
> Date: Thu, 12 Mar 2009 15:39:30 +0100
> From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> CC: akpm@linux-foundation.org, netdev@vger.kernel.org, shemminger@vyatta.com
> Subject: Re: 3c59x: shared interrupt problem

> On Wed, Mar 11, 2009 at 11:42:40PM +0100, Gerhard Pircher wrote:
> > 
> > > Your first report showed dnComplete as a pending interrupt source.
> > > Since the driver uses tx_interrupt_mitigation, this happens just if
> > > the tx ring is full. Could you please test the patch below? This
> > > disables tx_interrupt_mitigation, so the dnComplete interrupt will
> > > be triggered with every packet.
> > I disabled tx_interrupt_mitigation and tested it with and w/o the
> > patch for boomerang_interrupt(). The network transfer stalls in both
> > cases sooner or later.
> > 
> > Here are two photos from the kernel output:
> > - no tx_interrupt_mitigation and patch for boomerang_interrupt():
> > http://boot.homelinux.org:8080/kernel/oops_boomerang_irq_tx_irq_mitig.jpg
> > - no tx_interrupt_mitigation only:
> > http://boot.homelinux.org:8080/kernel/oops_tx_irq_mitig.jpg
> > 
> 
> Your pictures show just a message at shutdown. Are there any other
> unusual messages from the 3c59x driver in your logs? In particular I'm
> interested in "transmit timed out" messages.
Unfortunately no. But I had disabled debugging in the driver until now, as
it slows down the network transfer considerably. I'll rerun the test with
debugging enabled.

I'm going to retry 'git bisect' next week. Maybe I'll find a way to compile
a bootable image for older kernel versions.

Thanks!

Gerhard



* Re: 3c59x: shared interrupt problem
  2009-03-11 22:42 Gerhard Pircher
@ 2009-03-12 14:39 ` Steffen Klassert
  2009-03-12 15:12   ` Gerhard Pircher
  0 siblings, 1 reply; 16+ messages in thread
From: Steffen Klassert @ 2009-03-12 14:39 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: akpm, netdev, shemminger

On Wed, Mar 11, 2009 at 11:42:40PM +0100, Gerhard Pircher wrote:
> 
> > Your first report showed dnComplete as a pending interrupt source.
> > Since the driver uses tx_interrupt_mitigation, this happens just if the 
> > tx ring is full. Could you please test the patch below? This disables
> > tx_interrupt_mitigation, so the dnComplete interrupt will be triggered 
> > with every packet.
> I disabled tx_interrupt_mitigation and tested it with and w/o the patch
> for boomerang_interrupt(). The network transfer stalls in both cases
> sooner or later.
> 
> Here are two photos from the kernel output:
> - no tx_interrupt_mitigation and patch for boomerang_interrupt():
> http://boot.homelinux.org:8080/kernel/oops_boomerang_irq_tx_irq_mitig.jpg
> - no tx_interrupt_mitigation only:
> http://boot.homelinux.org:8080/kernel/oops_tx_irq_mitig.jpg
> 

Your pictures show just a message at shutdown. Are there any other unusual
messages from the 3c59x driver in your logs? In particular I'm interested in 
"transmit timed out" messages.


* Re: 3c59x: shared interrupt problem
@ 2009-03-11 22:42 Gerhard Pircher
  2009-03-12 14:39 ` Steffen Klassert
  0 siblings, 1 reply; 16+ messages in thread
From: Gerhard Pircher @ 2009-03-11 22:42 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: akpm, netdev, shemminger


-------- Original Message --------
> Date: Wed, 11 Mar 2009 12:31:30 +0100
> From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> CC: shemminger@vyatta.com, netdev@vger.kernel.org, akpm@linux-foundation.org
> Subject: Re: 3c59x: shared interrupt problem

> On Tue, Mar 10, 2009 at 10:00:53AM +0100, Gerhard Pircher wrote:
> > > 
> > > This basically reverts a patch from akpm (bitkeeper cset 1.1046.95.8).
> > > That patch was meant to work around lots of "nobody cared" warnings
> > > generated by boomerang_interrupt().
> > > I added Andrew to the Cc, perhaps he can remember some details on
> > > this.
> > I'm afraid this patch didn't fix the problem. I'm using scp to copy a
> > big ISO file from my PC to the AmigaOne and the network transfer still
> > stalls.
> > I took a photo of the kernel oops printed during shutdown.
> 
> Your photo shows exactly such a "nobody cared" warning that Andrew wanted 
> to get rid of with his patch.
> 
> > BTW: shouldn't the driver use vortex_interrupt() to handle interrupts
> > for a 3c920?
> 
> Which ISR the driver uses depends on the contents of your NIC's EEPROM.
> If your NIC is full bus master capable, boomerang_interrupt() is used;
> if not, vortex_interrupt() is used. As far as I know the 3c920 is of
> "tornado" type, so it should be full bus master capable.
Okay, I thought there was something wrong, because the 3c920 is listed in
the "vortex" device table.

> Your first report showed dnComplete as a pending interrupt source.
> Since the driver uses tx_interrupt_mitigation, this happens just if the 
> tx ring is full. Could you please test the patch below? This disables
> tx_interrupt_mitigation, so the dnComplete interrupt will be triggered 
> with every packet.
I disabled tx_interrupt_mitigation and tested it with and w/o the patch
for boomerang_interrupt(). The network transfer stalls in both cases
sooner or later.
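
The effect of the mitigation described in the quoted paragraph can be pictured with a toy model. This is only a sketch under stated assumptions, not the 3c59x code: every queued descriptor initially asks for a completion interrupt, and with mitigation the request on the previous descriptor is cleared whenever the ring is not yet full. The function name `count_tx_interrupts`, the ring size of 16, and the "hardware drains the whole ring once it is full" behaviour are all hypothetical simplifications.

```python
def count_tx_interrupts(n_packets, ring_size=16, mitigation=True):
    """Count completion interrupts in a toy model of a TX descriptor ring.

    Hypothetical model, not driver code: the ring is a list of
    interrupt-request flags, one per outstanding descriptor.
    """
    ring = []
    interrupts = 0
    for _ in range(n_packets):
        ring.append(True)  # each new descriptor requests an interrupt
        if len(ring) >= ring_size:
            # Ring full: the queue would be stopped here. The toy hardware
            # now transmits everything and raises one interrupt per
            # descriptor that still carries a request.
            interrupts += sum(ring)
            ring.clear()
        elif mitigation and len(ring) >= 2:
            # Ring not full: drop the request on the previous descriptor,
            # so only the newest outstanding one will interrupt.
            ring[-2] = False
    # Descriptors left over after the last packet complete as well.
    interrupts += sum(ring)
    return interrupts


if __name__ == "__main__":
    for enabled in (True, False):
        print(f"mitigation={enabled}: {count_tx_interrupts(64, mitigation=enabled)}")
```

In this model, 64 packets through a 16-entry ring raise 8 interrupts with mitigation and 64 without it, which matches the description that completion interrupts are seen only around a full ring when mitigation is on, and with every packet when it is off.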

Here are two photos from the kernel output:
- no tx_interrupt_mitigation and patch for boomerang_interrupt():
http://boot.homelinux.org:8080/kernel/oops_boomerang_irq_tx_irq_mitig.jpg
- no tx_interrupt_mitigation only:
http://boot.homelinux.org:8080/kernel/oops_tx_irq_mitig.jpg

Thanks!

best regards,

Gerhard



Thread overview: 16+ messages
2009-03-09 22:42 3c59x: shared interrupt problem Gerhard Pircher
2009-03-09 23:49 ` Stephen Hemminger
2009-03-10  8:16   ` Steffen Klassert
2009-03-10 21:55     ` Andrew Morton
2009-03-11 11:38       ` Steffen Klassert
2009-03-13 22:51         ` David Miller
2009-03-14 14:08           ` Steffen Klassert
2009-03-14 18:40             ` David Miller
2009-03-17  9:37             ` Gerhard Pircher
     [not found]     ` <20090310090053.322240@gmx.net>
2009-03-11 11:31       ` Steffen Klassert
2009-03-27  7:59 ` Steffen Klassert
2009-03-28 14:17   ` Gerhard Pircher
2009-04-21 18:36   ` Gerhard Pircher
2009-03-11 22:42 Gerhard Pircher
2009-03-12 14:39 ` Steffen Klassert
2009-03-12 15:12   ` Gerhard Pircher
