linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG 2.6.0-test11] pcnet32 oops
@ 2003-12-05 23:45 Jean Tourrilhes
  2003-12-06  0:59 ` David S. Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Jean Tourrilhes @ 2003-12-05 23:45 UTC (permalink / raw)
  To: Linux kernel mailing list, Thomas Bogendrfer; +Cc: Jeff Garzik, netdev

	Hi Frank,

	Would you mind having a look at the following bugs and forward
as appropriate ?
	I'm trying to get an AMD Lance going on my box. It mostly
works, but I get oops when I deconfigure the card or remove the
module. A few samples :

	At "ifdown" :
----------------------------------
Badness in local_bh_enable at kernel/softirq.c:121
Call Trace:
 [<c011ff91>] local_bh_enable+0x35/0x58
 [<c022d19a>] ip_ct_find_proto+0xd2/0xd8
 [<c022d923>] destroy_conntrack+0x13/0x19c
 [<c01e68c8>] __kfree_skb+0xc8/0xfc
 [<d085b160>] pcnet32_purge_tx_ring+0x94/0xc4 [pcnet32]
 [<d085b300>] pcnet32_restart+0x14/0x6c [pcnet32]
 [<d085c089>] pcnet32_set_multicast_list+0x7d/0x90 [pcnet32]
 [<c01ef02e>] __dev_mc_upload+0x1e/0x24
 [<c01ef05a>] dev_mc_upload+0x26/0x38
 [<c01eb585>] dev_change_flags+0x2d/0x104
 [<c0222f53>] devinet_ioctl+0x327/0x64c
 [<c0225457>] inet_ioctl+0xbb/0xf8
 [<c01e3801>] sock_ioctl+0x29d/0x2cc
 [<c015c959>] sys_ioctl+0x21d/0x264
 [<c01097ad>] error_code+0x2d/0x38
 [<c0108d43>] syscall_call+0x7/0xb

-------------------------------------------------


	At "rmmod" :
---------------------------------------------
kernel BUG at include/linux/dcache.h:274!
invalid operand: 0000 [#1]
CPU:    1
EIP:    0060:[sysfs_remove_dir+21/256]    Not tainted
EFLAGS: 00010246
EIP is at sysfs_remove_dir+0x15/0x100
eax: 00000000   ebx: d085e760   ecx: c02ccb2c   edx: ffff0001
esi: c1983220   edi: d085e744   ebp: 00000880   esp: c1a49f04
ds: 007b   es: 007b   ss: 0068
Process rmmod (pid: 750, threadinfo=c1a48000 task=ca972060)
Stack: d085e760 c02ccb2c d085e744 00000880 c01858c3 d085e760 d085e760 c01858db 
       d085e760 c02ccae0 c01aec48 d085e760 d085e744 00000000 00000000 c01aef5c 
       d085e744 d085e720 00000000 c018b7e6 d085e744 c1875000 d085c88c d085e720 
Call Trace:
 [kobject_del+75/88] kobject_del+0x4b/0x58
 [kobject_unregister+11/24] kobject_unregister+0xb/0x18
 [bus_remove_driver+92/108] bus_remove_driver+0x5c/0x6c
 [driver_unregister+12/50] driver_unregister+0xc/0x32
 [pci_unregister_driver+14/28] pci_unregister_driver+0xe/0x1c
 [_end+273681636/1070212184] pcnet32_cleanup_module+0xac/0xc0 [pcnet32]
 [sys_delete_module+394/432] sys_delete_module+0x18a/0x1b0
 [sys_munmap+69/100] sys_munmap+0x45/0x64
 [syscall_call+7/11] syscall_call+0x7/0xb

Code: 0f 0b 12 01 c2 dd 25 c0 f0 ff 06 85 f6 0f 84 d0 00 00 00 8b 
---------------------------------------------

	Info :
	Dual PIII 550 SMP, 2.6.0-test11, many PCI cards, hotplug

02:05.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet LANCE] (rev 36)
        Subsystem: Hewlett-Packard Company Ethernet with LAN remote power Adapter
        Flags: bus master, medium devsel, latency 64, IRQ 18
        I/O ports at 7ce0 [size=32]
        Memory at fdfff400 (32-bit, non-prefetchable) [size=32]
        Expansion ROM at <unassigned> [disabled] [size=1M]
        Capabilities: [40] Power Management version 1

pcnet32.c:v1.27b 01.10.2002 tsbogend@alpha.franken.de
pcnet32: PCnet/FAST+ 79C972 at 0x7ce0, 00 10 83 34 ba e5
    tx_start_pt(0x0c00):~220 bytes, BCR18(9861):BurstWrEn BurstRdEn NoUFlow 
    SRAMSIZE=0x1700, SRAM_BND=0x0800, assigned IRQ 18.
eth0: registered as PCnet/FAST+ 79C972
pcnet32: 1 cards_found.
eth0: link up, 100Mbps, full-duplex, lpa 0x45E1

	Good luck...

	Jean

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG 2.6.0-test11] pcnet32 oops
  2003-12-05 23:45 [BUG 2.6.0-test11] pcnet32 oops Jean Tourrilhes
@ 2003-12-06  0:59 ` David S. Miller
  2003-12-09 14:15   ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 5+ messages in thread
From: David S. Miller @ 2003-12-06  0:59 UTC (permalink / raw)
  To: jt; +Cc: jt, linux-kernel, tsbogend, jgarzik, netdev

On Fri, 5 Dec 2003 15:45:10 -0800
Jean Tourrilhes <jt@bougret.hpl.hp.com> wrote:

> Badness in local_bh_enable at kernel/softirq.c:121
> Call Trace:
>  [<c011ff91>] local_bh_enable+0x35/0x58
>  [<c022d19a>] ip_ct_find_proto+0xd2/0xd8
>  [<c022d923>] destroy_conntrack+0x13/0x19c
>  [<c01e68c8>] __kfree_skb+0xc8/0xfc
>  [<d085b160>] pcnet32_purge_tx_ring+0x94/0xc4 [pcnet32]
>  [<d085b300>] pcnet32_restart+0x14/0x6c [pcnet32]
>  [<d085c089>] pcnet32_set_multicast_list+0x7d/0x90 [pcnet32]

This is the classic case of doing disabling/enabling of software
interrupts with hardware interrupts disabled, which is a bug.

In this case pcnet32_set_multicast_list() is disabling hardware
interrupts, and the packet freeing of pcnet32_purge_tx_ring()
is what leads to the software interrupt disable/enable.

However, I'm inclined to believe that we should change dev_kfree_skb_any()
to fix this class of problems, by making it check for hardware interrupts
being disabled as well as being in an interrupt.

But we might not want to do it like this, just reenabling the hardware
interrupts at the top level won't cause the software interrupt to fire
to actually execute the SKB freeing work.... Need to think about this
some more.

===== include/linux/netdevice.h 1.66 vs edited =====
--- 1.66/include/linux/netdevice.h	Sat Nov  1 14:11:04 2003
+++ edited/include/linux/netdevice.h	Fri Dec  5 16:53:01 2003
@@ -634,7 +634,7 @@
  */
 static inline void dev_kfree_skb_any(struct sk_buff *skb)
 {
-	if (in_irq())
+	if (in_irq() || irqs_disabled())
 		dev_kfree_skb_irq(skb);
 	else
 		dev_kfree_skb(skb);

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG 2.6.0-test11] pcnet32 oops
  2003-12-06  0:59 ` David S. Miller
@ 2003-12-09 14:15   ` Rask Ingemann Lambertsen
  2003-12-10  8:30     ` David S. Miller
  2003-12-10  8:32     ` David S. Miller
  0 siblings, 2 replies; 5+ messages in thread
From: Rask Ingemann Lambertsen @ 2003-12-09 14:15 UTC (permalink / raw)
  To: David S. Miller; +Cc: jt, jt, linux-kernel, tsbogend, jgarzik, netdev

On Fri, Dec 05, 2003 at 04:59:00PM -0800, David S. Miller wrote:
> This is the classic case of doing disabling/enabling of software
> interrupts with hardware interrupts disabled, which is a bug.
> 
> In this case pcnet32_set_multicast_list() is disabling hardware
> interrupts, and the packet freeing of pcnet32_purge_tx_ring()
> is what leads to the software interrupt disable/enable.

I think the root cause of this problem is that pcnet32_set_multicast_list()
dumps the entire TX ring on the floor (as a side effect of calling
pcnet32_restart()). I don't think dev->set_multicast_list() is supposed to
do that.

> However, I'm inclined to believe that we should change dev_kfree_skb_any()
> to fix this class of problems, by making it check for hardware interrupts
> being disabled as well as being in an interrupt.

I've been wondering about this too with the recent netpoll patches. Many
(including pcnet32) implement the poll controller simply as

	disable_irq (dev->irq);
	driver_interrupt_handler (dev->irq, dev, NULL);
	enable_irq (dev->irq);

If the interrupt handler calls dev_kfree_skb_any(), could you then run into
this kind of problem? Or is it just if you call spin_lock_irq*() that you
have a problem?

-- 
Regards,
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG 2.6.0-test11] pcnet32 oops
  2003-12-09 14:15   ` Rask Ingemann Lambertsen
@ 2003-12-10  8:30     ` David S. Miller
  2003-12-10  8:32     ` David S. Miller
  1 sibling, 0 replies; 5+ messages in thread
From: David S. Miller @ 2003-12-10  8:30 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: jt, jt, linux-kernel, tsbogend, jgarzik, netdev

On Tue, 9 Dec 2003 15:15:02 +0100
Rask Ingemann Lambertsen <rask@sygehus.dk> wrote:

> On Fri, Dec 05, 2003 at 04:59:00PM -0800, David S. Miller wrote:
> > This is the classic case of doing disabling/enabling of software
> > interrupts with hardware interrupts disabled, which is a bug.
> > 
> > In this case pcnet32_set_multicast_list() is disabling hardware
> > interrupts, and the packet freeing of pcnet32_purge_tx_ring()
> > is what leads to the software interrupt disable/enable.
> 
> I think the root cause of this problem is that pcnet32_set_multicast_list()
> dumps the entire TX ring on the floor (as a side effect of calling
> pcnet32_restart()). I don't think dev->set_multicast_list() is supposed to
> do that.

The task of dev->set_multicast_list() is to do whatever is necessary
to update the multicast filter.

If the pcnet32 hardware for some odd reason requires that you flush
the TX list, it is appropriate.

> I've been wondering about this too with the recent netpoll patches. Many
> (including pcnet32) implement the poll controller simply as
> 
> 	disable_irq (dev->irq);
> 	driver_interrupt_handler (dev->irq, dev, NULL);
> 	enable_irq (dev->irq);
> 
> If the interrupt handler calls dev_kfree_skb_any(), could you then run into
> this kind of problem? Or is it just if you call spin_lock_irq*() that you
> have a problem?

No, it would not be a problem because the thing that dev_kfree_skb_any()
_does_ test right now ('in_irq()') would trigger.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG 2.6.0-test11] pcnet32 oops
  2003-12-09 14:15   ` Rask Ingemann Lambertsen
  2003-12-10  8:30     ` David S. Miller
@ 2003-12-10  8:32     ` David S. Miller
  1 sibling, 0 replies; 5+ messages in thread
From: David S. Miller @ 2003-12-10  8:32 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen; +Cc: jt, jt, linux-kernel, tsbogend, jgarzik, netdev


Sorry, forgot to mention this in the previous email.

The way to fix this properly in the pcnet32 driver itself would
be to pass a stack local "struct sk_buff_head" list down into
these deep routines while we have the locks held.

Instead of freeing the TX skbs, we add them all to this list.

At the top level, the code drops the spinlock and enables cpu irqs,
then it frees up any SKBs that were put on that list.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2003-12-10  8:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-12-05 23:45 [BUG 2.6.0-test11] pcnet32 oops Jean Tourrilhes
2003-12-06  0:59 ` David S. Miller
2003-12-09 14:15   ` Rask Ingemann Lambertsen
2003-12-10  8:30     ` David S. Miller
2003-12-10  8:32     ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).