* kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069! @ 2007-01-05 19:03 Thibaut VARENE 2007-01-09 9:26 ` Jarek Poplawski 0 siblings, 1 reply; 20+ messages in thread From: Thibaut VARENE @ 2007-01-05 19:03 UTC (permalink / raw) To: netdev [-- Attachment #1: Type: text/plain, Size: 917 bytes --] Hi, I've been experiencing this bug on my Pegasos II (PPC G4 1GHz, 512M RAM) box for a while: I can reliably kill my machine in about half an hour while watching some video read from a remote nfs volume (hence the "mplayer" task in the following dump). It was relatively uneasy to get proper debug info as the crash happens while video was playing on the screen, but it's there anyway :) This particular dump comes from kernel 2.6.19-ck2 but I reproduced the bug with vanilla 2.6.19 too, so the bug lives in mainline. I'm not really familiar with that particular code, but I'd gladly provide as much debug info as I can. The box is hooked to a gigabit switch and the NIC is configured as gigabit too. Interestingly, when I reboot immediately after the crash, the NIC gets a bogus MAC address, and I have to reboot again to get back to normal. HTH T-Bone -- Thibaut VARENE http://www.parisc-linux.org/~varenet/ [-- Attachment #2: panicdump.txt --] [-- Type: text/plain, Size: 4972 bytes --] kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069! 
Oops: Exception in kernel mode, sig: 5 [#1] PREEMPT Modules linked in: nfs lockd sunrpc eeprom sbp2 scsi_mod eth1394 uhci_hcd ohci14 NIP: C020F0E0 LR: C0210C54 CTR: C0210B98 REGS: c7f6f670 TRAP: 0700 Not tainted (2.6.19-ck2) MSR: 00021032 <ME,IR,DR> CR: 24022488 XER: 00000000 TASK = c49a8d10[2227] 'mplayer' THREAD: c7f6e000 GPR00: 00000000 C7F6F720 C49A8D10 DFF41260 DFF41000 0000000B CE0CF932 00000000 GPR08: 00000CEA 00000001 00001000 00000CEB 44022422 1085F9B8 C50B0368 0000B241 GPR16: C7F6FD28 0000B240 00000000 DFF412DC C0380000 00009032 00000400 C7F6E000 GPR24: 00000000 00000000 DFF41000 C7F6E000 C0210B98 CE0EAC80 DFF41260 CE0CF900 NIP [C020F0E0] eth_alloc_tx_desc_index+0x44/0x50 LR [C0210C54] mv643xx_eth_start_xmit+0xbc/0x3b8 Call Trace: [C7F6F720] [CE0CF930] 0xce0cf930 (unreliable) [C7F6F760] [C0299714] dev_hard_start_xmit+0x1d4/0x2c8 [C7F6F780] [C029C0E0] dev_queue_xmit+0x2bc/0x334 [C7F6F7A0] [C02B6E1C] ip_output+0x124/0x248 [C7F6F7C0] [C02B7E54] ip_queue_xmit+0x17c/0x404 [C7F6F830] [C02C91BC] tcp_transmit_skb+0x38c/0x7dc [C7F6F860] [C02C65E4] __tcp_ack_snd_check+0x64/0xbc [C7F6F870] [C02C8100] tcp_rcv_established+0x5d4/0x980 [C7F6F8A0] [C02CEDCC] tcp_v4_do_rcv+0xd8/0x3e4 [C7F6F8D0] [C02D1610] tcp_v4_rcv+0x788/0x98c [C7F6F900] [C02B2594] ip_local_deliver+0xe4/0x1a4 [C7F6F920] [C02B2A50] ip_rcv+0x288/0x46c [C7F6F950] [C0299308] netif_receive_skb+0x214/0x304 [C7F6F980] [C0211CBC] mv643xx_poll+0x41c/0x48c [C7F6F9D0] [C029B550] net_rx_action+0x98/0x200 [C7F6FA00] [C0026958] __do_softirq+0x80/0xf4 [C7F6FA30] [C0006930] do_softirq+0x58/0x5c [C7F6FA40] [C0026408] irq_exit+0x60/0x80 [C7F6FA50] [C00069DC] do_IRQ+0xa8/0xc8 [C7F6FA60] [C0012498] ret_from_except+0x0/0x14 --- Exception: 501 at __kmalloc+0x30/0xc0 LR = rpc_malloc+0x48/0xac [sunrpc] [C7F6FB20] [C3D72508] 0xc3d72508 (unreliable) [C7F6FB30] [E2A88E18] rpc_malloc+0x48/0xac [sunrpc] [C7F6FB40] [E2A835F8] call_allocate+0x88/0x108 [sunrpc] [C7F6FB60] [E2A89554] __rpc_execute+0x94/0x248 [sunrpc] [C7F6FB80] 
[E2B0EEB0] nfs_execute_read+0x40/0x64 [nfs] [C7F6FBB0] [E2B0F6A4] nfs_pagein_one+0x2a0/0x300 [nfs] [C7F6FBF0] [E2B0FA9C] nfs_readpages+0x118/0x1f8 [nfs] [C7F6FC40] [C00521DC] __do_page_cache_readahead+0x1e8/0x318 [C7F6FCD0] [C0052390] blockable_page_cache_readahead+0x84/0x114 [C7F6FCF0] [C00524A4] make_ahead_window+0x84/0xd4 [C7F6FD00] [C00525AC] page_cache_readahead+0xb8/0x220 [C7F6FD20] [C004B00C] do_generic_mapping_read+0x574/0x5e8 [C7F6FDC0] [C004D624] generic_file_aio_read+0x120/0x274 [C7F6FE00] [E2B06F00] nfs_file_read+0xc4/0xe4 [nfs] [C7F6FE30] [C006EB50] do_sync_read+0xc4/0x138 [C7F6FEF0] [C006F734] vfs_read+0xc4/0x1a4 [C7F6FF10] [C006FC24] sys_read+0x4c/0x90 [C7F6FF40] [C0011DF0] ret_from_syscall+0x0/0x38 --- Exception: c01 at 0xf4f54d8 LR = 0x101e83a8 Instruction dump: 5400fffe 0f000000 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850 7d694a78 91630020 7d290034 5529d97e <0f090000> 7d034378 4e800020 2f840001 <0>Kernel panic - not syncing: Fatal exception in interrupt <0>Rebooting in 180 seconds.. ^ permalink raw reply [flat|nested] 20+ messages in thread
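A note for readers unfamiliar with the driver: the function name and the call trace suggest that eth_alloc_tx_desc_index hands out the next free slot of the transmit descriptor ring, and that the BUG is a sanity check on the ring's bookkeeping. The following is a minimal user-space sketch of that kind of allocator, not the driver's code; struct tx_ring, ring_alloc() and the -1 return value (standing in for BUG()) are assumptions made for illustration.

```c
/* Hypothetical model of a circular TX descriptor ring (not driver code). */
struct tx_ring {
	unsigned int size;	/* number of descriptors in the ring */
	unsigned int curr;	/* next slot to hand out */
	unsigned int used;	/* oldest slot not yet reclaimed */
};

/*
 * Hand out the next slot and advance curr with wrap-around.
 * Returns the slot index, or -1 where a kernel driver would BUG():
 * advancing curr onto used would make a full ring look empty.
 */
static int ring_alloc(struct tx_ring *r)
{
	unsigned int slot = r->curr;
	unsigned int next = (slot + 1) % r->size;

	if (next == r->used)
		return -1;

	r->curr = next;
	return (int)slot;
}
```

With a 4-slot ring in this model, three allocations succeed and the fourth is refused, because handing out the last slot would make curr equal to used, which is also what an empty ring looks like.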
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-05 19:03 kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069! Thibaut VARENE
@ 2007-01-09  9:26 ` Jarek Poplawski
  2007-01-09 10:27   ` Thibaut VARENE
  0 siblings, 1 reply; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-09  9:26 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: netdev

On 05-01-2007 20:03, Thibaut VARENE wrote:
> Hi,
>
> I've been experiencing this bug on my Pegasos II (PPC G4 1GHz, 512M
...
> [C7F6FA60] [C0012498] ret_from_except+0x0/0x14
> --- Exception: 501 at __kmalloc+0x30/0xc0
>     LR = rpc_malloc+0x48/0xac [sunrpc]
> [C7F6FB20] [C3D72508] 0xc3d72508 (unreliable)
> [C7F6FB30] [E2A88E18] rpc_malloc+0x48/0xac [sunrpc]
> [C7F6FB40] [E2A835F8] call_allocate+0x88/0x108 [sunrpc]
> [C7F6FB60] [E2A89554] __rpc_execute+0x94/0x248 [sunrpc]

Aren't there any other warnings displayed before?
No problems with memory or disk?

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09  9:26 ` Jarek Poplawski
@ 2007-01-09 10:27   ` Thibaut VARENE
  2007-01-09 10:52     ` Jarek Poplawski
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-09 10:27 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

On 1/9/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
> On 05-01-2007 20:03, Thibaut VARENE wrote:
> > Hi,
> >
> > I've been experiencing this bug on my Pegasos II (PPC G4 1GHz, 512M
> ...
> > [C7F6FA60] [C0012498] ret_from_except+0x0/0x14
> > --- Exception: 501 at __kmalloc+0x30/0xc0
> >     LR = rpc_malloc+0x48/0xac [sunrpc]
> > [C7F6FB20] [C3D72508] 0xc3d72508 (unreliable)
> > [C7F6FB30] [E2A88E18] rpc_malloc+0x48/0xac [sunrpc]
> > [C7F6FB40] [E2A835F8] call_allocate+0x88/0x108 [sunrpc]
> > [C7F6FB60] [E2A89554] __rpc_execute+0x94/0x248 [sunrpc]
>
> Aren't there any other warnings displayed before?

No, I've pasted the full dump that appeared on the serial console I
had set up when the crash occurred.

> No problems with memory or disk?

I suspected both and changed both the disk and the ram for quality
parts, that I tested afterwards. Both passed thorough tests.

Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
works absolutely fine.

HTH

T-Bone

(PS: please CC me in answers)

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 10:27   ` Thibaut VARENE
@ 2007-01-09 10:52     ` Jarek Poplawski
  2007-01-09 10:56       ` Thibaut VARENE
  2007-01-09 10:57     ` Jarek Poplawski
  2007-01-09 13:02     ` Jarek Poplawski
  2 siblings, 1 reply; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-09 10:52 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: netdev

On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
...
> I suspected both and changed both the disk and the ram for quality
> parts, that I tested afterwards. Both passed thorough tests.

You wrote about half an hour, so overheating was also
considered, I presume.

> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> works absolutely fine.

So it looks like the card/driver (or maybe this specimen?).

Jarek P.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 10:52     ` Jarek Poplawski
@ 2007-01-09 10:56       ` Thibaut VARENE
  2007-01-09 11:48         ` Jarek Poplawski
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-09 10:56 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

On 1/9/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
> On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
> ...
> > I suspected both and changed both the disk and the ram for quality
> > parts, that I tested afterwards. Both passed thorough tests.
>
> You wrote about half an hour, so overheating was also
> considered, I presume.

Yes, but since it works fine with the other NIC... :)

> > Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> > works absolutely fine.
>
> So it looks like the card/driver (or maybe this specimen?).

I'm suspecting the driver, but I'm not a specialist :)

It's true that this particular card specimen could be damaged even
though that seems a bit unlikely.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 10:56       ` Thibaut VARENE
@ 2007-01-09 11:48         ` Jarek Poplawski
  0 siblings, 0 replies; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-09 11:48 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: netdev

On Tue, Jan 09, 2007 at 11:56:59AM +0100, Thibaut VARENE wrote:
...
> I'm suspecting the driver, but I'm not a specialist :)

No problem, me also! I've also suspected the driver,
looked at the code, found nothing yet (as expected),
but this info about exception in malloc, introduces
some doubts...

Jarek P.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 10:27   ` Thibaut VARENE
  2007-01-09 10:52     ` Jarek Poplawski
@ 2007-01-09 10:57     ` Jarek Poplawski
  2007-01-09 13:02     ` Jarek Poplawski
  2 siblings, 0 replies; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-09 10:57 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: netdev

On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
...
> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> works absolutely fine.

... and the speed could matter here too ...

Jarek P.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 10:27   ` Thibaut VARENE
  2007-01-09 10:52     ` Jarek Poplawski
  2007-01-09 10:57     ` Jarek Poplawski
@ 2007-01-09 13:02     ` Jarek Poplawski
  2007-01-09 17:44       ` Thibaut VARENE
  2 siblings, 1 reply; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-09 13:02 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: netdev

On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
...
> I suspected both and changed both the disk and the ram for quality
> parts, that I tested afterwards. Both passed thorough tests.
>
> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> works absolutely fine.

If you are not tired, I'd suggest two more tests:

- as above but with NIC set to 100Mbps also,
- long downloading but without nfs e.g. ftp
  (btw. there were some patches after 2.6.19
  for rpc memory races).

Jarek P.

PS: Maintainers were cc-ed, I hope?

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069! 2007-01-09 13:02 ` Jarek Poplawski @ 2007-01-09 17:44 ` Thibaut VARENE 2007-01-09 20:05 ` Dale Farnsworth 0 siblings, 1 reply; 20+ messages in thread From: Thibaut VARENE @ 2007-01-09 17:44 UTC (permalink / raw) To: Jarek Poplawski; +Cc: netdev, dale, mlachwani [-- Attachment #1: Type: text/plain, Size: 1946 bytes --] On 1/9/07, Jarek Poplawski <jarkao2@o2.pl> wrote: > On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote: > ... > > I suspected both and changed both the disk and the ram for quality > > parts, that I tested afterwards. Both passed thorough tests. > > > > Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps), > > works absolutely fine. > > If you are not tired, I'd suggest two more tests: I volunteered to help :) For the sake of testing up-to-date code, I performed the following tests with 2.6.20-rc4. First test was the usual nfs video playback. Crashdump is panic-2.6.20-rc4-nfs.txt. Went down in about 20mn. > - as above but with NIC set to 100Mbps also, Couldn't crash the machine (or at least it didn't happen in the time frame I was willing to wait for doing ftp downloads, ~20mn). One note though: The throughput of the card was terribly sucky when set in 100-FD: I couldn't get more than 5,5MB/s doing ftp get writing to /dev/null (to rule out disk perf), ie, half the max link speed, though the /only/ thing I changed in the setup was the link speed (same switch - made sure it properly detected link speed/duplex, same file server, same everything else). When configured in 1000-FD, still writing to /dev/null I could get about 60MB/s. Again half link speed, but there, I suppose that the remote fileserver couldn't pull data faster from the disks :) > - long downloading but without nfs e.g. ftp That was fast and easy. In 1000-FD, I took down the box in 2s (after downloading 90MB). Crashdump is panic-2.6.20-rc4-ftp.txt > (btw. 
there were some patches after 2.6.19 > for rpc memory races). It seems that's something else. I think I also reproduced the bug while surfing the internet with firefox, but I didn't have serial line hooked to capture a dump, unfortunately. > PS: Maintainers were cc-ed, I hope? Now they are :) HTH T-Bone -- Thibaut VARENE http://www.parisc-linux.org/~varenet/ [-- Attachment #2: panic-2.6.20-rc4-ftp.txt --] [-- Type: text/plain, Size: 3726 bytes --] Debian GNU/Linux 4.0 Alucard ttyS0 Alucard login: ------------[ cut here ]------------ kernel BUG at drivers/net/mv643xx_eth.c:1071! Oops: Exception in kernel mode, sig: 5 [#1] PREEMPT Modules linked in: eeprom sbp2 scsi_mod eth1394 uhci_hcd ohci1394 parport_pc pae NIP: C0210B40 LR: C02126DC CTR: C0212620 REGS: da247ac0 TRAP: 0700 Not tainted (2.6.20-rc4) MSR: 00021032 <ME,IR,DR> CR: 28222488 XER: 00000000 TASK = db82a050[1780] 'ncftp' THREAD: da246000 GPR00: 00000000 DA247B70 DB82A050 CFB14260 CFB14000 0000000B DED5FD72 00000000 GPR08: 00000819 00000001 00001000 0000081A 48222422 10056CD0 28004422 C03D9BF8 GPR16: 00000000 00000000 00000000 DA246000 00000001 CFB142BC 00009032 00000000 GPR24: 00000000 00000000 C03E0000 CFB14000 C0212620 DEDFD160 CFB14260 DED5FD40 NIP [C0210B40] eth_alloc_tx_desc_index+0x44/0x50 LR [C02126DC] mv643xx_eth_start_xmit+0xbc/0x3b8 Call Trace: [DA247B70] [DED5FD70] 0xded5fd70 (unreliable) [DA247BB0] [C029F258] dev_hard_start_xmit+0x1d4/0x2c8 [DA247BD0] [C02A1BF4] dev_queue_xmit+0x2bc/0x334 [DA247BF0] [C02BC8A8] ip_output+0x120/0x244 [DA247C10] [C02BD8DC] ip_queue_xmit+0x17c/0x408 [DA247C80] [C02CEB1C] tcp_transmit_skb+0x358/0x7bc [DA247CC0] [C02CBF80] __tcp_ack_snd_check+0x64/0xbc [DA247CD0] [C02CDA94] tcp_rcv_established+0x5d4/0x980 [DA247D00] [C02D4764] tcp_v4_do_rcv+0xe0/0x3c0 [DA247D30] [C0294B58] release_sock+0x7c/0xf4 [DA247D50] [C02C5C1C] tcp_recvmsg+0x4c8/0xbcc [DA247DB0] [C0294490] sock_common_recvmsg+0x3c/0x60 [DA247DD0] [C02920E4] sock_aio_read+0x10c/0x114 [DA247E30] [C006F210] 
do_sync_read+0xc4/0x138 [DA247EF0] [C006FECC] vfs_read+0x19c/0x1a4 [DA247F10] [C00702E4] sys_read+0x4c/0x90 [DA247F40] [C00122EC] ret_from_syscall+0x0/0x38 --- Exception: c01 at 0xff5ba98 LR = 0x10032fc0 Instruction dump: 5400fffe 0f000000 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850 7d694a78 91630020 7d290034 5529d97e <0f090000> 7d034378 4e800020 2f840001 <0>Kernel panic - not syncing: Fatal exception in interrupt <0>Rebooting in 180 seconds..<4>atkbd.c: Spurious ACK on isa0060/serio0. Some . atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha. atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha. atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha. atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha. [-- Attachment #3: panic-2.6.20-rc4-nfs.txt --] [-- Type: text/plain, Size: 4455 bytes --] Debian GNU/Linux 4.0 Alucard ttyS0 Alucard login: [drm] Setting GART location based on new memory map [drm] Loading R200 Microcode [drm] writeback test succeeded in 1 usecs ------------[ cut here ]------------ kernel BUG at drivers/net/mv643xx_eth.c:1071! 
Oops: Exception in kernel mode, sig: 5 [#1] PREEMPT Modules linked in: nfs lockd sunrpc NIP: C0210B40 LR: C02126DC CTR: C0212620 REGS: d8961aa0 TRAP: 0700 Not tainted (2.6.20-rc4) MSR: 00021032 <ME,IR,DR> CR: 24022088 XER: 00000000 TASK = dffd91e0[3879] 'rpciod/0' THREAD: d8960000 GPR00: 00000000 D8961B50 DFFD91E0 CFB1E260 CFB1E000 0000000B DECA91B2 00000000 GPR08: 00000B6A 00000001 00001000 00000B6B 44022022 FFF045B4 009B52B4 017FFA7C GPR16: 009B52AC 017FFA80 009B4E68 D8960000 C03B0000 CFB1E2BC 00009032 00000000 GPR24: 00000000 00000000 C03E0000 CFB1E000 C0212620 DF0033A0 CFB1E260 DECA9180 NIP [C0210B40] eth_alloc_tx_desc_index+0x44/0x50 LR [C02126DC] mv643xx_eth_start_xmit+0xbc/0x3b8 Call Trace: [D8961B50] [DECA91B0] 0xdeca91b0 (unreliable) [D8961B90] [C029F258] dev_hard_start_xmit+0x1d4/0x2c8 [D8961BB0] [C02A1BF4] dev_queue_xmit+0x2bc/0x334 [D8961BD0] [C02BC8A8] ip_output+0x120/0x244 [D8961BF0] [C02BD8DC] ip_queue_xmit+0x17c/0x408 [D8961C60] [C02CEB1C] tcp_transmit_skb+0x358/0x7bc [D8961CA0] [C02CBF80] __tcp_ack_snd_check+0x64/0xbc [D8961CB0] [C02CDA94] tcp_rcv_established+0x5d4/0x980 [D8961CE0] [C02D4764] tcp_v4_do_rcv+0xe0/0x3c0 [D8961D10] [C02D6F2C] tcp_v4_rcv+0x760/0x940 [D8961D40] [C02B805C] ip_local_deliver+0xe4/0x1a4 [D8961D60] [C02B8518] ip_rcv+0x288/0x46c [D8961D90] [C029EE4C] netif_receive_skb+0x214/0x304 [D8961DC0] [C0213744] mv643xx_poll+0x41c/0x48c [D8961E10] [C02A1064] net_rx_action+0x98/0x200 [D8961E40] [C0026D48] __do_softirq+0x80/0xf4 [D8961E70] [C00068F4] do_softirq+0x58/0x5c [D8961E80] [C00267FC] irq_exit+0x60/0x80 [D8961E90] [C00069A0] do_IRQ+0xa8/0xc8 [D8961EA0] [C0012994] ret_from_except+0x0/0x14 --- Exception: 501 at add_wait_queue+0x50/0x84 LR = worker_thread+0x100/0x154 [D8961F60] [D988CE28] 0xd988ce28 (unreliable) [D8961F70] [C0035AB4] worker_thread+0x100/0x154 [D8961FC0] [C0039B4C] kthread+0xc0/0xfc [D8961FF0] [C00131C4] kernel_thread+0x44/0x60 Instruction dump: 5400fffe 0f000000 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850 
7d694a78 91630020 7d290034 5529d97e <0f090000> 7d034378 4e800020 2f840001 <0>Kernel panic - not syncing: Fatal exception in interrupt <0>Rebooting in 180 seconds..<4>atkbd.c: Spurious ACK on isa0060/serio0. Some . atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha. atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha. atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 17:44       ` Thibaut VARENE
@ 2007-01-09 20:05         ` Dale Farnsworth
  2007-01-09 21:05           ` Thibaut VARENE
  0 siblings, 1 reply; 20+ messages in thread
From: Dale Farnsworth @ 2007-01-09 20:05 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: Jarek Poplawski, netdev, mlachwani

On Tue, Jan 09, 2007 at 06:44:49PM +0100, Thibaut VARENE wrote:
> On 1/9/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
> >On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
> >...
> >> I suspected both and changed both the disk and the ram for quality
> >> parts, that I tested afterwards. Both passed thorough tests.
> >>
> >> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> >> works absolutely fine.
> >
> >If you are not tired, I'd suggest two more tests:
>
> I volunteered to help :)

Thank you Thibaut.  Please try the following patch:

From: Dale Farnsworth <dale@farnsworth.org>

Reserve one unused descriptor in the TX ring
to facilitate testing for when the ring is full.

---
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>

 drivers/net/mv643xx_eth.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 9997081..72f82ba 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -289,7 +289,7 @@ static void mv643xx_eth_tx_timeout_task(
 	eth_port_reset(mp->port_num);
 	eth_port_start(dev);
 
-	if (mp->tx_ring_size - mp->tx_desc_count >= MAX_DESCS_PER_SKB)
+	if (mp->tx_ring_size - mp->tx_desc_count > MAX_DESCS_PER_SKB)
 		netif_wake_queue(dev);
 }
 
@@ -356,7 +356,7 @@ static void mv643xx_eth_free_completed_t
 	struct mv643xx_private *mp = netdev_priv(dev);
 
 	if (mv643xx_eth_free_tx_descs(dev, 0) &&
-	    mp->tx_ring_size - mp->tx_desc_count >= MAX_DESCS_PER_SKB)
+	    mp->tx_ring_size - mp->tx_desc_count > MAX_DESCS_PER_SKB)
 		netif_wake_queue(dev);
 }
 
@@ -536,7 +536,7 @@ static irqreturn_t mv643xx_eth_int_handl
 						ETH_TX_QUEUES_ENABLED);
 			if (!netif_carrier_ok(dev)) {
 				netif_carrier_on(dev);
-				if (mp->tx_ring_size - mp->tx_desc_count >=
+				if (mp->tx_ring_size - mp->tx_desc_count >
 							MAX_DESCS_PER_SKB)
 					netif_wake_queue(dev);
 			}
@@ -1194,7 +1194,7 @@ static int mv643xx_eth_start_xmit(struct
 	BUG_ON(netif_queue_stopped(dev));
 	BUG_ON(skb == NULL);
 
-	if (mp->tx_ring_size - mp->tx_desc_count < MAX_DESCS_PER_SKB) {
+	if (mp->tx_ring_size - mp->tx_desc_count <= MAX_DESCS_PER_SKB) {
 		printk(KERN_ERR "%s: transmit with queue full\n", dev->name);
 		netif_stop_queue(dev);
 		return 1;
@@ -1216,7 +1216,7 @@ static int mv643xx_eth_start_xmit(struct
 	stats->tx_packets++;
 	dev->trans_start = jiffies;
 
-	if (mp->tx_ring_size - mp->tx_desc_count < MAX_DESCS_PER_SKB)
+	if (mp->tx_ring_size - mp->tx_desc_count <= MAX_DESCS_PER_SKB)
 		netif_stop_queue(dev);
 
 	spin_unlock_irqrestore(&mp->lock, flags);

^ permalink raw reply related	[flat|nested] 20+ messages in thread
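The effect of changing `>=` to `>` (and `<` to `<=`) in the patch above can be seen in a small user-space sketch. It models only the two counters the patch compares; the value chosen for MAX_DESCS_PER_SKB and the helper names may_queue_before() and may_queue_after() are assumptions for illustration, not taken from the driver.

```c
#include <stdbool.h>

/* Illustrative value only; the driver defines its own. */
#define MAX_DESCS_PER_SKB 4

struct tx_ring {
	unsigned int size;	/* stands in for mp->tx_ring_size */
	unsigned int count;	/* stands in for mp->tx_desc_count */
};

/* Test before the patch: the queue runs while free >= MAX_DESCS_PER_SKB. */
static bool may_queue_before(const struct tx_ring *r)
{
	return r->size - r->count >= MAX_DESCS_PER_SKB;
}

/* Test after the patch: one descriptor is always held back. */
static bool may_queue_after(const struct tx_ring *r)
{
	return r->size - r->count > MAX_DESCS_PER_SKB;
}
```

In an 8-entry ring with 4 descriptors in flight, the old test still admits a packet that may consume all 4 remaining descriptors and fill the ring completely; the new test refuses it, so at least one descriptor always stays unused.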
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069! 2007-01-09 20:05 ` Dale Farnsworth @ 2007-01-09 21:05 ` Thibaut VARENE 2007-01-10 17:12 ` Thibaut VARENE 0 siblings, 1 reply; 20+ messages in thread From: Thibaut VARENE @ 2007-01-09 21:05 UTC (permalink / raw) To: Dale Farnsworth; +Cc: Jarek Poplawski, netdev, mlachwani On 1/9/07, Dale Farnsworth <dale@farnsworth.org> wrote: > On Tue, Jan 09, 2007 at 06:44:49PM +0100, Thibaut VARENE wrote: > > On 1/9/07, Jarek Poplawski <jarkao2@o2.pl> wrote: > > >On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote: > > >... > > >> I suspected both and changed both the disk and the ram for quality > > >> parts, that I tested afterwards. Both passed thorough tests. > > >> > > >> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps), > > >> works absolutely fine. > > > > > >If you are not tired, I'd suggest two more tests: > > > > I volunteered to help :) > > Thank you Thibaut. Please try the following patch: > > From: Dale Farnsworth <dale@farnsworth.org> > > Reserve one unused descriptor in the TX ring > to facilitate testing for when the ring is full. Dale, tried it and unfortunately: Alucard login: ------------[ cut here ]------------ kernel BUG at drivers/net/mv643xx_eth.c:1071! 
Oops: Exception in kernel mode, sig: 5 [#1] PREEMPT Modules linked in: eeprom sbp2 scsi_mod eth1394 uhci_hcd vt8231 ohci1394 ieee13t NIP: C0210B40 LR: C02126DC CTR: C0212620 REGS: dd2d7b40 TRAP: 0700 Not tainted (2.6.20-rc4) MSR: 00021032 <ME,IR,DR> CR: 28242488 XER: 00000000 TASK = da03c640[1775] 'ncftp' THREAD: dd2d6000 GPR00: 00000000 DD2D7BF0 DA03C640 CFB16260 CFB16000 0000000B DF79FDD2 00000000 GPR08: 00000BA9 00000001 00001000 00000BAA 28242482 10056CD0 28004422 C03D9BF8 GPR16: 00000000 00000000 00000000 DD2D6000 00000001 CFB162BC 00009032 00000000 GPR24: 000005A8 00000000 C03E0000 CFB16000 C0212620 CFCB3260 CFB16260 DF79FDA0 NIP [C0210B40] eth_alloc_tx_desc_index+0x44/0x50 LR [C02126DC] mv643xx_eth_start_xmit+0xbc/0x3b8 Call Trace: [DD2D7BF0] [DF79FDD0] 0xdf79fdd0 (unreliable) [DD2D7C30] [C029F258] dev_hard_start_xmit+0x1d4/0x2c8 [DD2D7C50] [C02A1BF4] dev_queue_xmit+0x2bc/0x334 [DD2D7C70] [C02BC8A8] ip_output+0x120/0x244 [DD2D7C90] [C02BD8DC] ip_queue_xmit+0x17c/0x408 [DD2D7D00] [C02CEB1C] tcp_transmit_skb+0x358/0x7bc [DD2D7D40] [C02C2FC0] tcp_cleanup_rbuf+0xb8/0x158 [DD2D7D50] [C02C5C14] tcp_recvmsg+0x4c0/0xbcc [DD2D7DB0] [C0294490] sock_common_recvmsg+0x3c/0x60 [DD2D7DD0] [C02920E4] sock_aio_read+0x10c/0x114 [DD2D7E30] [C006F210] do_sync_read+0xc4/0x138 [DD2D7EF0] [C006FECC] vfs_read+0x19c/0x1a4 [DD2D7F10] [C00702E4] sys_read+0x4c/0x90 [DD2D7F40] [C00122EC] ret_from_syscall+0x0/0x38 --- Exception: c01 at 0xff5ba98 LR = 0x10032fc0 Instruction dump: 5400fffe 0f000000 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850 7d694a78 91630020 7d290034 5529d97e <0f090000> 7d034378 4e800020 2f840001 <0>Kernel panic - not syncing: Fatal exception in interrupt <0>Rebooting in 180 seconds..<4>atkbd.c: Spurious ACK on isa0060/serio0. Some . atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha. atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 21:05           ` Thibaut VARENE
@ 2007-01-10 17:12             ` Thibaut VARENE
  2007-01-11 10:42               ` [PATCH] " Jarek Poplawski
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-10 17:12 UTC (permalink / raw)
  To: Dale Farnsworth; +Cc: Jarek Poplawski, netdev, mlachwani

On 1/9/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> On 1/9/07, Dale Farnsworth <dale@farnsworth.org> wrote:
> >
> > Thank you Thibaut.  Please try the following patch:
> >
> > From: Dale Farnsworth <dale@farnsworth.org>
> >
> > Reserve one unused descriptor in the TX ring
> > to facilitate testing for when the ring is full.
>
> Dale,
>
> tried it and unfortunately:

Also, I don't know if you read that bit, but every time I reboot the
box immediately after a crash, the NIC gets a bogus (always the same
it seems) MAC address, and I have to reboot one more time to get back
to the "normal" MAC address.

Dunno if that hints anything though.

HTH

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

^ permalink raw reply	[flat|nested] 20+ messages in thread
* [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-10 17:12             ` Thibaut VARENE
@ 2007-01-11 10:42               ` Jarek Poplawski
  2007-01-21 12:18                 ` Thibaut VARENE
  0 siblings, 1 reply; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-11 10:42 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: Dale Farnsworth, netdev, mlachwani

On Wed, Jan 10, 2007 at 06:12:29PM +0100, Thibaut VARENE wrote:
> On 1/9/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> >On 1/9/07, Dale Farnsworth <dale@farnsworth.org> wrote:
> >>
> >> Thank you Thibaut.  Please try the following patch:
> >>
> >> From: Dale Farnsworth <dale@farnsworth.org>
> >>
> >> Reserve one unused descriptor in the TX ring
> >> to facilitate testing for when the ring is full.
> >
> >Dale,
> >
> >tried it and unfortunately:
>
> Also, I don't know if you read that bit, but every time I reboot the
> box immediately after a crash, the NIC gets a bogus (always the same
> it seems) MAC address, and I have to reboot one more time to get back
> to the "normal" MAC address.
>
> Dunno if that hints anything though.

There is something in the code about MAC writing
and saving some config during initialization,
so probably it's possible if reinitialization
was broken.

I tried to look more into the code and here are
my (maybe wrong) conclusions:

- It looks like something could be broken during
  tx descs freeing or eth_tx_timeout_task. I compared
  the timeout code with e100 and tg3 and have a feeling
  mv643xx_eth is doing less but I'm not able to estimate
  the importance of this.

- Such errors, IMHO, could be possible with races
  and not enough locking, and btw. I think suspected
  function isn't properly locked: mp->tx_desc_count
  in while condition isn't protected at all.

Below I attach a patch proposal but I'm not sure
some irq off or spin_lock isn't also needed elsewhere.
If it's only locking it would be suitable to do
the test with a kernel compiled without PREEMPT
and SMP, but if irqs nothing should change...

Regards,
Jarek P.

PS: alas I didn't even check compiling - I had no time to
find all compile dependencies of this driver
---

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
---

diff -Nurp linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c linux-2.6.20-rc4/drivers/net/mv643xx_eth.c
--- linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c	2006-12-18 08:57:52.000000000 +0100
+++ linux-2.6.20-rc4/drivers/net/mv643xx_eth.c	2007-01-11 08:55:34.000000000 +0100
@@ -312,8 +312,8 @@ int mv643xx_eth_free_tx_descs(struct net
 	int count;
 	int released = 0;
 
+	spin_lock_irqsave(&mp->lock, flags);
 	while (mp->tx_desc_count > 0) {
-		spin_lock_irqsave(&mp->lock, flags);
 		tx_index = mp->tx_used_desc_q;
 		desc = &mp->p_tx_desc_area[tx_index];
 		cmd_sts = desc->cmd_sts;
@@ -348,8 +348,10 @@ int mv643xx_eth_free_tx_descs(struct net
 			dev_kfree_skb_irq(skb);
 
 		released = 1;
+		spin_lock_irqsave(&mp->lock, flags);
 	}
 
+	spin_unlock_irqrestore(&mp->lock, flags);
 	return released;
 }

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-11 10:42               ` [PATCH] " Jarek Poplawski
@ 2007-01-21 12:18                 ` Thibaut VARENE
  2007-01-21 13:02                   ` Thibaut VARENE
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-21 12:18 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Dale Farnsworth, netdev, mlachwani

On 1/11/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
>
> PS: alas I didn't even check compiling - I had no time to
> find all compile dependencies of this driver
> ---
>
> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
> ---
>
> diff -Nurp linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c linux-2.6.20-rc4/drivers/net/mv643xx_eth.c
> --- linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c	2006-12-18 08:57:52.000000000 +0100
> +++ linux-2.6.20-rc4/drivers/net/mv643xx_eth.c	2007-01-11 08:55:34.000000000 +0100
> @@ -312,8 +312,8 @@ int mv643xx_eth_free_tx_descs(struct net
>  	int count;
>  	int released = 0;
>
> +	spin_lock_irqsave(&mp->lock, flags);
>  	while (mp->tx_desc_count > 0) {
> -		spin_lock_irqsave(&mp->lock, flags);
>  		tx_index = mp->tx_used_desc_q;
>  		desc = &mp->p_tx_desc_area[tx_index];
>  		cmd_sts = desc->cmd_sts;
> @@ -348,8 +348,10 @@ int mv643xx_eth_free_tx_descs(struct net
>  			dev_kfree_skb_irq(skb);

Hmm, I think this is guaranteed not to work. In between those lines
the lock is released, while data in the mp structure is still being
accessed. It seems that this bit of code is indeed not race-safe
though, I'm gonna try to figure something.

>  		released = 1;
> +		spin_lock_irqsave(&mp->lock, flags);
>  	}
>
> +	spin_unlock_irqrestore(&mp->lock, flags);
>  	return released;
>  }

Ugh, this is really unclean... Taking a lock "for nothing" like that
has a perf cost.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-21 12:18                 ` Thibaut VARENE
@ 2007-01-21 13:02                   ` Thibaut VARENE
  2007-01-22 10:02                     ` Jarek Poplawski
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-21 13:02 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Dale Farnsworth, netdev, mlachwani

[-- Attachment #1: Type: text/plain, Size: 980 bytes --]

On 1/21/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> On 1/11/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
> >
> > PS: alas I didn't even check compiling - I had no time to
> > find all compile dependencies of this driver
> > ---

> Hmm, I think this is guaranteed not to work. In between those lines
> the lock is released, while data in the mp structure is still being
> accessed. It seems that this bit of code is indeed not race-safe
> though, I'm gonna try to figure something.

This was indeed the right spot. The attached raw hack seems to fix the
bug (I couldn't crash the box so far). I haven't checked that the
same "situation" happens elsewhere in the code, I leave that as an
exercise for the maintainers (or until I experience another kind of
crash :)

The patch is a bit ugly (printk with irq disabled will not show, etc)
but at least it does work. I'm sure somebody will figure something

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: fix_mv643xx_race.patch --]
[-- Type: text/x-patch; name="fix_mv643xx_race.patch", Size: 768 bytes --]

--- linux-2.6.19.orig/drivers/net/mv643xx_eth.c	2007-01-21 13:56:04.450689123 +0100
+++ linux-2.6.19/drivers/net/mv643xx_eth.c	2007-01-21 13:39:58.228404763 +0100
@@ -312,8 +312,8 @@
 	int count;
 	int released = 0;
 
+	spin_lock_irqsave(&mp->lock, flags);
 	while (mp->tx_desc_count > 0) {
-		spin_lock_irqsave(&mp->lock, flags);
 		tx_index = mp->tx_used_desc_q;
 		desc = &mp->p_tx_desc_area[tx_index];
 		cmd_sts = desc->cmd_sts;
@@ -332,8 +332,6 @@
 		if (skb)
 			mp->tx_skb[tx_index] = NULL;
 
-		spin_unlock_irqrestore(&mp->lock, flags);
-
 		if (cmd_sts & ETH_ERROR_SUMMARY) {
 			printk("%s: Error in TX\n", dev->name);
 			mp->stats.tx_errors++;
@@ -349,6 +347,7 @@
 
 		released = 1;
 	}
 
+	spin_unlock_irqrestore(&mp->lock, flags);
 	return released;
 }

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
From: Jarek Poplawski @ 2007-01-22 10:02 UTC
To: Thibaut VARENE; +Cc: Dale Farnsworth, netdev, mlachwani

On Sun, Jan 21, 2007 at 02:02:15PM +0100, Thibaut VARENE wrote:
> On 1/21/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
...
> >Hmm, I think this is guaranteed not to work. In between those lines
> >the lock is released, while data in the mp structure is still being
> >accessed. It seems that this bit of code is indeed not race-safe
> >though, I'm gonna try to figure something.

I only changed the part I was quite sure was wrong. I didn't
know the internals of this place, but thought somebody probably
had some reason to enable irqs here. I hope the maintainers
will decide the scope of the necessary changes considering your
testing and the patch.

> This was indeed the right spot. The attached raw hack seems to fix the
> bug (I couldn't crash the box so far). I haven't checked that the
> same "situation" happens elsewhere in the code, I leave that as an
> exercise for the maintainers (or until I experience another kind of
> crash :)
>
> The patch is a bit ugly (printk with irq disabled will not show, etc)
> but at least it does work. I'm sure somebody will figure something

Congratulations and regards,

Jarek P.
* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
From: Dale Farnsworth @ 2007-01-22 17:06 UTC
To: Jarek Poplawski; +Cc: Thibaut VARENE, netdev, mlachwani

Jarek and Thibaut,

Thank you both very much for your work finding and fixing this bug.
Jarek, can you verify that the following patch fixes the problem you
were seeing?

-Dale

----- Patch follows -----

From: Dale Farnsworth <dale@farnsworth.org>

mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs

The bug was found and isolated by Thibaut VARENE <T-Bone@parisc-linux.org>
and Jarek Poplawski <jarkao2@o2.pl>. This patch is a modification of their
fixes. We acquire and release the lock for each descriptor that is freed
to minimize the time the lock is held.

---

 drivers/net/mv643xx_eth.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index c41ae42..0d32381 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -314,6 +314,13 @@ int mv643xx_eth_free_tx_descs(struct net
 
 	while (mp->tx_desc_count > 0) {
 		spin_lock_irqsave(&mp->lock, flags);
+
+		/* maybe tx_desc_count changed before the lock was acquired */
+		if (mp->tx_desc_count <= 0) {
+			spin_unlock_irqrestore(&mp->lock, flags);
+			return released;
+		}
+
 		tx_index = mp->tx_used_desc_q;
 		desc = &mp->p_tx_desc_area[tx_index];
 		cmd_sts = desc->cmd_sts;
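The window that the comment in the patch above describes ("maybe tx_desc_count changed before the lock was acquired") can be modelled in userspace. What follows is only a minimal sketch, not driver code: `struct tx_ring`, `ring_init`, `drain_all` and the `between` hook are hypothetical names, a C11 `atomic_flag` stands in for `mp->lock`, and the hook merely imitates another context emptying the ring between the unlocked check and the lock acquisition.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private data; not the real layout. */
struct tx_ring {
	atomic_flag lock;                  /* models mp->lock */
	int count;                         /* models mp->tx_desc_count */
	void (*between)(struct tx_ring *); /* test hook, runs after the unlocked check */
};

static void ring_lock(struct tx_ring *r)
{
	while (atomic_flag_test_and_set(&r->lock))
		;	/* spin until the flag was previously clear */
}

static void ring_unlock(struct tx_ring *r)
{
	atomic_flag_clear(&r->lock);
}

void ring_init(struct tx_ring *r, int count, void (*between)(struct tx_ring *))
{
	atomic_flag_clear(&r->lock);
	r->count = count;
	r->between = between;
}

/* Imitates another context freeing every descriptor. */
void drain_all(struct tx_ring *r)
{
	ring_lock(r);
	r->count = 0;
	ring_unlock(r);
}

/* Shape of the loop before the fix: the count is only tested outside the lock. */
int free_descs_unchecked(struct tx_ring *r)
{
	int released = 0;

	while (r->count > 0) {
		if (r->between)
			r->between(r);	/* the race window */
		ring_lock(r);
		r->count--;		/* may now drop below zero */
		ring_unlock(r);
		released = 1;
	}
	return released;
}

/* Shape of the loop with the re-check added by the patch. */
int free_descs_rechecked(struct tx_ring *r)
{
	int released = 0;

	while (r->count > 0) {
		if (r->between)
			r->between(r);	/* the race window */
		ring_lock(r);
		if (r->count <= 0) {	/* count changed before we got the lock */
			ring_unlock(r);
			return released;
		}
		r->count--;
		ring_unlock(r);
		released = 1;
	}
	return released;
}
```

With one descriptor in the ring and `drain_all` installed as the hook, the unchecked variant leaves `count` at -1, while the re-checked variant returns 0 and leaves `count` at 0. How a corrupted count leads to the BUG in eth_alloc_tx_desc_index is not modelled here.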
* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
From: Jarek Poplawski @ 2007-01-23 8:17 UTC
To: Dale Farnsworth; +Cc: Thibaut VARENE, netdev, mlachwani

On Mon, Jan 22, 2007 at 10:06:16AM -0700, Dale Farnsworth wrote:
> Jarek and Thibaut,
>
> Thank you both very much for your work finding and fixing this bug.
> Jarek, can you verify that the following patch fixes the problem you
> were seeing?
>
> -Dale

Sorry, only Thibaut can verify this. I don't have such a card.
I can only confirm that your patch fixes the improper locking of
mp->tx_desc_count in the while condition.

But I'm not sure your way is optimal now, because
mp->tx_desc_count is checked 2 times on every loop iteration.
I think it is right only if you know the function
mv643xx_eth_free_tx_descs is called mostly while
mp->tx_desc_count == 0 or 1.

> ----- Patch follows -----
>
> From: Dale Farnsworth <dale@farnsworth.org>
>
> mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs
>
> The bug was found and isolated by Thibaut VARENE <T-Bone@parisc-linux.org>
> and Jarek Poplawski <jarkao2@o2.pl>. This patch is a modification of their
> fixes. We acquire and release the lock for each descriptor that is freed
> to minimize the time the lock is held.
>
> ---

From: Dale Farnsworth <dale@farnsworth.org>

mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs

The bug was found and isolated by Thibaut VARENE <T-Bone@parisc-linux.org>
and Jarek Poplawski <jarkao2@o2.pl> noticed a locking problem. This patch
is a modification of their fixes. We acquire and release the lock for each
descriptor that is freed to minimize the time the lock is held.

---

I made a small adjustment to my role here.

Regards,

Jarek P.
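To make the cost Jarek is weighing concrete, here is a small userspace sketch that counts lock acquisitions and count tests for two loop shapes. It rests on assumptions throughout: the single-check variant is not something proposed in this thread, the names (`struct tx_ring`, `free_descs_double_check`, `free_descs_single_check`) are hypothetical, and a C11 `atomic_flag` stands in for `mp->lock`. The counters are instrumentation that is only meaningful in a single-threaded run.

```c
#include <stdatomic.h>

/* Hypothetical model of the ring; the last two fields are instrumentation. */
struct tx_ring {
	atomic_flag lock;  /* models mp->lock */
	int count;         /* models mp->tx_desc_count */
	int locks_taken;   /* how often the lock was acquired */
	int checks;        /* how often count was tested */
};

static void ring_lock(struct tx_ring *r)
{
	while (atomic_flag_test_and_set(&r->lock))
		;	/* spin */
	r->locks_taken++;
}

static void ring_unlock(struct tx_ring *r)
{
	atomic_flag_clear(&r->lock);
}

/* Tests the count and records that a test happened. */
static int ring_nonempty(struct tx_ring *r)
{
	r->checks++;
	return r->count > 0;
}

void ring_init(struct tx_ring *r, int count)
{
	atomic_flag_clear(&r->lock);
	r->count = count;
	r->locks_taken = 0;
	r->checks = 0;
}

/* Shape of the posted patch: unlocked test, then a second test under the lock. */
int free_descs_double_check(struct tx_ring *r)
{
	int released = 0;

	while (ring_nonempty(r)) {
		ring_lock(r);
		if (!ring_nonempty(r)) {
			ring_unlock(r);
			return released;
		}
		r->count--;
		ring_unlock(r);
		released = 1;
	}
	return released;
}

/* Assumed alternative: a single test per iteration, always under the lock. */
int free_descs_single_check(struct tx_ring *r)
{
	int released = 0;

	for (;;) {
		ring_lock(r);
		if (!ring_nonempty(r)) {
			ring_unlock(r);
			break;
		}
		r->count--;
		ring_unlock(r);
		released = 1;
	}
	return released;
}
```

In this model, freeing three descriptors costs 3 lock acquisitions and 7 tests with the double check, against 4 acquisitions and 4 tests with the single check. On an empty ring the double check takes no lock at all, while the single check still takes it once, which is the lock "for nothing" cost raised earlier in the thread. That matches Jarek's remark that the posted shape pays off mainly when the function is called with a count of 0 or 1.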
* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
From: Thibaut VARENE @ 2007-01-23 11:52 UTC
To: Dale Farnsworth; +Cc: Jarek Poplawski, netdev, mlachwani

On 1/22/07, Dale Farnsworth <dale@farnsworth.org> wrote:
> Jarek and Thibaut,
>
> Thank you both very much for your work finding and fixing this bug.
> Jarek, can you verify that the following patch fixes the problem you
> were seeing?
>
> -Dale

Hi Dale,

The patch seems to work fine. Just thinking out loud (as I really
don't know this part of the kernel), here are a few remarks:

- As Jarek pointed out, you're checking the value of
mp->tx_desc_count twice, which means dereferencing a pointer and
accessing memory twice. I don't know how perf-critical this bit of
code is, but I wonder which of keeping the lock for a long time or
doing what you do is better (I'm being anal and you probably know
that better than me :)

- Also, lines 344-349, in the test condition, cmd_sts (an indirection
to mp content) is accessed (dunno if it's ok to do that outside of the
lock), and on line 346, mp->stats.tx_errors is incremented outside of
the spinlock protection. But then, I don't know what that lock is
meant to protect, just pointing this out :)

Thanks for your help, I hope the fix will go upstream asap :)

And about being the author of the patch, since I'm not, I don't really mind 8)

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
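On the second remark: in the hunks quoted earlier in the thread, `cmd_sts = desc->cmd_sts;` runs while the lock is held, so the later `cmd_sts & ETH_ERROR_SUMMARY` test appears to read a copy taken under the lock rather than shared memory. The error counter is a separate question. The sketch below is a userspace model under stated assumptions, not a proposed driver change: it shows what moving the increment inside the locked region could look like if the counter were meant to be covered by the same lock. `struct tx_ring`, `RING_SIZE` and `ERR_SUMMARY` are hypothetical (the bit value is arbitrary, not the hardware's), and a C11 `atomic_flag` stands in for `mp->lock`.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE   4
#define ERR_SUMMARY 0x1u	/* hypothetical bit, not the hardware value */

/* Hypothetical model of the ring; not the real driver layout. */
struct tx_ring {
	atomic_flag lock;             /* models mp->lock */
	uint32_t cmd_sts[RING_SIZE];  /* models desc->cmd_sts per descriptor */
	int used;                     /* models mp->tx_used_desc_q */
	int count;                    /* models mp->tx_desc_count */
	unsigned long tx_errors;      /* models mp->stats.tx_errors */
};

static void ring_lock(struct tx_ring *r)
{
	while (atomic_flag_test_and_set(&r->lock))
		;	/* spin */
}

static void ring_unlock(struct tx_ring *r)
{
	atomic_flag_clear(&r->lock);
}

/* Fills the first n slots (at most RING_SIZE) with the given status words. */
void ring_init(struct tx_ring *r, const uint32_t *sts, int n)
{
	int i;

	if (n > RING_SIZE)
		n = RING_SIZE;
	atomic_flag_clear(&r->lock);
	for (i = 0; i < RING_SIZE; i++)
		r->cmd_sts[i] = (i < n) ? sts[i] : 0;
	r->used = 0;
	r->count = n;
	r->tx_errors = 0;
}

int free_descs(struct tx_ring *r)
{
	int released = 0;

	for (;;) {
		uint32_t cmd_sts;

		ring_lock(r);
		if (r->count <= 0) {
			ring_unlock(r);
			break;
		}
		cmd_sts = r->cmd_sts[r->used];	/* local snapshot, taken under the lock */
		r->used = (r->used + 1) % RING_SIZE;
		r->count--;
		if (cmd_sts & ERR_SUMMARY)
			r->tx_errors++;		/* shared counter updated under the lock */
		ring_unlock(r);

		/*
		 * Slower per-descriptor work (logging, unmapping, freeing)
		 * would go here, using only the local copies.
		 */
		released = 1;
	}
	return released;
}
```

Whether the real counter needs this protection depends on what else updates it and on what `mp->lock` is meant to cover, which the thread leaves open.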
* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
From: Thibaut VARENE @ 2007-01-23 12:42 UTC
To: Dale Farnsworth; +Cc: Jarek Poplawski, netdev, mlachwani

On 1/23/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> - As Jarek pointed out, you're checking the value of
> mp->tx_desc_count twice, which means dereferencing a pointer and
> accessing memory twice. I don't know how perf-critical this bit of
> code is, but I wonder which of keeping the lock for a long time or
> doing what you do is better (I'm being anal and you probably know
> that better than me :)

Forget that. That's an irq disabling lock, it's worse than anything else :)

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
end of thread, other threads:[~2007-01-23 12:42 UTC | newest]

Thread overview: 20+ messages
2007-01-05 19:03 kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069! Thibaut VARENE
2007-01-09  9:26 ` Jarek Poplawski
2007-01-09 10:27 ` Thibaut VARENE
2007-01-09 10:52 ` Jarek Poplawski
2007-01-09 10:56 ` Thibaut VARENE
2007-01-09 11:48 ` Jarek Poplawski
2007-01-09 10:57 ` Jarek Poplawski
2007-01-09 13:02 ` Jarek Poplawski
2007-01-09 17:44 ` Thibaut VARENE
2007-01-09 20:05 ` Dale Farnsworth
2007-01-09 21:05 ` Thibaut VARENE
2007-01-10 17:12 ` Thibaut VARENE
2007-01-11 10:42 ` [PATCH] " Jarek Poplawski
2007-01-21 12:18 ` Thibaut VARENE
2007-01-21 13:02 ` Thibaut VARENE
2007-01-22 10:02 ` Jarek Poplawski
2007-01-22 17:06 ` Dale Farnsworth
2007-01-23  8:17 ` Jarek Poplawski
2007-01-23 11:52 ` Thibaut VARENE
2007-01-23 12:42 ` Thibaut VARENE