* kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
@ 2007-01-05 19:03 Thibaut VARENE
  2007-01-09  9:26 ` Jarek Poplawski
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-05 19:03 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 917 bytes --]

Hi,

I've been experiencing this bug on my Pegasos II (PPC G4 1GHz, 512M
RAM) box for a while: I can reliably kill my machine in about half an
hour while watching some video read from a remote nfs volume (hence
the "mplayer" task in the following dump). It was rather difficult to
get proper debug info as the crash happens while video is playing on
the screen, but it's there anyway :)

This particular dump comes from kernel 2.6.19-ck2 but I reproduced the
bug with vanilla 2.6.19 too, so the bug lives in mainline. I'm not
really familiar with that particular code, but I'd gladly provide as
much debug info as I can.

The box is hooked to a gigabit switch and the NIC is configured as
gigabit too. Interestingly, when I reboot immediately after the crash,
the NIC gets a bogus MAC address, and I have to reboot again to get
back to normal.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

[-- Attachment #2: panicdump.txt --]
[-- Type: text/plain, Size: 4972 bytes --]

kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!        
Oops: Exception in kernel mode, sig: 5 [#1]                                     
PREEMPT                                                                         
Modules linked in: nfs lockd sunrpc eeprom sbp2 scsi_mod eth1394 uhci_hcd ohci14
NIP: C020F0E0 LR: C0210C54 CTR: C0210B98                                        
REGS: c7f6f670 TRAP: 0700   Not tainted  (2.6.19-ck2)                           
MSR: 00021032 <ME,IR,DR>  CR: 24022488  XER: 00000000                           
TASK = c49a8d10[2227] 'mplayer' THREAD: c7f6e000                                
GPR00: 00000000 C7F6F720 C49A8D10 DFF41260 DFF41000 0000000B CE0CF932 00000000  
GPR08: 00000CEA 00000001 00001000 00000CEB 44022422 1085F9B8 C50B0368 0000B241  
GPR16: C7F6FD28 0000B240 00000000 DFF412DC C0380000 00009032 00000400 C7F6E000  
GPR24: 00000000 00000000 DFF41000 C7F6E000 C0210B98 CE0EAC80 DFF41260 CE0CF900  
NIP [C020F0E0] eth_alloc_tx_desc_index+0x44/0x50                                
LR [C0210C54] mv643xx_eth_start_xmit+0xbc/0x3b8                                 
Call Trace:                                                                     
[C7F6F720] [CE0CF930] 0xce0cf930 (unreliable)                                   
[C7F6F760] [C0299714] dev_hard_start_xmit+0x1d4/0x2c8                           
[C7F6F780] [C029C0E0] dev_queue_xmit+0x2bc/0x334                                
[C7F6F7A0] [C02B6E1C] ip_output+0x124/0x248                                     
[C7F6F7C0] [C02B7E54] ip_queue_xmit+0x17c/0x404                                 
[C7F6F830] [C02C91BC] tcp_transmit_skb+0x38c/0x7dc                              
[C7F6F860] [C02C65E4] __tcp_ack_snd_check+0x64/0xbc                             
[C7F6F870] [C02C8100] tcp_rcv_established+0x5d4/0x980                           
[C7F6F8A0] [C02CEDCC] tcp_v4_do_rcv+0xd8/0x3e4                                  
[C7F6F8D0] [C02D1610] tcp_v4_rcv+0x788/0x98c                                    
[C7F6F900] [C02B2594] ip_local_deliver+0xe4/0x1a4                               
[C7F6F920] [C02B2A50] ip_rcv+0x288/0x46c                                        
[C7F6F950] [C0299308] netif_receive_skb+0x214/0x304                             
[C7F6F980] [C0211CBC] mv643xx_poll+0x41c/0x48c                                  
[C7F6F9D0] [C029B550] net_rx_action+0x98/0x200                                  
[C7F6FA00] [C0026958] __do_softirq+0x80/0xf4                                    
[C7F6FA30] [C0006930] do_softirq+0x58/0x5c                                      
[C7F6FA40] [C0026408] irq_exit+0x60/0x80                                        
[C7F6FA50] [C00069DC] do_IRQ+0xa8/0xc8                                          
[C7F6FA60] [C0012498] ret_from_except+0x0/0x14                                  
--- Exception: 501 at __kmalloc+0x30/0xc0                                       
    LR = rpc_malloc+0x48/0xac [sunrpc]                                          
[C7F6FB20] [C3D72508] 0xc3d72508 (unreliable)                                   
[C7F6FB30] [E2A88E18] rpc_malloc+0x48/0xac [sunrpc]                             
[C7F6FB40] [E2A835F8] call_allocate+0x88/0x108 [sunrpc]                         
[C7F6FB60] [E2A89554] __rpc_execute+0x94/0x248 [sunrpc]                         
[C7F6FB80] [E2B0EEB0] nfs_execute_read+0x40/0x64 [nfs]                          
[C7F6FBB0] [E2B0F6A4] nfs_pagein_one+0x2a0/0x300 [nfs]                          
[C7F6FBF0] [E2B0FA9C] nfs_readpages+0x118/0x1f8 [nfs]                           
[C7F6FC40] [C00521DC] __do_page_cache_readahead+0x1e8/0x318                     
[C7F6FCD0] [C0052390] blockable_page_cache_readahead+0x84/0x114                 
[C7F6FCF0] [C00524A4] make_ahead_window+0x84/0xd4                               
[C7F6FD00] [C00525AC] page_cache_readahead+0xb8/0x220                           
[C7F6FD20] [C004B00C] do_generic_mapping_read+0x574/0x5e8                       
[C7F6FDC0] [C004D624] generic_file_aio_read+0x120/0x274                         
[C7F6FE00] [E2B06F00] nfs_file_read+0xc4/0xe4 [nfs]                             
[C7F6FE30] [C006EB50] do_sync_read+0xc4/0x138                                   
[C7F6FEF0] [C006F734] vfs_read+0xc4/0x1a4                                       
[C7F6FF10] [C006FC24] sys_read+0x4c/0x90                                        
[C7F6FF40] [C0011DF0] ret_from_syscall+0x0/0x38                                 
--- Exception: c01 at 0xf4f54d8                                                 
    LR = 0x101e83a8                                                             
Instruction dump:                                                               
5400fffe 0f000000 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850         
7d694a78 91630020 7d290034 5529d97e <0f090000> 7d034378 4e800020 2f840001       
 <0>Kernel panic - not syncing: Fatal exception in interrupt                    
 <0>Rebooting in 180 seconds..


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-05 19:03 kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069! Thibaut VARENE
@ 2007-01-09  9:26 ` Jarek Poplawski
  2007-01-09 10:27   ` Thibaut VARENE
  0 siblings, 1 reply; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-09  9:26 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: netdev

On 05-01-2007 20:03, Thibaut VARENE wrote:
> Hi,
> 
> I've been experiencing this bug on my Pegasos II (PPC G4 1GHz, 512M
...
> [C7F6FA60] [C0012498] ret_from_except+0x0/0x14                                  
> --- Exception: 501 at __kmalloc+0x30/0xc0                                       
>     LR = rpc_malloc+0x48/0xac [sunrpc]                                          
> [C7F6FB20] [C3D72508] 0xc3d72508 (unreliable)                                   
> [C7F6FB30] [E2A88E18] rpc_malloc+0x48/0xac [sunrpc]                             
> [C7F6FB40] [E2A835F8] call_allocate+0x88/0x108 [sunrpc]                         
> [C7F6FB60] [E2A89554] __rpc_execute+0x94/0x248 [sunrpc]                         

Aren't there any other warnings displayed before?
No problems with memory or disk?

Regards,
Jarek P. 


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09  9:26 ` Jarek Poplawski
@ 2007-01-09 10:27   ` Thibaut VARENE
  2007-01-09 10:52     ` Jarek Poplawski
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-09 10:27 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

On 1/9/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
> On 05-01-2007 20:03, Thibaut VARENE wrote:
> > Hi,
> >
> > I've been experiencing this bug on my Pegasos II (PPC G4 1GHz, 512M
> ...
> > [C7F6FA60] [C0012498] ret_from_except+0x0/0x14
> > --- Exception: 501 at __kmalloc+0x30/0xc0
> >     LR = rpc_malloc+0x48/0xac [sunrpc]
> > [C7F6FB20] [C3D72508] 0xc3d72508 (unreliable)
> > [C7F6FB30] [E2A88E18] rpc_malloc+0x48/0xac [sunrpc]
> > [C7F6FB40] [E2A835F8] call_allocate+0x88/0x108 [sunrpc]
> > [C7F6FB60] [E2A89554] __rpc_execute+0x94/0x248 [sunrpc]
>
> Aren't there any other warnings displayed before?

No, I've pasted the full dump that appeared on the serial console I
had set up when the crash occurred.

> No problems with memory or disk?

I suspected both and changed both the disk and the ram for quality
parts, that I tested afterwards. Both passed thorough tests.

Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
works absolutely fine.

HTH

T-Bone

(PS: please CC me in answers)

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 10:27   ` Thibaut VARENE
@ 2007-01-09 10:52     ` Jarek Poplawski
  2007-01-09 10:56       ` Thibaut VARENE
  2007-01-09 10:57     ` Jarek Poplawski
  2007-01-09 13:02     ` Jarek Poplawski
  2 siblings, 1 reply; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-09 10:52 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: netdev

On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
...
> I suspected both and changed both the disk and the ram for quality
> parts, that I tested afterwards. Both passed thorough tests.

You wrote about half an hour, so overheating was also
considered, I presume.

> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> works absolutely fine.

So it looks like the card/driver (or maybe this specimen?).
 
Jarek P.


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 10:52     ` Jarek Poplawski
@ 2007-01-09 10:56       ` Thibaut VARENE
  2007-01-09 11:48         ` Jarek Poplawski
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-09 10:56 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

On 1/9/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
> On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
> ...
> > I suspected both and changed both the disk and the ram for quality
> > parts, that I tested afterwards. Both passed thorough tests.
>
> You wrote about half an hour, so overheating was also
> considered, I presume.

Yes, but since it works fine with the other NIC... :)

> > Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> > works absolutely fine.
>
> So it looks like the card/driver (or maybe this specimen?).

I'm suspecting the driver, but I'm not a specialist :)
It's true that this particular card specimen could be damaged even
though that seems a bit unlikely.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 10:27   ` Thibaut VARENE
  2007-01-09 10:52     ` Jarek Poplawski
@ 2007-01-09 10:57     ` Jarek Poplawski
  2007-01-09 13:02     ` Jarek Poplawski
  2 siblings, 0 replies; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-09 10:57 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: netdev

On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
...
> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> works absolutely fine.

... and the speed could matter here too ...

Jarek P.


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 10:56       ` Thibaut VARENE
@ 2007-01-09 11:48         ` Jarek Poplawski
  0 siblings, 0 replies; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-09 11:48 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: netdev

On Tue, Jan 09, 2007 at 11:56:59AM +0100, Thibaut VARENE wrote:
...
> I'm suspecting the driver, but I'm not a specialist :)

No problem, neither am I!

I've also suspected the driver, looked at the
code, found nothing yet (as expected), but this
info about an exception in malloc introduces some
doubts...

Jarek P.


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 10:27   ` Thibaut VARENE
  2007-01-09 10:52     ` Jarek Poplawski
  2007-01-09 10:57     ` Jarek Poplawski
@ 2007-01-09 13:02     ` Jarek Poplawski
  2007-01-09 17:44       ` Thibaut VARENE
  2 siblings, 1 reply; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-09 13:02 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: netdev

On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
...
> I suspected both and changed both the disk and the ram for quality
> parts, that I tested afterwards. Both passed thorough tests.
> 
> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> works absolutely fine.

If you are not tired, I'd suggest two more tests:

- as above but with NIC set to 100Mbps also,

- long downloading but without nfs e.g. ftp
(btw. there were some patches after 2.6.19
for rpc memory races).

Jarek P.

PS: Maintainers were cc-ed, I hope?


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 13:02     ` Jarek Poplawski
@ 2007-01-09 17:44       ` Thibaut VARENE
  2007-01-09 20:05         ` Dale Farnsworth
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-09 17:44 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev, dale, mlachwani

[-- Attachment #1: Type: text/plain, Size: 1946 bytes --]

On 1/9/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
> On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
> ...
> > I suspected both and changed both the disk and the ram for quality
> > parts, that I tested afterwards. Both passed thorough tests.
> >
> > Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> > works absolutely fine.
>
> If you are not tired, I'd suggest two more tests:

I volunteered to help :)

For the sake of testing up-to-date code, I performed the following
tests with 2.6.20-rc4.

The first test was the usual nfs video playback. Crashdump is
panic-2.6.20-rc4-nfs.txt. It went down in about 20 min.

> - as above but with NIC set to 100Mbps also,

Couldn't crash the machine (or at least it didn't happen in the time
frame I was willing to wait while doing ftp downloads, ~20 min). One
note though:

The throughput of the card was terribly sucky when set to 100-FD: I
couldn't get more than 5.5MB/s doing ftp get writing to /dev/null (to
rule out disk perf), i.e. about half the max link speed, though the
/only/ thing I changed in the setup was the link speed (same switch -
made sure it properly detected link speed/duplex, same file server,
same everything else).

When configured in 1000-FD, still writing to /dev/null I could get
about 60MB/s. Again half link speed, but there, I suppose that the
remote fileserver couldn't pull data faster from the disks :)

> - long downloading but without nfs e.g. ftp

That was fast and easy. In 1000-FD, I took down the box in 2s (after
downloading 90MB). Crashdump is panic-2.6.20-rc4-ftp.txt

> (btw. there were some patches after 2.6.19
> for rpc memory races).

It seems that's something else. I think I also reproduced the bug
while surfing the internet with firefox, but I didn't have a serial
line hooked up to capture a dump, unfortunately.

> PS: Maintainers were cc-ed, I hope?

Now they are :)

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

[-- Attachment #2: panic-2.6.20-rc4-ftp.txt --]
[-- Type: text/plain, Size: 3726 bytes --]

Debian GNU/Linux 4.0 Alucard ttyS0                                              
                                                                                
Alucard login: ------------[ cut here ]------------                             
kernel BUG at drivers/net/mv643xx_eth.c:1071!                                   
Oops: Exception in kernel mode, sig: 5 [#1]                                     
PREEMPT                                                                         
Modules linked in: eeprom sbp2 scsi_mod eth1394 uhci_hcd ohci1394 parport_pc pae
NIP: C0210B40 LR: C02126DC CTR: C0212620                                        
REGS: da247ac0 TRAP: 0700   Not tainted  (2.6.20-rc4)                           
MSR: 00021032 <ME,IR,DR>  CR: 28222488  XER: 00000000                           
TASK = db82a050[1780] 'ncftp' THREAD: da246000                                  
GPR00: 00000000 DA247B70 DB82A050 CFB14260 CFB14000 0000000B DED5FD72 00000000  
GPR08: 00000819 00000001 00001000 0000081A 48222422 10056CD0 28004422 C03D9BF8  
GPR16: 00000000 00000000 00000000 DA246000 00000001 CFB142BC 00009032 00000000  
GPR24: 00000000 00000000 C03E0000 CFB14000 C0212620 DEDFD160 CFB14260 DED5FD40  
NIP [C0210B40] eth_alloc_tx_desc_index+0x44/0x50                                
LR [C02126DC] mv643xx_eth_start_xmit+0xbc/0x3b8                                 
Call Trace:                                                                     
[DA247B70] [DED5FD70] 0xded5fd70 (unreliable)                                   
[DA247BB0] [C029F258] dev_hard_start_xmit+0x1d4/0x2c8                           
[DA247BD0] [C02A1BF4] dev_queue_xmit+0x2bc/0x334                                
[DA247BF0] [C02BC8A8] ip_output+0x120/0x244                                     
[DA247C10] [C02BD8DC] ip_queue_xmit+0x17c/0x408                                 
[DA247C80] [C02CEB1C] tcp_transmit_skb+0x358/0x7bc                              
[DA247CC0] [C02CBF80] __tcp_ack_snd_check+0x64/0xbc                             
[DA247CD0] [C02CDA94] tcp_rcv_established+0x5d4/0x980                           
[DA247D00] [C02D4764] tcp_v4_do_rcv+0xe0/0x3c0                                  
[DA247D30] [C0294B58] release_sock+0x7c/0xf4                                    
[DA247D50] [C02C5C1C] tcp_recvmsg+0x4c8/0xbcc                                   
[DA247DB0] [C0294490] sock_common_recvmsg+0x3c/0x60                             
[DA247DD0] [C02920E4] sock_aio_read+0x10c/0x114                                 
[DA247E30] [C006F210] do_sync_read+0xc4/0x138                                   
[DA247EF0] [C006FECC] vfs_read+0x19c/0x1a4                                      
[DA247F10] [C00702E4] sys_read+0x4c/0x90                                        
[DA247F40] [C00122EC] ret_from_syscall+0x0/0x38                                 
--- Exception: c01 at 0xff5ba98                                                 
    LR = 0x10032fc0                                                             
Instruction dump:                                                               
5400fffe 0f000000 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850         
7d694a78 91630020 7d290034 5529d97e <0f090000> 7d034378 4e800020 2f840001       
 <0>Kernel panic - not syncing: Fatal exception in interrupt                    
 <0>Rebooting in 180 seconds..<4>atkbd.c: Spurious ACK on isa0060/serio0. Some .
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.

[-- Attachment #3: panic-2.6.20-rc4-nfs.txt --]
[-- Type: text/plain, Size: 4455 bytes --]

Debian GNU/Linux 4.0 Alucard ttyS0                                              
                                                                                
Alucard login: [drm] Setting GART location based on new memory map              
[drm] Loading R200 Microcode                                                    
[drm] writeback test succeeded in 1 usecs                                       
------------[ cut here ]------------                                            
kernel BUG at drivers/net/mv643xx_eth.c:1071!                                   
Oops: Exception in kernel mode, sig: 5 [#1]                                     
PREEMPT                                                                         
Modules linked in: nfs lockd sunrpc                                             
NIP: C0210B40 LR: C02126DC CTR: C0212620                                        
REGS: d8961aa0 TRAP: 0700   Not tainted  (2.6.20-rc4)                           
MSR: 00021032 <ME,IR,DR>  CR: 24022088  XER: 00000000                           
TASK = dffd91e0[3879] 'rpciod/0' THREAD: d8960000                               
GPR00: 00000000 D8961B50 DFFD91E0 CFB1E260 CFB1E000 0000000B DECA91B2 00000000  
GPR08: 00000B6A 00000001 00001000 00000B6B 44022022 FFF045B4 009B52B4 017FFA7C  
GPR16: 009B52AC 017FFA80 009B4E68 D8960000 C03B0000 CFB1E2BC 00009032 00000000  
GPR24: 00000000 00000000 C03E0000 CFB1E000 C0212620 DF0033A0 CFB1E260 DECA9180  
NIP [C0210B40] eth_alloc_tx_desc_index+0x44/0x50                                
LR [C02126DC] mv643xx_eth_start_xmit+0xbc/0x3b8                                 
Call Trace:                                                                     
[D8961B50] [DECA91B0] 0xdeca91b0 (unreliable)                                   
[D8961B90] [C029F258] dev_hard_start_xmit+0x1d4/0x2c8                           
[D8961BB0] [C02A1BF4] dev_queue_xmit+0x2bc/0x334                                
[D8961BD0] [C02BC8A8] ip_output+0x120/0x244                                     
[D8961BF0] [C02BD8DC] ip_queue_xmit+0x17c/0x408                                 
[D8961C60] [C02CEB1C] tcp_transmit_skb+0x358/0x7bc                              
[D8961CA0] [C02CBF80] __tcp_ack_snd_check+0x64/0xbc                             
[D8961CB0] [C02CDA94] tcp_rcv_established+0x5d4/0x980                           
[D8961CE0] [C02D4764] tcp_v4_do_rcv+0xe0/0x3c0                                  
[D8961D10] [C02D6F2C] tcp_v4_rcv+0x760/0x940                                    
[D8961D40] [C02B805C] ip_local_deliver+0xe4/0x1a4                               
[D8961D60] [C02B8518] ip_rcv+0x288/0x46c                                        
[D8961D90] [C029EE4C] netif_receive_skb+0x214/0x304                             
[D8961DC0] [C0213744] mv643xx_poll+0x41c/0x48c                                  
[D8961E10] [C02A1064] net_rx_action+0x98/0x200                                  
[D8961E40] [C0026D48] __do_softirq+0x80/0xf4                                    
[D8961E70] [C00068F4] do_softirq+0x58/0x5c                                      
[D8961E80] [C00267FC] irq_exit+0x60/0x80                                        
[D8961E90] [C00069A0] do_IRQ+0xa8/0xc8                                          
[D8961EA0] [C0012994] ret_from_except+0x0/0x14                                  
--- Exception: 501 at add_wait_queue+0x50/0x84                                  
    LR = worker_thread+0x100/0x154                                              
[D8961F60] [D988CE28] 0xd988ce28 (unreliable)                                   
[D8961F70] [C0035AB4] worker_thread+0x100/0x154                                 
[D8961FC0] [C0039B4C] kthread+0xc0/0xfc                                         
[D8961FF0] [C00131C4] kernel_thread+0x44/0x60                                   
Instruction dump:                                                               
5400fffe 0f000000 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850         
7d694a78 91630020 7d290034 5529d97e <0f090000> 7d034378 4e800020 2f840001       
 <0>Kernel panic - not syncing: Fatal exception in interrupt                    
 <0>Rebooting in 180 seconds..<4>atkbd.c: Spurious ACK on isa0060/serio0. Some .
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 17:44       ` Thibaut VARENE
@ 2007-01-09 20:05         ` Dale Farnsworth
  2007-01-09 21:05           ` Thibaut VARENE
  0 siblings, 1 reply; 20+ messages in thread
From: Dale Farnsworth @ 2007-01-09 20:05 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: Jarek Poplawski, netdev, mlachwani

On Tue, Jan 09, 2007 at 06:44:49PM +0100, Thibaut VARENE wrote:
> On 1/9/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
> >On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
> >...
> >> I suspected both and changed both the disk and the ram for quality
> >> parts, that I tested afterwards. Both passed thorough tests.
> >>
> >> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> >> works absolutely fine.
> >
> >If you are not tired, I'd suggest two more tests:
> 
> I volunteered to help :)

Thank you Thibaut.  Please try the following patch:

From: Dale Farnsworth <dale@farnsworth.org>

Reserve one unused descriptor in the TX ring
to facilitate testing for when the ring is full.

---

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>

 drivers/net/mv643xx_eth.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 9997081..72f82ba 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -289,7 +289,7 @@ static void mv643xx_eth_tx_timeout_task(
 	eth_port_reset(mp->port_num);
 	eth_port_start(dev);
 
-	if (mp->tx_ring_size - mp->tx_desc_count >= MAX_DESCS_PER_SKB)
+	if (mp->tx_ring_size - mp->tx_desc_count > MAX_DESCS_PER_SKB)
 		netif_wake_queue(dev);
 }
 
@@ -356,7 +356,7 @@ static void mv643xx_eth_free_completed_t
 	struct mv643xx_private *mp = netdev_priv(dev);
 
 	if (mv643xx_eth_free_tx_descs(dev, 0) &&
-	    mp->tx_ring_size - mp->tx_desc_count >= MAX_DESCS_PER_SKB)
+	    mp->tx_ring_size - mp->tx_desc_count > MAX_DESCS_PER_SKB)
 		netif_wake_queue(dev);
 }
 
@@ -536,7 +536,7 @@ static irqreturn_t mv643xx_eth_int_handl
 						   ETH_TX_QUEUES_ENABLED);
 			if (!netif_carrier_ok(dev)) {
 				netif_carrier_on(dev);
-				if (mp->tx_ring_size - mp->tx_desc_count >=
+				if (mp->tx_ring_size - mp->tx_desc_count >
 							MAX_DESCS_PER_SKB)
 					netif_wake_queue(dev);
 			}
@@ -1194,7 +1194,7 @@ static int mv643xx_eth_start_xmit(struct
 	BUG_ON(netif_queue_stopped(dev));
 	BUG_ON(skb == NULL);
 
-	if (mp->tx_ring_size - mp->tx_desc_count < MAX_DESCS_PER_SKB) {
+	if (mp->tx_ring_size - mp->tx_desc_count <= MAX_DESCS_PER_SKB) {
 		printk(KERN_ERR "%s: transmit with queue full\n", dev->name);
 		netif_stop_queue(dev);
 		return 1;
@@ -1216,7 +1216,7 @@ static int mv643xx_eth_start_xmit(struct
 	stats->tx_packets++;
 	dev->trans_start = jiffies;
 
-	if (mp->tx_ring_size - mp->tx_desc_count < MAX_DESCS_PER_SKB)
+	if (mp->tx_ring_size - mp->tx_desc_count <= MAX_DESCS_PER_SKB)
 		netif_stop_queue(dev);
 
 	spin_unlock_irqrestore(&mp->lock, flags);
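
The off-by-one that the patch above targets can be illustrated outside
the driver. What follows is a minimal user-space sketch, not the driver
code: the struct, the helper names and the small RING_SIZE and
MAX_DESCS_PER_SKB values are assumptions chosen for illustration; only
the two comparisons are taken from the diff. It models a producer that
queues worst-case packets until the stop test fires and reports how
many descriptors are in use at that point.

```c
#include <assert.h>

#define RING_SIZE         9  /* hypothetical; chosen small for illustration */
#define MAX_DESCS_PER_SKB 3  /* hypothetical worst case per packet */

/* Simplified stand-in for the driver's TX accounting (assumed layout). */
struct tx_ring {
	unsigned int size;   /* total descriptors in the ring */
	unsigned int count;  /* descriptors currently in use */
};

/* Stop test before the patch: stop only when fewer than
 * MAX_DESCS_PER_SKB descriptors are free. */
static int must_stop_old(const struct tx_ring *r)
{
	return r->size - r->count < MAX_DESCS_PER_SKB;
}

/* Stop test after the patch: also stop when exactly
 * MAX_DESCS_PER_SKB descriptors are free. */
static int must_stop_new(const struct tx_ring *r)
{
	return r->size - r->count <= MAX_DESCS_PER_SKB;
}

/* Queue worst-case packets until the stop test fires and
 * return the number of descriptors in use at that point. */
static unsigned int fill(struct tx_ring *r,
			 int (*must_stop)(const struct tx_ring *))
{
	r->count = 0;
	while (!must_stop(r)) {
		r->count += MAX_DESCS_PER_SKB;
		assert(r->count <= r->size);  /* never exceed the ring */
	}
	return r->count;
}
```

With these hypothetical sizes, fill() with must_stop_old returns 9,
i.e. the ring can become completely full, while fill() with
must_stop_new returns 6. In general the patched test only admits a
packet when more than MAX_DESCS_PER_SKB descriptors are free, so at
least one descriptor stays unused after any packet is queued.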


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 20:05         ` Dale Farnsworth
@ 2007-01-09 21:05           ` Thibaut VARENE
  2007-01-10 17:12             ` Thibaut VARENE
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-09 21:05 UTC (permalink / raw)
  To: Dale Farnsworth; +Cc: Jarek Poplawski, netdev, mlachwani

On 1/9/07, Dale Farnsworth <dale@farnsworth.org> wrote:
> On Tue, Jan 09, 2007 at 06:44:49PM +0100, Thibaut VARENE wrote:
> > On 1/9/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
> > >On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
> > >...
> > >> I suspected both and changed both the disk and the ram for quality
> > >> parts, that I tested afterwards. Both passed thorough tests.
> > >>
> > >> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
> > >> works absolutely fine.
> > >
> > >If you are not tired, I'd suggest two more tests:
> >
> > I volunteered to help :)
>
> Thank you Thibaut.  Please try the following patch:
>
> From: Dale Farnsworth <dale@farnsworth.org>
>
> Reserve one unused descriptor in the TX ring
> to facilitate testing for when the ring is full.

Dale,

tried it and unfortunately:

Alucard login: ------------[ cut here ]------------
kernel BUG at drivers/net/mv643xx_eth.c:1071!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT
Modules linked in: eeprom sbp2 scsi_mod eth1394 uhci_hcd vt8231 ohci1394 ieee13t
NIP: C0210B40 LR: C02126DC CTR: C0212620
REGS: dd2d7b40 TRAP: 0700   Not tainted  (2.6.20-rc4)
MSR: 00021032 <ME,IR,DR>  CR: 28242488  XER: 00000000
TASK = da03c640[1775] 'ncftp' THREAD: dd2d6000
GPR00: 00000000 DD2D7BF0 DA03C640 CFB16260 CFB16000 0000000B DF79FDD2 00000000
GPR08: 00000BA9 00000001 00001000 00000BAA 28242482 10056CD0 28004422 C03D9BF8
GPR16: 00000000 00000000 00000000 DD2D6000 00000001 CFB162BC 00009032 00000000
GPR24: 000005A8 00000000 C03E0000 CFB16000 C0212620 CFCB3260 CFB16260 DF79FDA0
NIP [C0210B40] eth_alloc_tx_desc_index+0x44/0x50
LR [C02126DC] mv643xx_eth_start_xmit+0xbc/0x3b8
Call Trace:
[DD2D7BF0] [DF79FDD0] 0xdf79fdd0 (unreliable)
[DD2D7C30] [C029F258] dev_hard_start_xmit+0x1d4/0x2c8
[DD2D7C50] [C02A1BF4] dev_queue_xmit+0x2bc/0x334
[DD2D7C70] [C02BC8A8] ip_output+0x120/0x244
[DD2D7C90] [C02BD8DC] ip_queue_xmit+0x17c/0x408
[DD2D7D00] [C02CEB1C] tcp_transmit_skb+0x358/0x7bc
[DD2D7D40] [C02C2FC0] tcp_cleanup_rbuf+0xb8/0x158
[DD2D7D50] [C02C5C14] tcp_recvmsg+0x4c0/0xbcc
[DD2D7DB0] [C0294490] sock_common_recvmsg+0x3c/0x60
[DD2D7DD0] [C02920E4] sock_aio_read+0x10c/0x114
[DD2D7E30] [C006F210] do_sync_read+0xc4/0x138
[DD2D7EF0] [C006FECC] vfs_read+0x19c/0x1a4
[DD2D7F10] [C00702E4] sys_read+0x4c/0x90
[DD2D7F40] [C00122EC] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff5ba98
    LR = 0x10032fc0
Instruction dump:
5400fffe 0f000000 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850
7d694a78 91630020 7d290034 5529d97e <0f090000> 7d034378 4e800020 2f840001
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 <0>Rebooting in 180 seconds..<4>atkbd.c: Spurious ACK on isa0060/serio0. Some .
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.


* Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-09 21:05           ` Thibaut VARENE
@ 2007-01-10 17:12             ` Thibaut VARENE
  2007-01-11 10:42               ` [PATCH] " Jarek Poplawski
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-10 17:12 UTC (permalink / raw)
  To: Dale Farnsworth; +Cc: Jarek Poplawski, netdev, mlachwani

On 1/9/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> On 1/9/07, Dale Farnsworth <dale@farnsworth.org> wrote:
> >
> > Thank you Thibaut.  Please try the following patch:
> >
> > From: Dale Farnsworth <dale@farnsworth.org>
> >
> > Reserve one unused descriptor in the TX ring
> > to facilitate testing for when the ring is full.
>
> Dale,
>
> tried it and unfortunately:

Also, I don't know if you read that bit, but every time I reboot the
box immediately after a crash, the NIC gets a bogus (always the same
it seems) MAC address, and I have to reboot one more time to get back
to the "normal" MAC address.

Dunno if that hints at anything though.

HTH

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-10 17:12             ` Thibaut VARENE
@ 2007-01-11 10:42               ` Jarek Poplawski
  2007-01-21 12:18                 ` Thibaut VARENE
  0 siblings, 1 reply; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-11 10:42 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: Dale Farnsworth, netdev, mlachwani

On Wed, Jan 10, 2007 at 06:12:29PM +0100, Thibaut VARENE wrote:
> On 1/9/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> >On 1/9/07, Dale Farnsworth <dale@farnsworth.org> wrote:
> >>
> >> Thank you Thibaut.  Please try the following patch:
> >>
> >> From: Dale Farnsworth <dale@farnsworth.org>
> >>
> >> Reserve one unused descriptor in the TX ring
> >> to facilitate testing for when the ring is full.
> >
> >Dale,
> >
> >tried it and unfortunately:
> 
> Also, I don't know if you read that bit, but everytime I reboot the
> box immediately after a crash, the NIC gets a bogus (always the same
> it seems) MAC address, and I have to reboot one more time to get back
> to the "normal" MAC address.
> 
> Dunno if that hints anything though.

There is something in the code about writing the MAC address and
saving some config during initialization, so this is probably
possible if reinitialization is broken.

I tried to look further into the code, and here are my (maybe
wrong) conclusions:

- It looks like something could be broken during the freeing of
tx descriptors or in eth_tx_timeout_task. I compared the timeout
code with e100 and tg3 and have a feeling mv643xx_eth is doing
less, but I'm not able to estimate the importance of this.

- Such errors, IMHO, could be caused by races and insufficient
locking, and btw, I think the suspected function isn't properly
locked: mp->tx_desc_count in the while condition isn't protected
at all. Below I attach a patch proposal, but I'm not sure that
disabling irqs or taking a spin_lock isn't also needed elsewhere.
If it's only a locking problem, it would be worthwhile to repeat
the test with a kernel compiled without PREEMPT and SMP; but if
it's about irqs, nothing should change...
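
To make the suspected race concrete, here is a minimal single-threaded
replay of one bad interleaving. It is only a sketch: struct ring,
has_work(), release_one() and replay_race() are hypothetical names, and
only the counter mentioned above is modelled, not the real driver state.

```c
/* Hypothetical stand-in for the driver's private data; only the
 * counter discussed above is modelled. */
struct ring {
	int tx_desc_count;	/* descriptors still waiting to be freed */
};

/* The unlocked test from the while condition. */
static int has_work(const struct ring *r)
{
	return r->tx_desc_count > 0;
}

/* The part of the loop body that runs with the lock held. */
static void release_one(struct ring *r)
{
	r->tx_desc_count--;
}

/* Replay one bad interleaving: contexts A and B both pass the
 * unlocked test before either of them runs the locked body. */
static int replay_race(void)
{
	struct ring r = { .tx_desc_count = 1 };
	int a_go = has_work(&r);	/* A sees count == 1 */
	int b_go = has_work(&r);	/* B sees count == 1 too */

	if (a_go)
		release_one(&r);	/* count: 1 -> 0 */
	if (b_go)
		release_one(&r);	/* count: 0 -> -1, accounting is now wrong */
	return r.tx_desc_count;
}
```

With one descriptor outstanding, both contexts decide there is work to
do, and the counter ends up negative even though each decrement was
itself done "under the lock".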

Regards,
Jarek P.

PS: alas, I didn't even compile-test this - I had no time to
find all the compile dependencies of this driver
---

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
---

diff -Nurp linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c linux-2.6.20-rc4/drivers/net/mv643xx_eth.c
--- linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c	2006-12-18 08:57:52.000000000 +0100
+++ linux-2.6.20-rc4/drivers/net/mv643xx_eth.c	2007-01-11 08:55:34.000000000 +0100
@@ -312,8 +312,8 @@ int mv643xx_eth_free_tx_descs(struct net
 	int count;
 	int released = 0;
 
+	spin_lock_irqsave(&mp->lock, flags);
 	while (mp->tx_desc_count > 0) {
-		spin_lock_irqsave(&mp->lock, flags);
 		tx_index = mp->tx_used_desc_q;
 		desc = &mp->p_tx_desc_area[tx_index];
 		cmd_sts = desc->cmd_sts;
@@ -348,8 +348,10 @@ int mv643xx_eth_free_tx_descs(struct net
 			dev_kfree_skb_irq(skb);
 
 		released = 1;
+		spin_lock_irqsave(&mp->lock, flags);
 	}
 
+	spin_unlock_irqrestore(&mp->lock, flags);
 	return released;
 }
 

* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-11 10:42               ` [PATCH] " Jarek Poplawski
@ 2007-01-21 12:18                 ` Thibaut VARENE
  2007-01-21 13:02                   ` Thibaut VARENE
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-21 12:18 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Dale Farnsworth, netdev, mlachwani

On 1/11/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
>
> PS: alas I didn't even check compiling - I had no time to
> find all compile dependencies of this driver
> ---
>
> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
> ---
>
> diff -Nurp linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c linux-2.6.20-rc4/drivers/net/mv643xx_eth.c
> --- linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c 2006-12-18 08:57:52.000000000 +0100
> +++ linux-2.6.20-rc4/drivers/net/mv643xx_eth.c  2007-01-11 08:55:34.000000000 +0100
> @@ -312,8 +312,8 @@ int mv643xx_eth_free_tx_descs(struct net
>         int count;
>         int released = 0;
>
> +       spin_lock_irqsave(&mp->lock, flags);
>         while (mp->tx_desc_count > 0) {
> -               spin_lock_irqsave(&mp->lock, flags);
>                 tx_index = mp->tx_used_desc_q;
>                 desc = &mp->p_tx_desc_area[tx_index];
>                 cmd_sts = desc->cmd_sts;
> @@ -348,8 +348,10 @@ int mv643xx_eth_free_tx_descs(struct net
>                         dev_kfree_skb_irq(skb);

Hmm, I think this is guaranteed not to work. In between those lines
the lock is released, while data in the mp structure is still being
accessed. It seems that this bit of code is indeed not race-safe,
though; I'm gonna try to figure something out.

>                 released = 1;
> +               spin_lock_irqsave(&mp->lock, flags);
>         }
>
> +       spin_unlock_irqrestore(&mp->lock, flags);
>         return released;
>  }

Ugh, this is really unclean... Taking a lock "for nothing" like that
has a perf cost.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-21 12:18                 ` Thibaut VARENE
@ 2007-01-21 13:02                   ` Thibaut VARENE
  2007-01-22 10:02                     ` Jarek Poplawski
  0 siblings, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-21 13:02 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Dale Farnsworth, netdev, mlachwani

[-- Attachment #1: Type: text/plain, Size: 980 bytes --]

On 1/21/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> On 1/11/07, Jarek Poplawski <jarkao2@o2.pl> wrote:
> >
> > PS: alas I didn't even check compiling - I had no time to
> > find all compile dependencies of this driver
> > ---
> Hmm, I think this is guaranteed not to work. In between those lines
> the lock is released, while data in the mp structure is still being
> accessed. It seems that this bit of code is indeed not race-safe
> though, I'm gonna try to figure something.

This was indeed the right spot. The attached raw hack seems to fix the
bug (I couldn't crash the box so far).  I haven't checked whether the
same "situation" happens elsewhere in the code; I leave that as an
exercise for the maintainers (or until I experience another kind of
crash :)

The patch is a bit ugly (a printk with irqs disabled will not show, etc.)
but at least it does work. I'm sure somebody will figure something out.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

[-- Attachment #2: fix_mv643xx_race.patch --]
[-- Type: text/x-patch; name="fix_mv643xx_race.patch", Size: 768 bytes --]

--- linux-2.6.19.orig/drivers/net/mv643xx_eth.c	2007-01-21 13:56:04.450689123 +0100
+++ linux-2.6.19/drivers/net/mv643xx_eth.c	2007-01-21 13:39:58.228404763 +0100
@@ -312,8 +312,8 @@
 	int count;
 	int released = 0;
 
+	spin_lock_irqsave(&mp->lock, flags);
 	while (mp->tx_desc_count > 0) {
-		spin_lock_irqsave(&mp->lock, flags);
 		tx_index = mp->tx_used_desc_q;
 		desc = &mp->p_tx_desc_area[tx_index];
 		cmd_sts = desc->cmd_sts;
@@ -332,8 +332,6 @@
 		if (skb)
 			mp->tx_skb[tx_index] = NULL;
 
-		spin_unlock_irqrestore(&mp->lock, flags);
-
 		if (cmd_sts & ETH_ERROR_SUMMARY) {
 			printk("%s: Error in TX\n", dev->name);
 			mp->stats.tx_errors++;
@@ -349,6 +347,7 @@
 
 		released = 1;
 	}
+	spin_unlock_irqrestore(&mp->lock, flags);
 
 	return released;
 }

* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-21 13:02                   ` Thibaut VARENE
@ 2007-01-22 10:02                     ` Jarek Poplawski
  2007-01-22 17:06                       ` Dale Farnsworth
  0 siblings, 1 reply; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-22 10:02 UTC (permalink / raw)
  To: Thibaut VARENE; +Cc: Dale Farnsworth, netdev, mlachwani

On Sun, Jan 21, 2007 at 02:02:15PM +0100, Thibaut VARENE wrote:
> On 1/21/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
...
> >Hmm, I think this is guaranteed not to work. In between those lines
> >the lock is released, while data in the mp structure is still being
> >accessed. It seems that this bit of code is indeed not race-safe
> >though, I'm gonna try to figure something.

I only changed the part I was quite sure was wrong.
I didn't know the internals of this place, but thought
somebody probably had a reason to enable irqs here.

I hope the maintainers will decide on the scope of the necessary
changes, considering your testing and the patch.

> This was indeed the right spot. The attached raw hack seems to fix the
> bug (I couldn't crash the box so far).  I haven't checked that the
> same "situation" happens elsewhere in the code, I leave that as an
> exercise for the maintainers (or until I experience another kind of
> crash :)
> 
> The patch is a bit ugly (printk with irq disabled will not show, etc)
> but at least it does work. I'm sure somebody will figure something

Congratulations and regards,

Jarek P.

* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-22 10:02                     ` Jarek Poplawski
@ 2007-01-22 17:06                       ` Dale Farnsworth
  2007-01-23  8:17                         ` Jarek Poplawski
  2007-01-23 11:52                         ` Thibaut VARENE
  0 siblings, 2 replies; 20+ messages in thread
From: Dale Farnsworth @ 2007-01-22 17:06 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Thibaut VARENE, netdev, mlachwani

Jarek and Thibaut,

Thank you both very much for your work finding and fixing this bug.
Jarek, can you verify that the following patch fixes the problem you
were seeing?

-Dale

----- Patch follows -----

From: Dale Farnsworth <dale@farnsworth.org>

mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs

The bug was found and isolated by Thibaut VARENE <T-Bone@parisc-linux.org>
and Jarek Poplawski <jarkao2@o2.pl>.  This patch is a modification of their
fixes.  We acquire and release the lock for each descriptor that is freed
to minimize the time the lock is held.

---

 drivers/net/mv643xx_eth.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index c41ae42..0d32381 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -314,6 +314,13 @@ int mv643xx_eth_free_tx_descs(struct net
 
 	while (mp->tx_desc_count > 0) {
 		spin_lock_irqsave(&mp->lock, flags);
+
+		/* maybe tx_desc_count changed before the lock was acquired */
+		if (mp->tx_desc_count <= 0) {
+			spin_unlock_irqrestore(&mp->lock, flags);
+			return released;
+		}
+
 		tx_index = mp->tx_used_desc_q;
 		desc = &mp->p_tx_desc_area[tx_index];
 		cmd_sts = desc->cmd_sts;

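The re-check in the patch above can be tried out in userspace. The
following is a minimal sketch under stated assumptions, not the driver
code: struct ring, ring_init(), ring_lock() and ring_unlock() are
hypothetical, the C11 atomic_flag spinlock only stands in for
spin_lock_irqsave() (it does not disable interrupts), and the function
returns a count instead of the driver's 0/1 flag so the result is easy
to check.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the driver's private data. */
struct ring {
	atomic_flag lock;		/* stand-in for mp->lock */
	atomic_int tx_desc_count;	/* stand-in for mp->tx_desc_count */
};

static void ring_init(struct ring *r, int count)
{
	atomic_flag_clear(&r->lock);
	atomic_init(&r->tx_desc_count, count);
}

static void ring_lock(struct ring *r)
{
	while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		;	/* spin until the flag was clear */
}

static void ring_unlock(struct ring *r)
{
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Returns the number of descriptors released. */
static int free_tx_descs(struct ring *r)
{
	int released = 0;

	/* Unlocked test: only a hint that there may be work to do. */
	while (atomic_load_explicit(&r->tx_desc_count, memory_order_relaxed) > 0) {
		ring_lock(r);

		/* The count may have changed before the lock was acquired. */
		if (atomic_load_explicit(&r->tx_desc_count, memory_order_relaxed) <= 0) {
			ring_unlock(r);
			return released;
		}

		atomic_fetch_sub_explicit(&r->tx_desc_count, 1, memory_order_relaxed);
		ring_unlock(r);

		/* Unmapping and freeing the buffer would happen here, unlocked. */
		released++;
	}
	return released;
}
```

The decision to release a descriptor is now always made with the lock
held; the test in the while condition only avoids taking the lock when
the ring already looks empty.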

* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-22 17:06                       ` Dale Farnsworth
@ 2007-01-23  8:17                         ` Jarek Poplawski
  2007-01-23 11:52                         ` Thibaut VARENE
  1 sibling, 0 replies; 20+ messages in thread
From: Jarek Poplawski @ 2007-01-23  8:17 UTC (permalink / raw)
  To: Dale Farnsworth; +Cc: Thibaut VARENE, netdev, mlachwani

On Mon, Jan 22, 2007 at 10:06:16AM -0700, Dale Farnsworth wrote:
> Jarek and Thibaut,
> 
> Thank you both very much for your work finding and fixing this bug.
> Jarek, can you verify that the following patch fixes the problem you
> were seeing?
> 
> -Dale

Sorry, only Thibaut can verify this. I don't have
such a card.

I can only confirm that your patch fixes the improper
locking of mp->tx_desc_count in the while condition.
But I'm not sure your way is optimal now, because
mp->tx_desc_count is checked twice per loop iteration.
I think it is right only if you know that
mv643xx_eth_free_tx_descs is mostly called while
mp->tx_desc_count == 0 or 1.
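
For comparison, a variant that tests the counter only once per
iteration, always under the lock, could look like the sketch below.
This is a userspace illustration under stated assumptions: struct ring,
ring_init(), ring_lock() and ring_unlock() are hypothetical, and the C11
atomic_flag spinlock only stands in for spin_lock_irqsave().

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the driver's private data. */
struct ring {
	atomic_flag lock;	/* stand-in for mp->lock */
	int tx_desc_count;	/* only touched with the lock held */
};

static void ring_init(struct ring *r, int count)
{
	atomic_flag_clear(&r->lock);
	r->tx_desc_count = count;
}

static void ring_lock(struct ring *r)
{
	while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		;	/* spin until the flag was clear */
}

static void ring_unlock(struct ring *r)
{
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Returns the number of descriptors released. */
static int free_tx_descs_single_check(struct ring *r)
{
	int released = 0;

	for (;;) {
		ring_lock(r);
		if (r->tx_desc_count <= 0) {	/* the only test, under the lock */
			ring_unlock(r);
			break;
		}
		r->tx_desc_count--;
		ring_unlock(r);

		/* Unmapping and freeing the buffer would happen here, unlocked. */
		released++;
	}
	return released;
}
```

The trade-off is the one described above: this version takes the lock
even when the ring is empty, so the extra unlocked test is the cheaper
choice if the function is mostly called with nothing to free.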

> ----- Patch follows -----
> 
> From: Dale Farnsworth <dale@farnsworth.org>
> 
> mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs
> 
> The bug was found and isolated by Thibaut VARENE <T-Bone@parisc-linux.org>
> and Jarek Poplawski <jarkao2@o2.pl>.  This patch is a modification of their
> fixes.  We acquire and release the lock for each descriptor that is freed
> to minimize the time the lock is held.
> 
> ---

From: Dale Farnsworth <dale@farnsworth.org>
 
mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs
 
The bug was found and isolated by Thibaut VARENE <T-Bone@parisc-linux.org>
and Jarek Poplawski <jarkao2@o2.pl> noticed a locking problem.  This patch
is a modification of their fixes.  We acquire and release the lock for each
descriptor that is freed to minimize the time the lock is held.
 
---

I made a small adjustment to my role here.

Regards,
Jarek P.

* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-22 17:06                       ` Dale Farnsworth
  2007-01-23  8:17                         ` Jarek Poplawski
@ 2007-01-23 11:52                         ` Thibaut VARENE
  2007-01-23 12:42                           ` Thibaut VARENE
  1 sibling, 1 reply; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-23 11:52 UTC (permalink / raw)
  To: Dale Farnsworth; +Cc: Jarek Poplawski, netdev, mlachwani

On 1/22/07, Dale Farnsworth <dale@farnsworth.org> wrote:
> Jarek and Thibaut,
>
> Thank you both very much for your work finding and fixing this bug.
> Jarek, can you verify that the following patch fixes the problem you
> were seeing?
>
> -Dale

Hi Dale,

The patch seems to work fine. Just thinking out loud (as I really
don't know this part of the kernel), here are a few remarks:

- As Jarek pointed out, you're checking the value of
mp->tx_desc_count twice, which means dereferencing a pointer and
accessing memory twice. I don't know how perf-critical this bit of
code is, but I wonder which is better: keeping the lock for a long
time or doing what you do (I'm being anal and you probably know that
better than me :)

- Also, on lines 344-349, in the test condition, cmd_sts (an
indirection to mp content) is accessed (dunno if it's ok to do that
outside of the lock), and on line 346, mp->stats.tx_errors is
incremented outside of the spinlock protection. But then, I don't know
what that lock is meant to protect, just pointing this out :)

Thanks for your help, I hope the fix will go upstream asap :)

And about being the author of the patch, since I'm not, I don't really mind 8)

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
  2007-01-23 11:52                         ` Thibaut VARENE
@ 2007-01-23 12:42                           ` Thibaut VARENE
  0 siblings, 0 replies; 20+ messages in thread
From: Thibaut VARENE @ 2007-01-23 12:42 UTC (permalink / raw)
  To: Dale Farnsworth; +Cc: Jarek Poplawski, netdev, mlachwani

On 1/23/07, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> - As Jarek pointed out, you're checking twice the value of
> mp->tx_desc_count, which means dereferencing a pointer and accessing
> memory twice. I don't know how perf-critical this bit of code is, but
> I wonder which of keeping the lock for a long time or doing what you
> is better (I'm being anal and you probably know that better than me :)

Forget that. That's an irq disabling lock, it's worse than anything else :)

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

Thread overview: 20+ messages
2007-01-05 19:03 kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069! Thibaut VARENE
2007-01-09  9:26 ` Jarek Poplawski
2007-01-09 10:27   ` Thibaut VARENE
2007-01-09 10:52     ` Jarek Poplawski
2007-01-09 10:56       ` Thibaut VARENE
2007-01-09 11:48         ` Jarek Poplawski
2007-01-09 10:57     ` Jarek Poplawski
2007-01-09 13:02     ` Jarek Poplawski
2007-01-09 17:44       ` Thibaut VARENE
2007-01-09 20:05         ` Dale Farnsworth
2007-01-09 21:05           ` Thibaut VARENE
2007-01-10 17:12             ` Thibaut VARENE
2007-01-11 10:42               ` [PATCH] " Jarek Poplawski
2007-01-21 12:18                 ` Thibaut VARENE
2007-01-21 13:02                   ` Thibaut VARENE
2007-01-22 10:02                     ` Jarek Poplawski
2007-01-22 17:06                       ` Dale Farnsworth
2007-01-23  8:17                         ` Jarek Poplawski
2007-01-23 11:52                         ` Thibaut VARENE
2007-01-23 12:42                           ` Thibaut VARENE
