* Gianfar driver failing on MPC8641D based board
@ 2010-02-05 14:00 Martyn Welch
  2010-02-25 10:31 ` Martyn Welch
From: Martyn Welch @ 2010-02-05 14:00 UTC (permalink / raw)
  To: linuxppc-dev list, netdev; +Cc: Anton Vorontsov, Sandeep Gopalpet, davem

I have recently attempted to boot an 8641D-based board from an NFS root.
The boot process grinds to a halt shortly after the first access of the
NFS root, and I receive multiple "nfs: server 192.168.0.1 not responding,
still trying" messages. Wireshark suggests that there is no further
traffic from the board from that point on. The NFS server eventually
seems to resend packets it has already sent, which results in "nfs:
server 192.168.0.1 OK" messages, but the "not responding" messages then
resume, with still no traffic from the board.

I can boot to a ramdisk without problems and the network seems to work
from there, though I haven't really stressed the interface.

I have attempted a git bisect, but couldn't narrow the problem down any
further than the 2.6.33 merge window: within that range the gianfar
network driver fails to compile, and running git bisect skip many, many
times didn't get me past it.
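
As a minimal sketch (not something used in this thread), the skipping can be automated with git bisect run, which treats exit status 125 as "this commit cannot be tested, skip it". The script name, the powerpc-linux-gnu- cross-compiler prefix and the boot-test command passed on its command line are all assumptions; the boot test in particular is a hypothetical script that would load the new image, boot the board from NFS and exit 0 if the root mount works.

```python
#!/usr/bin/env python3
"""Sketch of a `git bisect run` helper for the gianfar NFS-root regression.

git bisect run interprets the exit status of the command it runs:
  0                       -> this commit is good
  125                     -> this commit cannot be tested; skip it
  1..127 (except 125)     -> this commit is bad
"""
import subprocess
import sys
from typing import Sequence

GOOD, BAD, SKIP = 0, 1, 125

# Assumption: toolchain prefix and image target; adjust for the local setup.
BUILD_CMD = ["make", "ARCH=powerpc", "CROSS_COMPILE=powerpc-linux-gnu-", "uImage"]


def verdict(build_ok: bool, boot_ok: bool) -> int:
    """Map the build and boot-test outcomes to a git bisect exit status."""
    if not build_ok:
        # e.g. a mid-series commit where gianfar does not compile
        return SKIP
    return GOOD if boot_ok else BAD


def main(test_cmd: Sequence[str]) -> int:
    build_ok = subprocess.run(BUILD_CMD).returncode == 0
    if not build_ok:
        return verdict(False, False)
    # Hypothetical user-provided boot test: exits 0 if the NFS root comes up.
    boot_ok = subprocess.run(list(test_cmd)).returncode == 0
    return verdict(True, boot_ok)


# Only run when a boot-test command is given; with no arguments the file just
# defines the helpers above, so always pass the test command under bisect.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

It would be invoked as, for example, git bisect run python3 bisect_step.py ./nfs-boot-test.sh (both file names hypothetical). This only automates the skipping: if every commit in the suspect range fails to build, git stops with a list of candidate commits rather than a single first bad commit, so it cannot get past a range where gianfar never compiles.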

NFS booting fails for this board on today's linux-next, the master
branch of Kumar's PPC tree and the head of the main tree. By contrast, I
have been able to NFS boot an x86-based board I have, using both the
head of the main tree and linux-next.

Copying the gianfar driver files from 2.6.32 into the head of the main
tree restores the correct behaviour and I'm able to NFS boot. I have
heard from others that the latest driver works on 83xx- and 85xx-based
boards, but it seems to be broken on at least the 8641D.

I can see there has been a fair amount of work done on the gianfar
driver; I suspect the bug was introduced by the multiple-queue support,
but I'm way out of my depth here.

I'm also away next week, so if I'm quiet, that's why.

Martyn

-- 
Martyn Welch (Principal Software Engineer)   |   Registered in England and
GE Intelligent Platforms                     |   Wales (3828642) at 100
T +44(0)127322748                            |   Barbirolli Square, Manchester,
E martyn.welch@ge.com                        |   M2 3AB  VAT:GB 927559189


* Re: Gianfar driver failing on MPC8641D based board
  2010-02-05 14:00 Gianfar driver failing on MPC8641D based board Martyn Welch
@ 2010-02-25 10:31 ` Martyn Welch
  2010-02-25 16:46     ` Martyn Welch
From: Martyn Welch @ 2010-02-25 10:31 UTC (permalink / raw)
  To: linuxppc-dev list, netdev, linux-kernel
  Cc: Anton Vorontsov, Sandeep Gopalpet, davem

Martyn Welch wrote:
> I have recently attempted to boot an 8641D based board from an NFS root.
> [...]
>
I have just compiled 2.6.33 for the Freescale MPC8641_HPCN demo board
and am still experiencing the problems outlined in my previous email,
though I have noticed that it usually boots from cold but fails after a
reboot. Hitting the reset button doesn't help; I need to actually
power-cycle the machine for it to work.

As before, I'm way out of my depth here; does anyone have any ideas?
Below is a dump of the failed boot process:

U-Boot 2009.01-00181-gc1b7c70 (Jan 30 2009 - 11:17:31)

Freescale PowerPC
CPU:
    Core: E600 Core 0, Version: 0.2, (0x80040202)
    System: Unknown, Version: 2.0, (0x80900120)
    Clocks: CPU:1000 MHz, MPX: 400 MHz, DDR: 200 MHz, LBC:  25 MHz
    L2: Enabled
Board: MPC8641HPCN, System ID: 0x10, System Version: 0x10, FPGA Version: 0x22
I2C:   ready
DRAM:      DDR:  1 GB
FLASH:  8 MB
Invalid ID (ff ff ff ff)
               Scanning PCI bus 01
    PCI-EXPRESS 1 on bus 00 - 02
    PCI-EXPRESS 2 on bus 03 - 03
Video: No radeon video card found!
In:    serial
Out:   serial
Err:   serial
SCSI:  AHCI 0001.0000 32 slots 4 ports 3 Gbps 0xf impl IDE mode
flags: ncq ilck pm led clo pmp pio slum part
scanning bus for devices...
Net:   eTSEC1, eTSEC2, eTSEC3, eTSEC4
=>  tftp 4000000 hpcn/uImage-torvalds-linux-2.6
Speed: 1000, full duplex
Using eTSEC1 device
TFTP from server 192.168.0.1; our IP address is 192.168.0.30
Filename 'hpcn/uImage-torvalds-linux-2.6'.
Load address: 0x4000000
Loading: #################################################################
         #################################################################
         #######################################################
done
Bytes transferred = 2709050 (29563a hex)
=> tftp 5000000 hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb
Speed: 1000, full duplex
Using eTSEC1 device
TFTP from server 192.168.0.1; our IP address is 192.168.0.30
Filename 'hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb'.
Load address: 0x5000000
Loading: #
done
Bytes transferred = 11523 (2d03 hex)
=> setenv bootargs "root=/dev/nfs rw nfsroot=192.168.0.1:/tftpboot/hpcn/root/ i"
=> bootm 4000000 - 5000000
WARNING: adjusting available memory to 10000000
## Booting kernel from Legacy Image at 04000000 ...
   Image Name:   Linux-2.6.33-00001-gbaac35c
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    2708986 Bytes =  2.6 MB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 05000000
   Booting using the fdt blob at 0x5000000
   Uncompressing Kernel Image ... OK
   Loading Device Tree to 007fa000, end 007ffd02 ... OK
Using MPC86xx HPCN machine description
Total memory = 1024MB; using 2048kB for hash table (at cfe00000)
Linux version 2.6.33-00001-gbaac35c (welchma@ES-J7S4D2J) (gcc version 4.1.2) #20
CPU maps initialized for 1 thread per core
bootconsole [udbg0] enabled
setup_arch: bootmem
mpc86xx_hpcn_setup_arch()
Found FSL PCI host bridge at 0x00000000ffe08000. Firmware bus number: 0->2
PCI host bridge /pcie@ffe08000 (primary) ranges:
 MEM 0x0000000080000000..0x000000009fffffff -> 0x0000000080000000
  IO 0x00000000ffc00000..0x00000000ffc0ffff -> 0x0000000000000000
/pcie@ffe08000: PCICSRBAR @ 0xfff00000
Found FSL PCI host bridge at 0x00000000ffe09000. Firmware bus number: 0->0
PCI host bridge /pcie@ffe09000  ranges:
 MEM 0x00000000a0000000..0x00000000bfffffff -> 0x00000000a0000000
  IO 0x00000000ffc10000..0x00000000ffc1ffff -> 0x0000000000000000
/pcie@ffe09000: PCICSRBAR @ 0xfff00000
MPC86xx HPCN board from Freescale Semiconductor
arch: exit
Zone PFN ranges:
  DMA      0x00000000 -> 0x00030000
  Normal   0x00030000 -> 0x00030000
  HighMem  0x00030000 -> 0x00040000
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0: 0x00000000 -> 0x00040000
PERCPU: Embedded 7 pages/cpu @c1003000 s7712 r8192 d12768 u65536
pcpu-alloc: s7712 r8192 d12768 u65536 alloc=16*4096
pcpu-alloc: [0] 0 [0] 1
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 260096
Kernel command line: root=/dev/nfs rw nfsroot=192.168.0.1:/tftpboot/hpcn/root/ p
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1030864k/1048576k available (5228k kernel code, 17004k reserved, 196k d)
Kernel virtual memory layout:
  * 0xfffc1000..0xfffff000  : fixmap
  * 0xff800000..0xffc00000  : highmem PTEs
  * 0xff7da000..0xff800000  : early ioremap
  * 0xf1000000..0xff7da000  : vmalloc & ioremap
SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Hierarchical RCU implementation.
NR_IRQS:512 nr_irqs:512
mpic: Setting up MPIC " MPIC     " version 1.2 at ffe40000, max 2 CPUs
mpic: ISU size: 256, shift: 8, mask: ff
mpic: Initializing for 256 sources
i8259 legacy interrupt controller initialized
clocksource: timebase mult[2800000] shift[22] registered
Console: colour dummy device 80x25
Mount-cache hash table entries: 512
mpic: requesting IPIs ...
Processor 1 found.
Brought up 2 CPUs
NET: Registered protocol family 16
            
PCI: Probing PCI hardware
pci 0000:00:00.0: ignoring class b20 (doesn't match header type 01)
pci 0000:00:00.0: PCI bridge to [bus 01-ff]
pci 0000:02:1d.0: unsupported PM cap regs version (4)
pci 0000:01:00.0: PCI bridge to [bus 02-ff] (subtractive decode)
pci 0001:03:00.0: ignoring class b20 (doesn't match header type 01)
pci 0001:03:00.0: PCI bridge to [bus 04-ff]
pci 0000:01:00.0: PCI bridge to [bus 02-02]
pci 0000:01:00.0:   bridge window [io  0x1000-0x1fff]
pci 0000:01:00.0:   bridge window [mem 0x80000000-0x800fffff]
pci 0000:01:00.0:   bridge window [mem pref disabled]
pci 0000:00:00.0: PCI bridge to [bus 01-02]
pci 0000:00:00.0:   bridge window [io  0x0000-0xffff]
pci 0000:00:00.0:   bridge window [mem 0x80000000-0x9fffffff]
pci 0000:00:00.0:   bridge window [mem pref disabled]
pci 0000:00:00.0: enabling device (0106 -> 0107)
pci 0001:03:00.0: PCI bridge to [bus 04-04]
pci 0001:03:00.0:   bridge window [io  0xfffee000-0xffffdfff]
pci 0001:03:00.0:   bridge window [mem 0xa0000000-0xbfffffff]
pci 0001:03:00.0:   bridge window [mem pref disabled]
pci 0001:03:00.0: enabling device (0106 -> 0107)
bio: create slab <bio-0> at 0
vgaarb: loaded
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Advanced Linux Sound Architecture Driver Version 1.0.21.
Switching to clocksource timebase
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
UDP hash table entries: 512 (order: 2, 16384 bytes)
UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
NET: Registered protocol family 1
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
audit: initializing netlink socket (disabled)
type=2000 audit(0.144:1): initialized
highmem bounce pool size: 64 pages
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NTFS driver 2.1.29 [Flags: R/O].
msgmni has been set to 1502
alg: No test for stdrng (krng)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
Generic non-volatile memory driver v1.1
Serial: 8250/16550 driver, 2 ports, IRQ sharing enabled
serial8250.0: ttyS0 at MMIO 0xffe04500 (irq = 42) is a 16550A
console [ttyS0] enabled, bootconsole disabled
console [ttyS0] enabled, bootconsole disabled
serial8250.0: ttyS1 at MMIO 0xffe04600 (irq = 28) is a 16550A
brd: module loaded
loop: module loaded
nbd: registered device at major 43
st: Version 20081215, fixed bufsize 32768, s/g segs 256
ahci 0000:02:1f.1: AHCI 0001.0000 32 slots 4 ports 3 Gbps 0xf impl SATA mode
ahci 0000:02:1f.1: flags: ncq sntf ilck pm led clo pmp pio slum part
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
ata1: SATA max UDMA/133 abar m1024@0x80006000 port 0x80006100 irq 5
ata2: SATA max UDMA/133 abar m1024@0x80006000 port 0x80006180 irq 5
ata3: SATA max UDMA/133 abar m1024@0x80006000 port 0x80006200 irq 5
ata4: SATA max UDMA/133 abar m1024@0x80006000 port 0x80006280 irq 5
scsi4 : pata_ali
scsi5 : pata_ali
ata5: PATA max UDMA/133 cmd 0x1200 ctl 0x1208 bmdma 0x1220 irq 14
ata6: PATA max UDMA/133 cmd 0x1210 ctl 0x1218 bmdma 0x1228 irq 14
eth0: Gianfar Ethernet Controller Version 1.2, 00:e0:0c:00:00:01
eth0: Running with NAPI enabled
eth0: :RX BD ring size for Q[0]: 256
eth0:TX BD ring size for Q[0]: 256
eth1: Gianfar Ethernet Controller Version 1.2, 00:e0:0c:00:01:fd
eth1: Running with NAPI enabled
eth1: :RX BD ring size for Q[0]: 256
eth1:TX BD ring size for Q[0]: 256
eth2: Gianfar Ethernet Controller Version 1.2, 00:e0:0c:00:02:fd
eth2: Running with NAPI enabled
eth2: :RX BD ring size for Q[0]: 256
eth2:TX BD ring size for Q[0]: 256
eth3: Gianfar Ethernet Controller Version 1.2, 00:e0:0c:00:03:fd
eth3: Running with NAPI enabled
eth3: :RX BD ring size for Q[0]: 256
eth3:TX BD ring size for Q[0]: 256
Freescale PowerQUICC MII Bus: probed
Freescale PowerQUICC MII Bus: probed
Freescale PowerQUICC MII Bus: probed
Freescale PowerQUICC MII Bus: probed
usbmon: debugfs is not available
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci_hcd 0000:02:1c.3: EHCI Host Controller
ehci_hcd 0000:02:1c.3: new USB bus registered, assigned bus number 1
ehci_hcd 0000:02:1c.3: debug port 1
ehci_hcd 0000:02:1c.3: Enabling legacy PCI PM
ehci_hcd 0000:02:1c.3: irq 11, io mem 0x80003000
ehci_hcd 0000:02:1c.3: USB 2.0 started, EHCI 1.00
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci_hcd 0000:02:1c.0: OHCI Host Controller
ata5.00: ATAPI: SONY    DVD RW AW-Q170A, 1.73, max UDMA/66
ata5.00: WARNING: ATAPI DMA disabled for reliability issues.  It can be enabled
ata5.00: WARNING: via pata_ali.atapi_dma modparam or corresponding sysfs node.
ata5.00: configured for UDMA/66
ohci_hcd 0000:02:1c.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:02:1c.0: irq 12, io mem 0x80000000
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ohci_hcd 0000:02:1c.1: OHCI Host Controller
ohci_hcd 0000:02:1c.1: new USB bus registered, assigned bus number 3
ohci_hcd 0000:02:1c.1: irq 9, io mem 0x80001000
ata3: SATA link down (SStatus 0 SControl 300)
ata1: SATA link down (SStatus 0 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
ata2: SATA link down (SStatus 0 SControl 300)
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 3 ports detected
ohci_hcd 0000:02:1c.2: OHCI Host Controller
ohci_hcd 0000:02:1c.2: new USB bus registered, assigned bus number 4
scsi 4:0:0:0: CD-ROM            SONY     DVD RW AW-Q170A  1.73 PQ: 0 ANSI: 5
ohci_hcd 0000:02:1c.2: irq 10, io mem 0x80002000
sr0: scsi3-mmc drive: 48x/48x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 4:0:0:0: Attached scsi generic sg0 type 5
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 3 ports detected
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
i8042.c: No controller found.
rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
intel8x0_measure_ac97_clock: measured 50231 usecs (2424 samples)
intel8x0: clocking to 48000
ALSA device list:
  #0: ALi M5455 with ALC650F at irq 6
IPv4 over IPv4 tunneling driver
GRE over IPv4 tunneling driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
rtc_cmos rtc_cmos: setting system clock to 2002-03-11 18:46:05 UTC (1015872365)
ADDRCONF(NETDEV_UP): eth0: link is not ready
ADDRCONF(NETDEV_UP): eth1: link is not ready
ADDRCONF(NETDEV_UP): eth2: link is not ready
ADDRCONF(NETDEV_UP): eth3: link is not ready
Sending DHCP requests .
PHY: mdio@ffe24520:00 - Link is Up - 1000/Full
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
., OK
IP-Config: Got DHCP answer from 192.168.0.1, my address is 192.168.0.241
IP-Config: Complete:
     device=eth0, addr=192.168.0.241, mask=255.255.255.0, gw=192.168.0.1,
     host=192.168.0.241, domain=Radstone.Local, nis-domain=(none),
     bootserver=192.168.0.1, rootserver=192.168.0.1, rootpath=
Looking up port of RPC 100003/2 on 192.168.0.1
Looking up port of RPC 100005/1 on 192.168.0.1
VFS: Mounted root (nfs filesystem) on device 0:13.
Freeing unused kernel memory: 220k init
nfs: server 192.168.0.1 not responding, still trying
nfs: server 192.168.0.1 not responding, still trying
nfs: server 192.168.0.1 not responding, still trying
nfs: server 192.168.0.1 not responding, still trying
nfs: server 192.168.0.1 not responding, still trying
nfs: server 192.168.0.1 not responding, still trying
nfs: server 192.168.0.1 not responding, still trying
nfs: server 192.168.0.1 not responding, still trying
nfs: server 192.168.0.1 not responding, still trying




* Re: Gianfar driver failing on MPC8641D based board
  2010-02-25 10:31 ` Martyn Welch
@ 2010-02-25 16:46     ` Martyn Welch
From: Martyn Welch @ 2010-02-25 16:46 UTC (permalink / raw)
  To: linuxppc-dev list, netdev, linux-kernel
  Cc: Anton Vorontsov, Sandeep Gopalpet, davem, Kumar Gala

Martyn Welch wrote:
> I have just compiled 2.6.33 for the Freescale MPC8641_HPCN demo board
> and am still experiencing the problems outlined in my previous email,
> though I have noticed that it usually boots from cold but fails after a
> reboot. Hitting the reset button doesn't help; I need to actually
> power-cycle the machine for it to work.
>
> As before, I'm way out of my depth here; does anyone have any ideas?
> Below is a dump of the failed boot process:
>
> U-Boot 2009.01-00181-gc1b7c70 (Jan 30 2009 - 11:17:31)
>
> Freescale PowerPC
> CPU:
>     Core: E600 Core 0, Version: 0.2, (0x80040202)
>     System: Unknown, Version: 2.0, (0x80900120)
>     Clocks: CPU:1000 MHz, MPX: 400 MHz, DDR: 200 MHz, LBC:  25 MHz
>     L2: Enabled
> Board: MPC8641HPCN, System ID: 0x10, System Version: 0x10, FPGA Version:
> 0x22
> I2C:   ready
> DRAM:      DDR:  1 GB
> FLASH:  8 MB
> Invalid ID (ff ff ff ff)
>                Scanning PCI bus 01
>     PCI-EXPRESS 1 on bus 00 - 02
>     PCI-EXPRESS 2 on bus 03 - 03
> Video: No radeon video card found!
> In:    serial
> Out:   serial
> Err:   serial
> SCSI:  AHCI 0001.0000 32 slots 4 ports 3 Gbps 0xf impl IDE mode
> flags: ncq ilck pm led clo pmp pio slum part
> scanning bus for devices...
> Net:   eTSEC1, eTSEC2, eTSEC3, eTSEC4
> =>  tftp 4000000 hpcn/uImage-torvalds-linux-2.6
> Speed: 1000, full duplex
> Using eTSEC1 device
> TFTP from server 192.168.0.1; our IP address is 192.168.0.30
> Filename 'hpcn/uImage-torvalds-linux-2.6'.
> Load address: 0x4000000
> Loading: #################################################################
>          #################################################################
>          #######################################################
> done
> Bytes transferred = 2709050 (29563a hex)
> => tftp 5000000 hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb
> Speed: 1000, full duplex
> Using eTSEC1 device
> TFTP from server 192.168.0.1; our IP address is 192.168.0.30
> Filename 'hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb'.
> Load address: 0x5000000
> Loading: #
> done
> Bytes transferred = 11523 (2d03 hex)
> => setenv bootargs "root=/dev/nfs rw
> nfsroot=192.168.0.1:/tftpboot/hpcn/root/ i"
> => bootm 4000000 - 5000000
> WARNING: adjusting available memory to 10000000
> ## Booting kernel from Legacy Image at 04000000 ...
>    Image Name:   Linux-2.6.33-00001-gbaac35c
>    Image Type:   PowerPC Linux Kernel Image (gzip compressed)
>    Data Size:    2708986 Bytes =  2.6 MB
>    Load Address: 00000000
>    Entry Point:  00000000
>    Verifying Checksum ... OK
> ## Flattened Device Tree blob at 05000000
>    Booting using the fdt blob at 0x5000000
>    Uncompressing Kernel Image ... OK
>    Loading Device Tree to 007fa000, end 007ffd02 ... OK
> Using MPC86xx HPCN machine description
> Total memory = 1024MB; using 2048kB for hash table (at cfe00000)
> Linux version 2.6.33-00001-gbaac35c (welchma@ES-J7S4D2J) (gcc version
> 4.1.2) #20
> CPU maps initialized for 1 thread per core
> bootconsole [udbg0] enabled
> setup_arch: bootmem
> mpc86xx_hpcn_setup_arch()
> Found FSL PCI host bridge at 0x00000000ffe08000. Firmware bus number: 0->2
> PCI host bridge /pcie@ffe08000 (primary) ranges:
>  MEM 0x0000000080000000..0x000000009fffffff -> 0x0000000080000000
>   IO 0x00000000ffc00000..0x00000000ffc0ffff -> 0x0000000000000000
> /pcie@ffe08000: PCICSRBAR @ 0xfff00000
> Found FSL PCI host bridge at 0x00000000ffe09000. Firmware bus number: 0->0
> PCI host bridge /pcie@ffe09000  ranges:
>  MEM 0x00000000a0000000..0x00000000bfffffff -> 0x00000000a0000000
>   IO 0x00000000ffc10000..0x00000000ffc1ffff -> 0x0000000000000000
> /pcie@ffe09000: PCICSRBAR @ 0xfff00000
> MPC86xx HPCN board from Freescale Semiconductor
> arch: exit
> Zone PFN ranges:
>   DMA      0x00000000 -> 0x00030000
>   Normal   0x00030000 -> 0x00030000
>   HighMem  0x00030000 -> 0x00040000
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
>     0: 0x00000000 -> 0x00040000
> PERCPU: Embedded 7 pages/cpu @c1003000 s7712 r8192 d12768 u65536
> pcpu-alloc: s7712 r8192 d12768 u65536 alloc=16*4096
> pcpu-alloc: [0] 0 [0] 1
> Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 260096
> Kernel command line: root=/dev/nfs rw
> nfsroot=192.168.0.1:/tftpboot/hpcn/root/ p
> PID hash table entries: 4096 (order: 2, 16384 bytes)
> Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> Memory: 1030864k/1048576k available (5228k kernel code, 17004k reserved,
> 196k d)
> Kernel virtual memory layout:
>   * 0xfffc1000..0xfffff000  : fixmap
>   * 0xff800000..0xffc00000  : highmem PTEs
>   * 0xff7da000..0xff800000  : early ioremap
>   * 0xf1000000..0xff7da000  : vmalloc & ioremap
> SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> Hierarchical RCU implementation.
> NR_IRQS:512 nr_irqs:512
> mpic: Setting up MPIC " MPIC     " version 1.2 at ffe40000, max 2 CPUs
> mpic: ISU size: 256, shift: 8, mask: ff
> mpic: Initializing for 256 sources
> i8259 legacy interrupt controller initialized
> clocksource: timebase mult[2800000] shift[22] registered
> Console: colour dummy device 80x25
> Mount-cache hash table entries: 512
> mpic: requesting IPIs ...
> Processor 1 found.
> Brought up 2 CPUs
> NET: Registered protocol family 16
>             
> PCI: Probing PCI hardware
> pci 0000:00:00.0: ignoring class b20 (doesn't match header type 01)
> pci 0000:00:00.0: PCI bridge to [bus 01-ff]
> pci 0000:02:1d.0: unsupported PM cap regs version (4)
> pci 0000:01:00.0: PCI bridge to [bus 02-ff] (subtractive decode)
> pci 0001:03:00.0: ignoring class b20 (doesn't match header type 01)
> pci 0001:03:00.0: PCI bridge to [bus 04-ff]
> pci 0000:01:00.0: PCI bridge to [bus 02-02]
> pci 0000:01:00.0:   bridge window [io  0x1000-0x1fff]
> pci 0000:01:00.0:   bridge window [mem 0x80000000-0x800fffff]
> pci 0000:01:00.0:   bridge window [mem pref disabled]
> pci 0000:00:00.0: PCI bridge to [bus 01-02]
> pci 0000:00:00.0:   bridge window [io  0x0000-0xffff]
> pci 0000:00:00.0:   bridge window [mem 0x80000000-0x9fffffff]
> pci 0000:00:00.0:   bridge window [mem pref disabled]
> pci 0000:00:00.0: enabling device (0106 -> 0107)
> pci 0001:03:00.0: PCI bridge to [bus 04-04]
> pci 0001:03:00.0:   bridge window [io  0xfffee000-0xffffdfff]
> pci 0001:03:00.0:   bridge window [mem 0xa0000000-0xbfffffff]
> pci 0001:03:00.0:   bridge window [mem pref disabled]
> pci 0001:03:00.0: enabling device (0106 -> 0107)
> bio: create slab <bio-0> at 0
> vgaarb: loaded
> SCSI subsystem initialized
> usbcore: registered new interface driver usbfs
> usbcore: registered new interface driver hub
> usbcore: registered new device driver usb
> Advanced Linux Sound Architecture Driver Version 1.0.21.
> Switching to clocksource timebase
> NET: Registered protocol family 2
> IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
> TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
> TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
> TCP: Hash tables configured (established 131072 bind 65536)
> TCP reno registered
> UDP hash table entries: 512 (order: 2, 16384 bytes)
> UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
> NET: Registered protocol family 1
> RPC: Registered udp transport module.
> RPC: Registered tcp transport module.
> RPC: Registered tcp NFSv4.1 backchannel transport module.
> audit: initializing netlink socket (disabled)
> type=2000 audit(0.144:1): initialized
> highmem bounce pool size: 64 pages
> Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> NTFS driver 2.1.29 [Flags: R/O].
> msgmni has been set to 1502
> alg: No test for stdrng (krng)
> io scheduler noop registered
> io scheduler deadline registered
> io scheduler cfq registered (default)
> Generic non-volatile memory driver v1.1
> Serial: 8250/16550 driver, 2 ports, IRQ sharing enabled
> serial8250.0: ttyS0 at MMIO 0xffe04500 (irq = 42) is a 16550A
> console [ttyS0] enabled, bootconsole disabled
> console [ttyS0] enabled, bootconsole disabled
> serial8250.0: ttyS1 at MMIO 0xffe04600 (irq = 28) is a 16550A
> brd: module loaded
> loop: module loaded
> nbd: registered device at major 43
> st: Version 20081215, fixed bufsize 32768, s/g segs 256
> ahci 0000:02:1f.1: AHCI 0001.0000 32 slots 4 ports 3 Gbps 0xf impl SATA mode
> ahci 0000:02:1f.1: flags: ncq sntf ilck pm led clo pmp pio slum part
> scsi0 : ahci
> scsi1 : ahci
> scsi2 : ahci
> scsi3 : ahci
> ata1: SATA max UDMA/133 abar m1024@0x80006000 port 0x80006100 irq 5
> ata2: SATA max UDMA/133 abar m1024@0x80006000 port 0x80006180 irq 5
> ata3: SATA max UDMA/133 abar m1024@0x80006000 port 0x80006200 irq 5
> ata4: SATA max UDMA/133 abar m1024@0x80006000 port 0x80006280 irq 5
> scsi4 : pata_ali
> scsi5 : pata_ali
> ata5: PATA max UDMA/133 cmd 0x1200 ctl 0x1208 bmdma 0x1220 irq 14
> ata6: PATA max UDMA/133 cmd 0x1210 ctl 0x1218 bmdma 0x1228 irq 14
> eth0: Gianfar Ethernet Controller Version 1.2, 00:e0:0c:00:00:01
> eth0: Running with NAPI enabled
> eth0: :RX BD ring size for Q[0]: 256
> eth0:TX BD ring size for Q[0]: 256
> eth1: Gianfar Ethernet Controller Version 1.2, 00:e0:0c:00:01:fd
> eth1: Running with NAPI enabled
> eth1: :RX BD ring size for Q[0]: 256
> eth1:TX BD ring size for Q[0]: 256
> eth2: Gianfar Ethernet Controller Version 1.2, 00:e0:0c:00:02:fd
> eth2: Running with NAPI enabled
> eth2: :RX BD ring size for Q[0]: 256
> eth2:TX BD ring size for Q[0]: 256
> eth3: Gianfar Ethernet Controller Version 1.2, 00:e0:0c:00:03:fd
> eth3: Running with NAPI enabled
> eth3: :RX BD ring size for Q[0]: 256
> eth3:TX BD ring size for Q[0]: 256
> Freescale PowerQUICC MII Bus: probed
> Freescale PowerQUICC MII Bus: probed
> Freescale PowerQUICC MII Bus: probed
> Freescale PowerQUICC MII Bus: probed
> usbmon: debugfs is not available
> ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> ehci_hcd 0000:02:1c.3: EHCI Host Controller
> ehci_hcd 0000:02:1c.3: new USB bus registered, assigned bus number 1
> ehci_hcd 0000:02:1c.3: debug port 1
> ehci_hcd 0000:02:1c.3: Enabling legacy PCI PM
> ehci_hcd 0000:02:1c.3: irq 11, io mem 0x80003000
> ehci_hcd 0000:02:1c.3: USB 2.0 started, EHCI 1.00
> hub 1-0:1.0: USB hub found
> hub 1-0:1.0: 8 ports detected
> ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> ohci_hcd 0000:02:1c.0: OHCI Host Controller
> ata5.00: ATAPI: SONY    DVD RW AW-Q170A, 1.73, max UDMA/66
> ata5.00: WARNING: ATAPI DMA disabled for reliability issues.  It can be
> enabled
> ata5.00: WARNING: via pata_ali.atapi_dma modparam or corresponding sysfs
> node.
> ata5.00: configured for UDMA/66
> ohci_hcd 0000:02:1c.0: new USB bus registered, assigned bus number 2
> ohci_hcd 0000:02:1c.0: irq 12, io mem 0x80000000
> hub 2-0:1.0: USB hub found
> hub 2-0:1.0: 3 ports detected
> ohci_hcd 0000:02:1c.1: OHCI Host Controller
> ohci_hcd 0000:02:1c.1: new USB bus registered, assigned bus number 3
> ohci_hcd 0000:02:1c.1: irq 9, io mem 0x80001000
> ata3: SATA link down (SStatus 0 SControl 300)
> ata1: SATA link down (SStatus 0 SControl 300)
> ata4: SATA link down (SStatus 0 SControl 300)
> ata2: SATA link down (SStatus 0 SControl 300)
> hub 3-0:1.0: USB hub found
> hub 3-0:1.0: 3 ports detected
> ohci_hcd 0000:02:1c.2: OHCI Host Controller
> ohci_hcd 0000:02:1c.2: new USB bus registered, assigned bus number 4
> scsi 4:0:0:0: CD-ROM            SONY     DVD RW AW-Q170A  1.73 PQ: 0 ANSI: 5
> ohci_hcd 0000:02:1c.2: irq 10, io mem 0x80002000
> sr0: scsi3-mmc drive: 48x/48x writer cd/rw xa/form2 cdda tray
> Uniform CD-ROM driver Revision: 3.20
> sr 4:0:0:0: Attached scsi generic sg0 type 5
> hub 4-0:1.0: USB hub found
> hub 4-0:1.0: 3 ports detected
> Initializing USB Mass Storage driver...
> usbcore: registered new interface driver usb-storage
> USB Mass Storage support registered.
> i8042.c: No controller found.
> rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
> rtc0: alarms up to one day, 114 bytes nvram
> usbcore: registered new interface driver usbhid
> usbhid: USB HID core driver
> intel8x0_measure_ac97_clock: measured 50231 usecs (2424 samples)
> intel8x0: clocking to 48000
> ALSA device list:
>   #0: ALi M5455 with ALC650F at irq 6
> IPv4 over IPv4 tunneling driver
> GRE over IPv4 tunneling driver
> TCP cubic registered
> Initializing XFRM netlink socket
> NET: Registered protocol family 10
> IPv6 over IPv4 tunneling driver
> NET: Registered protocol family 17
> rtc_cmos rtc_cmos: setting system clock to 2002-03-11 18:46:05 UTC (1015872365)
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> ADDRCONF(NETDEV_UP): eth1: link is not ready
> ADDRCONF(NETDEV_UP): eth2: link is not ready
> ADDRCONF(NETDEV_UP): eth3: link is not ready
> Sending DHCP requests .
> PHY: mdio@ffe24520:00 - Link is Up - 1000/Full
> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> ., OK
> IP-Config: Got DHCP answer from 192.168.0.1, my address is 192.168.0.241
> IP-Config: Complete:
>      device=eth0, addr=192.168.0.241, mask=255.255.255.0, gw=192.168.0.1,
>      host=192.168.0.241, domain=Radstone.Local, nis-domain=(none),
>      bootserver=192.168.0.1, rootserver=192.168.0.1, rootpath=
> Looking up port of RPC 100003/2 on 192.168.0.1
> Looking up port of RPC 100005/1 on 192.168.0.1
> VFS: Mounted root (nfs filesystem) on device 0:13.
> Freeing unused kernel memory: 220k init
> nfs: server 192.168.0.1 not responding, still trying
> nfs: server 192.168.0.1 not responding, still trying
> nfs: server 192.168.0.1 not responding, still trying
> nfs: server 192.168.0.1 not responding, still trying
> nfs: server 192.168.0.1 not responding, still trying
> nfs: server 192.168.0.1 not responding, still trying
> nfs: server 192.168.0.1 not responding, still trying
> nfs: server 192.168.0.1 not responding, still trying
> nfs: server 192.168.0.1 not responding, still trying
>   

Further testing has shown that this isn't restricted to warm reboots, it
happens from cold as well. In addition, the exact timing of the failure
seems to vary, some boots have got further before failing.

Martyn

-- 
Martyn Welch (Principal Software Engineer)   |   Registered in England and
GE Intelligent Platforms                     |   Wales (3828642) at 100
T +44(0)127322748                            |   Barbirolli Square, Manchester,
E martyn.welch@ge.com                        |   M2 3AB  VAT:GB 927559189


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-25 16:46     ` Martyn Welch
@ 2010-02-25 16:51       ` Anton Vorontsov
  -1 siblings, 0 replies; 40+ messages in thread
From: Anton Vorontsov @ 2010-02-25 16:51 UTC (permalink / raw)
  To: Martyn Welch
  Cc: linuxppc-dev list, netdev, linux-kernel, Sandeep Gopalpet, davem,
	Kumar Gala

On Thu, Feb 25, 2010 at 04:46:54PM +0000, Martyn Welch wrote:
[...]
> > nfs: server 192.168.0.1 not responding, still trying
> >   
> 
> Further testing has shown that this isn't restricted to warm reboots, it
> happens from cold as well. In addition, the exact timing of the failure
> seems to vary, some boots have got further before failing.

Unfortunately I don't have any 8641 boards near me, so I can't
debug this myself. Though, I tested gianfar on MPC8568E-MDS with
2.6.33 kernel, and it seems to work just fine.

I see you use SMP. Can you try to turn it off? If that will fix
the issue, then it'll be a good data point.

Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
know the results.

Thanks,

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-25 16:51       ` Anton Vorontsov
@ 2010-02-25 17:49         ` Anton Vorontsov
  -1 siblings, 0 replies; 40+ messages in thread
From: Anton Vorontsov @ 2010-02-25 17:49 UTC (permalink / raw)
  To: Martyn Welch
  Cc: linuxppc-dev list, netdev, linux-kernel, Sandeep Gopalpet, davem,
	Kumar Gala

On Thu, Feb 25, 2010 at 07:51:41PM +0300, Anton Vorontsov wrote:
> On Thu, Feb 25, 2010 at 04:46:54PM +0000, Martyn Welch wrote:
> [...]
> > > nfs: server 192.168.0.1 not responding, still trying
> > >   
> > 
> > Further testing has shown that this isn't restricted to warm reboots, it
> > happens from cold as well. In addition, the exact timing of the failure
> > seems to vary, some boots have got further before failing.
> 
> Unfortunately I don't have any 8641 boards near me, so I can't
> debug this myself. Though, I tested gianfar on MPC8568E-MDS with
> 2.6.33 kernel, and it seems to work just fine.
> 
> I see you use SMP. Can you try to turn it off? If that will fix
> the issue, then it'll be a good data point.
> 
> Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
> know the results.

Nope, no luck. Can't trigger the issue. :-/
Tested with NFS boot, TCP and UDP netperf tests.

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-25 16:46     ` Martyn Welch
@ 2010-02-25 18:27       ` Kumar Gala
  -1 siblings, 0 replies; 40+ messages in thread
From: Kumar Gala @ 2010-02-25 18:27 UTC (permalink / raw)
  To: Martyn Welch
  Cc: linuxppc-dev list, netdev, linux-kernel, Anton Vorontsov,
	Sandeep Gopalpet, davem


On Feb 25, 2010, at 10:46 AM, Martyn Welch wrote:

> 
> Further testing has shown that this isn't restricted to warm reboots, it
> happens from cold as well. In addition, the exact timing of the failure
> seems to vary, some boots have got further before failing.

what mechanism do you use for warm resets?

- k

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-25 17:49         ` Anton Vorontsov
@ 2010-02-26  0:53           ` Paul Gortmaker
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul Gortmaker @ 2010-02-26  0:53 UTC (permalink / raw)
  To: avorontsov
  Cc: Martyn Welch, linuxppc-dev list, netdev, linux-kernel,
	Sandeep Gopalpet, davem, Kumar Gala

On Thu, Feb 25, 2010 at 12:49 PM, Anton Vorontsov
<avorontsov@ru.mvista.com> wrote:
> On Thu, Feb 25, 2010 at 07:51:41PM +0300, Anton Vorontsov wrote:
>> On Thu, Feb 25, 2010 at 04:46:54PM +0000, Martyn Welch wrote:
>> [...]
>> > > nfs: server 192.168.0.1 not responding, still trying
>> > >
>> >
>> > Further testing has shown that this isn't restricted to warm reboots, it
>> > happens from cold as well. In addition, the exact timing of the failure
>> > seems to vary, some boots have got further before failing.
>>
>> Unfortunately I don't have any 8641 boards near me, so I can't
>> debug this myself. Though, I tested gianfar on MPC8568E-MDS with
>> 2.6.33 kernel, and it seems to work just fine.
>>
>> I see you use SMP. Can you try to turn it off? If that will fix
>> the issue, then it'll be a good data point.
>>
>> Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
>> know the results.
>
> Nope, no luck. Can't trigger the issue. :-/
> Tested with NFS boot, TCP and UDP netperf tests.

I was able to reproduce it on an 8641D and bisected it down to this:

-----------
commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
Author: Anton Vorontsov <avorontsov@ru.mvista.com>
Date:   Tue Nov 10 14:11:10 2009 +0000

    gianfar: Revive SKB recycling

    Before calling gfar_clean_tx_ring() the driver grabs an irqsave
    spinlock, and then tries to recycle skbs. But since
    skb_recycle_check() returns 0 with IRQs disabled, we'll never
    recycle any skbs.

    It appears that gfar_clean_tx_ring() and gfar_start_xmit() are
    mostly idependent and can work in parallel, except when they
    modify num_txbdfree.

    So we can drop the lock from most sections and thus fix the skb
    recycling.
-----------

...which probably explains why you weren't seeing it on non-SMP.
I'd imagine it would show up on any of the e500mc boards too.

I'd done a rev-list on gianfar.[ch] from 32 to 33-rc1, and then
cherry-picked those onto a 32 baseline to reduce the scale of
the bisection, but I don't think that should impact the final
result I got in any meaningful way.

Paul.
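To make the locking change described in the quoted commit message more concrete, here is a user-space sketch in C. It is not the gianfar code: apart from num_txbdfree, which appears in the thread, every name below is hypothetical, and a pthread mutex stands in for the driver's spinlock. The transmit path alone advances the fill index, the cleanup path alone advances the reclaim index, and only the shared free-descriptor counter is updated under the lock, which is the split the commit message relies on when it calls the two paths "mostly independent".

```c
/*
 * User-space model of a TX descriptor ring, NOT gianfar code.
 * Assumptions: the transmit path alone advances `cur`, the cleanup path
 * alone advances `dirty`, and only `num_txbdfree` is shared, so only that
 * counter is touched under the lock (a pthread mutex standing in for the
 * driver's spinlock). Struct and function names are hypothetical.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 16

struct tx_ring {
	atomic_bool busy[RING_SIZE]; /* set by xmit, cleared by clean */
	unsigned int cur;            /* next BD to fill (xmit path only) */
	unsigned int dirty;          /* next BD to reclaim (clean path only) */
	int num_txbdfree;            /* shared by both paths: needs the lock */
	pthread_mutex_t lock;
};

static void ring_init(struct tx_ring *r)
{
	for (int i = 0; i < RING_SIZE; i++)
		atomic_init(&r->busy[i], false);
	r->cur = 0;
	r->dirty = 0;
	r->num_txbdfree = RING_SIZE;
	pthread_mutex_init(&r->lock, NULL);
}

/* Transmit path: returns false when the ring is full (queue would stop). */
static bool ring_xmit(struct tx_ring *r)
{
	pthread_mutex_lock(&r->lock);
	if (r->num_txbdfree == 0) {
		pthread_mutex_unlock(&r->lock);
		return false;
	}
	r->num_txbdfree--;
	pthread_mutex_unlock(&r->lock);

	/* `cur` is private to this path, so no lock is needed to advance it. */
	atomic_store_explicit(&r->busy[r->cur], true, memory_order_release);
	r->cur = (r->cur + 1) % RING_SIZE;
	return true;
}

/* Cleanup path: reclaims completed BDs and returns how many it freed. */
static int ring_clean(struct tx_ring *r)
{
	int cleaned = 0;

	/* `dirty` is private to this path; in this model a posted BD is
	 * treated as already completed by the hardware. */
	while (atomic_load_explicit(&r->busy[r->dirty], memory_order_acquire)) {
		atomic_store_explicit(&r->busy[r->dirty], false,
				      memory_order_release);
		r->dirty = (r->dirty + 1) % RING_SIZE;
		cleaned++;
	}

	if (cleaned) {
		pthread_mutex_lock(&r->lock);
		r->num_txbdfree += cleaned;
		pthread_mutex_unlock(&r->lock);
	}
	return cleaned;
}
```

Because the lock is confined to the counter, ring_xmit() and ring_clean() can run from two different threads (build with -pthread). Whether that split is actually sufficient on a real SMP system is exactly what this thread is chasing, so treat the sketch as a model of the intended design, not as evidence that the driver is race-free.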

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26  0:53           ` Paul Gortmaker
@ 2010-02-26  3:14             ` Anton Vorontsov
  -1 siblings, 0 replies; 40+ messages in thread
From: Anton Vorontsov @ 2010-02-26  3:14 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: Martyn Welch, linuxppc-dev list, netdev, linux-kernel,
	Sandeep Gopalpet, davem, Kumar Gala

On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
[...]
> I was able to reproduce it on an 8641D and bisected it down to this:
> 
> -----------
> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
> Author: Anton Vorontsov <avorontsov@ru.mvista.com>
> Date:   Tue Nov 10 14:11:10 2009 +0000
> 
>     gianfar: Revive SKB recycling

Thanks for the bisect. I have a guess why tx hangs in the
SMP case. Could anyone try the patch below?

[...]
> ...which probably explains why you weren't seeing it on non-SMP.
> I'd imagine it would show up on any of the e500mc boards too.

Yeah.. Pity, I don't have SMP boards anymore. I'll try
to get one though.


diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8bd3c9f..3ff3bd0 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
 			tx_queue = priv->tx_queue[rx_queue->qindex];
 
 			tx_cleaned += gfar_clean_tx_ring(tx_queue);
+			if (!tx_cleaned && !tx_queue->num_txbdfree)
+				tx_cleaned += 1; /* don't complete napi */
 			rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
 							budget_per_queue);
 			rx_cleaned += rx_cleaned_per_queue;
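
The patch forces tx_cleaned to be non-zero when nothing was reclaimed and the ring has no free descriptors. According to its comment, this stops gfar_poll() from completing NAPI in that round. Below is a minimal model of that decision. It assumes, as the comment implies, that any reported TX work keeps the poll routine scheduled, and that otherwise NAPI completes once RX used less than its budget. poll_completes() and adjust_tx_cleaned() are hypothetical helpers, not driver code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the end-of-poll decision, not driver code.
 * Assumption (from the patch comment): any TX work reported in a round
 * keeps NAPI scheduled; otherwise NAPI completes when RX used less than
 * its budget, and interrupts would be re-enabled.
 */
static bool poll_completes(int tx_cleaned, int rx_cleaned, int budget)
{
	assert(budget > 0 && rx_cleaned <= budget);

	if (tx_cleaned)
		return false;           /* TX work reported: keep polling */
	return rx_cleaned < budget;     /* nothing left: napi_complete() */
}

/* The proposed tweak: a full ring with nothing reclaimed counts as work. */
static int adjust_tx_cleaned(int tx_cleaned, unsigned int num_txbdfree)
{
	if (!tx_cleaned && !num_txbdfree)
		tx_cleaned += 1;        /* don't complete napi */
	return tx_cleaned;
}
```

Under these assumptions, a full ring with nothing reclaimed keeps the poll loop running instead of going idle, which gives the clean-up path another chance to free descriptors. Whether that addresses the hang is what the rest of the thread goes on to test.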

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* RE: Gianfar driver failing on MPC8641D based board
  2010-02-26  3:14             ` Anton Vorontsov
@ 2010-02-26  4:58               ` Kumar Gopalpet-B05799
  -1 siblings, 0 replies; 40+ messages in thread
From: Kumar Gopalpet-B05799 @ 2010-02-26  4:58 UTC (permalink / raw)
  To: avorontsov, Paul Gortmaker
  Cc: Martyn Welch, linuxppc-dev list, netdev, linux-kernel, davem, Kumar Gala

 

>-----Original Message-----
>From: Anton Vorontsov [mailto:avorontsov@ru.mvista.com] 
>Sent: Friday, February 26, 2010 8:45 AM
>To: Paul Gortmaker
>Cc: Martyn Welch; linuxppc-dev list; netdev@vger.kernel.org; 
>linux-kernel@vger.kernel.org; Kumar Gopalpet-B05799; 
>davem@davemloft.net; Kumar Gala
>Subject: Re: Gianfar driver failing on MPC8641D based board
>
>On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
>[...]
>> I was able to reproduce it on an 8641D and bisected it down to this:
>> 
>> -----------
>> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
>> Author: Anton Vorontsov <avorontsov@ru.mvista.com>
>> Date:   Tue Nov 10 14:11:10 2009 +0000
>> 
>>     gianfar: Revive SKB recycling
>
>Thanks for the bisect. I have a guess why tx hangs in SMP 
>case. Could anyone try the patch down below?
>
>[...]
>> ...which probably explains why you weren't seeing it on non-SMP.
>> I'd imagine it would show up on any of the e500mc boards too.
>
>Yeah.. Pity, I don't have SMP boards anymore. I'll try to get 
>one though.
>
>
>diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c 
>index 8bd3c9f..3ff3bd0 100644
>--- a/drivers/net/gianfar.c
>+++ b/drivers/net/gianfar.c
>@@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
> 			tx_queue = priv->tx_queue[rx_queue->qindex];
> 
> 			tx_cleaned += gfar_clean_tx_ring(tx_queue);
>+			if (!tx_cleaned && !tx_queue->num_txbdfree)
>+				tx_cleaned += 1; /* don't complete napi */
> 			rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
> 							budget_per_queue);
> 			rx_cleaned += rx_cleaned_per_queue;
>

Anton, 

There is one more issue I have been observing with the patch
"gianfar: Revive SKB recycling". It shows up when I run an IPv4
forwarding test with bidirectional flows in an SMP environment. I am
using Spirent SmartBits (SmartFlow) for automated testing, and I
frequently see SmartFlow report "Rx packet count greater than Tx
packet count. Duplicate packets might have been received".

As a workaround I reverted this patch, and the issue went away.

I could partly work around the problem by making num_txbdfree an
atomic_t (updated with atomic_add() and atomic_dec()) and removing the
spinlocks from the TX routines entirely.
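
The workaround above replaces the lock around num_txbdfree with an atomic counter. Below is a userspace sketch of why that can be enough for the counter itself. It assumes exactly one transmit thread and one clean-up thread per ring. C11 <stdatomic.h> stands in for the kernel's atomic_t, and all names are hypothetical. The sketch does not model the duplicate-packet symptom.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NUM_BDS    64      /* hypothetical ring size */
#define ITERATIONS 10000   /* frames pushed through the ring */

/* Shared free-descriptor count; stands in for an atomic_t num_txbdfree. */
static atomic_int num_free;

/* Transmit side: the only thread that decrements num_free. */
static void *xmit_path(void *unused)
{
	(void)unused;
	for (int sent = 0; sent < ITERATIONS; ) {
		if (atomic_load(&num_free) > 0) {
			/* Check-then-decrement is safe: no other thread decrements. */
			atomic_fetch_sub(&num_free, 1);
			sent++;
		} else {
			sched_yield();  /* ring full: let the clean-up side run */
		}
	}
	return NULL;
}

/* Clean-up side: the only thread that increments num_free. */
static void *clean_path(void *unused)
{
	(void)unused;
	for (int reclaimed = 0; reclaimed < ITERATIONS; ) {
		int n = atomic_load(&num_free);

		assert(n >= 0 && n <= NUM_BDS);
		if (n < NUM_BDS) {
			/* Something is in flight: "reclaim" one descriptor. */
			atomic_fetch_add(&num_free, 1);
			reclaimed++;
		} else {
			sched_yield();  /* no descriptors in flight yet */
		}
	}
	return NULL;
}

/* Run both sides concurrently; returns the final free count, or -1. */
static int run_demo(void)
{
	pthread_t tx, clean;

	atomic_store(&num_free, NUM_BDS);
	if (pthread_create(&tx, NULL, xmit_path, NULL) != 0)
		return -1;
	if (pthread_create(&clean, NULL, clean_path, NULL) != 0)
		return -1;      /* demo only: the tx thread is not joined */
	pthread_join(tx, NULL);
	pthread_join(clean, NULL);
	return atomic_load(&num_free);
}
```

Build with -pthread. Each run should end with all NUM_BDS descriptors free. The range check in clean_path() holds because each side's test only guards against its own kind of update. With two transmitters or two cleaners on the same counter, the separate load and update could race, and a compare-and-swap loop (or the lock) would be needed instead.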

Also, I think we may want to make further changes to the
gfar_clean_tx_ring() and gfar_start_xmit() routines so that they can
operate in parallel.

I am sorry for not posting this earlier; I have been caught up with
some urgent issues.

--

Thanks
Sandeep

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-25 16:51       ` Anton Vorontsov
@ 2010-02-26 11:51         ` Martyn Welch
  -1 siblings, 0 replies; 40+ messages in thread
From: Martyn Welch @ 2010-02-26 11:51 UTC (permalink / raw)
  To: avorontsov
  Cc: linuxppc-dev list, netdev, linux-kernel, Sandeep Gopalpet, davem,
	Kumar Gala

Anton Vorontsov wrote:
> On Thu, Feb 25, 2010 at 04:46:54PM +0000, Martyn Welch wrote:
> [...]
>   
>>> nfs: server 192.168.0.1 not responding, still trying
>>>   
>>>       
>> Further testing has shown that this isn't restricted to warm reboots, it
>> happens from cold as well. In addition, the exact timing of the failure
>> seems to vary, some boots have got further before failing.
>>     
>
> Unfortunately I don't have any 8641 boards near me, so I can't
> debug this myself. Though, I tested gianfar on MPC8568E-MDS with
> 2.6.33 kernel, and it seems to work just fine.
>
> I see you use SMP. Can you try to turn it off? If that will fix
> the issue, then it'll be a good data point.
>
> Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
> know the results.
>
> Thanks

I removed the second core from the dts file rather than truly disabling
SMP in the kernel config. Doing this allowed the board to boot reliably.

Martyn

-- 
Martyn Welch (Principal Software Engineer)   |   Registered in England and
GE Intelligent Platforms                     |   Wales (3828642) at 100
T +44(0)127322748                            |   Barbirolli Square, Manchester,
E martyn.welch@ge.com                        |   M2 3AB  VAT:GB 927559189


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26  3:14             ` Anton Vorontsov
@ 2010-02-26 12:06               ` Martyn Welch
  -1 siblings, 0 replies; 40+ messages in thread
From: Martyn Welch @ 2010-02-26 12:06 UTC (permalink / raw)
  To: avorontsov
  Cc: Paul Gortmaker, linuxppc-dev list, netdev, linux-kernel,
	Sandeep Gopalpet, davem, Kumar Gala

Anton Vorontsov wrote:
> On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
> [...]
>   
>> I was able to reproduce it on an 8641D and bisected it down to this:
>>
>> -----------
>> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
>> Author: Anton Vorontsov <avorontsov@ru.mvista.com>
>> Date:   Tue Nov 10 14:11:10 2009 +0000
>>
>>     gianfar: Revive SKB recycling
>>     
>
> Thanks for the bisect. I have a guess why tx hangs in
> SMP case. Could anyone try the patch down below?
>   

Yup, no problem. I'm afraid it doesn't resolve the problem for me.

> [...]
>   
>> ...which probably explains why you weren't seeing it on non-SMP.
>> I'd imagine it would show up on any of the e500mc boards too.
>>     
>
> Yeah.. Pity, I don't have SMP boards anymore. I'll try
> to get one though.
>
>
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index 8bd3c9f..3ff3bd0 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
> @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
>  			tx_queue = priv->tx_queue[rx_queue->qindex];
>  
>  			tx_cleaned += gfar_clean_tx_ring(tx_queue);
> +			if (!tx_cleaned && !tx_queue->num_txbdfree)
> +				tx_cleaned += 1; /* don't complete napi */
>  			rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
>  							budget_per_queue);
>  			rx_cleaned += rx_cleaned_per_queue;
>   


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26 12:06               ` Martyn Welch
@ 2010-02-26 14:35                 ` Anton Vorontsov
  -1 siblings, 0 replies; 40+ messages in thread
From: Anton Vorontsov @ 2010-02-26 14:35 UTC (permalink / raw)
  To: Martyn Welch
  Cc: linuxppc-dev list, netdev, linux-kernel, Paul Gortmaker,
	Sandeep Gopalpet, davem

On Fri, Feb 26, 2010 at 12:06:15PM +0000, Martyn Welch wrote:
> Anton Vorontsov wrote:
> > On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
> > [...]
> >   
> >> I was able to reproduce it on an 8641D and bisected it down to this:
> >>
> >> -----------
> >> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
> >> Author: Anton Vorontsov <avorontsov@ru.mvista.com>
> >> Date:   Tue Nov 10 14:11:10 2009 +0000
> >>
> >>     gianfar: Revive SKB recycling
> >>     
> >
> > Thanks for the bisect. I have a guess why tx hangs in
> > SMP case. Could anyone try the patch down below?
> >   
> 
> Yup, no problem. I'm afraid it doesn't resolve the problem for me.

Hm.. I found a p2020 board and I was able to reproduce the issue.
The patch down below fixed it completely for me... hm.

I'll look further, thanks!

> > [...]
> >   
> >> ...which probably explains why you weren't seeing it on non-SMP.
> >> I'd imagine it would show up on any of the e500mc boards too.
> >>     
> >
> > Yeah.. Pity, I don't have SMP boards anymore. I'll try
> > to get one though.
> >
> >
> > diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> > index 8bd3c9f..3ff3bd0 100644
> > --- a/drivers/net/gianfar.c
> > +++ b/drivers/net/gianfar.c
> > @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
> >  			tx_queue = priv->tx_queue[rx_queue->qindex];
> >  
> >  			tx_cleaned += gfar_clean_tx_ring(tx_queue);
> > +			if (!tx_cleaned && !tx_queue->num_txbdfree)
> > +				tx_cleaned += 1; /* don't complete napi */
> >  			rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
> >  							budget_per_queue);
> >  			rx_cleaned += rx_cleaned_per_queue;
> >   

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26 14:35                 ` Anton Vorontsov
@ 2010-02-26 14:52                   ` Paul Gortmaker
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul Gortmaker @ 2010-02-26 14:52 UTC (permalink / raw)
  To: avorontsov
  Cc: Martyn Welch, linuxppc-dev list, netdev, linux-kernel,
	Sandeep Gopalpet, davem

On 10-02-26 09:35 AM, Anton Vorontsov wrote:
> On Fri, Feb 26, 2010 at 12:06:15PM +0000, Martyn Welch wrote:
>> Anton Vorontsov wrote:
>>> On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
>>> [...]
>>>
>>>> I was able to reproduce it on an 8641D and bisected it down to this:
>>>>
>>>> -----------
>>>> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
>>>> Author: Anton Vorontsov<avorontsov@ru.mvista.com>
>>>> Date:   Tue Nov 10 14:11:10 2009 +0000
>>>>
>>>>      gianfar: Revive SKB recycling
>>>>
>>>
>>> Thanks for the bisect. I have a guess why tx hangs in
>>> SMP case. Could anyone try the patch down below?
>>>
>>
>> Yup, no problem. I'm afraid it doesn't resolve the problem for me.
> 
> Hm.. I found a p2020 board and I was able to reproduce the issue.
> The patch down below fixed it completely for me... hm.

Interesting. I just tested your patch on the sbc8641d, and the issue
is still there. I'm using NFSroot just like Martyn was, and it still
appears to be stuck on that gianfar TX lock. I'll see if I can get a
SysRq backtrace in case that helps you see how it gets there...

Paul.

----

nfs: server not responding, still trying 

[repeated ~15 times, then...]
                      
INFO: task rc.sysinit:837 blocked for more than 120 seconds.                    
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.       
rc.sysinit    D 0fef73f4     0   837    836 0x00000000                          
Call Trace:                                                                     
[dfb7d9b0] [c000a144] __switch_to+0x8c/0xf8                                     
[dfb7d9d0] [c03443dc] schedule+0x380/0x954                                      
[dfb7da50] [c0344a0c] io_schedule+0x5c/0x90                                     
[dfb7da70] [c0074b0c] sync_page+0x4c/0x74                                       
[dfb7da80] [c0344f44] __wait_on_bit_lock+0xb0/0x148                             
[dfb7dab0] [c0074a8c] __lock_page+0x94/0xa4                                     
[dfb7dae0] [c0074d5c] find_lock_page+0x8c/0xa4                                  
[dfb7db00] [c0075674] filemap_fault+0x1ec/0x4fc                                 
[dfb7db40] [c008d548] __do_fault+0x98/0x53c                                     
[dfb7dba0] [c0018478] do_page_fault+0x2d0/0x500                                 
[dfb7dc50] [c00149d4] handle_page_fault+0xc/0x80                                
--- Exception: 301 at __clear_user+0x14/0x7c                                    
    LR = load_elf_binary+0x670/0x1270                                           
[dfb7dd10] [c00f6ca0] load_elf_binary+0x620/0x1270 (unreliable)                 
[dfb7dd90] [c00b1f78] search_binary_handler+0x17c/0x394                         
[dfb7dde0] [c00f4f50] load_script+0x274/0x288                                   
[dfb7de90] [c00b1f78] search_binary_handler+0x17c/0x394                         
[dfb7dee0] [c00b3580] do_execve+0x240/0x29c                                     
[dfb7df20] [c000a46c] sys_execve+0x68/0xa4                                      
[dfb7df40] [c00145a4] ret_from_syscall+0x0/0x38     


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26 14:52                   ` Paul Gortmaker
@ 2010-02-26 15:18                     ` Martyn Welch
  -1 siblings, 0 replies; 40+ messages in thread
From: Martyn Welch @ 2010-02-26 15:18 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: avorontsov, linuxppc-dev list, netdev, linux-kernel,
	Sandeep Gopalpet, davem

Paul Gortmaker wrote:
> On 10-02-26 09:35 AM, Anton Vorontsov wrote:
>   
>> On Fri, Feb 26, 2010 at 12:06:15PM +0000, Martyn Welch wrote:
>>     
>>> Anton Vorontsov wrote:
>>>       
>>>> On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
>>>> [...]
>>>>
>>>>         
>>>>> I was able to reproduce it on an 8641D and bisected it down to this:
>>>>>
>>>>> -----------
>>>>> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
>>>>> Author: Anton Vorontsov<avorontsov@ru.mvista.com>
>>>>> Date:   Tue Nov 10 14:11:10 2009 +0000
>>>>>
>>>>>      gianfar: Revive SKB recycling
>>>>>
>>>>>           
>>>> Thanks for the bisect. I have a guess why tx hangs in
>>>> SMP case. Could anyone try the patch down below?
>>>>
>>>>         
>>> Yup, no problem. I'm afraid it doesn't resolve the problem for me.
>>>       
>> Hm.. I found a p2020 board and I was able to reproduce the issue.
>> The patch down below fixed it completely for me... hm.
>>     
>
> Interesting. I just tested the patch on the sbc8641d, and it
> still has the issue with your patch applied.  I'm using NFSroot
> just like Martyn was and it still appears bound up on that
> gianfar tx lock.  I'll see if I can get a SysRq backtrace in
> case that will help you see how it manages to get there...
>   

I've got a p2020ds here as well, so I'll give NFSroot on that a try with
your patch.

Martyn



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26 15:18                     ` Martyn Welch
  (?)
@ 2010-02-26 15:34                     ` Martyn Welch
  2010-02-26 16:10                       ` Anton Vorontsov
  -1 siblings, 1 reply; 40+ messages in thread
From: Martyn Welch @ 2010-02-26 15:34 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: netdev, linux-kernel, linuxppc-dev list, Sandeep Gopalpet,
	avorontsov, davem

Martyn Welch wrote:
> Paul Gortmaker wrote:
>> On 10-02-26 09:35 AM, Anton Vorontsov wrote:
>>> On Fri, Feb 26, 2010 at 12:06:15PM +0000, Martyn Welch wrote:
>>>> Anton Vorontsov wrote:
>>>>> On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
>>>>> [...]
>>>>>
>>>>>> I was able to reproduce it on an 8641D and bisected it down to this:
>>>>>>
>>>>>> -----------
>>>>>> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
>>>>>> Author: Anton Vorontsov <avorontsov@ru.mvista.com>
>>>>>> Date:   Tue Nov 10 14:11:10 2009 +0000
>>>>>>
>>>>>>      gianfar: Revive SKB recycling
>>>>>>
>>>>> Thanks for the bisect. I have a guess why tx hangs in
>>>>> SMP case. Could anyone try the patch down below?
>>>>>
>>>> Yup, no problem. I'm afraid it doesn't resolve the problem for me.
>>>>
>>> Hm.. I found a p2020 board and I was able to reproduce the issue.
>>> The patch down below fixed it completely for me... hm.
>>>
>> Interesting. I just tested the patch on the sbc8641d, and it
>> still has the issue with your patch applied.  I'm using NFSroot
>> just like Martyn was and it still appears bound up on that
>> gianfar tx lock.  I'll see if I can get a SysRq backtrace in
>> case that will help you see how it manages to get there...
>>
>
> I've got a p2020ds here as well, so I'll give NFSroot on that a try with
> your patch.
>

Out of 10 boot attempts, 7 failed.

Martyn

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26 15:34                     ` Martyn Welch
@ 2010-02-26 16:10                       ` Anton Vorontsov
  2010-02-26 16:27                           ` Paul Gortmaker
  0 siblings, 1 reply; 40+ messages in thread
From: Anton Vorontsov @ 2010-02-26 16:10 UTC (permalink / raw)
  To: Martyn Welch
  Cc: Paul Gortmaker, netdev, linux-kernel, linuxppc-dev list,
	Sandeep Gopalpet, davem

On Fri, Feb 26, 2010 at 03:34:07PM +0000, Martyn Welch wrote:
[...]
> Out of 10 boot attempts, 7 failed.

OK, I see why. With ip=on (DHCP boot) it's much harder to trigger
it. With a static IP config I can see the same.

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26 16:10                       ` Anton Vorontsov
@ 2010-02-26 16:27                           ` Paul Gortmaker
  0 siblings, 0 replies; 40+ messages in thread
From: Paul Gortmaker @ 2010-02-26 16:27 UTC (permalink / raw)
  To: avorontsov
  Cc: Martyn Welch, netdev, linux-kernel, linuxppc-dev list,
	Sandeep Gopalpet, davem

On 10-02-26 11:10 AM, Anton Vorontsov wrote:
> On Fri, Feb 26, 2010 at 03:34:07PM +0000, Martyn Welch wrote:
> [...]
>> Out of 10 boot attempts, 7 failed.
> 
> OK, I see why. With ip=on (DHCP boot) it's much harder to trigger
> it. With a static IP config I can see the same.

I'd kind of expected to see us stuck in gianfar on that lock, but
the SysRq-T output doesn't show us hung up anywhere in gianfar itself.
[This was on a base 2.6.33, with just a small SysRq fix patch.]

Paul.

----------

SysRq : Changing Loglevel                                            
Loglevel set to 9                                                               
nfs: server not responding, still trying                          
SysRq : Show State                                                              
  task                PC stack   pid father                                     
init          D 0ff1c380     0     1      0 0x00000000                          
Call Trace:                                                                     
[df841a30] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df841a50] [c0350160] schedule+0x354/0x92c                                      
[df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54                           
[df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108                                  
[df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4                         
[df841b40] [c0331cf0] __rpc_execute+0x16c/0x398                                 
[df841b90] [c0329abc] rpc_run_task+0x48/0x9c                                    
[df841ba0] [c0329c40] rpc_call_sync+0x54/0x88                                   
[df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8                                 
[df841c20] [c014eb60] nfs_lookup+0x12c/0x230                                    
[df841d50] [c00b9680] do_lookup+0x118/0x288                                     
[df841d80] [c00bb904] link_path_walk+0x194/0x1118                               
[df841df0] [c00bcb08] path_walk+0x8c/0x168                                      
[df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c                                  
[df841e40] [c00be148] do_filp_open+0x5d4/0xba4                                  
[df841f10] [c00abe94] do_sys_open+0xac/0x190                                    
[df841f40] [c001437c] ret_from_syscall+0x0/0x38                                 
--- Exception: c01 at 0xff1c380                                                 
    LR = 0xfec6d98                                                              
kthreadd      S 00000000     0     2      0 0x00000000                          
Call Trace:                                                                     
[df843e50] [c002e788] wake_up_new_task+0x128/0x16c (unreliable)                 
[df843f10] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df843f30] [c0350160] schedule+0x354/0x92c                                      
[df843fc0] [c004d154] kthreadd+0x130/0x134                                      
[df843ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
migration/0   S 00000000     0     3      2 0x00000000                          
Call Trace:                                                                     
[df847de0] [ffffffff] 0xffffffff (unreliable)                                   
[df847ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df847ec0] [c0350160] schedule+0x354/0x92c                                      
[df847f50] [c002d074] migration_thread+0x29c/0x448                              
[df847fb0] [c004d020] kthread+0x80/0x84                                         
[df847ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
ksoftirqd/0   S 00000000     0     4      2 0x00000000                          
Call Trace:                                                                     
[df84be10] [00000800] 0x800 (unreliable)                                        
[df84bed0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df84bef0] [c0350160] schedule+0x354/0x92c                                      
[df84bf80] [c0038454] run_ksoftirqd+0x14c/0x1e0                                 
[df84bfb0] [c004d020] kthread+0x80/0x84                                         
[df84bff0] [c00141a0] kernel_thread+0x4c/0x68                                   
watchdog/0    S 00000000     0     5      2 0x00000000                          
Call Trace:                                                                     
[df84dee0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df84df00] [c0350160] schedule+0x354/0x92c                                      
[df84df90] [c006b8e8] watchdog+0x48/0x88                                        
[df84dfb0] [c004d020] kthread+0x80/0x84                                         
[df84dff0] [c00141a0] kernel_thread+0x4c/0x68                                   
migration/1   S 00000000     0     6      2 0x00000000                          
Call Trace:                                                                     
[df84fea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df84fec0] [c0350160] schedule+0x354/0x92c                                      
[df84ff50] [c002d074] migration_thread+0x29c/0x448                              
[df84ffb0] [c004d020] kthread+0x80/0x84                                         
[df84fff0] [c00141a0] kernel_thread+0x4c/0x68                                   
ksoftirqd/1   S 00000000     0     7      2 0x00000000                          
Call Trace:                                                                     
[df853ed0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df853ef0] [c0350160] schedule+0x354/0x92c                                      
[df853f80] [c0038454] run_ksoftirqd+0x14c/0x1e0                                 
[df853fb0] [c004d020] kthread+0x80/0x84                                         
[df853ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
watchdog/1    S 00000000     0     8      2 0x00000000                          
Call Trace:                                                                     
[df857ee0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df857f00] [c0350160] schedule+0x354/0x92c                                      
[df857f90] [c006b8e8] watchdog+0x48/0x88                                        
[df857fb0] [c004d020] kthread+0x80/0x84                                         
[df857ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
events/0      S 00000000     0     9      2 0x00000000                          
Call Trace:                                                                     
[df859ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df859ec0] [c0350160] schedule+0x354/0x92c                                      
[df859f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df859fb0] [c004d020] kthread+0x80/0x84                                         
[df859ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
events/1      S 00000000     0    10      2 0x00000000                          
Call Trace:                                                                     
[df85bea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df85bec0] [c0350160] schedule+0x354/0x92c                                      
[df85bf50] [c0048718] worker_thread+0x1fc/0x200                                 
[df85bfb0] [c004d020] kthread+0x80/0x84                                         
[df85bff0] [c00141a0] kernel_thread+0x4c/0x68                                   
khelper       S 00000000     0    11      2 0x00000000                          
Call Trace:                                                                     
[df85dde0] [c0030564] do_fork+0x1b0/0x344 (unreliable)                          
[df85dea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df85dec0] [c0350160] schedule+0x354/0x92c                                      
[df85df50] [c0048718] worker_thread+0x1fc/0x200                                 
[df85dfb0] [c004d020] kthread+0x80/0x84                                         
[df85dff0] [c00141a0] kernel_thread+0x4c/0x68                                   
async/mgr     S 00000000     0    15      2 0x00000000                          
Call Trace:                                                                     
[df8a7df0] [000000fc] 0xfc (unreliable)                                         
[df8a7eb0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df8a7ed0] [c0350160] schedule+0x354/0x92c                                      
[df8a7f60] [c00565c0] async_manager_thread+0x120/0x174                          
[df8a7fb0] [c004d020] kthread+0x80/0x84                                         
[df8a7ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
sync_supers   S 00000000     0    85      2 0x00000000                          
Call Trace:                                                                     
[df951e30] [00000400] 0x400 (unreliable)                                        
[df951ef0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df951f10] [c0350160] schedule+0x354/0x92c                                      
[df951fa0] [c008d714] bdi_sync_supers+0x30/0x5c                                 
[df951fb0] [c004d020] kthread+0x80/0x84                                         
[df951ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
bdi-default   S 00000000     0    87      2 0x00000000                          
Call Trace:                                                                     
[df957e30] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df957e50] [c0350160] schedule+0x354/0x92c                                      
[df957ee0] [c0350b14] schedule_timeout+0x15c/0x23c                              
[df957f30] [c008e510] bdi_forker_task+0x2f8/0x30c                               
[df957fb0] [c004d020] kthread+0x80/0x84                                         
[df957ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
kblockd/0     S 00000000     0    88      2 0x00000000                          
Call Trace:                                                                     
[df8bdde0] [00000800] 0x800 (unreliable)                                        
[df8bdea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df8bdec0] [c0350160] schedule+0x354/0x92c                                      
[df8bdf50] [c0048718] worker_thread+0x1fc/0x200                                 
[df8bdfb0] [c004d020] kthread+0x80/0x84                                         
[df8bdff0] [c00141a0] kernel_thread+0x4c/0x68                                   
kblockd/1     S 00000000     0    89      2 0x00000000                          
Call Trace:                                                                     
[df959de0] [00000800] 0x800 (unreliable)                                        
[df959ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df959ec0] [c0350160] schedule+0x354/0x92c                                      
[df959f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df959fb0] [c004d020] kthread+0x80/0x84                                         
[df959ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
rpciod/0      S 00000000     0   111      2 0x00000000                          
Call Trace:                                                                     
[df93fea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df93fec0] [c0350160] schedule+0x354/0x92c                                      
[df93ff50] [c0048718] worker_thread+0x1fc/0x200                                 
[df93ffb0] [c004d020] kthread+0x80/0x84                                         
[df93fff0] [c00141a0] kernel_thread+0x4c/0x68                                   
rpciod/1      S 00000000     0   112      2 0x00000000                          
Call Trace:                                                                     
[df931de0] [00000001] 0x1 (unreliable)                                          
[df931ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df931ec0] [c0350160] schedule+0x354/0x92c                                      
[df931f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df931fb0] [c004d020] kthread+0x80/0x84                                         
[df931ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
khungtaskd    S 00000000     0   141      2 0x00000000                          
Call Trace:                                                                     
[df979db0] [00000800] 0x800 (unreliable)                                        
[df979e70] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df979e90] [c0350160] schedule+0x354/0x92c                                      
[df979f20] [c0350b14] schedule_timeout+0x15c/0x23c                              
[df979f70] [c006bd38] watchdog+0x98/0x294                                       
[df979fb0] [c004d020] kthread+0x80/0x84                                         
[df979ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
kswapd0       S 00000000     0   142      2 0x00000000                          
Call Trace:                                                                     
[df97bd60] [c04383a0] 0xc04383a0 (unreliable)                                   
[df97be20] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df97be40] [c0350160] schedule+0x354/0x92c                                      
[df97bed0] [c00868a8] kswapd+0x81c/0x858                                        
[df97bfb0] [c004d020] kthread+0x80/0x84                                         
[df97bff0] [c00141a0] kernel_thread+0x4c/0x68                                   
aio/0         S 00000000     0   143      2 0x00000000                          
Call Trace:                                                                     
[df97dde0] [ffffffff] 0xffffffff (unreliable)                                   
[df97dea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df97dec0] [c0350160] schedule+0x354/0x92c                                      
[df97df50] [c0048718] worker_thread+0x1fc/0x200                                 
[df97dfb0] [c004d020] kthread+0x80/0x84                                         
[df97dff0] [c00141a0] kernel_thread+0x4c/0x68                                   
aio/1         S 00000000     0   144      2 0x00000000                          
Call Trace:                                                                     
[df97fde0] [ffffffff] 0xffffffff (unreliable)                                   
[df97fea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df97fec0] [c0350160] schedule+0x354/0x92c                                      
[df97ff50] [c0048718] worker_thread+0x1fc/0x200                                 
[df97ffb0] [c004d020] kthread+0x80/0x84                                         
[df97fff0] [c00141a0] kernel_thread+0x4c/0x68                                   
nfsiod        S 00000000     0   145      2 0x00000000                          
Call Trace:                                                                     
[df9a5de0] [00000003] 0x3 (unreliable)                                          
[df9a5ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df9a5ec0] [c0350160] schedule+0x354/0x92c                                      
[df9a5f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df9a5fb0] [c004d020] kthread+0x80/0x84                                         
[df9a5ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
crypto/0      S 00000000     0   146      2 0x00000000                          
Call Trace:                                                                     
[df9a7de0] [00000800] 0x800 (unreliable)                                        
[df9a7ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df9a7ec0] [c0350160] schedule+0x354/0x92c                                      
[df9a7f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df9a7fb0] [c004d020] kthread+0x80/0x84                                         
[df9a7ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
crypto/1      S 00000000     0   147      2 0x00000000                          
Call Trace:                                                                     
[df9a9ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df9a9ec0] [c0350160] schedule+0x354/0x92c                                      
[df9a9f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df9a9fb0] [c004d020] kthread+0x80/0x84                                         
[df9a9ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
mtdblockd     S 00000000     0   779      2 0x00000000                          
Call Trace:                                                                     
[dfae1e00] [00000800] 0x800 (unreliable)                                        
[dfae1ec0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[dfae1ee0] [c0350160] schedule+0x354/0x92c                                      
[dfae1f70] [c02232dc] mtd_blktrans_thread+0x1c4/0x394                           
[dfae1fb0] [c004d020] kthread+0x80/0x84                                         
[dfae1ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
kstriped      S 00000000     0   826      2 0x00000000                          
Call Trace:                                                                     
[df935de0] [00000800] 0x800 (unreliable)                                        
[df935ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df935ec0] [c0350160] schedule+0x354/0x92c                                      
[df935f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df935fb0] [c004d020] kthread+0x80/0x84                                         
[df935ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
ksnapd        S 00000000     0   828      2 0x00000000                          
Call Trace:                                                                     
[dfae9de0] [00000800] 0x800 (unreliable)                                        
[dfae9ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[dfae9ec0] [c0350160] schedule+0x354/0x92c                                      
[dfae9f50] [c0048718] worker_thread+0x1fc/0x200                                 
[dfae9fb0] [c004d020] kthread+0x80/0x84                                         
[dfae9ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
Sched Debug Version: v0.09, 2.6.33-00001-g8c31d07 #1                            
now at 35747.705693 msecs                                                       
  .jiffies                                 : 4294901234                         
  .sysctl_sched_latency                    : 10.000000                          
  .sysctl_sched_min_granularity            : 2.000000                           
  .sysctl_sched_wakeup_granularity         : 2.000000                           
  .sysctl_sched_child_runs_first           : 0.000000                           
  .sysctl_sched_features                   : 7917179                            
  .sysctl_sched_tunable_scaling            : 1 (logaritmic)                     
                                                                                
cpu#0                                                                           
  .nr_running                    : 0                                            
  .load                          : 0                                            
  .nr_switches                   : 2809                                         
  .nr_load_updates               : 8950                                         
  .nr_uninterruptible            : 1                                            
  .next_balance                  : 4294.901248                                  
  .curr->pid                     : 0                                            
  .clock                         : 35832.063536                                 
  .cpu_load[0]                   : 0                                            
  .cpu_load[1]                   : 0                                            
  .cpu_load[2]                   : 0                                            
  .cpu_load[3]                   : 0                                            
  .cpu_load[4]                   : 0                                            
                                                                                
cfs_rq[0] for UID: 0                                                            
  .exec_clock                    : 0.000000                                     
  .MIN_vruntime                  : 0.000001                                     
  .min_vruntime                  : 4129.195888                                  
  .max_vruntime                  : 0.000001                                     
  .spread                        : 0.000000                                     
  .spread0                       : 4048.261385                                  
  .nr_running                    : 0                                            
  .load                          : 0                                            
  .nr_spread_over                : 0                                            
  .shares                        : 0                                            
  .se->exec_start                : 35836.116992                                 
  .se->vruntime                  : 80.934503                                    
  .se->sum_exec_runtime          : 123.815984                                   
  .se->load.weight               : 1024                                         
                                                                                
rt_rq[0]:                                                                       
  .rt_nr_running                 : 0                                            
  .rt_throttled                  : 0                                            
  .rt_time                       : 0.000000                                     
  .rt_runtime                    : 950.000000                                   
                                                                                
runnable tasks:                                                                 
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
------------------------------------------------------------------------------------------------------------------------------------------
                                                                                
cpu#1                                                                           
  .nr_running                    : 0                                            
  .load                          : 0                                            
  .nr_switches                   : 4069                                         
  .nr_load_updates               : 8689                                         
  .nr_uninterruptible            : 0                                            
  .next_balance                  : 4294.901019                                  
  .curr->pid                     : 0                                            
  .clock                         : 34909.104304                                 
  .cpu_load[0]                   : 0                                            
  .cpu_load[1]                   : 0                                            
  .cpu_load[2]                   : 0                                            
  .cpu_load[3]                   : 0                                            
  .cpu_load[4]                   : 0                                            
                                                                                
cfs_rq[1] for UID: 0                                                            
  .exec_clock                    : 0.000000                                     
  .MIN_vruntime                  : 0.000001                                     
  .min_vruntime                  : 509.424556                                   
  .max_vruntime                  : 0.000001                                     
  .spread                        : 0.000000                                     
  .spread0                       : 428.490053                                   
  .nr_running                    : 0                                            
  .load                          : 0                                            
  .nr_spread_over                : 0                                            
  .shares                        : 0                                            
  .se->exec_start                : 34909.104304                                 
  .se->vruntime                  : 273.153007                                   
  .se->sum_exec_runtime          : 503.971344                                   
  .se->load.weight               : 1024                                         
                                                                                
rt_rq[1]:                                                                       
  .rt_nr_running                 : 0                                            
  .rt_throttled                  : 0                                            
  .rt_time                       : 0.000000                                     
  .rt_runtime                    : 950.000000                                   
                                                                                
runnable tasks:                                                                 
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
------------------------------------------------------------------------------------------------------------------------------------------
                              

* Re: Gianfar driver failing on MPC8641D based board
@ 2010-02-26 16:27                           ` Paul Gortmaker
  0 siblings, 0 replies; 40+ messages in thread
From: Paul Gortmaker @ 2010-02-26 16:27 UTC (permalink / raw)
  To: avorontsov
  Cc: netdev, linux-kernel, Martyn Welch, linuxppc-dev list,
	Sandeep Gopalpet, davem

On 10-02-26 11:10 AM, Anton Vorontsov wrote:
> On Fri, Feb 26, 2010 at 03:34:07PM +0000, Martyn Welch wrote:
> [...]
>> Out of 10 boot attempts, 7 failed.
> 
> OK, I see why. With ip=on (DHCP boot) it's much harder to trigger
> it. With a static IP config I can see the same.

I'd kind of expected to see us stuck in gianfar on that lock, but
the SysRq-T output doesn't show us hung up anywhere in gianfar itself.
[This was on a base 2.6.33, with just a small SysRq fix patch.]

Paul.

----------

SysRq : Changing Loglevel                                            
Loglevel set to 9                                                               
nfs: server not responding, still trying                          
SysRq : Show State                                                              
  task                PC stack   pid father                                     
init          D 0ff1c380     0     1      0 0x00000000                          
Call Trace:                                                                     
[df841a30] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df841a50] [c0350160] schedule+0x354/0x92c                                      
[df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54                           
[df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108                                  
[df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4                         
[df841b40] [c0331cf0] __rpc_execute+0x16c/0x398                                 
[df841b90] [c0329abc] rpc_run_task+0x48/0x9c                                    
[df841ba0] [c0329c40] rpc_call_sync+0x54/0x88                                   
[df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8                                 
[df841c20] [c014eb60] nfs_lookup+0x12c/0x230                                    
[df841d50] [c00b9680] do_lookup+0x118/0x288                                     
[df841d80] [c00bb904] link_path_walk+0x194/0x1118                               
[df841df0] [c00bcb08] path_walk+0x8c/0x168                                      
[df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c                                  
[df841e40] [c00be148] do_filp_open+0x5d4/0xba4                                  
[df841f10] [c00abe94] do_sys_open+0xac/0x190                                    
[df841f40] [c001437c] ret_from_syscall+0x0/0x38                                 
--- Exception: c01 at 0xff1c380                                                 
    LR = 0xfec6d98                                                              
kthreadd      S 00000000     0     2      0 0x00000000                          
Call Trace:                                                                     
[df843e50] [c002e788] wake_up_new_task+0x128/0x16c (unreliable)                 
[df843f10] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df843f30] [c0350160] schedule+0x354/0x92c                                      
[df843fc0] [c004d154] kthreadd+0x130/0x134                                      
[df843ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
migration/0   S 00000000     0     3      2 0x00000000                          
Call Trace:                                                                     
[df847de0] [ffffffff] 0xffffffff (unreliable)                                   
[df847ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df847ec0] [c0350160] schedule+0x354/0x92c                                      
[df847f50] [c002d074] migration_thread+0x29c/0x448                              
[df847fb0] [c004d020] kthread+0x80/0x84                                         
[df847ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
ksoftirqd/0   S 00000000     0     4      2 0x00000000                          
Call Trace:                                                                     
[df84be10] [00000800] 0x800 (unreliable)                                        
[df84bed0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df84bef0] [c0350160] schedule+0x354/0x92c                                      
[df84bf80] [c0038454] run_ksoftirqd+0x14c/0x1e0                                 
[df84bfb0] [c004d020] kthread+0x80/0x84                                         
[df84bff0] [c00141a0] kernel_thread+0x4c/0x68                                   
watchdog/0    S 00000000     0     5      2 0x00000000                          
Call Trace:                                                                     
[df84dee0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df84df00] [c0350160] schedule+0x354/0x92c                                      
[df84df90] [c006b8e8] watchdog+0x48/0x88                                        
[df84dfb0] [c004d020] kthread+0x80/0x84                                         
[df84dff0] [c00141a0] kernel_thread+0x4c/0x68                                   
migration/1   S 00000000     0     6      2 0x00000000                          
Call Trace:                                                                     
[df84fea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df84fec0] [c0350160] schedule+0x354/0x92c                                      
[df84ff50] [c002d074] migration_thread+0x29c/0x448                              
[df84ffb0] [c004d020] kthread+0x80/0x84                                         
[df84fff0] [c00141a0] kernel_thread+0x4c/0x68                                   
ksoftirqd/1   S 00000000     0     7      2 0x00000000                          
Call Trace:                                                                     
[df853ed0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df853ef0] [c0350160] schedule+0x354/0x92c                                      
[df853f80] [c0038454] run_ksoftirqd+0x14c/0x1e0                                 
[df853fb0] [c004d020] kthread+0x80/0x84                                         
[df853ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
watchdog/1    S 00000000     0     8      2 0x00000000                          
Call Trace:                                                                     
[df857ee0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df857f00] [c0350160] schedule+0x354/0x92c                                      
[df857f90] [c006b8e8] watchdog+0x48/0x88                                        
[df857fb0] [c004d020] kthread+0x80/0x84                                         
[df857ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
events/0      S 00000000     0     9      2 0x00000000                          
Call Trace:                                                                     
[df859ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df859ec0] [c0350160] schedule+0x354/0x92c                                      
[df859f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df859fb0] [c004d020] kthread+0x80/0x84                                         
[df859ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
events/1      S 00000000     0    10      2 0x00000000                          
Call Trace:                                                                     
[df85bea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df85bec0] [c0350160] schedule+0x354/0x92c                                      
[df85bf50] [c0048718] worker_thread+0x1fc/0x200                                 
[df85bfb0] [c004d020] kthread+0x80/0x84                                         
[df85bff0] [c00141a0] kernel_thread+0x4c/0x68                                   
khelper       S 00000000     0    11      2 0x00000000                          
Call Trace:                                                                     
[df85dde0] [c0030564] do_fork+0x1b0/0x344 (unreliable)                          
[df85dea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df85dec0] [c0350160] schedule+0x354/0x92c                                      
[df85df50] [c0048718] worker_thread+0x1fc/0x200                                 
[df85dfb0] [c004d020] kthread+0x80/0x84                                         
[df85dff0] [c00141a0] kernel_thread+0x4c/0x68                                   
async/mgr     S 00000000     0    15      2 0x00000000                          
Call Trace:                                                                     
[df8a7df0] [000000fc] 0xfc (unreliable)                                         
[df8a7eb0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df8a7ed0] [c0350160] schedule+0x354/0x92c                                      
[df8a7f60] [c00565c0] async_manager_thread+0x120/0x174                          
[df8a7fb0] [c004d020] kthread+0x80/0x84                                         
[df8a7ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
sync_supers   S 00000000     0    85      2 0x00000000                          
Call Trace:                                                                     
[df951e30] [00000400] 0x400 (unreliable)                                        
[df951ef0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df951f10] [c0350160] schedule+0x354/0x92c                                      
[df951fa0] [c008d714] bdi_sync_supers+0x30/0x5c                                 
[df951fb0] [c004d020] kthread+0x80/0x84                                         
[df951ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
bdi-default   S 00000000     0    87      2 0x00000000                          
Call Trace:                                                                     
[df957e30] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df957e50] [c0350160] schedule+0x354/0x92c                                      
[df957ee0] [c0350b14] schedule_timeout+0x15c/0x23c                              
[df957f30] [c008e510] bdi_forker_task+0x2f8/0x30c                               
[df957fb0] [c004d020] kthread+0x80/0x84                                         
[df957ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
kblockd/0     S 00000000     0    88      2 0x00000000                          
Call Trace:                                                                     
[df8bdde0] [00000800] 0x800 (unreliable)                                        
[df8bdea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df8bdec0] [c0350160] schedule+0x354/0x92c                                      
[df8bdf50] [c0048718] worker_thread+0x1fc/0x200                                 
[df8bdfb0] [c004d020] kthread+0x80/0x84                                         
[df8bdff0] [c00141a0] kernel_thread+0x4c/0x68                                   
kblockd/1     S 00000000     0    89      2 0x00000000                          
Call Trace:                                                                     
[df959de0] [00000800] 0x800 (unreliable)                                        
[df959ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df959ec0] [c0350160] schedule+0x354/0x92c                                      
[df959f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df959fb0] [c004d020] kthread+0x80/0x84                                         
[df959ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
rpciod/0      S 00000000     0   111      2 0x00000000                          
Call Trace:                                                                     
[df93fea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df93fec0] [c0350160] schedule+0x354/0x92c                                      
[df93ff50] [c0048718] worker_thread+0x1fc/0x200                                 
[df93ffb0] [c004d020] kthread+0x80/0x84                                         
[df93fff0] [c00141a0] kernel_thread+0x4c/0x68                                   
rpciod/1      S 00000000     0   112      2 0x00000000                          
Call Trace:                                                                     
[df931de0] [00000001] 0x1 (unreliable)                                          
[df931ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df931ec0] [c0350160] schedule+0x354/0x92c                                      
[df931f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df931fb0] [c004d020] kthread+0x80/0x84                                         
[df931ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
khungtaskd    S 00000000     0   141      2 0x00000000                          
Call Trace:                                                                     
[df979db0] [00000800] 0x800 (unreliable)                                        
[df979e70] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df979e90] [c0350160] schedule+0x354/0x92c                                      
[df979f20] [c0350b14] schedule_timeout+0x15c/0x23c                              
[df979f70] [c006bd38] watchdog+0x98/0x294                                       
[df979fb0] [c004d020] kthread+0x80/0x84                                         
[df979ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
kswapd0       S 00000000     0   142      2 0x00000000                          
Call Trace:                                                                     
[df97bd60] [c04383a0] 0xc04383a0 (unreliable)                                   
[df97be20] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df97be40] [c0350160] schedule+0x354/0x92c                                      
[df97bed0] [c00868a8] kswapd+0x81c/0x858                                        
[df97bfb0] [c004d020] kthread+0x80/0x84                                         
[df97bff0] [c00141a0] kernel_thread+0x4c/0x68                                   
aio/0         S 00000000     0   143      2 0x00000000                          
Call Trace:                                                                     
[df97dde0] [ffffffff] 0xffffffff (unreliable)                                   
[df97dea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df97dec0] [c0350160] schedule+0x354/0x92c                                      
[df97df50] [c0048718] worker_thread+0x1fc/0x200                                 
[df97dfb0] [c004d020] kthread+0x80/0x84                                         
[df97dff0] [c00141a0] kernel_thread+0x4c/0x68                                   
aio/1         S 00000000     0   144      2 0x00000000                          
Call Trace:                                                                     
[df97fde0] [ffffffff] 0xffffffff (unreliable)                                   
[df97fea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df97fec0] [c0350160] schedule+0x354/0x92c                                      
[df97ff50] [c0048718] worker_thread+0x1fc/0x200                                 
[df97ffb0] [c004d020] kthread+0x80/0x84                                         
[df97fff0] [c00141a0] kernel_thread+0x4c/0x68                                   
nfsiod        S 00000000     0   145      2 0x00000000                          
Call Trace:                                                                     
[df9a5de0] [00000003] 0x3 (unreliable)                                          
[df9a5ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df9a5ec0] [c0350160] schedule+0x354/0x92c                                      
[df9a5f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df9a5fb0] [c004d020] kthread+0x80/0x84                                         
[df9a5ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
crypto/0      S 00000000     0   146      2 0x00000000                          
Call Trace:                                                                     
[df9a7de0] [00000800] 0x800 (unreliable)                                        
[df9a7ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df9a7ec0] [c0350160] schedule+0x354/0x92c                                      
[df9a7f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df9a7fb0] [c004d020] kthread+0x80/0x84                                         
[df9a7ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
crypto/1      S 00000000     0   147      2 0x00000000                          
Call Trace:                                                                     
[df9a9ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df9a9ec0] [c0350160] schedule+0x354/0x92c                                      
[df9a9f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df9a9fb0] [c004d020] kthread+0x80/0x84                                         
[df9a9ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
mtdblockd     S 00000000     0   779      2 0x00000000                          
Call Trace:                                                                     
[dfae1e00] [00000800] 0x800 (unreliable)                                        
[dfae1ec0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[dfae1ee0] [c0350160] schedule+0x354/0x92c                                      
[dfae1f70] [c02232dc] mtd_blktrans_thread+0x1c4/0x394                           
[dfae1fb0] [c004d020] kthread+0x80/0x84                                         
[dfae1ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
kstriped      S 00000000     0   826      2 0x00000000                          
Call Trace:                                                                     
[df935de0] [00000800] 0x800 (unreliable)                                        
[df935ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[df935ec0] [c0350160] schedule+0x354/0x92c                                      
[df935f50] [c0048718] worker_thread+0x1fc/0x200                                 
[df935fb0] [c004d020] kthread+0x80/0x84                                         
[df935ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
ksnapd        S 00000000     0   828      2 0x00000000                          
Call Trace:                                                                     
[dfae9de0] [00000800] 0x800 (unreliable)                                        
[dfae9ea0] [c0009fc4] __switch_to+0x8c/0xf8                                     
[dfae9ec0] [c0350160] schedule+0x354/0x92c                                      
[dfae9f50] [c0048718] worker_thread+0x1fc/0x200                                 
[dfae9fb0] [c004d020] kthread+0x80/0x84                                         
[dfae9ff0] [c00141a0] kernel_thread+0x4c/0x68                                   
Sched Debug Version: v0.09, 2.6.33-00001-g8c31d07 #1                            
now at 35747.705693 msecs                                                       
  .jiffies                                 : 4294901234                         
  .sysctl_sched_latency                    : 10.000000                          
  .sysctl_sched_min_granularity            : 2.000000                           
  .sysctl_sched_wakeup_granularity         : 2.000000                           
  .sysctl_sched_child_runs_first           : 0.000000                           
  .sysctl_sched_features                   : 7917179                            
  .sysctl_sched_tunable_scaling            : 1 (logaritmic)                     
                                                                                
cpu#0                                                                           
  .nr_running                    : 0                                            
  .load                          : 0                                            
  .nr_switches                   : 2809                                         
  .nr_load_updates               : 8950                                         
  .nr_uninterruptible            : 1                                            
  .next_balance                  : 4294.901248                                  
  .curr->pid                     : 0                                            
  .clock                         : 35832.063536                                 
  .cpu_load[0]                   : 0                                            
  .cpu_load[1]                   : 0                                            
  .cpu_load[2]                   : 0                                            
  .cpu_load[3]                   : 0                                            
  .cpu_load[4]                   : 0                                            
                                                                                
cfs_rq[0] for UID: 0                                                            
  .exec_clock                    : 0.000000                                     
  .MIN_vruntime                  : 0.000001                                     
  .min_vruntime                  : 4129.195888                                  
  .max_vruntime                  : 0.000001                                     
  .spread                        : 0.000000                                     
  .spread0                       : 4048.261385                                  
  .nr_running                    : 0                                            
  .load                          : 0                                            
  .nr_spread_over                : 0                                            
  .shares                        : 0                                            
  .se->exec_start                : 35836.116992                                 
  .se->vruntime                  : 80.934503                                    
  .se->sum_exec_runtime          : 123.815984                                   
  .se->load.weight               : 1024                                         
                                                                                
rt_rq[0]:                                                                       
  .rt_nr_running                 : 0                                            
  .rt_throttled                  : 0                                            
  .rt_time                       : 0.000000                                     
  .rt_runtime                    : 950.000000                                   
                                                                                
runnable tasks:                                                                 
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
                                                                                
cpu#1                                                                           
  .nr_running                    : 0                                            
  .load                          : 0                                            
  .nr_switches                   : 4069                                         
  .nr_load_updates               : 8689                                         
  .nr_uninterruptible            : 0                                            
  .next_balance                  : 4294.901019                                  
  .curr->pid                     : 0                                            
  .clock                         : 34909.104304                                 
  .cpu_load[0]                   : 0                                            
  .cpu_load[1]                   : 0                                            
  .cpu_load[2]                   : 0                                            
  .cpu_load[3]                   : 0                                            
  .cpu_load[4]                   : 0                                            
                                                                                
cfs_rq[1] for UID: 0                                                            
  .exec_clock                    : 0.000000                                     
  .MIN_vruntime                  : 0.000001                                     
  .min_vruntime                  : 509.424556                                   
  .max_vruntime                  : 0.000001                                     
  .spread                        : 0.000000                                     
  .spread0                       : 428.490053                                   
  .nr_running                    : 0                                            
  .load                          : 0                                            
  .nr_spread_over                : 0                                            
  .shares                        : 0                                            
  .se->exec_start                : 34909.104304                                 
  .se->vruntime                  : 273.153007                                   
  .se->sum_exec_runtime          : 503.971344                                   
  .se->load.weight               : 1024                                         
                                                                                
rt_rq[1]:                                                                       
  .rt_nr_running                 : 0                                            
  .rt_throttled                  : 0                                            
  .rt_time                       : 0.000000                                     
  .rt_runtime                    : 950.000000                                   
                                                                                
runnable tasks:                                                                 
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
                              


* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26 16:27                           ` Paul Gortmaker
@ 2010-02-26 21:38                             ` Anton Vorontsov
  -1 siblings, 0 replies; 40+ messages in thread
From: Anton Vorontsov @ 2010-02-26 21:38 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: Martyn Welch, netdev, linux-kernel, linuxppc-dev list,
	Sandeep Gopalpet, davem

On Fri, Feb 26, 2010 at 11:27:42AM -0500, Paul Gortmaker wrote:
> On 10-02-26 11:10 AM, Anton Vorontsov wrote:
> > On Fri, Feb 26, 2010 at 03:34:07PM +0000, Martyn Welch wrote:
> > [...]
> >> Out of 10 boot attempts, 7 failed.
> > 
> > OK, I see why. With ip=on (dhcp boot) it's much harder to trigger
> > it. With static ip config can I see the same.
> 
> I'd kind of expected to see us stuck in gianfar on that lock, but
> the SysRQ-T doesn't show us hung up anywhere in gianfar itself.
> [This was on a base 2.6.33, with just a small sysrq fix patch]

> [df841a30] [c0009fc4] __switch_to+0x8c/0xf8                                     
> [df841a50] [c0350160] schedule+0x354/0x92c                                      
> [df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54                           
> [df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108                                  
> [df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4                         
> [df841b40] [c0331cf0] __rpc_execute+0x16c/0x398                                 
> [df841b90] [c0329abc] rpc_run_task+0x48/0x9c                                    
> [df841ba0] [c0329c40] rpc_call_sync+0x54/0x88                                   
> [df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8                                 
> [df841c20] [c014eb60] nfs_lookup+0x12c/0x230                                    
> [df841d50] [c00b9680] do_lookup+0x118/0x288                                     
> [df841d80] [c00bb904] link_path_walk+0x194/0x1118                               
> [df841df0] [c00bcb08] path_walk+0x8c/0x168                                      
> [df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c                                  
> [df841e40] [c00be148] do_filp_open+0x5d4/0xba4                                  
> [df841f10] [c00abe94] do_sys_open+0xac/0x190                                    

Yeah, I don't think this trace is gianfar-related: the NFS task is just
waiting for an RPC reply, which is a side effect of gianfar no longer
sending anything.

OK, I think I found what's happening in gianfar.

Some background...

start_xmit() prepares a new skb for transmission. Generally, it does
three things:

1. sets up all BDs (marks them ready to send), except the first one.
2. stores the skb into tx_queue->tx_skbuff so that clean_tx_ring()
   can clean it up later.
3. sets up the first BD, i.e. marks it ready.
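To make the ordering concrete, here is a minimal single-threaded model of the three steps above, in the pre-patch order. It is a sketch, not driver code: `struct tx_ring`, `model_start_xmit()`, `RING_SIZE` and the `TXBD_READY` value are hypothetical stand-ins, and each BD is reduced to its `lstatus` word.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a gianfar-style TX ring. None of these
 * names are the driver's real definitions; TXBD_READY stands in for the
 * "ready to send" bit in a buffer descriptor's lstatus word. */
#define RING_SIZE  8u
#define TXBD_READY 0x80000000u

struct tx_ring {
    uint32_t lstatus[RING_SIZE];   /* one status word per BD */
    void *tx_skbuff[RING_SIZE];    /* skb to free once it has been sent */
    unsigned cur;                  /* next free BD */
    unsigned skb_curtx;            /* next free tx_skbuff slot */
};

/* Queue one packet occupying nr_bds descriptors, following the same
 * order of writes as the pre-patch start_xmit(). */
static void model_start_xmit(struct tx_ring *r, void *skb, unsigned nr_bds)
{
    unsigned first = r->cur;

    /* 1. Mark every BD except the first one as ready. */
    for (unsigned i = 1; i < nr_bds; i++)
        r->lstatus[(first + i) % RING_SIZE] |= TXBD_READY;

    /* 2. Record the skb so that clean_tx_ring() can free it later. */
    r->tx_skbuff[r->skb_curtx] = skb;

    /* 3. Mark the first BD ready last, handing the whole frame over. */
    r->lstatus[first] |= TXBD_READY;

    r->cur = (first + nr_bds) % RING_SIZE;
    r->skb_curtx = (r->skb_curtx + 1) % RING_SIZE;
}
```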

Here is what clean_tx_ring() does:

1. reads skbs from tx_queue->tx_skbuff.
2. checks if the *last* BD of each skb is ready. If it's still ready
   [to send], then it hasn't been transmitted yet, so clean_tx_ring()
   returns. Otherwise it actually cleans up the BDs. All is OK.
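The cleanup check can be modelled the same way. In this hypothetical sketch (again not the real gianfar structures), each `tx_skbuff` slot records how many BDs its packet uses, a stand-in for however the real driver derives that count from the skb, so the cleaner can locate the packet's *last* BD and test only that one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; names and layout are illustrative only. */
#define RING_SIZE  8u
#define TXBD_READY 0x80000000u

struct tx_ring {
    uint32_t lstatus[RING_SIZE];
    void *tx_skbuff[RING_SIZE];
    unsigned nr_bds[RING_SIZE];    /* BDs used by the skb in each slot */
    unsigned dirty;                /* first BD not yet reclaimed */
    unsigned skb_dirtytx;          /* first tx_skbuff slot not reclaimed */
};

/* Reclaim every packet whose last BD is no longer marked ready.
 * Returns the number of packets reclaimed. */
static unsigned model_clean_tx_ring(struct tx_ring *r)
{
    unsigned freed = 0;
    void *skb;

    /* 1. Walk the skbs recorded by start_xmit(). */
    while ((skb = r->tx_skbuff[r->skb_dirtytx]) != NULL) {
        unsigned n = r->nr_bds[r->skb_dirtytx];
        unsigned last = (r->dirty + n - 1) % RING_SIZE;

        /* 2. Last BD still ready: not transmitted yet, stop here. */
        if (r->lstatus[last] & TXBD_READY)
            break;

        /* Otherwise treat the packet as sent and reclaim its BDs. */
        r->tx_skbuff[r->skb_dirtytx] = NULL;
        r->skb_dirtytx = (r->skb_dirtytx + 1) % RING_SIZE;
        r->dirty = (r->dirty + n) % RING_SIZE;
        freed++;
    }
    return freed;
}
```

Note that only the last BD's status is consulted; the model (like the description above) trusts that a non-ready last BD means the whole packet is done.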

Now, if the skb uses just one BD, the code flow can go like this:

- start_xmit(): stores the skb into tx_skbuff. Note that the first BD
  (which is also the last one) isn't marked as ready yet.
- clean_tx_ring(): sees that the skb is not NULL, *and* that its BD's
  lstatus says it is NOT ready (as if the BD had already been sent), so
  it cleans it up (bad!)
- start_xmit(): marks the BD as ready [to send], but it's too late: the
  skb has already been cleaned up.
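The single-BD interleaving can be replayed deterministically. The sketch below uses hypothetical names and a one-packet, one-BD model (not driver code): it splits the old start_xmit() order into its two stores and runs a cleaner pass between them, which is what another CPU could do at that moment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-packet model; not the driver's real structures. */
#define TXBD_READY 0x80000000u

struct slot {
    uint32_t lstatus;   /* the packet's only BD: first and last at once */
    void *tx_skbuff;    /* skb the cleaner may free */
};

/* clean_tx_ring() for a single-BD packet: reclaim the skb unless its BD
 * is still marked ready. Returns the reclaimed skb, or NULL. */
static void *clean(struct slot *s)
{
    void *skb = s->tx_skbuff;
    if (skb == NULL || (s->lstatus & TXBD_READY))
        return NULL;
    s->tx_skbuff = NULL;            /* "frees" the skb */
    return skb;
}

/* Old order: store skb, cleaner runs, then mark the BD ready.
 * Returns whatever the cleaner reclaimed (NULL if nothing). */
static void *replay_buggy_order(struct slot *s, void *skb)
{
    s->lstatus = 0;                 /* BD left idle by the previous cleanup */
    s->tx_skbuff = skb;             /* start_xmit(): skb stored first */
    void *freed = clean(s);         /* clean_tx_ring() sneaks in here */
    s->lstatus |= TXBD_READY;       /* start_xmit(): BD ready, too late */
    return freed;
}
```

In this replay the cleaner reclaims the packet even though its BD only becomes ready afterwards, so the driver has let go of a frame the controller is about to send.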

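We can fix this simply by reordering the lstatus and tx_skbuff writes,
with an eieio() barrier in between, so that the skb is only published
in tx_skbuff after its BD has been marked ready.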
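A userspace analogue of the reordered writes, assuming C11 atomics instead of the kernel's primitives: the release store on `tx_skbuff` plays the role the patch gives to `eieio()`, and the acquire load in the cleaner is the conventional reader-side pairing (the patch itself only changes the writer). C11 atomics describe ordering between threads only, so treat this as an analogy rather than a drop-in. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-BD model using C11 atomics; not kernel code. */
#define TXBD_READY 0x80000000u

struct slot {
    _Atomic uint32_t lstatus;       /* BD status word */
    _Atomic(void *) tx_skbuff;      /* skb the cleaner may free */
};

static void slot_init(struct slot *s)
{
    atomic_init(&s->lstatus, 0u);
    atomic_init(&s->tx_skbuff, NULL);
}

/* Writer, patched order: BD status first ... */
static void xmit_mark_ready(struct slot *s)
{
    atomic_store_explicit(&s->lstatus, TXBD_READY, memory_order_relaxed);
}

/* ... skb pointer second. Release: any thread whose acquire load sees
 * this pointer also sees the lstatus store above. */
static void xmit_publish_skb(struct slot *s, void *skb)
{
    atomic_store_explicit(&s->tx_skbuff, skb, memory_order_release);
}

/* Reader: only reclaim an skb whose BD is no longer ready. */
static void *clean(struct slot *s)
{
    void *skb = atomic_load_explicit(&s->tx_skbuff, memory_order_acquire);
    if (skb == NULL)
        return NULL;
    if (atomic_load_explicit(&s->lstatus, memory_order_relaxed) & TXBD_READY)
        return NULL;                /* not transmitted yet */
    atomic_store_explicit(&s->tx_skbuff, NULL, memory_order_relaxed);
    return skb;
}

/* Same interleaving as the bug, cleaner slotted between the two stores. */
static void *replay_fixed_order(struct slot *s, void *skb)
{
    xmit_mark_ready(s);
    void *freed = clean(s);         /* no skb published yet: nothing to do */
    xmit_publish_skb(s, skb);
    return freed;
}
```

With the new order, a cleaner pass that lands between the two stores finds no skb at all, and one that lands afterwards sees the BD still marked ready, so neither can reclaim the packet early.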

It works flawlessly on my P2020; please try it.

Thanks!


diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8bd3c9f..cccb409 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	/* setup the TxBD length and buffer pointer for the first BD */
-	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
 	txbdp_start->bufPtr = dma_map_single(&priv->ofdev->dev, skb->data,
 			skb_headlen(skb), DMA_TO_DEVICE);
 
@@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	txbdp_start->lstatus = lstatus;
 
+	eieio(); /* force lstatus write before tx_skbuff */
+
+	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
+
 	/* Update the current skb pointer to the next entry we will use
 	 * (wrapping if necessary) */
 	tx_queue->skb_curtx = (tx_queue->skb_curtx + 1) &


* Re: Gianfar driver failing on MPC8641D based board
@ 2010-02-26 21:38                             ` Anton Vorontsov
  0 siblings, 0 replies; 40+ messages in thread
From: Anton Vorontsov @ 2010-02-26 21:38 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: netdev, linux-kernel, Martyn Welch, linuxppc-dev list,
	Sandeep Gopalpet, davem

On Fri, Feb 26, 2010 at 11:27:42AM -0500, Paul Gortmaker wrote:
> On 10-02-26 11:10 AM, Anton Vorontsov wrote:
> > On Fri, Feb 26, 2010 at 03:34:07PM +0000, Martyn Welch wrote:
> > [...]
> >> Out of 10 boot attempts, 7 failed.
> > 
> > OK, I see why. With ip=on (dhcp boot) it's much harder to trigger
> > it. With static ip config can I see the same.
> 
> I'd kind of expected to see us stuck in gianfar on that lock, but
> the SysRQ-T doesn't show us hung up anywhere in gianfar itself.
> [This was on a base 2.6.33, with just a small sysrq fix patch]

> [df841a30] [c0009fc4] __switch_to+0x8c/0xf8                                     
> [df841a50] [c0350160] schedule+0x354/0x92c                                      
> [df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54                           
> [df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108                                  
> [df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4                         
> [df841b40] [c0331cf0] __rpc_execute+0x16c/0x398                                 
> [df841b90] [c0329abc] rpc_run_task+0x48/0x9c                                    
> [df841ba0] [c0329c40] rpc_call_sync+0x54/0x88                                   
> [df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8                                 
> [df841c20] [c014eb60] nfs_lookup+0x12c/0x230                                    
> [df841d50] [c00b9680] do_lookup+0x118/0x288                                     
> [df841d80] [c00bb904] link_path_walk+0x194/0x1118                               
> [df841df0] [c00bcb08] path_walk+0x8c/0x168                                      
> [df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c                                  
> [df841e40] [c00be148] do_filp_open+0x5d4/0xba4                                  
> [df841f10] [c00abe94] do_sys_open+0xac/0x190                                    

Yeah, I don't think this is gianfar-related. It must be something
else triggered by the fact that gianfar no longer sends stuff.

OK, I think I found what's happening in gianfar.

Some background...

start_xmit() prepares a new skb for transmission; generally it does
three things:

1. sets up all BDs (marks them ready to send), except the first one.
2. stores the skb in tx_queue->tx_skbuff so that clean_tx_ring()
   can clean it up later.
3. sets up the first BD, i.e. marks it ready.
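To make the write order concrete, here is a small user-space model of the three steps. It is only a sketch: `struct tx_ring`, `BD_READY` and the trace letters are hypothetical stand-ins, not gianfar's real structures or flag values. It records which field is written when; with a single BD the loop for step 1 does nothing, so the skb is published before any BD is marked ready.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BD_READY 0x8000u /* hypothetical "ready to send" bit in lstatus */
#define MAX_BDS  4

struct txbd { unsigned int lstatus; };

struct tx_ring {
	struct txbd bd[MAX_BDS];
	const char *skb;   /* stands in for tx_queue->tx_skbuff[skb_curtx] */
	char trace[32];    /* write order: "Bn" = BD n marked ready, "S" = skb stored */
};

/* The three steps in the original (pre-patch) order. */
static void start_xmit_old_order(struct tx_ring *r, const char *skb, int nbd)
{
	char ev[16];

	assert(nbd >= 1 && nbd <= MAX_BDS);

	/* 1. mark every BD except the first one ready */
	for (int i = 1; i < nbd; i++) {
		r->bd[i].lstatus |= BD_READY;
		snprintf(ev, sizeof(ev), "B%d ", i);
		strcat(r->trace, ev);
	}

	/* 2. store the skb so the cleanup path can find it */
	r->skb = skb;
	strcat(r->trace, "S ");

	/* 3. mark the first BD ready last */
	r->bd[0].lstatus |= BD_READY;
	strcat(r->trace, "B0");
}
```

Starting from a zero-initialized `struct tx_ring`, three BDs give the trace "B1 B2 S B0" (the last BD, B2, is already ready when the skb appears), while one BD gives "S B0": the only BD, first and last at once, becomes ready after the skb is visible.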

Here is what clean_tx_ring() does:

1. reads skbs from tx_queue->tx_skbuff.
2. checks if the *last* BD is ready. If it's still ready [to send],
   it hasn't been transmitted yet, so clean_tx_ring() returns.
   Otherwise it actually cleans up the BDs. All is OK.
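The cleaner's side can be modelled the same way. Again a hypothetical sketch (`struct tx_slot`, `clean_one()` and `BD_READY` are illustrative names, not the driver's): an skb is reclaimed only once the *last* of its BDs is no longer marked ready, which the model takes to mean the controller has finished with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BD_READY 0x8000u /* hypothetical "ready to send" bit */

struct txbd { unsigned int lstatus; };

/* One queued skb spanning nbd consecutive descriptors. */
struct tx_slot {
	const char *skb;        /* stands in for a tx_skbuff[] entry */
	struct txbd *first_bd;
	int nbd;
};

/*
 * Returns true if the slot was reclaimed, false if the cleaner must stop.
 * Follows the two steps above: read the skb, then test only the LAST BD.
 */
static bool clean_one(struct tx_slot *s)
{
	if (s->skb == NULL)
		return false;                   /* nothing queued here */

	assert(s->nbd > 0);
	struct txbd *last = &s->first_bd[s->nbd - 1];

	if (last->lstatus & BD_READY)
		return false;                   /* not transmitted yet: stop */

	s->skb = NULL;                          /* skb would be freed, BDs recycled */
	return true;
}
```

Note that nothing here looks at the first BD: the check is only as good as the assumption that the last BD was marked ready before the skb became visible.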

Now, if there is just one BD, the code flow can be:

- start_xmit(): stores skb into tx_skbuff. Note that the first BD
  (which is also the last one) isn't marked as ready yet.
- clean_tx_ring(): sees that skb is not null, *and* its lstatus
  says that it is NOT ready (as if the BD had been sent), so it
  cleans it up (bad!)
- start_xmit(): marks BD as ready [to send], but it's too late.

We can fix this simply by reordering lstatus/tx_skbuff writes.
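The bad interleaving, and why swapping the two writes closes it, can be replayed deterministically in one thread. This is a sketch under simplifying assumptions: a one-BD frame, a hypothetical `BD_READY` bit, and a model cleaner called at the exact point between start_xmit()'s two publishing writes. Real concurrency is only simulated by that call order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BD_READY 0x8000u /* hypothetical "ready to send" bit */

struct txbd { unsigned int lstatus; };

/* One-BD frame: the first BD is also the last BD. */
struct tx_slot {
	struct txbd bd;
	const char *skb;   /* stands in for tx_skbuff[skb_curtx] */
};

/* Simplified cleaner: reclaim if an skb is present and its BD is not ready. */
static bool clean_tx_ring_model(struct tx_slot *s)
{
	if (s->skb == NULL || (s->bd.lstatus & BD_READY))
		return false;
	s->skb = NULL;     /* skb would be freed here */
	return true;
}

/*
 * Replays start_xmit() with the cleaner running between its two writes.
 * Returns true if the cleaner reclaimed a frame that was never sent.
 */
static bool replay(bool ready_before_skb)
{
	struct tx_slot s = { { 0 }, NULL };
	bool reclaimed;

	if (ready_before_skb)
		s.bd.lstatus |= BD_READY;   /* patched order: lstatus first */
	else
		s.skb = "frame";            /* old order: skb first */

	reclaimed = clean_tx_ring_model(&s);   /* the "concurrent" cleanup */

	if (ready_before_skb)
		s.skb = "frame";
	else
		s.bd.lstatus |= BD_READY;   /* too late: the skb is already gone */

	return reclaimed;
}
```

`replay(false)` (old order) reports that an unsent frame was reclaimed; `replay(true)` (lstatus first, then the skb) does not, because a cleaner that runs early simply finds no skb yet.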

It works flawlessly on my p2020, please try it.

Thanks!


diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8bd3c9f..cccb409 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	/* setup the TxBD length and buffer pointer for the first BD */
-	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
 	txbdp_start->bufPtr = dma_map_single(&priv->ofdev->dev, skb->data,
 			skb_headlen(skb), DMA_TO_DEVICE);
 
@@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	txbdp_start->lstatus = lstatus;
 
+	eieio(); /* force lstatus write before tx_skbuff */
+
+	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
+
 	/* Update the current skb pointer to the next entry we will use
 	 * (wrapping if necessary) */
 	tx_queue->skb_curtx = (tx_queue->skb_curtx + 1) &

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26 21:38                             ` Anton Vorontsov
@ 2010-02-26 22:12                               ` Paul Gortmaker
  -1 siblings, 0 replies; 40+ messages in thread
From: Paul Gortmaker @ 2010-02-26 22:12 UTC (permalink / raw)
  To: avorontsov
  Cc: Martyn Welch, netdev, linux-kernel, linuxppc-dev list,
	Sandeep Gopalpet, davem

On 10-02-26 04:38 PM, Anton Vorontsov wrote:

> OK, I think I found what's happening in gianfar.
> 
> Some background...
> 
> start_xmit() prepares a new skb for transmission; generally it does
> three things:
> 
> 1. sets up all BDs (marks them ready to send), except the first one.
> 2. stores the skb in tx_queue->tx_skbuff so that clean_tx_ring()
>    can clean it up later.
> 3. sets up the first BD, i.e. marks it ready.
> 
> Here is what clean_tx_ring() does:
> 
> 1. reads skbs from tx_queue->tx_skbuff.
> 2. checks if the *last* BD is ready. If it's still ready [to send],
>    it hasn't been transmitted yet, so clean_tx_ring() returns.
>    Otherwise it actually cleans up the BDs. All is OK.
> 
> Now, if there is just one BD, the code flow can be:
> 
> - start_xmit(): stores skb into tx_skbuff. Note that the first BD
>   (which is also the last one) isn't marked as ready yet.
> - clean_tx_ring(): sees that skb is not null, *and* its lstatus
>   says that it is NOT ready (as if the BD had been sent), so it
>   cleans it up (bad!)
> - start_xmit(): marks BD as ready [to send], but it's too late.
> 
> We can fix this simply by reordering lstatus/tx_skbuff writes.
> 
> It works flawlessly on my p2020, please try it.

I've skipped right to the test part (I'll think about the description
more later) and it passed 5 out of 5 boot tests on NFSroot sbc8641d.
Looks like you've got a solution.

Paul.

> 
> Thanks!
> 
> 
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index 8bd3c9f..cccb409 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
> @@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
>   	}
> 
>   	/* setup the TxBD length and buffer pointer for the first BD */
> -	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
>   	txbdp_start->bufPtr = dma_map_single(&priv->ofdev->dev, skb->data,
>   			skb_headlen(skb), DMA_TO_DEVICE);
> 
> @@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
> 
>   	txbdp_start->lstatus = lstatus;
> 
> +	eieio(); /* force lstatus write before tx_skbuff */
> +
> +	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
> +
>   	/* Update the current skb pointer to the next entry we will use
>   	 * (wrapping if necessary) */
>   	tx_queue->skb_curtx = (tx_queue->skb_curtx + 1)&


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: Gianfar driver failing on MPC8641D based board
  2010-02-26 21:38                             ` Anton Vorontsov
@ 2010-02-27  5:35                               ` Kumar Gopalpet-B05799
  -1 siblings, 0 replies; 40+ messages in thread
From: Kumar Gopalpet-B05799 @ 2010-02-27  5:35 UTC (permalink / raw)
  To: avorontsov, Paul Gortmaker
  Cc: Martyn Welch, netdev, linux-kernel, linuxppc-dev list, davem

 

>-----Original Message-----
>From: Anton Vorontsov [mailto:avorontsov@ru.mvista.com] 
>Sent: Saturday, February 27, 2010 3:08 AM
>To: Paul Gortmaker
>Cc: Martyn Welch; netdev@vger.kernel.org; 
>linux-kernel@vger.kernel.org; linuxppc-dev list; Kumar 
>Gopalpet-B05799; davem@davemloft.net
>Subject: Re: Gianfar driver failing on MPC8641D based board
>
>On Fri, Feb 26, 2010 at 11:27:42AM -0500, Paul Gortmaker wrote:
>> On 10-02-26 11:10 AM, Anton Vorontsov wrote:
>> > On Fri, Feb 26, 2010 at 03:34:07PM +0000, Martyn Welch wrote:
>> > [...]
>> >> Out of 10 boot attempts, 7 failed.
>> > 
>> > OK, I see why. With ip=on (DHCP boot) it's much harder to trigger
>> > it. With a static IP config I can see the same.
>> 
>> I'd kind of expected to see us stuck in gianfar on that lock, but
>> the SysRQ-T doesn't show us hung up anywhere in gianfar itself.
>> [This was on a base 2.6.33, with just a small sysrq fix patch]
>
>> [df841a30] [c0009fc4] __switch_to+0x8c/0xf8
>> [df841a50] [c0350160] schedule+0x354/0x92c
>> [df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54
>> [df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108
>> [df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4
>> [df841b40] [c0331cf0] __rpc_execute+0x16c/0x398
>> [df841b90] [c0329abc] rpc_run_task+0x48/0x9c
>> [df841ba0] [c0329c40] rpc_call_sync+0x54/0x88
>> [df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8
>> [df841c20] [c014eb60] nfs_lookup+0x12c/0x230
>> [df841d50] [c00b9680] do_lookup+0x118/0x288
>> [df841d80] [c00bb904] link_path_walk+0x194/0x1118
>> [df841df0] [c00bcb08] path_walk+0x8c/0x168
>> [df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c
>> [df841e40] [c00be148] do_filp_open+0x5d4/0xba4
>> [df841f10] [c00abe94] do_sys_open+0xac/0x190
>
>Yeah, I don't think this is gianfar-related. It must be 
>something else triggered by the fact that gianfar no longer 
>sends stuff.
>
>OK, I think I found what's happening in gianfar.
>
>Some background...
>
>start_xmit() prepares a new skb for transmission; generally it
>does three things:
>
>1. sets up all BDs (marks them ready to send), except the first one.
>2. stores the skb in tx_queue->tx_skbuff so that clean_tx_ring()
>   can clean it up later.
>3. sets up the first BD, i.e. marks it ready.
>
>Here is what clean_tx_ring() does:
>
>1. reads skbs from tx_queue->tx_skbuff.
>2. checks if the *last* BD is ready. If it's still ready [to send],
>   it hasn't been transmitted yet, so clean_tx_ring() returns.
>   Otherwise it actually cleans up the BDs. All is OK.
>
>Now, if there is just one BD, the code flow can be:
>
>- start_xmit(): stores skb into tx_skbuff. Note that the first BD
>  (which is also the last one) isn't marked as ready yet.
>- clean_tx_ring(): sees that skb is not null, *and* its lstatus
>  says that it is NOT ready (as if the BD had been sent), so it
>  cleans it up (bad!)
>- start_xmit(): marks BD as ready [to send], but it's too late.
>
>We can fix this simply by reordering lstatus/tx_skbuff writes.
>
>It works flawlessly on my p2020, please try it.

Anton,

Understood, and thanks for the explanation. Am I correct in saying that
this is due to the out-of-order execution capability of powerpc?

I have one more question: why don't we use atomic_t for num_txbdfree
and completely do away with the spin_locks in gfar_clean_tx_ring() and
gfar_start_xmit()? In a non-SMP scenario I feel there is absolutely no
need for spin_locks, and in the SMP case atomic operations would be much
safer on powerpc than spin_locks.

What is your suggestion?


--

Thanks
Sandeep

>
>Thanks!
>
>
>diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c 
>index 8bd3c9f..cccb409 100644
>--- a/drivers/net/gianfar.c
>+++ b/drivers/net/gianfar.c
>@@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
> 	}
> 
> 	/* setup the TxBD length and buffer pointer for the first BD */
>-	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
> 	txbdp_start->bufPtr = dma_map_single(&priv->ofdev->dev, skb->data,
> 			skb_headlen(skb), DMA_TO_DEVICE);
> 
>@@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
> 
> 	txbdp_start->lstatus = lstatus;
> 
>+	eieio(); /* force lstatus write before tx_skbuff */
>+
>+	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
>+
> 	/* Update the current skb pointer to the next entry we will use
> 	 * (wrapping if necessary) */
> 	tx_queue->skb_curtx = (tx_queue->skb_curtx + 1) &
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-26 21:38                             ` Anton Vorontsov
@ 2010-03-01 13:07                               ` Martyn Welch
  -1 siblings, 0 replies; 40+ messages in thread
From: Martyn Welch @ 2010-03-01 13:07 UTC (permalink / raw)
  To: linuxppc-dev list
  Cc: Paul Gortmaker, netdev, linux-kernel, Sandeep Gopalpet, davem

Anton Vorontsov wrote:
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index 8bd3c9f..cccb409 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
> @@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	}
>  
>  	/* setup the TxBD length and buffer pointer for the first BD */
> -	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
>  	txbdp_start->bufPtr = dma_map_single(&priv->ofdev->dev, skb->data,
>  			skb_headlen(skb), DMA_TO_DEVICE);
>  
> @@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	txbdp_start->lstatus = lstatus;
>  
> +	eieio(); /* force lstatus write before tx_skbuff */
> +
> +	tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
> +
>  	/* Update the current skb pointer to the next entry we will use
>  	 * (wrapping if necessary) */
>  	tx_queue->skb_curtx = (tx_queue->skb_curtx + 1) &
>   
I can confirm 10/10 successful boots on p2020ds and mpc8641_hpcn.

Martyn


-- 
Martyn Welch (Principal Software Engineer)   |   Registered in England and
GE Intelligent Platforms                     |   Wales (3828642) at 100
T +44(0)127322748                            |   Barbirolli Square, Manchester,
E martyn.welch@ge.com                        |   M2 3AB  VAT:GB 927559189


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Gianfar driver failing on MPC8641D based board
  2010-02-27  5:35                               ` Kumar Gopalpet-B05799
@ 2010-03-02 14:02                                 ` Anton Vorontsov
  -1 siblings, 0 replies; 40+ messages in thread
From: Anton Vorontsov @ 2010-03-02 14:02 UTC (permalink / raw)
  To: Kumar Gopalpet-B05799
  Cc: Paul Gortmaker, Martyn Welch, netdev, linux-kernel,
	linuxppc-dev list, davem

Hi!

On Sat, Feb 27, 2010 at 11:05:32AM +0530, Kumar Gopalpet-B05799 wrote:
[...]
> Understood, and thanks for the explanation. Am I correct in saying that
> this is due to the out-of-order execution capability of powerpc?

Nope, that was just a logic issue in the driver. 

Though, with the patch, the eieio() is needed so that the compiler (or CPU)
won't reorder the lstatus and tx_skbuff writes.
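eieio() is a PowerPC barrier instruction. As a rough portable analogue, and only as an assumption about how the same ordering could be expressed outside the kernel, C11 atomics can publish the skb with a release store, so that any reader which sees the pointer through an acquire load also sees the earlier lstatus write. Under the C11 memory model that guarantee needs both halves, so the sketch shows the writer and the reader. All names are hypothetical, and the controller's own updates to lstatus are outside the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BD_READY 0x8000u /* hypothetical "ready to send" bit */

struct tx_slot {
	unsigned int lstatus;           /* written before the skb is published */
	_Atomic(const char *) skb;      /* stands in for tx_skbuff[skb_curtx] */
};

static void slot_init(struct tx_slot *s)
{
	s->lstatus = 0;
	atomic_init(&s->skb, NULL);
}

/* Writer (start_xmit side): lstatus first, then publish the skb. */
static void publish(struct tx_slot *s, const char *skb, unsigned int lstatus)
{
	s->lstatus = lstatus;
	/* Release: a reader that acquires this pointer also sees lstatus. */
	atomic_store_explicit(&s->skb, skb, memory_order_release);
}

/* Reader (clean_tx_ring side): true if the slot may be reclaimed. */
static bool may_reclaim(struct tx_slot *s)
{
	/* Acquire pairs with the release store in publish(). */
	const char *skb = atomic_load_explicit(&s->skb, memory_order_acquire);

	if (skb == NULL)
		return false;               /* nothing published yet */
	return !(s->lstatus & BD_READY);    /* still ready means not sent */
}
```

The design choice mirrors the patch: the pointer is the "publish" signal, so it must be the last thing written and the first thing read.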

> I have one more question: why don't we use atomic_t for num_txbdfree
> and completely do away with the spin_locks in gfar_clean_tx_ring() and
> gfar_start_xmit()? In a non-SMP scenario I feel there is absolutely no
> need for spin_locks, and in the SMP case atomic operations would be much
> safer on powerpc than spin_locks.
> 
> What is your suggestion?

I think that's a good idea.

However, in start_xmit() we'll have to keep the spinlock anyway,
since it also protects against gfar_error(), which can modify
regs->tstat.
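A minimal sketch of the atomic counter idea, written with C11 `<stdatomic.h>` as a user-space stand-in for the kernel's atomic_t. It assumes one producer (start_xmit) and one cleaner per queue; `struct txq`, `txq_reserve()` and `txq_release()` are hypothetical names. Because the cleaner only ever adds to the count, a stale read on the producer side can only underestimate the free space, so the check-then-subtract cannot over-commit the ring. As noted above, this covers only the counter; the spinlock would still be needed for the regs->tstat accesses shared with gfar_error().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-queue state; only the free-BD counter is modelled. */
struct txq {
	atomic_int num_txbdfree;
};

static void txq_init(struct txq *q, int ring_size)
{
	atomic_init(&q->num_txbdfree, ring_size);
}

/*
 * Producer side (start_xmit). Assumes a single producer per queue: the
 * only concurrent writer is the cleaner, which can only raise the count,
 * so a value read here never overstates the space actually available.
 */
static bool txq_reserve(struct txq *q, int nbd)
{
	assert(nbd > 0);
	if (atomic_load(&q->num_txbdfree) < nbd)
		return false;   /* caller would stop the queue instead */
	atomic_fetch_sub(&q->num_txbdfree, nbd);
	return true;
}

/* Cleaner side (clean_tx_ring): give BDs back once they were sent. */
static void txq_release(struct txq *q, int nbd)
{
	assert(nbd > 0);
	atomic_fetch_add(&q->num_txbdfree, nbd);
}
```

With more than one producer per queue this check-then-subtract would no longer be safe, and a compare-and-swap loop (or the existing lock) would be needed.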

Thanks!

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2010-03-02 14:02 UTC | newest]

Thread overview: 40+ messages
2010-02-05 14:00 Gianfar driver failing on MPC8641D based board Martyn Welch
2010-02-25 10:31 ` Martyn Welch
2010-02-25 16:46   ` Martyn Welch
2010-02-25 16:46     ` Martyn Welch
2010-02-25 16:51     ` Anton Vorontsov
2010-02-25 16:51       ` Anton Vorontsov
2010-02-25 17:49       ` Anton Vorontsov
2010-02-25 17:49         ` Anton Vorontsov
2010-02-26  0:53         ` Paul Gortmaker
2010-02-26  0:53           ` Paul Gortmaker
2010-02-26  3:14           ` Anton Vorontsov
2010-02-26  3:14             ` Anton Vorontsov
2010-02-26  4:58             ` Kumar Gopalpet-B05799
2010-02-26  4:58               ` Kumar Gopalpet-B05799
2010-02-26 12:06             ` Martyn Welch
2010-02-26 12:06               ` Martyn Welch
2010-02-26 14:35               ` Anton Vorontsov
2010-02-26 14:35                 ` Anton Vorontsov
2010-02-26 14:52                 ` Paul Gortmaker
2010-02-26 14:52                   ` Paul Gortmaker
2010-02-26 15:18                   ` Martyn Welch
2010-02-26 15:18                     ` Martyn Welch
2010-02-26 15:34                     ` Martyn Welch
2010-02-26 16:10                       ` Anton Vorontsov
2010-02-26 16:27                         ` Paul Gortmaker
2010-02-26 16:27                           ` Paul Gortmaker
2010-02-26 21:38                           ` Anton Vorontsov
2010-02-26 21:38                             ` Anton Vorontsov
2010-02-26 22:12                             ` Paul Gortmaker
2010-02-26 22:12                               ` Paul Gortmaker
2010-02-27  5:35                             ` Kumar Gopalpet-B05799
2010-02-27  5:35                               ` Kumar Gopalpet-B05799
2010-03-02 14:02                               ` Anton Vorontsov
2010-03-02 14:02                                 ` Anton Vorontsov
2010-03-01 13:07                             ` Martyn Welch
2010-03-01 13:07                               ` Martyn Welch
2010-02-26 11:51       ` Martyn Welch
2010-02-26 11:51         ` Martyn Welch
2010-02-25 18:27     ` Kumar Gala
2010-02-25 18:27       ` Kumar Gala
