* Oops: 17 SMP ARM (v3.16-rc2)
@ 2014-06-25 13:55 Mattis Lorentzon
  2014-06-26 14:01 ` Russell King - ARM Linux
  0 siblings, 1 reply; 91+ messages in thread
From: Mattis Lorentzon @ 2014-06-25 13:55 UTC (permalink / raw)
  To: linux-kernel, linux

[-- Attachment #1: Type: text/plain, Size: 4370 bytes --]

Hello kernel people,

I have a similar issue with v3.16-rc2 as previously reported by Waldemar Brodkorb for v3.15-rc4.
https://lkml.org/lkml/2014/5/9/330

We are running a benchmark application, sometimes using perf, with heavy traffic over NFS.
The error is sporadic and it seems to occur more frequently when using perf.

Linux imx6-test0 3.16.0-rc2+ #1 SMP Wed Jun 25 15:04:16 CEST 2014 armv7l armv7l armv7l GNU/Linux

Any help is greatly appreciated.

Best regards,
Mattis Lorentzon

Unable to handle kernel paging request at virtual address ffffffff
pgd = 9e338000
[ffffffff] *pgd=2fffd821, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 146 Comm: stereo Not tainted 3.16.0-rc2+ #1
task: 9e07a700 ti: 81c42000 task.ti: 81c42000
PC is at find_get_entry+0x60/0xfc
LR is at radix_tree_lookup_slot+0x1c/0x2c
pc : [<800a34d8>]    lr : [<80290448>]    psr: a0000013
sp : 81c43d98  ip : 00000000  fp : 81c43dcc
r10: 00000001  r9 : 9e30e3c0  r8 : 000002a7
r7 : 9f3758a0  r6 : 00000000  r5 : 00000001  r4 : 00000000
r3 : 81c43d84  r2 : 00000000  r1 : 000002a7  r0 : ffffffff
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 2e33804a  DAC: 00000015
Process stereo (pid: 146, stack limit = 0x81c42240)
Stack: (0x81c43d98 to 0x81c44000)
3d80:                                                       00000000 00000000
3da0: 800a3478 000a6000 81c43ecc 00000000 9f37589c 00000000 806cb02a 000002a7
3dc0: 81c43e04 81c43dd0 800a406c 800a3484 80061ca0 9fc2dfe0 00000013 00000059
3de0: 9f37589c 9f375770 00000300 000002a7 9e30e3c0 000002a7 81c43e94 81c43e08
3e00: 800a50c4 800a4040 00000000 00000000 801d1818 00000000 00001000 00080001
3e20: 000002a6 9f3757f4 00000300 000a7000 00000000 801d1818 9e30e490 9f37567c
3e40: 81c43ee8 81c43ed4 00000000 00000000 804d87e0 80067098 00000004 9f375770
3e60: 81c43e94 81c43e70 801d491c 81c43ee8 9f375770 81c43ed4 9e30e3c0 9e07a700
3e80: 76907000 00000000 81c43ebc 81c43e98 801d1818 800a4dfc 80061ca0 80061b0c
3ea0: 9f375770 00200000 00000000 81c43f78 81c43f44 81c43ec0 800e1348 801d17b8
3ec0: 00100000 81c43ed0 800e1764 76907000 00100000 00000000 000a7000 00059000
3ee0: 81c43ecc 00000001 9e30e3c0 00000000 00000000 00000000 9e07a700 00000000
3f00: 00000000 00000000 00200000 00000000 00100000 00000000 00000000 00000000
3f20: 9e30e3c0 9e30e3c0 76907000 81c43f78 9e30e3c0 00100000 81c43f74 81c43f48
3f40: 800e1adc 800e12b8 00000000 0027cce0 00200000 00000000 9e30e3c0 9e30e3c0
3f60: 00100000 76907000 81c43fa4 81c43f78 800e2200 800e1a58 00200000 00000000
3f80: 0027cce0 00000000 0007cce0 00000003 8000ebc4 81c42000 00000000 81c43fa8
3fa0: 8000ea00 800e21c8 0027cce0 00000000 00000003 76907000 00100000 00000000
3fc0: 0027cce0 00000000 0007cce0 00000003 0142b5a0 00000000 00000000 00000000
3fe0: 00000000 7ec59d94 76dc26ac 76e1762c 60000010 00000003 00000000 00000000
Backtrace:
[<800a3478>] (find_get_entry) from [<800a406c>] (pagecache_get_page+0x38/0x1d8)
 r8:000002a7 r7:806cb02a r6:00000000 r5:9f37589c r4:00000000
[<800a4034>] (pagecache_get_page) from [<800a50c4>] (generic_file_read_iter+0x2d4/0x750)
 r10:000002a7 r9:9e30e3c0 r8:000002a7 r7:00000300 r6:9f375770 r5:9f37589c
 r4:00000059
[<800a4df0>] (generic_file_read_iter) from [<801d1818>] (nfs_file_read+0x6c/0xa8)
 r10:00000000 r9:76907000 r8:9e07a700 r7:9e30e3c0 r6:81c43ed4 r5:9f375770
 r4:81c43ee8
[<801d17ac>] (nfs_file_read) from [<800e1348>] (new_sync_read+0x9c/0xc4)
 r6:81c43f78 r5:00000000 r4:00200000
[<800e12ac>] (new_sync_read) from [<800e1adc>] (vfs_read+0x90/0x150)
 r8:00100000 r7:9e30e3c0 r6:81c43f78 r5:76907000 r4:9e30e3c0
[<800e1a4c>] (vfs_read) from [<800e2200>] (SyS_read+0x44/0x98)
 r9:76907000 r8:00100000 r7:9e30e3c0 r6:9e30e3c0 r5:00000000 r4:00200000
[<800e21bc>] (SyS_read) from [<8000ea00>] (ret_fast_syscall+0x0/0x48)
 r9:81c42000 r8:8000ebc4 r7:00000003 r6:0007cce0 r5:00000000 r4:0027cce0
Code: e1a01008 eb07b3d6 e3500000 0a00001c (e5904000)
---[ end trace bebb56a5d6f464ed ]---

***************************************************************
Consider the environment before printing this message.

To read Autoliv's Information and Confidentiality Notice, follow this link:
http://www.autoliv.com/disclaimer.html
***************************************************************

[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 12865 bytes --]

Booting Linux on physical CPU 0x0
Linux version 3.16.0-rc2+ (mattisl@localhost.localdomain) (gcc version 4.8.2 (GCC) ) #1 SMP Wed Jun 25 15:04:16 CEST 2014
CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Machine model: Freescale i.MX6 Quad SABRE Lite Board
bootconsole [earlycon0] enabled
Memory policy: Data cache writealloc
On node 0 totalpages: 131072
free_area_init_node: node 0, pgdat 806ca500, node_mem_map 9fbf7000
  Normal zone: 1024 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 131072 pages, LIFO batch:31
PERCPU: Embedded 8 pages/cpu @9fbb5000 s8640 r8192 d15936 u32768
pcpu-alloc: s8640 r8192 d15936 u32768 alloc=8*4096
pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 130048
Kernel command line: console=ttymxc0,115200 ip=192.168.2.151:192.168.2.1:192.168.2.1:255.255.255.0:imx6-test0:eth0:on earlyprintk enable_wait_mode=off
PID hash table entries: 2048 (order: 1, 8192 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 492104K/524288K available (4938K kernel code, 244K rwdata, 1496K rodata, 236K init, 8335K bss, 32184K reserved, 0K highmem)
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xffc00000 - 0xffe00000   (2048 kB)
    vmalloc : 0xa0800000 - 0xff000000   (1512 MB)
    lowmem  : 0x80000000 - 0xa0000000   ( 512 MB)
    pkmap   : 0x7fe00000 - 0x80000000   (   2 MB)
    modules : 0x7f000000 - 0x7fe00000   (  14 MB)
      .text : 0x80008000 - 0x80650e0c   (6436 kB)
      .init : 0x80651000 - 0x8068c1c0   ( 237 kB)
      .data : 0x8068e000 - 0x806cb140   ( 245 kB)
       .bss : 0x806cb148 - 0x80eeefd0   (8336 kB)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
Hierarchical RCU implementation.
NR_IRQS:16 nr_irqs:16 16
L2C-310 erratum 769419 enabled
L2C-310 enabling early BRESP for Cortex-A9
L2C-310 full line of zeros enabled for Cortex-A9
L2C-310 ID prefetch enabled, offset 1 lines
L2C-310 dynamic clock gating enabled, standby mode enabled
L2C-310 cache controller enabled, 16 ways, 1024 kB
L2C-310: CACHE_ID 0x410000c7, AUX_CTRL 0x76070001
Switching to timer-based delay loop
sched_clock: 32 bits at 66MHz, resolution 15ns, wraps every 65075262448ns
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:  8
... MAX_LOCK_DEPTH:          48
... MAX_LOCKDEP_KEYS:        8191
... CLASSHASH_SIZE:          4096
... MAX_LOCKDEP_ENTRIES:     32768
... MAX_LOCKDEP_CHAINS:      65536
... CHAINHASH_SIZE:          32768
 memory used by lock dependency info: 5167 kB
 per task-struct memory footprint: 1152 bytes
Calibrating delay loop (skipped), value calculated using timer frequency.. 132.00 BogoMIPS (lpj=660000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
CPU: Testing write buffer coherency: ok
CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Setting up static identity map for 0x104d91c8 - 0x104d9220
CPU1: Booted secondary processor
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
CPU2: Booted secondary processor
CPU2: thread -1, cpu 2, socket 0, mpidr 80000002
CPU3: Booted secondary processor
CPU3: thread -1, cpu 3, socket 0, mpidr 80000003
Brought up 4 CPUs
SMP: Total of 4 processors activated.
CPU: All CPU(s) started in SVC mode.
devtmpfs: initialized
VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
pinctrl core: initialized pinctrl subsystem
regulator-dummy: no parameters
NET: Registered protocol family 16
DMA: preallocated 256 KiB pool for atomic coherent allocations
CPU identified as i.MX6Q, silicon rev 1.2
vdd1p1: 800 <--> 1375 mV at 1100 mV 
vdd3p0: 2800 <--> 3150 mV at 3000 mV 
vdd2p5: 2000 <--> 2750 mV at 2400 mV 
vddarm: 725 <--> 1450 mV at 1225 mV 
vddpu: 725 <--> 1450 mV at 1225 mV 
vddsoc: 725 <--> 1450 mV at 1225 mV 
hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
hw-breakpoint: maximum watchpoint size is 4 bytes.
imx6q-pinctrl 20e0000.iomuxc: initialized IMX pinctrl driver
mxs-dma 110000.dma-apbh: initialized
2P5V: 2500 mV 
3P3V: 3300 mV 
usb_otg_vbus: 5000 mV 
SCSI subsystem initialized
i2c i2c-0: IMX I2C adapter registered
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
PTP clock support registered
Switched to clocksource mxc_timer1
imx6q-pcie 1ffc000.pcie: phy link never came up
imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x1000-0x10000]
pci_bus 0000:00: root bus resource [mem 0x01000000-0x01efffff]
pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
pci 0000:00:00.0: [16c3:abcd] type 01 class 0x060400
pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff]
pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x0000ffff pref]
pci 0000:00:00.0: supports D1
pci 0000:00:00.0: PME# supported from D0 D1 D3hot D3cold
PCI: bus0: Fast back to back transfers disabled
PCI: bus1: Fast back to back transfers enabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 01
pci 0000:00:00.0: BAR 0: assigned [mem 0x01000000-0x010fffff]
pci 0000:00:00.0: BAR 6: assigned [mem 0x01100000-0x0110ffff pref]
pci 0000:00:00.0: PCI bridge to [bus 01]
pci 0000:00:00.0: PCI bridge to [bus 01]
pci_bus 0000:00: resource 4 [io  0x1000-0x10000]
pci_bus 0000:00: resource 5 [mem 0x01000000-0x01efffff]
NET: Registered protocol family 2
TCP established hash table entries: 4096 (order: 2, 16384 bytes)
TCP bind hash table entries: 4096 (order: 5, 147456 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP: reno registered
UDP hash table entries: 256 (order: 2, 20480 bytes)
UDP-Lite hash table entries: 256 (order: 2, 20480 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
PCI: CLS 64 bytes, default 64
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 12084K (81801000 - 823ce000)
hw perfevents: enabled with ARMv7 Cortex-A9 PMU driver, 7 counters available
futex hash table entries: 1024 (order: 4, 65536 bytes)
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
squashfs: version 4.0 (2009/01/31) Phillip Lougher
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
fuse init (API version 7.23)
msgmni has been set to 984
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
imx-weim 21b8000.weim: Driver registered.
pcieport 0000:00:00.0: Signaling PME through PCIe PME interrupt
pcie_pme 0000:00:00.0:pcie01: service driver pcie_pme loaded
imx-sdma 20ec000.sdma: Direct firmware load failed with error -2
imx-sdma 20ec000.sdma: Falling back to user helper
imx-sdma 20ec000.sdma: initialized
Serial: IMX driver
2020000.serial: ttymxc0 at MMIO 0x2020000 (irq = 58, base_baud = 5000000) is a IMX
console [ttymxc0] enabled
bootconsole [earlycon0] disabled
21e8000.serial: ttymxc1 at MMIO 0x21e8000 (irq = 59, base_baud = 5000000) is a IMX
serial: Freescale lpuart driver
brd: module loaded
loop: module loaded
spi_imx 2008000.ecspi: probed
2188000.ethernet supply phy not found, using dummy regulator
fec 2188000.ethernet (unregistered net_device): Invalid MAC address: 00:00:00:00:00:00
fec 2188000.ethernet (unregistered net_device): Using random MAC address: 72:2e:0d:fc:e8:38
libphy: fec_enet_mii_bus: probed
fec 2188000.ethernet eth0: registered PHC device 0
snvs_rtc 20cc034.snvs-rtc-lp: rtc core: registered 20cc034.snvs-rtc-lp as rtc0
i2c /dev entries driver
imx2-wdt 20bc000.wdog: timeout 60 sec (nowayout=0)
TCP: cubic registered
NET: Registered protocol family 10
sit: IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
Key type dns_resolver registered
Registering SWP/SWPB emulation handler
input: gpio-keys as /devices/soc0/gpio-keys/input/input0
snvs_rtc 20cc034.snvs-rtc-lp: setting system clock to 1970-01-01 04:55:26 UTC (17726)
fec 2188000.ethernet eth0: Freescale FEC PHY driver [Generic PHY] (mii_bus:phy_addr=2188000.ethernet:07, irq=-1)
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
fec 2188000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
IP-Config: Complete:
     device=eth0, hwaddr=72:2e:0d:fc:e8:38, ipaddr=192.168.2.151, mask=255.255.255.0, gw=192.168.2.1
     host=imx6-test0, domain=, nis-domain=(none)
     bootserver=192.168.2.1, rootserver=192.168.2.1, rootpath=
usb_otg_vbus: disabling
Freeing unused kernel memory: 236K (80651000 - 8068c000)
random: mkdir urandom read with 3 bits of entropy available
fec 2188000.ethernet eth0: Freescale FEC PHY driver [Generic PHY] (mii_bus:phy_addr=2188000.ethernet:07, irq=-1)
fec 2188000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
imx-sdma 20ec000.sdma: firmware not found
random: nonblocking pool is initialized
Unable to handle kernel paging request at virtual address ffffffff
pgd = 9e338000
[ffffffff] *pgd=2fffd821, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 146 Comm: stereo Not tainted 3.16.0-rc2+ #1
task: 9e07a700 ti: 81c42000 task.ti: 81c42000
PC is at find_get_entry+0x60/0xfc
LR is at radix_tree_lookup_slot+0x1c/0x2c
pc : [<800a34d8>]    lr : [<80290448>]    psr: a0000013
sp : 81c43d98  ip : 00000000  fp : 81c43dcc
r10: 00000001  r9 : 9e30e3c0  r8 : 000002a7
r7 : 9f3758a0  r6 : 00000000  r5 : 00000001  r4 : 00000000
r3 : 81c43d84  r2 : 00000000  r1 : 000002a7  r0 : ffffffff
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 2e33804a  DAC: 00000015
Process stereo (pid: 146, stack limit = 0x81c42240)
Stack: (0x81c43d98 to 0x81c44000)
3d80:                                                       00000000 00000000
3da0: 800a3478 000a6000 81c43ecc 00000000 9f37589c 00000000 806cb02a 000002a7
3dc0: 81c43e04 81c43dd0 800a406c 800a3484 80061ca0 9fc2dfe0 00000013 00000059
3de0: 9f37589c 9f375770 00000300 000002a7 9e30e3c0 000002a7 81c43e94 81c43e08
3e00: 800a50c4 800a4040 00000000 00000000 801d1818 00000000 00001000 00080001
3e20: 000002a6 9f3757f4 00000300 000a7000 00000000 801d1818 9e30e490 9f37567c
3e40: 81c43ee8 81c43ed4 00000000 00000000 804d87e0 80067098 00000004 9f375770
3e60: 81c43e94 81c43e70 801d491c 81c43ee8 9f375770 81c43ed4 9e30e3c0 9e07a700
3e80: 76907000 00000000 81c43ebc 81c43e98 801d1818 800a4dfc 80061ca0 80061b0c
3ea0: 9f375770 00200000 00000000 81c43f78 81c43f44 81c43ec0 800e1348 801d17b8
3ec0: 00100000 81c43ed0 800e1764 76907000 00100000 00000000 000a7000 00059000
3ee0: 81c43ecc 00000001 9e30e3c0 00000000 00000000 00000000 9e07a700 00000000
3f00: 00000000 00000000 00200000 00000000 00100000 00000000 00000000 00000000
3f20: 9e30e3c0 9e30e3c0 76907000 81c43f78 9e30e3c0 00100000 81c43f74 81c43f48
3f40: 800e1adc 800e12b8 00000000 0027cce0 00200000 00000000 9e30e3c0 9e30e3c0
3f60: 00100000 76907000 81c43fa4 81c43f78 800e2200 800e1a58 00200000 00000000
3f80: 0027cce0 00000000 0007cce0 00000003 8000ebc4 81c42000 00000000 81c43fa8
3fa0: 8000ea00 800e21c8 0027cce0 00000000 00000003 76907000 00100000 00000000
3fc0: 0027cce0 00000000 0007cce0 00000003 0142b5a0 00000000 00000000 00000000
3fe0: 00000000 7ec59d94 76dc26ac 76e1762c 60000010 00000003 00000000 00000000
Backtrace: 
[<800a3478>] (find_get_entry) from [<800a406c>] (pagecache_get_page+0x38/0x1d8)
 r8:000002a7 r7:806cb02a r6:00000000 r5:9f37589c r4:00000000
[<800a4034>] (pagecache_get_page) from [<800a50c4>] (generic_file_read_iter+0x2d4/0x750)
 r10:000002a7 r9:9e30e3c0 r8:000002a7 r7:00000300 r6:9f375770 r5:9f37589c
 r4:00000059
[<800a4df0>] (generic_file_read_iter) from [<801d1818>] (nfs_file_read+0x6c/0xa8)
 r10:00000000 r9:76907000 r8:9e07a700 r7:9e30e3c0 r6:81c43ed4 r5:9f375770
 r4:81c43ee8
[<801d17ac>] (nfs_file_read) from [<800e1348>] (new_sync_read+0x9c/0xc4)
 r6:81c43f78 r5:00000000 r4:00200000
[<800e12ac>] (new_sync_read) from [<800e1adc>] (vfs_read+0x90/0x150)
 r8:00100000 r7:9e30e3c0 r6:81c43f78 r5:76907000 r4:9e30e3c0
[<800e1a4c>] (vfs_read) from [<800e2200>] (SyS_read+0x44/0x98)
 r9:76907000 r8:00100000 r7:9e30e3c0 r6:9e30e3c0 r5:00000000 r4:00200000
[<800e21bc>] (SyS_read) from [<8000ea00>] (ret_fast_syscall+0x0/0x48)
 r9:81c42000 r8:8000ebc4 r7:00000003 r6:0007cce0 r5:00000000 r4:0027cce0
Code: e1a01008 eb07b3d6 e3500000 0a00001c (e5904000) 
---[ end trace bebb56a5d6f464ed ]---

[-- Attachment #3: config.gz --]
[-- Type: application/x-gzip, Size: 14775 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-25 13:55 Oops: 17 SMP ARM (v3.16-rc2) Mattis Lorentzon
@ 2014-06-26 14:01 ` Russell King - ARM Linux
  2014-06-26 14:44     ` Mattis Lorentzon
  0 siblings, 1 reply; 91+ messages in thread
From: Russell King - ARM Linux @ 2014-06-26 14:01 UTC (permalink / raw)
  To: Mattis Lorentzon; +Cc: linux-kernel

On Wed, Jun 25, 2014 at 01:55:05PM +0000, Mattis Lorentzon wrote:
> Hello kernel people,

You may wish to also copy linux-arm-kernel@lists.infradead.org, which is
where ARM kernel people are.

> I have a similar issue with v3.16-rc2 as previously reported by Waldemar Brodkorb for v3.15-rc4.
> https://lkml.org/lkml/2014/5/9/330

This URL returns no useful information.  I find that lkml.org is broken
more times than not in recent years.  Please use a different archive
site when referring to posts, thanks.

> We are running a benchmark application, sometimes using perf, with heavy
> traffic over NFS.

I have had two iMX6 platforms running root-NFS for about the last six to
nine months with various workloads, and have never seen this oops.
Unfortunately, the description above gives very little information for
what the mechanism to trigger this bug may be.  For example, if I wanted
to reproduce it, what would I need to do?

> The error is sporadic and it seems to occur more frequently when using perf.

So it occurs when not using perf?

> Linux imx6-test0 3.16.0-rc2+ #1 SMP Wed Jun 25 15:04:16 CEST 2014 armv7l armv7l armv7l GNU/Linux
> 
> Any help is greatly appreciated.
> 
> Best regards,
> Mattis Lorentzon
> 
> Unable to handle kernel paging request at virtual address ffffffff
> pgd = 9e338000
> [ffffffff] *pgd=2fffd821, *pte=00000000, *ppte=00000000
> Internal error: Oops: 17 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 146 Comm: stereo Not tainted 3.16.0-rc2+ #1
> task: 9e07a700 ti: 81c42000 task.ti: 81c42000
> PC is at find_get_entry+0x60/0xfc
> LR is at radix_tree_lookup_slot+0x1c/0x2c
> pc : [<800a34d8>]    lr : [<80290448>]    psr: a0000013
> sp : 81c43d98  ip : 00000000  fp : 81c43dcc
> r10: 00000001  r9 : 9e30e3c0  r8 : 000002a7
> r7 : 9f3758a0  r6 : 00000000  r5 : 00000001  r4 : 00000000
> r3 : 81c43d84  r2 : 00000000  r1 : 000002a7  r0 : ffffffff
...
> Code: e1a01008 eb07b3d6 e3500000 0a00001c (e5904000)

Right, so radix_tree_lookup_slot returned 0xffffffff.  I've no idea how
that happened, and I'm not about to try reading and trying to understand
that code.  However, as that is generic code, I find it unlikely that
the code is buggy.  So, I suspect something else must be going on here,
such as a compiler bug or memory corruption.
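
A minimal sketch, in Python, of how the Code: line and register dump quoted
above support that reading. It assumes the words in the Code: line are
consecutive 32-bit ARM instructions ending at the faulting PC (the
parenthesised word), and it decodes only the two encodings needed here. The
helper names are made up for illustration and are not part of any kernel
tool.

```python
# Values copied from the oops above.
OOPS_CODE = "e1a01008 eb07b3d6 e3500000 0a00001c (e5904000)"
FAULT_PC = 0x800A34D8   # "pc : [<800a34d8>]"
LR = 0x80290448         # "LR is at radix_tree_lookup_slot+0x1c/0x2c"


def words_with_addresses(code, fault_pc):
    """Pair each word of a Code: line with its address.

    Assumption: the parenthesised word sits at fault_pc and every word is
    4 bytes (ARM state, not Thumb).
    """
    tokens = code.split()
    fault_index = next(i for i, t in enumerate(tokens) if t.startswith("("))
    return [(fault_pc + 4 * (i - fault_index), int(t.strip("()"), 16))
            for i, t in enumerate(tokens)]


def is_bl(word):
    """True for an A32 BL: bits 27..24 are 0b1011."""
    return (word & 0x0F000000) == 0x0B000000


def branch_target(addr, word):
    """Target of an A32 B/BL at addr: addr + 8 + sign_extend(imm24) * 4."""
    imm24 = word & 0x00FFFFFF
    if imm24 & 0x00800000:
        imm24 -= 1 << 24
    return (addr + 8 + (imm24 << 2)) & 0xFFFFFFFF


def decode_ldr_imm(word):
    """Decode 'LDR Rt, [Rn, #imm]' (word load, offset addressing).

    Returns (rt, rn, signed_offset), or None for any other encoding.
    """
    if (word & 0x0F700000) != 0x05100000:
        return None
    rt = (word >> 12) & 0xF
    rn = (word >> 16) & 0xF
    imm12 = word & 0xFFF
    offset = imm12 if word & (1 << 23) else -imm12
    return rt, rn, offset


if __name__ == "__main__":
    for addr, word in words_with_addresses(OOPS_CODE, FAULT_PC):
        note = ""
        if is_bl(word):
            note = f"bl {branch_target(addr, word):#010x}"
        elif decode_ldr_imm(word):
            rt, rn, off = decode_ldr_imm(word)
            note = f"ldr r{rt}, [r{rn}, #{off}]"
        print(f"{addr:#010x}: {word:08x}  {note}")
```

On this dump the BL at 0x800a34cc resolves to 0x8029042c, which is LR minus
0x1c, i.e. the start of radix_tree_lookup_slot. The faulting word decodes to
a load of r4 from [r0], and r0 holds ffffffff in the register dump.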

Your other oops dumps also show various other functions apparently
returning 0xffffffff.  I can't believe that there's more than one bug
doing this, so I doubt the problem is in these functions.  Something
else must be going on.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-26 14:01 ` Russell King - ARM Linux
@ 2014-06-26 14:44     ` Mattis Lorentzon
  0 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-06-26 14:44 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: linux-kernel, linux-arm-kernel, Fredrik Noring

Thank you for your reply,

> On Wed, Jun 25, 2014 at 01:55:05PM +0000, Mattis Lorentzon wrote:
> > I have a similar issue with v3.16-rc2 as previously reported by Waldemar
> Brodkorb for v3.15-rc4.
> > https://lkml.org/lkml/2014/5/9/330
> 
> This URL returns no useful information.  I find that lkml.org is broken more
> times than not in recent years.  Please use a different archive site when
> referring to posts, thanks.

http://lkml.iu.edu/hypermail/linux/kernel/1405.1/01114.html

> I have had two iMX6 platforms running root-NFS for about the last six to nine
> months with various workloads, and have never seen this oops.
> Unfortunately, the description above gives very little information for what
> the mechanism to trigger this bug may be.  For example, if I wanted to
> reproduce it, what would I need to do?

We have managed to trigger the Oops just by transferring a large file over NFS:

    cat /mnt/foo > /dev/null

where foo is a file of approximately 2 GB. There may be some packet loss
on this network; perhaps this differs from your workload?
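
For repeated runs, the same sequential read can be scripted. The following
is only a sketch of an equivalent to the cat command, not the benchmark we
actually run; the function name, the 1 MiB chunk size and the pass count are
assumptions chosen for illustration.

```python
import sys


def read_and_discard(path, chunk_size=1 << 20):
    """Read the whole file sequentially, throw the data away,
    and return the number of bytes read (like cat FILE > /dev/null)."""
    total = 0
    # buffering=0 gives a raw file object, so each read() is one read syscall.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Usage (hypothetical script name): python3 nfsread.py /mnt/foo [passes]
    passes = int(sys.argv[2]) if len(sys.argv) > 2 else 1
    for n in range(passes):
        print(f"pass {n + 1}: {read_and_discard(sys.argv[1])} bytes")
```

Later passes over the same file may be served partly from the page cache
rather than over the network, so a file larger than RAM (as here, 2 GB on a
512 MB board) keeps the NFS traffic going.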

> > The error is sporadic and it seems to occur more frequently when using
> perf.
> 
> So it occurs when not using perf?

Yes, certainly, see above.

We have investigated further; please find the results in this mail:

http://lkml.iu.edu/hypermail/linux/kernel/1406.3/02190.html

The Oops seems to have been introduced somewhere between v3.12 and v3.13:

- The Oops is reproducible within seconds when running Linux 3.16-rc2.
- We have observed the Oops on 8 different hardware units and two different chipsets (Freescale i.MX6 and Xilinx Zynq).
- The Oops has not been seen on Linux 3.12 so it appears to be good.
- The Oops has been seen on Linux 3.13, 3.14, 3.15, 3.16-rc2 so these appear to be bad.

Configs and a couple of Oops reports are attached to the linked mail.

Best regards,
Mattis Lorentzon

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-26 14:44     ` Mattis Lorentzon
@ 2014-06-26 15:14       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 91+ messages in thread
From: Russell King - ARM Linux @ 2014-06-26 15:14 UTC (permalink / raw)
  To: Mattis Lorentzon; +Cc: linux-kernel, linux-arm-kernel, Fredrik Noring

On Thu, Jun 26, 2014 at 02:44:52PM +0000, Mattis Lorentzon wrote:
> Thank you for your reply,
> 
> > On Wed, Jun 25, 2014 at 01:55:05PM +0000, Mattis Lorentzon wrote:
> > > I have a similar issue with v3.16-rc2 as previously reported by Waldemar
> > Brodkorb for v3.15-rc4.
> > > https://lkml.org/lkml/2014/5/9/330
> >
> > This URL returns no useful information.  I find that lkml.org is broken more
> > times than not in recent years.  Please use a different archive site when
> > referring to posts, thanks.
> 
> http://lkml.iu.edu/hypermail/linux/kernel/1405.1/01114.html

I remember that report, but it was never resolved, as I think no one has
any idea what is causing these oopses, and no one has any idea where to
start looking.

> We have managed to trigger the Oops by just transferring a large file
> over nfs
> cat /mnt/foo > /dev/null
> where foo is a file that is approximately 2 GB. There may be some
> packet losses on this network, perhaps this differs from your workload?

That's a similar workload to the one which is mentioned in the previous
report.  I've just set a similar transfer going, but this will be a 16GB
file.

> We have done some more investigations, please find it in this mail:
> 
> http://lkml.iu.edu/hypermail/linux/kernel/1406.3/02190.html

Yes, I saw that before I replied, and my reply was written with that
message in mind.  That's what prompted this paragraph in my previous
reply:

"Your other oops dumps also show various other functions apparently
returning 0xffffffff.  I can't believe that there's more than one bug
doing this, so I doubt the problem is in these functions.  Something
else must be going on."

One of the problems is that there's so much work going on with the
kernel by many different parties, pulling it in various directions,
that no one really has an overview of all the changes, and so no one
has much of a feel for what could be the cause of weird bugs like this.

I don't know what to suggest - you could try using git bisect to see
if you can track it down to a particular commit, but it sounds like
that's going to be very time consuming.  You mentioned that 3.12
doesn't show the bug, but 3.13 does - so start off telling git bisect
that 3.12 is "good" and 3.13 is "bad".
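
If each bisection step can be automated (build, boot the board, run the NFS
read, capture the serial console), git bisect run can drive the search. It
treats exit status 0 as good, 125 as "cannot test this commit, skip", and
any other status from 1 to 127 as bad. Below is a minimal Python sketch of
the verdict step only. The console log path, the choice of boot marker and
the script name are assumptions; building, booting and capturing the log are
left out.

```python
import re
import sys

# Exit statuses understood by "git bisect run".
GOOD, BAD, SKIP = 0, 1, 125

# Signatures taken from the oops in this thread.
OOPS_PATTERNS = (
    re.compile(r"Unable to handle kernel paging request"),
    re.compile(r"Internal error: Oops"),
)

# Assumption: a line printed late in boot, used to tell "kernel came up"
# from "this commit does not boot at all".
BOOT_MARKER = "Freeing unused kernel memory"


def classify(console_log):
    """Map one captured console log to a git bisect run exit status."""
    if any(p.search(console_log) for p in OOPS_PATTERNS):
        return BAD
    if BOOT_MARKER not in console_log:
        return SKIP  # never finished booting, so the commit is untestable
    return GOOD


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical): git bisect run python3 verdict.py console.log
    with open(sys.argv[1], errors="replace") as f:
        sys.exit(classify(f.read()))
```

Because the oops is sporadic, a "good" verdict is only as reliable as the
amount of NFS traffic pushed through before the log is checked; one wrong
"good" sends the bisection down the wrong half.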

Hopefully there won't be too many breakages during the 3.13 merge
window (between 3.12 and 3.13-rc1), but I don't have much faith in
that; people seem to have a habit of holding back fixes until -rc1,
which makes _exactly_ this kind of bug much harder for people like
yourselves to track down - or maybe even impossible.

I'm afraid I can't offer very much help beyond this until either I can
reproduce it, or someone manages to identify a particular change which
caused this.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-26 15:14       ` Russell King - ARM Linux
@ 2014-06-27 11:21         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 91+ messages in thread
From: Russell King - ARM Linux @ 2014-06-27 11:21 UTC (permalink / raw)
  To: Mattis Lorentzon; +Cc: Fredrik Noring, linux-kernel, linux-arm-kernel

On Thu, Jun 26, 2014 at 04:14:24PM +0100, Russell King - ARM Linux wrote:
> On Thu, Jun 26, 2014 at 02:44:52PM +0000, Mattis Lorentzon wrote:
> > We have managed to trigger the Oops by just transferring a large file
> > over nfs
> > cat /mnt/foo > /dev/null
> > where foo is a file that is approximately 2 GB. There may be some
> > packet losses on this network, perhaps this differs from your workload?
> 
> That's a similar workload to the one which is mentioned in the previous
> report.  I've just set a similar transfer going, but this will be a 16GB
> file.

I've run this transfer several times, but so far I've been unable to
reproduce the issue here.
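
For anyone else trying to reproduce this, the read can be repeated and
checksummed so that silent data corruption would show up as well as an
oops. A minimal Python sketch follows; the function names, chunk size and
pass count are illustrative assumptions, not anything used in the tests
described in this thread.

```python
import hashlib


def read_and_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Read the whole file sequentially and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def repeated_read(path: str, passes: int = 3) -> set:
    """Read the file several times and collect the distinct digests seen.

    More than one digest in the result means two passes returned
    different data.
    """
    return {read_and_hash(path) for _ in range(passes)}
```

Calling repeated_read("/mnt/foo") mirrors the reported "cat /mnt/foo >
/dev/null" workload. If the file fits in memory, later passes may be served
from the page cache rather than over NFS, so a file larger than RAM is the
more useful test.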

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-27 11:21         ` Russell King - ARM Linux
@ 2014-06-27 16:16           ` Fredrik Noring
  -1 siblings, 0 replies; 91+ messages in thread
From: Fredrik Noring @ 2014-06-27 16:16 UTC (permalink / raw)
  To: Russell King - ARM Linux, Mattis Lorentzon; +Cc: linux-kernel, linux-arm-kernel

Hi Russell,

> On Thu, Jun 26, 2014 at 04:14:24PM +0100, Russell King - ARM Linux wrote:
> > That's a similar workload to the one which is mentioned in the
> > previous report.  I've just set a similar transfer going, but this
> > will be a 16GB file.
> 
> I've run this transfer several times, but so far I've unable to reproduce the
> issue here.

Many thanks for testing this. We attempted to bisect, but unfortunately the
result was not conclusive. One reason might be that the config had to be
updated during the process, and so we did not end up with the exact same
configuration (e.g. IMX_SDMA in DMA_ENGINE). Some runs
deadlocked without any visible Oops or printout. Some versions did not have
an entirely working console configuration.

Please find below a trace that appeared once with 3.16-rc2. Perhaps it is of
some interest?

(We also had memtester run for days on the i.MX6 hardware, without issues.)

All the best,
Fredrik

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x270/0x27c()
NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc2 #19
Backtrace: 
[<80012390>] (dump_backtrace) from [<8001266c>] (show_stack+0x18/0x1c)
 r6:00000108 r5:00000000 r4:8064e29c r3:00000000
[<80012654>] (show_stack) from [<8049791c>] (dump_stack+0x8c/0x9c)
[<80497890>] (dump_stack) from [<80024f4c>] (warn_slowpath_common+0x74/0x90)
 r5:00000009 r4:80631d70
[<80024ed8>] (warn_slowpath_common) from [<80024fa0>] (warn_slowpath_fmt+0x38/0x40)
 r8:806320c0 r7:9d85a254 r6:9d879000 r5:9d85a000 r4:00000000
[<80024f6c>] (warn_slowpath_fmt) from [<803b8ff0>] (dev_watchdog+0x270/0x27c)
 r3:9d85a000 r2:805c4790
[<803b8d80>] (dev_watchdog) from [<8002f280>] (call_timer_fn+0x6c/0xe4)
 r10:80630008 r9:9d85a000 r8:803b8d80 r7:00000100 r6:80630000 r5:00000001
 r4:80631dd8
[<8002f214>] (call_timer_fn) from [<8002fec8>] (run_timer_softirq+0x1d4/0x254)
 r10:803b8d80 r9:806320c0 r8:9d85a000 r7:00000000 r6:80631e28 r5:80667040
 r4:9d85a284
[<8002fcf4>] (run_timer_softirq) from [<8002945c>] (__do_softirq+0x17c/0x30c)
 r10:00000001 r9:80632080 r8:40000001 r7:80630000 r6:00000100 r5:80632084
 r4:00000020
[<800292e0>] (__do_softirq) from [<80029920>] (irq_exit+0xd0/0x114)
 r10:80630000 r9:80665f19 r8:00000001 r7:f4000100 r6:00000000 r5:80630008
 r4:80630000
[<80029850>] (irq_exit) from [<8000f348>] (handle_IRQ+0x4c/0x98)
 r5:0000001d r4:8062ce44
[<8000f2fc>] (handle_IRQ) from [<80008614>] (gic_handle_irq+0x34/0x64)
 r6:80631f20 r5:80638a40 r4:f400010c r3:000000a0
[<800085e0>] (gic_handle_irq) from [<800131c4>] (__irq_svc+0x44/0x58)
Exception stack(0x80631f20 to 0x80631f68)
1f20: 00000001 00000001 00000000 8063b6f0 8063852c 806384d8 80665f19 804a0040
1f40: 00000001 80665f19 80630000 80631f74 00000000 80631f68 800614b8 8000f6a8
1f60: 200f0013 ffffffff
 r7:80631f54 r6:ffffffff r5:200f0013 r4:8000f6a8
[<8000f67c>] (arch_cpu_idle) from [<8005cbf8>] (cpu_startup_entry+0x10c/0x164)
[<8005caec>] (cpu_startup_entry) from [<80492b68>] (rest_init+0xc8/0xd8)
 r7:80625028 r3:00000000
[<80492aa0>] (rest_init) from [<805f6c5c>] (start_kernel+0x39c/0x3a8)
 r5:00000001 r4:806385d0
[<805f68c0>] (start_kernel) from [<10008074>] (0x10008074)
---[ end trace a7b7109ab2d04e11 ]---
***************************************************************
Consider the environment before printing this message.

To read Autoliv's Information and Confidentiality Notice, follow this link:
http://www.autoliv.com/disclaimer.html
***************************************************************


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-27 16:16           ` Fredrik Noring
@ 2014-06-27 16:31             ` Russell King - ARM Linux
  -1 siblings, 0 replies; 91+ messages in thread
From: Russell King - ARM Linux @ 2014-06-27 16:31 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Mattis Lorentzon, linux-kernel, linux-arm-kernel

Hi Fredrik,

On Fri, Jun 27, 2014 at 04:16:57PM +0000, Fredrik Noring wrote:
> Please find below a trace that appeared once with 3.16-rc2. Perhaps it is of
> some interest?

It's not that serious... I know that the FEC ethernet driver is
horrendously racy (I have had a patch set for about the last six months
which fixes some of its problems) but as I've had a lot of patches to
deal with, and it's been pushed to the back of the queue...

The races don't lead to data corruption though, merely timeouts and
some lost packets.

Now because things have changed during the last merge window, I've got
an even bigger problem sorting through that patch set and getting it
back into a submittable state.  I've just sent out v2 for it onto the
netdev@vger.kernel.org mailing list.

The initial version (marked RFC) attracted very little interest from
testers, or acks.  I'd very much like to have some testing of it, so
if you want to try it out, I can provide you with a git URL, patches
or a combined patch.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-27 16:31             ` Russell King - ARM Linux
@ 2014-06-30  6:22               ` Fredrik Noring
  -1 siblings, 0 replies; 91+ messages in thread
From: Fredrik Noring @ 2014-06-30  6:22 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: Mattis Lorentzon, linux-kernel, linux-arm-kernel

Hi Russell,

> -----Original Message-----
> It's not that serious... I know that the FEC ethernet driver is horrendously
> racy (I have had a patch set for about the last six months which fixes some of
> its problems) but as I've had a lot of patches to deal with, and it's been
> pushed to the back of the queue...
> 
> The races don't lead to data corruption though, merely timeouts and some
> lost packets.

The serial port (uart1) and Ethernet are essentially the only things we use.
No disks, no graphics, no USB, etc. If not the Ethernet driver, what else is
likely to crash NFS so badly?

Also, we are happy to change our config if that would simplify things:

http://lkml.iu.edu/hypermail/linux/kernel/1406.3/01488/config.gz

> Now because things have changed during the last merge window, I've got an
> even bigger problem sorting through that patch set and getting it back into a
> submittable state.  I've just sent out v2 for it onto the
> netdev@vger.kernel.org mailing list.
> 
> The initial version (marked RFC) attracted very little interest from testers, or
> acks.  I'd very much like to have some testing of it, so if you want to try it
> out, I can provide you with a git URL, patches or a combined patch.

Sure! A combined gzip patch attachment is fine. Git over HTTP probably works
too.

All the best,
Fredrik


^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-27 16:31             ` Russell King - ARM Linux
@ 2014-06-30 12:30               ` Fredrik Noring
  -1 siblings, 0 replies; 91+ messages in thread
From: Fredrik Noring @ 2014-06-30 12:30 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: Mattis Lorentzon, linux-kernel, linux-arm-kernel

Hi Russell,

It seems to be a compiler issue, where (GCC) 4.8.2 does not produce a properly
working kernel. Happily, (Fedora 2013.11.24-2.fc19) 4.8.1 appears to do a lot
better. No crashes so far with v3.16-rc2!

All the best,
Fredrik

> -----Original Message-----
> Hi Fredrik,
> 
> On Fri, Jun 27, 2014 at 04:16:57PM +0000, Fredrik Noring wrote:
> > Please find below a trace that appeared once with 3.16-rc2. Perhaps it
> > is of some interest?
> 
> It's not that serious... I know that the FEC ethernet driver is horrendously
> racy (I have had a patch set for about the last six months which fixes some of
> its problems) but as I've had a lot of patches to deal with, and it's been
> pushed to the back of the queue...
> 
> The races don't lead to data corruption though, merely timeouts and some
> lost packets.
> 
> Now because things have changed during the last merge window, I've got an
> even bigger problem sorting through that patch set and getting it back into a
> submittable state.  I've just sent out v2 for it onto the
> netdev@vger.kernel.org mailing list.
> 
> The initial version (marked RFC) attracted very little interest from testers, or
> acks.  I'd very much like to have some testing of it, so if you want to try it
> out, I can provide you with a git URL, patches or a combined patch.
> 
> --
> FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
> improving, and getting towards what was expected from it.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-30 12:30               ` Fredrik Noring
@ 2014-06-30 13:00                 ` Nathan Lynch
  -1 siblings, 0 replies; 91+ messages in thread
From: Nathan Lynch @ 2014-06-30 13:00 UTC (permalink / raw)
  To: linux-arm-kernel, Russell King - ARM Linux; +Cc: Mattis.Lorentzon, linux-kernel

On 06/30/2014 07:30 AM, Fredrik Noring wrote:
>>
>> On Fri, Jun 27, 2014 at 04:16:57PM +0000, Fredrik Noring wrote:
>>> Please find below a trace that appeared once with 3.16-rc2. Perhaps it
>>> is of some interest?
>>
>> It's not that serious... I know that the FEC ethernet driver is horrendously
>> racy (I have had a patch set for about the last six months which fixes some of
>> its problems) but as I've had a lot of patches to deal with, and it's been
>> pushed to the back of the queue...
>>
>> The races don't lead to data corruption though, merely timeouts and some
>> lost packets.

> It seems to be a compiler issue, where (GCC) 4.8.2 does not produce a
> properly working kernel. Happily, (Fedora 2013.11.24-2.fc19) 4.8.1 appears
> to do a lot better. No crashes so far with v3.16-rc2!
>

Did you narrow it down to a particular GCC bug?  The symptoms you
reported remind me of:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854

Sadly, unpatched GCC 4.8.1 and 4.8.2 are unsuitable for building ARM
kernels.
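
A build script could refuse these releases up front. A minimal Python
sketch follows, assuming the version string comes from something like
"gcc -dumpversion". The helper names are hypothetical and the list of
affected releases is taken only from the statement above. A
distribution-patched 4.8.1 or 4.8.2 may well be fine, so the version number
alone is not conclusive.

```python
import re

# Releases named in this thread as unsuitable when unpatched (assumption:
# no other releases are checked here).
SUSPECT_GCC_VERSIONS = {(4, 8, 1), (4, 8, 2)}


def parse_gcc_version(text: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '4.8.2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_suspect_gcc(text: str) -> bool:
    """True if the version is one reported here as miscompiling ARM kernels."""
    return parse_gcc_version(text) in SUSPECT_GCC_VERSIONS


if __name__ == "__main__":
    for version in ("4.8.1", "4.8.2", "4.9.0"):
        print(version, "suspect" if is_suspect_gcc(version) else "not listed")
```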


^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-27 16:31             ` Russell King - ARM Linux
@ 2014-07-02  6:02               ` Fredrik Noring
  -1 siblings, 0 replies; 91+ messages in thread
From: Fredrik Noring @ 2014-07-02  6:02 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: Mattis Lorentzon, linux-kernel, linux-arm-kernel

Hi Russell,

> -----Original Message-----
> > The initial version (marked RFC) attracted very little interest from
> > testers, or acks.  I'd very much like to have some testing of it, so
> > if you want to try it out, I can provide you with a git URL, patches
> > or a combined patch.
> 
> Sure! A combined gzip patch attachment is fine. Git over HTTP probably
> works too.

We are still interested in trying out your patches to improve network
performance. We can do some testing this week and in August.

Best regards,
Fredrik


^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-06-27 16:31             ` Russell King - ARM Linux
@ 2014-08-05 13:31               ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-05 13:31 UTC (permalink / raw)
  To: Russell King - ARM Linux, Fredrik Noring; +Cc: linux-kernel, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 1169 bytes --]

Hi Russell!

> Now because things have changed during the last merge window, I've got an
> even bigger problem sorting through that patch set and getting it back into a
> submittable state.  I've just sent out v2 for it onto the
> netdev@vger.kernel.org mailing list.
>
> The initial version (marked RFC) attracted very little interest from testers, or
> acks.  I'd very much like to have some testing of it, so if you want to try it out,
> I can provide you with a git URL, patches or a combined patch.

We have applied your V2 patch set of 30 patches on top of v3.16-rc2 and are
currently running some stability tests.

During our first test round we triggered a timeout which caused the fec driver
to become unresponsive for several minutes. The attached backtrace was
shown when the hardware was rebooted.

Best regards,
Mattis Lorentzon

[-- Attachment #2: fec-transmit-queue-timed-out.txt --]
[-- Type: text/plain, Size: 22405 bytes --]

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x270/0x27c()
NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc2+ #7
Backtrace: 
[<8001234c>] (dump_backtrace) from [<80012628>] (show_stack+0x18/0x1c)
 r6:00000108 r5:00000000 r4:806ac3dc r3:00000000
[<80012610>] (show_stack) from [<804d2c60>] (dump_stack+0x8c/0x9c)
[<804d2bd4>] (dump_stack) from [<80025c18>] (warn_slowpath_common+0x74/0x90)
 r5:00000009 r4:8068fd70
[<80025ba4>] (warn_slowpath_common) from [<80025c6c>] (warn_slowpath_fmt+0x38/0x40)
 r8:806900c0 r7:9e160254 r6:9f4ec800 r5:9e160000 r4:00000000
[<80025c38>] (warn_slowpath_fmt) from [<803f4578>] (dev_watchdog+0x270/0x27c)
 r3:9e160000 r2:8061ad58
[<803f4308>] (dev_watchdog) from [<8002fee8>] (call_timer_fn+0x74/0xec)
 r10:8068e008 r9:9e160000 r8:803f4308 r7:00000100 r6:8068e000 r5:00000001
 r4:8068fdd8
[<8002fe74>] (call_timer_fn) from [<80030b2c>] (run_timer_softirq+0x1d4/0x254)
 r10:803f4308 r9:806900c0 r8:9e160000 r7:00000000 r6:8068fe28 r5:806cc140
 r4:9e160284
[<80030958>] (run_timer_softirq) from [<8002a124>] (__do_softirq+0x168/0x2f0)
 r10:00000001 r9:80690080 r8:40000001 r7:8068e000 r6:00000100 r5:80690084
 r4:00000020
[<80029fbc>] (__do_softirq) from [<8002a5d0>] (irq_exit+0xc8/0x10c)
 r10:8068e000 r9:806cafd9 r8:00000001 r7:f4000100 r6:00000000 r5:0000001d
 r4:8068e008
[<8002a508>] (irq_exit) from [<8000f304>] (handle_IRQ+0x4c/0x98)
 r5:0000001d r4:8068ae14
[<8000f2b8>] (handle_IRQ) from [<8000860c>] (gic_handle_irq+0x34/0x64)
 r6:8068ff20 r5:80696a40 r4:f400010c r3:000000a0
[<800085d8>] (gic_handle_irq) from [<80013184>] (__irq_svc+0x44/0x58)
Exception stack(0x8068ff20 to 0x8068ff68)
ff20: 00000001 00000001 00000000 806996f0 8069652c 806964d8 806cafd9 804db068
ff40: 00000001 806cafd9 8068e000 8068ff74 00000000 8068ff68 80061a1c 8000f664
ff60: 200f0013 ffffffff
 r7:8068ff54 r6:ffffffff r5:200f0013 r4:8000f664
[<8000f638>] (arch_cpu_idle) from [<8005d154>] (cpu_startup_entry+0x10c/0x164)
[<8005d048>] (cpu_startup_entry) from [<804cdefc>] (rest_init+0xc8/0xd8)
 r7:80683450 r3:00000000
[<804cde34>] (rest_init) from [<80651c68>] (start_kernel+0x3a0/0x3ac)
 r5:00000001 r4:806965d0
[<806518c8>] (start_kernel) from [<10008074>] (0x10008074)
---[ end trace b51f6196c5e036f0 ]---
fec 2188000.ethernet eth0: TX ring dump
Nr     SC     addr       len  SKB
  0    0x1c00 0x00000000   66   (null)
  1    0x1c00 0x00000000   66   (null)
  2    0x1c00 0x00000000   66   (null)
  3    0x1c00 0x00000000   66   (null)
  4    0x1c00 0x00000000   66   (null)
  5    0x1c00 0x00000000   66   (null)
  6    0x1c00 0x00000000   66   (null)
  7    0x1c00 0x00000000   66   (null)
  8    0x1c00 0x00000000   66   (null)
  9    0x1c00 0x00000000   66   (null)
 10    0x1c00 0x00000000   66   (null)
 11    0x1c00 0x00000000   66   (null)
 12    0x1c00 0x00000000   66   (null)
 13    0x1c00 0x00000000   66   (null)
 14    0x1c00 0x00000000   66   (null)
 15    0x1c00 0x00000000   66   (null)
 16    0x1c00 0x00000000   66   (null)
 17    0x1c00 0x00000000   66   (null)
 18    0x1c00 0x00000000   66   (null)
 19    0x1c00 0x00000000   66   (null)
 20    0x1c00 0x00000000   66   (null)
 21    0x1c00 0x00000000   66   (null)
 22    0x1c00 0x00000000   66   (null)
 23    0x1c00 0x00000000   66   (null)
 24    0x1c00 0x00000000   66   (null)
 25    0x1c00 0x00000000   66   (null)
 26    0x1c00 0x00000000   66   (null)
 27    0x1c00 0x00000000   66   (null)
 28    0x1c00 0x00000000   66   (null)
 29    0x1c00 0x00000000   66   (null)
 30    0x1c00 0x00000000   66   (null)
 31    0x1c00 0x00000000   66   (null)
 32    0x1c00 0x00000000   66   (null)
 33    0x1c00 0x00000000   66   (null)
 34    0x1c00 0x00000000   66   (null)
 35    0x1c00 0x00000000   66   (null)
 36    0x1c00 0x00000000   66   (null)
 37    0x1c00 0x00000000   66   (null)
 38    0x1c00 0x00000000   66   (null)
 39    0x1c00 0x00000000   66   (null)
 40    0x1c00 0x00000000   66   (null)
 41    0x1c00 0x00000000   66   (null)
 42    0x1c00 0x00000000   66   (null)
 43    0x1c00 0x00000000   66   (null)
 44    0x1c00 0x00000000   66   (null)
 45    0x1c00 0x00000000   66   (null)
 46    0x1c00 0x00000000   66   (null)
 47    0x1c00 0x00000000   66   (null)
 48    0x1c00 0x00000000   66   (null)
 49    0x1c00 0x00000000   66   (null)
 50    0x1c00 0x00000000   66   (null)
 51    0x1c00 0x00000000   66   (null)
 52    0x1c00 0x00000000   66   (null)
 53    0x1c00 0x00000000   66   (null)
 54    0x1c00 0x00000000   66   (null)
 55    0x1c00 0x00000000   66   (null)
 56    0x1c00 0x00000000   66   (null)
 57    0x1c00 0x00000000   66   (null)
 58    0x1c00 0x00000000   66   (null)
 59    0x1c00 0x00000000   66   (null)
 60    0x1c00 0x00000000   66   (null)
 61    0x1c00 0x00000000   66   (null)
 62    0x1c00 0x00000000   66   (null)
 63    0x1c00 0x00000000   66   (null)
 64    0x1c00 0x00000000   66   (null)
 65    0x1c00 0x00000000   66   (null)
 66    0x1c00 0x00000000   66   (null)
 67    0x1c00 0x00000000   66   (null)
 68    0x1c00 0x00000000   66   (null)
 69    0x1c00 0x00000000   66   (null)
 70    0x1c00 0x00000000   66   (null)
 71    0x1c00 0x00000000   66   (null)
 72    0x1c00 0x00000000   66   (null)
 73    0x1c00 0x00000000   66   (null)
 74    0x1c00 0x00000000   66   (null)
 75    0x1c00 0x00000000   66   (null)
 76    0x1c00 0x00000000   66   (null)
 77    0x1c00 0x00000000   66   (null)
 78    0x1c00 0x00000000   66   (null)
 79    0x1c00 0x00000000   66   (null)
 80    0x1c00 0x00000000   66   (null)
 81    0x1c00 0x00000000   66   (null)
 82    0x1c00 0x00000000   66   (null)
 83    0x1c00 0x00000000   66   (null)
 84  H 0x1c00 0x00000000   66   (null)
 85    0x9c00 0x2e205000   66 9e384f00
 86    0x1c00 0x2e204800   66 9e384d80
 87    0x1c00 0x2e204000   66 9e384180
 88    0x1c00 0x2e203800   66 9e384cc0
 89    0x1c00 0x2e203000   66 9e384c00
 90    0x1c00 0x2e202800   66 9e384000
 91    0x1c00 0x2e202000   66 9e3840c0
 92    0x1c00 0x2e201800   66 9e384240
 93    0x1c00 0x2e201000   66 9e384b40
 94    0x1c00 0x2e200800   66 9e3843c0
 95    0x1c00 0x2e200000   66 9e384480
 96    0x1c00 0x2e1a7800   66 9e384540
 97    0x1c00 0x2e1a7000   66 9e384600
 98    0x1c00 0x2e1a6800   66 9e3846c0
 99    0x1c00 0x2e238000   66 9e384780
100    0x1c00 0x2e238800   66 9e384840
101    0x1c00 0x2e239000   66 9e384900
102    0x1c00 0x2e239800   66 9e384a80
103    0x1c00 0x2e23a000   66 9e3849c0
104    0x1c00 0x2e23a800   66 9e375e40
105    0x1c00 0x2e23b000   66 9e375d80
106    0x1c00 0x2e23b800   66 9e375b40
107    0x1c00 0x2e23c000   66 9e3759c0
108    0x1c00 0x2e23c800   66 9e375000
109    0x1c00 0x2e23d000   66 9e3750c0
110    0x1c00 0x2e23d800   66 9e375180
111    0x1c00 0x2e23e000   66 9e375300
112    0x1c00 0x2e23e800   66 9e3753c0
113    0x1c00 0x2e23f000   66 9e375540
114    0x1c00 0x2e23f800   66 9e375600
115    0x1c00 0x11c10000   66 9e3756c0
116    0x1c00 0x11c10800   66 9e375780
117    0x1c00 0x11c11000   66 9e375c00
118    0x1c00 0x11c11800   66 9e375f00
119    0x1c00 0x11c12000   66 9e375840
120    0x1c00 0x11c12800   66 9e375cc0
121    0x1c00 0x11c13000   66 9e375240
122    0x1c00 0x11c13800   66 9e375a80
123    0x1c00 0x11c14000   66 9e375480
124    0x1c00 0x11c14800   66 9e384e40
125    0x1c00 0x11c15000   66 9e377d80
126    0x1c00 0x11c15800   66 9e377300
127    0x1c00 0x11c16000   66 9e3770c0
128    0x1c00 0x11c16800   66 9e377000
129    0x1c00 0x11c17000   66 9e377180
130    0x1c00 0x11c17800   66 9e377240
131    0x1c00 0x11c18000   66 9e3773c0
132    0x1c00 0x11c18800   66 9e377480
133    0x1c00 0x11c19000   66 9e377540
134    0x1c00 0x11c19800   66 9e377600
135    0x1c00 0x11c1a000   66 9e3776c0
136    0x1c00 0x11c1a800   66 9e377780
137    0x1c00 0x11c1b000   66 9e377840
138    0x1c00 0x11c1b800   66 9e377900
139    0x1c00 0x11c1c000   66 9e3779c0
140    0x1c00 0x11c1c800   66 9e377c00
141    0x1c00 0x11c1d000   66 9e377a80
142    0x1c00 0x11c1d800   66 9e377b40
143    0x1c00 0x11c1e000   66 9e375900
144    0x1c00 0x11c1e800   66 9e398000
145    0x1c00 0x11c1f000   66 9e398180
146    0x1c00 0x11c1f800   66 9e398240
147    0x1c00 0x11c20000   66 9e398300
148    0x1c00 0x11c20800   66 9e3983c0
149    0x1c00 0x11c21000   66 9e398480
150    0x1c00 0x11c21800   66 9e398540
151    0x1c00 0x11c22000   66 9e398600
152    0x1c00 0x11c22800   66 9e3986c0
153    0x1c00 0x11c23000   66 9e398780
154    0x1c00 0x11c23800   66 9e398840
155    0x1c00 0x11c24000   66 9e398900
156    0x1c00 0x11c24800   66 9e3989c0
157    0x1c00 0x11c25000   66 9e398a80
158    0x1c00 0x11c25800   66 9e398b40
159    0x1c00 0x11c26000   66 9e398c00
160    0x1c00 0x11c26800   66 9e398cc0
161    0x1c00 0x11c27000   66 9e398d80
162    0x1c00 0x11c27800   66 9e398e40
163    0x1c00 0x11c28000   66 9e398f00
164    0x1c00 0x11c28800   66 9e377e40
165    0x1c00 0x11c29000   66 9e155000
166    0x1c00 0x11c29800   66 9e155180
167    0x1c00 0x11c2a000   66 9e155240
168    0x1c00 0x11c2a800   66 9e155300
169    0x1c00 0x11c2b000   66 9e1553c0
170    0x1c00 0x11c2b800   66 9e155480
171    0x1c00 0x11c2c000   66 9e155540
172    0x1c00 0x11c2c800   66 9e155600
173    0x1c00 0x11c2d000   66 9e1556c0
174    0x1c00 0x11c2d800   66 9e155780
175    0x1c00 0x11c2e000   66 9e155840
176    0x1c00 0x11c2e800   66 9e155900
177    0x1c00 0x11c2f000   66 9e1559c0
178    0x1c00 0x11c2f800   66 9e155a80
179    0x1c00 0x11c30000   66 9e155b40
180    0x1c00 0x11c30800   66 9e155c00
181    0x1c00 0x11c31000   66 9e155cc0
182    0x1c00 0x11c31800   66 9e155d80
183    0x1c00 0x11c32000   66 9e155e40
184    0x1c00 0x11c32800   66 9e155f00
185    0x1c00 0x11c33000   66 9e3980c0
186    0x1c00 0x11c33800   66 9e12a000
187    0x1c00 0x11c34000   66 9e12a180
188    0x1c00 0x11c34800   66 9e12a240
189    0x1c00 0x11c35000   66 9e12a300
190    0x1c00 0x11c35800   66 9e12a3c0
191    0x1c00 0x11c36000   66 9e12a480
192    0x1c00 0x11c36800   66 9e12a540
193    0x1c00 0x11c37000   66 9e12a600
194    0x1c00 0x11c37800   66 9e12a6c0
195    0x1c00 0x11c38000   66 9e12a780
196    0x1c00 0x11c38800   66 9e12a840
197    0x1c00 0x11c39000   66 9e12a900
198    0x1c00 0x11c39800   66 9e12a9c0
199    0x1c00 0x11c3a000   66 9e12aa80
200    0x1c00 0x11c3a800   66 9e12ab40
201    0x1c00 0x11c3b000   66 9e12ac00
202    0x1c00 0x11c3b800   66 9e12acc0
203    0x1c00 0x11c3c000   66 9e12ad80
204    0x1c00 0x11c3c800   66 9e12ae40
205    0x1c00 0x11c3d000   66 9e12af00
206    0x1c00 0x11c3d800   66 9e1550c0
207    0x1c00 0x11c3e000   66 81c87000
208    0x1c00 0x11c3e800   66 81c87180
209    0x1c00 0x11c3f000   66 81c87240
210    0x1c00 0x11c3f800   66 81c87300
211    0x1c00 0x2e280000   66 81c873c0
212    0x1c00 0x2e280800   66 81c87480
213    0x1c00 0x2e281000   66 81c87540
214    0x1c00 0x2e281800   66 81c87600
215    0x1c00 0x2e282000   66 81c876c0
216    0x1c00 0x2e282800   66 81c87780
217    0x1c00 0x2e283000   66 81c87840
218    0x1c00 0x2e283800   66 81c87900
219    0x1c00 0x2e284000   66 81c879c0
220    0x1c00 0x2e284800   66 81c87a80
221    0x1c00 0x2e285000   66 81c87b40
222    0x1c00 0x2e285800   66 81c87c00
223    0x1c00 0x2e286000   66 81c87cc0
224    0x1c00 0x2e286800   66 81c87d80
225    0x1c00 0x2e287000   66 81c87e40
226    0x1c00 0x2e287800   66 81c87f00
227    0x1c00 0x2e288000   66 9e12a0c0
228    0x1c00 0x2e288800   66 9e1e4000
229    0x1c00 0x2e289000   66 9e1e4180
230    0x1c00 0x2e289800   66 9e1e4240
231    0x1c00 0x2e28a000   66 9e1e4300
232    0x1c00 0x2e28a800   66 9e1e43c0
233    0x1c00 0x2e28b000   66 9e1e4480
234    0x1c00 0x2e28b800   66 9e1e4540
235    0x1c00 0x2e28c000   66 9e1e4600
236    0x1c00 0x2e28c800   66 9e1e46c0
237    0x1c00 0x2e28d000   66 9e1e4780
238    0x1c00 0x2e28d800   66 9e1e4840
239    0x1c00 0x2e28e000   66 9e1e4900
240    0x1c00 0x2e28e800   66 9e1e49c0
241    0x1c00 0x2e28f000   66 9e1e4a80
242    0x1c00 0x2e28f800   66 9e1e4b40
243    0x1c00 0x2e290000   66 9e1e4c00
244    0x1c00 0x2e290800   66 9e1e4cc0
245    0x1c00 0x2e291000   66 9e1e4d80
246    0x1c00 0x2e291800   66 9e1e4e40
247    0x1c00 0x2e292000   66 9e1e4f00
248    0x1c00 0x2e292800   66 81c870c0
249    0x1c00 0x2e293000   66 81c86000
250    0x1c00 0x2e293800   66 81c86180
251    0x1c00 0x2e294000   66 81c86240
252    0x1c00 0x2e294800   66 81c86300
253    0x1c00 0x2e295000   66 81c863c0
254    0x1c00 0x2e295800   66 81c86480
255    0x1c00 0x2e296000   66 81c86540
256    0x1c00 0x2e296800   66 81c86600
257    0x1c00 0x2e297000   66 81c866c0
258    0x1c00 0x2e297800   66 81c86780
259    0x1c00 0x2e298000   66 81c86840
260    0x1c00 0x2e298800   66 81c86900
261    0x1c00 0x2e299000   66 81c869c0
262    0x1c00 0x2e299800   66 81c86a80
263    0x1c00 0x2e29a000   66 81c86b40
264    0x1c00 0x2e29a800   66 81c86c00
265    0x1c00 0x2e29b000   66 81c86cc0
266    0x1c00 0x2e29b800   66 81c86d80
267    0x1c00 0x2e29c000   66 81c86e40
268    0x1c00 0x2e29c800   66 81c86f00
269    0x1c00 0x2e29d000   66 9e1e40c0
270    0x1c00 0x2e29d800   66 81c8a000
271    0x1c00 0x2e29e000   66 81c8a180
272    0x1c00 0x2e29e800   66 81c8a240
273    0x1c00 0x2e29f000   66 81c8a300
274    0x1c00 0x2e29f800   66 81c8a3c0
275    0x1c00 0x2e2a0000   66 81c8a480
276    0x1c00 0x2e2a0800   66 81c8a540
277    0x1c00 0x2e2a1000   66 81c8a600
278    0x1c00 0x2e2a1800   66 81c8a6c0
279    0x1c00 0x2e2a2000   66 81c8a780
280    0x1c00 0x2e2a2800   66 81c8a840
281    0x1c00 0x2e2a3000   66 81c8a900
282    0x1c00 0x2e2a3800   66 81c8a9c0
283    0x1c00 0x2e2a4000   66 81c8aa80
284    0x1c00 0x2e2a4800   66 81c8ab40
285    0x1c00 0x2e2a5000   66 81c8ac00
286    0x1c00 0x2e2a5800   66 81c8acc0
287    0x1c00 0x2e2a6000   66 81c8ad80
288    0x1c00 0x2e2a6800   66 81c8ae40
289    0x1c00 0x2e2a7000   66 81c8af00
290    0x1c00 0x2e2a7800   66 81c860c0
291    0x1c00 0x2e2a8000   66 81c82000
292    0x1c00 0x2e2a8800   66 81c82180
293    0x1c00 0x2e2a9000   66 81c82240
294    0x1c00 0x2e2a9800   66 81c82300
295    0x1c00 0x2e2aa000   66 81c823c0
296    0x1c00 0x2e2aa800   66 81c82480
297    0x1c00 0x2e2ab000   66 81c82540
298    0x1c00 0x2e2ab800   66 81c82600
299    0x1c00 0x2e2ac000   66 81c826c0
300    0x1c00 0x2e2ac800   66 81c82780
301    0x1c00 0x2e2ad000   66 81c82840
302    0x1c00 0x2e2ad800   66 81c82900
303    0x1c00 0x2e2ae000   66 81c829c0
304    0x1c00 0x2e2ae800   66 81c82a80
305    0x1c00 0x2e2af000   66 81c82b40
306    0x1c00 0x2e2af800   66 81c82c00
307    0x1c00 0x2e2b0000   66 81c82cc0
308    0x1c00 0x2e2b0800   66 81c82d80
309    0x1c00 0x2e2b1000   66 81c82e40
310    0x1c00 0x2e2b1800   66 81c82f00
311    0x1c00 0x2e2b2000   66 81c8a0c0
312    0x1c00 0x2e2b2800   66 81c83000
313    0x1c00 0x2e2b3000   66 81c83180
314    0x1c00 0x2e2b3800   66 81c83240
315    0x1c00 0x2e2b4000   66 81c83300
316    0x1c00 0x2e2b4800   66 81c833c0
317    0x1c00 0x2e2b5000   66 81c83480
318    0x1c00 0x2e2b5800   66 81c83540
319    0x1c00 0x2e2b6000   66 81c83600
320    0x1c00 0x2e2b6800   66 81c836c0
321    0x1c00 0x2e2b7000   66 81c83780
322    0x1c00 0x2e2b7800   66 81c83840
323    0x1c00 0x2e2b8000   66 81c83900
324    0x1c00 0x2e2b8800   66 81c839c0
325    0x1c00 0x2e2b9000   66 81c83a80
326    0x1c00 0x2e2b9800   66 81c83b40
327    0x1c00 0x2e2ba000   66 81c83c00
328    0x1c00 0x2e2ba800   66 81c83cc0
329    0x1c00 0x2e2bb000   66 81c83d80
330    0x1c00 0x2e2bb800   66 81c83e40
331    0x1c00 0x2e2bc000   66 81c83f00
332    0x1c00 0x2e2bc800   66 81c820c0
333    0x1c00 0x2e2bd000   66 81cf4000
334    0x1c00 0x2e2bd800   66 81cf4180
335    0x1c00 0x2e2be000   66 81cf4240
336    0x1c00 0x2e2be800   66 81cf4300
337    0x1c00 0x2e2bf000   66 81cf43c0
338    0x1c00 0x2e2bf800   66 81cf4480
339    0x1c00 0x2e240000   66 81cf4540
340    0x1c00 0x2e240800   66 81cf4600
341    0x1c00 0x2e241000   66 81cf46c0
342    0x1c00 0x2e241800   66 81cf4780
343    0x1c00 0x2e242000   66 81cf4840
344    0x1c00 0x2e242800   66 81cf4900
345    0x1c00 0x2e243000   66 81cf49c0
346    0x1c00 0x2e243800   66 81cf4a80
347    0x1c00 0x2e244000   66 81cf4b40
348    0x1c00 0x2e244800   66 81cf4c00
349    0x1c00 0x2e245000   66 81cf4cc0
350    0x1c00 0x2e245800   66 81cf4d80
351    0x1c00 0x2e246000   66 81cf4e40
352    0x1c00 0x2e246800   66 81cf4f00
353    0x1c00 0x2e247000   66 81c830c0
354    0x1c00 0x2e247800   66 81cf5000
355    0x1c00 0x2e248000   66 81cf5180
356    0x1c00 0x2e248800   66 81cf5240
357    0x1c00 0x2e249000   66 81cf5300
358    0x1c00 0x2e249800   66 81cf53c0
359    0x1c00 0x2e24a000   66 81cf5480
360    0x1c00 0x2e24a800   66 81cf5540
361    0x1c00 0x2e24b000   66 81cf5600
362    0x1c00 0x2e24b800   66 81cf56c0
363    0x1c00 0x2e24c000   66 81cf5780
364    0x1c00 0x2e24c800   66 81cf5840
365    0x1c00 0x2e24d000   66 81cf5900
366    0x1c00 0x2e24d800   66 81cf59c0
367    0x1c00 0x2e24e000   66 81cf5a80
368    0x1c00 0x2e24e800   66 81cf5b40
369    0x1c00 0x2e24f000   66 81cf5c00
370    0x1c00 0x2e24f800   66 81cf5cc0
371    0x1c00 0x2e250000   66 81cf5d80
372    0x1c00 0x2e250800   66 81cf5e40
373    0x1c00 0x2e251000   66 81cf5f00
374    0x1c00 0x2e251800   66 81cf40c0
375    0x1c00 0x2e252000   66 81cf6000
376    0x1c00 0x2e252800   66 81cf6180
377    0x1c00 0x2e253000   66 81cf6240
378 S  0x1c00 0x00000000   66   (null)
379    0x1c00 0x00000000   66   (null)
380    0x1c00 0x00000000   66   (null)
381    0x1c00 0x00000000   66   (null)
382    0x1c00 0x00000000   66   (null)
383    0x1c00 0x00000000   66   (null)
384    0x1c00 0x00000000   66   (null)
385    0x1c00 0x00000000   66   (null)
386    0x1c00 0x00000000   66   (null)
387    0x1c00 0x00000000   66   (null)
388    0x1c00 0x00000000   66   (null)
389    0x1c00 0x00000000   66   (null)
390    0x1c00 0x00000000   66   (null)
391    0x1c00 0x00000000   66   (null)
392    0x1c00 0x00000000   66   (null)
393    0x1c00 0x00000000   66   (null)
394    0x1c00 0x00000000   66   (null)
395    0x1c00 0x00000000   66   (null)
396    0x1c00 0x00000000   66   (null)
397    0x1c00 0x00000000   66   (null)
398    0x1c00 0x00000000   66   (null)
399    0x1c00 0x00000000   66   (null)
400    0x1c00 0x00000000   66   (null)
401    0x1c00 0x00000000   66   (null)
402    0x1c00 0x00000000   66   (null)
403    0x1c00 0x00000000   66   (null)
404    0x1c00 0x00000000   66   (null)
405    0x1c00 0x00000000   66   (null)
406    0x1c00 0x00000000   66   (null)
407    0x1c00 0x00000000   66   (null)
408    0x1c00 0x00000000   66   (null)
409    0x1c00 0x00000000   66   (null)
410    0x1c00 0x00000000   66   (null)
411    0x1c00 0x00000000   66   (null)
412    0x1c00 0x00000000   66   (null)
413    0x1c00 0x00000000   66   (null)
414    0x1c00 0x00000000   66   (null)
415    0x1c00 0x00000000   66   (null)
416    0x1c00 0x00000000   66   (null)
417    0x1c00 0x00000000   66   (null)
418    0x1c00 0x00000000   66   (null)
419    0x1c00 0x00000000   66   (null)
420    0x1c00 0x00000000   66   (null)
421    0x1c00 0x00000000   66   (null)
422    0x1c00 0x00000000   66   (null)
423    0x1c00 0x00000000   66   (null)
424    0x1c00 0x00000000   66   (null)
425    0x1c00 0x00000000   66   (null)
426    0x1c00 0x00000000   66   (null)
427    0x1c00 0x00000000   66   (null)
428    0x1c00 0x00000000   66   (null)
429    0x1c00 0x00000000   66   (null)
430    0x1c00 0x00000000   66   (null)
431    0x1c00 0x00000000   66   (null)
432    0x1c00 0x00000000   66   (null)
433    0x1c00 0x00000000   66   (null)
434    0x1c00 0x00000000   66   (null)
435    0x1c00 0x00000000   66   (null)
436    0x1c00 0x00000000   66   (null)
437    0x1c00 0x00000000   66   (null)
438    0x1c00 0x00000000   66   (null)
439    0x1c00 0x00000000   66   (null)
440    0x1c00 0x00000000   66   (null)
441    0x1c00 0x00000000   66   (null)
442    0x1c00 0x00000000   66   (null)
443    0x1c00 0x00000000   66   (null)
444    0x1c00 0x00000000   66   (null)
445    0x1c00 0x00000000   66   (null)
446    0x1c00 0x00000000   66   (null)
447    0x1c00 0x00000000   66   (null)
448    0x1c00 0x00000000   66   (null)
449    0x1c00 0x00000000   66   (null)
450    0x1c00 0x00000000   66   (null)
451    0x1c00 0x00000000   66   (null)
452    0x1c00 0x00000000   66   (null)
453    0x1c00 0x00000000   66   (null)
454    0x1c00 0x00000000   66   (null)
455    0x1c00 0x00000000   66   (null)
456    0x1c00 0x00000000   66   (null)
457    0x1c00 0x00000000   66   (null)
458    0x1c00 0x00000000   66   (null)
459    0x1c00 0x00000000   66   (null)
460    0x1c00 0x00000000   66   (null)
461    0x1c00 0x00000000   66   (null)
462    0x1c00 0x00000000   66   (null)
463    0x1c00 0x00000000   66   (null)
464    0x1c00 0x00000000   66   (null)
465    0x1c00 0x00000000   66   (null)
466    0x1c00 0x00000000   66   (null)
467    0x1c00 0x00000000   66   (null)
468    0x1c00 0x00000000   66   (null)
469    0x1c00 0x00000000   66   (null)
470    0x1c00 0x00000000   66   (null)
471    0x1c00 0x00000000  202   (null)
472    0x1c00 0x00000000   66   (null)
473    0x1c00 0x00000000   66   (null)
474    0x1c00 0x00000000   66   (null)
475    0x1c00 0x00000000   66   (null)
476    0x1c00 0x00000000   66   (null)
477    0x1c00 0x00000000   66   (null)
478    0x1c00 0x00000000   66   (null)
479    0x1c00 0x00000000   66   (null)
480    0x1c00 0x00000000   66   (null)
481    0x1c00 0x00000000   66   (null)
482    0x1c00 0x00000000   66   (null)
483    0x1c00 0x00000000   66   (null)
484    0x1c00 0x00000000   66   (null)
485    0x1c00 0x00000000   66   (null)
486    0x1c00 0x00000000   66   (null)
487    0x1c00 0x00000000   66   (null)
488    0x1c00 0x00000000   66   (null)
489    0x1c00 0x00000000   66   (null)
490    0x1c00 0x00000000   66   (null)
491    0x1c00 0x00000000   66   (null)
492    0x1c00 0x00000000   66   (null)
493    0x1c00 0x00000000   66   (null)
494    0x1c00 0x00000000   66   (null)
495    0x1c00 0x00000000   66   (null)
496    0x1c00 0x00000000   66   (null)
497    0x1c00 0x00000000   66   (null)
498    0x1c00 0x00000000   66   (null)
499    0x1c00 0x00000000   66   (null)
500    0x1c00 0x00000000   66   (null)
501    0x1c00 0x00000000   66   (null)
502    0x1c00 0x00000000   66   (null)
503    0x1c00 0x00000000   66   (null)
504    0x1c00 0x00000000   66   (null)
505    0x1c00 0x00000000   66   (null)
506    0x1c00 0x00000000   66   (null)
507    0x1c00 0x00000000   66   (null)
508    0x1c00 0x00000000   66   (null)
509    0x1c00 0x00000000   66   (null)
510    0x1c00 0x00000000   66   (null)
511    0x3c00 0x00000000   66   (null)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Oops: 17 SMP ARM (v3.16-rc2)
@ 2014-08-05 13:31               ` Mattis Lorentzon
  0 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-05 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell!

> Now because things have changed during the last merge window, I've got an
> even bigger problem sorting through that patch set and getting it back into a
> submittable state.  I've just sent out v2 for it onto the
> netdev@vger.kernel.org mailing list.
> 
> The initial version (marked RFC) attracted very little interest from testers, or
> acks.  I'd very much like to have some testing of it, so if you want to try it out,
> I can provide you with a git URL, patches or a combined patch.

We have applied your V2 patch set of 30 patches on top of v3.16-rc2 and are
currently running some stability tests.

During our first test round we triggered a timeout which caused the fec driver
to become unresponsive for several minutes. The attached backtrace was
shown when the hardware was rebooted.

Best regards,
Mattis Lorentzon

***************************************************************
Consider the environment before printing this message.

To read Autoliv's Information and Confidentiality Notice, follow this link:
http://www.autoliv.com/disclaimer.html
***************************************************************
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fec-transmit-queue-timed-out.txt
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140805/048707db/attachment-0001.txt>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-05 13:31               ` Mattis Lorentzon
@ 2014-08-05 13:53                 ` Fabio Estevam
  -1 siblings, 0 replies; 91+ messages in thread
From: Fabio Estevam @ 2014-08-05 13:53 UTC (permalink / raw)
  To: Mattis Lorentzon
  Cc: Russell King - ARM Linux, Fredrik Noring, linux-kernel, linux-arm-kernel

On Tue, Aug 5, 2014 at 10:31 AM, Mattis Lorentzon
<Mattis.Lorentzon@autoliv.com> wrote:

> We have applied your V2 patch set of 30 patches on top of v3.16-rc2 and are
> currently running some stability tests.
>
> During our first test round we triggered a timeout which caused the fec driver
> to become unresponsive for several minutes. The attached backtrace was
> shown when the hardware was rebooted.

Could this problem be the same one as reported at:
http://www.spinics.net/lists/arm-kernel/msg347914.html ?

Which Ethernet PHY do you use? Do you have pull-up in the MDIO line?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-05 13:53                 ` Fabio Estevam
@ 2014-08-06  6:48                   ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-06  6:48 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Russell King - ARM Linux, Fredrik Noring, linux-kernel, linux-arm-kernel

[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 996 bytes --]

Hi Fabio,

> Could this problem be the same one as reported at:
> http://www.spinics.net/lists/arm-kernel/msg347914.html ?

The problem you link to describes a permanent issue; our problem seems
to be sporadic, as most of our tests work fine (at least for a while).

> Which Ethernet PHY do you use? Do you have pull-up in the MDIO line?

Our hardware has the KSZ9021RN PHY, so the MDIO line should be pulled up.

Do you know if there are debug options that could help us determine the
cause of the timeout?

Best regards,
Mattis Lorentzon


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-05 13:31               ` Mattis Lorentzon
@ 2014-08-06  9:50                 ` Russell King - ARM Linux
  -1 siblings, 0 replies; 91+ messages in thread
From: Russell King - ARM Linux @ 2014-08-06  9:50 UTC (permalink / raw)
  To: Mattis Lorentzon; +Cc: Fredrik Noring, linux-kernel, linux-arm-kernel

On Tue, Aug 05, 2014 at 01:31:29PM +0000, Mattis Lorentzon wrote:
> We have applied your V2 patch set of 30 patches on top of v3.16-rc2 and are
> currently running some stability tests.
> 
> During our first test round we triggered a timeout which caused the fec driver
> to become unresponsive for several minutes. The attached backtrace was
> shown when the hardware was rebooted.

What is on the other end of the link?

> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x270/0x27c()
> NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
...
> fec 2188000.ethernet eth0: TX ring dump
> Nr     SC     addr       len  SKB
>   0    0x1c00 0x00000000   66   (null)
...
>  83    0x1c00 0x00000000   66   (null)
>  84  H 0x1c00 0x00000000   66   (null)
>  85    0x9c00 0x2e205000   66 9e384f00
>  86    0x1c00 0x2e204800   66 9e384d80
>  87    0x1c00 0x2e204000   66 9e384180
...
> 376    0x1c00 0x2e252800   66 81cf6180
> 377    0x1c00 0x2e253000   66 81cf6240
> 378 S  0x1c00 0x00000000   66   (null)

So, the software would insert the next packet into slot 378.  However,
the slots from 85 to 377 have not been reaped, despite those in 86 to
377 allegedly having been sent.  This is because the entry in slot 85
shows that it has yet to be sent.
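
As a rough illustration of the reasoning above, the following Python sketch
parses a dump in this format and reports which slots still hold an skb and
which are still marked ready. It assumes that bit 15 (0x8000) of the SC column
is the driver's "ready" flag (BD_ENET_TX_READY in fec.h), that S marks the slot
software will fill next and H presumably the reap position, and the function
names are made up for the example.

```python
# Assumption: bit 15 of the SC word is the "ready" flag, i.e. the
# descriptor has been handed to the hardware and not yet completed.
TX_READY = 0x8000

SAMPLE = """\
fec 2188000.ethernet eth0: TX ring dump
Nr     SC     addr       len  SKB
  0    0x1c00 0x00000000   66   (null)
...
 83    0x1c00 0x00000000   66   (null)
 84  H 0x1c00 0x00000000   66   (null)
 85    0x9c00 0x2e205000   66 9e384f00
 86    0x1c00 0x2e204800   66 9e384d80
 87    0x1c00 0x2e204000   66 9e384180
...
376    0x1c00 0x2e252800   66 81cf6180
377    0x1c00 0x2e253000   66 81cf6240
378 S  0x1c00 0x00000000   66   (null)
"""


def parse_ring(text):
    """Parse ring dump rows; tolerate mail quoting ("> ") and elisions."""
    rows = []
    for line in text.splitlines():
        tokens = line.lstrip("> ").split()
        # A row is "Nr [marker] SC addr len SKB": 5 or 6 fields, Nr numeric.
        if len(tokens) not in (5, 6) or not tokens[0].isdigit():
            continue
        mark = tokens[1] if len(tokens) == 6 else ""
        sc, addr, length, skb = tokens[-4:]
        rows.append({
            "nr": int(tokens[0]),
            "mark": mark,
            "sc": int(sc, 16),
            "addr": int(addr, 16),
            "len": int(length),
            "skb": None if skb == "(null)" else skb,
        })
    return rows


def summarize(rows):
    """Collect the marker positions and the slots that look stuck."""
    return {
        # Assumed meaning of the markers: H = reap position, S = next to fill.
        "head": next((r["nr"] for r in rows if "H" in r["mark"]), None),
        "next": next((r["nr"] for r in rows if "S" in r["mark"]), None),
        # Slots whose skb has not been freed yet, i.e. not reaped.
        "unreaped": [r["nr"] for r in rows if r["skb"] is not None],
        # Slots the hardware has apparently not completed.
        "still_ready": [r["nr"] for r in rows if r["sc"] & TX_READY],
    }


if __name__ == "__main__":
    print(summarize(parse_ring(SAMPLE)))
```

On the excerpt quoted above, slot 85 is the only descriptor still marked
ready, while every listed slot from 85 to 377 still holds an skb.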

I've no idea what causes this; it looks like there's something screwed
with the hardware which causes the transmitter to skip an entry in the
ring under certain circumstances.  As I've never been able to reproduce
it here, I've not been able to investigate it.

What I would like to do is to stamp each packet in some way with an
identifier marking its ring position, and then monitor the network to
find out whether the packet at slot 85 was actually transmitted - that's
made slightly harder because packets may be dropped at the receiver
when operating in promisc mode.  This would then allow us to work out
some likely causes.
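
The capture-side half of that idea could look something like the Python
sketch below. It is purely hypothetical: it assumes each frame has somehow
been stamped with its ring slot number, that the capture yields those numbers
in arrival order, and that the ring has 512 entries as in the attached dump.
The function name and the stamping scheme are not taken from any patch.

```python
RING_SIZE = 512  # assumption: the attached dump lists slots 0 to 511


def missing_slots(observed, ring_size=RING_SIZE):
    """Return the slot numbers skipped between consecutive observed stamps.

    observed: slot stamps read from captured frames, in capture order.
    A gap may mean the frame was never transmitted, or only that the
    capture dropped it; this check cannot tell the two apart.
    """
    missing = []
    for prev, cur in zip(observed, observed[1:]):
        if cur == prev:
            # Duplicate stamp (e.g. a retransmitted capture record): ignore.
            continue
        expected = (prev + 1) % ring_size
        while expected != cur:
            missing.append(expected)
            expected = (expected + 1) % ring_size
    return missing


if __name__ == "__main__":
    # Slot 85 never shows up between 84 and 86.
    print(missing_slots([83, 84, 86, 87]))
```

Because the stamps wrap at the ring size, a gap across the wrap point
(for example 511 followed by 1) is reported the same way as any other.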

Note that after the transmit watchdog, the interface should recover and
start operating normally again - and that should not take "several
minutes."

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-06  9:50                 ` Russell King - ARM Linux
@ 2014-08-06 11:10                   ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-06 11:10 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: Fredrik Noring, linux-kernel, linux-arm-kernel

Russell,

> What is on the other end of the link?

16 ARM cards connected to a 3Com Switch 4400 connected to a Linux FC 20
machine (Intel Corporation 82541PI Gigabit Ethernet Controller rev 05).

There may be multiple problems. The backtrace has only been seen a few
times, on two different cards. Most of the time, the network for a random
card just stalls without any visible backtrace or error messages. The other
cards seem to be unaffected when this happens.

> What I would like to do is to stamp each packet in some way with an
> identifier marking its ring position, and then monitor the network to find out
> whether the packet at slot 85 was actually transmitted - that's made slightly
> harder because packets may be dropped at the receiver when operating in
> promisc mode.  This would then allow us to work out some likely causes.

We would be glad to run this test on our setup. Do you have more detailed
information on how to set it up?

> Note that after the transmit watchdog, the interface should recover and start
> operating normally again - and that should not take "several minutes."

After a network stall, we usually have to power-cycle the ARM hardware to
get it back to a usable state. These stalls last at least several minutes,
perhaps indefinitely. The interface does not seem to recover properly, and
the card is no longer reachable via the network.

Best regards,
Mattis Lorentzon

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-06 11:10                   ` Mattis Lorentzon
@ 2014-08-06 12:55                     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 91+ messages in thread
From: Russell King - ARM Linux @ 2014-08-06 12:55 UTC (permalink / raw)
  To: Mattis Lorentzon; +Cc: Fredrik Noring, linux-kernel, linux-arm-kernel

On Wed, Aug 06, 2014 at 11:10:06AM +0000, Mattis Lorentzon wrote:
> Russell,
> 
> > What is on the other end of the link?
> 
> 16 ARM cards connected to a 3Com Switch 4400 connected to a Linux FC 20
> machine (Intel Corporation 82541PI Gigabit Ethernet Controller rev 05).
> 
> There may be multiple problems. The backtrace has only been seen a few
> times, on two different cards. Most of the time, the network for a random
> card just stalls without any visible backtrace or error messages. The other
> cards seem to be unaffected when this happens.

Can you ascertain whether these stalls are a result of some failure of the
receive side or the transmit side - you should be able to tell that if you
watch the packet counts via ifconfig on the stalled card.  Also, it would
be useful to know whether the FEC interrupt was firing.
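
A minimal sketch of the counter check described above, assuming the standard
sysfs statistics files under /sys/class/net/<ifname>/statistics/. The helper
names (read_counter, classify) and the enum are hypothetical and exist only
for illustration. Sample the counters twice while traffic is being offered to
the card, then compare:

```c
#include <stdio.h>

enum stall_side { STALL_NONE, STALL_RX, STALL_TX, STALL_BOTH };

/* Read one counter (e.g. "rx_packets") from sysfs; returns -1 on error. */
static long long read_counter(const char *ifname, const char *counter)
{
    char path[256];
    long long value = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/sys/class/net/%s/statistics/%s",
             ifname, counter);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%lld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Compare two samples taken some seconds apart: a counter that did not
 * move while traffic was being offered points at that direction. */
static enum stall_side classify(long long rx0, long long rx1,
                                long long tx0, long long tx1)
{
    int rx_stuck = (rx1 == rx0);
    int tx_stuck = (tx1 == tx0);

    if (rx_stuck && tx_stuck)
        return STALL_BOTH;
    if (rx_stuck)
        return STALL_RX;
    if (tx_stuck)
        return STALL_TX;
    return STALL_NONE;
}
```

For example, read_counter("eth0", "rx_packets") and
read_counter("eth0", "tx_packets") could be sampled before and after a
ten-second sleep, with the four values passed to classify(). The interface
name eth0 is an assumption.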

I hope you have some kind of serial console on these cards?

> > What I would like to do is to stamp each packet in some way with an
> > identifier marking its ring position, and then monitor the network to find out
> > whether the packet at slot 85 was actually transmitted - that's made slightly
> > harder because packets may be dropped at the receiver when operating in
> > promisc mode.  This would then allow us to work out some likely causes.
> 
> We would be glad to run this test on our setup, do you have more detailed
> information on how to set it up?

One of the problems is to find some way to stamp each packet with a 10-bit
number without having any side effects.  I guess one possibility would be
to overwrite the source MAC address on transmit, which hopefully should not
cause any side effects.
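
A minimal sketch of that stamping idea, assuming a plain Ethernet II frame
whose source MAC sits at byte offset 6. The function names and the choice of
a locally administered 02:00:00:00:xx:xx address are illustrative
assumptions, not code from the FEC driver:

```c
#include <assert.h>
#include <stdint.h>

#define ETH_SRC_OFF 6      /* source MAC follows the 6-byte destination MAC */
#define RING_SLOTS  1024u  /* a 10-bit slot number covers 0..1023 */

/* Overwrite the source MAC with a locally administered unicast address
 * whose last two bytes carry the transmit ring slot number. */
static void stamp_slot(uint8_t *frame, unsigned int slot)
{
    uint8_t *src = frame + ETH_SRC_OFF;

    assert(slot < RING_SLOTS);
    src[0] = 0x02;                    /* locally administered, unicast */
    src[1] = 0x00;
    src[2] = 0x00;
    src[3] = 0x00;
    src[4] = (uint8_t)(slot >> 8);    /* upper 2 bits of the slot */
    src[5] = (uint8_t)(slot & 0xff);  /* lower 8 bits of the slot */
}

/* Recover the slot number from a frame captured on the wire. */
static unsigned int read_slot(const uint8_t *frame)
{
    const uint8_t *src = frame + ETH_SRC_OFF;

    return ((unsigned int)src[4] << 8) | src[5];
}
```

A capture on the monitoring machine could then be searched for the address
that corresponds to slot 85. One caveat with this particular encoding: a
switch in the path will learn each stamped address as a separate station.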

> After a network stall, we usually have to powercycle the ARM hardware to
> get it back to a usable state. These stalls last at least several minutes,
> perhaps indefinitely. It does not seem to recover properly, and is no longer
> reachable via the network.

Hmm.  Okay, I think the first thing we need to do is to work out why
the silent stalls are happening.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-06 12:55                     ` Russell King - ARM Linux
@ 2014-08-07 11:11                       ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-07 11:11 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: Fredrik Noring, linux-kernel, linux-arm-kernel

Russell,

> Can you ascertain whether these stalls are a result of some failure of the
> receive side or the transmit side - you should be able to tell that if you watch
> the packet counts via ifconfig on the stalled card.  Also, it would be useful to
> know whether the FEC interrupt was firing.

grep eth /proc/interrupts
151:          0          0          0          0       GIC 151  2188000.ethernet
166:    1205661          0          0          0  gpio-mxc   6  2188000.ethernet

The interrupt counter 166 increases regularly during the stalls.
Ifconfig indicates that the RX and TX counters do not increase.

> I hope you have some kind of serial console on these cards?

Yes, indeed. Local stimuli seem to be able to unstall the network in a
somewhat random fashion. Running e.g. ifconfig or ping locally may
immediately or after up to about half a minute make the network responsive.
However, it usually degenerates again to a complete stall within seconds.
Without local stimuli the network does not appear to recover at all. The card
does not even respond to pings (again, most often without any apparent
error messages).

Running both of the following commands in parallel from the FC server seems
to trigger the problem within minutes (please note that the arm card stops
responding to both ping and ssh):

# while :; do ssh arm-card echo Ok; done
# ping arm-card

We have noticed the same problem on both the i.MX6 and the Zynq cards
(using KSZ9021 and Cadence GEM drivers). However, the number of
iterations required to trigger the problem varies. Sometimes it might stall after
fewer than 100, but in other cases the stalls begin after nearly 10000 iterations.
Once stalled (and unstalled after stimuli), the network on that particular card
degenerates a lot more often. Apart from the kernel, IP numbers and MAC
addresses, the software configurations are identical between the Zynq and
the i.MX6. Perhaps the fault is unrelated to the Freescale driver?

> Hmm.  Okay, I think the first thing we need to do is to work out why the
> silent stalls are happening.

Would you have any ideas on what to check next?

Best regards,
Mattis Lorentzon


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-07 11:11                       ` Mattis Lorentzon
@ 2014-08-07 12:12                         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 91+ messages in thread
From: Russell King - ARM Linux @ 2014-08-07 12:12 UTC (permalink / raw)
  To: Mattis Lorentzon; +Cc: Fredrik Noring, linux-kernel, linux-arm-kernel

On Thu, Aug 07, 2014 at 11:11:06AM +0000, Mattis Lorentzon wrote:
> Russell,
> 
> > Can you ascertain whether these stalls are a result of some failure of the
> > receive side or the transmit side - you should be able to tell that if you watch
> > the packet counts via ifconfig on the stalled card.  Also, it would be useful to
> > know whether the FEC interrupt was firing.
> 
> grep eth /proc/interrupts
> 151:          0          0          0          0       GIC 151  2188000.ethernet
> 166:    1205661          0          0          0  gpio-mxc   6  2188000.ethernet
> 
> The interrupt counter 166 increases regularly during the stalls.
> Ifconfig indicates that the RX and TX  counters do not increase.

Hmm, I'm slightly confused.  On my iMX6Q, I have:

150:     581754          0          0          0       GIC 150  2188000.ethernet
151:          0          0          0          0       GIC 151  2188000.ethernet

In the DT file, we have:

                        fec: ethernet@02188000 {
                                compatible = "fsl,imx6q-fec";
                                reg = <0x02188000 0x4000>;
                                interrupts-extended =
                                        <&intc 0 118 IRQ_TYPE_LEVEL_HIGH>,
                                        <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>;
                                clocks = <&clks 117>, <&clks 117>, <&clks 190>;
                                clock-names = "ipg", "ahb", "ptp";
                                status = "disabled";
                        };

which, for the gic, would be 118 + 32 (first SPI) = 150, 119 + 32 = 151.
Yet you seem to have nothing registered against GIC 150, instead having
an interrupt against GPIO 6.
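
The arithmetic above can be written out as a small sketch. The macro and
function names are made up for illustration. The offset of 32 comes from the
GIC numbering, where IDs 0-15 are SGIs, 16-31 are PPIs, and shared peripheral
interrupts (SPIs) start at 32:

```c
#define GIC_SPI_BASE 32u  /* first SPI interrupt ID on the GIC */

/* Convert the SPI number used in a DT "<&intc 0 N ...>" specifier into
 * the GIC interrupt ID shown in the "GIC nnn" column of /proc/interrupts. */
static unsigned int gic_spi_to_id(unsigned int spi)
{
    return spi + GIC_SPI_BASE;
}
```

With the FEC node above, gic_spi_to_id(118) gives 150 and gic_spi_to_id(119)
gives 151.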

This seems very odd, and as this is an on-SoC device, I don't see why
you would want to bind the interrupts for the FEC device any differently
to standard platforms.

This could well be the cause of your stalls.

What's GPIO 6 used for on your board?

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-07 12:12                         ` Russell King - ARM Linux
@ 2014-08-07 14:20                           ` Fabio Estevam
  -1 siblings, 0 replies; 91+ messages in thread
From: Fabio Estevam @ 2014-08-07 14:20 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Mattis Lorentzon, Fredrik Noring, linux-kernel, linux-arm-kernel

On Thu, Aug 7, 2014 at 9:12 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:

> Hmm, I'm slightly confused.  On my iMX6Q, I have:
>
> 150:     581754          0          0          0       GIC 150  2188000.ethernet
> 151:          0          0          0          0       GIC 151  2188000.ethernet

Same here on a mx6qsabresd.

> In the DT file, we have:
>
>                         fec: ethernet@02188000 {
>                                 compatible = "fsl,imx6q-fec";
>                                 reg = <0x02188000 0x4000>;
>                                 interrupts-extended =
>                                         <&intc 0 118 IRQ_TYPE_LEVEL_HIGH>,
>                                         <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>;
>                                 clocks = <&clks 117>, <&clks 117>, <&clks 190>;
>                                 clock-names = "ipg", "ahb", "ptp";
>                                 status = "disabled";
>                         };
>
> which, for the gic, would be 118 + 32 (first SPI) = 150, 119 + 32 = 151.
> Yet you seem to have nothing registered against GIC 150, instead having
> an interrupt against GPIO 6.
>
> This seems very odd, and as this is an on-SoC device, I don't see why
> you would want to bind the interrupts for the FEC device any differently
> to standard platforms.
>
> This could well be the cause of your stalls.
>
> What's GPIO 6 used for on your board?

On an imx6q sabreauto I also get:

151:          0          0          0          0       GIC 151  2188000.ethernet
166:       4577          0          0          0  gpio-mxc   6  2188000.ethernet

and the GPIO1_6 interrupt comes from this commit:

commit bc20a5d6da718f9d60da0a78f70c653c1cd16af3
Author: Troy Kisky <troy.kisky@boundarydevices.com>
Date:   Fri Dec 20 11:47:12 2013 -0700

    ARM: dts: imx6qdl-sabreauto: use GPIO_6 for FEC interrupt.

    This works around a hardware bug.

    Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
    Signed-off-by: Shawn Guo <shawn.guo@linaro.org>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-07 14:20                           ` Fabio Estevam
@ 2014-08-07 14:38                             ` Fabio Estevam
  -1 siblings, 0 replies; 91+ messages in thread
From: Fabio Estevam @ 2014-08-07 14:38 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Mattis Lorentzon, Fredrik Noring, linux-kernel, linux-arm-kernel,
	Troy Kisky

On Thu, Aug 7, 2014 at 11:20 AM, Fabio Estevam <festevam@gmail.com> wrote:

> On a imx6q sabreauto I also get:
>
> 151:          0          0          0          0       GIC 151  2188000.ethernet
> 166:       4577          0          0          0  gpio-mxc   6  2188000.ethernet
>
> and the GPIO1_6 interrupt comes from this commit:
>
> commit bc20a5d6da718f9d60da0a78f70c653c1cd16af3
> Author: Troy Kisky <troy.kisky@boundarydevices.com>
> Date:   Fri Dec 20 11:47:12 2013 -0700
>
>     ARM: dts: imx6qdl-sabreauto: use GPIO_6 for FEC interrupt.
>
>     This works around a hardware bug.
>
>     Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
>     Signed-off-by: Shawn Guo <shawn.guo@linaro.org>

Actually a more descriptive commit log can be found here:

commit 6261c4c8f13eb91f733e8ba6d67c409a2e841667
Author: Troy Kisky <troy.kisky@boundarydevices.com>
Date:   Fri Dec 20 11:47:11 2013 -0700

    ARM: dts: imx6qdl-sabrelite: use GPIO_6 for FEC interrupt.

    This works around a hardware bug.
    From "Chip Errata for the i.MX 6Dual/6Quad"

    ERR006687 ENET: Only the ENET wake-up interrupt request can wake the
    system from Wait mode.

    The ENET block generates many interrupts. Only one of these interrupt lines
    is connected to the General Power Controller (GPC) block, but a logical OR
    of all of the ENET interrupts is connected to the General Interrupt
    Controller (GIC). When the system enters Wait mode, a normal RX Done or TX
    Done does not wake up the system because the GPC cannot see this interrupt.
    This impacts performance of the ENET block because its interrupts are
    serviced only when the chip exits Wait mode due to an interrupt from some
    other wake-up source.

    Before this patch, ping times of a Sabre Lite board are quite
    random:
    ping 192.168.0.13 -i.5 -c5
    PING 192.168.0.13 (192.168.0.13) 56(84) bytes of data.
    64 bytes from 192.168.0.13: icmp_req=1 ttl=64 time=15.7 ms
    64 bytes from 192.168.0.13: icmp_req=2 ttl=64 time=14.4 ms
    64 bytes from 192.168.0.13: icmp_req=3 ttl=64 time=13.4 ms
    64 bytes from 192.168.0.13: icmp_req=4 ttl=64 time=12.4 ms
    64 bytes from 192.168.0.13: icmp_req=5 ttl=64 time=11.4 ms

    === 192.168.0.13 ping statistics ===
    5 packets transmitted, 5 received, 0% packet loss, time 2004ms
    rtt min/avg/max/mdev = 11.431/13.501/15.746/1.508 ms
    ____________________________________________________
    After this patch:

    ping 192.168.0.13 -i.5 -c5
    PING 192.168.0.13 (192.168.0.13) 56(84) bytes of data.
    64 bytes from 192.168.0.13: icmp_req=1 ttl=64 time=0.120 ms
    64 bytes from 192.168.0.13: icmp_req=2 ttl=64 time=0.175 ms
    64 bytes from 192.168.0.13: icmp_req=3 ttl=64 time=0.169 ms
    64 bytes from 192.168.0.13: icmp_req=4 ttl=64 time=0.168 ms
    64 bytes from 192.168.0.13: icmp_req=5 ttl=64 time=0.172 ms

    === 192.168.0.13 ping statistics ===
    5 packets transmitted, 5 received, 0% packet loss, time 1999ms
    rtt min/avg/max/mdev = 0.120/0.160/0.175/0.026 ms
    ____________________________________________________

    Also, apply same change to imx6qdl-nitrogen6x.

    This change may not be appropriate for all boards.
    Sabre Lite uses GPIO6 as a power down output for a ov5642
    camera. As this expansion board does not yet work with mainline,
    this is not yet a conflict. It would be nice to have an alternative
    fix for boards where this is a problem.

    For example Sabre SD uses GPIO6 for I2C3_SDA. It also
    has long ping times currently. But cannot use this fix
    without giving up a touchscreen.

    Its ping times are also random.

    ping 192.168.0.19 -i.5 -c5
    PING 192.168.0.19 (192.168.0.19) 56(84) bytes of data.
    64 bytes from 192.168.0.19: icmp_req=1 ttl=64 time=16.0 ms
    64 bytes from 192.168.0.19: icmp_req=2 ttl=64 time=15.4 ms
    64 bytes from 192.168.0.19: icmp_req=3 ttl=64 time=14.4 ms
    64 bytes from 192.168.0.19: icmp_req=4 ttl=64 time=13.4 ms
    64 bytes from 192.168.0.19: icmp_req=5 ttl=64 time=12.4 ms

    === 192.168.0.19 ping statistics ---
    5 packets transmitted, 5 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 12.451/14.369/16.057/1.316 ms

    Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
    CC: Ranjani Vaidyanathan <ra5478@freescale.com>
    Signed-off-by: Shawn Guo <shawn.guo@linaro.org>

But I am wondering if we should also do:

--- a/arch/arm/boot/dts/imx6qdl-sabreauto.dtsi
+++ b/arch/arm/boot/dts/imx6qdl-sabreauto.dtsi
@@ -66,6 +66,7 @@
        pinctrl-0 = <&pinctrl_enet>;
        phy-mode = "rgmii";
        interrupts-extended = <&gpio1 6 IRQ_TYPE_LEVEL_HIGH>,
+                             <&intc 0 118 IRQ_TYPE_LEVEL_HIGH>,
                              <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>;
        status = "okay";
 };
@@ -226,7 +227,7 @@
                                MX6QDL_PAD_RGMII_RD2__RGMII_RD2         0x1b0b0
                                MX6QDL_PAD_RGMII_RD3__RGMII_RD3         0x1b0b0
                                MX6QDL_PAD_RGMII_RX_CTL__RGMII_RX_CTL   0x1b0b0
-                               MX6QDL_PAD_GPIO_6__ENET_IRQ             0x000b1
+                               MX6QDL_PAD_GPIO_6__ENET_IRQ             0x400000b1

since the workaround for erratum ERR006687 states that the SION bit
needs to be used:

"All of the interrupts can be selected by MUX and output to pad GPIO6.
If GPIO6 is selected to output ENET interrupts and GPIO6 SION is set,
the resulting GPIO interrupt will wake the system from Wait mode."
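
To make the change from 0x000b1 to 0x400000b1 explicit, here is a small
sketch that splits such a pin config cell. It assumes the i.MX pinctrl
device tree convention in which bit 30 of the cell requests SION and bit 31
means "no pad control". The macro and function names are illustrative:

```c
#include <stdint.h>

#define PAD_SION    0x40000000u  /* bit 30: Software Input On */
#define NO_PAD_CTL  0x80000000u  /* bit 31: leave pad control untouched */

/* Non-zero when the DT config cell asks for SION to be set. */
static int pin_has_sion(uint32_t cfg)
{
    return (cfg & PAD_SION) != 0;
}

/* The remaining bits are the pad control register value. */
static uint32_t pin_pad_ctl(uint32_t cfg)
{
    return cfg & ~(PAD_SION | NO_PAD_CTL);
}
```

Under that assumption, 0x400000b1 is the pad control value 0xb1 with SION
requested, whereas the value 0x000b1 in the existing dtsi has SION clear.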

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Oops: 17 SMP ARM (v3.16-rc2)
@ 2014-08-07 14:38                             ` Fabio Estevam
  0 siblings, 0 replies; 91+ messages in thread
From: Fabio Estevam @ 2014-08-07 14:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 7, 2014 at 11:20 AM, Fabio Estevam <festevam@gmail.com> wrote:

> On a imx6q sabreauto I also get:
>
> 151:          0          0          0          0       GIC 151  2188000.ethernet
> 166:       4577          0          0          0  gpio-mxc   6  2188000.ethernet
>
> and the GPIO1_6 interrupt comes from this commit:
>
> commit bc20a5d6da718f9d60da0a78f70c653c1cd16af3
> Author: Troy Kisky <troy.kisky@boundarydevices.com>
> Date:   Fri Dec 20 11:47:12 2013 -0700
>
>     ARM: dts: imx6qdl-sabreauto: use GPIO_6 for FEC interrupt.
>
>     This works around a hardware bug.
>
>     Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
>     Signed-off-by: Shawn Guo <shawn.guo@linaro.org>

Actually a more descriptive commit log can be found here:

commit 6261c4c8f13eb91f733e8ba6d67c409a2e841667
Author: Troy Kisky <troy.kisky@boundarydevices.com>
Date:   Fri Dec 20 11:47:11 2013 -0700

    ARM: dts: imx6qdl-sabrelite: use GPIO_6 for FEC interrupt.

    This works around a hardware bug.
    From "Chip Errata for the i.MX 6Dual/6Quad"

    ERR006687 ENET: Only the ENET wake-up interrupt request can wake the
    system from Wait mode.

    The ENET block generates many interrupts. Only one of these interrupt lines
    is connected to the General Power Controller (GPC) block, but a logical OR
    of all of the ENET interrupts is connected to the General
Interrupt Controller
    (GIC). When the system enters Wait mode, a normal RX Done or TX
Done does not
    wake up the system because the GPC cannot see this interrupt. This impacts
    performance of the ENET block because its interrupts are serviced only when
    the chip exits Wait mode due to an interrupt from some other wake-up source.

    Before this patch, ping times of a Sabre Lite board are quite
    random:
    ping 192.168.0.13 -i.5 -c5
    PING 192.168.0.13 (192.168.0.13) 56(84) bytes of data.
    64 bytes from 192.168.0.13: icmp_req=1 ttl=64 time=15.7 ms
    64 bytes from 192.168.0.13: icmp_req=2 ttl=64 time=14.4 ms
    64 bytes from 192.168.0.13: icmp_req=3 ttl=64 time=13.4 ms
    64 bytes from 192.168.0.13: icmp_req=4 ttl=64 time=12.4 ms
    64 bytes from 192.168.0.13: icmp_req=5 ttl=64 time=11.4 ms

    === 192.168.0.13 ping statistics ===
    5 packets transmitted, 5 received, 0% packet loss, time 2004ms
    rtt min/avg/max/mdev = 11.431/13.501/15.746/1.508 ms
    ____________________________________________________
    After this patch:

    ping 192.168.0.13 -i.5 -c5
    PING 192.168.0.13 (192.168.0.13) 56(84) bytes of data.
    64 bytes from 192.168.0.13: icmp_req=1 ttl=64 time=0.120 ms
    64 bytes from 192.168.0.13: icmp_req=2 ttl=64 time=0.175 ms
    64 bytes from 192.168.0.13: icmp_req=3 ttl=64 time=0.169 ms
    64 bytes from 192.168.0.13: icmp_req=4 ttl=64 time=0.168 ms
    64 bytes from 192.168.0.13: icmp_req=5 ttl=64 time=0.172 ms

    === 192.168.0.13 ping statistics ===
    5 packets transmitted, 5 received, 0% packet loss, time 1999ms
    rtt min/avg/max/mdev = 0.120/0.160/0.175/0.026 ms
    ____________________________________________________

    Also, apply same change to imx6qdl-nitrogen6x.

    This change may not be appropriate for all boards.
    Sabre Lite uses GPIO6 as a power down output for a ov5642
    camera. As this expansion board does not yet work with mainline,
    this is not yet a conflict. It would be nice to have an alternative
    fix for boards where this is a problem.

    For example Sabre SD uses GPIO6 for I2C3_SDA. It also
    has long ping times currently. But cannot use this fix
    without giving up a touchscreen.

    Its ping times are also random.

    ping 192.168.0.19 -i.5 -c5
    PING 192.168.0.19 (192.168.0.19) 56(84) bytes of data.
    64 bytes from 192.168.0.19: icmp_req=1 ttl=64 time=16.0 ms
    64 bytes from 192.168.0.19: icmp_req=2 ttl=64 time=15.4 ms
    64 bytes from 192.168.0.19: icmp_req=3 ttl=64 time=14.4 ms
    64 bytes from 192.168.0.19: icmp_req=4 ttl=64 time=13.4 ms
    64 bytes from 192.168.0.19: icmp_req=5 ttl=64 time=12.4 ms

    === 192.168.0.19 ping statistics ===
    5 packets transmitted, 5 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 12.451/14.369/16.057/1.316 ms

    Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
    CC: Ranjani Vaidyanathan <ra5478@freescale.com>
    Signed-off-by: Shawn Guo <shawn.guo@linaro.org>

,but I am wondering if we should also do:

--- a/arch/arm/boot/dts/imx6qdl-sabreauto.dtsi
+++ b/arch/arm/boot/dts/imx6qdl-sabreauto.dtsi
@@ -66,6 +66,7 @@
        pinctrl-0 = <&pinctrl_enet>;
        phy-mode = "rgmii";
        interrupts-extended = <&gpio1 6 IRQ_TYPE_LEVEL_HIGH>,
+                             <&intc 0 118 IRQ_TYPE_LEVEL_HIGH>,
                              <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>;
        status = "okay";
 };
@@ -226,7 +227,7 @@
                                MX6QDL_PAD_RGMII_RD2__RGMII_RD2         0x1b0b0
                                MX6QDL_PAD_RGMII_RD3__RGMII_RD3         0x1b0b0
                                MX6QDL_PAD_RGMII_RX_CTL__RGMII_RX_CTL   0x1b0b0
-                               MX6QDL_PAD_GPIO_6__ENET_IRQ             0x000b1
+                               MX6QDL_PAD_GPIO_6__ENET_IRQ             0x400000b1

Since the Workaround for erratum ERR006687 states that the SION bit
needs to be used:

"All of the interrupts can be selected by MUX and output to pad GPIO6.
If GPIO6 is selected to output ENET interrupts and GPIO6 SION is set,
the resulting GPIO interrupt will wake the system from Wait mode."

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-07 14:38                             ` Fabio Estevam
@ 2014-08-08  1:30                               ` Troy Kisky
  -1 siblings, 0 replies; 91+ messages in thread
From: Troy Kisky @ 2014-08-08  1:30 UTC (permalink / raw)
  To: Fabio Estevam, Russell King - ARM Linux
  Cc: Mattis Lorentzon, Fredrik Noring, linux-kernel, linux-arm-kernel

On 8/7/2014 7:38 AM, Fabio Estevam wrote:
> On Thu, Aug 7, 2014 at 11:20 AM, Fabio Estevam <festevam@gmail.com> wrote:
> 
> ,but I am wondering if we should also do:
> 
> --- a/arch/arm/boot/dts/imx6qdl-sabreauto.dtsi
> +++ b/arch/arm/boot/dts/imx6qdl-sabreauto.dtsi
> @@ -66,6 +66,7 @@
>         pinctrl-0 = <&pinctrl_enet>;
>         phy-mode = "rgmii";
>         interrupts-extended = <&gpio1 6 IRQ_TYPE_LEVEL_HIGH>,
> +                             <&intc 0 118 IRQ_TYPE_LEVEL_HIGH>,
>                               <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>;
>         status = "okay";
>  };
> @@ -226,7 +227,7 @@
>                                 MX6QDL_PAD_RGMII_RD2__RGMII_RD2         0x1b0b0
>                                 MX6QDL_PAD_RGMII_RD3__RGMII_RD3         0x1b0b0
>                                 MX6QDL_PAD_RGMII_RX_CTL__RGMII_RX_CTL   0x1b0b0
> -                               MX6QDL_PAD_GPIO_6__ENET_IRQ             0x000b1
> +                               MX6QDL_PAD_GPIO_6__ENET_IRQ             0x400000b1
> 
> Since the Workaround for erratum ERR006687 states that the SION bit
> needs to be used:
> 
> "All of the interrupts can be selected by MUX and output to pad GPIO6.
> If GPIO6 is selected to output ENET interrupts and GPIO6 SION is set,
> the resulting GPIO interrupt will wake the system from Wait mode."
> 
arch/arm/boot/dts/imx6q-pinfunc.h:#define MX6QDL_PAD_GPIO_6__ENET_IRQ               0x230 0x600 0x03c 0x11 0xff000609

So, the SION bit should already be set (0x11). But the other way works too.
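
The two encodings being compared can be sketched as follows. This is only an
illustration in Python, assuming the usual i.MX pinctrl conventions: the
fourth field of a pinfunc.h entry is the mux mode value, in which bit 4
(0x10) is SION, and bit 30 (0x40000000) of the device tree pad config cell is
a request to set SION. The helper names are made up for this example and do
not exist in the kernel.

```python
# pinfunc.h entry layout: <mux_reg conf_reg input_reg mux_mode input_val>
MUX_MODE_SION = 0x10        # bit 4 of the mux mode value: SION
PAD_CONFIG_SION = 1 << 30   # bit 30 of the DT pad config cell: request SION


def decode_mux_mode(mux_mode):
    """Split a mux mode value into its ALT function number and SION flag."""
    return {"alt": mux_mode & 0x7, "sion": bool(mux_mode & MUX_MODE_SION)}


def decode_pad_config(config):
    """Split a DT pad config cell into the SION request and the remaining bits."""
    return {
        "sion": bool(config & PAD_CONFIG_SION),
        "pad_ctl": config & ~PAD_CONFIG_SION & 0xFFFFFFFF,
    }


# Mux mode from the MX6QDL_PAD_GPIO_6__ENET_IRQ define quoted above.
print(decode_mux_mode(0x11))
# Old and proposed pad config values from the sabreauto diff.
print(decode_pad_config(0x000b1))
print(decode_pad_config(0x400000b1))
```

Read this way, 0x11 already carries SION in the mux mode, and 0x400000b1 asks
for the same thing through the pad config cell, leaving 0xb1 as the pad
control bits in both cases.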


Troy

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-07 14:20                           ` Fabio Estevam
@ 2014-08-08 14:05                             ` Fabio Estevam
  -1 siblings, 0 replies; 91+ messages in thread
From: Fabio Estevam @ 2014-08-08 14:05 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Mattis Lorentzon, Fredrik Noring, linux-kernel, linux-arm-kernel

Mattis,

On Thu, Aug 7, 2014 at 11:20 AM, Fabio Estevam <festevam@gmail.com> wrote:
> On Thu, Aug 7, 2014 at 9:12 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>
>> Hmm, I'm slightly confused.  On my iMX6Q, I have:
>>
>> 150:     581754          0          0          0       GIC 150  2188000.ethernet
>> 151:          0          0          0          0       GIC 151  2188000.ethernet
>
> Same here on a mx6qsabresd.
>
>> In the DT file, we have:
>>
>>                         fec: ethernet@02188000 {
>>                                 compatible = "fsl,imx6q-fec";
>>                                 reg = <0x02188000 0x4000>;
>>                                 interrupts-extended =
>>                                         <&intc 0 118 IRQ_TYPE_LEVEL_HIGH>,
>>                                         <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>;
>>                                 clocks = <&clks 117>, <&clks 117>, <&clks 190>;
>>                                 clock-names = "ipg", "ahb", "ptp";
>>                                 status = "disabled";
>>                         };
>>
>> which, for the gic, would be 118 + 32 (first SPI) = 150, 119 + 32 = 151.
>> Yet you seem to have nothing registered against GIC 150, instead having
>> an interrupt against GPIO 6.
>>
>> This seems very odd, and as this is an on-SoC device, I don't see why
>> you would want to bind the interrupts for the FEC device any differently
>> to standard platforms.
>>
>> This could well be the cause of your stalls.
>>
>> What's GPIO 6 used for on your board?
>
> On a imx6q sabreauto I also get:
>
> 151:          0          0          0          0       GIC 151  2188000.ethernet
> 166:       4577          0          0          0  gpio-mxc   6  2188000.ethernet

Could you remove 'interrupts-extended'  from the FEC node and also
MX6QDL_PAD_GPIO_6__ENET_IRQ from the pinctrl node and test again?
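
For reference, the GIC numbering in the quoted text (118 + 32 = 150) can be
sketched as below. This is a minimal illustration in Python, assuming the
standard GIC interrupt specifier in which the first cell is 0 for an SPI and 1
for a PPI; the function name is hypothetical.

```python
GIC_PPI_BASE = 16  # IDs 0-15 are SGIs, PPIs start at 16
GIC_SPI_BASE = 32  # SPIs start at 32


def gic_hwirq(dt_type, dt_number):
    """Map the first two cells of a GIC interrupt specifier to a hardware IRQ ID."""
    if dt_type == 0:  # shared peripheral interrupt
        return GIC_SPI_BASE + dt_number
    if dt_type == 1:  # private peripheral interrupt
        return GIC_PPI_BASE + dt_number
    raise ValueError("unknown GIC interrupt type: %r" % (dt_type,))


# The two FEC entries from the DT node quoted above.
for spi in (118, 119):
    print("<&intc 0 %d ...> -> GIC %d" % (spi, gic_hwirq(0, spi)))
```

With the default binding, the FEC therefore shows up as GIC 150 and GIC 151 in
/proc/interrupts, which is what the unmodified boards report.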

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-07 12:12                         ` Russell King - ARM Linux
@ 2014-08-08 18:09                           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 91+ messages in thread
From: Russell King - ARM Linux @ 2014-08-08 18:09 UTC (permalink / raw)
  To: Mattis Lorentzon; +Cc: Fredrik Noring, linux-kernel, linux-arm-kernel

On Thu, Aug 07, 2014 at 01:12:48PM +0100, Russell King - ARM Linux wrote:
> On Thu, Aug 07, 2014 at 11:11:06AM +0000, Mattis Lorentzon wrote:
> > Russell,
> > 
> > > Can you ascertain whether these stalls are a result of some failure of the
> > > receive side or the transmit side - you should be able to tell that if you watch
> > > the packet counts via ifconfig on the stalled card.  Also, it would be useful to
> > > know whether the FEC interrupt was firing.
> > 
> > grep eth /proc/interrupts
> > 151:          0          0          0          0       GIC 151  2188000.ethernet
> > 166:    1205661          0          0          0  gpio-mxc   6  2188000.ethernet
> > 
> > The interrupt counter 166 increases regularly during the stalls.
> > Ifconfig indicates that the RX and TX  counters do not increase.
> 
> Hmm, I'm slightly confused.  On my iMX6Q, I have:
> 
> 150:     581754          0          0          0       GIC 150  2188000.ethernet
> 151:          0          0          0          0       GIC 151  2188000.ethernet
> 
> In the DT file, we have:
> 
>                         fec: ethernet@02188000 {
>                                 compatible = "fsl,imx6q-fec";
>                                 reg = <0x02188000 0x4000>;
>                                 interrupts-extended =
>                                         <&intc 0 118 IRQ_TYPE_LEVEL_HIGH>,
>                                         <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>;
>                                 clocks = <&clks 117>, <&clks 117>, <&clks 190>;
>                                 clock-names = "ipg", "ahb", "ptp";
>                                 status = "disabled";
>                         };
> 
> which, for the gic, would be 118 + 32 (first SPI) = 150, 119 + 32 = 151.
> Yet you seem to have nothing registered against GIC 150, instead having
> an interrupt against GPIO 6.
> 
> This seems very odd, and as this is an on-SoC device, I don't see why
> you would want to bind the interrupts for the FEC device any differently
> to standard platforms.
> 
> This could well be the cause of your stalls.
> 
> What's GPIO 6 used for on your board?

We have a second report of instability with the FEC today, and the
problem board (wandboard) is also using GPIO1 6 for the ethernet IRQ.
We have confirmation from the reporter that reverting the change
(thus making the FEC use the standard interrupt) fixes their problem.

Therefore, it seems that the workaround for ERR006687 is itself buggy.

I'd be interested to hear whether removing the 

	interrupts-extended = ...

property from your board's DT file, thereby causing you to revert back
to the default I list above, also fixes the instability you are seeing.

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-08 18:09                           ` Russell King - ARM Linux
@ 2014-08-11 13:32                             ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-11 13:32 UTC (permalink / raw)
  To: Russell King - ARM Linux, Fabio Estevam (festevam@gmail.com)
  Cc: Fredrik Noring, linux-kernel, linux-arm-kernel

Russell and Fabio,

> I'd be interested to hear whether removing the
> 
> 	interrupts-extended = ...
> 
> property from your board's DT file, thereby causing you to revert back to the
> default I list above, also fixes the instability you are seeing.

We have tried to remove the board specific interrupts-extended field and the
MX6QDL_PAD_GPIO_6__ENET_IRQ entry. Sadly this did not seem to improve
the stalls. Our interrupts look like this now:

150:      15519          0          0          0       GIC 150  2188000.ethernet
151:          0          0          0          0       GIC 151  2188000.ethernet

Our device tree might still be slightly incorrect. We have noticed that our
RGMII_INT is connected to GPIO 19 (P5) which might be nonstandard (we are
a bit surprised that this works at all). We are not quite sure how to configure
this properly.
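
One way to check whether the FEC interrupt keeps firing during a stall is to
compare two snapshots of /proc/interrupts. The following is a rough sketch in
Python, not something that was run for this report; the function names are
made up, and the second snapshot in the example is invented purely to show
the arithmetic.

```python
def parse_interrupts(text, match):
    """Return {irq_label: count summed over CPUs} for lines containing `match`."""
    totals = {}
    for line in text.splitlines():
        if match not in line or ":" not in line:
            continue
        label, rest = line.split(":", 1)
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # per-CPU counters end where the controller name starts
            total += int(field)
        totals[label.strip()] = total
    return totals


def delta(before, after):
    """Per-IRQ increase between two snapshots."""
    return {irq: after[irq] - before.get(irq, 0) for irq in after}


first = """\
150:      15519          0          0          0       GIC 150  2188000.ethernet
151:          0          0          0          0       GIC 151  2188000.ethernet
"""
# Hypothetical second snapshot, for illustration only.
second = first.replace("15519", "15600")

print(delta(parse_interrupts(first, "ethernet"),
            parse_interrupts(second, "ethernet")))
```

On a live system the two snapshots would come from reading /proc/interrupts a
few seconds apart while the stall is in progress.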

Best regards,
Mattis Lorentzon
***************************************************************
Consider the environment before printing this message.

To read Autoliv's Information and Confidentiality Notice, follow this link:
http://www.autoliv.com/disclaimer.html
***************************************************************


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-11 13:32                             ` Mattis Lorentzon
@ 2014-08-11 17:41                               ` Fabio Estevam
  -1 siblings, 0 replies; 91+ messages in thread
From: Fabio Estevam @ 2014-08-11 17:41 UTC (permalink / raw)
  To: Mattis Lorentzon
  Cc: Russell King - ARM Linux, Fredrik Noring, linux-kernel, linux-arm-kernel

On Mon, Aug 11, 2014 at 10:32 AM, Mattis Lorentzon
<Mattis.Lorentzon@autoliv.com> wrote:
> Russell and Fabio,
>
>> I'd be interested to hear whether removing the
>>
>>       interrupts-extended = ...
>>
>> property from your board's DT file, thereby causing you to revert back to the
>> default I list above, also fixes the instability you are seeing.
>
> We have tried to remove the board specific interrupts-extended field and the
> MX6QDL_PAD_GPIO_6__ENET_IRQ entry. Sadly this did not seem to improve
> the stalls. Our interrupts look like this now:
>
> 150:      15519          0          0          0       GIC 150  2188000.ethernet
> 151:          0          0          0          0       GIC 151  2188000.ethernet
>
> Our device tree might still be slightly incorrect. We have noticed that our
> RGMII_INT is connected to GPIO 19 (P5) which might be nonstandard (we are
> a bit surprised that this works at all). We are not quite sure how to configure
> this properly.

In order to try to narrow down whether this is a board issue, could
you try to run the same kernel on a mx6q development board, such as
mx6qsabresd, cubox-i, wandboard, etc?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-11 17:41                               ` Fabio Estevam
@ 2014-08-13 13:39                                 ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-13 13:39 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Russell King - ARM Linux, Fredrik Noring, linux-kernel, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]

Fabio and Russell,

> In order to try to narrow down whether this is a board issue, could you try to
> run the same kernel on a mx6q development board, such as mx6qsabresd,
> cubox-i, wandboard, etc?

Indeed, we have a Sabrelite development board and have run the same kernel
configuration (please find attached). Russell's 30 FEC-related patches are applied.
We have also tried with and without the interrupts-extended entry in the DT.

All our tests seem to behave the same way on the Sabrelite as on our own board.
A working theory is that the switch (3Com Switch 4400) triggers the degeneration
of the network stack from which Linux does not seem to recover, even if we later
bypass the switch and directly connect the board to the server machine.

Since the problem is stochastic in nature we are not completely sure if we can
trigger the problem without the switch. It's the switch that allows us to run many
cards simultaneously and thus trigger the problem more easily. :-)

What are your thoughts?

Best regards,
Mattis Lorentzon

[-- Attachment #2: config.gz --]
[-- Type: application/x-gzip, Size: 14775 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-11 17:41                               ` Fabio Estevam
@ 2014-08-14 14:43                                 ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-14 14:43 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Russell King - ARM Linux, Fredrik Noring, linux-kernel, linux-arm-kernel

[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 919 bytes --]

Fabio and Russell,

> A working theory is that the switch (3Com Switch 4400) triggers the
> degeneration of the network stack from which Linux does not seem to
> recover, even if we later bypass the switch and directly connect the board to
> the server machine.

After a few more tests we have finally been able to trigger the exact same stalls
on the Sabrelite board with a direct network connection (i.e. without the switch).

Best regards,
Mattis Lorentzon

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-14 14:43                                 ` Mattis Lorentzon
@ 2014-08-14 15:30                                   ` Fabio Estevam
  -1 siblings, 0 replies; 91+ messages in thread
From: Fabio Estevam @ 2014-08-14 15:30 UTC (permalink / raw)
  To: Mattis Lorentzon
  Cc: Russell King - ARM Linux, Fredrik Noring, linux-kernel, linux-arm-kernel

On Thu, Aug 14, 2014 at 11:43 AM, Mattis Lorentzon
<Mattis.Lorentzon@autoliv.com> wrote:

> After a few more tests we have finally been able to trigger the exact same stalls
> on the Sabrelite board with a direct network connection (i.e. without the switch).

Do the stalls also happen on a pure 3.16 kernel?

How can we reproduce the error?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-14 15:30                                   ` Fabio Estevam
@ 2014-08-15  5:42                                     ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-15  5:42 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Russell King - ARM Linux, Fredrik Noring, linux-kernel, linux-arm-kernel

[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1227 bytes --]

Fabio,

> Do the stalls also happen on a pure 3.16 kernel?

Yes, we just tried this out overnight and we get the same stalls here.
We have seen similar problems on a Zynq-based board. It might be
worth noting that one chip common to all three boards is the
KSZ9021RN, whereas the FEC driver runs only on the two i.MX6
boards.

> How can we reproduce the error?

We mostly run SSH with benchmarks using NFS; it can probably be
triggered by using only SSH with the following loop:

# while : ; do ssh arm-card date; done
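
A variant of the loop above that times each attempt may make the stalls easier
to spot in a long run. This is only a sketch in Python: the function names,
the 5 second "slow" threshold and the 60 second timeout are arbitrary
assumptions, and only the `ssh arm-card date` command comes from the loop
above.

```python
import subprocess
import time


def timed_run(cmd, timeout_s):
    """Run cmd once; return (elapsed seconds, return code or None on timeout)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout_s)
        rc = proc.returncode
    except subprocess.TimeoutExpired:
        rc = None  # the command was killed after timeout_s seconds
    return time.monotonic() - start, rc


def watch(cmd, slow_s=5.0, timeout_s=60.0, iterations=None):
    """Repeat cmd and record every attempt that timed out or took longer than slow_s."""
    slow = []
    count = 0
    while iterations is None or count < iterations:
        elapsed, rc = timed_run(cmd, timeout_s)
        if rc is None or elapsed > slow_s:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            slow.append((stamp, elapsed, rc))
            print("%s slow attempt: %.1f s, rc=%s" % (stamp, elapsed, rc))
        count += 1
    return slow
```

Calling `watch(["ssh", "arm-card", "date"])` corresponds to the shell loop and
runs until interrupted; passing `iterations` bounds the run.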

Our (pure) 3.16 kernel uses the following config.
http://lkml.iu.edu/hypermail/linux/kernel/1408.1/03045/config.gz

(We have quite generously disabled a lot of sub-systems in our config.)

Best regards,
Mattis Lorentzon

***************************************************************
Consider the environment before printing this message.

To read Autoliv's Information and Confidentiality Notice, follow this link:
http://www.autoliv.com/disclaimer.html
***************************************************************

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-15  5:42                                     ` Mattis Lorentzon
@ 2014-08-17 21:34                                       ` Iain Paton
  -1 siblings, 0 replies; 91+ messages in thread
From: Iain Paton @ 2014-08-17 21:34 UTC (permalink / raw)
  To: Mattis Lorentzon
  Cc: Fabio Estevam, Fredrik Noring, Russell King - ARM Linux,
	linux-kernel, linux-arm-kernel

On 15/08/14 06:42, Mattis Lorentzon wrote:

> We mostly run SSH with benchmarks using NFS, it can probably be
> triggered by using only SSH with the following loop:
> 
> # while : ; do ssh arm-card date; done

Mattis,

What sort of time does it take for you to see a problem?

I've been running the above for nearly two days on 3.16.0 on a board 
with fec interrupts routed through gpio_6 and haven't seen a hint of 
a problem.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-17 21:34                                       ` Iain Paton
@ 2014-08-17 21:46                                         ` Fabio Estevam
  -1 siblings, 0 replies; 91+ messages in thread
From: Fabio Estevam @ 2014-08-17 21:46 UTC (permalink / raw)
  To: Iain Paton
  Cc: Mattis Lorentzon, Fredrik Noring, Russell King - ARM Linux,
	linux-kernel, linux-arm-kernel

Iain,

On Sun, Aug 17, 2014 at 6:34 PM, Iain Paton <ipaton0@gmail.com> wrote:
> On 15/08/14 06:42, Mattis Lorentzon wrote:
>
>> We mostly run SSH with benchmarks using NFS, it can probably be
>> triggered by using only SSH with the following loop:
>>
>> # while : ; do ssh arm-card date; done
>
> Mattis,
>
> What sort of time does it take for you to see a problem?
>
> I've been running the above for nearly two days on 3.16.0 on a board
> with fec interrupts routed through gpio_6 and haven't seen a hint of
> a problem.

Thanks for testing.

Which mx6 board have you used on this test?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-17 21:46                                         ` Fabio Estevam
@ 2014-08-19  6:03                                           ` Iain Paton
  -1 siblings, 0 replies; 91+ messages in thread
From: Iain Paton @ 2014-08-19  6:03 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Mattis Lorentzon, Fredrik Noring, Russell King - ARM Linux,
	linux-kernel, linux-arm-kernel

On 17/08/14 22:46, Fabio Estevam wrote:
> Iain,
> 
> On Sun, Aug 17, 2014 at 6:34 PM, Iain Paton <ipaton0@gmail.com> wrote:
>> On 15/08/14 06:42, Mattis Lorentzon wrote:
>>
>>> We mostly run SSH with benchmarks using NFS, it can probably be
>>> triggered by using only SSH with the following loop:
>>>
>>> # while : ; do ssh arm-card date; done
>>
>> Mattis,
>>
>> What sort of time does it take for you to see a problem?
>>
>> I've been running the above for nearly two days on 3.16.0 on a board
>> with fec interrupts routed through gpio_6 and haven't seen a hint of
>> a problem.
> 
> Thanks for testing.
> 
> Which mx6 board have you used on this test?

It's currently pointed at a RIoTboard (atheros phy) but I'm happy to 
try it against both a Sabre-Lite and a Wandboard B1, all running the 
same kernel binary, as well. 

I'm interested enough in why different people get different results 
with this that I'll put some time towards testing to try to help 
narrow down the cause.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-19  6:03                                           ` Iain Paton
@ 2014-08-21  9:39                                             ` Iain Paton
  -1 siblings, 0 replies; 91+ messages in thread
From: Iain Paton @ 2014-08-21  9:39 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Mattis Lorentzon, Fredrik Noring, Russell King - ARM Linux,
	linux-kernel, linux-arm-kernel

On 19/08/14 07:03, Iain Paton wrote:
> On 17/08/14 22:46, Fabio Estevam wrote:
>> Iain,
>>
>> On Sun, Aug 17, 2014 at 6:34 PM, Iain Paton <ipaton0@gmail.com> wrote:
>>> On 15/08/14 06:42, Mattis Lorentzon wrote:
>>>
>>>> We mostly run SSH with benchmarks using NFS, it can probably be
>>>> triggered by using only SSH with the following loop:
>>>>
>>>> # while : ; do ssh arm-card date; done
>>>
>>> Mattis,
>>>
>>> What sort of time does it take for you to see a problem?
>>>
>>> I've been running the above for nearly two days on 3.16.0 on a board
>>> with fec interrupts routed through gpio_6 and haven't seen a hint of
>>> a problem.
>>
>> Thanks for testing.
>>
>> Which mx6 board have you used on this test?
> 
> It's currently pointed at a RIoTboard (atheros phy) but I'm happy to 
> try it against both a Sabre-Lite and a Wandboard B1, all running the 
> same kernel binary, as well. 
> 
> I'm interested enough in why different people get different results 
> with this that I'll put some time towards testing to try to help 
> narrow down the cause.
> 

two and a half days of running this against both a sabre-lite and a 
wandboard quad B1 and I still have no reason to think there's any 
sort of a problem.

Up to now, my testing has been done with my own config, I'll now
repeat the whole thing using the config Mattis posted to see if 
I can reproduce it that way.

Suggestions on a better / easier / quicker way to reproduce it are 
welcome.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-21  9:39                                             ` Iain Paton
@ 2014-08-22  0:01                                               ` Fabio Estevam
  -1 siblings, 0 replies; 91+ messages in thread
From: Fabio Estevam @ 2014-08-22  0:01 UTC (permalink / raw)
  To: Iain Paton
  Cc: Mattis Lorentzon, Fredrik Noring, Russell King - ARM Linux,
	linux-kernel, linux-arm-kernel

On Thu, Aug 21, 2014 at 6:39 AM, Iain Paton <ipaton0@gmail.com> wrote:

> two and a half days of running this against both a sabre-lite and a
> wandboard quad B1 and I still have no reason to think there's any
> sort of a problem.
>
> Up to now, my testing has been done with my own config, I'll now
> repeat the whole thing using the config Mattis posted to see if
> I can reproduce it that way.
>
> Suggestions on a better / easier / quicker way to reproduce it are
> welcome.

Thanks, Iain.

Mattis,

What is the silicon version of the mx6 in your sabrelite? What GCC
version do you use?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-22  0:01                                               ` Fabio Estevam
@ 2014-08-22  6:39                                                 ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-22  6:39 UTC (permalink / raw)
  To: Fabio Estevam, Iain Paton
  Cc: Fredrik Noring, Russell King - ARM Linux, linux-kernel, linux-arm-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1099 bytes --]

Fabio,

> What is the silicon version of the mx6 in your sabrelite? What GCC version do
> you use?

The silicon version is PCIMX6Q6AVT10AA and the GCC version we use is
arm-none-eabi-gcc (Fedora 2013.11.24-2.fc19) 4.8.1.

Iain,

> Up to now, my testing has been done with my own config, I'll now
> repeat the whole thing using the config Mattis posted to see if I can
> reproduce it that way.

Thanks for testing this. Could you also send me the config that you used for
your Sabrelite?

Do you know of any options that enable additional debug information about
the network driver state (full buffers etc.)?

Best regards,
Mattis Lorentzon

***************************************************************
Consider the environment before printing this message.

To read Autoliv's Information and Confidentiality Notice, follow this link:
http://www.autoliv.com/disclaimer.html
***************************************************************

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-14 14:43                                 ` Mattis Lorentzon
@ 2014-08-22  8:27                                   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 91+ messages in thread
From: Russell King - ARM Linux @ 2014-08-22  8:27 UTC (permalink / raw)
  To: Mattis Lorentzon, Fabio Estevam
  Cc: Fredrik Noring, linux-kernel, linux-arm-kernel

On Thu, Aug 14, 2014 at 02:43:56PM +0000, Mattis Lorentzon wrote:
> Fabio and Russell,
> 
> > A working theory is that the switch (3Com Switch 4400) triggers the
> > degeneration of the network stack from which Linux does not seem to
> > recover, even if we later bypass the switch and directly connect the board to
> > the server machine.
> 
> After a few more tests we have finally been able to trigger the exact
> same stalls on the Sabrelite board with a direct network connection
> (i.e. without the switch).

That's a setup which I can't reproduce, as all my MX6 hardware runs
root-NFS, so using a direct connection to a machine to test will
result in the MX6 losing its root filesystem.

That said, on SolidRun hardware, there is some investigation going on
at the moment concerning poor UDP performance - this is an on-going
problem that has been present for a long time.

What we find is that TCP performance achieves around the 600Mbps mark,
but UDP performance can be extremely poor with high packet loss.
Adding a udelay(210) into fec_enet_rx() can perversely (on
multi-core SoCs) increase UDP performance to around 500Mbps at the
expense of a reduction in TCP performance.

This "solution" was tripped over while trying to debug this problem,
and it was found that adding printk()s to the driver increased UDP
performance - so substituting udelay() for printk() was then tried.

I tried to run perf on the kernel yesterday to find out what's going
on, but for some reason, perf gave me impossible call traces, so I
gave up with that idea.  For example, perf told me that there was a
high hit rate in memcpy() being called from net_rx_action(), but
net_rx_action() doesn't call memcpy(), nor do any of the called
functions as a tail-call.

That said, I don't think perf could tell us what's going on - what
we need is a trace of the CPU's execution while iperf is running,
*without* affecting the CPU itself.  This is something I can't do
with the hardware I have.

My suspicion (unproven) is that a batch of packets gets processed in
the softirq handler called during the FEC interrupt exit path.  Then,
because there's more work to be done, ksoftirqd is scheduled, but it
takes time for ksoftirqd to start running - during which time we drop
a lot of packets.  ksoftirqd processes some packets, but then finds
that it can't complete the NAPI "work budget", and so stops running,
resulting in the packet processing being triggered by the next FEC
interrupt, and the cycle repeats.

TCP notices this, and adjusts its sending rate to match, whereas UDP
just carries on regardless, resulting in lots of packets dropped each
time we switch from the tail of hardirq processing to ksoftirqd.

With the udelay() in place, processing takes enough time that it gets
bounced onto ksoftirqd, where it stays.
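
One cheap way to look for the budget exhaustion suspected above, without
perf, is the time_squeeze column of /proc/net/softnet_stat: net_rx_action()
increments it when it stops with work still pending because its budget or
time slice ran out. The sketch below only decodes the first three
hexadecimal columns; the function name softnet_summary is an assumption for
illustration. Note that the "dropped" column counts backlog-queue drops, not
frames lost in the FEC receive ring, so it may well stay at zero here.

```shell
#!/bin/sh
# Sketch only: summarise /proc/net/softnet_stat, one output line per row
# (one row per online CPU). Columns are hexadecimal; the first three are
# packets processed, packets dropped (backlog queue full), and
# time_squeeze (net_rx_action() ran out of budget or time).

# softnet_summary [FILE]   (FILE defaults to /proc/net/softnet_stat)
softnet_summary() {
    file=${1:-/proc/net/softnet_stat}
    row=0
    while read -r processed dropped squeezed _rest; do
        # $((0x...)) converts each hexadecimal field to decimal
        echo "row$row processed=$((0x$processed)) dropped=$((0x$dropped)) time_squeeze=$((0x$squeezed))"
        row=$((row + 1))
    done < "$file"
}

# Example (not run here): sample before and after an iperf run and compare.
#   softnet_summary
```

If time_squeeze climbs steadily during a UDP test but not during a TCP test,
that would be consistent with the hand-off theory, though it would not prove
it.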

I'm adding this to this thread in case it has any bearing on the
problem(s) you're seeing - yes, it seems like a different problem, but
could it be related...

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-22  0:01                                               ` Fabio Estevam
@ 2014-08-22 10:36                                                 ` Iain Paton
  -1 siblings, 0 replies; 91+ messages in thread
From: Iain Paton @ 2014-08-22 10:36 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Mattis Lorentzon, Fredrik Noring, Russell King - ARM Linux,
	linux-kernel, linux-arm-kernel

On 22/08/14 01:01, Fabio Estevam wrote:
> On Thu, Aug 21, 2014 at 6:39 AM, Iain Paton <ipaton0@gmail.com> wrote:
> 
>> two and a half days of running this against both a sabre-lite and a
>> wandboard quad B1 and I still have no reason to think there's any
>> sort of a problem.
>>
>> Up to now, my testing has been done with my own config, I'll now
>> repeat the whole thing using the config Mattis posted to see if
>> I can reproduce it that way.
>>
>> Suggestions on a better / easier / quicker way to reproduce it are
>> welcome.
> 
> Thanks, Iain.
> 
> Mattis,
> 
> What is the silicon version of the mx6 in your sabrelite? What GCC
> version do you use?
> 

For reference, both my SL and WBQUAD report silicon rev 1.2.
The RIoTboard uses a Solo and reports silicon rev 1.1.

I'm using vanilla gcc 4.9.1 and compiling the kernel natively on a 
sabre-lite.




^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-13 13:39                                 ` Mattis Lorentzon
@ 2014-08-25 10:18                                   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 91+ messages in thread
From: Russell King - ARM Linux @ 2014-08-25 10:18 UTC (permalink / raw)
  To: Mattis Lorentzon
  Cc: Fabio Estevam, Fredrik Noring, linux-kernel, linux-arm-kernel

On Wed, Aug 13, 2014 at 01:39:27PM +0000, Mattis Lorentzon wrote:
> All our tests seem to behave the same way on the Sabrelite as on our own board.
> A working theory is that the switch (3Com Switch 4400) triggers the degeneration
> of the network stack from which Linux does not seem to recover, even if we later
> bypass the switch and directly connect the board to the server machine.

Please can you try something - what happens if you completely disable
pause frame support (flow control) on all machines on the switch?
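
On the Linux hosts, pause frames can usually be switched off with ethtool.
A minimal sketch follows, assuming the interface is called eth0 and that the
driver implements the pause-parameter operations; the function name pause_off
and the RUN variable are illustrative only. The switch itself has to be
configured separately through its own management interface.

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1, execute) the ethtool command that
# turns pause autonegotiation, RX pause and TX pause off on one interface.
# "eth0" below is a placeholder; whether the request is accepted depends
# on the driver and hardware.

# pause_off IFACE
pause_off() {
    iface=$1
    set -- ethtool -A "$iface" autoneg off rx off tx off
    if [ "${RUN:-0}" = 1 ]; then
        "$@" && ethtool -a "$iface"   # -a prints the resulting pause settings
    else
        echo "$*"                     # dry run: show the command only
    fi
}

# Example (dry run, prints the command):
#   pause_off eth0
```

Running it as a dry run first makes it easy to check the exact command
before changing link settings on a board that mounts its root over NFS.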

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-25 10:18                                   ` Russell King - ARM Linux
@ 2014-08-26 13:11                                     ` Iain Paton
  -1 siblings, 0 replies; 91+ messages in thread
From: Iain Paton @ 2014-08-26 13:11 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Mattis Lorentzon, Fredrik Noring, Fabio Estevam, linux-kernel,
	linux-arm-kernel

On 25/08/14 11:18, Russell King - ARM Linux wrote:
> On Wed, Aug 13, 2014 at 01:39:27PM +0000, Mattis Lorentzon wrote:
>> All our tests seem to behave the same way on the Sabrelite as on our own board.
>> A working theory is that the switch (3Com Switch 4400) triggers the degeneration
>> of the network stack from which Linux does not seem to recover, even if we later
>> bypass the switch and directly connect the board to the server machine.
> 
> Please can you try something - what happens if you completely disable
> pause frame support (flow control) on all machines on the switch?

Russell, while trying to duplicate this I have flow-control disabled
on the switch, which leads to it being auto-negotiated off on all devices.
Do you think it could be worth turning it on and trying again?



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-21  9:39                                             ` Iain Paton
@ 2014-08-26 13:12                                               ` Iain Paton
  -1 siblings, 0 replies; 91+ messages in thread
From: Iain Paton @ 2014-08-26 13:12 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Mattis Lorentzon, Fredrik Noring, Russell King - ARM Linux,
	linux-kernel, linux-arm-kernel

On 21/08/14 10:39, Iain Paton wrote:
> On 19/08/14 07:03, Iain Paton wrote:
>> On 17/08/14 22:46, Fabio Estevam wrote:
>>> Iain,
>>>
>>> On Sun, Aug 17, 2014 at 6:34 PM, Iain Paton <ipaton0@gmail.com> wrote:
>>>> On 15/08/14 06:42, Mattis Lorentzon wrote:
>>>>
>>>>> We mostly run SSH with benchmarks using NFS, it can probably be
>>>>> triggered by using only SSH with the following loop:
>>>>>
>>>>> # while : ; do ssh arm-card date; done
>>>>
>>>> Mattis,
>>>>
>>>> What sort of time does it take for you to see a problem?
>>>>
>>>> I've been running the above for nearly two days on 3.16.0 on a board
>>>> with fec interrupts routed through gpio_6 and haven't seen a hint of
>>>> a problem.
>>>
>>> Thanks for testing.
>>>
>>> Which mx6 board have you used on this test?
>>
>> It's currently pointed at a RIoTboard (atheros phy) but I'm happy to 
>> try it against both a Sabre-Lite and a Wandboard B1, all running the 
>> same kernel binary, as well. 
>>
>> I'm interested enough in why different people get different results 
>> with this that I'll put some time towards testing to try to help 
>> narrow down the cause.
>>
> 
> two and a half days of running this against both a sabre-lite and a 
> wandboard quad B1 and I still have no reason to think there's any 
> sort of a problem.
> 
> Up to now, my testing has been done with my own config, I'll now
> repeat the whole thing using the config Mattis posted to see if 
> I can reproduce it that way.
> 
> Suggestions on a better / easier / quicker way to reproduce it are 
> welcome.
> 

So I wasn't able to use Mattis's exact configuration as I couldn't
get it to boot properly on anything.

I made enough changes to enable mmc/sata and to disable the
compiled-in kernel command line, appended devicetree and initrd.
Even then it still won't boot on my WBQUAD.
It is running on Sabre-Lite and RIoTboard though, so it's useful enough
to test against the SL in a similar manner to Mattis's tests with SL.

I've had the test running against both for approx one day and again 
no sign of any problems. I'm happy to leave this running, but at 
this stage I'm not expecting I'll see any problems even if I leave 
it running for a week.
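
For anyone repeating the ssh loop quoted above, a minimal sketch of a variant that counts outcomes instead of only printing the date may make runs on different boards easier to compare. The function name `probe` and the 10-second timeout are assumptions for illustration and are not taken from the thread.

```python
import subprocess
import sys


def probe(cmd, attempts, timeout_s=10.0):
    """Run cmd repeatedly and count ok, failed and timed-out runs."""
    counts = {"ok": 0, "failed": 0, "timeout": 0}
    for _ in range(attempts):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,  # the child is killed if it runs longer
            )
        except subprocess.TimeoutExpired:
            # A stalled connection shows up here rather than hanging the loop.
            counts["timeout"] += 1
            continue
        counts["ok" if result.returncode == 0 else "failed"] += 1
    return counts
```

Calling `probe(["ssh", "arm-card", "date"], attempts=1000)` would correspond to a thousand iterations of the quoted shell loop, with `arm-card` being the host name used there.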



^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-22 10:36                                                 ` Iain Paton
@ 2014-08-27  6:32                                                   ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-27  6:32 UTC (permalink / raw)
  To: Iain Paton, Fabio Estevam
  Cc: Fredrik Noring, Russell King - ARM Linux, linux-kernel, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 1411 bytes --]

Hi Iain, Russell and Fabio,

> The config is attached. Note that there's a lot of additional stuff enabled as
> I'm aiming for a single general purpose kernel that covers i.MX6, AM3359,
> Allwinner A10/A20 along with several versions of boards using those
> particular SoCs.
>
> Same kernel binary on all the boards I've tried this on, only real differences
> will be the devicetree and u-boot

Amazingly we have been able to run a complete nightly test on eight i.MX6
boards without hiccups using Iain's config! We had to modify it slightly to get
it to boot, please find attached patch and Iain's patched config.

On Russell's suggestion we also began to disable flow control on the machines.
However it did not seem to make a difference because all our Zynq cards
stalled during the same test run (using our own Zynq config).

Iain's config seems promising and we will continue to run tests during the
next couple of days. We will also try to adapt Iain's config to our Zynq board.

Many thanks for all suggestions, patches and configs so far!

Best regards,
Mattis Lorentzon

***************************************************************
Consider the environment before printing this message.

To read Autoliv's Information and Confidentiality Notice, follow this link:
http://www.autoliv.com/disclaimer.html
***************************************************************

[-- Attachment #2: config.patch --]
[-- Type: application/octet-stream, Size: 692 bytes --]

3c3
< # Linux/arm 3.16.0-rc2 Kernel Configuration
---
> # Linux/arm 3.16.0 Kernel Configuration
263a264
> CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
264a266
> CONFIG_RWSEM_SPIN_ON_OWNER=y
537,540c539
< CONFIG_ARM_APPENDED_DTB=y
< CONFIG_ARM_ATAG_DTB_COMPAT=y
< CONFIG_ARM_ATAG_DTB_COMPAT_CMDLINE_FROM_BOOTLOADER=y
< # CONFIG_ARM_ATAG_DTB_COMPAT_CMDLINE_EXTEND is not set
---
> # CONFIG_ARM_APPENDED_DTB is not set
638,641c637
< CONFIG_IP_PNP=y
< # CONFIG_IP_PNP_DHCP is not set
< # CONFIG_IP_PNP_BOOTP is not set
< # CONFIG_IP_PNP_RARP is not set
---
> # CONFIG_IP_PNP is not set
1229d1224
< # CONFIG_PARPORT is not set
1230a1226
> # CONFIG_PARPORT is not set
3671d3666
< # CONFIG_ROOT_NFS is not set

[-- Attachment #3: config.gz --]
[-- Type: application/x-gzip, Size: 23527 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-27  6:32                                                   ` Mattis Lorentzon
@ 2014-08-27 10:43                                                     ` Iain Paton
  -1 siblings, 0 replies; 91+ messages in thread
From: Iain Paton @ 2014-08-27 10:43 UTC (permalink / raw)
  To: Mattis Lorentzon
  Cc: Fabio Estevam, Fredrik Noring, Russell King - ARM Linux,
	linux-kernel, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 1377 bytes --]

On 27/08/14 07:32, Mattis Lorentzon wrote:
> Hi Iain, Russell and Fabio,
> 
>> The config is attached. Note that there's a lot of additional stuff enabled as
>> I'm aiming for a single general purpose kernel that covers i.MX6, AM3359,
>> Allwinner A10/A20 along with several versions of boards using those
>> particular SoCs.
>>
>> Same kernel binary on all the boards I've tried this on, only real differences
>> will be the devicetree and u-boot
> 
> Amazingly we have been able to run a complete nightly test on eight i.MX6
> boards without hiccups using Iain's config! We had to modify it slightly to get
> it to boot, please find attached patch and Iain's patched config.

Interesting. We obviously have some differences in how we boot, my changes to 
your config to get it to boot basically amount to reverting the patch you attached 
and then enabling sata and mmc. So far I've been unable to get your config to fail.

I'm attaching the patch showing what I changed in case it sheds any light on 
what's going on, although I don't see why any of the changes make any difference.

My kernel command line is also fairly obvious with nothing I'd think is odd:
console=ttymxc1,115200n8 root=/dev/sda1 ro rootfstype=ext2 rootwait video= ahci-imx.hotplug=1

It would be good to know what makes my config work for you, I don't think I've 
done anything special with it.

Iain


[-- Attachment #2: config.patch --]
[-- Type: text/plain, Size: 2995 bytes --]

3c3
< # Linux/arm 3.16.0 Kernel Configuration
---
> # Linux/arm 3.16.0-rc2 Kernel Configuration
38c38
< CONFIG_KERNEL_GZIP=y
---
> # CONFIG_KERNEL_GZIP is not set
41c41
< # CONFIG_KERNEL_LZO is not set
---
> CONFIG_KERNEL_LZO=y
233,239c233
< CONFIG_PARTITION_ADVANCED=y
< # CONFIG_ACORN_PARTITION is not set
< # CONFIG_AIX_PARTITION is not set
< # CONFIG_OSF_PARTITION is not set
< # CONFIG_AMIGA_PARTITION is not set
< # CONFIG_ATARI_PARTITION is not set
< # CONFIG_MAC_PARTITION is not set
---
> # CONFIG_PARTITION_ADVANCED is not set
241,249d234
< # CONFIG_BSD_DISKLABEL is not set
< # CONFIG_MINIX_SUBPARTITION is not set
< # CONFIG_SOLARIS_X86_PARTITION is not set
< # CONFIG_UNIXWARE_DISKLABEL is not set
< # CONFIG_LDM_PARTITION is not set
< # CONFIG_SGI_PARTITION is not set
< # CONFIG_ULTRIX_PARTITION is not set
< # CONFIG_SUN_PARTITION is not set
< # CONFIG_KARMA_PARTITION is not set
251,252d235
< # CONFIG_SYSV68_PARTITION is not set
< # CONFIG_CMDLINE_PARTITION is not set
265,266d247
< CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
< CONFIG_RWSEM_SPIN_ON_OWNER=y
533,534c514,521
< # CONFIG_ARM_APPENDED_DTB is not set
< CONFIG_CMDLINE=""
---
> CONFIG_ARM_APPENDED_DTB=y
> CONFIG_ARM_ATAG_DTB_COMPAT=y
> CONFIG_ARM_ATAG_DTB_COMPAT_CMDLINE_FROM_BOOTLOADER=y
> # CONFIG_ARM_ATAG_DTB_COMPAT_CMDLINE_EXTEND is not set
> CONFIG_CMDLINE="___console=ttymxc0,115200 ___debug ___LOGLEVEL=8 ___initrd=0x11800040,12383491 ___dyndbg=\"file * +p\""
> CONFIG_CMDLINE_FROM_BOOTLOADER=y
> # CONFIG_CMDLINE_EXTEND is not set
> # CONFIG_CMDLINE_FORCE is not set
591c578
< # CONFIG_PM_TEST_SUSPEND is not set
---
> CONFIG_PM_TEST_SUSPEND=y
919d905
< CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
920a907
> CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
1045,1059c1032
< CONFIG_ATA=y
< # CONFIG_ATA_NONSTANDARD is not set
< CONFIG_ATA_VERBOSE_ERROR=y
< CONFIG_SATA_PMP=y
< 
< #
< # Controllers with non-SFF native interface
< #
< CONFIG_SATA_AHCI=y
< CONFIG_SATA_AHCI_PLATFORM=y
< CONFIG_AHCI_IMX=y
< # CONFIG_SATA_INIC162X is not set
< # CONFIG_SATA_ACARD_AHCI is not set
< # CONFIG_SATA_SIL24 is not set
< # CONFIG_ATA_SFF is not set
---
> # CONFIG_ATA is not set
1786,1815c1759
< CONFIG_MMC=y
< # CONFIG_MMC_DEBUG is not set
< # CONFIG_MMC_CLKGATE is not set
< 
< #
< # MMC/SD/SDIO Card Drivers
< #
< CONFIG_MMC_BLOCK=y
< CONFIG_MMC_BLOCK_MINORS=8
< CONFIG_MMC_BLOCK_BOUNCE=y
< # CONFIG_SDIO_UART is not set
< # CONFIG_MMC_TEST is not set
< 
< #
< # MMC/SD/SDIO Host Controller Drivers
< #
< CONFIG_MMC_SDHCI=y
< CONFIG_MMC_SDHCI_IO_ACCESSORS=y
< # CONFIG_MMC_SDHCI_PCI is not set
< CONFIG_MMC_SDHCI_PLTFM=y
< # CONFIG_MMC_SDHCI_OF_ARASAN is not set
< CONFIG_MMC_SDHCI_ESDHC_IMX=y
< # CONFIG_MMC_SDHCI_PXAV3 is not set
< # CONFIG_MMC_SDHCI_PXAV2 is not set
< # CONFIG_MMC_MXC is not set
< # CONFIG_MMC_TIFM_SD is not set
< # CONFIG_MMC_CB710 is not set
< # CONFIG_MMC_VIA_SDMMC is not set
< # CONFIG_MMC_DW is not set
< # CONFIG_MMC_USDHI6ROL0 is not set
---
> # CONFIG_MMC is not set
1968d1911
< # CONFIG_WIMAX_GDM72XX is not set

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-27 10:43                                                     ` Iain Paton
@ 2014-08-29 10:57                                                       ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-08-29 10:57 UTC (permalink / raw)
  To: Iain Paton
  Cc: Fabio Estevam, Fredrik Noring, Russell King - ARM Linux,
	linux-kernel, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 1507 bytes --]

Iain,

> Interesting. We obviously have some differences in how we boot, my
> changes to your config to get it to boot basically amount to reverting the
> patch you attached and then enabling sata and mmc. So far I've been unable
> to get your config to fail.

Our version of U-Boot doesn't support specifying a device tree separate from
the kernel, so we append it to the end of the kernel binary. We also enable
automatic configuration of IP addresses (CONFIG_IP_PNP). Our bootargs are:
console=ttymxc1,115200
ip=192.168.2.157:192.168.2.1:192.168.2.1:255.255.255.0:armcard:eth0:on
earlyprintk enable_wait_mode=off
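
As a reading aid for the `ip=` argument above, here is a small sketch that labels its colon-separated fields. The field order follows the kernel's documented `ip=` syntax for the first seven fields; the names `IP_FIELDS` and `parse_ip_param` are made up for this illustration, and the sketch only handles IPv4 addresses.

```python
# Field order of the kernel's ip= boot parameter (first seven fields).
IP_FIELDS = ("client_ip", "server_ip", "gateway_ip", "netmask",
             "hostname", "device", "autoconf")


def parse_ip_param(arg):
    """Split an 'ip=...' boot argument into a dict keyed by field name."""
    if not arg.startswith("ip="):
        raise ValueError("not an ip= argument: %r" % arg)
    values = arg[len("ip="):].split(":")
    # Trailing fields that were not given are simply absent from the result.
    return dict(zip(IP_FIELDS, values))


fields = parse_ip_param(
    "ip=192.168.2.157:192.168.2.1:192.168.2.1:255.255.255.0:armcard:eth0:on")
```

With the bootargs quoted here, the client address is 192.168.2.157, server and gateway are both 192.168.2.1, the host name is `armcard`, and the interface is `eth0`.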

> It would be good to know what makes my config work for you, I don't think
> I've done anything special with it.

With a couple of modifications (attached) we have been able to get your
config running on our Zynq boards as well, solving our ethernet issues.

The serial port and ethernet are essentially the only things we use. No disks,
no graphics, no USB, etc. which is why we tried to reduce the kernel
configuration to a bare minimum. We have no idea which disabled and/or
enabled options are causing the stalls.

Best regards,
Mattis Lorentzon


[-- Attachment #2: config.patch --]
[-- Type: application/octet-stream, Size: 2670 bytes --]

372c372
< CONFIG_ARCH_ZYNQ=y
---
> # CONFIG_ARCH_ZYNQ is not set
399d398
< # CONFIG_CPU_BIG_ENDIAN is not set
416d414
< CONFIG_ARCH_SUPPORTS_BIG_ENDIAN=y
427d424
< CONFIG_ICST=y
432d428
< CONFIG_ARM_AMBA=y
543,546c539
< CONFIG_ARM_APPENDED_DTB=y
< CONFIG_ARM_ATAG_DTB_COMPAT=y
< CONFIG_ARM_ATAG_DTB_COMPAT_CMDLINE_FROM_BOOTLOADER=y
< # CONFIG_ARM_ATAG_DTB_COMPAT_CMDLINE_EXTEND is not set
---
> # CONFIG_ARM_APPENDED_DTB is not set
644,647c637
< CONFIG_IP_PNP=y
< # CONFIG_IP_PNP_DHCP is not set
< # CONFIG_IP_PNP_BOOTP is not set
< # CONFIG_IP_PNP_RARP is not set
---
> # CONFIG_IP_PNP is not set
1012d1001
< # CONFIG_CAN_XILINXCAN is not set
1454,1455c1443
< CONFIG_NET_CADENCE=y
< CONFIG_MACB=y
---
> # CONFIG_NET_CADENCE is not set
1510,1511d1497
< CONFIG_NET_VENDOR_XILINX=y
< # CONFIG_XILINX_EMACLITE is not set
1789d1774
< # CONFIG_SERIO_AMBAKMI is not set
1825,1826d1809
< # CONFIG_SERIAL_AMBA_PL010 is not set
< # CONFIG_SERIAL_AMBA_PL011 is not set
1833,1834d1815
< CONFIG_SERIAL_UARTLITE=y
< CONFIG_SERIAL_UARTLITE_CONSOLE=y
1843,1844c1824
< CONFIG_SERIAL_XILINX_PS_UART=y
< CONFIG_SERIAL_XILINX_PS_UART_CONSOLE=y
---
> # CONFIG_SERIAL_XILINX_PS_UART is not set
1907d1886
< # CONFIG_I2C_CADENCE is not set
1914d1892
< # CONFIG_I2C_NOMADIK is not set
1952d1929
< # CONFIG_SPI_PL022 is not set
2038d2014
< # CONFIG_GPIO_PL061 is not set
2041d2016
< # CONFIG_GPIO_XILINX is not set
2245d2219
< # CONFIG_ARM_SP805_WATCHDOG is not set
2652d2625
< # CONFIG_FB_ARMCLCD is not set
2680d2652
< # CONFIG_FB_XILINX is not set
2828d2799
< # CONFIG_SND_ARMAACI is not set
2839d2809
< # CONFIG_SND_SOC_ADI is not set
3289d3258
< # CONFIG_MMC_ARMMMCI is not set
3440,3441d3408
< # CONFIG_RTC_DRV_PL030 is not set
< # CONFIG_RTC_DRV_PL031 is not set
3458d3424
< # CONFIG_AMBA_PL08X is not set
3464d3429
< # CONFIG_PL330_DMA is not set
3469d3433
< # CONFIG_XILINX_VDMA is not set
3561d3524
< # CONFIG_COMMON_CLK_AXI_CLKGEN is not set
3571d3533
< CONFIG_CADENCE_TTC_TIMER=y
3705d3666
< # CONFIG_ROOT_NFS is not set
3912,3922c3873
< CONFIG_DEBUG_LL=y
< CONFIG_DEBUG_ZYNQ_UART0=y
< # CONFIG_DEBUG_ZYNQ_UART1 is not set
< # CONFIG_DEBUG_IMX6Q_UART is not set
< # CONFIG_DEBUG_IMX6SL_UART is not set
< # CONFIG_DEBUG_SUNXI_UART0 is not set
< # CONFIG_DEBUG_SUNXI_UART1 is not set
< # CONFIG_DEBUG_ICEDCC is not set
< # CONFIG_DEBUG_SEMIHOSTING is not set
< # CONFIG_DEBUG_LL_UART_8250 is not set
< # CONFIG_DEBUG_LL_UART_PL01X is not set
---
> # CONFIG_DEBUG_LL is not set
3924c3875
< CONFIG_DEBUG_LL_INCLUDE="debug/zynq.S"
---
> CONFIG_DEBUG_LL_INCLUDE="mach/debug-macro.S"
3927d3877
< CONFIG_DEBUG_UNCOMPRESS=y
3929,3930d3878
< CONFIG_EARLY_PRINTK=y
< # CONFIG_OC_ETM is not set

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-29 10:57                                                       ` Mattis Lorentzon
@ 2014-08-29 11:30                                                         ` Fabio Estevam
  -1 siblings, 0 replies; 91+ messages in thread
From: Fabio Estevam @ 2014-08-29 11:30 UTC (permalink / raw)
  To: Mattis Lorentzon
  Cc: Iain Paton, Fredrik Noring, Russell King - ARM Linux,
	linux-kernel, linux-arm-kernel

Hi Mattis,

On Fri, Aug 29, 2014 at 7:57 AM, Mattis Lorentzon
<Mattis.Lorentzon@autoliv.com> wrote:
> Iain,
>
>> Interesting. We obviously have some differences in how we boot, my
>> changes to your config to get it to boot basically amount to reverting the
>> patch you attached and then enabling sata and mmc. So far I've been unable
>> to get your config to fail.
>
> Our version of U-Boot doesn't support specifying a device tree separate from
> the kernel, so we append it to the end of the kernel binary. We also enable
> automatic configuration of IP addresses (CONFIG_IP_PNP). Our bootargs are:
> console=ttymxc1,115200
> ip=192.168.2.157:192.168.2.1:192.168.2.1:255.255.255.0:armcard:eth0:on
> earlyprintk enable_wait_mode=off

I suppose that this 'enable_wait_mode=off' is a leftover from the
time you used the FSL BSP.

This is not needed in mainline.

>> It would be good to know what makes my config work for you, I don't think
>> I've done anything special with it.
>
> With a couple of modifications (attached) we have been able to get your
> config running on our Zynq boards as well, solving our ethernet issues.
>
> The serial port and ethernet are essentially the only things we use. No disks,
> no graphics, no USB, etc. which is why we tried to reduce the kernel
> configuration to a bare minimum. We have no idea which disabled and/or
> enabled options are causing the stalls.

It's good to hear you do not have the lockups anymore, but this is
still a big mystery for us as we have not yet understood the root
cause or which 'guilty' kernel config option makes the FEC work
unreliably.
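
One way to narrow down the candidates might be to compare the working and failing configs as sets of options rather than as line diffs, since several hunks posted in this thread only show an option moving to a different line. The following is a minimal sketch under that assumption; `parse_config` and `config_delta` are hypothetical helper names, not existing tools.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each option in a kernel .config to its value ('n' if not set)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def config_delta(old_text, new_text):
    """Return {option: (old, new)} for options whose values differ.

    None means the option does not appear in that config at all.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {name: (old.get(name), new.get(name))
            for name in sorted(set(old) | set(new))
            if old.get(name) != new.get(name)}
```

Options that merely changed position produce no entry, so the result contains only real differences, which could then be toggled in halves on a board that reproduces the stall.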

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
  2014-08-29 10:57                                                       ` Mattis Lorentzon
@ 2014-12-16 14:50                                                         ` Mattis Lorentzon
  -1 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-12-16 14:50 UTC (permalink / raw)
  To: 'Russell King - ARM Linux'
  Cc: 'Fabio Estevam',
	Fredrik Noring, 'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org',
	'Iain Paton'

Hi Russell,

> Now because things have changed during the last merge window, I've got
> an even bigger problem sorting through that patch set and getting it
> back into a submittable state.  I've just sent out v2 for it onto the
> netdev@vger.kernel.org mailing list.
>
> The initial version (marked RFC) attracted very little interest from
> testers, or acks.  I'd very much like to have some testing of it, so
> if you want to try it out, I can provide you with a git URL, patches or a
> combined patch.
 
We have run v3.16 for about three months now, and many millions of ssh
connections on eight separate systems, both without and with your network
patches. Our conclusion is that the patches clearly reduce the number of
network timeouts, and this is a great improvement. However, after a month
or so of uptime, the number of timeouts began to increase again, forcing us
to reboot the cards.

Best regards,
Mattis Lorentzon

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: Oops: 17 SMP ARM (v3.16-rc2)
@ 2014-06-26 13:16 Mattis Lorentzon
  0 siblings, 0 replies; 91+ messages in thread
From: Mattis Lorentzon @ 2014-06-26 13:16 UTC (permalink / raw)
  To: linux-kernel, linux; +Cc: Fredrik Noring

[-- Attachment #1: Type: text/plain, Size: 5110 bytes --]

Hi again,

The Oops seems to have been introduced somewhere between v3.12 and v3.13:

- The Oops is reproducible within seconds when running Linux 3.16-rc2.
- We have observed the Oops on 8 different hardware units and two different chipsets (Freescale i.MX6 and Xilinx Zynq).
- The Oops has not been seen on Linux 3.12 so it appears to be good.
- The Oops has been seen on Linux 3.13, 3.14, 3.15, 3.16-rc2 so these appear to be bad.
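
As a rough illustration of how the observations above narrow the search, the following Python sketch scans an ordered list of test results and returns the last version without an Oops and the first version with one. The function name `regression_window` and the data layout are assumptions made for this example; the results themselves are the ones listed above. Note that "not seen" on 3.12 is weaker evidence than "seen" on the later versions, so the window is only as reliable as the time spent testing 3.12.

```python
def regression_window(results):
    """Return (last_good, first_bad) from results ordered oldest to newest.

    results is a list of (version, oops_seen) pairs. This is a hypothetical
    helper for illustration, not part of any kernel tooling.
    """
    last_good = None
    for version, oops_seen in results:
        if oops_seen:
            return last_good, version
        last_good = version
    # No failing version found in the list.
    return last_good, None


# Observations taken from the list above.
observed = [
    ("v3.12", False),
    ("v3.13", True),
    ("v3.14", True),
    ("v3.15", True),
    ("v3.16-rc2", True),
]

window = regression_window(observed)
print(window)
```

With a window like this, `git bisect start v3.13 v3.12` would be the usual next step for narrowing it down to a single commit.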

Configs and a couple of Oops reports are attached.

Best regards,
Mattis Lorentzon

> Hello kernel people,
>
> I have a similar issue with v3.16-rc2 as previously reported by Waldemar
> Brodkorb for v3.15-rc4.
> https://lkml.org/lkml/2014/5/9/330
>
> We are running a benchmark application, sometimes using perf, with heavy
> traffic over NFS.
> The error is sporadic and it seems to occur more frequently when using perf.
>
> Linux imx6-test0 3.16.0-rc2+ #1 SMP Wed Jun 25 15:04:16 CEST 2014 armv7l
> armv7l armv7l GNU/Linux
>
> Any help is greatly appreciated.
>
> Best regards,
> Mattis Lorentzon
>
> Unable to handle kernel paging request at virtual address ffffffff
> pgd = 9e338000
> [ffffffff] *pgd=2fffd821, *pte=00000000, *ppte=00000000
> Internal error: Oops: 17 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 146 Comm: stereo Not tainted 3.16.0-rc2+ #1
> task: 9e07a700 ti: 81c42000 task.ti: 81c42000
> PC is at find_get_entry+0x60/0xfc
> LR is at radix_tree_lookup_slot+0x1c/0x2c
> pc : [<800a34d8>]    lr : [<80290448>]    psr: a0000013
> sp : 81c43d98  ip : 00000000  fp : 81c43dcc
> r10: 00000001  r9 : 9e30e3c0  r8 : 000002a7
> r7 : 9f3758a0  r6 : 00000000  r5 : 00000001  r4 : 00000000
> r3 : 81c43d84  r2 : 00000000  r1 : 000002a7  r0 : ffffffff
> Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 10c5387d  Table: 2e33804a  DAC: 00000015
> Process stereo (pid: 146, stack limit = 0x81c42240)
> Stack: (0x81c43d98 to 0x81c44000)
> 3d80:                                                       00000000 00000000
> 3da0: 800a3478 000a6000 81c43ecc 00000000 9f37589c 00000000 806cb02a 000002a7
> 3dc0: 81c43e04 81c43dd0 800a406c 800a3484 80061ca0 9fc2dfe0 00000013 00000059
> 3de0: 9f37589c 9f375770 00000300 000002a7 9e30e3c0 000002a7 81c43e94 81c43e08
> 3e00: 800a50c4 800a4040 00000000 00000000 801d1818 00000000 00001000 00080001
> 3e20: 000002a6 9f3757f4 00000300 000a7000 00000000 801d1818 9e30e490 9f37567c
> 3e40: 81c43ee8 81c43ed4 00000000 00000000 804d87e0 80067098 00000004 9f375770
> 3e60: 81c43e94 81c43e70 801d491c 81c43ee8 9f375770 81c43ed4 9e30e3c0 9e07a700
> 3e80: 76907000 00000000 81c43ebc 81c43e98 801d1818 800a4dfc 80061ca0 80061b0c
> 3ea0: 9f375770 00200000 00000000 81c43f78 81c43f44 81c43ec0 800e1348 801d17b8
> 3ec0: 00100000 81c43ed0 800e1764 76907000 00100000 00000000 000a7000 00059000
> 3ee0: 81c43ecc 00000001 9e30e3c0 00000000 00000000 00000000 9e07a700 00000000
> 3f00: 00000000 00000000 00200000 00000000 00100000 00000000 00000000 00000000
> 3f20: 9e30e3c0 9e30e3c0 76907000 81c43f78 9e30e3c0 00100000 81c43f74 81c43f48
> 3f40: 800e1adc 800e12b8 00000000 0027cce0 00200000 00000000 9e30e3c0 9e30e3c0
> 3f60: 00100000 76907000 81c43fa4 81c43f78 800e2200 800e1a58 00200000 00000000
> 3f80: 0027cce0 00000000 0007cce0 00000003 8000ebc4 81c42000 00000000 81c43fa8
> 3fa0: 8000ea00 800e21c8 0027cce0 00000000 00000003 76907000 00100000 00000000
> 3fc0: 0027cce0 00000000 0007cce0 00000003 0142b5a0 00000000 00000000 00000000
> 3fe0: 00000000 7ec59d94 76dc26ac 76e1762c 60000010 00000003 00000000 00000000
> Backtrace:
> [<800a3478>] (find_get_entry) from [<800a406c>] (pagecache_get_page+0x38/0x1d8)
>  r8:000002a7 r7:806cb02a r6:00000000 r5:9f37589c r4:00000000
> [<800a4034>] (pagecache_get_page) from [<800a50c4>] (generic_file_read_iter+0x2d4/0x750)
>  r10:000002a7 r9:9e30e3c0 r8:000002a7 r7:00000300 r6:9f375770 r5:9f37589c
>  r4:00000059
> [<800a4df0>] (generic_file_read_iter) from [<801d1818>] (nfs_file_read+0x6c/0xa8)
>  r10:00000000 r9:76907000 r8:9e07a700 r7:9e30e3c0 r6:81c43ed4 r5:9f375770
>  r4:81c43ee8
> [<801d17ac>] (nfs_file_read) from [<800e1348>] (new_sync_read+0x9c/0xc4)
>  r6:81c43f78 r5:00000000 r4:00200000
> [<800e12ac>] (new_sync_read) from [<800e1adc>] (vfs_read+0x90/0x150)
>  r8:00100000 r7:9e30e3c0 r6:81c43f78 r5:76907000 r4:9e30e3c0
> [<800e1a4c>] (vfs_read) from [<800e2200>] (SyS_read+0x44/0x98)
>  r9:76907000 r8:00100000 r7:9e30e3c0 r6:9e30e3c0 r5:00000000 r4:00200000
> [<800e21bc>] (SyS_read) from [<8000ea00>] (ret_fast_syscall+0x0/0x48)
>  r9:81c42000 r8:8000ebc4 r7:00000003 r6:0007cce0 r5:00000000 r4:0027cce0
> Code: e1a01008 eb07b3d6 e3500000 0a00001c (e5904000)
> ---[ end trace bebb56a5d6f464ed ]---

[-- Attachment #2: oops_v3.13.txt --]
[-- Type: text/plain, Size: 3755 bytes --]

Unable to handle kernel NULL pointer dereference at virtual address 00000037
pgd = 9e6cc000
[00000037] *pgd=2e744831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 246 Comm: top Not tainted 3.13.0+ #12
task: 9e646c00 ti: 97c50000 task.ti: 97c50000
PC is at lookup_fast+0x54/0x318
LR is at mark_held_locks+0x78/0x13c
pc : [<800e2838>]    lr : [<8005b028>]    psr: a0000013
sp : 97c51d20  ip : 636f7270  fp : 97c51d74
r10: 00000000  r9 : 97c51d98  r8 : 800e330c
r7 : 9e52a015  r6 : 9f402170  r5 : 97c51e60  r4 : 97c51d90
r3 : 9e52a015  r2 : 00000000  r1 : 9f458824  r0 : ffffffff
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 2e6cc04a  DAC: 00000015
Process top (pid: 246, stack limit = 0x97c50240)
Stack: (0x97c51d20 to 0x97c52000)
1d20: 00000003 9f8542e0 9e52a010 97c51e60 97c51e60 9f811c10 97c51d64 ffffffff
1d40: 800e2e70 00000004 60000013 97c51e60 97c51d98 97c51d90 9e52a015 97c50000
1d60: 9e52a016 00000000 97c51ddc 97c51d78 800e330c 800e27f0 8005b2e4 8005b0f8
1d80: 97c50038 97c51d90 800e41d0 8005b2dc 00000002 00000000 800e42d4 800bb7a8
1da0: 7eeb05f0 00000004 9e52a011 8005dbf8 9e646c00 00000041 9e52a010 97c51e60
1dc0: 97c51e60 97c50020 97c50000 00000000 97c51e3c 97c51de0 800e42fc 800e318c
1de0: 97c51df8 00000000 800bb7a8 9e646c00 00000008 00000fe8 00000000 97c51e08
1e00: 8029ebb4 800bb770 9e52a000 00000ff0 80b997f0 00000001 9e52a000 97c51e60
1e20: ffffff9c ffffff9c 97c50000 00000000 97c51e5c 97c51e40 800e49e8 800e42ac
1e40: 9e52a000 00000001 97c51e60 97c51f00 97c51ee4 97c51e60 800e7350 800e49cc
1e60: 9f811c10 9f402170 7eeb05f0 00000004 9e52a011 8005dbf8 9f811c10 9f402170
1e80: 9f8542e0 00000051 00000002 0000004c 00000000 00000000 00000001 00000000
1ea0: 9f827a80 9e4fa000 8062c340 800f6978 800f6c54 800f6b9c 00000010 00000000
1ec0: 97c51f04 7efe06c0 00000001 ffffff9c 7efe073c 97c51f40 97c51efc 97c51ee8
1ee0: 800e7394 800e7300 00000000 9f6ae420 97c51f2c 97c51f00 800dcb6c 800e7384
1f00: 800f6c54 800f695c 97c51f54 7efe06c0 00083633 00000017 000000c3 8000e9c4
1f20: 97c51f3c 97c51f30 800dcbe0 800dcb2c 97c51fa4 97c51f40 800dd2bc 800dcbcc
1f40: 8000e890 9e646c00 00000001 8000e9c4 97c50000 00000000 97c51f84 97c51f68
1f60: 8005b1f8 8005afbc 013bc048 00083633 00000017 000000c3 97c51f94 97c51f88
1f80: 8005b2e4 8005b0f8 00000000 97c51f98 8000e918 013bc048 00000000 97c51fa8
1fa0: 8000e800 800dd2ac 013bc048 00083633 7efe073c 7efe06c0 00000000 7efe073c
1fc0: 013bc048 00083633 00000017 000000c3 00083622 00133fa4 000003c0 7efe06b4
1fe0: 0000000a 7efe0698 000c15dc 000c1558 00000010 7efe073c 00000000 00000000
Backtrace: 
[<800e27e4>] (lookup_fast+0x0/0x318) from [<800e330c>] (link_path_walk+0x18c/0x814)
[<800e3180>] (link_path_walk+0x0/0x814) from [<800e42fc>] (path_lookupat+0x5c/0x720)
[<800e42a0>] (path_lookupat+0x0/0x720) from [<800e49e8>] (filename_lookup.isra.55+0x28/0x68)
[<800e49c0>] (filename_lookup.isra.55+0x0/0x68) from [<800e7350>] (user_path_at_empty+0x5c/0x84)
 r7:97c51f00 r6:97c51e60 r5:00000001 r4:9e52a000
[<800e72f4>] (user_path_at_empty+0x0/0x84) from [<800e7394>] (user_path_at+0x1c/0x24)
 r8:97c51f40 r7:7efe073c r6:ffffff9c r5:00000001 r4:7efe06c0
[<800e7378>] (user_path_at+0x0/0x24) from [<800dcb6c>] (vfs_fstatat+0x4c/0xa0)
[<800dcb20>] (vfs_fstatat+0x0/0xa0) from [<800dcbe0>] (vfs_stat+0x20/0x24)
 r8:8000e9c4 r7:000000c3 r6:00000017 r5:00083633 r4:7efe06c0
[<800dcbc0>] (vfs_stat+0x0/0x24) from [<800dd2bc>] (SyS_stat64+0x1c/0x38)
[<800dd2a0>] (SyS_stat64+0x0/0x38) from [<8000e800>] (ret_fast_syscall+0x0/0x48)
 r4:013bc048
Code: eb0034a1 e3500000 e50b0038 0a000082 (e5903038) 
---[ end trace 5b371848e2866ee2 ]---
NOHZ: local_softirq_pending 100
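
The faulting address in the report above can be cross-checked against the register dump and the `Code:` line. The Python sketch below decodes the parenthesised instruction word as an A32 load/store with immediate offset and computes the address it would access from the dumped base register. The helper names are invented for this example, only the immediate-offset encoding is handled, and this is a reading aid rather than a diagnosis of the underlying bug.

```python
def decode_ldr_str_imm(word):
    """Decode an A32 LDR/STR (immediate offset) instruction word.

    Hypothetical helper for illustration; other encodings are rejected.
    """
    # Bits 27:25 == 0b010 select load/store word/byte with a 12-bit immediate.
    if (word >> 25) & 0b111 != 0b010:
        raise ValueError("not an LDR/STR immediate encoding")
    return {
        "pre_indexed": bool((word >> 24) & 1),  # P bit
        "add": bool((word >> 23) & 1),          # U bit
        "byte": bool((word >> 22) & 1),         # B bit
        "load": bool((word >> 20) & 1),         # L bit
        "rn": (word >> 16) & 0xF,               # base register
        "rt": (word >> 12) & 0xF,               # transfer register
        "imm12": word & 0xFFF,
    }


def accessed_address(word, regs):
    """Return the 32-bit address the instruction accesses, given {reg: value}."""
    fields = decode_ldr_str_imm(word)
    base = regs[fields["rn"]]
    offset = fields["imm12"] if fields["add"] else -fields["imm12"]
    # Post-indexed forms (P == 0) access the unmodified base address.
    address = base + offset if fields["pre_indexed"] else base
    return address & 0xFFFFFFFF  # wrap to 32 bits as the hardware does


# Values from the reports: r0 = ffffffff in both cases.
v313_fault = accessed_address(0xE5903038, {0: 0xFFFFFFFF})
v316_fault = accessed_address(0xE5904000, {0: 0xFFFFFFFF})
print(hex(v313_fault), hex(v316_fault))
```

With r0 = ffffffff, the word e5903038 (a load from [r0, #0x38]) wraps around to address 00000037, which matches the fault address reported for v3.13. The same check on e5904000 from the 3.16-rc2 report gives ffffffff, again matching the reported address.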

[-- Attachment #3: oops_v3.15.txt --]
[-- Type: text/plain, Size: 9848 bytes --]

Unable to handle kernel NULL pointer dereference at virtual address 00000037
pgd = 80004000
[00000037] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.15.0+ #8
task: 9f470000 ti: 9f452000 task.ti: 9f452000
PC is at lookup_fast+0x54/0x338
LR is at trace_hardirqs_on_caller+0x10c/0x1e4
pc : [<800e8e68>]    lr : [<80060b94>]    psr: a0000113
sp : 9f453b48  ip : ffffffff  fp : 9f453b9c
r10: 9f453c78  r9 : 9f453bc0  r8 : 800e9a7c
r7 : 9f504013  r6 : 9f0020b8  r5 : 9f453c78  r4 : 9f453bb8
r3 : 9f504010  r2 : 00000000  r1 : 9f0ba0f4  r0 : ffffffff
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 1000404a  DAC: 00000015
Process swapper/0 (pid: 1, stack limit = 0x9f452240)
Stack: (0x9f453b48 to 0x9f454000)
3b40:                   00000003 9f4542e8 00000076 9f453d04 9f504000 9f411b50
3b60: 9f453b8c ffffffff 800e95e0 00000004 800ea668 9f453bc0 00000003 9f453bb8
3b80: 9f504013 9f452000 9f504014 9f453c78 9f453c04 9f453ba0 800e9a7c 800e8e20
3ba0: 9f453bbc 9f453bb0 9f452020 80060a94 9f453c04 9f453bc0 800ea668 80060c78
3bc0: 00000001 00000000 00726176 00000003 9f504010 806993d8 800e0990 9f75c000
3be0: 9f453c78 9f453d04 9f504000 00000000 00000000 9f452000 9f453c6c 9f453c08
3c00: 800ec7a4 800e98fc 9f453c28 00000000 80060a60 9f453cf8 00000000 ffffff9c
3c20: 9f70265e 60000113 00000000 00000000 00000001 00000400 9f438000 81800040
3c40: 9f453c84 9f453d04 00000001 9f504000 ffffff9c 00000007 806787b8 81800040
3c60: 9f453cf4 9f453c70 800edd00 800ec71c 00000041 00000000 9f411b50 9f0020b8
3c80: 00726176 00000003 9f504010 806993d8 00000000 80065f74 9f4542e8 00000051
3ca0: 00000002 00000018 00000000 00000000 00000241 9f438040 00000241 00000241
3cc0: 9f504000 ffffff9c 806787b8 00000007 806787b8 81800040 00000241 9f504000
3ce0: ffffff9c 00000000 9f453d3c 9f453cf8 800de710 800edcd8 80649b3c 806491c8
3d00: 800db8dc 00000241 303081a4 00000022 00000300 00000001 806787b8 806787b8
3d20: 9f70265e 806787b8 00000007 806787b8 9f453d4c 9f453d40 800de7f4 800de608
3d40: 9f453d6c 9f453d50 8064a048 800de7e0 806787b8 806787b8 00005a10 806787b8
3d60: 9f453d8c 9f453d70 8064931c 80649f80 806787b8 00005a10 9f7025f0 00008000
3d80: 9f453db4 9f453d90 80649374 806492ec 9f700000 9f6ae040 00000000 8065fce8
3da0: 80649330 00008000 9f453df4 9f453db8 8065ffb4 8064933c 00000000 00000000
3dc0: 81800040 00008000 9f453df4 00000000 00000000 00bce825 81800040 806787b8
3de0: 806787b8 806bf9d4 9f453e44 9f453df8 80649798 8065fd08 00000000 806bf9d4
3e00: 80649198 806bf9d0 00000000 00000000 804c9744 805e7b88 805beaa0 8067ee90
3e20: 806bf9d0 00000000 806bf9cc 80649d34 80678728 806bf9d0 9f453ebc 9f453e48
3e40: 80649db0 80649620 806a0ca0 80661750 80678728 00000000 9f453e84 9f453e68
3e60: 804c9744 8006a69c 805eee98 9f453e8c ffffffff 9f453e8c 9f453ebc 9f453e98
3e80: 80661858 804c9718 805eee98 00000040 00000040 9f452000 00000005 806bf980
3ea0: 806bf980 80649d34 80678728 00000000 9f453f54 9f453ec0 800089a4 80649d40
3ec0: 9f453ee4 9f453ed0 804d0ea0 9f452000 00000000 806921fc 9f453f24 9f453ee8
3ee0: 9f453f0c 9f453ef0 80647500 8028e40c 80647558 9fffcb2d 804ee974 00000069
3f00: 9f453f54 9f453f10 80040d64 80647564 80678710 00000005 9fffcb41 00000005
3f20: 80644ed0 00000000 9f470000 8067eb44 00000005 806bf980 806bf980 80647558
3f40: 80678728 00000069 9f453f94 9f453f58 80647d00 800088a0 00000005 00000005
3f60: 80647558 00000000 8004a61c 806bf980 804c5848 00000000 00000000 00000000
3f80: 00000000 00000000 9f453fac 9f453f98 804c5860 80647bc8 00000000 00000000
3fa0: 00000000 9f453fb0 8000e9c8 804c5854 00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 5a405a7b 0a611b6a
Backtrace: 
[<800e8e14>] (lookup_fast) from [<800e9a7c>] (link_path_walk+0x18c/0x804)
 r10:9f453c78 r9:9f504014 r8:9f452000 r7:9f504013 r6:9f453bb8 r5:00000003
 r4:9f453bc0
[<800e98f0>] (link_path_walk) from [<800ec7a4>] (path_openat+0x94/0x624)
 r10:9f452000 r9:00000000 r8:00000000 r7:9f504000 r6:9f453d04 r5:9f453c78
 r4:9f75c000
[<800ec710>] (path_openat) from [<800edd00>] (do_filp_open+0x34/0x88)
 r10:81800040 r9:806787b8 r8:00000007 r7:ffffff9c r6:9f504000 r5:00000001
 r4:9f453d04
[<800edccc>] (do_filp_open) from [<800de710>] (do_sys_open+0x114/0x1d8)
 r7:00000000 r6:ffffff9c r5:9f504000 r4:00000241
[<800de5fc>] (do_sys_open) from [<800de7f4>] (SyS_open+0x20/0x24)
 r9:806787b8 r8:00000007 r7:806787b8 r6:9f70265e r5:806787b8 r4:806787b8
[<800de7d4>] (SyS_open) from [<8064a048>] (do_name+0xd4/0x230)
[<80649f74>] (do_name) from [<8064931c>] (write_buffer+0x3c/0x50)
 r7:806787b8 r6:00005a10 r5:806787b8 r4:806787b8
[<806492e0>] (write_buffer) from [<80649374>] (flush_buffer+0x44/0xa8)
 r6:00008000 r5:9f7025f0 r4:00005a10 r3:806787b8
[<80649330>] (flush_buffer) from [<8065ffb4>] (gunzip+0x2b8/0x378)
 r9:00008000 r8:80649330 r7:8065fce8 r6:00000000 r5:9f6ae040 r4:9f700000
[<8065fcfc>] (gunzip) from [<80649798>] (unpack_to_rootfs+0x184/0x2a8)
 r10:806bf9d4 r9:806787b8 r8:806787b8 r7:81800040 r6:00bce825 r5:00000000
 r4:00000000
[<80649614>] (unpack_to_rootfs) from [<80649db0>] (populate_rootfs+0x7c/0x240)
 r10:806bf9d0 r9:80678728 r8:80649d34 r7:806bf9cc r6:00000000 r5:806bf9d0
 r4:8067ee90
[<80649d34>] (populate_rootfs) from [<800089a4>] (do_one_initcall+0x110/0x160)
 r10:00000000 r9:80678728 r8:80649d34 r7:806bf980 r6:806bf980 r5:00000005
 r4:9f452000
[<80008894>] (do_one_initcall) from [<80647d00>] (kernel_init_freeable+0x144/0x1e8)
 r10:00000069 r9:80678728 r8:80647558 r7:806bf980 r6:806bf980 r5:00000005
 r4:8067eb44
[<80647bbc>] (kernel_init_freeable) from [<804c5860>] (kernel_init+0x18/0xf0)
 r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:804c5848
 r4:806bf980
[<804c5848>] (kernel_init) from [<8000e9c8>] (ret_from_fork+0x14/0x2c)
 r4:00000000 r3:00000000
Code: eb00361c e3500000 e50b0038 0a00008a (e5903038) 
---[ end trace dd723d1d10b06b5b ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

CPU0: stopping
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D       3.15.0+ #8
Backtrace: 
[<8001219c>] (dump_backtrace) from [<80012478>] (show_stack+0x18/0x1c)
 r6:00000000 r5:00000000 r4:8069fe84 r3:00000000
[<80012460>] (show_stack) from [<804cb798>] (dump_stack+0x8c/0x9c)
[<804cb70c>] (dump_stack) from [<80014fa8>] (handle_IPI+0x154/0x164)
 r5:8067fffc r4:00000000
[<80014e54>] (handle_IPI) from [<800085c0>] (gic_handle_irq+0x60/0x64)
 r6:80683f20 r5:8068aa40 r4:f400010c r3:8068d6e8
[<80008560>] (gic_handle_irq) from [<80012fc4>] (__irq_svc+0x44/0x58)
Exception stack(0x80683f20 to 0x80683f68)
3f20: 8000f5b8 80682000 00000000 00000000 8068a530 80682000 8068a4e0 806bf81e
3f40: 804d40f0 80682000 80682000 80683f74 80683f58 80683f68 80060c80 8000f5bc
3f60: 60000113 ffffffff
 r7:80683f54 r6:ffffffff r5:60000113 r4:8000f5bc
[<8000f590>] (arch_cpu_idle) from [<8005c504>] (cpu_startup_entry+0x108/0x160)
[<8005c3fc>] (cpu_startup_entry) from [<804c5838>] (rest_init+0xc8/0xd8)
 r7:80678928 r3:00000000
[<804c5770>] (rest_init) from [<80647bb0>] (start_kernel+0x374/0x380)
 r5:00000001 r4:8068a5d8
[<8064783c>] (start_kernel) from [<10008074>] (0x10008074)
 r9:412fc09a r8:1000406a r7:8068ed84 r6:80678924 r5:8068a4d0 r4:10c5387d
CPU2: stopping
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D       3.15.0+ #8
Backtrace: 
[<8001219c>] (dump_backtrace) from [<80012478>] (show_stack+0x18/0x1c)
 r6:00000000 r5:00000000 r4:8069fe84 r3:00000000
[<80012460>] (show_stack) from [<804cb798>] (dump_stack+0x8c/0x9c)
[<804cb70c>] (dump_stack) from [<80014fa8>] (handle_IPI+0x154/0x164)
 r5:8067fffc r4:00000002
[<80014e54>] (handle_IPI) from [<800085c0>] (gic_handle_irq+0x60/0x64)
 r6:9f49bf70 r5:8068aa40 r4:f400010c r3:9f476180
[<80008560>] (gic_handle_irq) from [<80012fc4>] (__irq_svc+0x44/0x58)
Exception stack(0x9f49bf70 to 0x9f49bfb8)
bf60:                                     8000f5b8 9f49a000 00000000 00000000
bf80: 8068a530 9f49a000 8068a4e0 806bf81e 804d40f0 9f49a000 9f49a000 9f49bfc4
bfa0: 9f49bfa8 9f49bfb8 80060c80 8000f5bc 60000113 ffffffff
 r7:9f49bfa4 r6:ffffffff r5:60000113 r4:8000f5bc
[<8000f590>] (arch_cpu_idle) from [<8005c504>] (cpu_startup_entry+0x108/0x160)
[<8005c3fc>] (cpu_startup_entry) from [<80014bf8>] (secondary_start_kernel+0x140/0x148)
 r7:806bfc88 r3:9f476180
[<80014ab8>] (secondary_start_kernel) from [<10008664>] (0x10008664)
 r5:00000015 r4:2f48806a
CPU3: stopping
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D       3.15.0+ #8
Backtrace: 
[<8001219c>] (dump_backtrace) from [<80012478>] (show_stack+0x18/0x1c)
 r6:00000000 r5:00000000 r4:8069fe84 r3:00000000
[<80012460>] (show_stack) from [<804cb798>] (dump_stack+0x8c/0x9c)
[<804cb70c>] (dump_stack) from [<80014fa8>] (handle_IPI+0x154/0x164)
 r5:8067fffc r4:00000003
[<80014e54>] (handle_IPI) from [<800085c0>] (gic_handle_irq+0x60/0x64)
 r6:9f49df70 r5:8068aa40 r4:f400010c r3:9f476b40
[<80008560>] (gic_handle_irq) from [<80012fc4>] (__irq_svc+0x44/0x58)
Exception stack(0x9f49df70 to 0x9f49dfb8)
df60:                                     8000f5b8 9f49c000 00000000 00000000
df80: 8068a530 9f49c000 8068a4e0 806bf81e 804d40f0 9f49c000 9f49c000 9f49dfc4
dfa0: 9f49dfa8 9f49dfb8 80060c80 8000f5bc 60000113 ffffffff
 r7:9f49dfa4 r6:ffffffff r5:60000113 r4:8000f5bc
[<8000f590>] (arch_cpu_idle) from [<8005c504>] (cpu_startup_entry+0x108/0x160)
[<8005c3fc>] (cpu_startup_entry) from [<80014bf8>] (secondary_start_kernel+0x140/0x148)
 r7:806bfc88 r3:9f476b40
[<80014ab8>] (secondary_start_kernel) from [<10008664>] (0x10008664)
 r5:00000015 r4:2f48806a
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
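
The `PC is at` lines make it easy to compare where different kernels faulted. A small Python sketch (the function name `parse_pc_line` is an assumption for this example) that splits such a line into symbol, offset and function size:

```python
import re

# Matches e.g. "PC is at lookup_fast+0x54/0x318".
_PC_RE = re.compile(
    r"PC is at (?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_pc_line(line):
    """Return (symbol, offset, size) from a 'PC is at' line, or None."""
    match = _PC_RE.search(line)
    if match is None:
        return None
    return (
        match.group("sym"),
        int(match.group("off"), 16),
        int(match.group("size"), 16),
    )


# Lines copied from the v3.13 and v3.15 reports.
v313 = parse_pc_line("PC is at lookup_fast+0x54/0x318")
v315 = parse_pc_line("PC is at lookup_fast+0x54/0x338")

# Compare symbol and offset only; the function size differs between builds.
same_spot = v313[:2] == v315[:2]
print(v313, v315, same_spot)
```

Both reports fault at lookup_fast+0x54, although the function size differs (0x318 versus 0x338), so the matching offset suggests, but does not prove, that the same source line is involved.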

[-- Attachment #4: config-v3.12.gz --]
[-- Type: application/x-gzip, Size: 12937 bytes --]

[-- Attachment #5: config-v3.13.gz --]
[-- Type: application/x-gzip, Size: 13164 bytes --]

[-- Attachment #6: config-v3.14.gz --]
[-- Type: application/x-gzip, Size: 13298 bytes --]

[-- Attachment #7: config-v3.15.gz --]
[-- Type: application/x-gzip, Size: 14686 bytes --]


Thread overview: 91+ messages
2014-06-25 13:55 Oops: 17 SMP ARM (v3.16-rc2) Mattis Lorentzon
2014-06-26 14:01 ` Russell King - ARM Linux
2014-06-26 14:44   ` Mattis Lorentzon
2014-06-26 14:44     ` Mattis Lorentzon
2014-06-26 15:14     ` Russell King - ARM Linux
2014-06-26 15:14       ` Russell King - ARM Linux
2014-06-27 11:21       ` Russell King - ARM Linux
2014-06-27 11:21         ` Russell King - ARM Linux
2014-06-27 16:16         ` Fredrik Noring
2014-06-27 16:16           ` Fredrik Noring
2014-06-27 16:31           ` Russell King - ARM Linux
2014-06-27 16:31             ` Russell King - ARM Linux
2014-06-30  6:22             ` Fredrik Noring
2014-06-30  6:22               ` Fredrik Noring
2014-06-30 12:30             ` Fredrik Noring
2014-06-30 12:30               ` Fredrik Noring
2014-06-30 13:00               ` Nathan Lynch
2014-06-30 13:00                 ` Nathan Lynch
2014-07-02  6:02             ` Fredrik Noring
2014-07-02  6:02               ` Fredrik Noring
2014-08-05 13:31             ` Mattis Lorentzon
2014-08-05 13:31               ` Mattis Lorentzon
2014-08-05 13:53               ` Fabio Estevam
2014-08-05 13:53                 ` Fabio Estevam
2014-08-06  6:48                 ` Mattis Lorentzon
2014-08-06  6:48                   ` Mattis Lorentzon
2014-08-06  9:50               ` Russell King - ARM Linux
2014-08-06  9:50                 ` Russell King - ARM Linux
2014-08-06 11:10                 ` Mattis Lorentzon
2014-08-06 11:10                   ` Mattis Lorentzon
2014-08-06 12:55                   ` Russell King - ARM Linux
2014-08-06 12:55                     ` Russell King - ARM Linux
2014-08-07 11:11                     ` Mattis Lorentzon
2014-08-07 11:11                       ` Mattis Lorentzon
2014-08-07 12:12                       ` Russell King - ARM Linux
2014-08-07 12:12                         ` Russell King - ARM Linux
2014-08-07 14:20                         ` Fabio Estevam
2014-08-07 14:20                           ` Fabio Estevam
2014-08-07 14:38                           ` Fabio Estevam
2014-08-07 14:38                             ` Fabio Estevam
2014-08-08  1:30                             ` Troy Kisky
2014-08-08  1:30                               ` Troy Kisky
2014-08-08 14:05                           ` Fabio Estevam
2014-08-08 14:05                             ` Fabio Estevam
2014-08-08 18:09                         ` Russell King - ARM Linux
2014-08-08 18:09                           ` Russell King - ARM Linux
2014-08-11 13:32                           ` Mattis Lorentzon
2014-08-11 13:32                             ` Mattis Lorentzon
2014-08-11 17:41                             ` Fabio Estevam
2014-08-11 17:41                               ` Fabio Estevam
2014-08-13 13:39                               ` Mattis Lorentzon
2014-08-13 13:39                                 ` Mattis Lorentzon
2014-08-25 10:18                                 ` Russell King - ARM Linux
2014-08-25 10:18                                   ` Russell King - ARM Linux
2014-08-26 13:11                                   ` Iain Paton
2014-08-26 13:11                                     ` Iain Paton
2014-08-14 14:43                               ` Mattis Lorentzon
2014-08-14 14:43                                 ` Mattis Lorentzon
2014-08-14 15:30                                 ` Fabio Estevam
2014-08-14 15:30                                   ` Fabio Estevam
2014-08-15  5:42                                   ` Mattis Lorentzon
2014-08-15  5:42                                     ` Mattis Lorentzon
2014-08-17 21:34                                     ` Iain Paton
2014-08-17 21:34                                       ` Iain Paton
2014-08-17 21:46                                       ` Fabio Estevam
2014-08-17 21:46                                         ` Fabio Estevam
2014-08-19  6:03                                         ` Iain Paton
2014-08-19  6:03                                           ` Iain Paton
2014-08-21  9:39                                           ` Iain Paton
2014-08-21  9:39                                             ` Iain Paton
2014-08-22  0:01                                             ` Fabio Estevam
2014-08-22  0:01                                               ` Fabio Estevam
2014-08-22  6:39                                               ` Mattis Lorentzon
2014-08-22  6:39                                                 ` Mattis Lorentzon
2014-08-22 10:36                                               ` Iain Paton
2014-08-22 10:36                                                 ` Iain Paton
2014-08-27  6:32                                                 ` Mattis Lorentzon
2014-08-27  6:32                                                   ` Mattis Lorentzon
2014-08-27 10:43                                                   ` Iain Paton
2014-08-27 10:43                                                     ` Iain Paton
2014-08-29 10:57                                                     ` Mattis Lorentzon
2014-08-29 10:57                                                       ` Mattis Lorentzon
2014-08-29 11:30                                                       ` Fabio Estevam
2014-08-29 11:30                                                         ` Fabio Estevam
2014-12-16 14:50                                                       ` Mattis Lorentzon
2014-12-16 14:50                                                         ` Mattis Lorentzon
2014-08-26 13:12                                             ` Iain Paton
2014-08-26 13:12                                               ` Iain Paton
2014-08-22  8:27                                 ` Russell King - ARM Linux
2014-08-22  8:27                                   ` Russell King - ARM Linux
2014-06-26 13:16 Mattis Lorentzon
