* sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
@ 2013-06-16 21:06 ` Meelis Roos
From: Meelis Roos @ 2013-06-16 21:06 UTC
  To: sparclinux, Linux Kernel list

Got this in 3.10-rc6 while testing a Debian unstable upgrade with aptitude. 
3.10-rc5 did not exhibit this (nor any other kernel recently tried, 
including most -rc's). Does not seem to be reproducible.

[  568.834221] ------------[ cut here ]------------
[  568.894907] WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
[  568.971594] Modules linked in: ipv6 tg3 ptp pps_core hwmon
[  569.043635] CPU: 1 PID: 2952 Comm: aptitude Not tainted 3.10.0-rc6 #85
[  569.129412] Call Trace:
[  569.161440]  [00000000004d811c] exit_mmap+0x13c/0x160
[  569.227785]  [000000000045680c] mmput.part.62+0xc/0xc0
[  569.295258]  [000000000045c25c] exit_mm+0x11c/0x180
[  569.359301]  [000000000045dc24] do_exit+0x244/0x340
[  569.423354]  [000000000045dea8] do_group_exit+0x28/0xc0
[  569.491982]  [0000000000469a08] get_signal_to_deliver+0x1c8/0x3a0
[  569.572039]  [0000000000447874] do_signal32+0x14/0x220
[  569.639526]  [000000000042c8e0] do_signal+0x2c0/0x520
[  569.705854]  [000000000042d340] do_notify_resume+0x40/0x60
[  569.777907]  [0000000000404b04] __handle_signal+0xc/0x2c
[  569.847671] ---[ end trace 4acf84f71c8b5f1b ]---
[  569.908406] BUG: Bad rss-counter state mm:fffff8000dcb6700 idx:1 val:1
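The final BUG line is, as far as I can tell, printed by check_mm() in kernel/fork.c when an mm is freed while one of its per-mm RSS counters is still non-zero. A minimal sketch for decoding such lines follows. It assumes the 3.10-era counter order MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, under which idx:1 val:1 would mean one anonymous page was left unaccounted. The helper name `decode_rss_counter` is hypothetical and not part of any kernel tool.

```python
import re

# Assumed 3.10-era order of the per-mm RSS counters; check your kernel's mm_types.h.
RSS_COUNTER_NAMES = ("MM_FILEPAGES", "MM_ANONPAGES", "MM_SWAPENTS")

_RSS_RE = re.compile(
    r"BUG: Bad rss-counter state mm:(?P<mm>[0-9a-f]+) idx:(?P<idx>\d+) val:(?P<val>-?\d+)"
)


def decode_rss_counter(line):
    """Return (mm_address, counter_name, leftover_value), or None if the line does not match."""
    m = _RSS_RE.search(line)
    if m is None:
        return None
    idx = int(m.group("idx"))
    # Fall back to the raw index if it lies outside the assumed counter list.
    name = RSS_COUNTER_NAMES[idx] if idx < len(RSS_COUNTER_NAMES) else f"idx{idx}"
    return int(m.group("mm"), 16), name, int(m.group("val"))


line = "[  569.908406] BUG: Bad rss-counter state mm:fffff8000dcb6700 idx:1 val:1"
print(decode_rss_counter(line))
```

The value is parsed as signed because a leaked counter can also end up negative.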

Full dmesg:

[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.30.4.a 2010/01/06 14:48'
[    0.000000] PROMLIB: Root node compatible: 
[    0.000000] Linux version 3.10.0-rc6 (mroos@v210) (gcc version 4.6.4 (Debian 4.6.4-2) ) #85 SMP Sun Jun 16 16:02:21 EEST 2013
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] bootconsole [earlyprom0] enabled
[    0.000000] ARCH: SUN4U
[    0.000000] Ethernet address: 00:03:ba:0a:f3:85
[    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel... done.
[    0.000000] OF stdout device is: /pci@1e,600000/isa@7/serial@0,3f8
[    0.000000] PROM: Built device tree with 77761 bytes of memory.
[    0.000000] Top of RAM: 0x10000000, Total RAM: 0x10000000
[    0.000000] Memory hole size: 0MB
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x00000000-0x0fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00000000-0x0fffffff]
[    0.000000] On node 0 totalpages: 32768
[    0.000000]   Normal zone: 256 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 32768 pages, LIFO batch:7
[    0.000000] Booting Linux...
[    0.000000] CPU CAPS: [flush,stbar,swap,muldiv,v9,ultra3,mul32,div32]
[    0.000000] CPU CAPS: [v8plus,vis,vis2]
[    0.000000] PERCPU: Embedded 6 pages/cpu @fffff8000f000000 s13440 r8192 d27520 u2097152
[    0.000000] pcpu-alloc: s13440 r8192 d27520 u2097152 alloc=1*4194304
[    0.000000] pcpu-alloc: [0] 0 1 
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
[    0.000000] Kernel command line: root=/dev/sda2 ro mem=256M debug ignore_loglevel
[    0.000000] PID hash table entries: 1024 (order: 0, 8192 bytes)
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 262144 bytes)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 131072 bytes)
[    0.000000] Sorting __ex_table...
[    0.000000] Memory: 253216k available (3248k kernel code, 1032k data, 152k init) [fffff80000000000,0000000010000000]
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	Additional per-CPU info printed with stalls.
[    0.000000] NR_IRQS:255
[    0.000000] clocksource: mult[53555555] shift[24]
[    0.000000] clockevent: mult[3126e98] shift[32]
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled, bootconsole disabled
[   33.851595] Calibrating delay using timer specific routine.. 24.00 BogoMIPS (lpj=120048)
[   33.851610] pid_max: default: 32768 minimum: 301
[   33.851772] Mount-cache hash table entries: 512
[   33.854162] CPU 0: synchronized TICK with master CPU (last diff 0 cycles, maxerr 6 cycles)
[   33.854242] Brought up 2 CPUs
[   33.854269] Testing NMI watchdog ... OK.
[   34.055163] NET: Registered protocol family 16
[   34.063864] /pci@1f,700000: TOMATILLO PCI Bus Module ver[4:0]
[   34.063883] /pci@1f,700000: PCI IO[7f601000000] MEM[7f700000000]
[   34.065866] PCI: Scanning PBM /pci@1f,700000
[   34.066071] schizo f0069c00: PCI host bridge to bus 0000:00
[   34.066093] pci_bus 0000:00: root bus resource [io  0x7f601000000-0x7f601ffffff] (bus address [0x0000-0xffffff])
[   34.066111] pci_bus 0000:00: root bus resource [mem 0x7f700000000-0x7f7ffffffff] (bus address [0x00000000-0xffffffff])
[   34.066127] pci_bus 0000:00: root bus resource [bus 00]
[   34.066228] pci 0000:00:02.0: PME# supported from D3hot
[   34.066453] pci 0000:00:02.1: PME# supported from D3hot
[   34.066764] /pci@1e,600000: TOMATILLO PCI Bus Module ver[4:0]
[   34.066778] /pci@1e,600000: PCI IO[7fe01000000] MEM[7ff00000000]
[   34.068753] PCI: Scanning PBM /pci@1e,600000
[   34.068943] schizo f00732d0: PCI host bridge to bus 0001:00
[   34.068969] pci_bus 0001:00: root bus resource [io  0x7fe01000000-0x7fe01ffffff] (bus address [0x0000-0xffffff])
[   34.068996] pci_bus 0001:00: root bus resource [mem 0x7ff00000000-0x7ffffffffff] (bus address [0x00000000-0xffffffff])
[   34.069022] pci_bus 0001:00: root bus resource [bus 00]
[   34.069281] pci 0001:00:06.0: quirk: [io  0x7fe01000800-0x7fe0100083f] claimed by ali7101 ACPI
[   34.069310] pci 0001:00:06.0: quirk: [io  0x7fe01000600-0x7fe0100061f] claimed by ali7101 SMB
[   34.069518] pci 0001:00:0a.0: PME# supported from D3cold
[   34.070045] /pci@1c,600000: TOMATILLO PCI Bus Module ver[4:0]
[   34.070065] /pci@1c,600000: PCI IO[7ce01000000] MEM[7cf00000000]
[   34.072083] PCI: Scanning PBM /pci@1c,600000
[   34.072281] schizo f007c6ac: PCI host bridge to bus 0002:00
[   34.072306] pci_bus 0002:00: root bus resource [io  0x7ce01000000-0x7ce01ffffff] (bus address [0x0000-0xffffff])
[   34.072332] pci_bus 0002:00: root bus resource [mem 0x7cf00000000-0x7cfffffffff] (bus address [0x00000000-0xffffffff])
[   34.072358] pci_bus 0002:00: root bus resource [bus 00]
[   34.072452] pci 0002:00:02.0: supports D1 D2
[   34.072674] pci 0002:00:02.1: supports D1 D2
[   34.072990] /pci@1d,700000: TOMATILLO PCI Bus Module ver[4:0]
[   34.073010] /pci@1d,700000: PCI IO[7c601000000] MEM[7c700000000]
[   34.075005] PCI: Scanning PBM /pci@1d,700000
[   34.075213] schizo f00859d4: PCI host bridge to bus 0003:00
[   34.075239] pci_bus 0003:00: root bus resource [io  0x7c601000000-0x7c601ffffff] (bus address [0x0000-0xffffff])
[   34.075266] pci_bus 0003:00: root bus resource [mem 0x7c700000000-0x7c7ffffffff] (bus address [0x00000000-0xffffffff])
[   34.075292] pci_bus 0003:00: root bus resource [bus 00]
[   34.075413] pci 0003:00:02.0: PME# supported from D3hot
[   34.075676] pci 0003:00:02.1: PME# supported from D3hot
[   34.081421] bio: create slab <bio-0> at 0
[   34.081959] vgaarb: loaded
[   34.082430] SCSI subsystem initialized
[   34.083594] /pci@1e,600000/isa@7/rtc@0,70: RTC regs at 0x7fe01000070
[   34.084869] Switching to clocksource stick
[   34.092652] NET: Registered protocol family 2
[   34.093064] TCP established hash table entries: 2048 (order: 2, 32768 bytes)
[   34.093187] TCP bind hash table entries: 2048 (order: 2, 32768 bytes)
[   34.093303] TCP: Hash tables configured (established 2048 bind 2048)
[   34.093390] TCP: reno registered
[   34.093407] UDP hash table entries: 256 (order: 0, 8192 bytes)
[   34.093451] UDP-Lite hash table entries: 256 (order: 0, 8192 bytes)
[   34.093698] NET: Registered protocol family 1
[   34.093757] pci 0001:00:07.0: Activating ISA DMA hang workarounds
[   34.093789] PCI: Enabling device: (0001:00:0a.0), cmd 2
[   34.154948] PCI: CLS 64 bytes, default 64
[   34.155142] power: Control reg at 7fe01000800
[   34.155530] chmc: UltraSPARC-IIIi memory controller at /memory-controller@0,0
[   34.155563] chmc: UltraSPARC-IIIi memory controller at /memory-controller@1,0
[   34.165715] msgmni has been set to 494
[   34.166349] io scheduler noop registered
[   34.166523] io scheduler cfq registered (default)
[   34.167195] f00aba6c: ttyS0 at MMIO 0x7fe010003f8 (irq = 15) is a 16550A
[   34.167215] Console: ttyS0 (SU)
[   42.374594] console [ttyS0] enabled
[   42.420665] f00ad5ec: ttyS1 at MMIO 0x7fe010002e8 (irq = 15) is a 16550A
[   42.509515] PCI: Enabling device: (0002:00:02.0), cmd 147
[   42.581037] sym0: <1010-66> rev 0x1 at pci 0002:00:02.0 irq 24
[   42.659821] sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
[   42.778199] sym0: SCSI BUS has been reset.
[   42.831989] scsi0 : sym-2.2.3
[   45.866159] scsi 0:0:0:0: Direct-Access     FUJITSU  MAW3073NCSUN72G  1703 PQ: 0 ANSI: 4
[   45.972582] scsi target0:0:0: tagged command queuing enabled, command queue depth 16.
[   46.075599] scsi target0:0:0: Beginning Domain Validation
[   46.152355] scsi target0:0:0: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
[   46.358085] scsi target0:0:0: Ending Domain Validation
[   50.552347] PCI: Enabling device: (0002:00:02.1), cmd 147
[   50.623892] sym1: <1010-66> rev 0x1 at pci 0002:00:02.1 irq 25
[   50.702671] sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking
[   50.821133] sym1: SCSI BUS has been reset.
[   50.874938] scsi1 : sym-2.2.3
[   58.301097] mousedev: PS/2 mouse device common for all mice
[   58.301843] sd 0:0:0:0: [sda] 143374738 512-byte logical blocks: (73.4 GB/68.3 GiB)
[   58.304753] sd 0:0:0:0: [sda] Write Protect is off
[   58.304759] sd 0:0:0:0: [sda] Mode Sense: c7 00 00 08
[   58.305897] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   58.317159]  sda: sda1 sda2 sda3 sda4
[   58.321798] sd 0:0:0:0: [sda] Attached SCSI disk
[   58.833788] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
[   58.917383] rtc_cmos rtc_cmos: no alarms, 114 bytes nvram
[   58.989328] TCP: cubic registered
[   59.032880] NET: Registered protocol family 17
[   59.091956] rtc_cmos rtc_cmos: setting system clock to 2013-06-16 13:05:00 UTC (1371387900)
[   59.204672] EXT4-fs (sda2): couldn't mount as ext3 due to feature incompatibilities
[   59.306482] EXT4-fs (sda2): couldn't mount as ext2 due to feature incompatibilities
[   59.429442] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[   59.530217] VFS: Mounted root (ext4 filesystem) readonly on device 8:2.
[   61.113258] pps_core: LinuxPPS API ver. 1 registered
[   61.178571] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[   61.299855] PTP clock support registered
[   61.378830] tg3.c:v3.132 (May 21, 2013)
[   61.429280] PCI: Enabling device: (0000:00:02.0), cmd 2
[   61.679664] tg3 0000:00:02.0 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
[   62.136797] tg3 0000:00:02.0 eth0: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:85
[   62.272986] tg3 0000:00:02.0 eth0: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[   62.401073] tg3 0000:00:02.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[   62.504008] tg3 0000:00:02.0 eth0: dma_rwctrl[763f0000] dma_mask[32-bit]
[   62.592101] PCI: Enabling device: (0000:00:02.1), cmd 2
[   62.839311] tg3 0000:00:02.1 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
[   63.295874] tg3 0000:00:02.1 eth1: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:86
[   63.431987] tg3 0000:00:02.1 eth1: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[   63.560078] tg3 0000:00:02.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[   63.663011] tg3 0000:00:02.1 eth1: dma_rwctrl[763f0000] dma_mask[32-bit]
[   63.751109] PCI: Enabling device: (0003:00:02.0), cmd 2
[   63.999292] tg3 0003:00:02.0 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
[   64.455848] tg3 0003:00:02.0 eth2: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:87
[   64.592030] tg3 0003:00:02.0 eth2: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[   64.720118] tg3 0003:00:02.0 eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[   64.823056] tg3 0003:00:02.0 eth2: dma_rwctrl[763f0000] dma_mask[32-bit]
[   64.911140] PCI: Enabling device: (0003:00:02.1), cmd 2
[   65.159296] tg3 0003:00:02.1 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
[   65.615808] tg3 0003:00:02.1 eth3: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:88
[   65.751967] tg3 0003:00:02.1 eth3: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[   65.880064] tg3 0003:00:02.1 eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[   65.982991] tg3 0003:00:02.1 eth3: dma_rwctrl[763f0000] dma_mask[32-bit]
[   66.549023] Adding 3084472k swap on /dev/sda4.  Priority:-1 extents:1 across:3084472k 
[   66.690487] EXT4-fs (sda2): re-mounted. Opts: (null)
[   66.944966] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
[   68.324377] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
[   68.447686] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null)
[   69.338117] NET: Registered protocol family 10
[   70.842065] tg3 0000:00:02.0 eth0: No firmware running
[   71.285041] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   72.946419] tg3 0000:00:02.0 eth0: Link is up at 100 Mbps, full duplex
[   73.039237] tg3 0000:00:02.0 eth0: Flow control is on for TX and on for RX
[   73.146397] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  568.834221] ------------[ cut here ]------------
[  568.894907] WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
[  568.971594] Modules linked in: ipv6 tg3 ptp pps_core hwmon
[  569.043635] CPU: 1 PID: 2952 Comm: aptitude Not tainted 3.10.0-rc6 #85
[  569.129412] Call Trace:
[  569.161440]  [00000000004d811c] exit_mmap+0x13c/0x160
[  569.227785]  [000000000045680c] mmput.part.62+0xc/0xc0
[  569.295258]  [000000000045c25c] exit_mm+0x11c/0x180
[  569.359301]  [000000000045dc24] do_exit+0x244/0x340
[  569.423354]  [000000000045dea8] do_group_exit+0x28/0xc0
[  569.491982]  [0000000000469a08] get_signal_to_deliver+0x1c8/0x3a0
[  569.572039]  [0000000000447874] do_signal32+0x14/0x220
[  569.639526]  [000000000042c8e0] do_signal+0x2c0/0x520
[  569.705854]  [000000000042d340] do_notify_resume+0x40/0x60
[  569.777907]  [0000000000404b04] __handle_signal+0xc/0x2c
[  569.847671] ---[ end trace 4acf84f71c8b5f1b ]---
[  569.908406] BUG: Bad rss-counter state mm:fffff8000dcb6700 idx:1 val:1
[  569.994271] [sched_delayed] sched: RT throttling activated
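Each call-trace entry uses the kernel's symbol+offset/size notation. For example, exit_mmap+0x13c/0x160 places the warning 0x13c bytes into a 0x160-byte exit_mmap(), roughly 90% of the way through the function. A small sketch that splits such entries follows. It assumes the sparc64 "[address] symbol+0xoffset/0xsize" layout shown above. The helper name `parse_trace_entry` is hypothetical.

```python
import re

# Matches "[00000000004d811c] exit_mmap+0x13c/0x160"; symbols may contain dots (mmput.part.62).
_TRACE_RE = re.compile(
    r"\[(?P<addr>[0-9a-f]+)\]\s+(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_trace_entry(line):
    """Return a dict with address, symbol, offset and size, or None if the line does not match."""
    m = _TRACE_RE.search(line)
    if m is None:
        return None
    return {
        "address": int(m.group("addr"), 16),
        "symbol": m.group("sym"),
        "offset": int(m.group("off"), 16),
        "size": int(m.group("size"), 16),
    }


trace = [
    "[  569.161440]  [00000000004d811c] exit_mmap+0x13c/0x160",
    "[  569.227785]  [000000000045680c] mmput.part.62+0xc/0xc0",
]
for line in trace:
    entry = parse_trace_entry(line)
    if entry is not None:
        # Position of the reported address as a fraction of the function length.
        print(f"{entry['symbol']}: {entry['offset']:#x} of {entry['size']:#x} "
              f"({entry['offset'] / entry['size']:.0%})")
```

The leading timestamp bracket is skipped automatically because it starts with a space, which the hex-only address pattern cannot match.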


-- 
Meelis Roos (mroos@linux.ee)


* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2013-06-16 21:06 ` Meelis Roos
@ 2013-06-17  5:32   ` Aaro Koskinen
From: Aaro Koskinen @ 2013-06-17  5:32 UTC
  To: Meelis Roos; +Cc: sparclinux, Linux Kernel list

Hi,

On Mon, Jun 17, 2013 at 12:06:00AM +0300, Meelis Roos wrote:
> Got this in 3.10-rc6 while testing a Debian unstable upgrade with aptitude. 
> 3.10-rc5 did not exhibit this (nor any other kernel recently tried, 
> including most -rc's). Does not seem to be reproducible.

I get this regularly on UltraSPARC during long compilations. It has been
there with all recent kernels (probably at least since 3.8). The latest
one I saw was with 3.10-rc5.

A.

> [  568.834221] ------------[ cut here ]------------
> [  568.894907] WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
> [  568.971594] Modules linked in: ipv6 tg3 ptp pps_core hwmon
> [  569.043635] CPU: 1 PID: 2952 Comm: aptitude Not tainted 3.10.0-rc6 #85
> [  569.129412] Call Trace:
> [  569.161440]  [00000000004d811c] exit_mmap+0x13c/0x160
> [  569.227785]  [000000000045680c] mmput.part.62+0xc/0xc0
> [  569.295258]  [000000000045c25c] exit_mm+0x11c/0x180
> [  569.359301]  [000000000045dc24] do_exit+0x244/0x340
> [  569.423354]  [000000000045dea8] do_group_exit+0x28/0xc0
> [  569.491982]  [0000000000469a08] get_signal_to_deliver+0x1c8/0x3a0
> [  569.572039]  [0000000000447874] do_signal32+0x14/0x220
> [  569.639526]  [000000000042c8e0] do_signal+0x2c0/0x520
> [  569.705854]  [000000000042d340] do_notify_resume+0x40/0x60
> [  569.777907]  [0000000000404b04] __handle_signal+0xc/0x2c
> [  569.847671] ---[ end trace 4acf84f71c8b5f1b ]---
> [  569.908406] BUG: Bad rss-counter state mm:fffff8000dcb6700 idx:1 val:1
> 
> Full dmesg:
> 
> [    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.30.4.a 2010/01/06 14:48'
> [    0.000000] PROMLIB: Root node compatible: 
> [    0.000000] Linux version 3.10.0-rc6 (mroos@v210) (gcc version 4.6.4 (Debian 4.6.4-2) ) #85 SMP Sun Jun 16 16:02:21 EEST 2013
> [    0.000000] debug: ignoring loglevel setting.
> [    0.000000] bootconsole [earlyprom0] enabled
> [    0.000000] ARCH: SUN4U
> [    0.000000] Ethernet address: 00:03:ba:0a:f3:85
> [    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
> [    0.000000] Remapping the kernel... done.
> [    0.000000] OF stdout device is: /pci@1e,600000/isa@7/serial@0,3f8
> [    0.000000] PROM: Built device tree with 77761 bytes of memory.
> [    0.000000] Top of RAM: 0x10000000, Total RAM: 0x10000000
> [    0.000000] Memory hole size: 0MB
> [    0.000000] Zone ranges:
> [    0.000000]   Normal   [mem 0x00000000-0x0fffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x00000000-0x0fffffff]
> [    0.000000] On node 0 totalpages: 32768
> [    0.000000]   Normal zone: 256 pages used for memmap
> [    0.000000]   Normal zone: 0 pages reserved
> [    0.000000]   Normal zone: 32768 pages, LIFO batch:7
> [    0.000000] Booting Linux...
> [    0.000000] CPU CAPS: [flush,stbar,swap,muldiv,v9,ultra3,mul32,div32]
> [    0.000000] CPU CAPS: [v8plus,vis,vis2]
> [    0.000000] PERCPU: Embedded 6 pages/cpu @fffff8000f000000 s13440 r8192 d27520 u2097152
> [    0.000000] pcpu-alloc: s13440 r8192 d27520 u2097152 alloc=1*4194304
> [    0.000000] pcpu-alloc: [0] 0 1 
> [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
> [    0.000000] Kernel command line: root=/dev/sda2 ro mem=256M debug ignore_loglevel
> [    0.000000] PID hash table entries: 1024 (order: 0, 8192 bytes)
> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 262144 bytes)
> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 131072 bytes)
> [    0.000000] Sorting __ex_table...
> [    0.000000] Memory: 253216k available (3248k kernel code, 1032k data, 152k init) [fffff80000000000,0000000010000000]
> [    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> [    0.000000] Hierarchical RCU implementation.
> [    0.000000] 	Additional per-CPU info printed with stalls.
> [    0.000000] NR_IRQS:255
> [    0.000000] clocksource: mult[53555555] shift[24]
> [    0.000000] clockevent: mult[3126e98] shift[32]
> [    0.000000] Console: colour dummy device 80x25
> [    0.000000] console [tty0] enabled, bootconsole disabled
> [   33.851595] Calibrating delay using timer specific routine.. 24.00 BogoMIPS (lpj=120048)
> [   33.851610] pid_max: default: 32768 minimum: 301
> [   33.851772] Mount-cache hash table entries: 512
> [   33.854162] CPU 0: synchronized TICK with master CPU (last diff 0 cycles, maxerr 6 cycles)
> [   33.854242] Brought up 2 CPUs
> [   33.854269] Testing NMI watchdog ... OK.
> [   34.055163] NET: Registered protocol family 16
> [   34.063864] /pci@1f,700000: TOMATILLO PCI Bus Module ver[4:0]
> [   34.063883] /pci@1f,700000: PCI IO[7f601000000] MEM[7f700000000]
> [   34.065866] PCI: Scanning PBM /pci@1f,700000
> [   34.066071] schizo f0069c00: PCI host bridge to bus 0000:00
> [   34.066093] pci_bus 0000:00: root bus resource [io  0x7f601000000-0x7f601ffffff] (bus address [0x0000-0xffffff])
> [   34.066111] pci_bus 0000:00: root bus resource [mem 0x7f700000000-0x7f7ffffffff] (bus address [0x00000000-0xffffffff])
> [   34.066127] pci_bus 0000:00: root bus resource [bus 00]
> [   34.066228] pci 0000:00:02.0: PME# supported from D3hot
> [   34.066453] pci 0000:00:02.1: PME# supported from D3hot
> [   34.066764] /pci@1e,600000: TOMATILLO PCI Bus Module ver[4:0]
> [   34.066778] /pci@1e,600000: PCI IO[7fe01000000] MEM[7ff00000000]
> [   34.068753] PCI: Scanning PBM /pci@1e,600000
> [   34.068943] schizo f00732d0: PCI host bridge to bus 0001:00
> [   34.068969] pci_bus 0001:00: root bus resource [io  0x7fe01000000-0x7fe01ffffff] (bus address [0x0000-0xffffff])
> [   34.068996] pci_bus 0001:00: root bus resource [mem 0x7ff00000000-0x7ffffffffff] (bus address [0x00000000-0xffffffff])
> [   34.069022] pci_bus 0001:00: root bus resource [bus 00]
> [   34.069281] pci 0001:00:06.0: quirk: [io  0x7fe01000800-0x7fe0100083f] claimed by ali7101 ACPI
> [   34.069310] pci 0001:00:06.0: quirk: [io  0x7fe01000600-0x7fe0100061f] claimed by ali7101 SMB
> [   34.069518] pci 0001:00:0a.0: PME# supported from D3cold
> [   34.070045] /pci@1c,600000: TOMATILLO PCI Bus Module ver[4:0]
> [   34.070065] /pci@1c,600000: PCI IO[7ce01000000] MEM[7cf00000000]
> [   34.072083] PCI: Scanning PBM /pci@1c,600000
> [   34.072281] schizo f007c6ac: PCI host bridge to bus 0002:00
> [   34.072306] pci_bus 0002:00: root bus resource [io  0x7ce01000000-0x7ce01ffffff] (bus address [0x0000-0xffffff])
> [   34.072332] pci_bus 0002:00: root bus resource [mem 0x7cf00000000-0x7cfffffffff] (bus address [0x00000000-0xffffffff])
> [   34.072358] pci_bus 0002:00: root bus resource [bus 00]
> [   34.072452] pci 0002:00:02.0: supports D1 D2
> [   34.072674] pci 0002:00:02.1: supports D1 D2
> [   34.072990] /pci@1d,700000: TOMATILLO PCI Bus Module ver[4:0]
> [   34.073010] /pci@1d,700000: PCI IO[7c601000000] MEM[7c700000000]
> [   34.075005] PCI: Scanning PBM /pci@1d,700000
> [   34.075213] schizo f00859d4: PCI host bridge to bus 0003:00
> [   34.075239] pci_bus 0003:00: root bus resource [io  0x7c601000000-0x7c601ffffff] (bus address [0x0000-0xffffff])
> [   34.075266] pci_bus 0003:00: root bus resource [mem 0x7c700000000-0x7c7ffffffff] (bus address [0x00000000-0xffffffff])
> [   34.075292] pci_bus 0003:00: root bus resource [bus 00]
> [   34.075413] pci 0003:00:02.0: PME# supported from D3hot
> [   34.075676] pci 0003:00:02.1: PME# supported from D3hot
> [   34.081421] bio: create slab <bio-0> at 0
> [   34.081959] vgaarb: loaded
> [   34.082430] SCSI subsystem initialized
> [   34.083594] /pci@1e,600000/isa@7/rtc@0,70: RTC regs at 0x7fe01000070
> [   34.084869] Switching to clocksource stick
> [   34.092652] NET: Registered protocol family 2
> [   34.093064] TCP established hash table entries: 2048 (order: 2, 32768 bytes)
> [   34.093187] TCP bind hash table entries: 2048 (order: 2, 32768 bytes)
> [   34.093303] TCP: Hash tables configured (established 2048 bind 2048)
> [   34.093390] TCP: reno registered
> [   34.093407] UDP hash table entries: 256 (order: 0, 8192 bytes)
> [   34.093451] UDP-Lite hash table entries: 256 (order: 0, 8192 bytes)
> [   34.093698] NET: Registered protocol family 1
> [   34.093757] pci 0001:00:07.0: Activating ISA DMA hang workarounds
> [   34.093789] PCI: Enabling device: (0001:00:0a.0), cmd 2
> [   34.154948] PCI: CLS 64 bytes, default 64
> [   34.155142] power: Control reg at 7fe01000800
> [   34.155530] chmc: UltraSPARC-IIIi memory controller at /memory-controller@0,0
> [   34.155563] chmc: UltraSPARC-IIIi memory controller at /memory-controller@1,0
> [   34.165715] msgmni has been set to 494
> [   34.166349] io scheduler noop registered
> [   34.166523] io scheduler cfq registered (default)
> [   34.167195] f00aba6c: ttyS0 at MMIO 0x7fe010003f8 (irq = 15) is a 16550A
> [   34.167215] Console: ttyS0 (SU)
> [   42.374594] console [ttyS0] enabled
> [   42.420665] f00ad5ec: ttyS1 at MMIO 0x7fe010002e8 (irq = 15) is a 16550A
> [   42.509515] PCI: Enabling device: (0002:00:02.0), cmd 147
> [   42.581037] sym0: <1010-66> rev 0x1 at pci 0002:00:02.0 irq 24
> [   42.659821] sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
> [   42.778199] sym0: SCSI BUS has been reset.
> [   42.831989] scsi0 : sym-2.2.3
> [   45.866159] scsi 0:0:0:0: Direct-Access     FUJITSU  MAW3073NCSUN72G  1703 PQ: 0 ANSI: 4
> [   45.972582] scsi target0:0:0: tagged command queuing enabled, command queue depth 16.
> [   46.075599] scsi target0:0:0: Beginning Domain Validation
> [   46.152355] scsi target0:0:0: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
> [   46.358085] scsi target0:0:0: Ending Domain Validation
> [   50.552347] PCI: Enabling device: (0002:00:02.1), cmd 147
> [   50.623892] sym1: <1010-66> rev 0x1 at pci 0002:00:02.1 irq 25
> [   50.702671] sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking
> [   50.821133] sym1: SCSI BUS has been reset.
> [   50.874938] scsi1 : sym-2.2.3
> [   58.301097] mousedev: PS/2 mouse device common for all mice
> [   58.301843] sd 0:0:0:0: [sda] 143374738 512-byte logical blocks: (73.4 GB/68.3 GiB)
> [   58.304753] sd 0:0:0:0: [sda] Write Protect is off
> [   58.304759] sd 0:0:0:0: [sda] Mode Sense: c7 00 00 08
> [   58.305897] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> [   58.317159]  sda: sda1 sda2 sda3 sda4
> [   58.321798] sd 0:0:0:0: [sda] Attached SCSI disk
> [   58.833788] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
> [   58.917383] rtc_cmos rtc_cmos: no alarms, 114 bytes nvram
> [   58.989328] TCP: cubic registered
> [   59.032880] NET: Registered protocol family 17
> [   59.091956] rtc_cmos rtc_cmos: setting system clock to 2013-06-16 13:05:00 UTC (1371387900)
> [   59.204672] EXT4-fs (sda2): couldn't mount as ext3 due to feature incompatibilities
> [   59.306482] EXT4-fs (sda2): couldn't mount as ext2 due to feature incompatibilities
> [   59.429442] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
> [   59.530217] VFS: Mounted root (ext4 filesystem) readonly on device 8:2.
> [   61.113258] pps_core: LinuxPPS API ver. 1 registered
> [   61.178571] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
> [   61.299855] PTP clock support registered
> [   61.378830] tg3.c:v3.132 (May 21, 2013)
> [   61.429280] PCI: Enabling device: (0000:00:02.0), cmd 2
> [   61.679664] tg3 0000:00:02.0 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
> [   62.136797] tg3 0000:00:02.0 eth0: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:85
> [   62.272986] tg3 0000:00:02.0 eth0: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
> [   62.401073] tg3 0000:00:02.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> [   62.504008] tg3 0000:00:02.0 eth0: dma_rwctrl[763f0000] dma_mask[32-bit]
> [   62.592101] PCI: Enabling device: (0000:00:02.1), cmd 2
> [   62.839311] tg3 0000:00:02.1 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
> [   63.295874] tg3 0000:00:02.1 eth1: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:86
> [   63.431987] tg3 0000:00:02.1 eth1: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
> [   63.560078] tg3 0000:00:02.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> [   63.663011] tg3 0000:00:02.1 eth1: dma_rwctrl[763f0000] dma_mask[32-bit]
> [   63.751109] PCI: Enabling device: (0003:00:02.0), cmd 2
> [   63.999292] tg3 0003:00:02.0 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
> [   64.455848] tg3 0003:00:02.0 eth2: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:87
> [   64.592030] tg3 0003:00:02.0 eth2: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
> [   64.720118] tg3 0003:00:02.0 eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> [   64.823056] tg3 0003:00:02.0 eth2: dma_rwctrl[763f0000] dma_mask[32-bit]
> [   64.911140] PCI: Enabling device: (0003:00:02.1), cmd 2
> [   65.159296] tg3 0003:00:02.1 (unregistered net_device): Cannot get nvram lock, tg3_nvram_init failed
> [   65.615808] tg3 0003:00:02.1 eth3: Tigon3 [partno(none) rev 2003] (PCI:66MHz:64-bit) MAC address 00:03:ba:0a:f3:88
> [   65.751967] tg3 0003:00:02.1 eth3: attached PHY is 5704 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
> [   65.880064] tg3 0003:00:02.1 eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> [   65.982991] tg3 0003:00:02.1 eth3: dma_rwctrl[763f0000] dma_mask[32-bit]
> [   66.549023] Adding 3084472k swap on /dev/sda4.  Priority:-1 extents:1 across:3084472k 
> [   66.690487] EXT4-fs (sda2): re-mounted. Opts: (null)
> [   66.944966] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
> [   68.324377] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
> [   68.447686] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null)
> [   69.338117] NET: Registered protocol family 10
> [   70.842065] tg3 0000:00:02.0 eth0: No firmware running
> [   71.285041] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   72.946419] tg3 0000:00:02.0 eth0: Link is up at 100 Mbps, full duplex
> [   73.039237] tg3 0000:00:02.0 eth0: Flow control is on for TX and on for RX
> [   73.146397] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [  568.834221] ------------[ cut here ]------------
> [  568.894907] WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
> [  568.971594] Modules linked in: ipv6 tg3 ptp pps_core hwmon
> [  569.043635] CPU: 1 PID: 2952 Comm: aptitude Not tainted 3.10.0-rc6 #85
> [  569.129412] Call Trace:
> [  569.161440]  [00000000004d811c] exit_mmap+0x13c/0x160
> [  569.227785]  [000000000045680c] mmput.part.62+0xc/0xc0
> [  569.295258]  [000000000045c25c] exit_mm+0x11c/0x180
> [  569.359301]  [000000000045dc24] do_exit+0x244/0x340
> [  569.423354]  [000000000045dea8] do_group_exit+0x28/0xc0
> [  569.491982]  [0000000000469a08] get_signal_to_deliver+0x1c8/0x3a0
> [  569.572039]  [0000000000447874] do_signal32+0x14/0x220
> [  569.639526]  [000000000042c8e0] do_signal+0x2c0/0x520
> [  569.705854]  [000000000042d340] do_notify_resume+0x40/0x60
> [  569.777907]  [0000000000404b04] __handle_signal+0xc/0x2c
> [  569.847671] ---[ end trace 4acf84f71c8b5f1b ]---
> [  569.908406] BUG: Bad rss-counter state mm:fffff8000dcb6700 idx:1 val:1
> [  569.994271] [sched_delayed] sched: RT throttling activated
> 
> 
> -- 
> Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2013-06-17  5:32   ` Aaro Koskinen
@ 2013-06-17  5:58     ` Aaro Koskinen
  -1 siblings, 0 replies; 60+ messages in thread
From: Aaro Koskinen @ 2013-06-17  5:58 UTC (permalink / raw)
  To: Meelis Roos; +Cc: sparclinux, Linux Kernel list

Hi,

On Mon, Jun 17, 2013 at 08:32:25AM +0300, Aaro Koskinen wrote:
> On Mon, Jun 17, 2013 at 12:06:00AM +0300, Meelis Roos wrote:
> > Got this in 3.10-rc6 while testing debian unstable upgrade with aptitude. 
> > 3.10-rc5 did not exhibit this (nor any other kernel recently tried, 
> > including most -rc's). Does not seem to be reproducible.
> 
> I get this regularly on UltraSPARC during long compilations. It has been
> there with all recent kernels (probably since at least 3.8). The latest I
> saw was with 3.10-rc5.

Two examples:

[  417.006586] ------------[ cut here ]------------
[  417.065813] WARNING: at /home/aaro/los/work/shared/linux-v3.10-rc5/mm/mmap.c:2757 exit_mmap+0x134/0x160()
[  417.189209] Modules linked in:
[  417.229203] CPU: 0 PID: 1787 Comm: ld Not tainted 3.10.0-rc5-ultra #1
[  417.310031] Call Trace:
[  417.342875]  [00000000004b5ef4] exit_mmap+0x134/0x160
[  417.406941]  [000000000044ef40] mmput+0x40/0xe0
[  417.464591]  [0000000000454b38] do_exit+0x1b8/0x800
[  417.526429]  [0000000000455d6c] do_group_exit+0x2c/0xa0
[  417.592383]  [0000000000455df4] SyS_exit_group+0x14/0x20
[  417.659257]  [0000000000406074] linux_sparc_syscall32+0x34/0x40
[  417.733400] ---[ end trace b92a93fbf6d0204a ]---
[  417.791913] BUG: Bad rss-counter state mm:fffff8001ebbb740 idx:1 val:1

[ 1674.164634] ------------[ cut here ]------------
[ 1674.218933] WARNING: at /home/aaro/los/work/shared/linux-v3.10-rc5/mm/mmap.c:2757 exit_mmap+0x134/0x160()
[ 1674.333505] Modules linked in:
[ 1674.369872] CPU: 0 PID: 26306 Comm: date Not tainted 3.10.0-rc5-ultra #1
[ 1674.450075] Call Trace:
[ 1674.479245]  [00000000004b5ef4] exit_mmap+0x134/0x160
[ 1674.539661]  [000000000044ef40] mmput+0x40/0xe0
[ 1674.593827]  [0000000000454b38] do_exit+0x1b8/0x800
[ 1674.652140]  [0000000000455d6c] do_group_exit+0x2c/0xa0
[ 1674.714630]  [0000000000455df4] SyS_exit_group+0x14/0x20
[ 1674.778172]  [0000000000406074] linux_sparc_syscall32+0x34/0x40
[ 1674.848993] ---[ end trace 77928f0ca6684101 ]---
[ 1674.904199] BUG: Bad rss-counter state mm:fffff8000f0c9d40 idx:0 val:1

A.
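For readers decoding the last line of these reports: "Bad rss-counter state" is printed when one of the mm's per-type resident-page counters is still non-zero as the mm is finally freed. The helper below is an illustrative sketch, not part of any kernel tool. It assumes the counter order used by kernels of this era (MM_FILEPAGES = 0, MM_ANONPAGES = 1, MM_SWAPENTS = 2) and sparc64's 8 KiB base page size. Under those assumptions, idx:1 means leftover anonymous pages and idx:0 means leftover file-backed pages.

```c
#include <stdio.h>
#include <string.h>

/* Assumed counter order for 3.10-3.15 era kernels; newer kernels add more. */
static const char *rss_counter_name(int idx)
{
	static const char *const names[] = {
		"MM_FILEPAGES", "MM_ANONPAGES", "MM_SWAPENTS"
	};

	if (idx < 0 || idx >= (int)(sizeof(names) / sizeof(names[0])))
		return "unknown";
	return names[idx];
}

/*
 * Extract idx and val from a line such as
 * "[ 417.791913] BUG: Bad rss-counter state mm:fffff8001ebbb740 idx:1 val:1".
 * Returns 1 on success, 0 if the line does not match.
 */
static int parse_bad_rss(const char *line, int *idx, long *val)
{
	const char *p = strstr(line, "Bad rss-counter state mm:");

	if (p == NULL)
		return 0;
	p = strstr(p, " idx:");
	if (p == NULL)
		return 0;
	return sscanf(p, " idx:%d val:%ld", idx, val) == 2;
}

/* Assumes 8 KiB base pages (sparc64 PAGE_SHIFT 13). */
static long rss_pages_to_kib(long pages)
{
	return pages * 8;
}
```

With these assumptions, the "idx:1 val:512" line in the 3.12-rc5 report below decodes as 512 leftover MM_ANONPAGES pages, or 4096 KiB.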

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2013-06-17  5:58     ` Aaro Koskinen
@ 2013-08-03 20:40       ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2013-08-03 20:40 UTC (permalink / raw)
  To: aaro.koskinen; +Cc: mroos, sparclinux, linux-kernel

From: Aaro Koskinen <aaro.koskinen@iki.fi>
Date: Mon, 17 Jun 2013 08:58:39 +0300

> On Mon, Jun 17, 2013 at 08:32:25AM +0300, Aaro Koskinen wrote:
>> On Mon, Jun 17, 2013 at 12:06:00AM +0300, Meelis Roos wrote:
>> > Got this in 3.10-rc6 while testing debian unstable upgrade with aptitude.
>> > 3.10-rc5 did not exhibit this (nor any other kernel recently tried, 
>> > including most -rc's). Does not seem to be reproducible.
>> 
>> I get this regularly on Ultrasparc during long compilations. It's been
>> there with all recent kernels (probably at least since 3.8). Latest I
>> saw with 3.10-rc5.
> 
> Two examples:

Thanks for the reports, I'm actively looking into this.

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2013-08-03 20:40       ` David Miller
@ 2013-10-22 17:46         ` Aaro Koskinen
  -1 siblings, 0 replies; 60+ messages in thread
From: Aaro Koskinen @ 2013-10-22 17:46 UTC (permalink / raw)
  To: David Miller; +Cc: mroos, sparclinux, linux-kernel

Hi,

On Sat, Aug 03, 2013 at 01:40:42PM -0700, David Miller wrote:
> > On Mon, Jun 17, 2013 at 08:32:25AM +0300, Aaro Koskinen wrote:
> >> On Mon, Jun 17, 2013 at 12:06:00AM +0300, Meelis Roos wrote:
> >> > Got this in 3.10-rc6 while testing debian unstable upgrade with aptitude.
> >> > 3.10-rc5 did not exhibit this (nor any other kernel recently tried, 
> >> > including most -rc's). Does not seem to be reproducible.
> >> 
> >> I get this regularly on Ultrasparc during long compilations. It's been
> >> there with all recent kernels (probably at least since 3.8). Latest I
> >> saw with 3.10-rc5.
> > 
> > Two examples:
> 
> Thanks for the reports, I'm actively looking into this.

Got this again with 3.12-rc5 while doing GCC 4.8.2 bootstrap (during
make check phase):

[83998.025998] ------------[ cut here ]------------
[83998.080312] WARNING: CPU: 0 PID: 3983 at /home/aaro/los/work/shared/linux-v3.12-rc5/mm/mmap.c:2729 exit_mmap+0x138/0x160()
[83998.212541] Modules linked in:
[83998.248987] CPU: 0 PID: 3983 Comm: expect Not tainted 3.12.0-rc5-ultra-los.git-d7b26d7-dirty #1
[83998.353171] Call Trace:
[83998.382310]  [00000000004b79d8] exit_mmap+0x138/0x160
[83998.442723]  [00000000004503cc] mmput+0x2c/0xc0
[83998.496885]  [00000000004cf338] flush_old_exec+0x418/0x520
[83998.562508]  [000000000051294c] load_elf_binary+0x20c/0x1660
[83998.630194]  [00000000004ce9f8] search_binary_handler+0x78/0x200
[83998.702061]  [00000000004cfc1c] do_execve_common.isra.48+0x3dc/0x500
[83998.778105]  [00000000004cffcc] compat_sys_execve+0x2c/0x60
[83998.844758]  [0000000000406074] linux_sparc_syscall32+0x34/0x40
[83998.915565] ---[ end trace 1f66da8de6eddeb8 ]---
[83998.970773] BUG: Bad rss-counter state mm:fffff80019c4c000 idx:1 val:512
[85190.371621] ld[17707]: segfault at 58 ip 00000000f7d0d164 (rpc 00000000f7d0cf90) sp 00000000ffc36a20 error 30001 in libbfd-2.23.2.so[f7cc8000+ba000]

The box hasn't died yet... Let's hope it lets the GCC testsuite finish.

A.

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2013-10-22 17:46         ` Aaro Koskinen
@ 2013-10-22 17:54           ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2013-10-22 17:54 UTC (permalink / raw)
  To: aaro.koskinen; +Cc: mroos, sparclinux, linux-kernel

From: Aaro Koskinen <aaro.koskinen@iki.fi>
Date: Tue, 22 Oct 2013 20:46:12 +0300

> Hi,
> 
> On Sat, Aug 03, 2013 at 01:40:42PM -0700, David Miller wrote:
>> > On Mon, Jun 17, 2013 at 08:32:25AM +0300, Aaro Koskinen wrote:
>> >> On Mon, Jun 17, 2013 at 12:06:00AM +0300, Meelis Roos wrote:
>> >> > Got this in 3.10-rc6 while testing debian unstable upgrade with aptitude.
>> >> > 3.10-rc5 did not exhibit this (nor any other kernel recently tried, 
>> >> > including most -rc's). Does not seem to be reproducible.
>> >> 
>> >> I get this regularly on Ultrasparc during long compilations. It's been
>> >> there with all recent kernels (probably at least since 3.8). Latest I
>> >> saw with 3.10-rc5.
>> > 
>> > Two examples:
>> 
>> Thanks for the reports, I'm actively looking into this.
> 
> Got this again with 3.12-rc5 while doing GCC 4.8.2 bootstrap (during
> make check phase):
 ...
> The box hasn't died yet... Let's hope it lets the GCC testsuite finish.

Thanks for reporting this again; it's still on my long TODO list to look
into.

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2013-10-22 17:54           ` David Miller
@ 2014-04-14 18:43             ` Aaro Koskinen
  -1 siblings, 0 replies; 60+ messages in thread
From: Aaro Koskinen @ 2014-04-14 18:43 UTC (permalink / raw)
  To: David Miller; +Cc: mroos, sparclinux, linux-kernel, Hugh Dickins

Hi,

Just for the archives, I got one of these again with 3.14:

[68674.536190] ------------[ cut here ]------------
[68674.590467] WARNING: CPU: 0 PID: 14600 at /home/aaro/los/work/shared/linux-v3.14/mm/mmap.c:2738 exit_mmap+0x138/0x160()
[68674.719635] Modules linked in:
[68674.756022] CPU: 0 PID: 14600 Comm: rm Not tainted 3.14.0-ultra-los_0a2b #1
[68674.839349] Call Trace:
[68674.868507]  [00000000004b9c78] exit_mmap+0x138/0x160
[68674.928931]  [00000000004503cc] mmput+0x2c/0xc0
[68674.983103]  [0000000000452e98] do_exit+0x1b8/0x800
[68675.041409]  [000000000045406c] do_group_exit+0x2c/0xa0
[68675.103897]  [00000000004540f4] SyS_exit_group+0x14/0x20
[68675.167439]  [0000000000406074] linux_sparc_syscall32+0x34/0x60
[68675.238258] ---[ end trace 8a52741fbdb89d8e ]---
[68675.293470] BUG: Bad rss-counter state mm:ffffff001df3d900 idx:1 val:1

A.

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-04-14 18:43             ` Aaro Koskinen
@ 2014-04-14 18:58               ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2014-04-14 18:58 UTC (permalink / raw)
  To: aaro.koskinen; +Cc: mroos, sparclinux, linux-kernel, hughd

From: Aaro Koskinen <aaro.koskinen@iki.fi>
Date: Mon, 14 Apr 2014 21:43:53 +0300

> Just for the archives, I got one of these again with 3.14:
> 
> [68674.536190] ------------[ cut here ]------------
> [68674.590467] WARNING: CPU: 0 PID: 14600 at /home/aaro/los/work/shared/linux-v3.14/mm/mmap.c:2738 exit_mmap+0x138/0x160()
> [68674.719635] Modules linked in:
> [68674.756022] CPU: 0 PID: 14600 Comm: rm Not tainted 3.14.0-ultra-los_0a2b #1
> [68674.839349] Call Trace:
> [68674.868507]  [00000000004b9c78] exit_mmap+0x138/0x160
> [68674.928931]  [00000000004503cc] mmput+0x2c/0xc0
> [68674.983103]  [0000000000452e98] do_exit+0x1b8/0x800
> [68675.041409]  [000000000045406c] do_group_exit+0x2c/0xa0
> [68675.103897]  [00000000004540f4] SyS_exit_group+0x14/0x20
> [68675.167439]  [0000000000406074] linux_sparc_syscall32+0x34/0x60
> [68675.238258] ---[ end trace 8a52741fbdb89d8e ]---
> [68675.293470] BUG: Bad rss-counter state mm:ffffff001df3d900 idx:1 val:1

Yes, I have reports of this going back several releases, and I have started trying to figure
out what causes this.

I suspect there is something that runs during exit_mmap() that indirectly faults in
new pages, and that's how the rss-counter ends up being non-zero at the end of
exit_mmap().
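The mechanism suspected above can be illustrated with a deliberately simplified model. This is a hypothetical userspace simulation, not kernel code: toy_mm, toy_fault() and toy_unmap_range() are invented names, and one array slot stands in for one page-table entry. Teardown walks the slots and decrements the anon counter for each page it unmaps; if a fault installs a page after the walk has already passed, the counter is incremented but never decremented again, which has the same shape as the "idx:1 val:1" reports.

```c
#include <stdio.h>

/* Hypothetical toy model, not kernel code. Counter order mirrors the
 * assumed MM_FILEPAGES / MM_ANONPAGES / MM_SWAPENTS layout. */
enum { TOY_FILEPAGES, TOY_ANONPAGES, TOY_SWAPENTS, TOY_NR_COUNTERS };

#define TOY_NPTES 8

struct toy_mm {
	int present[TOY_NPTES];      /* 1 if the "pte" maps a page */
	long rss[TOY_NR_COUNTERS];   /* per-type resident page counts */
};

/* Fault in an anonymous page at slot i, bumping the counter once. */
static void toy_fault(struct toy_mm *mm, int i)
{
	if (!mm->present[i]) {
		mm->present[i] = 1;
		mm->rss[TOY_ANONPAGES]++;
	}
}

/* Unmap every present slot in [start, end), decrementing as it goes. */
static void toy_unmap_range(struct toy_mm *mm, int start, int end)
{
	for (int i = start; i < end; i++) {
		if (mm->present[i]) {
			mm->present[i] = 0;
			mm->rss[TOY_ANONPAGES]--;
		}
	}
}

/* Print and return the first non-zero counter index, or -1 if clean. */
static int toy_check_mm(const struct toy_mm *mm)
{
	for (int i = 0; i < TOY_NR_COUNTERS; i++) {
		if (mm->rss[i] != 0) {
			printf("BUG: Bad rss-counter state idx:%d val:%ld\n",
			       i, mm->rss[i]);
			return i;
		}
	}
	return -1;
}

/* Simulate exit: map 4 pages, tear everything down, and optionally let a
 * stray fault land after the teardown walk has already finished. */
static int toy_exit(int stray_fault)
{
	struct toy_mm mm = { { 0 }, { 0 } };

	for (int i = 0; i < 4; i++)
		toy_fault(&mm, i);
	toy_unmap_range(&mm, 0, TOY_NPTES);
	if (stray_fault)
		toy_fault(&mm, 5);
	return toy_check_mm(&mm);
}
```

Calling toy_exit(0) finds all counters at zero, while toy_exit(1) reports idx 1 with a value of 1.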

I'll let you know if I figure out exactly what the problem is.

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-04-14 18:43             ` Aaro Koskinen
@ 2014-04-16 18:58               ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2014-04-16 18:58 UTC (permalink / raw)
  To: aaro.koskinen; +Cc: mroos, sparclinux, linux-kernel, hughd

From: Aaro Koskinen <aaro.koskinen@iki.fi>
Date: Mon, 14 Apr 2014 21:43:53 +0300

> Just for the archives, I got one of these again with 3.14:

Meelis and Aaro, thanks again for all of your reports.

After poring over a lot of the data and auditing some code, I'm
suspecting it's a problem with transparent huge pages.

One thing you two can do to help me further confirm this is to run
with THP disabled for a while and see if you still get the log
messages.

Simply, as root:

bash# echo "never" >/sys/kernel/mm/transparent_hugepage/enabled

And then do your gcc bootstraps or whatever else seems to usually
run when you trigger this problem.
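Before and after running the command above, it can help to confirm which THP mode is actually in effect. That sysfs file lists every mode and puts the active one in brackets (for example "always madvise [never]"). The helper below is a minimal sketch that extracts the bracketed word; thp_active_mode() and thp_read_mode() are invented names, and the code assumes the file is simply absent when CONFIG_TRANSPARENT_HUGEPAGE is not set, treating that as "no mode".

```c
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (active) mode from e.g. "always madvise [never]"
 * into out. Returns 1 on success, 0 if no bracketed word fits. */
static int thp_active_mode(const char *contents, char *out, size_t outlen)
{
	const char *open = strchr(contents, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (open == NULL || close == NULL || outlen == 0)
		return 0;
	len = (size_t)(close - open - 1);
	if (len >= outlen)
		return 0;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 1;
}

/* Read the live setting; returns 0 if the file is absent or unreadable
 * (assumed to mean THP is not built into this kernel). */
static int thp_read_mode(char *out, size_t outlen)
{
	char buf[128];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

	if (f == NULL)
		return 0;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return 0;
	}
	fclose(f);
	return thp_active_mode(buf, out, outlen);
}
```

After the echo above, thp_read_mode() would be expected to yield "never".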

Thanks!

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-04-16 18:58               ` David Miller
@ 2014-04-16 22:22                 ` mroos
  -1 siblings, 0 replies; 60+ messages in thread
From: mroos @ 2014-04-16 22:22 UTC (permalink / raw)
  To: David Miller; +Cc: aaro.koskinen, sparclinux, Linux Kernel list, hughd

> > Just for the archives, I got one of these again with 3.14:
> 
> Meelis and Aaro, thanks again for all of your reports.
> 
> After poring over a lot of the data and auditing some code, I'm
> suspecting it's a problem with transparent huge pages.
> 
> One thing you two can do to help me further confirm this is to run
> with THP disabled for a while and see if you still get the log
> messages.

I have since turned off CONFIG_TRANSPARENT_HUGEPAGE on 3 of 4 servers
that had this problem (actually most of my sparc64 machines) and the 4th 
has

CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set

and also has not had this problem since then. All 4 machines have been 
running through most -rc's of every kernel.
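When comparing configurations like the one above across machines, note the two forms Kconfig writes into a .config: "CONFIG_FOO=y" (or "=m") for an enabled option and "# CONFIG_FOO is not set" for a disabled one. The classifier below is an illustrative sketch (kconfig_state() is an invented name) that tells the two apart without confusing CONFIG_TRANSPARENT_HUGEPAGE with the longer CONFIG_TRANSPARENT_HUGEPAGE_MADVISE.

```c
#include <string.h>

/*
 * Classify one .config line for symbol sym:
 *   1  -> "sym=y" or "sym=m" (enabled)
 *   0  -> "# sym is not set" (or "sym=n")
 *  -1  -> the line is about some other symbol
 */
static int kconfig_state(const char *line, const char *sym)
{
	size_t n = strlen(sym);

	/* Require '=' right after the name so prefixes do not match. */
	if (strncmp(line, sym, n) == 0 && line[n] == '=')
		return line[n + 1] == 'y' || line[n + 1] == 'm';

	/* Same idea for the commented-out form. */
	if (strncmp(line, "# ", 2) == 0 &&
	    strncmp(line + 2, sym, n) == 0 &&
	    strncmp(line + 2 + n, " is not set", 11) == 0)
		return 0;

	return -1;
}
```

Applied to the fragment above, CONFIG_TRANSPARENT_HUGEPAGE is classified as enabled and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS as not set.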

-- 
Meelis Roos <mroos@linux.ee>

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-04-16 22:22                 ` mroos
@ 2014-04-16 22:49                   ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2014-04-16 22:49 UTC (permalink / raw)
  To: mroos; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd

From: mroos@linux.ee
Date: Thu, 17 Apr 2014 01:22:17 +0300 (EEST)

>> > Just for the archives, I got one of these again with 3.14:
>> 
>> Meelis and Aaro, thanks again for all of your reports.
>> 
>> After poring over a lot of the data and auditing some code, I'm
>> suspecting it's a problem with transparent huge pages.
>> 
>> One thing you two can do to help me further confirm this is to run
>> with THP disabled for a while and see if you still get the log
>> messages.
> 
> I have since turned off CONFIG_TRANSPARENT_HUGEPAGE on 3 of 4 servers
> that had this problem (actually most of my sparc64 machines) and the 4th 
> has
 ...
> and also has not had this problem since then. All 4 machines have been 
> running through most -rc's of every kernel.

Thanks, this is a very useful datapoint.

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-04-16 18:58               ` David Miller
@ 2014-04-25 20:09                 ` Aaro Koskinen
  -1 siblings, 0 replies; 60+ messages in thread
From: Aaro Koskinen @ 2014-04-25 20:09 UTC (permalink / raw)
  To: David Miller; +Cc: mroos, sparclinux, linux-kernel, hughd

Hi,

On Wed, Apr 16, 2014 at 02:58:22PM -0400, David Miller wrote:
> One thing you two can do to help me further confirm this is to run
> with THP disabled for a while and see if you still get the log
> messages.
> 
> Simply, as root:
> 
> bash# echo "never" >/sys/kernel/mm/transparent_hugepage/enabled
> 
> And then do your gcc bootstraps or whatever else seems to usually
> run when you trigger this problem.

I'm running my Ultras with "# CONFIG_TRANSPARENT_HUGEPAGE is not set"
and I still see the issue.

I tried reproducing the bug with the function tracer running. It works,
but reproducing the bug takes several days... This time it was an "expect"
segfault during the GCC testsuite that triggered the bug.

For the test, I added tracing_off() after the "Bad rss-counter
state" printout. Now I see it should probably be done earlier, as the warning/bug
printouts are polluting the trace.
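Why stopping the trace earlier matters can be shown with a toy fixed-size ring buffer. This is a hypothetical model, not ftrace: struct ring, rb_write(), rb_dump() and trace_bug() are invented here, and the frozen flag plays the role that tracing_off() plays in the kernel. Because the buffer overwrites its oldest entries, everything recorded after the bug is detected (such as the function calls made while printing the WARNING and BUG messages) can push out the events that led up to it unless recording stops first.

```c
/* Hypothetical toy model, not ftrace. */
#define RB_SIZE 4

struct ring {
	int ev[RB_SIZE];
	unsigned head;   /* total number of events ever written */
	int frozen;      /* once set, writes are dropped */
};

static void rb_write(struct ring *rb, int event)
{
	if (rb->frozen)
		return;
	rb->ev[rb->head % RB_SIZE] = event;
	rb->head++;
}

/* Copy surviving events oldest-to-newest into out; returns the count. */
static unsigned rb_dump(const struct ring *rb, int *out)
{
	unsigned n = rb->head < RB_SIZE ? rb->head : RB_SIZE;
	unsigned start = rb->head - n;

	for (unsigned i = 0; i < n; i++)
		out[i] = rb->ev[(start + i) % RB_SIZE];
	return n;
}

/* Record "interesting" events 1..3, detect the bug, then log four
 * events (100..103) standing in for the printout. stop_first decides
 * whether recording stops before or after that printout. */
static unsigned trace_bug(int stop_first, int *out)
{
	struct ring rb = { { 0 }, 0, 0 };

	for (int e = 1; e <= 3; e++)
		rb_write(&rb, e);
	if (stop_first)
		rb.frozen = 1;
	for (int e = 100; e < 104; e++)
		rb_write(&rb, e);
	rb.frozen = 1;
	return rb_dump(&rb, out);
}
```

With stop_first set, the dump still holds events 1, 2 and 3; without it, only the printout events 100 to 103 survive.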

Anyway, the results are here:

	http://www.iki.fi/aaro/junk/linux-3.14-sparc-mm-bug-trace.txt
	http://www.iki.fi/aaro/junk/linux-3.14-sparc-mm-bug-trace-dmesg.txt

A.

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-04-25 20:09                 ` Aaro Koskinen
@ 2014-04-25 20:17                   ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2014-04-25 20:17 UTC (permalink / raw)
  To: aaro.koskinen; +Cc: mroos, sparclinux, linux-kernel, hughd

From: Aaro Koskinen <aaro.koskinen@iki.fi>
Date: Fri, 25 Apr 2014 23:09:08 +0300

> On Wed, Apr 16, 2014 at 02:58:22PM -0400, David Miller wrote:
>> One thing you two can do to help me further confirm this is to run
>> with THP disabled for a while and see if you still get the log
>> messages.
>> 
>> Simply, as root:
>> 
>> bash# echo "never" >/sys/kernel/mm/transparent_hugepage/enabled
>> 
>> And then do your gcc bootstraps or whatever else seems to usually
>> run when you trigger this problem.
> 
> I'm running my Ultras with "# CONFIG_TRANSPARENT_HUGEPAGE is not set"
> and I still see the issue.

Thanks, that's an important datapoint.

> I tried reproducing the bug with the function tracer running. It works,
> but reproducing the bug takes several days... This time it was an "expect"
> segfault during the GCC testsuite that triggered the bug.
> 
> For the test, I added tracing_off() after the "Bad rss-counter
> state" printout. Now I see it should probably be done earlier, as the warning/bug
> printouts are polluting the trace.
> 
> Anyway, the results are here:
> 
> 	http://www.iki.fi/aaro/junk/linux-3.14-sparc-mm-bug-trace.txt
> 	http://www.iki.fi/aaro/junk/linux-3.14-sparc-mm-bug-trace-dmesg.txt

Thanks a lot for doing this, I'll take a look.

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-04-25 20:17                   ` David Miller
@ 2014-05-24 20:02                     ` mroos
  -1 siblings, 0 replies; 60+ messages in thread
From: mroos @ 2014-05-24 20:02 UTC (permalink / raw)
  To: David Miller; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd

This is today's fresh git, 3.15.0-rc6-00190-g1ee1cea, on a V210 with THP
enabled and always on. Got this and a segfault in an apt-spawned xz.

[  142.599575] ------------[ cut here ]------------
[  142.660349] WARNING: CPU: 1 PID: 2237 at mm/mmap.c:2741 exit_mmap+0x140/0x160()
[  142.756483] Modules linked in: ipv6 tg3 hwmon ptp pps_core
[  142.830269] CPU: 1 PID: 2237 Comm: aptitude Not tainted 3.15.0-rc6-00190-g1ee1cea #93
[  142.933226] Call Trace:
[  142.965358]  [000000000045a12c] warn_slowpath_common+0x4c/0x80
[  143.042074]  [00000000004e7a40] exit_mmap+0x140/0x160
[  143.108410]  [00000000004586a0] mmput.part.60+0x20/0xe0
[  143.177030]  [000000000045af3c] exit_mm+0x11c/0x180
[  143.241071]  [000000000045c920] do_exit+0x240/0x340
[  143.305134]  [000000000045cb48] do_group_exit+0x28/0xc0
[  143.373753]  [0000000000468ac8] get_signal_to_deliver+0x1c8/0x3a0
[  143.453819]  [0000000000449334] do_signal32+0x14/0x220
[  143.521292]  [000000000042d0e0] do_signal+0x2c0/0x520
[  143.587624]  [000000000042db40] do_notify_resume+0x40/0x60
[  143.659683]  [0000000000404b04] __handle_signal+0xc/0x2c
[  143.729448] ---[ end trace b34008751438e7e6 ]---
[  143.790182] BUG: Bad rss-counter state mm:fffffc000d9d3660 idx:1 val:1


-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-05-24 20:02                     ` mroos
@ 2014-05-24 21:08                       ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2014-05-24 21:08 UTC (permalink / raw)
  To: mroos; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd

From: mroos@linux.ee
Date: Sat, 24 May 2014 23:02:28 +0300 (EEST)

> This is todays fresh git with 3.15.0-rc6-00190-g1ee1cea on V210, THP 
> enabled & always on. Got this and a segfault on apt-spawned xz.

Thanks a lot for the report.

I've been bogged down with other things but I will come back to
this stuff soon.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-05-24 21:08                       ` David Miller
@ 2014-06-09  6:36                         ` Meelis Roos
  -1 siblings, 0 replies; 60+ messages in thread
From: Meelis Roos @ 2014-06-09  6:36 UTC (permalink / raw)
  To: David Miller; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd

> > This is todays fresh git with 3.15.0-rc6-00190-g1ee1cea on V210, THP 
> > enabled & always on. Got this and a segfault on apt-spawned xz.
> 
> Thanks a lot for the report.
> 
> I've been bogged down with other things but I will come back to
> this stuff soon.

Just to document a strangeness that does not seem to fit the pattern of 
UltraSparc III era split: 

V210 with USIII family CPUs is still problematic: it cannot survive 
repeated local git clones and hangs (the watchdog detects the hang), with 
filesystem corruption. This still holds for the 3.15 release.

On the other hand, E420R with 4 USII CPUs does not hang and is 100% 
stable with git clones etc. A very similar E220R with a different config 
hangs under that load.

Maybe related to some other options in kernel configs - each config is 
unique and they vary intentionally.

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-04-16 22:22                 ` mroos
@ 2014-07-29 23:26                   ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2014-07-29 23:26 UTC (permalink / raw)
  To: mroos; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd, cat.schulze

From: mroos@linux.ee
Date: Thu, 17 Apr 2014 01:22:17 +0300 (EEST)

>> > Just for the archives, I got one of these again with 3.14:
>> 
>> Meelis and Aaro, thanks again for all of your reports.
>> 
>> After poring over a lot of the data and auditing some code I'm
>> suspecting it's a problem with transparent huge pages.
>> 
>> One thing you two can do to help me further confirm this is to run
>> with THP disabled for a while and see if you still get the log
>> messages.
> 
> I have since turned off CONFIG_TRANSPARENT_HUGEPAGE on 3 of 4 servers 
> that had this problem (actually most of my sparc64 machines), and the 4th 
> has
> 
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> # CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
> CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
> # CONFIG_HUGETLBFS is not set
> # CONFIG_HUGETLB_PAGE is not set
> 
> and also has not had this problem since then. All 4 machines have been 
> running through most -rc's of every kernel.

Here is something I'd like you guys to test.

Yesterday, Christopher (CC:'d) posted some fixes, and one of
them is very interesting.

Basically the update_mmu_cache() methods on sparc64 can insert an
invalid PTE into the TSB hash tables, causing livelocks and other
annoying issues.

The path where this can happen is via remove_migration_pte().

I had a discussion with Johannes Weiner about this, and we determined
that if this bug is the real reason the RSS counter et al. problems
are happening, it would explain why THP was mis-diagnosed as the
root cause.

That's because if you're not using THP there is less compaction going
on.  Less compaction means less migration, and therefore a lower
likelihood of this code path triggering like this.

Could you guys please try this patch below?  Thanks.

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 16b58ff..8e894e0 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -351,6 +351,10 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
 
 	mm = vma->vm_mm;
 
+	/* Don't insert a non-valid PTE into the TSB, we'll deadlock.  */
+	if (!pte_accessible(mm, pte))
+		return;
+
 	spin_lock_irqsave(&mm->context.lock, flags);
 
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
@@ -2617,6 +2621,10 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (!pmd_large(entry) || !pmd_young(entry))
 		return;
 
+	/* Don't insert a non-valid PMD into the TSB, we'll deadlock.  */
+	if (!(pte & _PAGE_VALID))
+		return;
+
 	pte = pmd_val(entry);
 
 	/* We are fabricating 8MB pages using 4MB real hw pages.  */

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-07-29 23:26                   ` David Miller
@ 2014-07-30 22:02                     ` Meelis Roos
  -1 siblings, 0 replies; 60+ messages in thread
From: Meelis Roos @ 2014-07-30 22:02 UTC (permalink / raw)
  To: David Miller; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd, cat.schulze

> Here is something I'd like you guys to test.

Very interesting.

[...]
> Could you guys please try this patch below?  Thanks.

  CC      arch/sparc/mm/init_64.o
arch/sparc/mm/init_64.c: In function 'update_mmu_cache_pmd':
arch/sparc/mm/init_64.c:2625:6: error: 'pte' may be used uninitialized in this function [-Werror=uninitialized]

gcc 4.6.4.

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-07-30 22:02                     ` Meelis Roos
@ 2014-07-30 22:07                       ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2014-07-30 22:07 UTC (permalink / raw)
  To: mroos; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd, cat.schulze

From: Meelis Roos <mroos@linux.ee>
Date: Thu, 31 Jul 2014 01:02:53 +0300 (EEST)

>> Here is something I'd like you guys to test.
> 
> Very interesting.
> 
> [...]
>> Could you guys please try this patch below?  Thanks.
> 
>   CC      arch/sparc/mm/init_64.o
> arch/sparc/mm/init_64.c: In function 'update_mmu_cache_pmd':
> arch/sparc/mm/init_64.c:2625:6: error: 'pte' may be used uninitialized in this function [-Werror=uninitialized]
> 
> gcc 4.6.4.

I'm very disappointed that gcc-4.6.3 didn't say anything to me about
this :-)

Here is a fixed patch, thanks.

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 16b58ff..db5ddde 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -351,6 +351,10 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
 
 	mm = vma->vm_mm;
 
+	/* Don't insert a non-valid PTE into the TSB, we'll deadlock.  */
+	if (!pte_accessible(mm, pte))
+		return;
+
 	spin_lock_irqsave(&mm->context.lock, flags);
 
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
@@ -2619,6 +2623,10 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 	pte = pmd_val(entry);
 
+	/* Don't insert a non-valid PMD into the TSB, we'll deadlock.  */
+	if (!(pte & _PAGE_VALID))
+		return;
+
 	/* We are fabricating 8MB pages using 4MB real hw pages.  */
 	pte |= (addr & (1UL << REAL_HPAGE_SHIFT));
 

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-07-30 22:07                       ` David Miller
@ 2014-08-13 11:44                         ` Meelis Roos
  -1 siblings, 0 replies; 60+ messages in thread
From: Meelis Roos @ 2014-08-13 11:44 UTC (permalink / raw)
  To: David Miller; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd, cat.schulze


I tested the merged sparc64 fixes in current git.

V100: with hugetlb, looping git clone still hangs the machine. The RED 
state on reboot has changed: before, it gave the trace on reboot and 
continued; now it continues looping with the trace indefinitely. gcc 
4.6.4.

Netra X1: works fine with HugeTLB (but no hugetlbfs) and looping git 
clone, gcc 4.6.4.

V440: fails to boot with 3.16 and gcc 4.9.1 (same message as before, in 
fault_in_user_windows+0xe0/0x100). Tried latest git with gcc 4.9.1 but 
it stops before getting to the previous failure point:
[   77.871887] console [tty0] enabled
[   77.912630] bootconsole [earlyprom0] disabled

T2000: works fine with gcc 4.9.1 and hugetlb, 3.16. With current git it 
hangs at boot at the same point.

U2: looping git clone fails with 3.16, no hugetlb(!!!), gcc 4.6.4;

E420R: works fine with gcc 4.6.4, 3.16, hugetlb, looping git clone; 
fails to boot with current git.

E220R: works with gcc 4.6.4, no hugetlb, 3.16. Hangs on boot with 
current git + hugetlb.

Did not test current git more. Will test the patches on top of 3.16 
separately to see if/which one of these is the culprit.

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-08-13 11:44                         ` Meelis Roos
@ 2014-08-13 19:46                           ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2014-08-13 19:46 UTC (permalink / raw)
  To: mroos; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd, cat.schulze

From: Meelis Roos <mroos@linux.ee>
Date: Wed, 13 Aug 2014 14:44:42 +0300 (EEST)

> Did not test current git more.

Current git fails to boot without this fix which I posted the other
day:

====================
[PATCH 1/2] sparc64: Do not disable interrupts in nmi_cpu_busy()

nmi_cpu_busy() is a SMP function call that just makes sure that all of the
cpus are spinning using cpu cycles while the NMI test runs.

It does not need to disable IRQs because we just care about NMIs executing
which will even with 'normal' IRQs disabled.

It is not legal to enable hard IRQs in a SMP cross call, in fact this bug
triggers the BUG check in irq_work_run_list():

	BUG_ON(!irqs_disabled());

Because now irq_work_run() is invoked from the tail of
generic_smp_call_function_single_interrupt().

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/kernel/nmi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 3370945..5b1151d 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -130,7 +130,6 @@ static inline unsigned int get_nmi_count(int cpu)
 
 static __init void nmi_cpu_busy(void *data)
 {
-	local_irq_enable_in_hardirq();
 	while (endflag == 0)
 		mb();
 }
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-08-13 19:46                           ` David Miller
@ 2014-08-14 12:20                             ` Meelis Roos
  -1 siblings, 0 replies; 60+ messages in thread
From: Meelis Roos @ 2014-08-14 12:20 UTC (permalink / raw)
  To: David Miller; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd, cat.schulze

> > Did not test current git more.
> 
> Current git fails to boot without this fix which I posted the other
> day:
> 
> ====================
> [PATCH 1/2] sparc64: Do not disable interrupts in nmi_cpu_busy()

Thanks, I noticed it on sparclinux@ but did not add one and one 
together. Now it seems to work with at least T2000. Will test other 
machines as I get some time.

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-08-13 19:46                           ` David Miller
@ 2014-08-15 12:42                             ` Meelis Roos
  -1 siblings, 0 replies; 60+ messages in thread
From: Meelis Roos @ 2014-08-15 12:42 UTC (permalink / raw)
  To: David Miller; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd, cat.schulze

> > Did not test current git more.
> 
> Current git fails to boot without this fix which I posted the other
> day:

T2000 is OK with today's git, hugepages, gcc 4.9.1.

V100 and Netra X1 now loop indefinitely in a PROM recursive fault on 
reboot (3.16 had the fault once and continued).

Got this from one reboot of X1:
[info] Using makefile-style concurrent boot in runlevel 6.
[....] Stopping deferred execution scheduler: atd. ok
[....] Stopping MTA: exim4_listener. ok
[....] Asking all remaining processes to terminate...done.
[....] All processes ended within 4 seconds...done.
[  565.689832] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [rsyslogd:1715]
[  565.788276] Modules linked in: ipv6 loop ohci_pci ohci_hcd i2c_ali15x3 usbcore i2c_ali1535 i2ccorn
[  565.922072] CPU: 0 PID: 1715 Comm: rsyslogd Not tainted 3.16.0-10959-gf0094b2 #130
[  566.021635] task: ffffff006c772f00 ti: ffffff006c6b0000 task.ti: ffffff006c6b0000
[  566.120035] TSTATE: 0000004411001606 TPC: 00000000007895f0 TNPC: 00000000007895f4 Y: 00000000    d
[  566.249317] TPC: <put_compound_page.part.22+0x154/0x1c0>
[  566.319098] g0: 00000000004209d0 g1: 0000000000000000 g2: 0000000000000002 g3: 00000000004b0840
[  566.433415] g4: ffffff006c772f00 g5: 0000000000000008 g6: ffffff006c6b0000 g7: 0000000000000000
[  566.547817] o0: 0000000000000001 o1: 0000010000d5f818 o2: 00000000f77c2000 o3: 0000000000000001
[  566.662217] o4: ffffff006c6b3a98 o5: ffffff006c6b39dc sp: ffffff006c6b3131 ret_pc: 000000000078950
[  566.781197] RPC: <put_compound_page.part.22+0x134/0x1c0>
[  566.850994] l0: 00000000f77c2000 l1: fffffffe00000000 l2: 0000000200000000 l3: 00000000f77c1fff
[  566.965312] l4: 0000000000000000 l5: 0000000000000001 l6: 0000000000000000 l7: 0000000000000008
[  567.079714] i0: 0000010000d5f800 i1: 00000000f77c2000 i2: 0000000000000001 i3: 0000000000000000
[  567.194116] i4: 0000010000d5001c i5: 0000010000d50000 i6: ffffff006c6b31e1 i7: 000000000049aaa4
[  567.308527] I7: <get_futex_key+0x1c4/0x280>
[  567.363456] Call Trace:
[  567.395464]  [000000000049aaa4] get_futex_key+0x1c4/0x280
[  567.466332]  [000000000049ad7c] futex_wait_setup+0x1c/0xc0
[  567.538443]  [000000000049af14] futex_wait+0xf4/0x1c0
[  567.604738]  [000000000049c878] do_futex+0x138/0x240
[  567.669990]  [000000000049ce48] compat_SyS_futex+0x128/0x180
[  567.744394]  [0000000000406074] linux_sparc_syscall32+0x34/0x60

Otherwise V100 and X1 seem to survive looping git clone well with 
transparent hugepages on and gcc 4.6.4.

U10 not tested yet so no test to CPI ROm changes yet (need to get to the 
machine). Similar for U5 and RED state exceptions on reboot.

V210 has a new problem: it hangs on boot during SCSI detection:
[   34.523440] f00aba6c: ttyS0 at MMIO 0x7fe010003f8 (irq = 15, base_baud = 115387) is a 16550A
[   34.523467] Console: ttyS0 (SU)
[   43.731627] console [ttyS0] enabled
[   43.777688] f00ad5ec: ttyS1 at MMIO 0x7fe010002e8 (irq = 15, base_baud = 115387) is a 16550A
[   43.889462] PCI: Enabling device: (0002:00:02.0), cmd 147
[   43.960956] sym0: <1010-66> rev 0x1 at pci 0002:00:02.0 irq 24
[   44.039849] sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
[   44.158317] sym0: SCSI BUS has been reset.
[   44.212124] scsi host0: sym-2.2.3

Retested with todays git, same.


I also solved my mysterious hangs of V100: it was a simple user error 
with the serial console, where a Break dropped me to OBP when the other 
end of the serial connection was rebooted with minicom open.

U1, U2, U5, U10, E220R, E420R later or some other day, whenever I get 
to them physically.

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
@ 2014-08-15 12:42                             ` Meelis Roos
  0 siblings, 0 replies; 60+ messages in thread
From: Meelis Roos @ 2014-08-15 12:42 UTC (permalink / raw)
  To: David Miller; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd, cat.schulze

> > Did not test current git more.
> 
> Current git fails to boot without this fix which I posted the other
> day:

T2000 is OK with todays GIT, hugepages gcc 4.9.1.

V100 and Netra X1 now loop indefinitely on successful reboot in PROM 
recursive fault (3.16 had the fault once and continued).

Got this from one reboot of X1:
[info] Using makefile-style concurrent boot in runlevel 6.
[....] Stopping deferred execution scheduler: atd. ok
[....] Stopping MTA: exim4_listener. ok
[....] Asking all remaining processes to terminate...done.
[....] All processes ended within 4 seconds...done.
[  565.689832] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [rsyslogd:1715]
[  565.788276] Modules linked in: ipv6 loop ohci_pci ohci_hcd i2c_ali15x3 usbcore i2c_ali1535 i2ccorn
[  565.922072] CPU: 0 PID: 1715 Comm: rsyslogd Not tainted 3.16.0-10959-gf0094b2 #130
[  566.021635] task: ffffff006c772f00 ti: ffffff006c6b0000 task.ti: ffffff006c6b0000
[  566.120035] TSTATE: 0000004411001606 TPC: 00000000007895f0 TNPC: 00000000007895f4 Y: 00000000    d
[  566.249317] TPC: <put_compound_page.part.22+0x154/0x1c0>
[  566.319098] g0: 00000000004209d0 g1: 0000000000000000 g2: 0000000000000002 g3: 00000000004b0840
[  566.433415] g4: ffffff006c772f00 g5: 0000000000000008 g6: ffffff006c6b0000 g7: 0000000000000000
[  566.547817] o0: 0000000000000001 o1: 0000010000d5f818 o2: 00000000f77c2000 o3: 0000000000000001
[  566.662217] o4: ffffff006c6b3a98 o5: ffffff006c6b39dc sp: ffffff006c6b3131 ret_pc: 000000000078950
[  566.781197] RPC: <put_compound_page.part.22+0x134/0x1c0>
[  566.850994] l0: 00000000f77c2000 l1: fffffffe00000000 l2: 0000000200000000 l3: 00000000f77c1fff
[  566.965312] l4: 0000000000000000 l5: 0000000000000001 l6: 0000000000000000 l7: 0000000000000008
[  567.079714] i0: 0000010000d5f800 i1: 00000000f77c2000 i2: 0000000000000001 i3: 0000000000000000
[  567.194116] i4: 0000010000d5001c i5: 0000010000d50000 i6: ffffff006c6b31e1 i7: 000000000049aaa4
[  567.308527] I7: <get_futex_key+0x1c4/0x280>
[  567.363456] Call Trace:
[  567.395464]  [000000000049aaa4] get_futex_key+0x1c4/0x280
[  567.466332]  [000000000049ad7c] futex_wait_setup+0x1c/0xc0
[  567.538443]  [000000000049af14] futex_wait+0xf4/0x1c0
[  567.604738]  [000000000049c878] do_futex+0x138/0x240
[  567.669990]  [000000000049ce48] compat_SyS_futex+0x128/0x180
[  567.744394]  [0000000000406074] linux_sparc_syscall32+0x34/0x60

Otherwise, V100 and X1 seem to survive a looping git clone well with 
transparent hugepages on and gcc 4.6.4.

U10 is not tested yet, so the CPU ROM changes are not tested yet either (I 
need to get to the machine). The same goes for U5 and the RED state 
exceptions on reboot.

V210 has a new problem: it hangs on boot during SCSI detection:
[   34.523440] f00aba6c: ttyS0 at MMIO 0x7fe010003f8 (irq = 15, base_baud = 115387) is a 16550A
[   34.523467] Console: ttyS0 (SU)
[   43.731627] console [ttyS0] enabled
[   43.777688] f00ad5ec: ttyS1 at MMIO 0x7fe010002e8 (irq = 15, base_baud = 115387) is a 16550A
[   43.889462] PCI: Enabling device: (0002:00:02.0), cmd 147
[   43.960956] sym0: <1010-66> rev 0x1 at pci 0002:00:02.0 irq 24
[   44.039849] sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
[   44.158317] sym0: SCSI BUS has been reset.
[   44.212124] scsi host0: sym-2.2.3

Retested with today's git, same result.


I also solved the mysterious hangs of my V100: it was a simple user 
error with the serial console, where a Break dropped me into OBP whenever 
the other end of the serial connection was rebooted with minicom open.

U1, U2, U5, U10, E220R, E420R later or some other day, whenever I get 
to them physically.

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-08-15 12:42                             ` Meelis Roos
@ 2014-08-18 12:30                               ` Meelis Roos
  -1 siblings, 0 replies; 60+ messages in thread
From: Meelis Roos @ 2014-08-18 12:30 UTC (permalink / raw)
  To: David Miller
  Cc: aaro.koskinen, sparclinux, Linux Kernel list, hughd, cat.schulze

> U1, U2, U5, U10, E220R, E420R later or some other day, whenever I get 
> to them physically.

Ultra 5 is bad news with 3.17-rc1: it almost boots up, then after 
starting postfix and ntpd it gets a RED state exception and continues 
looping with it (before, it got RED state only after a PROM reboot).

ntpd.

RED State Exception

TL=0000.0000.0000.0005 TT=0000.0000.0000.0064
   TPC=0000.0000.0042.4c80 TnPC=0000.0000.0042.4c84 TSTATE=0000.0000.1104.1407
TL=0000.0000.0000.0004 TT=0000.0000.0000.0064
   TPC=0000.0000.0042.4c80 TnPC=0000.0000.0042.4c84 TSTATE=0000.0000.1104.1407
TL=0000.0000.0000.0003 TT=0000.0000.0000.0064
   TPC=0000.0000.0042.4c80 TnPC=0000.0000.0042.4c84 TSTATE=0000.0000.1104.1407
TL=0000.0000.0000.0002 TT=0000.0000.0000.0064
   TPC=0000.0000.0042.0c80 TnPC=0000.0000.0042.0c84 TSTATE=0000.0000.1104.1407
TL=0000.0000.0000.0001 TT=0000.0000.0000.0064
   TPC=0000.0000.0044.8580 TnPC=0000.0000.0044.8584 TSTATE=0000.0000.1100.1607


RED State Exception

TL=0000.0000.0000.0005 TT=0000.0000.0000.0064
   TPC=0000.0000.f000.4c80 TnPC=0000.0000.f000.4c84 TSTATE=0000.0044.5604.1400
TL=0000.0000.0000.0004 TT=0000.0000.0000.0064
   TPC=0000.0000.f000.4c80 TnPC=0000.0000.f000.4c84 TSTATE=0000.0044.5604.1400
TL=0000.0000.0000.0003 TT=0000.0000.0000.0064
   TPC=0000.0000.f000.4c80 TnPC=0000.0000.f000.4c84 TSTATE=0000.0044.5604.1400
TL=0000.0000.0000.0002 TT=0000.0000.0000.0064
   TPC=0000.0000.f000.0c80 TnPC=0000.0000.f000.0c84 TSTATE=0000.0044.5604.1400
TL=0000.0000.0000.0001 TT=0000.0000.0000.0064
   TPC=0000.0000.f000.3a00 TnPC=0000.0000.f000.3a04 TSTATE=0000.0044.5600.0400



-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-08-18 12:30                               ` Meelis Roos
@ 2014-08-18 17:35                                 ` Aaro Koskinen
  -1 siblings, 0 replies; 60+ messages in thread
From: Aaro Koskinen @ 2014-08-18 17:35 UTC (permalink / raw)
  To: Meelis Roos
  Cc: David Miller, sparclinux, Linux Kernel list, hughd, cat.schulze

Hi,

On Mon, Aug 18, 2014 at 03:30:16PM +0300, Meelis Roos wrote:
> > U1, U2, U5, U10, E220R, E420R later or some other day, whenever I get 
> > to them physically.
> 
> Ultra 5 is bad news with 3.17-rc1: it almost boots up, then after 
> starting postfix and ntpd it gets a RED state exception and continues 
> looping with it (before, it got RED state only after a PROM reboot).

My Ultra 5 is fine with 3.17-rc1 (I'm writing this mail from it),
and the Ultra 10 also seems to be OK based on a quick test.

I'm going to run the GCC 4.9.1 bootstrap & testsuite on these machines,
maybe next week. Unfortunately, due to summer schedules I've lost track:
are there still some special patches I should try (to get rid of
$SUBJECT)? If not, I'll probably try it with plain 3.17-rc2.

A.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-08-18 17:35                                 ` Aaro Koskinen
@ 2014-08-18 17:38                                   ` David Miller
  -1 siblings, 0 replies; 60+ messages in thread
From: David Miller @ 2014-08-18 17:38 UTC (permalink / raw)
  To: aaro.koskinen; +Cc: mroos, sparclinux, linux-kernel, hughd, cat.schulze

From: Aaro Koskinen <aaro.koskinen@iki.fi>
Date: Mon, 18 Aug 2014 20:35:52 +0300

> Hi,
> 
> On Mon, Aug 18, 2014 at 03:30:16PM +0300, Meelis Roos wrote:
>> > U1, U2, U5, U10, E220R, E420R later or some other day, whenever I get 
>> > to them physically.
>> 
>> Ultra 5 is bad news with 3.17-rc1: it almost boots up, then after 
>> starting postfix and ntpd it gets a RED state exception and continues 
>> looping with it (before, it got RED state only after a PROM reboot).
> 
> My Ultra 5 is fine with 3.17-rc1 (I'm writing this mail from it),
> and the Ultra 10 also seems to be OK based on a quick test.
> 
> I'm going to run the GCC 4.9.1 bootstrap & testsuite on these machines,
> maybe next week. Unfortunately, due to summer schedules I've lost track:
> are there still some special patches I should try (to get rid of
> $SUBJECT)? If not, I'll probably try it with plain 3.17-rc2.

All patches are in 3.17-rc1.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-08-18 17:35                                 ` Aaro Koskinen
@ 2014-08-18 23:45                                   ` Julian Calaby
  -1 siblings, 0 replies; 60+ messages in thread
From: Julian Calaby @ 2014-08-18 23:45 UTC (permalink / raw)
  To: Aaro Koskinen
  Cc: Meelis Roos, David Miller, sparclinux, Linux Kernel list, hughd,
	cat.schulze

Hi All,

On Tue, Aug 19, 2014 at 3:35 AM, Aaro Koskinen <aaro.koskinen@iki.fi> wrote:
> Hi,
>
> On Mon, Aug 18, 2014 at 03:30:16PM +0300, Meelis Roos wrote:
>> > U1, U2, U5, U10, E220R, E420R later or some other day, whenever I get
>> > to them physically.
>>
>> Ultra 5 is bad news with 3.17-rc1: it almost boots up, then after
>> starting postfix and ntpd it gets a RED state exception and continues
>> looping with it (before, it got RED state only after a PROM reboot).
>
> My Ultra 5 is fine with 3.17-rc1 (I'm writing this mail from it),
> and the Ultra 10 also seems to be OK based on a quick test.

Stupid question: aren't the Ultra 5 and Ultra 10 essentially the same hardware?

Thanks,

-- 
Julian Calaby

Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
.Plan: http://sites.google.com/site/juliancalaby/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-07-29 23:26                   ` David Miller
@ 2014-08-19  8:22                     ` Meelis Roos
  -1 siblings, 0 replies; 60+ messages in thread
From: Meelis Roos @ 2014-08-19  8:22 UTC (permalink / raw)
  To: David Miller; +Cc: aaro.koskinen, sparclinux, linux-kernel, hughd, cat.schulze


Meanwhile, an Ultra 1 running a looping git clone overnight got the 
exit_mmap warning again with 3.17.0-rc1. Otherwise it is working well.

[11052.686935] ------------[ cut here ]------------
[11052.740486] WARNING: CPU: 0 PID: 2541 at mm/mmap.c:2766 exit_mmap+0x138/0x160()
[11052.827934] Modules linked in: osst snd_sun_cs4231 snd_pcm snd_timer snd soundcore parport_sunbpp parport st ch qlogicpti sunhme ipv6 sr_mod cdrom sg evdev
[11052.994500] CPU: 0 PID: 2541 Comm: git Not tainted 3.17.0-rc1 #49
[11053.067464] Call Trace:
[11053.096647]  [00000000004d1758] exit_mmap+0x138/0x160
[11053.157091]  [000000000044c1ec] mmput+0x2c/0xc0
[11053.211256]  [000000000044e168] exit_mm+0x108/0x180
[11053.269597]  [000000000044f908] do_exit+0x228/0x320
[11053.327935]  [000000000044fae4] do_group_exit+0x24/0xc0
[11053.390440]  [000000000044fb94] SyS_exit_group+0x14/0x20
[11053.454004]  [0000000000406074] linux_sparc_syscall32+0x34/0x60
[11053.524809] ---[ end trace 7b6188ceaeca01dd ]---
[11053.580132] BUG: Bad rss-counter state mm:ffffff0032c778c0 idx:0 val:7

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-08-18 23:45                                   ` Julian Calaby
@ 2014-08-19 21:29                                     ` Aaro Koskinen
  -1 siblings, 0 replies; 60+ messages in thread
From: Aaro Koskinen @ 2014-08-19 21:29 UTC (permalink / raw)
  To: Julian Calaby
  Cc: Meelis Roos, David Miller, sparclinux, Linux Kernel list, hughd,
	cat.schulze

Hi,

On Tue, Aug 19, 2014 at 09:45:03AM +1000, Julian Calaby wrote:
> Stupid question: aren't the Ultra 5 and Ultra 10 essentially
> the same hardware?

Basically yes, but the configurations often differ (CPU speed,
memory capacity, peripherals, PROM versions).

A.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
  2014-08-18 17:38                                   ` David Miller
@ 2014-08-30 22:27                                     ` Aaro Koskinen
  -1 siblings, 0 replies; 60+ messages in thread
From: Aaro Koskinen @ 2014-08-30 22:27 UTC (permalink / raw)
  To: David Miller; +Cc: mroos, sparclinux, linux-kernel, hughd, cat.schulze

Hi,

On Mon, Aug 18, 2014 at 10:38:50AM -0700, David Miller wrote:
> All patches are in 3.17-rc1.

FYI, the warning/bug still triggers with 3.17-rc2 during GCC bootstrap:

[94075.963753] ------------[ cut here ]------------
[94076.018105] WARNING: CPU: 0 PID: 17192 at /home/aaro/los/work/shared/linux-v3.17-rc2/mm/mmap.c:2766 exit_mmap+0x128/0x160()
[94076.151407] Modules linked in:
[94076.187825] CPU: 0 PID: 17192 Comm: rm Not tainted 3.17.0-rc2-ultra-los_3ec1 #1
[94076.275319] Call Trace:
[94076.304490]  [00000000004c1308] exit_mmap+0x128/0x160
[94076.364915]  [000000000045118c] mmput+0x2c/0xc0
[94076.419062]  [0000000000453cb0] do_exit+0x1b0/0x880
[94076.477387]  [0000000000454ff8] do_group_exit+0x38/0xc0
[94076.539880]  [0000000000455094] SyS_exit_group+0x14/0x20
[94076.603429]  [0000000000406074] linux_sparc_syscall32+0x34/0x60
[94076.674225] ---[ end trace b4b3ce0b3bcc0234 ]---
[94076.729446] BUG: Bad rss-counter state mm:ffffff0016898000 idx:1 val:2

A.

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2014-08-30 22:27 UTC | newest]

Thread overview: 60+ messages
2013-06-16 21:06 sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160() Meelis Roos
2013-06-17  5:32 ` Aaro Koskinen
2013-06-17  5:58   ` Aaro Koskinen
2013-08-03 20:40     ` David Miller
2013-10-22 17:46       ` Aaro Koskinen
2013-10-22 17:54         ` David Miller
2014-04-14 18:43           ` Aaro Koskinen
2014-04-14 18:58             ` David Miller
2014-04-16 18:58             ` David Miller
2014-04-16 22:22               ` mroos
2014-04-16 22:49                 ` David Miller
2014-07-29 23:26                 ` David Miller
2014-07-30 22:02                   ` Meelis Roos
2014-07-30 22:07                     ` David Miller
2014-08-13 11:44                       ` Meelis Roos
2014-08-13 19:46                         ` David Miller
2014-08-14 12:20                           ` Meelis Roos
2014-08-15 12:42                           ` Meelis Roos
2014-08-18 12:30                             ` Meelis Roos
2014-08-18 17:35                               ` Aaro Koskinen
2014-08-18 17:38                                 ` David Miller
2014-08-30 22:27                                   ` Aaro Koskinen
2014-08-18 23:45                                 ` Julian Calaby
2014-08-19 21:29                                   ` Aaro Koskinen
2014-08-19  8:22                   ` Meelis Roos
2014-04-25 20:09               ` Aaro Koskinen
2014-04-25 20:17                 ` David Miller
2014-05-24 20:02                   ` mroos
2014-05-24 21:08                     ` David Miller
2014-06-09  6:36                       ` Meelis Roos
