* Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini
@ 2008-12-19  8:56 Hermann Lauer
  2009-01-20  6:25 ` Sunfire V880 and 480R 2.6.27.x startup hangs David Miller
                   ` (20 more replies)
  0 siblings, 21 replies; 22+ messages in thread
From: Hermann Lauer @ 2008-12-19  8:56 UTC (permalink / raw)
  To: sparclinux

On Wed, Dec 17, 2008 at 07:46:46PM -0800, David Miller wrote:
> > tried 2.6.27.2 today, but this hangs already at loading the kernel.
> > Output is attached. Please tell me what else I can provide.
> 
> Unfortunately, "-p" doesn't do anything any more and the kernel
> is stopping or crashing during the part of the boot between
> when the early prom console is disabled and the real console is
> setup.
> 
> The way to debug this is to manually get rid of the CON_BOOT flag
> in the early prom console structure, like the following patch.

Applied your patch to the 2.6.27.9 vanilla source and booted,
console output up to the hang is below. As a reminder:
2.6.26.5 did not hang.
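
The effect of the CON_BOOT flag can be pictured with a small userspace model. This is only a rough sketch, not the kernel's code: `struct console_model`, `register_real_console()` and the flag values are hypothetical stand-ins chosen for illustration. It assumes that registering a real console drops every console still flagged CON_BOOT, which is the behaviour described in the quoted mail.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits; the names mirror the kernel's, the values are assumptions. */
#define CON_ENABLED 0x1u
#define CON_BOOT    0x2u

struct console_model {
	const char *name;
	unsigned int flags;
};

/*
 * Model of a real console taking over: every enabled console that is
 * still marked CON_BOOT gets disabled. Returns how many were dropped.
 */
int register_real_console(struct console_model *cons, size_t n)
{
	int dropped = 0;
	size_t i;

	assert(cons != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if ((cons[i].flags & CON_ENABLED) && (cons[i].flags & CON_BOOT)) {
			cons[i].flags &= ~CON_ENABLED;
			dropped++;
		}
	}
	return dropped;
}
```

In this model a console whose flags lack CON_BOOT stays enabled across the handover, so messages printed between the two console stages would still reach it.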

-------------------------------------------------------------------
Sun Fire 880, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.34, 8192 MB memory installed, Serial #50911524.
Ethernet address 0:3:ba:8:d9:24, Host ID: 8308d924.

ERROR: OpenBoot Diagnostics failed
WARNING: Device /pci@8,600000/SUNW,qlc@2 being marked with 'status' = fail
Rebooting with command: boot
Boot device: disk  File and args:
SILO Version 1.4.13
boot:
Linux                    27.9                     LinuxOLD
boot: 27.9
Allocated 8 Megs of memory at 0x40000000 for kernel
Loaded kernel version 2.6.27
Loading initial ramdisk (5350711 bytes at 0xA000400000 phys, 0x40C00000 virt)...
\
[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.22.34 2007/07/23 13:01'
[    0.000000] PROMLIB: Root node compatible:
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.27.9 (hlauer@install1) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Thu Dec 18 17:20:27 CET 2008
[    0.000000] console [earlyprom0] enabled
[    0.000000] ARCH: SUN4U
[    0.000000] Ethernet address: 00:03:ba:08:d9:24
[    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel... done.
[    0.000000] OF stdout device is: /pci@9,700000/ebus@1/serial@1,400000:a
[    0.000000] PROM: Built device tree with 102802 bytes of memory.
[    0.000000] Top of RAM: 0xa1ffb1a000, Total RAM: 0x1ffb0e000
[    0.000000] Memory hole size: 655360MB
[    0.000000] [0000000318000000-fffff8a000c00000] page_structs=131072 node=0 entry=1120/0
[    0.000000] [0000000318000000-fffff8a001000000] page_structs=131072 node=0 entry=1121/0
[    0.000000] [0000000318700000-fffff8a001400000] page_structs=131072 node=0 entry=1122/0
[    0.000000] [0000000318700000-fffff8a001800000] page_structs=131072 node=0 entry=1123/0
[    0.000000] [0000000318e00000-fffff8a001c00000] page_structs=131072 node=0 entry=1124/0
[    0.000000] [0000000318e00000-fffff8a002000000] page_structs=131072 node=0 entry=1125/0
[    0.000000] [0000000319500000-fffff8a002400000] page_structs=131072 node=0 entry=1126/0
[    0.000000] [0000000319c00000-fffff8a002800000] page_structs=131072 node=0 entry=1127/0
[    0.000000] [0000000319c00000-fffff8a002c00000] page_structs=131072 node=0 entry=1128/0
[    0.000000] [000000031a300000-fffff8a003000000] page_structs=131072 node=0 entry=1129/0
[    0.000000] [000000031a300000-fffff8a003400000] page_structs=131072 node=0 entry=1130/0
[    0.000000] [000000031aa00000-fffff8a003800000] page_structs=131072 node=0 entry=1131/0
[    0.000000] [000000031aa00000-fffff8a003c00000] page_structs=131072 node=0 entry=1132/0
[    0.000000] [000000031b100000-fffff8a004000000] page_structs=131072 node=0 entry=1133/0
[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x05000000 -> 0x050ffd8d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[4] active PFN ranges
[    0.000000]     0: 0x05000000 -> 0x050ff7ff
[    0.000000]     0: 0x050ff800 -> 0x050ffd09
[    0.000000]     0: 0x050ffd0b -> 0x050ffd7b
[    0.000000]     0: 0x050ffd7e -> 0x050ffd8d
[    0.000000] Booting Linux...
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 1040779
[    0.000000] Kernel command line: root=/dev/sda1 ro
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[    0.000000] clocksource: mult[640000] shift[16]
[    0.000000] clockevent: mult[28f5c28] shift[32]
[   46.108402] Console: colour dummy device 80x25
[   46.161498] console [tty0] enabled
[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.22.34 2007/07/23 13:01'
[    0.000000] PROMLIB: Root node compatible:
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.27.9 (hlauer@install1) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Thu Dec 18 17:20:27 CET 2008
[    0.000000] console [earlyprom0] enabled
[    0.000000] ARCH: SUN4U
[    0.000000] Ethernet address: 00:03:ba:08:d9:24
[    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel... done.
[    0.000000] OF stdout device is: /pci@9,700000/ebus@1/serial@1,400000:a
[    0.000000] PROM: Built device tree with 102802 bytes of memory.
[    0.000000] Top of RAM: 0xa1ffb1a000, Total RAM: 0x1ffb0e000
[    0.000000] Memory hole size: 655360MB
[    0.000000] [0000000318000000-fffff8a000c00000] page_structs=131072 node=0 entry=1120/0
[    0.000000] [0000000318000000-fffff8a001000000] page_structs=131072 node=0 entry=1121/0
[    0.000000] [0000000318700000-fffff8a001400000] page_structs=131072 node=0 entry=1122/0
[    0.000000] [0000000318700000-fffff8a001800000] page_structs=131072 node=0 entry=1123/0
[    0.000000] [0000000318e00000-fffff8a001c00000] page_structs=131072 node=0 entry=1124/0
[    0.000000] [0000000318e00000-fffff8a002000000] page_structs=131072 node=0 entry=1125/0
[    0.000000] [0000000319500000-fffff8a002400000] page_structs=131072 node=0 entry=1126/0
[    0.000000] [0000000319c00000-fffff8a002800000] page_structs=131072 node=0 entry=1127/0
[    0.000000] [0000000319c00000-fffff8a002c00000] page_structs=131072 node=0 entry=1128/0
[    0.000000] [000000031a300000-fffff8a003000000] page_structs=131072 node=0 entry=1129/0
[    0.000000] [000000031a300000-fffff8a003400000] page_structs=131072 node=0 entry=1130/0
[    0.000000] [000000031aa00000-fffff8a003800000] page_structs=131072 node=0 entry=1131/0
[    0.000000] [000000031aa00000-fffff8a003c00000] page_structs=131072 node=0 entry=1132/0
[    0.000000] [000000031b100000-fffff8a004000000] page_structs=131072 node=0 entry=1133/0
[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x05000000 -> 0x050ffd8d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[4] active PFN ranges
[    0.000000]     0: 0x05000000 -> 0x050ff7ff
[    0.000000]     0: 0x050ff800 -> 0x050ffd09
[    0.000000]     0: 0x050ffd0b -> 0x050ffd7b
[    0.000000]     0: 0x050ffd7e -> 0x050ffd8d
[    0.000000] Booting Linux...
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 1040779
[    0.000000] Kernel command line: root=/dev/sda1 ro
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[    0.000000] clocksource: mult[640000] shift[16]
[    0.000000] clockevent: mult[28f5c28] shift[32]
[   46.108402] Console: colour dummy device 80x25
[   46.161498] console [tty0] enabled
[   49.320774] Dentry cache hash table entries: 1048576 (order: 10, 8388608 bytes)
[   49.426009] Inode-cache hash table entries: 524288 (order: 9, 4194304 bytes)
[   49.786779] Memory: 8301680k available (2928k kernel code, 1104k data, 208k init) [fffff80000000000,000000a1ffb1a000]
[   49.991734] Calibrating delay using timer specific routine.. 19.90 BogoMIPS (lpj=39810)
[   50.085944] Security Framework initialized
[   50.134751] SELinux:  Disabled at boot.
[   50.180620] Mount-cache hash table entries: 512
[   50.235257] Initializing cgroup subsys ns
[   50.282662] Initializing cgroup subsys cpuacct
[   50.335781] Initializing cgroup subsys devices
[   50.390759] CPU 0: synchronized TICK with master CPU (last diff 0 cycles, maxerr 11 cycles)
[   50.390776] Brought up 2 CPUs
[   50.391634] net_namespace: 1552 bytes
[   50.568542] NET: Registered protocol family 16

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de


* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
@ 2009-01-20  6:25 ` David Miller
  2009-01-22 13:29 ` Hermann Lauer
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2009-01-20  6:25 UTC (permalink / raw)
  To: sparclinux

From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Date: Fri, 19 Dec 2008 09:56:22 +0100

> On Wed, Dec 17, 2008 at 07:46:46PM -0800, David Miller wrote:
> > > tried 2.6.27.2 today, but this hangs already at loading the kernel.
> > > Output is attached. Please tell me what else I can provide.
> > 
> > Unfortunately, "-p" doesn't do anything any more and the kernel
> > is stopping or crashing during the part of the boot between
> > when the early prom console is disabled and the real console is
> > setup.
> > 
> > The way to debug this is to manually get rid of the CON_BOOT flag
> > in the early prom console structure, like the following patch.
> 
> Applied your patch to the 2.6.27.9 vanilla source and booted,
> console output up to the hang is below. As a reminder:
> 2.6.26.5 did not hang.
 ...
> [   50.390759] CPU 0: synchronized TICK with master CPU (last diff 0 cycles, maxerr 11 cycles)
> [   50.390776] Brought up 2 CPUs
> [   50.391634] net_namespace: 1552 bytes
> [   50.568542] NET: Registered protocol family 16
> 

The next thing that probably should run is the PCI controller
probe.  But I can't say that for certain, so we need more
info.

Please reboot this test kernel with the following added boot
command line options: initcall_debug=1 ignore_loglevel

Let us see the console log output that generates.

Thanks!
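
What initcall_debug adds to the log can be pictured with a small userspace model. Sketch only: `run_initcall_traced()`, `initcall_fn` and `example_init()` are hypothetical names, and the message wording imitates the kernel's "calling ..." and "initcall ... returned ..." lines rather than reproducing the kernel's code. The point is that the "calling" line is emitted before the function runs, so a function that never returns leaves it as the last line on the console.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef int (*initcall_fn)(void);

/*
 * Run one init function and report it in two steps: announce the call
 * first, then print the return value and elapsed CPU time afterwards.
 */
int run_initcall_traced(const char *name, initcall_fn fn, FILE *out)
{
	clock_t start;
	long msecs;
	int ret;

	assert(name != NULL && fn != NULL && out != NULL);

	fprintf(out, "calling  %s\n", name);
	fflush(out);	/* keep the line visible even if fn never returns */

	start = clock();
	ret = fn();
	msecs = (long)((clock() - start) * 1000 / CLOCKS_PER_SEC);

	fprintf(out, "initcall %s returned %d after %ld msecs\n",
		name, ret, msecs);
	return ret;
}

/* Trivial example init function that succeeds immediately. */
int example_init(void)
{
	return 0;
}
```

Calling `run_initcall_traced("example_init", example_init, stdout)` prints one "calling" line and one "initcall ... returned" line; a hang inside the function would leave only the first.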


* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
  2009-01-20  6:25 ` Sunfire V880 and 480R 2.6.27.x startup hangs David Miller
@ 2009-01-22 13:29 ` Hermann Lauer
  2009-01-26  2:30 ` David Miller
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-01-22 13:29 UTC (permalink / raw)
  To: sparclinux

[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]

On Mon, Jan 19, 2009 at 10:25:14PM -0800, David Miller wrote:

>  ...
> > [   50.390759] CPU 0: synchronized TICK with master CPU (last diff 0 cycles, maxerr 11 cycles)
> > [   50.390776] Brought up 2 CPUs
> > [   50.391634] net_namespace: 1552 bytes
> > [   50.568542] NET: Registered protocol family 16
> > 
> 
> The next thing that probably should run is the PCI controller
> probe.  But I can't say that for certain, so we need more
> info.
> 
> Please reboot this test kernel with the following added boot
> command line options: initcall_debug=1 ignore_loglevel

Compiled 2.6.27.12 today without the CON_BOOT flag and booted with the
options above. The hang seems to be in of_bus_driver_init, see
console output below.

Hope this helps advancing towards the cassini stuff. Thanks.
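
Spotting the stuck initcall in a long console log can be automated. A minimal sketch, assuming the lines follow the "calling  NAME+0x.../0x..." and "initcall NAME+... returned ..." pattern produced with initcall_debug; `find_unfinished_initcall()` is a hypothetical helper, not an existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Scan console lines in order. Copy into buf the name of the initcall
 * that was announced with "calling" but never reported as returned.
 * Returns 1 if such a call exists, 0 otherwise (buf is then empty).
 */
int find_unfinished_initcall(const char *const *lines, size_t n,
			     char *buf, size_t buflen)
{
	int pending = 0;
	size_t i;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';

	for (i = 0; i < n; i++) {
		const char *p = strstr(lines[i], "calling  ");

		if (p) {
			/* The name runs up to the "+offset/size" suffix. */
			size_t len;

			p += strlen("calling  ");
			len = strcspn(p, "+ \n");
			if (len >= buflen)
				len = buflen - 1;
			memcpy(buf, p, len);
			buf[len] = '\0';
			pending = 1;
		} else if (strstr(lines[i], "initcall ") &&
			   strstr(lines[i], " returned ")) {
			/* The announced call completed; nothing is pending. */
			buf[0] = '\0';
			pending = 0;
		}
	}
	return pending;
}
```

The helper relies on initcalls running one after another, so at most one call is outstanding at any point in the log.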


-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

[-- Attachment #2: 480R.console --]
[-- Type: text/plain, Size: 10410 bytes --]

ERROR: OpenBoot Diagnostics failed
WARNING: Device /pci@8,600000/SUNW,qlc@2 being marked with 'status' == fail
Rebooting with command: boot
Boot device: disk  File and args:
SILO Version 1.4.13
boot: boot:
Linux                    27.x                     LinuxOLD
boot: 27.x
Allocated 8 Megs of memory at 0x40000000 for kernel
Loaded kernel version 2.6.27
Loading initial ramdisk (5350711 bytes at 0xA000400000 phys, 0x40C00000 virt)...
\
[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.22.34 2007/07/23 13:01'
[    0.000000] PROMLIB: Root node compatible:
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.27.12 (hlauer@install1) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #2 SMP Thu Jan 22 14:28:41 CET 2009
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] console [earlyprom0] enabled
[    0.000000] ARCH: SUN4U
[    0.000000] Ethernet address: 00:03:ba:08:d9:24
[    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel... done.
[    0.000000] OF stdout device is: /pci@9,700000/ebus@1/serial@1,400000:a
[    0.000000] PROM: Built device tree with 102802 bytes of memory.
[    0.000000] Top of RAM: 0xa1ffb1a000, Total RAM: 0x1ffb0e000
[    0.000000] Memory hole size: 655360MB
[    0.000000] [0000000318000000-fffff8a000c00000] page_structs=131072 node=0 entry=1120/0
[    0.000000] [0000000318000000-fffff8a001000000] page_structs=131072 node=0 entry=1121/0
[    0.000000] [0000000318700000-fffff8a001400000] page_structs=131072 node=0 entry=1122/0
[    0.000000] [0000000318700000-fffff8a001800000] page_structs=131072 node=0 entry=1123/0
[    0.000000] [0000000318e00000-fffff8a001c00000] page_structs=131072 node=0 entry=1124/0
[    0.000000] [0000000318e00000-fffff8a002000000] page_structs=131072 node=0 entry=1125/0
[    0.000000] [0000000319500000-fffff8a002400000] page_structs=131072 node=0 entry=1126/0
[    0.000000] [0000000319c00000-fffff8a002800000] page_structs=131072 node=0 entry=1127/0
[    0.000000] [0000000319c00000-fffff8a002c00000] page_structs=131072 node=0 entry=1128/0
[    0.000000] [000000031a300000-fffff8a003000000] page_structs=131072 node=0 entry=1129/0
[    0.000000] [000000031a300000-fffff8a003400000] page_structs=131072 node=0 entry=1130/0
[    0.000000] [000000031aa00000-fffff8a003800000] page_structs=131072 node=0 entry=1131/0
[    0.000000] [000000031aa00000-fffff8a003c00000] page_structs=131072 node=0 entry=1132/0
[    0.000000] [000000031b100000-fffff8a004000000] page_structs=131072 node=0 entry=1133/0
[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x05000000 -> 0x050ffd8d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[4] active PFN ranges
[    0.000000]     0: 0x05000000 -> 0x050ff7ff
[    0.000000]     0: 0x050ff800 -> 0x050ffd09
[    0.000000]     0: 0x050ffd0b -> 0x050ffd7b
[    0.000000]     0: 0x050ffd7e -> 0x050ffd8d
[    0.000000] On node 0 totalpages: 1047943
[    0.000000]   Normal zone: 1040779 pages, LIFO batch:15
[    0.000000] Booting Linux...
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 1040779
[    0.000000] Kernel command line: root=/dev/sda1 ro initcall_debug=1 ignore_loglevel
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[    0.000000] clocksource: mult[640000] shift[16]
[    0.000000] clockevent: mult[28f5c28] shift[32]
[   45.627252] Console: colour dummy device 80x25
[   45.680346] console [tty0] enabled
[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.22.34 2007/07/23 13:01'
[    0.000000] PROMLIB: Root node compatible:
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.27.12 (hlauer@install1) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #2 SMP Thu Jan 22 14:28:41 CET 2009
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] console [earlyprom0] enabled
[    0.000000] ARCH: SUN4U
[    0.000000] Ethernet address: 00:03:ba:08:d9:24
[    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel... done.
[    0.000000] OF stdout device is: /pci@9,700000/ebus@1/serial@1,400000:a
[    0.000000] PROM: Built device tree with 102802 bytes of memory.
[    0.000000] Top of RAM: 0xa1ffb1a000, Total RAM: 0x1ffb0e000
[    0.000000] Memory hole size: 655360MB
[    0.000000] [0000000318000000-fffff8a000c00000] page_structs=131072 node=0 entry=1120/0
[    0.000000] [0000000318000000-fffff8a001000000] page_structs=131072 node=0 entry=1121/0
[    0.000000] [0000000318700000-fffff8a001400000] page_structs=131072 node=0 entry=1122/0
[    0.000000] [0000000318700000-fffff8a001800000] page_structs=131072 node=0 entry=1123/0
[    0.000000] [0000000318e00000-fffff8a001c00000] page_structs=131072 node=0 entry=1124/0
[    0.000000] [0000000318e00000-fffff8a002000000] page_structs=131072 node=0 entry=1125/0
[    0.000000] [0000000319500000-fffff8a002400000] page_structs=131072 node=0 entry=1126/0
[    0.000000] [0000000319c00000-fffff8a002800000] page_structs=131072 node=0 entry=1127/0
[    0.000000] [0000000319c00000-fffff8a002c00000] page_structs=131072 node=0 entry=1128/0
[    0.000000] [000000031a300000-fffff8a003000000] page_structs=131072 node=0 entry=1129/0
[    0.000000] [000000031a300000-fffff8a003400000] page_structs=131072 node=0 entry=1130/0
[    0.000000] [000000031aa00000-fffff8a003800000] page_structs=131072 node=0 entry=1131/0
[    0.000000] [000000031aa00000-fffff8a003c00000] page_structs=131072 node=0 entry=1132/0
[    0.000000] [000000031b100000-fffff8a004000000] page_structs=131072 node=0 entry=1133/0
[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x05000000 -> 0x050ffd8d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[4] active PFN ranges
[    0.000000]     0: 0x05000000 -> 0x050ff7ff
[    0.000000]     0: 0x050ff800 -> 0x050ffd09
[    0.000000]     0: 0x050ffd0b -> 0x050ffd7b
[    0.000000]     0: 0x050ffd7e -> 0x050ffd8d
[    0.000000] On node 0 totalpages: 1047943
[    0.000000]   Normal zone: 1040779 pages, LIFO batch:15
[    0.000000] Booting Linux...
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 1040779
[    0.000000] Kernel command line: root=/dev/sda1 ro initcall_debug=1 ignore_loglevel
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[    0.000000] clocksource: mult[640000] shift[16]
[    0.000000] clockevent: mult[28f5c28] shift[32]
[   45.627252] Console: colour dummy device 80x25
[   45.680346] console [tty0] enabled
[   49.037535] Dentry cache hash table entries: 1048576 (order: 10, 8388608 bytes)
[   49.142619] Inode-cache hash table entries: 524288 (order: 9, 4194304 bytes)
[   49.502995] Memory: 8301680k available (2928k kernel code, 1104k data, 208k init) [fffff80000000000,000000a1ffb1a000]
[   49.707978] Calibrating delay using timer specific routine.. 19.90 BogoMIPS (lpj=39810)
[   49.802194] Security Framework initialized
[   49.851001] SELinux:  Disabled at boot.
[   49.896870] Mount-cache hash table entries: 512
[   49.951499] Initializing cgroup subsys ns
[   49.998912] Initializing cgroup subsys cpuacct
[   50.052031] Initializing cgroup subsys devices
[   50.105342] calling  migration_init+0x0/0x6c
[   50.156316] initcall migration_init+0x0/0x6c returned 1 after 0 msecs
[   50.233290] initcall migration_init+0x0/0x6c returned with error code 1
[   50.313498] calling  spawn_ksoftirqd+0x0/0x64
[   50.365654] initcall spawn_ksoftirqd+0x0/0x64 returned 0 after 0 msecs
[   50.443717] calling  init_call_single_data+0x0/0x68
[   50.502047] initcall init_call_single_data+0x0/0x68 returned 0 after 0 msecs
[   50.586426] calling  spawn_softlockup_task+0x0/0x8c
[   50.644826] initcall spawn_softlockup_task+0x0/0x8c returned 0 after 0 msecs
[   50.729141] calling  relay_init+0x0/0x8
[   50.774973] initcall relay_init+0x0/0x8 returned 0 after 0 msecs
[   50.848299] CPU 0: synchronized TICK with master CPU (last diff -1 cycles, maxerr 11 cycles)
[   50.848316] Brought up 2 CPUs
[   50.848351] CPU0 attaching sched-domain:
[   50.848360]  domain 0: span 0,2 level CPU
[   50.848368]   groups: 0 2
[   50.848380] CPU2 attaching sched-domain:
[   50.848386]  domain 0: span 0,2 level CPU
[   50.848391]   groups: 2 0
[   50.849193] calling  net_ns_init+0x0/0x1a4
[   50.849200] net_namespace: 1552 bytes
[   50.849213] initcall net_ns_init+0x0/0x1a4 returned 0 after 0 msecs
[   50.849231] calling  sparc_globreg_init+0x0/0x1c
[   50.849243] initcall sparc_globreg_init+0x0/0x1c returned 0 after 0 msecs
[   50.849251] calling  sysctl_init+0x0/0x2c
[   50.849663] initcall sysctl_init+0x0/0x2c returned 0 after 0 msecs
[   50.849672] calling  ksysfs_init+0x0/0xc8
[   50.849695] initcall ksysfs_init+0x0/0xc8 returned 0 after 0 msecs
[   50.849704] calling  init_jiffies_clocksource+0x0/0x18
[   50.849716] initcall init_jiffies_clocksource+0x0/0x18 returned 0 after 0 msecs
[   50.849729] calling  filelock_init+0x0/0x38
[   51.982874] initcall filelock_init+0x0/0x38 returned 0 after 1091 msecs
[   52.061688] calling  init_script_binfmt+0x0/0x18
[   52.116898] initcall init_script_binfmt+0x0/0x18 returned 0 after 0 msecs
[   52.198149] calling  init_elf_binfmt+0x0/0x18
[   52.250234] initcall init_elf_binfmt+0x0/0x18 returned 0 after 0 msecs
[   52.328362] calling  init_compat_elf_binfmt+0x0/0x18
[   52.387740] initcall init_compat_elf_binfmt+0x0/0x18 returned 0 after 0 msecs
[   52.473158] calling  debugfs_init+0x0/0x58
[   52.522128] initcall debugfs_init+0x0/0x58 returned 0 after 0 msecs
[   52.597121] calling  securityfs_init+0x0/0x58
[   52.649213] initcall securityfs_init+0x0/0x58 returned 0 after 0 msecs
[   52.727334] calling  random32_init+0x0/0xf0
[   52.777338] initcall random32_init+0x0/0xf0 returned 0 after 0 msecs
[   52.853380] calling  sock_init+0x0/0x64
[   52.899320] initcall sock_init+0x0/0x64 returned 0 after 0 msecs
[   52.971096] calling  netpoll_init+0x0/0x30
[   53.020053] initcall netpoll_init+0x0/0x30 returned 0 after 0 msecs
[   53.095056] calling  netlink_proto_init+0x0/0x238
[   53.151334] NET: Registered protocol family 16
[   53.204465] initcall netlink_proto_init+0x0/0x238 returned 0 after 3 msecs
[   53.286733] calling  of_bus_driver_init+0x0/0xa0


* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
  2009-01-20  6:25 ` Sunfire V880 and 480R 2.6.27.x startup hangs David Miller
  2009-01-22 13:29 ` Hermann Lauer
@ 2009-01-26  2:30 ` David Miller
  2009-01-26 10:04 ` Hermann Lauer
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2009-01-26  2:30 UTC (permalink / raw)
  To: sparclinux

From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Date: Thu, 22 Jan 2009 14:29:34 +0100

> Compiled 2.6.27.12 today without the CON_BOOT flag and booted with the
> options above. The hang seems to be in of_bus_driver_init, see
> console output below.

I wonder what's getting tripped up in there :-)

Please add this patch and get the new console output, thanks!

diff --git a/arch/sparc64/kernel/of_device.c b/arch/sparc64/kernel/of_device.c
index 100ebd5..fefd415 100644
--- a/arch/sparc64/kernel/of_device.c
+++ b/arch/sparc64/kernel/of_device.c
@@ -818,10 +818,16 @@ static struct of_device * __init scan_one_device(struct device_node *dp,
 static void __init scan_tree(struct device_node *dp, struct device *parent)
 {
 	while (dp) {
-		struct of_device *op = scan_one_device(dp, parent);
-
-		if (op)
+		struct of_device *op;
+
+		printk(KERN_ERR "scan_tree: Scanning device %p:[%s]\n",
+		       dp, (dp ? dp->full_name : "<NULL>"));
+		op = scan_one_device(dp, parent);
+		if (op) {
+			printk(KERN_ERR "scan_tree: Recursing to child %p\n",
+			       dp->child);
 			scan_tree(dp->child, &op->dev);
+		}
 
 		dp = dp->sibling;
 	}
@@ -832,10 +838,12 @@ static void __init scan_of_devices(void)
 	struct device_node *root = of_find_node_by_path("/");
 	struct of_device *parent;
 
+	printk(KERN_ERR "Building root from %p\n", root);
 	parent = scan_one_device(root, NULL);
 	if (!parent)
 		return;
 
+	printk(KERN_ERR "Scanning tree...\n");
 	scan_tree(root->child, &parent->dev);
 }
 
@@ -843,18 +851,26 @@ static int __init of_bus_driver_init(void)
 {
 	int err;
 
+	printk(KERN_ERR "Setting up of bus\n");
 	err = of_bus_type_init(&of_platform_bus_type, "of");
 #ifdef CONFIG_PCI
-	if (!err)
+	if (!err) {
+		printk(KERN_ERR "Setting up ebus bus\n");
 		err = of_bus_type_init(&ebus_bus_type, "ebus");
+	}
 #endif
 #ifdef CONFIG_SBUS
-	if (!err)
+	if (!err) {
+		printk(KERN_ERR "Setting up sbus bus\n");
 		err = of_bus_type_init(&sbus_bus_type, "sbus");
+	}
 #endif
 
-	if (!err)
+	if (!err) {
+		printk(KERN_ERR "scan_of_devices()\n");
 		scan_of_devices();
+		printk(KERN_ERR "Done with scan_of_devices()\n");
+	}
 
 	return err;
 }


* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (2 preceding siblings ...)
  2009-01-26  2:30 ` David Miller
@ 2009-01-26 10:04 ` Hermann Lauer
  2009-01-27  9:33 ` Hermann Lauer
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-01-26 10:04 UTC (permalink / raw)
  To: sparclinux

On Sun, Jan 25, 2009 at 06:30:23PM -0800, David Miller wrote:
>
> > Compiled 2.6.27.12 today without the CON_BOOT flag and booted with the
> > options above. The hang seems to be in of_bus_driver_init, see
> > console output below.
> 
> I wonder what's getting tripped up in there :-)
> 
> Please add this patch and get the new console output, thanks!

Not getting much further, only a "Setting up of bus", see below.
The full console output is available at:

http://www.iwr.uni-heidelberg.de/ftp/linux/sparc-boot/console20090126.txt

Is that ok or should I always attach the complete console ?
If time permits I'll test on a 480R, too. Thanks.

~> tail /tmp/console20090126.txt
[   48.455736] initcall random32_init+0x0/0xf0 returned 0 after 0 msecs
[   48.531777] calling  sock_init+0x0/0x64
[   48.577716] initcall sock_init+0x0/0x64 returned 0 after 0 msecs
[   48.649494] calling  netpoll_init+0x0/0x30
[   48.698453] initcall netpoll_init+0x0/0x30 returned 0 after 0 msecs
[   48.773455] calling  netlink_proto_init+0x0/0x238
[   48.829727] NET: Registered protocol family 16
[   48.882871] initcall netlink_proto_init+0x0/0x238 returned 0 after 3 msecs
[   48.965131] calling  of_bus_driver_init+0x0/0x104
[   49.021376] Setting up of bus

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de


* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (3 preceding siblings ...)
  2009-01-26 10:04 ` Hermann Lauer
@ 2009-01-27  9:33 ` Hermann Lauer
  2009-01-28  6:29 ` David Miller
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-01-27  9:33 UTC (permalink / raw)
  To: sparclinux

On Mon, Jan 26, 2009 at 11:04:39AM +0100, Hermann Lauer wrote:
> 
> Not getting much further, only a "Setting up of bus", see below.
> The full console output is available at:
> 
> http://www.iwr.uni-heidelberg.de/ftp/linux/sparc-boot/console20090126.txt
> 
> Is that ok or should I always attach the complete console ?
> If time permits I'll test on a 480R, too. Thanks.

Just to confirm: the 480R hangs at the same point, so some change
in the kernel makes of_bus_driver_init unhappy on these types of
machines. The full console log is available at:

http://www.iwr.uni-heidelberg.de/ftp/linux/sparc-boot/console480R20090127.txt

Hermann

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de


* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (4 preceding siblings ...)
  2009-01-27  9:33 ` Hermann Lauer
@ 2009-01-28  6:29 ` David Miller
  2009-01-28  8:45 ` Hermann Lauer
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2009-01-28  6:29 UTC (permalink / raw)
  To: sparclinux

From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Date: Mon, 26 Jan 2009 11:04:39 +0100

> [   48.965131] calling  of_bus_driver_init+0x0/0x104
> [   49.021376] Setting up of bus

Strange, here is a patch to get some more information.

Please add this patch on top of what you have so far
and reboot.

Thanks.

diff --git a/arch/sparc64/kernel/of_device.c b/arch/sparc64/kernel/of_device.c
index 100ebd5..031efa2 100644
--- a/arch/sparc64/kernel/of_device.c
+++ b/arch/sparc64/kernel/of_device.c
@@ -853,16 +853,19 @@ static int __init of_bus_driver_init(voi
 
 	printk(KERN_ERR "Setting up of bus\n");
 	err = of_bus_type_init(&of_platform_bus_type, "of");
+	printk(KERN_ERR "of_bus_type_init() returns %d\n", err);
 #ifdef CONFIG_PCI
 	if (!err) {
 		printk(KERN_ERR "Setting up ebus bus\n");
 		err = of_bus_type_init(&ebus_bus_type, "ebus");
+		printk(KERN_ERR "of_bus_type_init() returns %d\n", err);
 	}
 #endif
 #ifdef CONFIG_SBUS
 	if (!err) {
 		printk(KERN_ERR "Setting up sbus bus\n");
 		err = of_bus_type_init(&sbus_bus_type, "sbus");
+		printk(KERN_ERR "of_bus_type_init() returns %d\n", err);
 	}
 #endif
 
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index ef522ae..427f582 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -869,57 +869,77 @@ int bus_register(struct bus_type *bus)
 	int retval;
 	struct bus_type_private *priv;
 
+	printk(KERN_ERR "In bus_register().\n");
 	priv = kzalloc(sizeof(struct bus_type_private), GFP_KERNEL);
-	if (!priv)
+	if (!priv) {
+		printk(KERN_ERR "priv allocation failed\n");
 		return -ENOMEM;
+	}
 
 	priv->bus = bus;
 	bus->p = priv;
 
 	BLOCKING_INIT_NOTIFIER_HEAD(&priv->bus_notifier);
 
+	printk(KERN_ERR "Doing kobject_set_name()\n");
 	retval = kobject_set_name(&priv->subsys.kobj, "%s", bus->name);
-	if (retval)
+	if (retval) {
+		printk(KERN_ERR "Failed with err %d\n", retval);
 		goto out;
+	}
 
 	priv->subsys.kobj.kset = bus_kset;
 	priv->subsys.kobj.ktype = &bus_ktype;
 	priv->drivers_autoprobe = 1;
 
+	printk(KERN_ERR "kset_register()\n");
 	retval = kset_register(&priv->subsys);
-	if (retval)
+	if (retval) {
+		printk(KERN_ERR "Failed with err %d\n", retval);
 		goto out;
+	}
 
+	printk(KERN_ERR "bus_create_file(bus, &bus_attr_uevent)\n");
 	retval = bus_create_file(bus, &bus_attr_uevent);
-	if (retval)
+	if (retval) {
+		printk(KERN_ERR "Failed with err %d\n", retval);
 		goto bus_uevent_fail;
-
+	}
+	printk(KERN_ERR "kset_create_and_add(devices)\n");
 	priv->devices_kset = kset_create_and_add("devices", NULL,
 						 &priv->subsys.kobj);
 	if (!priv->devices_kset) {
 		retval = -ENOMEM;
+		printk(KERN_ERR "Failed with err %d\n", retval);
 		goto bus_devices_fail;
 	}
 
+	printk(KERN_ERR "kset_create_and_add(drivers)\n");
 	priv->drivers_kset = kset_create_and_add("drivers", NULL,
 						 &priv->subsys.kobj);
 	if (!priv->drivers_kset) {
 		retval = -ENOMEM;
+		printk(KERN_ERR "Failed with err %d\n", retval);
 		goto bus_drivers_fail;
 	}
 
 	klist_init(&priv->klist_devices, klist_devices_get, klist_devices_put);
 	klist_init(&priv->klist_drivers, NULL, NULL);
 
+	printk(KERN_ERR "add_probe_files()\n");
 	retval = add_probe_files(bus);
-	if (retval)
+	if (retval) {
+		printk(KERN_ERR "Failed with err %d\n", retval);
 		goto bus_probe_files_fail;
-
+	}
+	printk(KERN_ERR "bus_add_attrs()\n");
 	retval = bus_add_attrs(bus);
-	if (retval)
+	if (retval) {
+		printk(KERN_ERR "Failed with err %d\n", retval);
 		goto bus_attrs_fail;
-
+	}
 	pr_debug("bus: '%s': registered\n", bus->name);
+	printk(KERN_ERR "Success\n");
 	return 0;
 
 bus_attrs_fail:

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (5 preceding siblings ...)
  2009-01-28  6:29 ` David Miller
@ 2009-01-28  8:45 ` Hermann Lauer
  2009-01-31  0:00 ` David Miller
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-01-28  8:45 UTC (permalink / raw)
  To: sparclinux

On Tue, Jan 27, 2009 at 10:29:10PM -0800, David Miller wrote:
> From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
> Date: Mon, 26 Jan 2009 11:04:39 +0100
> 
> > [   48.965131] calling  of_bus_driver_init+0x0/0x104
> > [   49.021376] Setting up of bus
> 
> Strange, here is a patch to get some more information.

It's hanging in kset_register. Does this ring a bell to you ?
Will move the full output to the usual place. Thanks.

[   46.994534] calling  random32_init+0x0/0xf0
[   47.044539] initcall random32_init+0x0/0xf0 returned 0 after 0 msecs
[   47.120582] calling  sock_init+0x0/0x64
[   47.166520] initcall sock_init+0x0/0x64 returned 0 after 0 msecs
[   47.238297] calling  netpoll_init+0x0/0x30
[   47.287256] initcall netpoll_init+0x0/0x30 returned 0 after 0 msecs
[   47.362256] calling  netlink_proto_init+0x0/0x238
[   47.418529] NET: Registered protocol family 16
[   47.471670] initcall netlink_proto_init+0x0/0x238 returned 0 after 3 msecs
[   47.553935] calling  of_bus_driver_init+0x0/0x12c
[   47.610180] Setting up of bus
[   47.645596] In bus_register().
[   47.682056] Doing kobject_set_name()
[   47.724764] kset_register()

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (6 preceding siblings ...)
  2009-01-28  8:45 ` Hermann Lauer
@ 2009-01-31  0:00 ` David Miller
  2009-02-02 14:27 ` Hermann Lauer
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2009-01-31  0:00 UTC (permalink / raw)
  To: sparclinux

From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Date: Wed, 28 Jan 2009 09:45:18 +0100

> It's hanging in kset_register. Does this ring a bell to you ?
> Will move the full output to the usual place. Thanks.
 ...
> [   47.553935] calling  of_bus_driver_init+0x0/0x12c
> [   47.610180] Setting up of bus
> [   47.645596] In bus_register().
> [   47.682056] Doing kobject_set_name()
> [   47.724764] kset_register()

I suspect it's hanging in uevent generation, let's verify that.
Something really weird is going on in your box, I wonder if the bug is
surfacing because of all of the non-standard options you have enabled
in your build such as cgroups and stuff like that.

Anyways, add this patch on top of your tree and please send the tail
of the new output.
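
The debugging patches in this thread all follow the same pattern: print a
marker before each call, so that the last line on the console names the call
that never returned. A minimal userspace sketch of that pattern in C follows.
The names TRACE_CALL, trace_step, step_a, step_b and register_thing are
hypothetical and made up for illustration, and fprintf(stderr, ...) merely
stands in for printk(KERN_ERR ...).

```c
#include <assert.h>
#include <stdio.h>

/* Number of markers printed so far (illustrative bookkeeping only). */
static int trace_count;

/* Print a marker and flush it before the traced call runs. */
static void trace_step(const char *func, int line, const char *what)
{
	assert(func != NULL && what != NULL);
	fprintf(stderr, "%s:%d: calling %s\n", func, line, what);
	fflush(stderr);
	trace_count++;
}

/* Emit the marker first, then evaluate the call and yield its value. */
#define TRACE_CALL(expr) \
	(trace_step(__func__, __LINE__, #expr), (expr))

/* Hypothetical steps standing in for kset_register() and friends. */
static int step_a(void) { return 0; }
static int step_b(void) { return -12; /* pretend error code */ }

static int register_thing(void)
{
	int err;

	err = TRACE_CALL(step_a());
	if (err)
		return err;
	err = TRACE_CALL(step_b());
	if (err)
		fprintf(stderr, "%s: failed with err %d\n", __func__, err);
	return err;
}
```

If a traced call hangs, its marker is the last line printed, which is how the
console tails in this thread are being read.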

One thing you might want to try to do when it hangs is go:

1) Send a 'break' over the console then immediately type '8'.
   This will increase the kernel log level.

2) Send a 'break' then 'p', this will dump the current cpu's
   registers.

3) Send a 'break' then 'y', this will give a brief backtrace
   on all cpus.

4) Send a 'break' then 't', this will dump the state of all
   processes on the system.

Unfortunately, none of those will work if the cpu handling console
interrupts has cpu interrupts disabled for whatever reason :-/ But it
is definitely worth a try.
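
For a console reached through a serial port on another Linux machine, the
break-then-key sequences above could be scripted roughly as follows. This is
only a sketch under assumptions: send_break_key is a hypothetical helper, the
caller is assumed to have opened and configured the serial device already, and
whether the key arrives soon enough after the break depends on the console
setup.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/*
 * Send a serial BREAK on fd, then one key such as '8', 'p', 'y' or 't'.
 * Returns 0 on success and -1 on failure (errno is set by the failing call).
 */
static int send_break_key(int fd, char key)
{
	assert(fd >= 0);

	/* Duration 0 asks for a break of the default length. */
	if (tcsendbreak(fd, 0) < 0)
		return -1;
	if (write(fd, &key, 1) != 1)
		return -1;
	/* Wait until the key has actually been transmitted. */
	return tcdrain(fd);
}
```

On a descriptor that is not a terminal, tcsendbreak() fails and the helper
returns -1 without writing the key.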

Thanks.

diff --git a/lib/kobject.c b/lib/kobject.c
index fbf0ae2..4553903 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -708,11 +708,17 @@ int kset_register(struct kset *k)
 	if (!k)
 		return -EINVAL;
 
+	printk(KERN_ERR "kset_register: kset_init()\n");
 	kset_init(k);
+	printk(KERN_ERR "kset_register: kset_add_internal()\n");
 	err = kobject_add_internal(&k->kobj);
-	if (err)
+	if (err) {
+		printk(KERN_ERR "kset_register: Got error %d\n", err);
 		return err;
+	}
+	printk(KERN_ERR "kset_register: kobject_uevent()\n");
 	kobject_uevent(&k->kobj, KOBJ_ADD);
+	printk(KERN_ERR "kset_register: Done\n");
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (7 preceding siblings ...)
  2009-01-31  0:00 ` David Miller
@ 2009-02-02 14:27 ` Hermann Lauer
  2009-02-02 20:50 ` David Miller
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-02-02 14:27 UTC (permalink / raw)
  To: sparclinux

On Fri, Jan 30, 2009 at 04:00:10PM -0800, David Miller wrote:
> > [   47.553935] calling  of_bus_driver_init+0x0/0x12c
> > [   47.610180] Setting up of bus
> > [   47.645596] In bus_register().
> > [   47.682056] Doing kobject_set_name()
> > [   47.724764] kset_register()
> 
> I suspect it's hanging in uevent generation, let's verify that.
> Something really weird is going on in your box, I wonder if the bug is
> surfacing because of all of the non-standard options you have enabled
> in your build such as cgroups and stuff like that.

I CC this to the debian sparc people, as the config is derived from
their default sparc kernel config. If I remember correctly, I only used
"make oldconfig" to get to the newer kernel. Maybe one of those guys can
comment on the sparc configuration choices.

I was curious, so I took the vanilla 2.6.27.13 kernel from the net and ran
"make menuconfig", only saving the config without changing anything.
I will put this config at:
http://www.iwr.uni-heidelberg.de/ftp/linux/sparc-boot/config-2.6.27.13-20090202.txt/config-2.6.27.13-20090202.txt

> Anyways, add this patch on top of your tree and please send the tail
> of the new output.

Here is the output with all patches and that default build of 2.6.27.13:

In bus_register().
Doing kobject_set_name()
kset_register()
kset_register: kset_init()
kset_register: kset_add_internal()
kset_register: kobject_uevent()
[halt sent]
[halt sent]
[halt sent]
[halt sent]

> One thing you might want to try to do when it hangs is go:
> 
> 1) Send a 'break' over the console then immediately type '8'.
>    This will increase the kernel log level.
> 
> 2) Send a 'break' then 'p', this will dump the current cpu's
>    registers.
> 
> 3) Send a 'break' then 'y', this will give a brief backtrace
>    on all cpus.
> 
> 4) Send a 'break' then 't', this will dump the state of all
>    processes on the system.
> 
> Unfortunately, none of those will work if the cpu handling console
> interrupts has cpu interrupts disabled for whatever reason :-/ But it
> is definitely worth a try.

Tried, but as you feared, no output was produced (see above).
Any further ideas ?
If time permits, I probably should start compiling all kernels from 2.6.26.5
on to find the first non working kernel.

Thanks, Hermann

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (8 preceding siblings ...)
  2009-02-02 14:27 ` Hermann Lauer
@ 2009-02-02 20:50 ` David Miller
  2009-02-03 21:26 ` Hermann Lauer
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2009-02-02 20:50 UTC (permalink / raw)
  To: sparclinux

From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Date: Mon, 2 Feb 2009 15:27:57 +0100

> On Fri, Jan 30, 2009 at 04:00:10PM -0800, David Miller wrote:
> > > [   47.553935] calling  of_bus_driver_init+0x0/0x12c
> > > [   47.610180] Setting up of bus
> > > [   47.645596] In bus_register().
> > > [   47.682056] Doing kobject_set_name()
> > > [   47.724764] kset_register()
> > 
> > I suspect it's hanging in uevent generation, let's verify that.
> > Something really weird is going on in your box, I wonder if the bug is
> > surfacing because of all of the non-standard options you have enabled
> > in your build such as cgroups and stuff like that.
> 
> I CC this to the debian sparc people, as the config is derived from
> their default sparc kernel config. If I remember correctly, I only used
> "make oldconfig" to get to the newer kernel. Maybe one of those guys can
> comment on the sparc configuration choices.

I'm not saying the configuration choice is wrong, not at all.

I'm saying that since it's something most active kernel hackers
don't enable, it may be a reason why I and others have never
seen this problem.

> Here is the output with all patches and that default build of 2.6.27.13:
> 
> In bus_register().
> Doing kobject_set_name()
> kset_register()
> kset_register: kset_init()
> kset_register: kset_add_internal()
> kset_register: kobject_uevent()
> [halt sent]
> [halt sent]
> [halt sent]
> [halt sent]

I'm pretty certain it's call_usermodehelper() that's hanging.

Perhaps something with forking kernel threads or invoking exec
is failing on sparc64 on your machine for some reason.
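
The fork-versus-exec question has a well-known userspace analogue. The sketch
below is not kernel code and does not use call_usermodehelper(); it only
illustrates how a parent can tell a failed fork from a failed exec by means of
a close-on-exec pipe. run_helper and the helper_stage names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum helper_stage {
	STAGE_SETUP_FAILED,	/* pipe() or fcntl() failed */
	STAGE_FORK_FAILED,	/* fork() failed */
	STAGE_EXEC_FAILED,	/* child was created but execv() failed */
	STAGE_EXEC_OK		/* execv() succeeded in the child */
};

static enum helper_stage run_helper(const char *path, char *const argv[],
				    int *err)
{
	int fds[2];
	int child_errno = 0;
	ssize_t n;
	pid_t pid;

	assert(path != NULL && argv != NULL && err != NULL);
	*err = 0;

	if (pipe(fds) < 0) {
		*err = errno;
		return STAGE_SETUP_FAILED;
	}
	/* The write end vanishes automatically once execv() succeeds. */
	if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) {
		*err = errno;
		close(fds[0]);
		close(fds[1]);
		return STAGE_SETUP_FAILED;
	}

	pid = fork();
	if (pid < 0) {
		*err = errno;
		close(fds[0]);
		close(fds[1]);
		return STAGE_FORK_FAILED;
	}
	if (pid == 0) {
		close(fds[0]);
		execv(path, argv);
		/* Only reached when execv() failed: report errno to the parent. */
		child_errno = errno;
		if (write(fds[1], &child_errno, sizeof(child_errno)) < 0)
			_exit(126);
		_exit(127);
	}

	close(fds[1]);
	/* EOF (n == 0) means the exec happened; data means it failed. */
	do {
		n = read(fds[0], &child_errno, sizeof(child_errno));
	} while (n < 0 && errno == EINTR);
	close(fds[0]);

	/* Reap the child so it does not linger as a zombie. */
	while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
		;

	if (n == (ssize_t)sizeof(child_errno)) {
		*err = child_errno;
		return STAGE_EXEC_FAILED;
	}
	return STAGE_EXEC_OK;
}
```

The returned stage says how far the helper got, and *err carries the errno of
the step that failed.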

New patch:

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 3f91472..2c30f6a 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -100,6 +100,9 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	int i = 0;
 	int retval = 0;
 
+	printk(KERN_ERR "kobject_uevent_env: [%s] %p\n",
+	       kobject_name(kobj), kobj);
+
 	pr_debug("kobject: '%s' (%p): %s\n",
 		 kobject_name(kobj), kobj, __func__);
 
@@ -109,8 +112,8 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 		top_kobj = top_kobj->parent;
 
 	if (!top_kobj->kset) {
-		pr_debug("kobject: '%s' (%p): %s: attempted to send uevent "
-			 "without kset!\n", kobject_name(kobj), kobj,
+		printk(KERN_ERR "kobject: '%s' (%p): %s: attempted to send uevent "
+		       "without kset!\n", kobject_name(kobj), kobj,
 			 __func__);
 		return -EINVAL;
 	}
@@ -118,12 +121,14 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	kset = top_kobj->kset;
 	uevent_ops = kset->uevent_ops;
 
+	printk(KERN_ERR "kobject_uevent_env: Checking uevent_ops->filter\n");
+
 	/* skip the event, if the filter returns zero. */
 	if (uevent_ops && uevent_ops->filter)
 		if (!uevent_ops->filter(kset, kobj)) {
-			pr_debug("kobject: '%s' (%p): %s: filter function "
-				 "caused the event to drop!\n",
-				 kobject_name(kobj), kobj, __func__);
+			printk(KERN_ERR "kobject: '%s' (%p): %s: filter function "
+			       "caused the event to drop!\n",
+			       kobject_name(kobj), kobj, __func__);
 			return 0;
 		}
 
@@ -133,16 +138,20 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	else
 		subsystem = kobject_name(&kset->kobj);
 	if (!subsystem) {
-		pr_debug("kobject: '%s' (%p): %s: unset subsystem caused the "
-			 "event to drop!\n", kobject_name(kobj), kobj,
-			 __func__);
+		printk(KERN_ERR "kobject: '%s' (%p): %s: unset subsystem caused the "
+		       "event to drop!\n", kobject_name(kobj), kobj,
+		       __func__);
 		return 0;
 	}
 
+	printk(KERN_ERR "kobject_uevent_env: Allocating and filling env buffer.\n");
+
 	/* environment buffer */
 	env = kzalloc(sizeof(struct kobj_uevent_env), GFP_KERNEL);
-	if (!env)
+	if (!env) {
+		printk(KERN_ERR "kobject_uevent_env: env kzalloc() failed\n");
 		return -ENOMEM;
+	}
 
 	/* complete object path */
 	devpath = kobject_get_path(kobj, GFP_KERNEL);
@@ -171,6 +180,8 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 		}
 	}
 
+	printk(KERN_ERR "kobject_uevent_env: Checking uevent_ops->uevent\n");
+
 	/* let the kset specific function add its stuff */
 	if (uevent_ops && uevent_ops->uevent) {
 		retval = uevent_ops->uevent(kset, kobj, env);
@@ -207,6 +218,8 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 		struct sk_buff *skb;
 		size_t len;
 
+		printk(KERN_ERR "kobject_uevent_env: Sending netlink msg\n");
+
 		/* allocate message with the maximum possible size */
 		len = strlen(action_string) + strlen(devpath) + 2;
 		skb = alloc_skb(len + env->buflen, GFP_KERNEL);
@@ -234,6 +247,9 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	if (uevent_helper[0]) {
 		char *argv [3];
 
+		printk(KERN_ERR "kobject_uevent_env: Invoking uevent_helper[%s]\n",
+		       uevent_helper);
+
 		argv [0] = uevent_helper;
 		argv [1] = (char *)subsystem;
 		argv [2] = NULL;
@@ -250,6 +266,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	}
 
 exit:
+	printk(KERN_ERR "kobject_uevent_env: At 'exit', retval=%d\n", retval);
 	kfree(devpath);
 	kfree(env);
 	return retval;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (9 preceding siblings ...)
  2009-02-02 20:50 ` David Miller
@ 2009-02-03 21:26 ` Hermann Lauer
  2009-02-03 23:19 ` David Miller
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-02-03 21:26 UTC (permalink / raw)
  To: sparclinux

On Mon, Feb 02, 2009 at 12:50:38PM -0800, David Miller wrote:
> I'm pretty certain it's call_usermodehelper() that's hanging.
> 
> Perhaps something with forking kernel threads or invoking exec
> is failing on sparc64 on your machine for some reason.

Looks like you are right:

> tail console20090203.txt
Doing kobject_set_name()
kset_register()
kset_register: kset_init()
kset_register: kset_add_internal()
kset_register: kobject_uevent()
kobject_uevent_env: [of] fffff8a1fe00c6d8
kobject_uevent_env: Checking uevent_ops->filter
kobject_uevent_env: Allocating and filling env buffer.
kobject_uevent_env: Checking uevent_ops->uevent
kobject_uevent_env: Invoking uevent_helper[/sbin/hotplug]

Please look also at the full console output at
http://www.iwr.uni-heidelberg.de/ftp/linux/sparc-boot/console20090203.txt
as there are some "attempted to send uevent without kset!" and similar messages.

Maybe it's important that this is a two cpu machine (as a similar 6 cpu machine seems to work,
see other report on the list) ?
Thanks.

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (10 preceding siblings ...)
  2009-02-03 21:26 ` Hermann Lauer
@ 2009-02-03 23:19 ` David Miller
  2009-02-06 10:28 ` Hermann Lauer
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2009-02-03 23:19 UTC (permalink / raw)
  To: sparclinux

From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Date: Tue, 3 Feb 2009 22:26:32 +0100

> Maybe it's important that this is a two cpu machine (as a similar 6
> cpu machine seems to work, see other report on the list) ?

I don't think it matters, to be honest.  Memory size and layout
may have more to do with it.

I'll look at your logs and post some new debugging patches later,
thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (11 preceding siblings ...)
  2009-02-03 23:19 ` David Miller
@ 2009-02-06 10:28 ` Hermann Lauer
  2009-02-07  7:23 ` David Miller
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-02-06 10:28 UTC (permalink / raw)
  To: sparclinux

On Tue, Feb 03, 2009 at 03:19:18PM -0800, David Miller wrote:
> From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
> Date: Tue, 3 Feb 2009 22:26:32 +0100
> 
> > Maybe it's important that this is a two cpu machine (as a similar 6
> > cpu machine seems to work, see other report on the list) ?
> 
> I don't think it matters, to be honest.  Memory size and layout
> may have more to do with it.

Meanwhile I bisected the release versions from the 2.6.26 to the 2.6.27 series:

<2.6.26.8	boots
 2.6.27-rc1	compile fails (see below)
>2.6.27-rc2	hangs at boot

Any chance that one of the last memory related patches will fix
this problem ? Thanks.

  CC      arch/sparc64/kernel/iommu.o
In file included from arch/sparc64/kernel/iommu.c:21:
arch/sparc64/kernel/iommu_common.h:40: error: static declaration of iommu_num_pages follows non-static declaration
include/linux/iommu-helper.h:11: error: previous declaration of iommu_num_pages was here
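
A version bisection like the one above has to cope with candidates that cannot
be tested at all, such as the 2.6.27-rc1 build failure. A minimal sketch of a
bisection loop that skips such candidates follows. All names (bisect_range,
table_test, the RES_* values) are hypothetical; with a git tree, "git bisect"
together with "git bisect skip" serves the same purpose at commit granularity.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RES_BOOTS, RES_HANGS, RES_SKIP } result_t;
typedef result_t (*test_fn)(int index, void *ctx);

/* Example test callback: look the result up in a table passed via ctx. */
static result_t table_test(int index, void *ctx)
{
	const result_t *table = ctx;

	return table[index];
}

/*
 * Narrow [*good, *bad], where *good is known to boot and *bad is known
 * to hang. Untestable candidates are skipped in favour of a neighbour.
 */
static void bisect_range(int *good, int *bad, test_fn test, void *ctx)
{
	assert(good != NULL && bad != NULL && test != NULL);

	while (*bad - *good > 1) {
		int mid = *good + (*bad - *good) / 2;
		int probe = mid;
		result_t r = test(probe, ctx);

		/* Untestable: try later candidates first, then earlier ones. */
		while (r == RES_SKIP && probe + 1 < *bad)
			r = test(++probe, ctx);
		if (r == RES_SKIP)
			probe = mid;
		while (r == RES_SKIP && probe - 1 > *good)
			r = test(--probe, ctx);
		if (r == RES_SKIP)
			break;	/* nothing testable is left in between */

		if (r == RES_BOOTS)
			*good = probe;
		else
			*bad = probe;
	}
}
```

With the three results reported above (boots, build failure, hangs) the range
cannot be narrowed any further until the untestable version is made to build.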

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (12 preceding siblings ...)
  2009-02-06 10:28 ` Hermann Lauer
@ 2009-02-07  7:23 ` David Miller
  2009-02-09 22:21 ` Hermann Lauer
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2009-02-07  7:23 UTC (permalink / raw)
  To: sparclinux

From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Date: Fri, 6 Feb 2009 11:28:25 +0100

> Meanwhile I bisected the release versions from the 2.6.26 to the 2.6.27 series:
> 
> <2.6.26.8	boots
>  2.6.27-rc1	compile fails (see below)
> >2.6.27-rc2	hangs at boot
> 
> Any chance that one of the last memory related patches will fix
> this problem ? Thanks.

It is possible.

Here are two separate things you can try:

1) Boot with "mem=512m" on the kernel boot command line.

2) Try booting with the patch below.

Thanks!

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 51daae5..c9ab51a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -46,6 +46,7 @@
 #include <linux/page-isolation.h>
 #include <linux/memcontrol.h>
 #include <linux/debugobjects.h>
+#include <linux/nmi.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -707,7 +708,26 @@ static int move_freepages(struct zone *zone,
 	 * Remove at a later date when no bug reports exist related to
 	 * grouping pages by mobility
 	 */
-	BUG_ON(page_zone(start_page) != page_zone(end_page));
+	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
+		printk(KERN_ERR "move_freepages: Bogus zones: "
+		       "start_page[%p] end_page[%p] zone[%p]\n",
+		       start_page, end_page, zone);
+		printk(KERN_ERR "move_freepages: "
+		       "start_zone[%p] end_zone[%p]\n",
+		       page_zone(start_page), page_zone(end_page));
+		printk(KERN_ERR "move_freepages: "
+		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
+		       page_to_pfn(start_page), page_to_pfn(end_page));
+		printk(KERN_ERR "move_freepages: "
+		       "start_nid[%d] end_nid[%d]\n",
+		       page_to_nid(start_page), page_to_nid(end_page));
+		spin_unlock(&zone->lock);
+		local_irq_enable();
+		while (1) {
+			barrier();
+			touch_nmi_watchdog();
+		}
+	}
 #endif
 
 	for (page = start_page; page <= end_page;) {
@@ -2583,6 +2603,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	unsigned long end_pfn = start_pfn + size;
 	unsigned long pfn;
 	struct zone *z;
+	int tmp;
 
 	z = &NODE_DATA(nid)->node_zones[zone];
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
@@ -2594,7 +2615,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (context == MEMMAP_EARLY) {
 			if (!early_pfn_valid(pfn))
 				continue;
-			if (!early_pfn_in_nid(pfn, nid))
+			tmp = early_pfn_to_nid(pfn);
+			if (tmp > -1 && tmp != nid)
 				continue;
 		}
 		page = pfn_to_page(pfn);
@@ -2961,8 +2983,9 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
 			return early_node_map[i].nid;
 	}
 
-	return 0;
+	return -1;
 }
+
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 
 /* Basic iterator support to walk early_node_map[] */

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (13 preceding siblings ...)
  2009-02-07  7:23 ` David Miller
@ 2009-02-09 22:21 ` Hermann Lauer
  2009-02-11 21:25 ` Hermann Lauer
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-02-09 22:21 UTC (permalink / raw)
  To: sparclinux

On Fri, Feb 06, 2009 at 11:23:52PM -0800, David Miller wrote:

> > Any chance that one of the last memory related patches will fix
> > this problem ? Thanks.
> 
> It is possible.
> 
> Here are two separate things you can try:
> 
> 1) Boot with "mem=512m" on the kernel boot command line.
> 
> 2) Try booting with the patch below.

Unfortunately the patch applied to 2.6.27.13 didn't change anything,
still hanging - see below. Full output is available on the web.

Adding mem=512k also hung with a 2.6.28 series kernel. Any further ideas ?

Doing kobject_set_name()
kset_register()
kset_register: kset_init()
kset_register: kset_add_internal()
kset_register: kobject_uevent()
kobject_uevent_env: [of] fffff8a1fe00c6d8
kobject_uevent_env: Checking uevent_ops->filter
kobject_uevent_env: Allocating and filling env buffer.
kobject_uevent_env: Checking uevent_ops->uevent
kobject_uevent_env: Invoking uevent_helper[/sbin/hotplug]

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (14 preceding siblings ...)
  2009-02-09 22:21 ` Hermann Lauer
@ 2009-02-11 21:25 ` Hermann Lauer
  2009-02-11 21:58 ` David Miller
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-02-11 21:25 UTC (permalink / raw)
  To: sparclinux

On Mon, Feb 09, 2009 at 11:21:01PM +0100, Hermann Lauer wrote:
> 
> Unfortunately the patch applied to 2.6.27.13 didn't change anything,
> still hanging - see below. Full output is available on the web.
> 
> Adding mem=512k also hung with a 2.6.28 series kernel. Any further ideas ?
> 
> Doing kobject_set_name()
> kset_register()
> kset_register: kset_init()
> kset_register: kset_add_internal()
> kset_register: kobject_uevent()
> kobject_uevent_env: [of] fffff8a1fe00c6d8
> kobject_uevent_env: Checking uevent_ops->filter
> kobject_uevent_env: Allocating and filling env buffer.
> kobject_uevent_env: Checking uevent_ops->uevent
> kobject_uevent_env: Invoking uevent_helper[/sbin/hotplug]

Today I tried "# CONFIG_CGROUPS is not set", which David mentioned
some time ago, but the hang is still the same.

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (15 preceding siblings ...)
  2009-02-11 21:25 ` Hermann Lauer
@ 2009-02-11 21:58 ` David Miller
  2009-02-11 23:15 ` David Miller
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2009-02-11 21:58 UTC (permalink / raw)
  To: sparclinux

From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Date: Wed, 11 Feb 2009 22:25:04 +0100

> On Mon, Feb 09, 2009 at 11:21:01PM +0100, Hermann Lauer wrote:
> > 
> > Unfortunately the patch applied to 2.6.27.13 didn't change anything,
> > still hanging - see below. Full output is available on the web.
> > 
> > Adding mem=512k also hung with a 2.6.28 series kernel. Any further ideas ?
> > 
> > Doing kobject_set_name()
> > kset_register()
> > kset_register: kset_init()
> > kset_register: kset_add_internal()
> > kset_register: kobject_uevent()
> > kobject_uevent_env: [of] fffff8a1fe00c6d8
> > kobject_uevent_env: Checking uevent_ops->filter
> > kobject_uevent_env: Allocating and filling env buffer.
> > kobject_uevent_env: Checking uevent_ops->uevent
> > kobject_uevent_env: Invoking uevent_helper[/sbin/hotplug]
> 
> Today I tried "# CONFIG_CGROUPS is not set", which David mentioned
> some time ago, but the hang is still the same.

Yes, I didn't expect that to help.

I'm real busy but will get back to you with some new debugging
patches hopefully in the next few days.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (16 preceding siblings ...)
  2009-02-11 21:58 ` David Miller
@ 2009-02-11 23:15 ` David Miller
  2009-02-12 15:30 ` Hermann Lauer
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2009-02-11 23:15 UTC (permalink / raw)
  To: sparclinux

From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Date: Mon, 9 Feb 2009 23:21:01 +0100

> kobject_uevent_env: Invoking uevent_helper[/sbin/hotplug]
> 

Ok, let's see if it's the fork or the exec attempt which
dies.  Please reboot the same with the following debugging
patch added.

Thanks.

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 2456d1a..e1eab36 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -137,6 +137,8 @@ static int ____call_usermodehelper(void *data)
 	struct key *new_session, *old_session;
 	int retval;
 
+	printk(KERN_ERR "call_usermodehelper: Enter\n");
+
 	/* Unblock all signals and set the session keyring. */
 	new_session = key_get(sub_info->ring);
 	spin_lock_irq(&current->sighand->siglock);
@@ -146,12 +148,17 @@ static int ____call_usermodehelper(void *data)
 	recalc_sigpending();
 	spin_unlock_irq(&current->sighand->siglock);
 
+	printk(KERN_ERR "call_usermodehelper: key_put(old_session)\n");
+
 	key_put(old_session);
 
 	/* Install input pipe when needed */
 	if (sub_info->stdin) {
 		struct files_struct *f = current->files;
 		struct fdtable *fdt;
+
+		printk(KERN_ERR "call_usermodehelper: stdin pipe\n");
+
 		/* no races because files should be private here */
 		sys_close(0);
 		fd_install(0, sub_info->stdin);
@@ -165,17 +172,25 @@ static int ____call_usermodehelper(void *data)
 		current->signal->rlim[RLIMIT_CORE] = (struct rlimit){0, 0};
 	}
 
+	printk(KERN_ERR "call_usermodehelper: set_cpus_allowed_ptr()\n");
+
 	/* We can run anywhere, unlike our parent keventd(). */
 	set_cpus_allowed_ptr(current, CPU_MASK_ALL_PTR);
 
+	printk(KERN_ERR "call_usermodehelper: set_user_nice()\n");
+
 	/*
 	 * Our parent is keventd, which runs with elevated scheduling priority.
 	 * Avoid propagating that into the userspace child.
 	 */
 	set_user_nice(current, 0);
 
+	printk(KERN_ERR "call_usermodehelper: kernel_execve()\n");
+
 	retval = kernel_execve(sub_info->path, sub_info->argv, sub_info->envp);
 
+	printk(KERN_ERR "call_usermodehelper: retval=%d\n", retval);
+
 	/* Exec failed? */
 	sub_info->retval = retval;
 	do_exit(0);
@@ -243,12 +258,14 @@ static void __call_usermodehelper(struct work_struct *work)
 	/* CLONE_VFORK: wait until the usermode helper has execve'd
 	 * successfully We need the data structures to stay around
 	 * until that is done.  */
+	printk(KERN_ERR "__call_usermodehelper: wait=%d\n", (int) wait);
 	if (wait == UMH_WAIT_PROC || wait == UMH_NO_WAIT)
 		pid = kernel_thread(wait_for_helper, sub_info,
 				    CLONE_FS | CLONE_FILES | SIGCHLD);
 	else
 		pid = kernel_thread(____call_usermodehelper, sub_info,
 				    CLONE_VFORK | SIGCHLD);
+	printk(KERN_ERR "__call_usermodehelper: pid=%d\n", (int) pid);
 
 	switch (wait) {
 	case UMH_NO_WAIT:
@@ -362,9 +379,16 @@ struct subprocess_info *call_usermodehelper_setup(char *path, char **argv,
 						  char **envp, gfp_t gfp_mask)
 {
 	struct subprocess_info *sub_info;
+
+	printk(KERN_ERR "call_usermodehelper_setup: kzalloc()\n");
+
 	sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask);
-	if (!sub_info)
+	if (!sub_info) {
+		printk(KERN_ERR "call_usermodehelper_setup: failed\n");
 		goto out;
+	}
+
+	printk(KERN_ERR "call_usermodehelper_setup: INIT_WORK and done\n");
 
 	INIT_WORK(&sub_info->work, __call_usermodehelper);
 	sub_info->path = path;
@@ -452,11 +476,16 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info,
 	DECLARE_COMPLETION_ONSTACK(done);
 	int retval = 0;
 
+	printk(KERN_ERR "call_usermodehelper_exec: helper_lock()\n");
+
 	helper_lock();
-	if (sub_info->path[0] == '\0')
+	if (sub_info->path[0] == '\0') {
+		printk(KERN_ERR "call_usermodehelper_exec: bad path\n");
 		goto out;
+	}
 
 	if (!khelper_wq || usermodehelper_disabled) {
+		printk(KERN_ERR "call_usermodehelper_exec: disabled or !wq\n");
 		retval = -EBUSY;
 		goto out;
 	}
@@ -464,16 +493,22 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info,
 	sub_info->complete = &done;
 	sub_info->wait = wait;
 
+	printk(KERN_ERR "call_usermodehelper_exec: queue_work()\n");
 	queue_work(khelper_wq, &sub_info->work);
 	if (wait == UMH_NO_WAIT)	/* task has freed sub_info */
 		goto unlock;
+	printk(KERN_ERR "call_usermodehelper_exec: wait_for_completion\n");
 	wait_for_completion(&done);
+	printk(KERN_ERR "call_usermodehelper_exec: back from wait\n");
 	retval = sub_info->retval;
 
 out:
+	printk(KERN_ERR "call_usermodehelper_exec: free sub_info\n");
 	call_usermodehelper_freeinfo(sub_info);
 unlock:
+	printk(KERN_ERR "call_usermodehelper_exec: helper_unlock()\n");
 	helper_unlock();
+	printk(KERN_ERR "call_usermodehelper_exec: retval=%d\n", retval);
 	return retval;
 }
 EXPORT_SYMBOL(call_usermodehelper_exec);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 51daae5..c9ab51a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -46,6 +46,7 @@
 #include <linux/page-isolation.h>
 #include <linux/memcontrol.h>
 #include <linux/debugobjects.h>
+#include <linux/nmi.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -707,7 +708,26 @@ static int move_freepages(struct zone *zone,
 	 * Remove at a later date when no bug reports exist related to
 	 * grouping pages by mobility
 	 */
-	BUG_ON(page_zone(start_page) != page_zone(end_page));
+	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
+		printk(KERN_ERR "move_freepages: Bogus zones: "
+		       "start_page[%p] end_page[%p] zone[%p]\n",
+		       start_page, end_page, zone);
+		printk(KERN_ERR "move_freepages: "
+		       "start_zone[%p] end_zone[%p]\n",
+		       page_zone(start_page), page_zone(end_page));
+		printk(KERN_ERR "move_freepages: "
+		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
+		       page_to_pfn(start_page), page_to_pfn(end_page));
+		printk(KERN_ERR "move_freepages: "
+		       "start_nid[%d] end_nid[%d]\n",
+		       page_to_nid(start_page), page_to_nid(end_page));
+		spin_unlock(&zone->lock);
+		local_irq_enable();
+		while (1) {
+			barrier();
+			touch_nmi_watchdog();
+		}
+	}
 #endif
 
 	for (page = start_page; page <= end_page;) {
@@ -2583,6 +2603,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	unsigned long end_pfn = start_pfn + size;
 	unsigned long pfn;
 	struct zone *z;
+	int tmp;
 
 	z = &NODE_DATA(nid)->node_zones[zone];
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
@@ -2594,7 +2615,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (context == MEMMAP_EARLY) {
 			if (!early_pfn_valid(pfn))
 				continue;
-			if (!early_pfn_in_nid(pfn, nid))
+			tmp = early_pfn_to_nid(pfn);
+			if (tmp > -1 && tmp != nid)
 				continue;
 		}
 		page = pfn_to_page(pfn);
@@ -2961,8 +2983,9 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
 			return early_node_map[i].nid;
 	}
 
-	return 0;
+	return -1;
 }
+
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 
 /* Basic iterator support to walk early_node_map[] */
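
The last two hunks change early_pfn_to_nid() to return -1 instead of 0
when no early_node_map[] entry covers the pfn, and make
memmap_init_zone() skip a pfn only when its node is known and differs.
A minimal user-space sketch of that logic follows. The sketch_* names
and the two-entry map are hypothetical stand-ins, not kernel code:

```c
#include <stddef.h>

/* Hypothetical stand-in for one early_node_map[] entry. */
struct sketch_node_range {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int nid;
};

/* Made-up map with a hole between the two ranges. */
static const struct sketch_node_range sketch_map[] = {
	{ 0x0000UL, 0x1000UL, 0 },
	{ 0x2000UL, 0x3000UL, 1 },
};

/* Return the node covering pfn, or -1 when no range matches
 * (the patch changes this fallback from 0 to -1).
 */
static int sketch_pfn_to_nid(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_map) / sizeof(sketch_map[0]); i++) {
		if (pfn >= sketch_map[i].start_pfn &&
		    pfn < sketch_map[i].end_pfn)
			return sketch_map[i].nid;
	}
	return -1;
}

/* Same test as the patched memmap_init_zone(): skip the pfn only
 * if its node is known and is not the node being initialised.
 */
static int sketch_skip_pfn(unsigned long pfn, int nid)
{
	int tmp = sketch_pfn_to_nid(pfn);

	return tmp > -1 && tmp != nid;
}
```

With a fallback of 0, a pfn in the hole would be reported as node 0;
with -1 the caller can tell "unknown" apart from node 0 and does not
skip such a pfn for any node.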

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (17 preceding siblings ...)
  2009-02-11 23:15 ` David Miller
@ 2009-02-12 15:30 ` Hermann Lauer
  2009-03-05  8:04 ` David Miller
  2009-03-05 15:39 ` Hermann Lauer
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-02-12 15:30 UTC (permalink / raw)
  To: sparclinux

On Wed, Feb 11, 2009 at 03:15:13PM -0800, David Miller wrote:
> From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
> Date: Mon, 9 Feb 2009 23:21:01 +0100
> 
> > kobject_uevent_env: Invoking uevent_helper[/sbin/hotplug]
> > 
> 
> Ok, let's see if it's the fork or the exec attempt which
> dies.  Please reboot the same with the following debugging
> patch added.

Added the patches to 2.6.27.15; the hang is shown below (full output
is on the web). I hope this gives you some clues.

Thanks. 

kobject_uevent_env: Checking uevent_ops->filter
kobject_uevent_env: Allocating and filling env buffer.
kobject_uevent_env: Checking uevent_ops->uevent
kobject_uevent_env: Invoking uevent_helper[/sbin/hotplug]
call_usermodehelper_setup: kzalloc()
call_usermodehelper_setup: INIT_WORK and done
call_usermodehelper_exec: helper_lock()
call_usermodehelper_exec: queue_work()
call_usermodehelper_exec: wait_for_completion
__call_usermodehelper: wait=0

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (18 preceding siblings ...)
  2009-02-12 15:30 ` Hermann Lauer
@ 2009-03-05  8:04 ` David Miller
  2009-03-05 15:39 ` Hermann Lauer
  20 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2009-03-05  8:04 UTC (permalink / raw)
  To: sparclinux

From: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Date: Thu, 12 Feb 2009 16:30:30 +0100

> __call_usermodehelper: wait=0

So kernel_thread() is where it hangs...
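
As a rough sketch of that reasoning: the debugging patch prints one
marker before kernel_thread() ("wait=") and one after it ("pid="), so
the first marker missing from the console log brackets the call that
never returned. The helper name first_missing_marker() is hypothetical;
the marker prefixes are copied from the printk() calls in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Markers in the order they show up on the console for this path. */
static const char *const umh_markers[] = {
	"call_usermodehelper_exec: queue_work()",
	"call_usermodehelper_exec: wait_for_completion",
	"__call_usermodehelper: wait=",
	"__call_usermodehelper: pid=",
};

#define UMH_NR_MARKERS (sizeof(umh_markers) / sizeof(umh_markers[0]))

/* Return the index of the first marker that does not appear in the
 * console log, or UMH_NR_MARKERS if every marker appears.
 */
static size_t first_missing_marker(const char *console_log)
{
	size_t i;

	for (i = 0; i < UMH_NR_MARKERS; i++) {
		if (!strstr(console_log, umh_markers[i]))
			break;
	}
	return i;
}
```

For the log quoted above the first missing marker is the "pid=" one,
which is printed right after kernel_thread() returns.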

The only big thing changing in sparc64 between 2.6.26.5 (which works)
and 2.6.27 are IRQ stacks.

Here is a test patch which reverts sparc64 IRQ stacks.  If this makes
your machine work it will be a big clue.

(BTW, why do you get "OpenBoot Diagnostics failed" from the firmware
 on reset/poweron?)

sparc64: Revert IRQ stacks.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/include/asm/irq_64.h  |    4 --
 arch/sparc64/kernel/irq.c        |   52 --------------------------------
 arch/sparc64/kernel/kstack.h     |   60 --------------------------------------
 arch/sparc64/kernel/process.c    |   27 ++++++++++++----
 arch/sparc64/kernel/stacktrace.c |   10 ++++--
 arch/sparc64/kernel/traps.c      |    7 ++--
 arch/sparc64/lib/mcount.S        |   22 --------------
 arch/sparc64/mm/init.c           |   11 -------
 8 files changed, 30 insertions(+), 163 deletions(-)
 delete mode 100644 arch/sparc64/kernel/kstack.h

diff --git a/arch/sparc/include/asm/irq_64.h b/arch/sparc/include/asm/irq_64.h
index e3dd930..3473e25 100644
--- a/arch/sparc/include/asm/irq_64.h
+++ b/arch/sparc/include/asm/irq_64.h
@@ -93,8 +93,4 @@ static inline unsigned long get_softint(void)
 void __trigger_all_cpu_backtrace(void);
 #define trigger_all_cpu_backtrace() __trigger_all_cpu_backtrace()
 
-extern void *hardirq_stack[NR_CPUS];
-extern void *softirq_stack[NR_CPUS];
-#define __ARCH_HAS_DO_SOFTIRQ
-
 #endif
diff --git a/arch/sparc64/kernel/irq.c b/arch/sparc64/kernel/irq.c
index 7495bc7..0bb3f50 100644
--- a/arch/sparc64/kernel/irq.c
+++ b/arch/sparc64/kernel/irq.c
@@ -683,32 +683,10 @@ void ack_bad_irq(unsigned int virt_irq)
 	       ino, virt_irq);
 }
 
-void *hardirq_stack[NR_CPUS];
-void *softirq_stack[NR_CPUS];
-
-static __attribute__((always_inline)) void *set_hardirq_stack(void)
-{
-	void *orig_sp, *sp = hardirq_stack[smp_processor_id()];
-
-	__asm__ __volatile__("mov %%sp, %0" : "=r" (orig_sp));
-	if (orig_sp < sp ||
-	    orig_sp > (sp + THREAD_SIZE)) {
-		sp += THREAD_SIZE - 192 - STACK_BIAS;
-		__asm__ __volatile__("mov %0, %%sp" : : "r" (sp));
-	}
-
-	return orig_sp;
-}
-static __attribute__((always_inline)) void restore_hardirq_stack(void *orig_sp)
-{
-	__asm__ __volatile__("mov %0, %%sp" : : "r" (orig_sp));
-}
-
 void handler_irq(int irq, struct pt_regs *regs)
 {
 	unsigned long pstate, bucket_pa;
 	struct pt_regs *old_regs;
-	void *orig_sp;
 
 	clear_softint(1 << irq);
 
@@ -726,8 +704,6 @@ void handler_irq(int irq, struct pt_regs *regs)
 			       "i" (PSTATE_IE)
 			     : "memory");
 
-	orig_sp = set_hardirq_stack();
-
 	while (bucket_pa) {
 		struct irq_desc *desc;
 		unsigned long next_pa;
@@ -744,38 +720,10 @@ void handler_irq(int irq, struct pt_regs *regs)
 		bucket_pa = next_pa;
 	}
 
-	restore_hardirq_stack(orig_sp);
-
 	irq_exit();
 	set_irq_regs(old_regs);
 }
 
-void do_softirq(void)
-{
-	unsigned long flags;
-
-	if (in_interrupt())
-		return;
-
-	local_irq_save(flags);
-
-	if (local_softirq_pending()) {
-		void *orig_sp, *sp = softirq_stack[smp_processor_id()];
-
-		sp += THREAD_SIZE - 192 - STACK_BIAS;
-
-		__asm__ __volatile__("mov %%sp, %0\n\t"
-				     "mov %1, %%sp"
-				     : "=&r" (orig_sp)
-				     : "r" (sp));
-		__do_softirq();
-		__asm__ __volatile__("mov %0, %%sp"
-				     : : "r" (orig_sp));
-	}
-
-	local_irq_restore(flags);
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 void fixup_irqs(void)
 {
diff --git a/arch/sparc64/kernel/kstack.h b/arch/sparc64/kernel/kstack.h
deleted file mode 100644
index 4248d96..0000000
--- a/arch/sparc64/kernel/kstack.h
+++ /dev/null
@@ -1,60 +0,0 @@
-#ifndef _KSTACK_H
-#define _KSTACK_H
-
-#include <linux/thread_info.h>
-#include <linux/sched.h>
-#include <asm/ptrace.h>
-#include <asm/irq.h>
-
-/* SP must be STACK_BIAS adjusted already.  */
-static inline bool kstack_valid(struct thread_info *tp, unsigned long sp)
-{
-	unsigned long base = (unsigned long) tp;
-
-	if (sp >= (base + sizeof(struct thread_info)) &&
-	    sp <= (base + THREAD_SIZE - sizeof(struct sparc_stackf)))
-		return true;
-
-	if (hardirq_stack[tp->cpu]) {
-		base = (unsigned long) hardirq_stack[tp->cpu];
-		if (sp >= base &&
-		    sp <= (base + THREAD_SIZE - sizeof(struct sparc_stackf)))
-			return true;
-		base = (unsigned long) softirq_stack[tp->cpu];
-		if (sp >= base &&
-		    sp <= (base + THREAD_SIZE - sizeof(struct sparc_stackf)))
-			return true;
-	}
-	return false;
-}
-
-/* Does "regs" point to a valid pt_regs trap frame?  */
-static inline bool kstack_is_trap_frame(struct thread_info *tp, struct pt_regs *regs)
-{
-	unsigned long base = (unsigned long) tp;
-	unsigned long addr = (unsigned long) regs;
-
-	if (addr >= base &&
-	    addr <= (base + THREAD_SIZE - sizeof(*regs)))
-		goto check_magic;
-
-	if (hardirq_stack[tp->cpu]) {
-		base = (unsigned long) hardirq_stack[tp->cpu];
-		if (addr >= base &&
-		    addr <= (base + THREAD_SIZE - sizeof(*regs)))
-			goto check_magic;
-		base = (unsigned long) softirq_stack[tp->cpu];
-		if (addr >= base &&
-		    addr <= (base + THREAD_SIZE - sizeof(*regs)))
-			goto check_magic;
-	}
-	return false;
-
-check_magic:
-	if ((regs->magic & ~0x1ff) == PT_REGS_MAGIC)
-		return true;
-	return false;
-
-}
-
-#endif /* _KSTACK_H */
diff --git a/arch/sparc64/kernel/process.c b/arch/sparc64/kernel/process.c
index 15f4178..7f5debd 100644
--- a/arch/sparc64/kernel/process.c
+++ b/arch/sparc64/kernel/process.c
@@ -52,8 +52,6 @@
 #include <asm/irq_regs.h>
 #include <asm/smp.h>
 
-#include "kstack.h"
-
 static void sparc64_yield(int cpu)
 {
 	if (tlb_type != hypervisor)
@@ -237,6 +235,19 @@ void show_regs(struct pt_regs *regs)
 struct global_reg_snapshot global_reg_snapshot[NR_CPUS];
 static DEFINE_SPINLOCK(global_reg_snapshot_lock);
 
+static bool kstack_valid(struct thread_info *tp, struct reg_window *rw)
+{
+	unsigned long thread_base, fp;
+
+	thread_base = (unsigned long) tp;
+	fp = (unsigned long) rw;
+
+	if (fp < (thread_base + sizeof(struct thread_info)) ||
+	    fp >= (thread_base + THREAD_SIZE))
+		return false;
+	return true;
+}
+
 static void __global_reg_self(struct thread_info *tp, struct pt_regs *regs,
 			      int this_cpu)
 {
@@ -253,11 +264,11 @@ static void __global_reg_self(struct thread_info *tp, struct pt_regs *regs,
 
 		rw = (struct reg_window *)
 			(regs->u_regs[UREG_FP] + STACK_BIAS);
-		if (kstack_valid(tp, (unsigned long) rw)) {
+		if (kstack_valid(tp, rw)) {
 			global_reg_snapshot[this_cpu].i7 = rw->ins[7];
 			rw = (struct reg_window *)
 				(rw->ins[6] + STACK_BIAS);
-			if (kstack_valid(tp, (unsigned long) rw))
+			if (kstack_valid(tp, rw))
 				global_reg_snapshot[this_cpu].rpc = rw->ins[7];
 		}
 	} else {
@@ -817,7 +828,7 @@ out:
 unsigned long get_wchan(struct task_struct *task)
 {
 	unsigned long pc, fp, bias = 0;
-	struct thread_info *tp;
+	unsigned long thread_info_base;
 	struct reg_window *rw;
         unsigned long ret = 0;
 	int count = 0; 
@@ -826,12 +837,14 @@ unsigned long get_wchan(struct task_struct *task)
             task->state == TASK_RUNNING)
 		goto out;
 
-	tp = task_thread_info(task);
+	thread_info_base = (unsigned long) task_stack_page(task);
 	bias = STACK_BIAS;
 	fp = task_thread_info(task)->ksp + bias;
 
 	do {
-		if (!kstack_valid(tp, fp))
+		/* Bogus frame pointer? */
+		if (fp < (thread_info_base + sizeof(struct thread_info)) ||
+		    fp >= (thread_info_base + THREAD_SIZE))
 			break;
 		rw = (struct reg_window *) fp;
 		pc = rw->ins[7];
diff --git a/arch/sparc64/kernel/stacktrace.c b/arch/sparc64/kernel/stacktrace.c
index 4e21d4a..7ef61cc 100644
--- a/arch/sparc64/kernel/stacktrace.c
+++ b/arch/sparc64/kernel/stacktrace.c
@@ -5,8 +5,6 @@
 #include <asm/ptrace.h>
 #include <asm/stacktrace.h>
 
-#include "kstack.h"
-
 void save_stack_trace(struct stack_trace *trace)
 {
 	struct thread_info *tp = task_thread_info(current);
@@ -25,13 +23,17 @@ void save_stack_trace(struct stack_trace *trace)
 		struct pt_regs *regs;
 		unsigned long pc;
 
-		if (!kstack_valid(tp, fp))
+		/* Bogus frame pointer? */
+		if (fp < (thread_base + sizeof(struct thread_info)) ||
+		    fp > (thread_base + THREAD_SIZE - sizeof(struct sparc_stackf)))
 			break;
 
 		sf = (struct sparc_stackf *) fp;
 		regs = (struct pt_regs *) (sf + 1);
 
-		if (kstack_is_trap_frame(tp, regs)) {
+		if (((unsigned long)regs <=
+		     (thread_base + THREAD_SIZE - sizeof(*regs))) &&
+		    (regs->magic & ~0x1ff) == PT_REGS_MAGIC) {
 			if (!(regs->tstate & TSTATE_PRIV))
 				break;
 			pc = regs->tpc;
diff --git a/arch/sparc64/kernel/traps.c b/arch/sparc64/kernel/traps.c
index eb19724..69f8dd9 100644
--- a/arch/sparc64/kernel/traps.c
+++ b/arch/sparc64/kernel/traps.c
@@ -40,7 +40,6 @@
 #include <asm/prom.h>
 
 #include "entry.h"
-#include "kstack.h"
 
 /* When an irrecoverable trap occurs at tl > 0, the trap entry
  * code logs the trap state registers at every level in the trap
@@ -2132,12 +2131,14 @@ void show_stack(struct task_struct *tsk, unsigned long *_ksp)
 		struct pt_regs *regs;
 		unsigned long pc;
 
-		if (!kstack_valid(tp, fp))
+		/* Bogus frame pointer? */
+		if (fp < (thread_base + sizeof(struct thread_info)) ||
+		    fp >= (thread_base + THREAD_SIZE))
 			break;
 		sf = (struct sparc_stackf *) fp;
 		regs = (struct pt_regs *) (sf + 1);
 
-		if (kstack_is_trap_frame(tp, regs)) {
+		if ((regs->magic & ~0x1ff) == PT_REGS_MAGIC) {
 			if (!(regs->tstate & TSTATE_PRIV))
 				break;
 			pc = regs->tpc;
diff --git a/arch/sparc64/lib/mcount.S b/arch/sparc64/lib/mcount.S
index fad90dd..734caf0 100644
--- a/arch/sparc64/lib/mcount.S
+++ b/arch/sparc64/lib/mcount.S
@@ -49,28 +49,6 @@ mcount:
 	cmp		%sp, %g3
 	bg,pt		%xcc, 1f
 	 nop
-	lduh		[%g6 + TI_CPU], %g1
-	sethi		%hi(hardirq_stack), %g3
-	or		%g3, %lo(hardirq_stack), %g3
-	sllx		%g1, 3, %g1
-	ldx		[%g3 + %g1], %g7
-	sub		%g7, STACK_BIAS, %g7
-	cmp		%sp, %g7
-	bleu,pt		%xcc, 2f
-	 sethi		%hi(THREAD_SIZE), %g3
-	add		%g7, %g3, %g7
-	cmp		%sp, %g7
-	blu,pn		%xcc, 1f
-2:	 sethi		%hi(softirq_stack), %g3
-	or		%g3, %lo(softirq_stack), %g3
-	ldx		[%g3 + %g1], %g7
-	cmp		%sp, %g7
-	bleu,pt		%xcc, 2f
-	 sethi		%hi(THREAD_SIZE), %g3
-	add		%g7, %g3, %g7
-	cmp		%sp, %g7
-	blu,pn		%xcc, 1f
-	 nop
 	/* If we are already on ovstack, don't hop onto it
 	 * again, we are already trying to output the stack overflow
 	 * message.
diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c
index a41df7b..0ea8838 100644
--- a/arch/sparc64/mm/init.c
+++ b/arch/sparc64/mm/init.c
@@ -49,7 +49,6 @@
 #include <asm/sstate.h>
 #include <asm/mdesc.h>
 #include <asm/cpudata.h>
-#include <asm/irq.h>
 
 #define MAX_PHYS_ADDRESS	(1UL << 42UL)
 #define KPTE_BITMAP_CHUNK_SZ	(256UL * 1024UL * 1024UL)
@@ -1774,16 +1773,6 @@ void __init paging_init(void)
 	if (tlb_type == hypervisor)
 		sun4v_mdesc_init();
 
-	/* Once the OF device tree and MDESC have been setup, we know
-	 * the list of possible cpus.  Therefore we can allocate the
-	 * IRQ stacks.
-	 */
-	for_each_possible_cpu(i) {
-		/* XXX Use node local allocations... XXX */
-		softirq_stack[i] = __va(lmb_alloc(THREAD_SIZE, THREAD_SIZE));
-		hardirq_stack[i] = __va(lmb_alloc(THREAD_SIZE, THREAD_SIZE));
-	}
-
 	/* Setup bootmem... */
 	last_valid_pfn = end_pfn = bootmem_init(phys_base);
 
-- 
1.6.2


* Re: Sunfire V880 and 480R 2.6.27.x startup hangs
  2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
                   ` (19 preceding siblings ...)
  2009-03-05  8:04 ` David Miller
@ 2009-03-05 15:39 ` Hermann Lauer
  20 siblings, 0 replies; 22+ messages in thread
From: Hermann Lauer @ 2009-03-05 15:39 UTC (permalink / raw)
  To: sparclinux

On Thu, Mar 05, 2009 at 12:04:17AM -0800, David Miller wrote:
> 
> So kernel_thread() is where it hangs...
> 
> The only big thing changing in sparc64 between 2.6.26.5 (which works)
> and 2.6.27 are IRQ stacks.
> 
> Here is a test patch which reverts sparc64 IRQ stacks.  If this makes
> your machine work it will be a big clue.

Applied your patch to 2.6.27.19 - hangs at the same point. Output
is on the web.

> (BTW, why do you get "OpenBoot Diagnostics failed" from the firmware
>  on reset/poweron?)

One disk fails diagnostics (see below), but works later without a problem.
From time to time I check the same on the 480R which has error at
the OBP diagnostic, so that does not matter IMHO. The 480R is the
one with the cassini driver crashing the machine, so tests are a little
bit more clumsy.

Btw.: I upgraded the 480R to Debian lenny and compiled a 2.6.27.x kernel
- hangs at the same point.

Thanks for looking again,

  Hermann 

------------------------------------
Testing /pci@8,600000/SUNW,qlc@2

   ERROR   : Disk 0  is not spinning.
   DEVICE  : /pci@8,600000/SUNW,qlc@2
   SUBTEST : selftest:loop-tests:inquiry-test:disk-test
   CALLERS : disk-test
   MACHINE : Sun Fire 880
   SERIAL# : 50911524
   DATE    : 03/05/2009 15:17:50  GMT
   CONTROLS: diag-level=max test-args
-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

end of thread, other threads:[~2009-03-05 15:39 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-19  8:56 Sunfire V880 and 480R 2.6.27.x startup hangs [Was: Re: kernel crashes on Sun Fire 480R with cassini Hermann Lauer
2009-01-20  6:25 ` Sunfire V880 and 480R 2.6.27.x startup hangs David Miller
2009-01-22 13:29 ` Hermann Lauer
2009-01-26  2:30 ` David Miller
2009-01-26 10:04 ` Hermann Lauer
2009-01-27  9:33 ` Hermann Lauer
2009-01-28  6:29 ` David Miller
2009-01-28  8:45 ` Hermann Lauer
2009-01-31  0:00 ` David Miller
2009-02-02 14:27 ` Hermann Lauer
2009-02-02 20:50 ` David Miller
2009-02-03 21:26 ` Hermann Lauer
2009-02-03 23:19 ` David Miller
2009-02-06 10:28 ` Hermann Lauer
2009-02-07  7:23 ` David Miller
2009-02-09 22:21 ` Hermann Lauer
2009-02-11 21:25 ` Hermann Lauer
2009-02-11 21:58 ` David Miller
2009-02-11 23:15 ` David Miller
2009-02-12 15:30 ` Hermann Lauer
2009-03-05  8:04 ` David Miller
2009-03-05 15:39 ` Hermann Lauer
