linux-kernel.vger.kernel.org archive mirror
* OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
@ 2012-02-13  7:45 ` Meelis Roos
  2012-02-13  8:06   ` Grant Likely
  2012-03-01 12:24   ` [tip:core/urgent] memblock: Fix size aligning of memblock_alloc_base_nid() tip-bot for Tejun Heo
From: Meelis Roos @ 2012-02-13  7:45 UTC
  To: Grant Likely, Rob Herring; +Cc: sparclinux, Linux Kernel list

(Resend with proper To-s for OF people)

This is my first post-3.2 test on a 2-CPU Sun Enterprise 3500 (PCI+SBus
IO). The boot crashes with a NULL pointer dereference in
of_find_node_by_path(), called from of_alias_scan(), so it looks
OF-related. The prtconf output is below the oops.

[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 3.2.30 2002/10/25 14:03'
[    0.000000] PROMLIB: Root node compatible:
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.3.0-rc3-00188-g3ec1e88 (mroos@korvits) (gcc version 4.6.2 (Debian 4.6.2-14) ) #64 SMP Sun Feb 12 22:26:40 EET 2012
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] bootconsole [earlyprom0] enabled
[    0.000000] ARCH: SUN4U
[    0.000000] Ethernet address: 08:00:20:b6:ee:e2
[    0.000000] Kernel: Using 4 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel... done.
[    0.000000] Unable to handle kernel NULL pointer dereference
[    0.000000] tsk->{mm,active_mm}->context = 0000000000000000
[    0.000000] tsk->{mm,active_mm}->pgd = fffff800008c77d0
[    0.000000]               \|/ ____ \|/
[    0.000000]               "@'/ .. \`@"
[    0.000000]               /_| \__/ |_\
[    0.000000]                  \__U_/
[    0.000000] swapper(0): Oops [#1]
[    0.000000] TSTATE: 0000000080e01607 TPC: 00000000006459a0 TNPC: 0000000000645964 Y: 00000037    Not tainted
[    0.000000] TPC: <of_find_node_by_path+0x60/0x80>
[    0.000000] g0: 0000000000000000 g1: 0000000000000001 g2: 00000000000000ff g3: 00000000000000f0
[    0.000000] g4: 0000000000853fd0 g5: 0000000000000000 g6: 0000000000834000 g7: 0000000000000050
[    0.000000] o0: 0000000000000001 o1: fffff8007fced7c0 o2: 0000000001010101 o3: 0000000080808080
[    0.000000] o4: fffff8007fcc0a4d o5: 00000000000199b5 sp: 0000000000837231 ret_pc: 0000000000645970
[    0.000000] RPC: <of_find_node_by_path+0x30/0x80>
[    0.000000] l0: 00000000008ab400 l1: fffff8007fcc1f40 l2: 000000000085c5ec l3: 0000000000000025
[    0.000000] l4: 00000000005c0400 l5: 00000000008fa5e6 l6: 0000000000000006 l7: 0028280000000000
[    0.000000] i0: fffff8007fced7c0 i1: 0000000000808fd8 i2: 0000000001010101 i3: 0000000080808080
[    0.000000] i4: 0000000000876c00 i5: 0000000000000050 i6: 00000000008372e1 i7: 000000000064684c
[    0.000000] I7: <of_alias_scan+0xcc/0x1c0>
[    0.000000] Call Trace:
[    0.000000]  [000000000064684c] of_alias_scan+0xcc/0x1c0
[    0.000000]  [00000000008a0350] of_pdt_build_devicetree+0x90/0xa0
[    0.000000]  [000000000088c540] prom_build_devicetree+0x10/0x3c
[    0.000000]  [00000000008904d4] paging_init+0x59c/0x6bc
[    0.000000]  [000000000088bebc] setup_arch+0xf8/0x110
[    0.000000]  [000000000088a51c] start_kernel+0x8c/0x34c
[    0.000000]  [00000000006fbf28] tlb_fixup_done+0xa0/0xa8
[    0.000000]  [0000000000000000]           (null)
[    0.000000] Disabling lock debugging due to kernel taint
[    0.000000] Caller[000000000064684c]: of_alias_scan+0xcc/0x1c0
[    0.000000] Caller[00000000008a0350]: of_pdt_build_devicetree+0x90/0xa0
[    0.000000] Caller[000000000088c540]: prom_build_devicetree+0x10/0x3c
[    0.000000] Caller[00000000008904d4]: paging_init+0x59c/0x6bc
[    0.000000] Caller[000000000088bebc]: setup_arch+0xf8/0x110
[    0.000000] Caller[000000000088a51c]: start_kernel+0x8c/0x34c
[    0.000000] Caller[00000000006fbf28]: tlb_fixup_done+0xa0/0xa8
[    0.000000] Caller[0000000000000000]:           (null)
[    0.000000] Instruction DUMP: 01000000  fa5f6050  2aff7ff2 <c25f6018> 901720f0  40034b86  b010001d  81cfe008  01000000
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Press Stop-A (L1-A) to return to the boot prom


System Configuration:  Sun Microsystems  sun4u
Memory size: 2048 Megabytes
System Peripherals (PROM Nodes):

Node 0xf0029c88
    .node:  f0029c88
    clock-frequency:  05f5e100
    previous-reset-reason: 'S-POR'
    banner-name: '5-slot Sun Enterprise E3500'
    idprom:  01800800.20b6eee2.00000000.b6eee2a9.00000000.00000000.00000000.00000000
    reset-reason: 'S-POR'
    fatal-reset-info:  00006000
    breakpoint-trap:  0000007f
    #size-cells:  00000002
    name: 'SUNW,Ultra-Enterprise'

    Node 0xf002cf50
        .node:  f002cf50
        name: 'packages'

        Node 0xf00365c0
            .node:  f00365c0
            iso6429-1983-colors:  
            name: 'terminal-emulator'

        Node 0xf003932c
            .node:  f003932c
            disk-write-fix:  
            name: 'deblocker'

        Node 0xf0039a08
            .node:  f0039a08
            name: 'obp-tftp'

        Node 0xf00447cc
            .node:  f00447cc
            name: 'disk-label'

    Node 0xf002cfc0
        .node:  f002cfc0
        stdout:  ffdc1428
        stdin:  ffdc1658
        eeprom:  f005dd0c
        mmu:  fffe9f70
        memory:  fffea170
        bootargs:  00
        bootpath: '/pci@f,4000/SUNW,isptwo@3/sd@2,0:a'
        stdout-#lines:  ffffffff
        name: 'chosen'

    Node 0xf002d02c
        .node:  f002d02c
        add-brd-supported-types: '014'
        version: 'OBP 3.2.30 2002/10/25 14:03'
        model: 'SUNW,3.2'
        decode-complete:  
        aligned-allocator:  
        relative-addressing:  
        name: 'openprom'

        Node 0xf002d0bc
            .node:  f002d0bc
            name: 'client-services'

    Node 0xf002d164
        .node:  f002d164
        disabled-memory-list:  
        disabled-board-list:  
        memory-interleave: 'max'
        configuration-policy: 'component'
        scsi-initiator-id: '7'
        keyboard-click?: 'false'
        keymap:  
        ttyb-rts-dtr-off: 'false'
        ttyb-ignore-cd: 'true'
        ttya-rts-dtr-off: 'false'
        ttya-ignore-cd: 'true'
        ttyb-mode: '9600,8,n,1,-'
        ttya-mode: '9600,8,n,1,-'
        sbus-specific-probe:  
        sbus-probe-default: 'd3120'
        mfg-mode: 'off '
        diag-level: 'min'
        powerfail-time: '0'
        #power-cycles: '52'
        fcode-debug?: 'false'
        output-device: 'ttya'
        input-device: 'ttya'
        load-base: '16384'
        boot-command: 'boot'
        auto-boot?: 'false'
        watchdog-reboot?: 'false'
        diag-file:  
        diag-device: 'mydisk'
        boot-file:  
        boot-device: 'mydisk'
        local-mac-address?: 'false'
        ansi-terminal?: 'true'
        screen-#columns: '80'
        screen-#rows: '34'
        silent-mode?: 'false'
        use-nvramrc?: 'true'
        nvramrc:  64657661.6c696173.206d7964.69736b20.2f706369.40662c34.3030302f.53554e57.2c697370.74776f40.332f7364.40322c30.0a
        security-mode: 'none'
        security-password:  
        security-#badlogins: '0'
        oem-logo:  
        oem-logo?: 'false'
        oem-banner:  
        oem-banner?: 'false'
        hardware-revision:  
        last-hardware-update: '0'
        diag-switch?: 'false'
        name: 'options'

    Node 0xf002d1d4
        .node:  f002d1d4
        mydisk: '/pci@f,4000/SUNW,isptwo@3/sd@2,0'
        disk: '/sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@0,0'
        disksocal: '/sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@0,0'
        diskbrd: '/sbus@3,0/SUNW,fas@3,8800000/sd@a,0'
        diskisp: '/sbus@3,0/QLGC,isp@0,10000/sd@0,0'
        net: '/sbus@3,0/SUNW,hme@3,8c00000'
        cdrom: '/sbus@3,0/SUNW,fas@3,8800000/sd@6,0:f'
        tape: '/sbus@3,0/SUNW,fas@3,8800000/st@4,0'
        scsi: '/sbus@3,0/SUNW,fas@3,8800000'
        disk0: '/sbus@3,0/SUNW,fas@3,8800000/sd@0,0'
        disk1: '/sbus@3,0/SUNW,fas@3,8800000/sd@1,0'
        disk2: '/sbus@3,0/SUNW,fas@3,8800000/sd@2,0'
        disk3: '/sbus@3,0/SUNW,fas@3,8800000/sd@3,0'
        disk4: '/sbus@3,0/SUNW,fas@3,8800000/sd@4,0'
        disk5: '/sbus@3,0/SUNW,fas@3,8800000/sd@5,0'
        tape0: '/sbus@3,0/SUNW,fas@3,8800000/st@4,0'
        tape1: '/sbus@3,0/SUNW,fas@3,8800000/st@5,0'
        ttya: '/central/fhc/zs@0,902000:a'
        ttyb: '/central/fhc/zs@0,902000:b'
        keyboard: '/central/fhc/zs@0,904000'
        keyboard!: '/central/fhc/zs@0,904000:forcemode'
        name: 'aliases'

    Node 0xf004efb4
        .node:  f004efb4
        reg:  00000000.00000000.00000000.40000000.00000000.40000000.00000000.40000000
        available:  00000000.7fce2000.00000000.00014000.00000000.7fc00000.00000000.000d2000.00000000.00000000.00000000.7f7de000
        name: 'memory'

    Node 0xf004f594
        .node:  f004f594
        translations:  00000000.fffd0000.00000000.00020000.80000000.7ff600b6.00000000.fff70000.00000000.00060000.80000000.7fef80b6.00000000.fff6e000.00000000.00002000.80000000.7fbfe0b6.00000000.fff6c000.00000000.00002000.80000000.7fef60b6.00000000.fff66000.00000000.00002000.800001ff.f890208e.00000000.fff64000.00000000.00002000.800001ff.f890808e.00000000.fff62000.00000000.00002000.800001ff.f890808e.00000000.fff60000.00000000.00002000.800001c4.f830008e.00000000.fff5e000.00000000.00002000.800001d4.f830008e.00000000.fff5c000.00000000.00002000.800001dc.f830008e.00000000.ffdd8000.00000000.00184000.80000000.7fd720b6.00000000.ffdcc000.00000000.0000c000.800001cc.f880408e.00000000.ffdca000.00000000.00002000.80000000.7fd700b6.00000000.ffdc8000.00000000.00002000.800001ff.f820608e.00000000.ffdc4000.00000000.00004000.80000000.7fd3c0b6.00000000.ffdc2000.00000000.00002000.800001ff.f890408e.00000000.ffdc0000.00000000.00002000.80000000.7fd3a0b6.00000000.ffdb2000.00000000.00006000.800001c4.0000008e.00000000.ffdac000.00000000.00006000.800001c4.0000008e.00000000.ffda4000.00000000.00008000.80000000.7fd680b6.00000000.ffd98000.00000000.0000c000.800001c4.f880408e.00000000.ffd96000.00000000.00002000.800001c4.f830008e.00000000.ffd94000.00000000.00002000.800001c4.0000208e.00000000.ffd92000.00000000.00002000.800001c4.0000208e.00000000.ffd90000.00000000.00002000.800001c4.0000208e.00000000.ffd8a000.00000000.00006000.800001c6.0000008e.00000000.ffd84000.00000000.00006000.800001c6.0000008e.00000000.ffd7c000.00000000.00008000.80000000.7fd600b6.00000000.ffd7a000.00000000.00002000.800001c6.0000208e.00000000.ffd78000.00000000.00002000.800001c6.0000208e.00000000.ffd76000.00000000.00002000.800001c6.0000208e.00000000.ffd70000.00000000.00006000.800001d4.0000008e.00000000.ffd6a000.00000000.00006000.800001d4.0000008e.00000000.ffd62000.00000000.00008000.80000000.7fd580b6.00000000.ffd56000.00000000.0000c000.800001d4.f880408e.00000000.ffd54000.00000000.00002000.800001d4.f830008e.00000000.ffd52000.00000000.00002000.800001d4.0000208e.00000000.ffd50000.00000000.00002000.800001d4.0000208e.00000000.ffd4e000.00000000.00002000.800001d4.0000208e.00000000.ffd48000.00000000.00006000.800001d6.0000008e.00000000.ffd42000.00000000.00006000.800001d6.0000008e.00000000.ffd3a000.00000000.00008000.80000000.7fd500b6.00000000.ffd38000.00000000.00002000.800001d6.0000208e.00000000.ffd36000.00000000.00002000.800001d6.0000208e.00000000.ffd34000.00000000.00002000.800001d6.0000208e.00000000.ffd32000.00000000.00002000.800001dc.0000408e.00000000.ffd30000.00000000.00002000.880001dc.0100008e.00000000.ffd22000.00000000.0000e000.800001dc.0000008e.00000000.ffd1a000.00000000.00008000.80000000.7fd480b6.00000000.ffd18000.00000000.00002000.800001dc.0000208e.00000000.ffd16000.00000000.00002000.880001dc.0180008e.00000000.ffd08000.00000000.0000e000.800001dc.0000008e.00000000.ffcfc000.00000000.0000c000.800001dc.f880408e.00000000.ffcfa000.00000000.00002000.800001dc.f830008e.00000000.ffcf8000.00000000.00002000.800001dc.0000008e.00000000.ffcf6000.00000000.00002000.800001dc.0000008e.00000000.ffcf4000.00000000.00002000.800001dc.0000008e.00000000.ffcf2000.00000000.00002000.800001de.0000408e.00000000.ffcf0000.00000000.00002000.880001de.0100008e.00000000.ffce2000.00000000.0000e000.800001de.0000008e.00000000.ffcda000.00000000.00008000.80000000.7fd400b6.00000000.ffcd8000.00000000.00002000.800001de.0000208e.00000000.ffcd6000.00000000.00002000.880001de.0180008e.00000000.ffcc8000.00000000.0000e000.800001de.0000008e.00000000.ffcc6000.00000000.00002000.800001de.0000008e.00000000.ffcc4000.00000000.00002000.800001de.0000008e.00000000.ffcc2000.00000000.00002000.800001de.0000008e.00000000.ffac2000.00000000.00200000.80000000.7f7de0b6.00000000.f07fe000.00000000.00002000.800001ff.f004208e.00000000.f02a0000.00000000.00040000.80000000.7fcfa0b6.00000000.f0080000.00000000.00220000.80000000.7f9de0b6.00000000.f0000000.00000000.00080000.80000000.7ff800b6.00000000.4162a000.00000000.029d6000.80000000.01a2a036.00000000.40000000.00000000.00c00000.80000000.00400036.00000000.00002000.00000000.00bfe000.80000000.00002036
        existing:  00000000.00000000.00000800.00000000.fffff800.00000000.00000800.00000000
        available:  fffff800.00000000.000007fc.00000000.00000001.00000000.000007ff.00000000.00000000.ffff0000.00000000.0000e000.00000000.00000000.00000000.f0000000.00000000.ffdb8000.00000000.00008000.00000000.f0800000.00000000.0f2c2000
        page-size:  00002000
        name: 'virtual-memory'

    Node 0xf005da70
        .node:  f005da70
        ranges:  00000000.f8000000.000001ff.f8000000.08000000
        reg:  000001ff.00000000.00000000.08000000
        name: 'central'

        Node 0xf005db8c
            .node:  f005db8c
            board-model: 'SUNW,501-2511'
            ranges:  00000000.00000000.00000000.f8000000.08000000
            reg:  00000000.f8800000.00000110.00000000.f8802000.00000010.00000000.f8804000.00000020.00000000.f8806000.00000020.00000000.f8808000.00000020.00000000.f880a000.00000020
            name: 'fhc'

            Node 0xf005dd0c
                .node:  f005dd0c
                address:  fff62000
                watchdog-enable:  
                interrupts:  0000003a
                reg:  00000000.00908000.00002000
                model: 'mk48t59'
                name: 'eeprom'

            Node 0xf005de3c
                .node:  f005de3c
                port-b-ignore-cd:  
                port-a-ignore-cd:  
                address:  fff66000
                interrupts:  00000039
                device_type: 'serial'
                reg:  00000000.00902000.00000008
                name: 'zs'

            Node 0xf005df14
                .node:  f005df14
                address:  ffdc2000
                port-b-ignore-cd:  
                port-a-ignore-cd:  
                keyboard:  
                interrupts:  00000039
                device_type: 'serial'
                reg:  00000000.00904000.00000008
                name: 'zs'

            Node 0xf005e05c
                .node:  f005e05c
                reg:  00000000.00900000.00000008.00000000.00906000.00000060.00000000.0090c000.00000001
                interrupts:  00000038
                name: 'clock-board'

    Node 0xf00df7bc
        .node:  f00df7bc
        board-type: 'cpu'
        board-model: 'SUNW,501-2557'
        ranges:  00000000.00000000.000001cc.f8000000.08000000
        central-space:  
        board#:  00000003
        reg:  000001cc.f8800000.00000000.00000110.000001cc.f8802000.00000000.00000010.000001cc.f8804000.00000000.00000020.000001cc.f8806000.00000000.00000020.000001cc.f8808000.00000000.00000020.000001cc.f880a000.00000000.00000020
        manfid#:  0000003e
        version#:  00000001
        model: 'SUNW,fhc0FA0'
        name: 'fhc'

        Node 0xf00dfa08
            .node:  f00dfa08
            reg:  00000000.01000000.00008000.00000000.02000000.01000000
            bank-0-status: 'ok'
            bank-1-status: 'ok'
            manfid#:  0000003e
            version#:  00000005
            model: 'SUNW,ac0F9E'
            device_type: 'memory-controller'
            name: 'ac'

        Node 0xf00dfb90
            .node:  f00dfb90
            reg:  00000000.00600000.00000010
            name: 'simm-status'

        Node 0xf00dfc28
            .node:  f00dfc28
            interrupts:  0000003b
            reg:  00000000.00400000.00000010
            name: 'environment'

        Node 0xf00dfce4
            .node:  f00dfce4
            reg:  00000000.00200000.00008000.00000000.00280000.00008000
            name: 'sram'

        Node 0xf00dfd80
            .node:  f00dfd80
            version:  4f425020.2020332e.322e3330.20323030.322f3130.2f323520.31343a30.3300504f.53542020.332e392e.33302032.3030322f.31302f32.35203134.3a303400
            model: 'SUNW,525-1431'
            reg:  00000000.00000000.00080000
            name: 'flashprom'

    Node 0xf00dfec4
        .node:  f00dfec4
        manufacturer#:  00000017
        implementation#:  00000011
        mask#:  000000a0
        sparc-version:  00000009
        ecache-associativity:  00000001
        ecache-line-size:  00000040
        ecache-size:  00800000
        #dtlb-entries:  00000040
        dcache-associativity:  00000001
        dcache-line-size:  00000020
        dcache-size:  00004000
        #itlb-entries:  00000040
        icache-associativity:  00000002
        icache-line-size:  00000020
        icache-size:  00004000
        upa-portid:  00000006
        clock-frequency:  17d78400
        rated-frequency:  17d78400
        reg:  000001cc.00000000.00000000.00000008
        board#:  00000003
        device_type: 'cpu'
        name: 'SUNW,UltraSPARC-II'

    Node 0xf00e0284
        .node:  f00e0284
        manufacturer#:  00000017
        implementation#:  00000011
        mask#:  000000a0
        sparc-version:  00000009
        ecache-associativity:  00000001
        ecache-line-size:  00000040
        ecache-size:  00800000
        #dtlb-entries:  00000040
        dcache-associativity:  00000001
        dcache-line-size:  00000020
        dcache-size:  00004000
        #itlb-entries:  00000040
        icache-associativity:  00000002
        icache-line-size:  00000020
        icache-size:  00004000
        upa-portid:  00000007
        clock-frequency:  17d78400
        rated-frequency:  17d78400
        reg:  000001ce.00000000.00000000.00000008
        board#:  00000003
        device_type: 'cpu'
        name: 'SUNW,UltraSPARC-II'

    Node 0xf006f7bc
        .node:  f006f7bc
        ranges:  00000001.00000000.000001c5.10000000.10000000.00000002.00000000.000001c5.20000000.10000000.0000000d.00000000.000001c5.d0000000.10000000
        interrupts:  000000b4.000000b5.000000b6.000000a5.000000aa.000000b7
        version#:  00000001
        implementation#:  00000000
        bus-parity-generated:  
        address:  ffdb2000
        scsi-initiator-id:  00000007
        model: 'SUNW,sysio'
        reg:  000001c4.00000000.00000000.00006000
        slot-address-bits:  0000001c
        up-burst-sizes:  0078007f
        burst-sizes:  00f8007f
        device_type: 'sbus'
        name: 'sbus'
        upa-portid:  00000002
        clock-frequency:  017d7840
        board#:  00000001

        Node 0xf0075084
            .node:  f0075084
            wwn:  20040800.20b6eee2
            intr:  00000003.00000000
            interrupts:  00000022
            ranges:  00000000.00000000.0000000d.00010240.00000018.00000001.00000000.0000000d.00010258.00000018.00000010.00000000.0000000d.00010300.00000008.00000011.00000000.0000000d.00010308.00000008
            reg:  0000000d.00010000.00010018
            device_type: 'socal'
            version: '@(#) FCode 1.12 99/07/30'
            manufacturer: 'SUNW'
            model: '501-3060'
            name: 'SUNW,socal'

            Node 0xf007c8ec
                .node:  f007c8ec
                port-wwn:  20050800.20b6eee2
                reg:  00000000.00000000.00000018.00000010.00000000.00000008
                port#:  00000000
                #address-cells:  00000004
                device_type: 'scsi-3'
                name: 'sf'

                Node 0xf007e704
                    .node:  f007e704
                    device_type: 'block'
                    name: 'ssd'

            Node 0xf007efd4
                .node:  f007efd4
                port-wwn:  20060800.20b6eee2
                reg:  00000001.00000000.00000018.00000011.00000000.00000008
                port#:  00000001
                #address-cells:  00000004
                device_type: 'scsi-3'
                name: 'sf'

                Node 0xf007f670
                    .node:  f007f670
                    device_type: 'block'
                    name: 'ssd'

        Node 0xf0080050
            .node:  f0080050
            local-mac-address:  080020ee.2248
            gem-rev:  00000000
            burst-sizes:  0078007f
            shared-pins: 'serdes'
            board-rev:  00000005
            interrupts:  00000004
            compatible: 'SUNW,sbus-gem'
            model: 'SUNW,sbus-gem'
            has-fcode: ' '
            version: '1.7'
            device_type: 'network'
            address-bits:  00000030
            max-frame-size:  00004000
            reg:  00000001.00100000.00000014.00000001.00200000.00009060
            name: 'network'

        Node 0xf0086420
            .node:  f0086420
            scsi-initiator-id:  00000007
            isp-fcode: '1.21 95/05/18'
            device_type: 'scsi'
            intr:  00000003.00000000
            interrupts:  00000003
            wide:  00
            clock-frequency:  02625a00
            reg:  00000002.00010000.00000450
            64-bit-clean:  00
            model: 'QLGC,ISP1000'
            name: 'QLGC,isp'

            Node 0xf008bc8c
                .node:  f008bc8c
                device_type: 'block'
                name: 'sd'

            Node 0xf008c4a0
                .node:  f008c4a0
                device_type: 'byte'
                name: 'st'

    Node 0xf0071c1c
        .node:  f0071c1c
        board-type: 'dual-sbus-soc+'
        manfid#:  0000003e
        version#:  00000001
        ranges:  00000000.00000000.000001c4.f8000000.08000000
        reg:  000001c4.f8800000.00000000.00000110.000001c4.f8802000.00000000.00000010.000001c4.f8804000.00000000.00000020.000001c4.f8806000.00000000.00000020.000001c4.f8808000.00000000.00000020.000001c4.f880a000.00000000.00000020
        board-model: 'SUNW,501-2558'
        model: 'SUNW,fhc0FA0'
        board#:  00000001
        name: 'fhc'

        Node 0xf00720cc
            .node:  f00720cc
            manfid#:  0000003e
            version#:  00000005
            device_type: 'memory-controller'
            reg:  00000000.01000000.00008000.00000000.02000000.01000000
            model: 'SUNW,ac0F9E'
            name: 'ac'

        Node 0xf0072204
            .node:  f0072204
            interrupts:  0000003b
            reg:  00000000.00400000.00000010
            name: 'environment'

        Node 0xf00722c0
            .node:  f00722c0
            version:  46434f44.4520312e.382e3330.20323030.322f3130.2f323520.31343a30.32006950.4f535420.332e342e.33302032.3030322f.31302f32.35203134.3a303300
            model: 'SUNW,525-1757'
            reg:  00000000.00000000.00080000
            name: 'flashprom'

        Node 0xf00726f8
            .node:  f00726f8
            address:  ffd96000
            interrupts:  0000003a
            reg:  00000000.00300000.00002000
            model: 'mk48t59'
            name: 'eeprom'

        Node 0xf00727f0
            .node:  f00727f0
            reg:  00000000.00500000.00000010
            name: 'sbus-speed'

    Node 0xf00728e4
        .node:  f00728e4
        address:  ffd95c00.ffd91860.ffd93060
        interrupts:  000000b0.000000b1
        reg:  000001c4.00003c00.00000000.00000020.000001c4.00003860.00000000.00000010.000001c4.00003060.00000000.00000010
        board#:  00000001
        name: 'counter-timer'

    Node 0xf0072ad4
        .node:  f0072ad4
        ranges:  00000000.00000000.000001c7.00000000.10000000.00000003.00000000.000001c7.30000000.10000000
        interrupts:  000000f4.000000f5.000000f6.000000e5.000000ea.000000f7
        version#:  00000001
        implementation#:  00000000
        bus-parity-generated:  
        address:  ffd8a000
        scsi-initiator-id:  00000007
        model: 'SUNW,sysio'
        reg:  000001c6.00000000.00000000.00006000
        slot-address-bits:  0000001c
        up-burst-sizes:  0078007f
        burst-sizes:  00f8007f
        device_type: 'sbus'
        name: 'sbus'
        upa-portid:  00000003
        clock-frequency:  017d7840
        board#:  00000001

        Node 0xf008d070
            .node:  f008d070
            hm-rev:  00000022
            device_type: 'network'
            intr:  00000004.00000000
            interrupts:  00000004
            address-bits:  00000030
            max-frame-size:  00004000
            reg:  00000003.08c00000.00000108.00000003.08c02000.00002000.00000003.08c04000.00002000.00000003.08c06000.00002000.00000003.08c07000.00000020
            name: 'SUNW,hme'

        Node 0xf0093c14
            .node:  f0093c14
            hm-rev:  00000022
            device_type: 'scsi'
            clock-frequency:  02625a00
            intr:  00000003.00000000
            interrupts:  00000003
            reg:  00000003.08800000.00000010.00000003.08810000.00000040
            name: 'SUNW,fas'

            Node 0xf009864c
                .node:  f009864c
                device_type: 'block'
                name: 'sd'

            Node 0xf0098f08
                .node:  f0098f08
                device_type: 'byte'
                name: 'st'

        Node 0xf0099bf4
            .node:  f0099bf4
            local-mac-address:  08002093.7994
            hm-rev:  00000022
            device_type: 'network'
            intr:  00000004.00000000
            interrupts:  00000004
            address-bits:  00000030
            max-frame-size:  00004000
            reg:  00000000.08c00000.00000108.00000000.08c02000.00002000.00000000.08c04000.00002000.00000000.08c06000.00002000.00000000.08c07000.00000020
            model: 'SUNW,sbus-qfe'
            version: '1.11'
            name: 'SUNW,qfe'

        Node 0xf009fba8
            .node:  f009fba8
            local-mac-address:  08002093.7995
            hm-rev:  00000022
            device_type: 'network'
            intr:  00000004.00000000
            interrupts:  00000004
            address-bits:  00000030
            max-frame-size:  00004000
            reg:  00000000.08c10000.00000108.00000000.08c12000.00002000.00000000.08c14000.00002000.00000000.08c16000.00002000.00000000.08c17000.00000020
            model: 'SUNW,sbus-qfe'
            version: '1.11'
            name: 'SUNW,qfe'

        Node 0xf00a5a84
            .node:  f00a5a84
            local-mac-address:  08002093.7996
            hm-rev:  00000022
            device_type: 'network'
            intr:  00000004.00000000
            interrupts:  00000004
            address-bits:  00000030
            max-frame-size:  00004000
            reg:  00000000.08c20000.00000108.00000000.08c22000.00002000.00000000.08c24000.00002000.00000000.08c26000.00002000.00000000.08c27000.00000020
            model: 'SUNW,sbus-qfe'
            version: '1.11'
            name: 'SUNW,qfe'

        Node 0xf00ab960
            .node:  f00ab960
            local-mac-address:  08002093.7997
            hm-rev:  00000022
            device_type: 'network'
            intr:  00000004.00000000
            interrupts:  00000004
            address-bits:  00000030
            max-frame-size:  00004000
            reg:  00000000.08c30000.00000108.00000000.08c32000.00002000.00000000.08c34000.00002000.00000000.08c36000.00002000.00000000.08c37000.00000020
            model: 'SUNW,sbus-qfe'
            version: '1.11'
            name: 'SUNW,qfe'

    Node 0xf0074e94
        .node:  f0074e94
        address:  ffd7bc00.ffd77860.ffd79060
        interrupts:  000000f0.000000f1
        reg:  000001c6.00003c00.00000000.00000020.000001c6.00003860.00000000.00000010.000001c6.00003060.00000000.00000010
        board#:  00000001
        name: 'counter-timer'

    Node 0xf014f7bc
        .node:  f014f7bc
        ranges:  00000001.00000000.000001d5.10000000.10000000.00000002.00000000.000001d5.20000000.10000000.0000000d.00000000.000001d5.d0000000.10000000
        interrupts:  000002b4.000002b5.000002b6.000002a5.000002aa.000002b7
        version#:  00000001
        implementation#:  00000000
        bus-parity-generated:  
        address:  ffd70000
        scsi-initiator-id:  00000007
        model: 'SUNW,sysio'
        reg:  000001d4.00000000.00000000.00006000
        slot-address-bits:  0000001c
        up-burst-sizes:  0078007f
        burst-sizes:  00f8007f
        device_type: 'sbus'
        name: 'sbus'
        upa-portid:  0000000a
        clock-frequency:  017d7840
        board#:  00000005

        Node 0xf0155084
            .node:  f0155084
            wwn:  20140800.20b6eee2
            intr:  00000003.00000000
            interrupts:  00000022
            ranges:  00000000.00000000.0000000d.00010240.00000018.00000001.00000000.0000000d.00010258.00000018.00000010.00000000.0000000d.00010300.00000008.00000011.00000000.0000000d.00010308.00000008
            reg:  0000000d.00010000.00010018
            device_type: 'socal'
            version: '@(#) FCode 1.12 99/07/30'
            manufacturer: 'SUNW'
            model: '501-3060'
            name: 'SUNW,socal'

            Node 0xf015c8ec
                .node:  f015c8ec
                port-wwn:  20150800.20b6eee2
                reg:  00000000.00000000.00000018.00000010.00000000.00000008
                port#:  00000000
                #address-cells:  00000004
                device_type: 'scsi-3'
                name: 'sf'

                Node 0xf015e704
                    .node:  f015e704
                    device_type: 'block'
                    name: 'ssd'

            Node 0xf015efd4
                .node:  f015efd4
                port-wwn:  20160800.20b6eee2
                reg:  00000001.00000000.00000018.00000011.00000000.00000008
                port#:  00000001
                #address-cells:  00000004
                device_type: 'scsi-3'
                name: 'sf'

                Node 0xf015f670
                    .node:  f015f670
                    device_type: 'block'
                    name: 'ssd'

        Node 0xf0160050
            .node:  f0160050
            scsi-initiator-id:  00000007
            clock-frequency:  03938700
            differential:  00
            isp-fcode: '1.28 99/11/08'
            device_type: 'scsi'
            intr:  00000003.00000000
            interrupts:  00000003
            wide:  00
            fast-20:  00
            reg:  00000001.00010000.00000450
            64-bit-clean:  00
            model: 'QLGC,ISP1000U'
            name: 'QLGC,isp'

            Node 0xf0165dc8
                .node:  f0165dc8
                device_type: 'block'
                name: 'sd'

            Node 0xf01665b8
                .node:  f01665b8
                device_type: 'byte'
                name: 'st'

        Node 0xf016713c
            .node:  f016713c
            cache-linesize:  00000010
            cache-size:  00008000
            intr:  00000002.00000000
            interrupts:  00000002
            reg:  00000002.00010000.00000080.00000002.00020000.00000068.00000002.00030000.0000000c
            model: 'SUNW,501-1763-01'
            name: 'SUNW,SunPC'

    Node 0xf0151c1c
        .node:  f0151c1c
        board-type: 'dual-sbus-soc+'
        manfid#:  0000003e
        version#:  00000001
        ranges:  00000000.00000000.000001d4.f8000000.08000000
        reg:  000001d4.f8800000.00000000.00000110.000001d4.f8802000.00000000.00000010.000001d4.f8804000.00000000.00000020.000001d4.f8806000.00000000.00000020.000001d4.f8808000.00000000.00000020.000001d4.f880a000.00000000.00000020
        board-model: 'SUNW,501-2558'
        model: 'SUNW,fhc0FA0'
        board#:  00000005
        name: 'fhc'

        Node 0xf01520cc
            .node:  f01520cc
            manfid#:  0000003e
            version#:  00000005
            device_type: 'memory-controller'
            reg:  00000000.01000000.00008000.00000000.02000000.01000000
            model: 'SUNW,ac0F9E'
            name: 'ac'

        Node 0xf0152204
            .node:  f0152204
            interrupts:  0000003b
            reg:  00000000.00400000.00000010
            name: 'environment'

        Node 0xf01522c0
            .node:  f01522c0
            version:  46434f44.4520312e.382e3330.20323030.322f3130.2f323520.31343a30.32006950.4f535420.332e342e.33302032.3030322f.31302f32.35203134.3a303300
            model: 'SUNW,525-1757'
            reg:  00000000.00000000.00080000
            name: 'flashprom'

        Node 0xf01526f8
            .node:  f01526f8
            address:  ffd54000
            interrupts:  0000003a
            reg:  00000000.00300000.00002000
            model: 'mk48t59'
            name: 'eeprom'

        Node 0xf01527f0
            .node:  f01527f0
            reg:  00000000.00500000.00000010
            name: 'sbus-speed'

    Node 0xf01528e4
        .node:  f01528e4
        address:  ffd53c00.ffd4f860.ffd51060
        interrupts:  000002b0.000002b1
        reg:  000001d4.00003c00.00000000.00000020.000001d4.00003860.00000000.00000010.000001d4.00003060.00000000.00000010
        board#:  00000005
        name: 'counter-timer'

    Node 0xf0152ad4
        .node:  f0152ad4
        ranges:  00000000.00000000.000001d7.00000000.10000000.00000003.00000000.000001d7.30000000.10000000
        interrupts:  000002f4.000002f5.000002f6.000002e5.000002ea.000002f7
        version#:  00000001
        implementation#:  00000000
        bus-parity-generated:  
        address:  ffd48000
        scsi-initiator-id:  00000007
        model: 'SUNW,sysio'
        reg:  000001d6.00000000.00000000.00006000
        slot-address-bits:  0000001c
        up-burst-sizes:  0078007f
        burst-sizes:  00f8007f
        device_type: 'sbus'
        name: 'sbus'
        upa-portid:  0000000b
        clock-frequency:  017d7840
        board#:  00000005

        Node 0xf01673a8
            .node:  f01673a8
            hm-rev:  00000022
            device_type: 'network'
            intr:  00000004.00000000
            interrupts:  00000004
            address-bits:  00000030
            max-frame-size:  00004000
            reg:  00000003.08c00000.00000108.00000003.08c02000.00002000.00000003.08c04000.00002000.00000003.08c06000.00002000.00000003.08c07000.00000020
            name: 'SUNW,hme'

        Node 0xf016df4c
            .node:  f016df4c
            hm-rev:  00000022
            device_type: 'scsi'
            clock-frequency:  02625a00
            intr:  00000003.00000000
            interrupts:  00000003
            reg:  00000003.08800000.00000010.00000003.08810000.00000040
            name: 'SUNW,fas'

            Node 0xf0172984
                .node:  f0172984
                device_type: 'block'
                name: 'sd'

            Node 0xf0173240
                .node:  f0173240
                device_type: 'byte'
                name: 'st'

        Node 0xf0173f2c
            .node:  f0173f2c
            scsi-initiator-id:  00000007
            clock-frequency:  03938700
            differential:  00
            isp-fcode: '1.28 99/11/08'
            device_type: 'scsi'
            intr:  00000003.00000000
            interrupts:  00000003
            wide:  00
            fast-20:  00
            reg:  00000000.00010000.00000450
            64-bit-clean:  00
            model: 'QLGC,ISP1000U'
            name: 'QLGC,isp'

            Node 0xf0179ca4
                .node:  f0179ca4
                device_type: 'block'
                name: 'sd'

            Node 0xf017a494
                .node:  f017a494
                device_type: 'byte'
                name: 'st'

    Node 0xf0154e94
        .node:  f0154e94
        address:  ffd39c00.ffd35860.ffd37060
        interrupts:  000002f0.000002f1
        reg:  000001d6.00003c00.00000000.00000020.000001d6.00003860.00000000.00000010.000001d6.00003060.00000000.00000010
        board#:  00000005
        name: 'counter-timer'

    Node 0xf01bf7bc
        .node:  f01bf7bc
        available:  82000000.00000000.02808000.00000000.7d7f8000.81000000.00000000.00000400.00000000.0000fc00
        bus-range:  00000000.00000000
        version#:  00000004
        implementation#:  00000000
        clock-frequency:  01f78a40
        upa-portid:  0000000e
        interrupts:  000003b1.000003ae.000003af.000003a5.000003a8.000003b2
        ranges:  00000000.00000000.00000000.000001dc.01000000.00000000.00800000.01000000.00000000.00000000.000001dc.02010000.00000000.00010000.02000000.00000000.00000000.000001dd.80000000.00000000.80000000.03000000.00000000.00000000.000001dd.80000000.00000000.80000000
        address:  ffd32000.ffd30000.ffd22000
        reg:  000001dc.00004000.00000000.00002000.000001dc.01000000.00000000.00000100.000001dc.00000000.00000000.0000d000
        board#:  00000007
        model: 'SUNW,psycho'
        compatible: 'pci108e,8000'
        bus-parity-generated:  
        #size-cells:  00000002
        #address-cells:  00000003
        device_type: 'pci'
        name: 'pci'

        Node 0xf01d3d84
            .node:  f01d3d84
            assigned-addresses:  82000810.00000000.01000000.00000000.01000000.82000814.00000000.02000000.00000000.00800000
            power-consumption:  00000000.00e4e1c0
            reg:  00000800.00000000.00000000.00000000.00000000.02000810.00000000.00000000.00000000.01000000.02000814.00000000.00000000.00000000.00800000
            compatible:  70636931.3038652c.31303030.00706369.636c6173.732c3036.38303030.00
            name: 'pci108e,1000'
            66mhz-capable:  00000000
            udf-supported:  00000000
            fast-back-to-back:  00000001
            devsel-speed:  00000001
            class-code:  00068000
            interrupts:  00000001
            max-latency:  00000019
            min-grant:  0000000a
            revision-id:  00000001
            device-id:  00001000
            vendor-id:  0000108e

        Node 0xf01d4058
            .node:  f01d4058
            assigned-addresses:  82000910.00000000.02800000.00000000.00007030
            compatible: 'pci108e,1001'
            version: '1.17'
            device_type: 'network'
            hm-rev:  000000c1
            address-bits:  00000030
            max-frame-size:  00004000
            reg:  00000900.00000000.00000000.00000000.00000000.02000910.00000000.00000000.00000000.00007030
            model: 'SUNW,cheerio'
            name: 'SUNW,hme'
            66mhz-capable:  00000000
            udf-supported:  00000000
            fast-back-to-back:  00000001
            devsel-speed:  00000001
            class-code:  00020000
            interrupts:  000003a1
            max-latency:  00000005
            min-grant:  0000000a
            revision-id:  00000001
            device-id:  00001001
            vendor-id:  0000108e

    Node 0xf01c88e0
        .node:  f01c88e0
        available:  82800000.00000000.00002100.00000000.7fffdf00.81800000.00000000.00000440.00000000.0000fbc0
        bus-range:  00000080.00000080
        version#:  00000004
        implementation#:  00000000
        clock-frequency:  01f78a40
        slot-names:  00000004.7063692d.736c6f74.203000
        upa-portid:  0000000e
        66mhz-capable:  
        interrupts:  000003b0.000003ae.000003af.000003a5.000003a8.000003b2
        ranges:  00800000.00000000.00000000.000001dc.01000000.00000000.00800000.01000000.00000000.00000000.000001dc.02000000.00000000.00010000.02000000.00000000.00000000.000001dd.00000000.00000000.80000000.03000000.00000000.00000000.000001dd.00000000.00000000.80000000
        address:  ffd18000.ffd16000.ffd08000
        reg:  000001dc.00002000.00000000.00002000.000001dc.01800000.00000000.00000100.000001dc.00000000.00000000.0000d000
        board#:  00000007
        model: 'SUNW,psycho'
        compatible: 'pci108e,8000'
        bus-parity-generated:  
        #size-cells:  00000002
        #address-cells:  00000003
        device_type: 'pci'
        name: 'pci'

        Node 0xf01e7cb8
            .node:  f01e7cb8
            assigned-addresses:  81801020.00000000.00000400.00000000.00000020
            power-consumption:  00000000.00e4e1c0
            reg:  00801000.00000000.00000000.00000000.00000000.01801020.00000000.00000000.00000000.00000020
            compatible:  70636939.32352c31.32333400.70636931.3130362c.33303338.00706369.636c6173.732c3063.30333030.00757362.00
            name: 'usb'
            66mhz-capable:  00000000
            udf-supported:  00000000
            fast-back-to-back:  00000000
            devsel-speed:  00000001
            class-code:  000c0300
            interrupts:  00000001
            subsystem-vendor-id:  00000925
            subsystem-id:  00001234
            max-latency:  00000000
            min-grant:  00000000
            revision-id:  00000050
            device-id:  00003038
            vendor-id:  00001106

        Node 0xf01e7fd4
            .node:  f01e7fd4
            assigned-addresses:  81801120.00000000.00000420.00000000.00000020
            reg:  00801100.00000000.00000000.00000000.00000000.01801120.00000000.00000000.00000000.00000020
            compatible:  70636939.32352c31.32333400.70636931.3130362c.33303338.00706369.636c6173.732c3063.30333030.00757362.00
            name: 'usb'
            66mhz-capable:  00000000
            udf-supported:  00000000
            fast-back-to-back:  00000000
            devsel-speed:  00000001
            class-code:  000c0300
            interrupts:  00000002
            subsystem-vendor-id:  00000925
            subsystem-id:  00001234
            max-latency:  00000000
            min-grant:  00000000
            revision-id:  00000050
            device-id:  00003038
            vendor-id:  00001106

        Node 0xf01e82c0
            .node:  f01e82c0
            assigned-addresses:  82801210.00000000.00002000.00000000.00000100
            reg:  00801200.00000000.00000000.00000000.00000000.02801210.00000000.00000000.00000000.00000100
            compatible:  70636939.32352c31.32333400.70636931.3130362c.33313034.00706369.636c6173.732c3063.30333230.00757362.00
            name: 'usb'
            66mhz-capable:  00000000
            udf-supported:  00000000
            fast-back-to-back:  00000000
            devsel-speed:  00000001
            class-code:  000c0320
            interrupts:  00000003
            subsystem-vendor-id:  00000925
            subsystem-id:  00001234
            max-latency:  00000000
            min-grant:  00000000
            revision-id:  00000051
            device-id:  00003104
            vendor-id:  00001106

    Node 0xf01c923c
        .node:  f01c923c
        board-type: 'dual-pci'
        manfid#:  0000003e
        version#:  00000001
        ranges:  00000000.00000000.000001dc.f8000000.08000000
        reg:  000001dc.f8800000.00000000.00000110.000001dc.f8802000.00000000.00000010.000001dc.f8804000.00000000.00000020.000001dc.f8806000.00000000.00000020.000001dc.f8808000.00000000.00000020.000001dc.f880a000.00000000.00000020
        board-model: 'SUNW,501-3023'
        model: 'SUNW,fhc0FA0'
        board#:  00000007
        name: 'fhc'

        Node 0xf01c9718
            .node:  f01c9718
            manfid#:  0000003e
            version#:  00000005
            device_type: 'memory-controller'
            reg:  00000000.01000000.00008000.00000000.02000000.01000000
            model: 'SUNW,ac0F9E'
            name: 'ac'

        Node 0xf01c9850
            .node:  f01c9850
            interrupts:  0000003b
            reg:  00000000.00400000.00000010
            name: 'environment'

        Node 0xf01c990c
            .node:  f01c990c
            version:  46434f44.4520312e.382e3330.20323030.322f3130.2f323520.31343a30.32006950.4f535420.332e302e.33302032.3030322f.31302f32.35203134.3a303300
            model: 'SUNW,525-1680'
            reg:  00000000.00000000.00080000
            name: 'flashprom'

        Node 0xf01c9d44
            .node:  f01c9d44
            address:  ffcfa000
            interrupts:  0000003a
            reg:  00000000.00300000.00002000
            model: 'mk48t59'
            name: 'eeprom'

        Node 0xf01c9e3c
            .node:  f01c9e3c
            reg:  00000000.00500000.00000010
            name: 'sbus-speed'

    Node 0xf01c9f28
        .node:  f01c9f28
        address:  ffcf9c00.ffcf5860.ffcf7060
        interrupts:  000003ac.000003ad
        reg:  000001dc.00001c00.00000000.00000020.000001dc.00001860.00000000.00000010.000001dc.00001060.00000000.00000010
        board#:  00000007
        name: 'counter-timer'

    Node 0xf01ca118
        .node:  f01ca118
        available:  82000000.00000000.00020000.00000000.7ffe0000.81000000.00000000.00000500.00000000.0000fb00
        bus-range:  00000000.00000000
        version#:  00000004
        implementation#:  00000000
        clock-frequency:  01f78a40
        upa-portid:  0000000f
        interrupts:  000003f1.000003ee.000003ef.000003e5.000003e8.000003f2
        ranges:  00000000.00000000.00000000.000001de.01000000.00000000.00800000.01000000.00000000.00000000.000001de.02010000.00000000.00010000.02000000.00000000.00000000.000001df.80000000.00000000.80000000.03000000.00000000.00000000.000001df.80000000.00000000.80000000
        address:  ffcf2000.ffcf0000.ffce2000
        reg:  000001de.00004000.00000000.00002000.000001de.01000000.00000000.00000100.000001de.00000000.00000000.0000d000
        board#:  00000007
        model: 'SUNW,psycho'
        compatible: 'pci108e,8000'
        bus-parity-generated:  
        #size-cells:  00000002
        #address-cells:  00000003
        device_type: 'pci'
        name: 'pci'

        Node 0xf01dc3c0
            .node:  f01dc3c0
            assigned-addresses:  81001810.00000000.00000400.00000000.00000100.82001814.00000000.00002000.00000000.00001000.82001830.00000000.00010000.00000000.00010000
            model: 'QLGC,ISP1040B'
            scsi-initiator-id:  00000007
            clock-frequency:  03938700
            alternate-reg:  00000000.00000000.00000000.00000000.00000000.02001814.00000000.00000000.00000000.00000100.01001810.00000000.00000000.00000000.00000100
            reg:  00001800.00000000.00000000.00000000.00000000.01001810.00000000.00000000.00000000.00000100.02001814.00000000.00000000.00000000.00001000.02001830.00000000.00000000.00000000.00010000
            power-consumption:  00000000.00000000.00895440.00895440
            manufacturer: 'QLGC'
            device_type: 'scsi'
            name: 'SUNW,isptwo'
            66mhz-capable:  00000000
            udf-supported:  00000000
            fast-back-to-back:  00000000
            devsel-speed:  00000001
            class-code:  00010000
            interrupts:  000003e0
            max-latency:  00000000
            min-grant:  00000000
            revision-id:  00000002
            device-id:  00001020
            vendor-id:  00001077

            Node 0xf01e6534
                .node:  f01e6534
                device_type: 'block'
                name: 'sd'

            Node 0xf01e7010
                .node:  f01e7010
                device_type: 'byte'
                name: 'st'

    Node 0xf01d320c
        .node:  f01d320c
        available:  82800000.00000000.00004000.00000000.7fffc000.81800000.00000000.00000900.00000000.0000f700
        bus-range:  00000080.00000080
        version#:  00000004
        implementation#:  00000000
        clock-frequency:  03ef1480
        slot-names:  00000004.7063692d.736c6f74.203100
        upa-portid:  0000000f
        66mhz-capable:  
        interrupts:  000003f0.000003ee.000003ef.000003e5.000003e8.000003f2
        ranges:  00800000.00000000.00000000.000001de.01000000.00000000.00800000.01000000.00000000.00000000.000001de.02000000.00000000.00010000.02000000.00000000.00000000.000001df.00000000.00000000.80000000.03000000.00000000.00000000.000001df.00000000.00000000.80000000
        address:  ffcd8000.ffcd6000.ffcc8000
        reg:  000001de.00002000.00000000.00002000.000001de.01800000.00000000.00000100.000001de.00000000.00000000.0000d000
        board#:  00000007
        model: 'SUNW,psycho'
        compatible: 'pci108e,8000'
        bus-parity-generated:  
        #size-cells:  00000002
        #address-cells:  00000003
        device_type: 'pci'
        name: 'pci'

        Node 0xf01e86d0
            .node:  f01e86d0
            assigned-addresses:  81801010.00000000.00000400.00000000.00000100.83801014.00000000.00002000.00000000.00002000.8180101c.00000000.00000800.00000000.00000100
            power-consumption:  00000000.00e4e1c0
            reg:  00801000.00000000.00000000.00000000.00000000.01801010.00000000.00000000.00000000.00000100.03801014.00000000.00000000.00000000.00002000.0180101c.00000000.00000000.00000000.00000100
            compatible:  70636939.3030352c.34340070.63693930.30352c38.30313700.70636963.6c617373.2c303130.30303000.73637369.00
            name: 'scsi'
            66mhz-capable:  00000001
            udf-supported:  00000000
            fast-back-to-back:  00000000
            devsel-speed:  00000002
            class-code:  00010000
            interrupts:  00000001
            subsystem-vendor-id:  00009005
            subsystem-id:  00000044
            max-latency:  00000019
            min-grant:  00000028
            revision-id:  00000010
            device-id:  00008017
            vendor-id:  00009005

    Node 0xf01d3b68
        .node:  f01d3b68
        address:  ffcc7c00.ffcc3860.ffcc5060
        interrupts:  000003ec.000003ed
        reg:  000001de.00001c00.00000000.00000020.000001de.00001860.00000000.00000010.000001de.00001060.00000000.00000010
        board#:  00000007
        name: 'counter-timer'


-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-13  7:45 ` OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88 Meelis Roos
@ 2012-02-13  8:06   ` Grant Likely
  2012-02-13  9:20     ` Meelis Roos
  2012-02-13  9:50     ` Meelis Roos
  2012-03-01 12:24   ` [tip:core/urgent] memblock: Fix size aligning of memblock_alloc_base_nid() tip-bot for Tejun Heo
  1 sibling, 2 replies; 46+ messages in thread
From: Grant Likely @ 2012-02-13  8:06 UTC (permalink / raw)
  To: Meelis Roos; +Cc: Rob Herring, sparclinux, Linux Kernel list

On Mon, Feb 13, 2012 at 09:45:40AM +0200, Meelis Roos wrote:
> (Resend with proper To-s for OF people)
> 
> This is my first post-3.2 test on 2-CPU Sun Enterprise 3500 (PCI+SBus 
> IO). prtconf is also below. Something OF-related seems to be happening 
> here.
> 
> [    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 3.2.30 2002/10/25 14:03'
> [    0.000000] PROMLIB: Root node compatible:
> [    0.000000] Initializing cgroup subsys cpu
> [    0.000000] Linux version 3.3.0-rc3-00188-g3ec1e88 (mroos@korvits) (gcc version 4.6.2 (Debian 4.6.2-14) ) #64 SMP Sun Feb 12 22:26:40 EET 2012
> [    0.000000] debug: ignoring loglevel setting.
> [    0.000000] bootconsole [earlyprom0] enabled
> [    0.000000] ARCH: SUN4U
> [    0.000000] Ethernet address: 08:00:20:b6:ee:e2
> [    0.000000] Kernel: Using 4 locked TLB entries for main kernel image.
> [    0.000000] Remapping the kernel... done.
> [    0.000000] Unable to handle kernel NULL pointer dereference
> [    0.000000] tsk->{mm,active_mm}->context = 0000000000000000
> [    0.000000] tsk->{mm,active_mm}->pgd = fffff800008c77d0
> [    0.000000]               \|/ ____ \|/
> [    0.000000]               "@'/ .. \`@"
> [    0.000000]               /_| \__/ |_\
> [    0.000000]                  \__U_/
> [    0.000000] swapper(0): Oops [#1]
> [    0.000000] TSTATE: 0000000080e01607 TPC: 00000000006459a0 TNPC: 0000000000645964 Y: 00000037    Not tainted
> [    0.000000] TPC: <of_find_node_by_path+0x60/0x80>
> [    0.000000] g0: 0000000000000000 g1: 0000000000000001 g2: 00000000000000ff g3: 00000000000000f0
> [    0.000000] g4: 0000000000853fd0 g5: 0000000000000000 g6: 0000000000834000 g7: 0000000000000050
> [    0.000000] o0: 0000000000000001 o1: fffff8007fced7c0 o2: 0000000001010101 o3: 0000000080808080
> [    0.000000] o4: fffff8007fcc0a4d o5: 00000000000199b5 sp: 0000000000837231 ret_pc: 0000000000645970
> [    0.000000] RPC: <of_find_node_by_path+0x30/0x80>
> [    0.000000] l0: 00000000008ab400 l1: fffff8007fcc1f40 l2: 000000000085c5ec l3: 0000000000000025
> [    0.000000] l4: 00000000005c0400 l5: 00000000008fa5e6 l6: 0000000000000006 l7: 0028280000000000
> [    0.000000] i0: fffff8007fced7c0 i1: 0000000000808fd8 i2: 0000000001010101 i3: 0000000080808080
> [    0.000000] i4: 0000000000876c00 i5: 0000000000000050 i6: 00000000008372e1 i7: 000000000064684c
> [    0.000000] I7: <of_alias_scan+0xcc/0x1c0>
> [    0.000000] Call Trace:
> [    0.000000]  [000000000064684c] of_alias_scan+0xcc/0x1c0
> [    0.000000]  [00000000008a0350] of_pdt_build_devicetree+0x90/0xa0
> [    0.000000]  [000000000088c540] prom_build_devicetree+0x10/0x3c
> [    0.000000]  [00000000008904d4] paging_init+0x59c/0x6bc
> [    0.000000]  [000000000088bebc] setup_arch+0xf8/0x110
> [    0.000000]  [000000000088a51c] start_kernel+0x8c/0x34c

Try the following patch.  I suspect the new of_alias_scan() isn't careful
enough about which properties it dereferences:

---

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 133908a..9188caa 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1174,6 +1174,10 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
 		    !strcmp(pp->name, "linux,phandle"))
 			continue;
 
+		/* Check for null value or non-strings (no null termination) */
+		if (!pp->value || strnlen(pp->value, pp->length) == pp->length)
+			continue;
+
 		np = of_find_node_by_path(pp->value);
 		if (!np)
 			continue;
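
For illustration only (not part of the patch): a minimal user-space sketch
of the same validity check. It assumes a hypothetical struct prop that
mirrors just the name/length/value fields the kernel code reads;
prop_is_string() and count_string_props() are made-up helper names, and the
explicit length guard is an extra precaution beyond what the patch does.

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct property. */
struct prop {
	const char *name;
	int length;		/* size of value in bytes */
	const void *value;	/* raw property bytes, may be NULL */
};

/*
 * Return 1 if the value can safely be handed to string functions:
 * it must be non-NULL and contain a NUL byte within 'length' bytes.
 */
static int prop_is_string(const struct prop *pp)
{
	if (!pp->value || pp->length <= 0)
		return 0;
	/* strnlen() reaching the limit means no terminator was found */
	return strnlen(pp->value, (size_t)pp->length) < (size_t)pp->length;
}

/* Count the properties an alias scan could treat as path strings. */
static size_t count_string_props(const struct prop *props, size_t n)
{
	size_t i, ok = 0;

	for (i = 0; i < n; i++)
		if (prop_is_string(&props[i]))
			ok++;
	return ok;
}
```

Note that a binary property whose first byte happens to be zero still
passes this test as an empty string, so the check only guards against
reading past the end of the value, not against every non-path property.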



* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-13  8:06   ` Grant Likely
@ 2012-02-13  9:20     ` Meelis Roos
  2012-02-13 21:46       ` Grant Likely
  2012-02-13  9:50     ` Meelis Roos
  1 sibling, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-13  9:20 UTC (permalink / raw)
  To: Grant Likely; +Cc: Rob Herring, sparclinux, Linux Kernel list

> Try the following patch.  I suspect the new of_alias_scan() isn't careful
> enough about which properties it dereferences:
> 
> ---
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 133908a..9188caa 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -1174,6 +1174,10 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
>  		    !strcmp(pp->name, "linux,phandle"))
>  			continue;
>  
> +		/* Check for null value or non-strings (no null termination) */
> +		if (!pp->value || strnlen(pp->value, pp->length) == pp->length)
> +			continue;
> +
>  		np = of_find_node_by_path(pp->value);
>  		if (!np)
>  			continue;
> 

Yes, it probably gets past this problem but oopses in a different place:

[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 3.2.30 2002/10/25 14:03'
[    0.000000] PROMLIB: Root node compatible: 
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.3.0-rc3-00188-g3ec1e88-dirty (mroos@korvits) (gcc version 4.6.2 (Debian 42
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] bootconsole [earlyprom0] enabled
[    0.000000] ARCH: SUN4U
[    0.000000] Ethernet address: 08:00:20:b6:ee:e2
[    0.000000] Kernel: Using 4 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel... done.
[    0.000000] Unable to handle kernel NULL pointer dereference
[    0.000000] tsk->{mm,active_mm}->context = 0000000000000000
[    0.000000] tsk->{mm,active_mm}->pgd = fffff800008c77d0
[    0.000000]               \|/ ____ \|/
[    0.000000]               "@'/ .. \`@"
[    0.000000]               /_| \__/ |_\
[    0.000000]                  \__U_/
[    0.000000] swapper(0): Oops [#1]
[    0.000000] TSTATE: 0000000080e01606 TPC: 0000000000645810 TNPC: 0000000000645814 Y: 00000037    Not d
[    0.000000] TPC: <of_find_node_by_phandle+0x30/0x60>
[    0.000000] g0: 0000000000837b88 g1: 00000000fffff800 g2: 0000000000000000 g3: 0000000000000002
[    0.000000] g4: 0000000000853fd0 g5: 0000000000000000 g6: 0000000000834000 g7: 0000000000000050
[    0.000000] o0: 0000000000876cf0 o1: fffff8007fcc0900 o2: 0000000001010101 o3: 0000000080808080
[    0.000000] o4: 000000000000000e o5: 000000000086c000 sp: 0000000000837301 ret_pc: 00000000006457e8
[    0.000000] RPC: <of_find_node_by_phandle+0x8/0x60>
[    0.000000] l0: 0000000000808fd8 l1: 0000000000876d28 l2: 000000000072a800 l3: 0000000000000080
[    0.000000] l4: 0000000000000013 l5: 0000000000000013 l6: 0000000000000000 l7: 0000000000000281
[    0.000000] i0: 00000000f005de3c i1: ffffffffffdc1428 i2: 0000000000000100 i3: 0000000000000004
[    0.000000] i4: 0000000000000050 i5: 0000000000876c00 i6: 00000000008373b1 i7: 000000000088cd10
[    0.000000] I7: <of_console_init+0xa4/0x144>
[    0.000000] Call Trace:
[    0.000000]  [000000000088cd10] of_console_init+0xa4/0x144
[    0.000000]  [000000000088c548] prom_build_devicetree+0x18/0x3c
[    0.000000]  [00000000008904d4] paging_init+0x59c/0x6bc
[    0.000000]  [000000000088bebc] setup_arch+0xf8/0x110
[    0.000000]  [000000000088a51c] start_kernel+0x8c/0x34c
[    0.000000]  [00000000006fbf28] tlb_fixup_done+0xa0/0xa8
[    0.000000]  [0000000000000000]           (null)
[    0.000000] Disabling lock debugging due to kernel taint
[    0.000000] Caller[000000000088cd10]: of_console_init+0xa4/0x144
[    0.000000] Caller[000000000088c548]: prom_build_devicetree+0x18/0x3c
[    0.000000] Caller[00000000008904d4]: paging_init+0x59c/0x6bc
[    0.000000] Caller[000000000088bebc]: setup_arch+0xf8/0x110
[    0.000000] Caller[000000000088a51c]: start_kernel+0x8c/0x34c
[    0.000000] Caller[00000000006fbf28]: tlb_fixup_done+0xa0/0xa8
[    0.000000] Caller[0000000000000000]:           (null)
[    0.000000] Instruction DUMP: 901760f0  02c70007  901760f0 <c2072010> 80a04018  324ffffc  f85f2050  9 
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Press Stop-A (L1-A) to return to the boot prom

-- 
Meelis Roos (mroos@linux.ee)


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-13  8:06   ` Grant Likely
  2012-02-13  9:20     ` Meelis Roos
@ 2012-02-13  9:50     ` Meelis Roos
  2012-02-13  9:51       ` Meelis Roos
  2012-02-13 10:35       ` Meelis Roos
  1 sibling, 2 replies; 46+ messages in thread
From: Meelis Roos @ 2012-02-13  9:50 UTC (permalink / raw)
  To: Grant Likely; +Cc: Rob Herring, sparclinux, Linux Kernel list


Another variation of the crash, without the patch, but backtrace is 
slightly different (strlen) - maybe fixed by the patch, maybe not.

[    0.000000] Unable to handle kernel NULL pointer dereference
[    0.000000] tsk->{mm,active_mm}->context = 0000000000000000
[    0.000000] tsk->{mm,active_mm}->pgd = fffff800604ea3a8
[    0.000000]               \|/ ____ \|/
[    0.000000]               "@'/ .. \`@"
[    0.000000]               /_| \__/ |_\
[    0.000000]                  \__U_/
[    0.000000] swapper(0): Oops [#1]
[    0.000000] TSTATE: 0000004480e01606 TPC: 00000000005be460 TNPC: 00000000005be464 Y: 00000037    Not d
[    0.000000] TPC: <strlen+0x60/0xd4>
[    0.000000] g0: 000000000000002f g1: 0000000000000001 g2: 0000000000000000 g3: 000000000073a700
[    0.000000] g4: 000000000085ea50 g5: 0000000000000000 g6: 0000000000854000 g7: 0030a80000000000
[    0.000000] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000001010101 o3: 0000000080808080
[    0.000000] o4: 0000000001010000 o5: fffff8006feae140 sp: 00000000008572c1 ret_pc: 0000000000655108
[    0.000000] RPC: <of_alias_scan+0x68/0x200>
[    0.000000] l0: 00000000008a4380 l1: fffff8006feae6b5 l2: fffff8006feae140 l3: fffff8006fe98e00
[    0.000000] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 00000000008678d0
[    0.000000] i0: 00000000008c3f24 i1: 0000000000896ca0 i2: 00000000008268c0 i3: 00000000008268b8
[    0.000000] i4: 00000000008038c8 i5: fffff8006feae5c0 i6: 0000000000857381 i7: 00000000008c4314
[    0.000000] I7: <of_pdt_build_devicetree+0x90/0xa0>
[    0.000000] Call Trace:
[    0.000000]  [00000000008c4314] of_pdt_build_devicetree+0x90/0xa0
[    0.000000]  [00000000008b0330] prom_build_devicetree+0x10/0x3c
[    0.000000]  [00000000008b3bb8] paging_init+0xa3c/0xde8
[    0.000000]  [00000000008af978] setup_arch+0x324/0x688
[    0.000000]  [00000000008ae4ec] start_kernel+0x80/0x338
[    0.000000]  [0000000000715b30] tlb_fixup_done+0x88/0x90
[    0.000000]  [0000000000000000]           (null)
[    0.000000] Disabling lock debugging due to kernel taint
[    0.000000] Caller[00000000008c4314]: of_pdt_build_devicetree+0x90/0xa0
[    0.000000] Caller[00000000008b0330]: prom_build_devicetree+0x10/0x3c
[    0.000000] Caller[00000000008b3bb8]: paging_init+0xa3c/0xde8
[    0.000000] Caller[00000000008af978]: setup_arch+0x324/0x688
[    0.000000] Caller[00000000008ae4ec]: start_kernel+0x80/0x338
[    0.000000] Caller[0000000000715b30]: tlb_fixup_done+0x88/0x90
[    0.000000] Caller[0000000000000000]:           (null)
[    0.000000] Instruction DUMP: 96132080  19004040  94132101 <da020000> 9823400a  808b000b  024ffffd  9 

-- 
Meelis Roos (mroos@linux.ee)


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-13  9:50     ` Meelis Roos
@ 2012-02-13  9:51       ` Meelis Roos
  2012-02-13 10:35       ` Meelis Roos
  1 sibling, 0 replies; 46+ messages in thread
From: Meelis Roos @ 2012-02-13  9:51 UTC (permalink / raw)
  To: Grant Likely; +Cc: Rob Herring, sparclinux, Linux Kernel list

> Another variation of the crash, without the patch, but backtrace is 
> slightly different (strlen) - maybe fixed by the patch, maybe not.

This variation is from a different machine; sorry for the 
confusion.

-- 
Meelis Roos (mroos@ut.ee)      http://www.cs.ut.ee/~mroos/


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-13  9:50     ` Meelis Roos
  2012-02-13  9:51       ` Meelis Roos
@ 2012-02-13 10:35       ` Meelis Roos
  1 sibling, 0 replies; 46+ messages in thread
From: Meelis Roos @ 2012-02-13 10:35 UTC (permalink / raw)
  To: Grant Likely; +Cc: Rob Herring, sparclinux, Linux Kernel list

> Another variation of the crash, without the patch, but backtrace is 
> slightly different (strlen) - maybe fixed by the patch, maybe not.

Tried this machine with the patch too; same backtrace ending in strlen. 
prtconf below.

> [   0.000000] Unable to handle kernel NULL pointer dereference
> [    0.000000] tsk->{mm,active_mm}->context = 0000000000000000
> [    0.000000] tsk->{mm,active_mm}->pgd = fffff800604ea3a8
> [    0.000000]               \|/ ____ \|/
> [    0.000000]               "@'/ .. \`@"
> [    0.000000]               /_| \__/ |_\
> [    0.000000]                  \__U_/
> [    0.000000] swapper(0): Oops [#1]
> [    0.000000] TSTATE: 0000004480e01606 TPC: 00000000005be460 TNPC: 00000000005be464 Y: 00000037    Not d
> [    0.000000] TPC: <strlen+0x60/0xd4>
> [    0.000000] g0: 000000000000002f g1: 0000000000000001 g2: 0000000000000000 g3: 000000000073a700
> [    0.000000] g4: 000000000085ea50 g5: 0000000000000000 g6: 0000000000854000 g7: 0030a80000000000
> [    0.000000] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000001010101 o3: 0000000080808080
> [    0.000000] o4: 0000000001010000 o5: fffff8006feae140 sp: 00000000008572c1 ret_pc: 0000000000655108
> [    0.000000] RPC: <of_alias_scan+0x68/0x200>
> [    0.000000] l0: 00000000008a4380 l1: fffff8006feae6b5 l2: fffff8006feae140 l3: fffff8006fe98e00
> [    0.000000] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 00000000008678d0
> [    0.000000] i0: 00000000008c3f24 i1: 0000000000896ca0 i2: 00000000008268c0 i3: 00000000008268b8
> [    0.000000] i4: 00000000008038c8 i5: fffff8006feae5c0 i6: 0000000000857381 i7: 00000000008c4314
> [    0.000000] I7: <of_pdt_build_devicetree+0x90/0xa0>
> [    0.000000] Call Trace:
> [    0.000000]  [00000000008c4314] of_pdt_build_devicetree+0x90/0xa0
> [    0.000000]  [00000000008b0330] prom_build_devicetree+0x10/0x3c
> [    0.000000]  [00000000008b3bb8] paging_init+0xa3c/0xde8
> [    0.000000]  [00000000008af978] setup_arch+0x324/0x688
> [    0.000000]  [00000000008ae4ec] start_kernel+0x80/0x338
> [    0.000000]  [0000000000715b30] tlb_fixup_done+0x88/0x90
> [    0.000000]  [0000000000000000]           (null)
> [    0.000000] Disabling lock debugging due to kernel taint
> [    0.000000] Caller[00000000008c4314]: of_pdt_build_devicetree+0x90/0xa0
> [    0.000000] Caller[00000000008b0330]: prom_build_devicetree+0x10/0x3c
> [    0.000000] Caller[00000000008b3bb8]: paging_init+0xa3c/0xde8
> [    0.000000] Caller[00000000008af978]: setup_arch+0x324/0x688
> [    0.000000] Caller[00000000008ae4ec]: start_kernel+0x80/0x338
> [    0.000000] Caller[0000000000715b30]: tlb_fixup_done+0x88/0x90
> [    0.000000] Caller[0000000000000000]:           (null)
> [    0.000000] Instruction DUMP: 96132080  19004040  94132101 <da020000> 9823400a  808b000b  024ffffd  9 

System Configuration:  Sun Microsystems  sun4u
Memory size: 1024 Megabytes
System Peripherals (PROM Nodes):

Node 0xf002a678
    .node:  f002a678
    idprom:  01830003.ba11b371.000003ba.11b37182.00000000.00000000.00000000.00000000
    scsi-initiator-id:  00000007
    reset-reason: 'S-POR'
    breakpoint-trap:  0000007f
    #size-cells:  00000002
    model: 'SUNW,375-3015'
    name: 'SUNW,UltraAX-i2'
    clock-frequency:  05f5e100
    banner-name: 'Sun Fire V100 (UltraSPARC-IIe 500MHz)'
    compatible: 'sun4u'
    device_type: 'upa'
    stick-frequency:  0054c563

    Node 0xf002d908
        .node:  f002d908
        name: 'packages'

        Node 0xf0035e4c
            .node:  f0035e4c
            iso6429-1983-colors:  
            name: 'terminal-emulator'

        Node 0xf0038e7c
            .node:  f0038e7c
            disk-write-fix:  
            name: 'deblocker'

        Node 0xf00395c4
            .node:  f00395c4
            name: 'obp-tftp'

        Node 0xf0044b08
            .node:  f0044b08
            name: 'disk-label'

        Node 0xf0059f74
            .node:  f0059f74
            name: 'SUNW,builtin-drivers'

        Node 0xf0062644
            .node:  f0062644
            source: '/pci@1f,0/isa@7/flashprom@1f,0:'
            name: 'dropins'

        Node 0xf00730e0
            .node:  f00730e0
            name: 'kbd-translator'

    Node 0xf002d978
        .node:  f002d978
        mmu:  fffe7ae0
        memory:  fffe7ce0
        bootargs:  00
        bootpath: '/pci@1f,0/ide@d/disk@2,0:a'
        stdout:  fffbd7b8
        stdin:  fffbda00
        stdout-#lines:  ffffffff
        name: 'chosen'

    Node 0xf002d9e4
        .node:  f002d9e4
        version: 'OBP 4.0.18 2002/05/23 18:22'
        model: 'SUNW,4.0'
        aligned-allocator:  
        relative-addressing:  
        name: 'openprom'

        Node 0xf002da74
            .node:  f002da74
            name: 'client-services'

    Node 0xf002db1c
        .node:  f002db1c
        ras-shutdown-enabled?: 'false'
        shutdown-temp: '75'
        warning-temp: '70'
        env-monitor: 'enabled'
        diag-passes: '1'
        diag-continue?: '0'
        diag-targets: '0'
        diag-verbosity: '0'
        keyboard-click?: 'false'
        keymap:  
        scsi-initiator-id: '7'
        #power-cycles: '100'
        system-board-serial#:  
        system-board-date:  
        ttyb-rts-dtr-off: 'false'
        ttyb-ignore-cd: 'true'
        ttya-rts-dtr-off: 'false'
        ttya-ignore-cd: 'true'
        ttyb-mode: '9600,8,n,1,-'
        ttya-mode: '9600,8,n,1,-'
        pci-probe-list: '7,3,c,5,a,d'
        mfg-mode: 'off'
        diag-level: 'max'
        fcode-debug?: 'false'
        output-device: 'ttya'
        input-device: 'ttya'
        load-base: '16384'
        auto-boot-retry?: 'false'
        boot-command: 'boot'
        auto-boot?: 'true'
        watchdog-reboot?: 'true'
        diag-file:  
        diag-device: 'disk'
        boot-file:  
        boot-device: 'disk net'
        local-mac-address?: 'false'
        net-timeout: '0'
        ansi-terminal?: 'true'
        screen-#columns: '80'
        screen-#rows: '34'
        silent-mode?: 'false'
        use-nvramrc?: 'false'
        nvramrc:  
        security-mode: 'none'
        security-password:  
        security-#badlogins: '0'
        oem-logo:  
        oem-logo?: 'false'
        oem-banner:  
        oem-banner?: 'false'
        hardware-revision:  
        last-hardware-update:  
        diag-switch?: 'true'
        name: 'options'

    Node 0xf002db8c
        .node:  f002db8c
        disk: '/pci@1f,0/ide@d/disk@2,0'
        rtc: '/pci@1f,0/isa@7/rtc@0,70'
        usb: '/pci@1f,0/usb@a'
        flash: '/pci@1f,0/isa@7/flashprom@1f,0'
        lom: '/pci@1f,0/isa@7/SUNW,lomh@0,8010'
        i2c-nvram: '/pci@1f,0/pmu@3/i2c@0,0/i2c-nvram@0,aa'
        net1: '/pci@1f,0/ethernet@5'
        dload1: '/pci@1f,0/ethernet@5:,'
        dload: '/pci@1f,0/ethernet@c:,'
        net0: '/pci@1f,0/ethernet@c'
        net: '/pci@1f,0/ethernet@c'
        cdrom: '/pci@1f,0/ide@d/cdrom@3,0:f'
        disk3: '/pci@1f,0/ide@d/disk@3,0'
        disk2: '/pci@1f,0/ide@d/disk@2,0'
        disk1: '/pci@1f,0/ide@d/disk@1,0'
        disk0: '/pci@1f,0/ide@d/disk@0,0'
        ide: '/pci@1f,0/ide@d'
        floppy: '/pci@1f,0/isa@7/dma/floppy'
        ttyb: '/pci@1f,0/isa@7/serial@0,2e8'
        ttya: '/pci@1f,0/isa@7/serial@0,3f8'
        name: 'aliases'

    Node 0xf0050050
        .node:  f0050050
        reg:  00000000.00000000.00000000.10000000.00000000.20000000.00000000.10000000.00000000.40000000.00000000.10000000.00000000.60000000.00000000.10000000
        available:  00000000.6fec0000.00000000.00006000.00000000.6fe80000.00000000.00030000.00000000.6f000000.00000000.00e00000.00000000.60000000.00000000.0effe000.00000000.40000000.00000000.10000000.00000000.20000000.00000000.10000000.00000000.00000000.00000000.10000000
        name: 'memory'

    Node 0xf0050634
        .node:  f0050634
        translations:  00000000.fffe0000.00000000.00010000.80000000.6fef00b6.00000000.fffdc000.00000000.00004000.80000000.6fee40b6.00000000.fffd4000.00000000.00004000.80000000.6fede0b6.00000000.fffd2000.00000000.00002000.800001fe.0200808e.00000000.fffd0000.00000000.00002000.80000000.6fed60b6.00000000.fffce000.00000000.00002000.800001fe.0200008e.00000000.fffcc000.00000000.00002000.800001fe.0200208e.00000000.fffca000.00000000.00002000.800001fe.0200408e.00000000.fffc8000.00000000.00002000.80000000.6effe0b6.00000000.fffc6000.00000000.00002000.80000000.6fed20b6.00000000.fffc4000.00000000.00002000.80000000.6fedc0b6.00000000.fffc2000.00000000.00002000.800001fe.0200008e.00000000.fffbc000.00000000.00004000.80000000.6fec80b6.00000000.fff82000.00000000.00010000.800001fe.0000008e.00000000.fff7e000.00000000.00004000.80000000.6fed80b6.00000000.f0000000.00000000.00100000.80000000.6ff000b6.00000000.40000000.00000000.04000000.80000000.60000036.00000000.00400000.00000000.01000000.80000000.60000036.00000000.00002000.00000000.003fe000.80000000.00002036
        existing:  00000000.00000000.00000800.00000000.fffff800.00000000.00000800.00000000
        available:  fffff800.00000000.000007fc.00000000.00000001.00000000.000007ff.00000000.00000000.ffff0000.00000000.0000e000.00000000.00000000.00000000.f0000000.00000000.fffc0000.00000000.00002000.00000000.fff92000.00000000.0002a000.00000000.fff00000.00000000.0007e000.00000000.f0f80000.00000000.0e080000.00000000.f0800000.00000000.00700000
        page-size:  00002000
        name: 'virtual-memory'

    Node 0xf0069d48
        .node:  f0069d48
        available:  81000000.00000000.00010230.00000000.00bffdd0.82000000.00000000.00004000.00000000.0003c000.82000000.00000000.000c0000.00000000.00f40000.82000000.00000000.02000000.00000000.5e000000.82000000.00000000.80000000.00000000.40000000.82000000.00000000.e0000000.00000000.10000000
        bus-range:  00000000.00000000
        interrupt-map:  00006800.00000000.00000000.00000001.f0069d48.0000000c.00005000.00000000.00000000.00000001.f0069d48.00000024.00006000.00000000.00000000.00000001.f0069d48.00000006.00002800.00000000.00000000.00000001.f0069d48.0000001c.00003800.00000000.00000000.00000004.f0069d48.0000002b.00003800.00000000.00000000.00000005.f0069d48.00000023.00003800.00000000.00000000.00000001.f0069d48.0000002a.00001800.00000000.00000000.00000001.f0069d48.00000022
        interrupt-map-mask:  00fff800.00000000.00000000.00000007
        #interrupt-cells:  00000001
        virtual-dma:  60000000.20000000
        reg:  000001fe.00000000.00000000.00010000.000001fe.01000000.00000000.00000100
        ranges:  00000000.00000000.00000000.000001fe.01000000.00000000.01000000.01000000.00000000.00000000.000001fe.02000000.00000000.01000000.02000000.00000000.00000000.000001ff.00000000.00000001.00000000.03000000.00000000.00000000.000001ff.00000000.00000001.00000000
        #virtual-dma-size-cells:  00000001
        #virtual-dma-addr-cells:  00000001
        clock-frequency:  03ef1480
        latency-timer:  
        button-interrupt:  
        no-streaming-cache:  
        66mhz-capable:  
        interrupts:  00000030.0000002e.0000002f.00000025
        upa-portid:  0000001f
        bus-parity-generated:  
        compatible: 'pci108e,a001'
        model: 'SUNW,sabre'
        name: 'pci'
        device_type: 'pci'
        #address-cells:  00000003
        #size-cells:  00000002

        Node 0xf0073e2c
            .node:  f0073e2c
            cache-line-size:  00000000
            latency-timer:  00000000
            #size-cells:  00000001
            #address-cells:  00000002
            name: 'isa'
            ranges:  00000000.00000000.81003810.00000000.00000000.00010000.0000001f.00000000.82003814.00000000.f0000000.00080000
            reg:  00003800.00000000.00000000.00000000.00000000.81003810.00000000.00000000.00000000.00010000.82003814.00000000.00000000.00000000.00100000
            devsel-speed:  00000001
            class-code:  00060100
            max-latency:  00000000
            min-grant:  00000000
            subsystem-id:  00001533
            subsystem-vendor-id:  000010b9
            revision-id:  00000000
            device-id:  00001533
            vendor-id:  000010b9

            Node 0xf00749f4
                .node:  f00749f4
                reg:  00000000.00000000.00010000
                interrupts:  00000001
                compatible: 'isadma'
                name: 'dma'

            Node 0xf0074ccc
                .node:  f0074ccc
                address:  fffce070
                reg:  00000000.00000070.00000002
                compatible: 'm5819'
                model: 'm5819'
                name: 'rtc'

                Node 0xf009cac4
                    .node:  f009cac4
                    device_type: 'tod'
                    name: 'todm5819'

            Node 0xf007583c
                .node:  f007583c
                compatible: 'acpi-power'
                button:  
                interrupts:  00000005
                reg:  00000000.00002000.00000008
                name: 'power'

            Node 0xf00759d0
                .node:  f00759d0
                reg:  00000000.00008010.00000002
                interrupts:  00000001
                device_type: 'block'
                name: 'SUNW,lomh'

            Node 0xf0076e0c
                .node:  f0076e0c
                port-a-ignore-cd:  
                nohupcl:  00
                interrupt-priorities:  0000000c.0000000c
                reg:  00000000.000003f8.00000008
                compatible:  73753136.35353000.737500
                device_type: 'serial'
                name: 'serial'
                interrupts:  00000004

            Node 0xf0078af8
                .node:  f0078af8
                port-b-ignore-cd:  
                nohupcl:  00
                interrupt-priorities:  0000000c.0000000c
                reg:  00000000.000002e8.00000008
                compatible:  73753136.35353000.737500
                device_type: 'serial'
                name: 'serial'
                interrupts:  00000004

            Node 0xf007ac10
                .node:  f007ac10
                model: 'SUNW,258-7883'
                version: 'CORE 1.0.18 2002/05/23 18:22'
                name: 'flashprom'
                reg:  0000001f.00000000.00080000

        Node 0xf007b6bc
            .node:  f007b6bc
            name: 'pmu'
            ranges:  00000000.00000000.00001800.00000000.00000000.00000100.00000001.00000000.81001810.00000000.00004000.00000100.00000002.00000000.81001814.00000000.00000000.00000100
            reg:  00001800.00000000.00000000.00000000.00000000.81001810.00000000.00004000.00000000.00000010
            compatible:  70636931.3062392c.37313031.00706369.636c6173.732c3030.30303030.00
            #address-cells:  00000002
            #size-cells:  00000001
            devsel-speed:  00000001
            class-code:  00000000
            max-latency:  00000000
            min-grant:  00000000
            revision-id:  00000000
            device-id:  00007101
            vendor-id:  000010b9

            Node 0xf007be84
                .node:  f007be84
                reg:  00000000.00000000.00000100.00000001.00000000.00000100
                #address-cells:  00000002
                #size-cells:  00000000
                interrupts:  00000001
                compatible: 'i2c-smbus'
                name: 'i2c'

                Node 0xf007d31c
                    .node:  f007d31c
                    compatible: 'i2c-max1617'
                    name: 'temperature'
                    reg:  00000000.00000030

                Node 0xf007d48c
                    .node:  f007d48c
                    compatible: 'i2c-at34c02'
                    name: 'dimm'
                    reg:  00000000.000000a8

                Node 0xf007d544
                    .node:  f007d544
                    compatible: 'i2c-at34c02'
                    name: 'dimm'
                    reg:  00000000.000000aa

                Node 0xf007d5fc
                    .node:  f007d5fc
                    compatible: 'i2c-at34c02'
                    name: 'dimm'
                    reg:  00000000.000000ac

                Node 0xf007d6b4
                    .node:  f007d6b4
                    compatible: 'i2c-at34c02'
                    name: 'dimm'
                    reg:  00000000.000000ae

                Node 0xf007d76c
                    .node:  f007d76c
                    reg:  00000000.000000a0
                    #address-cells:  00000001
                    compatible: 'i2c-at24c64'
                    device_type: 'nvram'
                    name: 'i2c-nvram'

                    Node 0xf007e284
                        .node:  f007e284
                        reg:  00001fd8.00000028
                        device_type: 'idprom'
                        name: 'idprom'

                Node 0xf007e538
                    .node:  f007e538
                    reg:  00000000.000000a2
                    #address-cells:  00000001
                    compatible: 'i2c-at24c64'
                    name: 'motherboard-fru'

            Node 0xf007f0d0
                .node:  f007f0d0
                compatible: 'SUNW,smbus-ppm'
                name: 'ppm'
                register-mask:  00000000.00000001
                reg:  00000000.000000b3.00000001.80000000.000000ba.00000001.00000000.000000bb.00000001

            Node 0xf007f344
                .node:  f007f344
                compatible: 'SUNW,smbus-beep'
                name: 'beep'
                reg:  00000000.000000b2.00000001.00000000.000000d3.00000001.00000002.00000042.00000002.00000002.00000061.00000001

            Node 0xf007f45c
                .node:  f007f45c
                compatible: 'SUNW,smbus-fan-control'
                name: 'fan-control'
                register-mask:  00000000.00000002
                reg:  00000000.000000c8.00000004.80000000.000000ba.00000001

        Node 0xf007f660
            .node:  f007f660
            name: 'lomp'
            reg:  00001800.00000000.00000000.00000000.00000000.81001810.00004000.00000000.00000000.00000010

        Node 0xf007fae8
            .node:  f007fae8
            local-mac-address:  0003ba11.b371
            assigned-addresses:  81006010.00000000.00010000.00000000.00000100.82006014.00000000.00000000.00000000.00002000.82006030.00000000.00040000.00000000.00040000
            version: '1.0'
            compatible:  70636934.3535342c.34333465.00706369.31323868.2c393130.32007063.69313238.322c3931.30320070.6369636c.6173732c.30323030.303000
            device_type: 'network'
            subsystem-id:  0000434e
            subsystem-vendor-id:  00004554
            reg:  00006000.00000000.00000000.00000000.00000000.01006010.00000000.00000000.00000000.00000100.02006014.00000000.00000000.00000000.00000100
            name: 'ethernet'
            devsel-speed:  00000001
            class-code:  00020000
            interrupts:  00000001
            max-latency:  00000028
            min-grant:  00000014
            revision-id:  00000031
            device-id:  00009102
            vendor-id:  00001282

        Node 0xf0089634
            .node:  f0089634
            local-mac-address:  0003ba11.b372
            assigned-addresses:  81002810.00000000.00010100.00000000.00000100.82002814.00000000.00002000.00000000.00002000.82002830.00000000.00080000.00000000.00040000
            version: '1.0'
            compatible:  70636934.3535342c.34333465.00706369.31323868.2c393130.32007063.69313238.322c3931.30320070.6369636c.6173732c.30323030.303000
            device_type: 'network'
            subsystem-id:  0000434e
            subsystem-vendor-id:  00004554
            reg:  00002800.00000000.00000000.00000000.00000000.01002810.00000000.00000000.00000000.00000100.02002814.00000000.00000000.00000000.00000100
            name: 'ethernet'
            devsel-speed:  00000001
            class-code:  00020000
            interrupts:  00000001
            max-latency:  00000028
            min-grant:  00000014
            revision-id:  00000031
            device-id:  00009102
            vendor-id:  00001282

        Node 0xf0093180
            .node:  f0093180
            assigned-addresses:  82005010.00000000.01000000.00000000.01000000
            sunw,find-fcode:  f009838c
            maximum-frame#:  0000ffff
            reg:  00005000.00000000.00000000.00000000.00000000.02005010.00000000.00000000.00000000.01000000
            #size-cells:  00000000
            #address-cells:  00000001
            compatible:  70636931.3062392c.35323337.2e330070.63693130.62392c35.32333700.70636963.6c617373.2c306330.33313000.70636963.6c617373.2c306330.3300
            name: 'usb'
            fast-back-to-back:  
            devsel-speed:  00000001
            class-code:  000c0310
            interrupts:  00000001
            max-latency:  00000050
            min-grant:  00000000
            revision-id:  00000003
            device-id:  00005237
            vendor-id:  000010b9

        Node 0xf0098ff8
            .node:  f0098ff8
            assigned-addresses:  81006810.00000000.00010200.00000000.00000008.81006814.00000000.00010218.00000000.00000008.81006818.00000000.00010210.00000000.00000008.8100681c.00000000.00010208.00000000.00000008.81006820.00000000.00010220.00000000.00000010
            reg:  00006800.00000000.00000000.00000000.00000000.01006810.00000000.00000000.00000000.00000008.01006814.00000000.00000000.00000000.00000004.01006818.00000000.00000000.00000000.00000008.0100681c.00000000.00000000.00000000.00000004.01006820.00000000.00000000.00000000.00000010
            compatible:  70636931.3062392c.35323239.00706369.636c6173.732c3031.30316666.00
            #address-cells:  00000002
            device_type: 'ide'
            name: 'ide'
            fast-back-to-back:  
            devsel-speed:  00000001
            class-code:  000101ff
            interrupts:  00000001
            max-latency:  00000004
            min-grant:  00000002
            revision-id:  000000c3
            device-id:  00005229
            vendor-id:  000010b9

            Node 0xf009b86c
                .node:  f009b86c
                device_type: 'block'
                name: 'disk'
                compatible: 'ide-disk'

            Node 0xf009bf18
                .node:  f009bf18
                device_type: 'block'
                name: 'cdrom'
                compatible: 'ide-cdrom'

    Node 0xf0072d50
        .node:  f0072d50
        manufacturer#:  00000017
        implementation#:  00000013
        mask#:  00000014
        ecache-size:  00040000
        clock-frequency:  1dcd6500
        name: 'SUNW,UltraSPARC-IIe'
        sparc-version:  00000009
        ecache-associativity:  00000001
        ecache-line-size:  00000040
        #dtlb-entries:  00000040
        dcache-associativity:  00000001
        dcache-line-size:  00000020
        dcache-size:  00004000
        #itlb-entries:  00000040
        icache-associativity:  00000002
        icache-line-size:  00000020
        icache-size:  00004000
        upa-portid:  00000000
        reg:  000001c0.00000000.00000000.00000008
        device_type: 'cpu'


-- 
Meelis Roos (mroos@ut.ee)      http://www.cs.ut.ee/~mroos/

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-13  9:20     ` Meelis Roos
@ 2012-02-13 21:46       ` Grant Likely
  2012-02-14  0:58         ` David Miller
  2012-02-16 19:53         ` Meelis Roos
  0 siblings, 2 replies; 46+ messages in thread
From: Grant Likely @ 2012-02-13 21:46 UTC (permalink / raw)
  To: Meelis Roos; +Cc: Rob Herring, sparclinux, Linux Kernel list

On Mon, Feb 13, 2012 at 11:20:36AM +0200, Meelis Roos wrote:
> > Try the following patch.  I suspect the new of_alias_scan() isn't careful
> > enough about which properties it dereferences:
> > 
> > ---
> > 
> > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > index 133908a..9188caa 100644
> > --- a/drivers/of/base.c
> > +++ b/drivers/of/base.c
> > @@ -1174,6 +1174,10 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
> >  		    !strcmp(pp->name, "linux,phandle"))
> >  			continue;
> >  
> > +		/* Check for null value or non-strings (no null termination) */
> > +		if (!pp->value || strnlen(pp->value, pp->length) == pp->length)
> > +			continue;
> > +
> >  		np = of_find_node_by_path(pp->value);
> >  		if (!np)
> >  			continue;
> > 
> 
> Yes, it probably gets past this problem but oopses in a different place:
> 
> [    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 3.2.30 2002/10/25 14:03'
> [    0.000000] PROMLIB: Root node compatible: 
> [    0.000000] Initializing cgroup subsys cpu
> [    0.000000] Linux version 3.3.0-rc3-00188-g3ec1e88-dirty (mroos@korvits) (gcc version 4.6.2 (Debian 42
> [    0.000000] debug: ignoring loglevel setting.
> [    0.000000] bootconsole [earlyprom0] enabled
> [    0.000000] ARCH: SUN4U
> [    0.000000] Ethernet address: 08:00:20:b6:ee:e2
> [    0.000000] Kernel: Using 4 locked TLB entries for main kernel image.
> [    0.000000] Remapping the kernel... done.
> [    0.000000] Unable to handle kernel NULL pointer dereference
> [    0.000000] tsk->{mm,active_mm}->context = 0000000000000000
> [    0.000000] tsk->{mm,active_mm}->pgd = fffff800008c77d0
> [    0.000000]               \|/ ____ \|/
> [    0.000000]               "@'/ .. \`@"
> [    0.000000]               /_| \__/ |_\
> [    0.000000]                  \__U_/
> [    0.000000] swapper(0): Oops [#1]
> [    0.000000] TSTATE: 0000000080e01606 TPC: 0000000000645810 TNPC: 0000000000645814 Y: 00000037    Not d
> [    0.000000] TPC: <of_find_node_by_phandle+0x30/0x60>

Ugh; that looks bad.  If it failed there, then the global device node list
is corrupted.  I hate to ask you this, but would you be able to git bisect to
narrow down the commit that causes the problem?
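
For reference, the guard added by the test patch quoted above can be tried
outside the kernel. This is only a user-space sketch: struct prop and
prop_is_string() are made-up names standing in for the kernel's property
structure, and memchr() is used in place of the patch's strnlen() comparison,
which should reject the same values (missing, empty, or not NUL-terminated
within the declared length).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct property. */
struct prop {
	const char *name;
	int length;         /* size of value in bytes */
	const void *value;  /* NULL for properties that carry no data */
};

/*
 * Return true only if the value exists and has a NUL byte inside its
 * declared length, i.e. it can safely be handed to string functions.
 */
bool prop_is_string(const struct prop *pp)
{
	if (!pp->value || pp->length <= 0)
		return false;
	return memchr(pp->value, '\0', (size_t)pp->length) != NULL;
}
```

A caller walking the aliases node would skip any property for which this
returns false before passing the value on as a path.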

g.

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-13 21:46       ` Grant Likely
@ 2012-02-14  0:58         ` David Miller
  2012-02-14  2:30           ` Grant Likely
  2012-02-14  5:54           ` mroos
  2012-02-16 19:53         ` Meelis Roos
  1 sibling, 2 replies; 46+ messages in thread
From: David Miller @ 2012-02-14  0:58 UTC (permalink / raw)
  To: grant.likely; +Cc: mroos, rob.herring, sparclinux, linux-kernel

From: Grant Likely <grant.likely@secretlab.ca>
Date: Mon, 13 Feb 2012 14:46:23 -0700

> Ugh; that looks bad.  If it failed there, then the global device node list
> is corrupted.  I hate to ask you this, but would you be able to git bisect to
> narrow down the commit that causes the problem?

Wild guess on all of these bugs: bad OF node reference counting, and an
OF node gets freed prematurely.

If you look at the sparc code that has been subsumed into the generic
drivers/of/ stuff over the past few years, you'll see that we never
consistently did any of the reference counting bits on the sparc side.

I never did it, because I don't anticipate ever having hot-plug
support for OF nodes.

Anyways, if you now start to mix the drivers/of/ stuff which
religiously does the reference counting with of_node_{get,put}()
with the remaining scraps of sparc code that doesn't... it might
not be pretty.

In the crash dump after your test patch, we are in
of_find_node_by_phandle() with a 'np' pointer in the allnodes list
equal to 0x50.

The signature in the original crash dump is identical, except
that time we were in of_find_node_by_path(), but again the 'np'
pointer was 0x50.

Something else that might be suspicious were the memblock changes
that happened this release cycle, so I wouldn't be surprised if
a bisect turned up something in there.

FWIW I've been running current kernels on my niagara boxes without
incident for several weeks.

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-14  0:58         ` David Miller
@ 2012-02-14  2:30           ` Grant Likely
  2012-02-14  2:41             ` Grant Likely
  2012-02-16 21:08             ` mroos
  2012-02-14  5:54           ` mroos
  1 sibling, 2 replies; 46+ messages in thread
From: Grant Likely @ 2012-02-14  2:30 UTC (permalink / raw)
  To: David Miller; +Cc: mroos, rob.herring, sparclinux, linux-kernel

On Mon, Feb 13, 2012 at 5:58 PM, David Miller <davem@davemloft.net> wrote:
> From: Grant Likely <grant.likely@secretlab.ca>
> Date: Mon, 13 Feb 2012 14:46:23 -0700
>
>> Ugh; that looks bad.  If it failed there, then the global device node list
>> is corrupted.  I hate to ask you this, but would you be able to git bisect to
>> narrow down the commit that causes the problem?
>
> Wild guess on all of these bugs: bad OF node reference counting, and an
> OF node gets freed prematurely.
>
> If you look at the sparc code that has been subsumed into the generic
> drivers/of/ stuff over the past few years, you'll see that we never
> consistently did any of the reference counting bits on the sparc side.

Hmmm.... The of_node_put() code path shouldn't exist on sparc.  You'll
see that it is #ifdef'd out in include/linux/of.h.  Plus, only
'OF_DETACHED' nodes are allowed to be released, and there are only 3
code paths (all calling of_detach_node()) specific to powerpc that can
detach a node.
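
To illustrate what "#ifdef'd out" means here, a rough user-space sketch
follows. It is not the contents of include/linux/of.h: struct rc_node,
node_get(), node_put() and the NODE_REFCOUNT_STUBS macro are all invented for
the example. The point is only that with the stubs selected, get/put compile
to nothing, so no node can be released through that path.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct device_node. */
struct rc_node {
	int refcount;
};

/* Illustrative switch only; this is not a real kernel config option. */
#ifndef NODE_REFCOUNT_STUBS
#define NODE_REFCOUNT_STUBS 1
#endif

#if NODE_REFCOUNT_STUBS
/* Stubbed variant: the count is never touched, nothing is ever freed. */
static inline struct rc_node *node_get(struct rc_node *np) { return np; }
static inline void node_put(struct rc_node *np) { (void)np; }
#else
/* Counting variant: a real implementation would release the node at zero. */
static inline struct rc_node *node_get(struct rc_node *np)
{
	if (np)
		np->refcount++;
	return np;
}
static inline void node_put(struct rc_node *np)
{
	if (np)
		np->refcount--;
}
#endif
```

With the default above, unbalanced node_put() calls leave the count where it
was, which is why a missing get on the sparc side alone should not free a node.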

> I never did it, because I don't anticipate ever having hot-plug
> support for OF nodes.
>
> Anyways, if you now start to mix the drivers/of/ stuff which
> religiously does the reference counting with of_node_{get,put}()
> with the remaining scraps of sparc code that doesn't... it might
> not be pretty.
>
> In the crash dump after your test patch, we are in
> of_find_node_by_phandle() with a 'np' pointer in the allnodes list
> equal to 0x50.

Definitely not right!  It would be interesting to add a printk() to
of_find_node_by_phandle() or of_find_node_by_path() to blast out the
node names as it traverses the tree.  That could help track down
corruption.
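
Something along these lines, sketched in user space so it can be tried without
booting. struct dt_node is a cut-down, hypothetical stand-in for the kernel's
device node (only a name and the next pointer of the global list), and
find_node_by_path_traced() is an invented name; in the kernel the fprintf()
would be a printk().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the kernel's struct device_node. */
struct dt_node {
	const char *full_name;
	struct dt_node *allnext;  /* next entry in the global all-nodes list */
};

/*
 * Walk the list looking for path and print every name visited, so the last
 * line printed before a crash points at the last intact node.
 * Returns the matching node or NULL; *visited gets the number of nodes seen.
 */
struct dt_node *find_node_by_path_traced(struct dt_node *head,
					 const char *path, int *visited)
{
	struct dt_node *np;
	int n = 0;

	for (np = head; np; np = np->allnext) {
		n++;
		fprintf(stderr, "find_node_by_path: %s\n",
			np->full_name ? np->full_name : "(null)");
		if (np->full_name && strcmp(np->full_name, path) == 0)
			break;
	}
	if (visited)
		*visited = n;
	return np;
}
```

If the list is corrupted, the names printed just before the oops should turn
to garbage or the walk should stop at a bogus pointer such as the 0x50 above.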

>
> The signature in the original crash dump is identical, except
> that time we were in of_find_node_by_path(), but again the 'np'
> pointer was 0x50.
>
> Something else that might be suspicious were the memblock changes
> that happened this release cycle, so I wouldn't be surprised if
> a bisect turned up something in there.
>
> FWIW I've been running current kernels on my niagara boxes without
> incident for several weeks.

-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-14  2:30           ` Grant Likely
@ 2012-02-14  2:41             ` Grant Likely
  2012-02-16 21:08             ` mroos
  1 sibling, 0 replies; 46+ messages in thread
From: Grant Likely @ 2012-02-14  2:41 UTC (permalink / raw)
  To: David Miller; +Cc: mroos, rob.herring, sparclinux, linux-kernel

On Mon, Feb 13, 2012 at 7:30 PM, Grant Likely <grant.likely@secretlab.ca> wrote:
> On Mon, Feb 13, 2012 at 5:58 PM, David Miller <davem@davemloft.net> wrote:
>> From: Grant Likely <grant.likely@secretlab.ca>
>> Date: Mon, 13 Feb 2012 14:46:23 -0700
>>
>>> Ugh; that looks bad.  If it failed there, then the global device node list
>>> is corrupted.  I hate to ask you this, but would you be able to git bisect to
>>> narrow down the commit that causes the problem?
>>
>> Wild guess on all of these bugs: bad OF node reference counting, and an
>> OF node gets freed prematurely.
>>
>> If you look at the sparc code that has been subsumed into the generic
>> drivers/of/ stuff over the past few years, you'll see that we never
>> consistently did any of the reference counting bits on the sparc side.
>
> Hmmm.... The of_node_put() code path shouldn't exist on sparc.  You'll
> see that it is #ifdef'd out in include/linux/of.h.  Plus, only
> 'OF_DETACHED' nodes are allowed to be released, and there are only 3
> code paths (all calling of_detach_node()) specific to powerpc that can
> detach a node.

In fact, I should disable those paths always when CONFIG_OF_DYNAMIC is
disabled.  I'll look into doing so for v3.4.

g.

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-14  0:58         ` David Miller
  2012-02-14  2:30           ` Grant Likely
@ 2012-02-14  5:54           ` mroos
  1 sibling, 0 replies; 46+ messages in thread
From: mroos @ 2012-02-14  5:54 UTC (permalink / raw)
  To: David Miller; +Cc: grant.likely, rob.herring, sparclinux, Linux Kernel list

> FWIW I've been running current kernels on my niagara boxes without
> incident for several weeks.

It runs for me on Ultra 1, Ultra 5 IDE, Ultra 10 SCSI and Blade 100. 
Fails on E3500, V100 and Netra X1 so it's probably dependent on 
something in the device tree.

I will try bisecting and the suggested printk's but it takes time since 
I will be away from computers most of today.

-- 
Meelis Roos (mroos@linux.ee)

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-13 21:46       ` Grant Likely
  2012-02-14  0:58         ` David Miller
@ 2012-02-16 19:53         ` Meelis Roos
  2012-02-16 21:23           ` Sam Ravnborg
  2012-02-20  9:11           ` Meelis Roos
  1 sibling, 2 replies; 46+ messages in thread
From: Meelis Roos @ 2012-02-16 19:53 UTC (permalink / raw)
  To: Grant Likely; +Cc: Rob Herring, sparclinux, Linux Kernel list

> Ugh; that looks bad.  If it failed there, then the global device node list
> is corrupted.  I hate to ask you this, but would you be able to git bisect to
> narrow down the commit that causes the problem?

Finished bisecting on E3500 (the original machine where I found the 
problem). Bisecting leads to
[0ee332c1451869963626bf9cac88f165a90990e1] memblock: Kill early_node_map[]
So yes, it looks like memblock.

-- 
Meelis Roos (mroos@linux.ee)

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-14  2:30           ` Grant Likely
  2012-02-14  2:41             ` Grant Likely
@ 2012-02-16 21:08             ` mroos
  1 sibling, 0 replies; 46+ messages in thread
From: mroos @ 2012-02-16 21:08 UTC (permalink / raw)
  To: Grant Likely; +Cc: David Miller, rob.herring, sparclinux, linux-kernel

> Definitely not right!  It would be interesting to add a printk() to
> of_find_node_by_phandle() or of_find_node_by_path() to blast out the
> node names as it traverses the tree.  That could help track down
> corruption.

[    0.000000] of_find_node_by_path: /chosen
[    0.000000] of_find_node_by_path: /aliases                      ¥_6䥷~ê7\eý+õï*¢ꢏñ?¿sM       ý{
aliases000000] ò7find_node_by_path: ðÑÔ_Bÿ
[    0.000000] Unable to handle kernel NULL pointer dereference

-- 
Meelis Roos (mroos@linux.ee)

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-16 19:53         ` Meelis Roos
@ 2012-02-16 21:23           ` Sam Ravnborg
  2012-02-20  9:11           ` Meelis Roos
  1 sibling, 0 replies; 46+ messages in thread
From: Sam Ravnborg @ 2012-02-16 21:23 UTC (permalink / raw)
  To: Meelis Roos, Tejun Heo
  Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list

On Thu, Feb 16, 2012 at 09:53:14PM +0200, Meelis Roos wrote:
> > Ugh; that looks bad.  If it failed there, then the global device node list
> > is corrupted.  I hate to ask you this, but would you be able to git bisect to
> > narrow down the commit that causes the problem?
> 
> Finished bisecting on E3500 (the original machine where I found the 
> problem). Bisecting leads to
> [0ee332c1451869963626bf9cac88f165a90990e1] memblock: Kill early_node_map[]
> So yes, it looks like memblock.

Added Tejun.

	Sam

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-16 19:53         ` Meelis Roos
  2012-02-16 21:23           ` Sam Ravnborg
@ 2012-02-20  9:11           ` Meelis Roos
  2012-02-20 17:06             ` Tejun Heo
  1 sibling, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-20  9:11 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list

> So yes, it looks like memblock.

Finished bisecting on the other machine too (Sun Fire V100 where strlen 
crashes):

7bd0b0f0da3b1ec11cbcc798eb0ef747a1184077 is the first bad commit
commit 7bd0b0f0da3b1ec11cbcc798eb0ef747a1184077
Author: Tejun Heo <tj@kernel.org>
Date:   Thu Dec 8 10:22:09 2011 -0800

    memblock: Reimplement memblock allocation using reverse free area iterator
    
    Now that all early memory information is in memblock when enabled, we
    can implement reverse free area iterator and use it to implement NUMA
    aware allocator which is then wrapped for simpler variants instead of
    the confusing and inefficient mending of information in separate NUMA
    aware allocator.
    
    Implement for_each_free_mem_range_reverse(), use it to reimplement
    memblock_find_in_range_node() which in turn is used by all allocators.
    
    The visible allocator interface is inconsistent and can probably use
    some cleanup too.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Yinghai Lu <yinghai@kernel.org>

:040000 040000 f74f55a80162a0a1a45c135ca62a51b9af824d53 a2dc2bccf4a30ee516709d0fdcb33faae11059ff M      include
:040000 040000 e4c4292fe66c4d8d6aa89710ce9f538fbf550ae8 5677586fad018ae9978d53084ba5d617fe231a3d M      mm

-- 
Meelis Roos (mroos@linux.ee)

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-20  9:11           ` Meelis Roos
@ 2012-02-20 17:06             ` Tejun Heo
  2012-02-20 20:04               ` Meelis Roos
  2012-02-20 22:32               ` Meelis Roos
  0 siblings, 2 replies; 46+ messages in thread
From: Tejun Heo @ 2012-02-20 17:06 UTC (permalink / raw)
  To: Meelis Roos; +Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam

Hello, Meelis, Sam.

Sorry about the delay.  I've been pretty swamped lately.

On Mon, Feb 20, 2012 at 11:11:05AM +0200, Meelis Roos wrote:
> Finished bisecting on the other machine too (Sun Fire V100 where strlen 
> crashes):
> 
> 7bd0b0f0da3b1ec11cbcc798eb0ef747a1184077 is the first bad commit
> commit 7bd0b0f0da3b1ec11cbcc798eb0ef747a1184077
> Author: Tejun Heo <tj@kernel.org>
> Date:   Thu Dec 8 10:22:09 2011 -0800
> 
>     memblock: Reimplement memblock allocation using reverse free area iterator
>     
>     Now that all early memory information is in memblock when enabled, we
>     can implement reverse free area iterator and use it to implement NUMA
>     aware allocator which is then wrapped for simpler variants instead of
>     the confusing and inefficient mending of information in separate NUMA
>     aware allocator.
>     
>     Implement for_each_free_mem_range_reverse(), use it to reimplement
>     memblock_find_in_range_node() which in turn is used by all allocators.
>     
>     The visible allocator interface is inconsistent and can probably use
>     some cleanup too.
>     
>     Signed-off-by: Tejun Heo <tj@kernel.org>
>     Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>     Cc: Yinghai Lu <yinghai@kernel.org>

Hmmm.... So, different bisection results from two machines?  That's a
bit weird.  I *think* this bisection result makes more sense.  Can you
please verify the bisection result on E3500 once more?

Thanks.

--
tejun

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-20 17:06             ` Tejun Heo
@ 2012-02-20 20:04               ` Meelis Roos
  2012-02-20 21:01                 ` Tejun Heo
  2012-02-20 22:32               ` Meelis Roos
  1 sibling, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-20 20:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam

> Hmmm.... So, different bisection results from two machines?  That's a
> bit weird.  I *think* this bisection result makes more sense.  Can you
> please verify the bisection result on E3500 once more?

Will do.

-- 
Meelis Roos (mroos@linux.ee)

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-20 20:04               ` Meelis Roos
@ 2012-02-20 21:01                 ` Tejun Heo
  0 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2012-02-20 21:01 UTC (permalink / raw)
  To: Meelis Roos; +Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam

Hello,

On Mon, Feb 20, 2012 at 10:04:10PM +0200, Meelis Roos wrote:
> > Hmmm.... So, different bisection results from two machines?  That's a
> > bit weird.  I *think* this bisection result makes more sense.  Can you
> > please verify the bisection result on E3500 once more?
> 
> Will do.

Thanks a lot.  I'm *suspecting* that somehow memory used to back the
device tree is not fully reserved and the change in allocation logic
is giving it out as part of an allocation.  I'll look through the change
more and see if I can spot a bug in the new code but I guess we'll
probably have to print out some pointer values to find out the
offending address.
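
The suspicion above amounts to asking whether every byte that backs the
device tree is covered by some entry in the reserved array.  A minimal
userspace sketch of such a check follows.  `struct region` and
`range_fully_reserved()` are hypothetical names used for illustration only,
not memblock's API, and the array is assumed to be sorted by base address
with no overlapping entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of a reserved-region array. */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if [base, base + size) is completely covered by the
 * regions in r[0..n).  Assumes r[] is sorted by base and that entries
 * do not overlap.
 */
static bool range_fully_reserved(const struct region *r, size_t n,
				 uint64_t base, uint64_t size)
{
	uint64_t pos = base;		/* first byte not yet proven covered */
	uint64_t end = base + size;

	assert(r != NULL || n == 0);

	for (size_t i = 0; i < n && pos < end; i++) {
		uint64_t rb = r[i].base;
		uint64_t re = rb + r[i].size;

		if (re <= pos)
			continue;	/* region lies entirely below pos */
		if (rb > pos)
			return false;	/* uncovered gap starts at pos */
		pos = re;		/* covered up to the end of this region */
	}
	return pos >= end;
}
```

Any range for which this returns false is memory that a later allocation
could legitimately be handed, which is the failure mode being suspected.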

Thanks.

-- 
tejun

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-20 17:06             ` Tejun Heo
  2012-02-20 20:04               ` Meelis Roos
@ 2012-02-20 22:32               ` Meelis Roos
  2012-02-21  1:05                 ` Tejun Heo
  1 sibling, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-20 22:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam

> On Mon, Feb 20, 2012 at 11:11:05AM +0200, Meelis Roos wrote:
> > Finished bisecting on the other machine too (Sun Fire V100 where strlen 
> > crashes):
> > 
> > 7bd0b0f0da3b1ec11cbcc798eb0ef747a1184077 is the first bad commit
> > commit 7bd0b0f0da3b1ec11cbcc798eb0ef747a1184077
> > Author: Tejun Heo <tj@kernel.org>
> > Date:   Thu Dec 8 10:22:09 2011 -0800
> > 
> >     memblock: Reimplement memblock allocation using reverse free area iterator
> >     
> >     Now that all early memory information is in memblock when enabled, we
> >     can implement reverse free area iterator and use it to implement NUMA
> >     aware allocator which is then wrapped for simpler variants instead of
> >     the confusing and inefficient mending of information in separate NUMA
> >     aware allocator.
> >     
> >     Implement for_each_free_mem_range_reverse(), use it to reimplement
> >     memblock_find_in_range_node() which in turn is used by all allocators.
> >     
> >     The visible allocator interface is inconsistent and can probably use
> >     some cleanup too.
> >     
> >     Signed-off-by: Tejun Heo <tj@kernel.org>
> >     Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >     Cc: Yinghai Lu <yinghai@kernel.org>
> 
> Hmmm.... So, different bisection results from two machines?  That's a
> bit weird.  I *think* this bisection result makes more sense.  Can you
> please verify the bisection result on E3500 once more?

You were right. The first machine now bisects down to the same commit - 
I was confused by "0 revisions to test" and did not run the last step 
when first bisecting.

-- 
Meelis Roos (mroos@linux.ee)

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-20 22:32               ` Meelis Roos
@ 2012-02-21  1:05                 ` Tejun Heo
  2012-02-22  0:36                   ` Meelis Roos
  2012-02-22 17:03                   ` Sam Ravnborg
  0 siblings, 2 replies; 46+ messages in thread
From: Tejun Heo @ 2012-02-21  1:05 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam,
	David S. Miller

Hello,

Meelis, can you please apply the following patch before & after the
offending commit, boot with "memblock=debug" added as kernel param and
post the boot log?  The patch will generate some offset warnings after
the commit but should work fine.

Sam, David, as I'm not familiar with the code base, is it possible to
tell which address is corrupted (zeroed, it seems)?  ie. can we add
"if (XXX == NULL) printk("%p is corrupted\n"...);" somewhere?

Thanks.

diff --git a/mm/memblock.c b/mm/memblock.c
index 1adbef0..dccfced 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -179,9 +179,15 @@ int __init_memblock memblock_reserve_reserved_regions(void)
 
 static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
 {
-	type->total_size -= type->regions[r].size;
-	memmove(&type->regions[r], &type->regions[r + 1],
-		(type->cnt - (r + 1)) * sizeof(type->regions[r]));
+	struct memblock_region *rgn = &type->regions[r];
+
+	memblock_dbg("     memblock %s: rm  [%#016llx-%#016llx] node %d\n",
+		     memblock_type_name(type),
+		     (unsigned long long)rgn->base,
+		     (unsigned long long)rgn->base + rgn->size, rgn->nid);
+
+	type->total_size -= rgn->size;
+	memmove(rgn, rgn + 1, (type->cnt - (r + 1)) * sizeof(*rgn));
 	type->cnt--;
 
 	/* Special case for empty arrays */
@@ -317,6 +323,9 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
 	memblock_set_region_node(rgn, nid);
 	type->cnt++;
 	type->total_size += size;
+	memblock_dbg("   memblock %s: add [%#016llx-%016llx] node %d @%d\n",
+		     memblock_type_name(type), (unsigned long long)base,
+		     (unsigned long long)base + size, nid, idx);
 }
 
 /**
@@ -342,6 +351,10 @@ static int __init_memblock memblock_add_region(struct memblock_type *type,
 	phys_addr_t end = base + memblock_cap_size(base, &size);
 	int i, nr_new;
 
+	memblock_dbg("   memblock %s: ADD [%#016llx-%#016llx] node %d\n",
+		     memblock_type_name(type), (unsigned long long)base,
+		     (unsigned long long)base + size, nid);
+
 	/* special case for empty array */
 	if (type->regions[0].size == 0) {
 		WARN_ON(type->cnt != 1 || type->total_size);
@@ -349,6 +362,8 @@ static int __init_memblock memblock_add_region(struct memblock_type *type,
 		type->regions[0].size = size;
 		memblock_set_region_node(&type->regions[0], nid);
 		type->total_size = size;
+		memblock_dbg("     memblock %s: add first entry\n",
+			     memblock_type_name(type));
 		return 0;
 	}
 repeat:
@@ -494,6 +509,10 @@ static int __init_memblock __memblock_remove(struct memblock_type *type,
 	int start_rgn, end_rgn;
 	int i, ret;
 
+	memblock_dbg("     memblock %s: RM  [%#016llx-%016llx]\n",
+		     memblock_type_name(type), (unsigned long long)base,
+		     (unsigned long long)base + size);
+
 	ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
 	if (ret)
 		return ret;

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-21  1:05                 ` Tejun Heo
@ 2012-02-22  0:36                   ` Meelis Roos
  2012-02-22 17:48                     ` Tejun Heo
  2012-02-22 18:22                     ` Richard Mortimer
  2012-02-22 17:03                   ` Sam Ravnborg
  1 sibling, 2 replies; 46+ messages in thread
From: Meelis Roos @ 2012-02-22  0:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam,
	David S. Miller

[-- Attachment #1: Type: TEXT/PLAIN, Size: 645 bytes --]

> Meelis, can you please apply the following patch before & after the
> offending commit, boot with "memblock=debug" added as kernel param and
> post the boot log?  The patch will generate some offset warnings after
> the commit but should work fine.

Before the commit (v3.2-rc3-75-g0ee332c): memblock1.gz (attached)
After the commit (v3.2-rc3-76-g7bd0b0f): memblock2.gz (attached)

In addition, a third type of sparc machines breaks in a third way - V210 
and V240 just hang after telling

console [tty0] enabled, bootconsole disabled

and before calibrating the delay loop. Bisect has led to the same commit.

-- 
Meelis Roos (mroos@linux.ee)

[-- Attachment #2: Type: APPLICATION/octet-stream, Size: 49939 bytes --]

[-- Attachment #3: Type: APPLICATION/octet-stream, Size: 39513 bytes --]

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-21  1:05                 ` Tejun Heo
  2012-02-22  0:36                   ` Meelis Roos
@ 2012-02-22 17:03                   ` Sam Ravnborg
  2012-02-22 17:12                     ` Meelis Roos
  1 sibling, 1 reply; 46+ messages in thread
From: Sam Ravnborg @ 2012-02-22 17:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Meelis Roos, Grant Likely, Rob Herring, sparclinux,
	Linux Kernel list, David S. Miller

On Mon, Feb 20, 2012 at 05:05:37PM -0800, Tejun Heo wrote:
> Hello,
> 
> Meelis, can you please apply the following patch before & after the
> offending commit, boot with "memblock=debug" added as kernel param and
> post the boot log?  The patch will generate some offset warnings after
> the commit but should work fine.
> 
> Sam, David, as I'm not familiar with the code base, is it possible to
> tell which address is corrupted (zeroed, it seems)?  ie. can we add
> "if (XXX == NULL) printk("%p is corrupted\n"...);" somewhere?

No idea - sorry. I spend most of the time with sparc32 - which I
do not even feel familiar with yet :-(

One thing I noticed while working with memblock for sparc32 (*) is that allocations
are done top-down. So we may end up allocating memory with a considerably higher
address than we are used to.
This is obviously just a wild guess...

Meelis - do the affected boxes have any special memory configurations?
Could you try to boot with a sensible mem=xxx value to see if limiting the memory
helps.

(*) I have re-done the original patch-set and I have a quite good feeling about it.
HIGHMEM support is outstanding - I got a bit confused when I looked at x86.

But my ss5 crashes the first time I try to use the allocated memory - 
so I assume I have some silly issue somewhere. Nothing points at memblock
in this case.

	Sam

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-22 17:03                   ` Sam Ravnborg
@ 2012-02-22 17:12                     ` Meelis Roos
  2012-02-22 17:21                       ` Sam Ravnborg
  0 siblings, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-22 17:12 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: Tejun Heo, Grant Likely, Rob Herring, sparclinux,
	Linux Kernel list, David S. Miller

> Meelis - do the affected boxes have any special memory configurations?

Nothing special to me. E3500 has 2G, V100 has 1G, V210 and V240 have 2G 
and 1.5G.

> Could you try to boot with a sensible mem=xxx value to see if limiting the memory
> helps.

Like mem=256M? Will try.

-- 
Meelis Roos (mroos@ut.ee)      http://www.cs.ut.ee/~mroos/

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-22 17:12                     ` Meelis Roos
@ 2012-02-22 17:21                       ` Sam Ravnborg
  2012-02-22 17:41                         ` Meelis Roos
  0 siblings, 1 reply; 46+ messages in thread
From: Sam Ravnborg @ 2012-02-22 17:21 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Tejun Heo, Grant Likely, Rob Herring, sparclinux,
	Linux Kernel list, David S. Miller

On Wed, Feb 22, 2012 at 07:12:06PM +0200, Meelis Roos wrote:
> > Meelis - do the affected boxes have any special memory configurations?
> 
> Nothing special to me. E3500 has 2G, V100 has 1G, V210 and V240 have 2G 
> and 1.5G.
> 
> > Could you try to boot with a sensible mem=xxx value to see if limiting the memory
> > helps.
> 
> Like mem=256M? Will try.
Think just a little more - I do not think this will help.
I confused myself with some of the sparc32 issues I have hit.

I have looked a little at the log files you included.
The only thing that looked different was that the faulty version
had a number after "@" which is higher than 1 - where the OK always have 1.

This is "idx" in memblock_insert_region() - but I did not look closer.

	Sam

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-22 17:21                       ` Sam Ravnborg
@ 2012-02-22 17:41                         ` Meelis Roos
  0 siblings, 0 replies; 46+ messages in thread
From: Meelis Roos @ 2012-02-22 17:41 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: Tejun Heo, Grant Likely, Rob Herring, sparclinux,
	Linux Kernel list, David S. Miller

> > > Could you try to boot with a sensible mem=xxx value to see if limiting the memory
> > > helps.
> > 
> > Like mem=256M? Will try.
> Think just a little more - I do not think this will help.

Tried it on the 2G V210. It changes the picture. With 2G RAM, it 
just hangs.

With mem=256M it produces a crash in strlen and of_alias_scan like in 
V100 with 1G.

mem=512M results in the same strlen error.

mem=1G results in a stranger error:

[    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x90 bytes below 0x0.
[    0.000000] 
[    0.000000] Call Trace:
[    0.000000]  [00000000007a6a28] memblock_alloc_base+0x28/0x38
[    0.000000]  [000000000079ca50] prom_early_alloc+0xc/0x60
[    0.000000]  [00000000007ae090] of_pdt_create_node.part.0+0x4/0xe0
[    0.000000]  [00000000007ae250] of_pdt_build_devicetree+0x30/0xa0
[    0.000000]  [000000000079c4a8] prom_build_devicetree+0x18/0x38
[    0.000000]  [00000000007a03c0] paging_init+0x59c/0x6bc
[    0.000000]  [000000000079be50] setup_arch+0xf8/0x108
[    0.000000]  [000000000079a4e8] start_kernel+0x78/0x30c
[    0.000000]  [00000000006a3e80] tlb_fixup_done+0x98/0xa0
[    0.000000]  [0000000000000000]           (null)

The working machines have 512M RAM, 834M RAM and 2G RAM so it's not just 
the amount of RAM.

-- 
Meelis Roos (mroos@linux.ee)

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-22  0:36                   ` Meelis Roos
@ 2012-02-22 17:48                     ` Tejun Heo
  2012-02-22 18:25                       ` Meelis Roos
  2012-02-22 20:44                       ` David Miller
  2012-02-22 18:22                     ` Richard Mortimer
  1 sibling, 2 replies; 46+ messages in thread
From: Tejun Heo @ 2012-02-22 17:48 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam,
	David S. Miller

On Wed, Feb 22, 2012 at 02:36:13AM +0200, Meelis Roos wrote:
> > Meelis, can you please apply the following patch before & after the
> > offending commit, boot with "memblock=debug" added as kernel param and
> > post the boot log?  The patch will generate some offset warnings after
> > the commit but should work fine.
> 
> Before the commit (v3.2-rc3-75-g0ee332c): memblock1.gz (attached)
> After the commit (v3.2-rc3-76-g7bd0b0f): memblock2.gz (attached)

Can you please try the following patch?  If it still fails to boot,
please attach the failing log.  Thank you.

diff --git a/mm/memblock.c b/mm/memblock.c
index 77b5f22..99f2855 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -99,9 +99,6 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 	phys_addr_t this_start, this_end, cand;
 	u64 i;
 
-	/* align @size to avoid excessive fragmentation on reserved array */
-	size = round_up(size, align);
-
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
@@ -731,6 +728,9 @@ static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
 {
 	phys_addr_t found;
 
+	/* align @size to avoid excessive fragmentation on reserved array */
+	size = round_up(size, align);
+
 	found = memblock_find_in_range_node(0, max_addr, size, align, nid);
 	if (found && !memblock_reserve(found, size))
 		return found;
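
The effect of moving the round_up() can be sketched in plain userspace C.
This is only an illustration of the size mismatch, not the kernel code:
`find_top_down()`, `alloc_before_patch()` and `alloc_after_patch()` are
hypothetical helpers, the free area is reduced to a single range, and the
64-byte alignment used in the usage note is an assumption.  Before the
patch the search rounds @size up privately, so the caller reserves only the
unrounded size; after the patch the rounded size is used for both the
search and the reservation.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a power-of-two alignment. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* What the caller ends up recording as reserved. */
struct reservation {
	uint64_t base;
	uint64_t size;
};

/*
 * Hypothetical top-down search in one free range [start, end).
 * Returns the highest aligned base that fits, or 0 if nothing fits.
 */
static uint64_t find_top_down(uint64_t start, uint64_t end,
			      uint64_t size, uint64_t align)
{
	uint64_t cand;

	if (end < start || end - start < size)
		return 0;
	cand = (end - size) & ~(align - 1);
	return cand >= start ? cand : 0;
}

/* Old behaviour: the search rounds size up, the reservation does not. */
static struct reservation alloc_before_patch(uint64_t start, uint64_t end,
					     uint64_t size, uint64_t align)
{
	uint64_t found = find_top_down(start, end,
				       round_up_pow2(size, align), align);

	return (struct reservation){ .base = found, .size = size };
}

/* Patched behaviour: size is rounded once and used for both steps. */
static struct reservation alloc_after_patch(uint64_t start, uint64_t end,
					    uint64_t size, uint64_t align)
{
	uint64_t found;

	size = round_up_pow2(size, align);
	found = find_top_down(start, end, size, align);
	return (struct reservation){ .base = found, .size = size };
}
```

For a 14-byte request with 64-byte alignment in the free range
[0x1000, 0x2000), both variants pick base 0x1fc0, but the old one records
only 14 bytes as reserved while the patched one records 64.  The 50 bytes
in between stay free in the old variant, so a caller that writes past its
14 bytes lands in memory a later allocation may be given.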

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-22  0:36                   ` Meelis Roos
  2012-02-22 17:48                     ` Tejun Heo
@ 2012-02-22 18:22                     ` Richard Mortimer
  2012-02-22 20:26                       ` David Miller
  1 sibling, 1 reply; 46+ messages in thread
From: Richard Mortimer @ 2012-02-22 18:22 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Tejun Heo, Grant Likely, Rob Herring, sparclinux,
	Linux Kernel list, sam, David S. Miller

On 22/02/2012 00:36, Meelis Roos wrote:
>> Meelis, can you please apply the following patch before & after the
>> offending commit, boot with "memblock=debug" added as kernel param and
>> post the boot log?  The patch will generate some offset warnings after
>> the commit but should work fine.
>
> Before the commit (v3.2-rc3-75-g0ee332c): memblock1.gz (attached)
> After the commit (v3.2-rc3-76-g7bd0b0f): memblock2.gz (attached)
>
It's a long time since I regularly had to worry about SPARC boxes (not) 
booting, so this may just be the difference between virtual & physical 
addresses, but I notice that some of the addresses in the register dump 
have non-zero values in the upper 32 bits while the memblock values have 
zero in the upper half.


memblock reserved: ADD [0x0000007fcc0a40-0x0000007fcc0a4e] node 1
memblock reserved: add [0x0000007fcc0a40-000000007fcc0a4e] node 1 @767

But a similar address in the registers has fffff800 in there.

o4: fffff8007fcc0a4d

I know that there are a number of explanations why things would be 
different (32 bit accesses etc) but it could explain things plus we would 
be talking 64 bit addresses in the kernel.

Just a thought.

Richard


> In addition, a third type of sparc machines breaks in a third way - V210
> and V240 just hang after telling
>
> console [tty0] enabled, bootconsole disabled
>
> and before calibrating the delay loop. Bisect has led to the same commit.
>

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-22 17:48                     ` Tejun Heo
@ 2012-02-22 18:25                       ` Meelis Roos
  2012-02-23 18:55                         ` Tejun Heo
  2012-02-22 20:44                       ` David Miller
  1 sibling, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-22 18:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam,
	David S. Miller

> Can you please try the following patch?  If it still fails to boot,
> please attach the failing log.  Thank you.

It works on E3500! Will try other machines tomorrow.

-- 
Meelis Roos (mroos@linux.ee)

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-22 18:22                     ` Richard Mortimer
@ 2012-02-22 20:26                       ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2012-02-22 20:26 UTC (permalink / raw)
  To: richm; +Cc: mroos, tj, grant.likely, rob.herring, sparclinux, linux-kernel, sam

From: Richard Mortimer <richm@oldelvet.org.uk>
Date: Wed, 22 Feb 2012 18:22:36 +0000

> memblock reserved: ADD [0x0000007fcc0a40-0x0000007fcc0a4e] node 1
> memblock reserved: add [0x0000007fcc0a40-000000007fcc0a4e] node 1 @767

These are physical addresses.

> But a similar address in the registers has fffff800 in there.
> 
> o4: fffff8007fcc0a4d

All of physical memory is mapped linearly starting at 0xfffff80000000000
and this is such a virtual address.
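
As a worked check of that mapping, here is a small sketch in userspace C.
The base constant is the one stated above; the helper names
`phys_to_virt_linear()` and `virt_to_phys_linear()` are made up for this
example and are not the kernel's macros.

```c
#include <assert.h>
#include <stdint.h>

/* Start of the linear mapping of physical memory, as stated above. */
#define LINEAR_MAP_BASE 0xfffff80000000000ULL

/* Hypothetical helper: physical address to linear-map virtual address. */
static uint64_t phys_to_virt_linear(uint64_t phys)
{
	return phys + LINEAR_MAP_BASE;
}

/* Hypothetical helper: linear-map virtual address back to physical. */
static uint64_t virt_to_phys_linear(uint64_t virt)
{
	assert(virt >= LINEAR_MAP_BASE);
	return virt - LINEAR_MAP_BASE;
}
```

With these, the register value o4 = fffff8007fcc0a4d corresponds to
physical address 0x7fcc0a4d, which is the last byte of the reserved range
[0x7fcc0a40-0x7fcc0a4e) quoted from the memblock log.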

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-22 17:48                     ` Tejun Heo
  2012-02-22 18:25                       ` Meelis Roos
@ 2012-02-22 20:44                       ` David Miller
  2012-02-22 21:00                         ` Tejun Heo
  1 sibling, 1 reply; 46+ messages in thread
From: David Miller @ 2012-02-22 20:44 UTC (permalink / raw)
  To: tj; +Cc: mroos, grant.likely, rob.herring, sparclinux, linux-kernel, sam

From: Tejun Heo <tj@kernel.org>
Date: Wed, 22 Feb 2012 09:48:25 -0800

> On Wed, Feb 22, 2012 at 02:36:13AM +0200, Meelis Roos wrote:
>> > Meelis, can you please apply the following patch before & after the
>> > offending commit, boot with "memblock=debug" added as kernel param and
>> > post the boot log?  The patch will generate some offset warnings after
>> > the commit but should work fine.
>> 
>> Before the commit (v3.2-rc3-75-g0ee332c): memblock1.gz (attached)
>> After the commit (v3.2-rc3-76-g7bd0b0f): memblock2.gz (attached)
> 
> Can you please try the following patch?  If it still fails to boot,
> please attach the failing log.  Thank you.

Interesting, but two things strike me.

First, this seems like it would only cause problems if the caller
specified a too small size parameter, and then wrote past the 'size'
bytes of the buffer.  And if so, this means we have an improperly
sized allocation somewhere, probably in the OF tree fetching code.

For example, maybe we mis-calculate the size of an OF device node
property before we fetch it from the firmware, therefore allocate
too small a buffer, and the property fetch operation splats all
over the end of the buffer.  Another possibility is that the
property length reported by the firmware is wrong and too small.

BTW, this kind of bug would be easy to catch, simply put a magic
number signature into all unallocated memblock memory then at
allocation time check that signature.  If we signal an error when we
don't see the proper signature and turn on the OF tree building
logging, we can see exactly which operation writes past the end of a
buffer.
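
A rough userspace sketch of that signature idea, under stated assumptions:
the poison value 0xA5 and the helper names `poison_fill()` and
`poison_first_damage()` are invented for illustration, and nothing here is
existing memblock code.  Free memory is filled with the signature once, and
each allocation verifies that the bytes it is about to hand out are still
intact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature byte written into all unallocated memory. */
#define POISON_BYTE 0xA5

/* Stamp a free area with the signature. */
static void poison_fill(unsigned char *mem, size_t len)
{
	assert(mem != NULL || len == 0);
	memset(mem, POISON_BYTE, len);
}

/*
 * Check an area that is about to be allocated.  Returns the offset of
 * the first byte that no longer carries the signature, or len if the
 * whole area is intact.
 */
static size_t poison_first_damage(const unsigned char *mem, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (mem[i] != POISON_BYTE)
			return i;
	}
	return len;
}
```

If a caller was given 14 bytes and wrote 16, the check on the neighbouring
free bytes reports damage at offset 0 the next time that area is
allocated, which points at the previous allocation as the culprit.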

Second, you'd need similar handling in other call chains such as
memblock_double_array()'s invocation of memblock_find_in_range().
It seems a bad idea to hide how size is modified, so probably it's
best to pass the address of the size parameter and modify the
caller's value in that way so that the size used in the reserve
matches up.

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-22 20:44                       ` David Miller
@ 2012-02-22 21:00                         ` Tejun Heo
  0 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2012-02-22 21:00 UTC (permalink / raw)
  To: David Miller
  Cc: mroos, grant.likely, rob.herring, sparclinux, linux-kernel, sam

Hello, David.

On Wed, Feb 22, 2012 at 03:44:17PM -0500, David Miller wrote:
> > Can you please try the following patch?  If it still fails to boot,
> > please attach the failing log.  Thank you.
> 
> Interesting, but two things strike me.
> 
> First, this seems like it would only cause problems if the caller
> specified a too small size parameter, and then wrote past the 'size'
> bytes of the buffer.  And if so, this means we have an improperly
> sized allocation somewhere, probably in the OF tree fetching code.

There's another, less likely, possibility.  It made the allocation
table much larger and the lowest address used ended up lower.
0x0000007fc8fa40 vs 0x0000007fc94000.  Not too much of difference and
just allocating some more memory should rule out or confirm it.

> For example, maybe we mis-calculate the size of an OF device node
> property before we fetch it from the firmware, therefore allocate
> too small a buffer, and the property fetch operation splats all
> over the end of the buffer.  Another possibility is that the
> property length reported by the firmware is wrong and too small.
> 
> BTW, this kind of bug would be easy to catch, simply put a magic
> number signature into all unallocated memblock memory then at
> allocation time check that signature.  If we signal an error when we
> don't see the proper signature and turn on the OF tree building
> logging, we can see exactly which operation writes past the end of a
> buffer.

Yeah, redzoning can definitely help but I'm not sure whether we want
to go full on allocation debugging and all for early allocator.  The
thing doesn't even support freeing.

> Second, you'd need similar handling in other call chains such as
> memblock_double_array()'s invocation of memblock_find_in_range().
> It seems a bad idea to hide how size is modified, so probably it's
> best to pass the address of the size parameter and modify the
> caller's value in that way so that the size used in the reserve
> matches up.

I suspect the size modification was added later to avoid expanding
allocation table early during boot and we can do that only for
memblock_alloc*() calls as they don't have matching free interface.
If we modify explicit reservations, we have to propagate the modified
size to each user and so on.  Given that the allocation table is
discarded after boot completion and there aren't too many explicit
reservations, I don't think we need to expand size aligning to all
find_in_range users.  I guess it all depends on how complete allocator
we want for early boot.

Thanks.

-- 
tejun

* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-22 18:25                       ` Meelis Roos
@ 2012-02-23 18:55                         ` Tejun Heo
  2012-02-23 23:31                           ` David Miller
  2012-02-24  9:20                           ` Meelis Roos
  0 siblings, 2 replies; 46+ messages in thread
From: Tejun Heo @ 2012-02-23 18:55 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam,
	David S. Miller

Hello,

On Wed, Feb 22, 2012 at 08:25:32PM +0200, Meelis Roos wrote:
> > Can you please try the following patch?  If it still fails to boot,
> > please attach the failing log.  Thank you.
> 
> It works on E3500! Will try other machines tomorrow.

Once confirmed, I'll push the patch through tip.  It just hides the
underlying problem but we should be in no worse shape than before,
it's a two-line change so reproducing the problem again for proper
diagnosis isn't difficult, and we're getting a bit late in the release
cycle already.

Thanks.

-- 
tejun


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-23 18:55                         ` Tejun Heo
@ 2012-02-23 23:31                           ` David Miller
  2012-02-24  9:20                           ` Meelis Roos
  1 sibling, 0 replies; 46+ messages in thread
From: David Miller @ 2012-02-23 23:31 UTC (permalink / raw)
  To: tj; +Cc: mroos, grant.likely, rob.herring, sparclinux, linux-kernel, sam

From: Tejun Heo <tj@kernel.org>
Date: Thu, 23 Feb 2012 10:55:03 -0800

> Hello,
> 
> On Wed, Feb 22, 2012 at 08:25:32PM +0200, Meelis Roos wrote:
>> > Can you please try the following patch?  If it still fails to boot,
>> > please attach the failing log.  Thank you.
>> 
>> It works on E3500! Will try other machines tomorrow.
> 
> Once confirmed, I'll push the patch through tip.  It just hides the
> underlying problem but we should be in no worse shape than before,
> it's a two-line change so reproducing the problem again for proper
> diagnosis isn't difficult, and we're getting a bit late in the release
> cycle already.

Ok.


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-23 18:55                         ` Tejun Heo
  2012-02-23 23:31                           ` David Miller
@ 2012-02-24  9:20                           ` Meelis Roos
  2012-02-27 17:17                             ` Meelis Roos
  1 sibling, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-24  9:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam,
	David S. Miller

> > > Can you please try the following patch?  If it still fails to boot,
> > > please attach the failing log.  Thank you.
> > 
> > It works on E3500! Will try other machines tomorrow.
> 
> Once confirmed, I'll push the patch through tip.  It just hides the
> underlying problem but we should be in no worse shape than before,
> it's a two-line change so reproducing the problem again for proper
> diagnosis isn't difficult, and we're getting a bit late in the release
> cycle already.

It cured the V210 too but I could not test the V100 since it's offline
until Monday.

-- 
Meelis Roos (mroos@linux.ee)


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-24  9:20                           ` Meelis Roos
@ 2012-02-27 17:17                             ` Meelis Roos
  2012-02-27 19:43                               ` Sam Ravnborg
  0 siblings, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-27 17:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Grant Likely, Rob Herring, sparclinux, Linux Kernel list, sam,
	David S. Miller

> > > > Can you please try the following patch?  If it still fails to boot,
> > > > please attach the failing log.  Thank you.
> > > 
> > > It works on E3500! Will try other machines tomorrow.
> > 
> > Once confirmed, I'll push the patch through tip.  It just hides the
> > underlying problem but we should be in no worse shape than before,
> > it's a two-line change so reproducing the problem again for proper
> > diagnosis isn't difficult, and we're getting a bit late in the release
> > cycle already.
> 
> It cured the V210 too but I could not test the V100 since it's offline
> until Monday.

Tested V100 too, success!

-- 
Meelis Roos (mroos@linux.ee)


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-27 17:17                             ` Meelis Roos
@ 2012-02-27 19:43                               ` Sam Ravnborg
  2012-02-27 21:25                                 ` Meelis Roos
  0 siblings, 1 reply; 46+ messages in thread
From: Sam Ravnborg @ 2012-02-27 19:43 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Tejun Heo, Grant Likely, Rob Herring, sparclinux,
	Linux Kernel list, David S. Miller

On Mon, Feb 27, 2012 at 07:17:42PM +0200, Meelis Roos wrote:
> > > > > Can you please try the following patch?  If it still fails to boot,
> > > > > please attach the failing log.  Thank you.
> > > > 
> > > > It works on E3500! Will try other machines tomorrow.
> > > 
> > > Once confirmed, I'll push the patch through tip.  It just hides the
> > > underlying problem but we should be in no worse shape than before,
> > > it's a two-line change so reproducing the problem again for proper
> > > diagnosis isn't difficult, and we're getting a bit late in the release
> > > cycle already.
> > 
> > It cured the V210 too but I could not test the V100 since it's offline
> > until Monday.
> 
> Tested V100 too, success!

Hi Meelis.

I have tried to cook up a small patch that verifies the length of what
we read against the length originally reported.

Could you try to give this a quick spin and see if something
turns up. If you have time it would be good to try on a box
that worked before and one that was fixed by the patch from Tejun.

I have not looked much at the OF stuff - but this looked like the right place to start.

I have no possibility to try it out myself...

	Sam

diff --git a/drivers/of/pdt.c b/drivers/of/pdt.c
index 07cc1d6..826204a 100644
--- a/drivers/of/pdt.c
+++ b/drivers/of/pdt.c
@@ -128,6 +128,10 @@ static struct property * __init of_pdt_build_one_prop(phandle node, char *prev,
 			p->value = prom_early_alloc(p->length + 1);
 			len = of_pdt_prom_ops->getproperty(node, p->name,
 					p->value, p->length);
+
+			if (len != p->length)
+				pr_err("prop: %s %d => %d\n", p->name, p->length, len);
+
 			if (len <= 0)
 				p->length = 0;
 			((unsigned char *)p->value)[p->length] = '\0';
@@ -161,8 +165,13 @@ static char * __init of_pdt_get_one_property(phandle node, const char *name)
 
 	len = of_pdt_prom_ops->getproplen(node, name);
 	if (len > 0) {
+		int proplen;
 		buf = prom_early_alloc(len);
-		len = of_pdt_prom_ops->getproperty(node, name, buf, len);
+		proplen = of_pdt_prom_ops->getproperty(node, name, buf, len);
+
+		if (proplen != len)
+			pr_err("prop: %s %d => %d\n", name, len, proplen);
+
 	}
 
 	return buf;


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-27 19:43                               ` Sam Ravnborg
@ 2012-02-27 21:25                                 ` Meelis Roos
  2012-02-27 21:30                                   ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-27 21:25 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: Tejun Heo, Grant Likely, Rob Herring, sparclinux,
	Linux Kernel list, David S. Miller

> Could you try to give this a quick spin and see if something
> turns up. If you have time it would be good to try on a box
> that worked before and one that was fixed by the patch from Tejun.

Neither of the machines - the already working one and the "fixed with
the rounding patch" one - emits any prop: messages.

-- 
Meelis Roos (mroos@linux.ee)


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-27 21:25                                 ` Meelis Roos
@ 2012-02-27 21:30                                   ` David Miller
  2012-02-28 21:10                                     ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2012-02-27 21:30 UTC (permalink / raw)
  To: mroos; +Cc: sam, tj, grant.likely, rob.herring, sparclinux, linux-kernel

From: Meelis Roos <mroos@linux.ee>
Date: Mon, 27 Feb 2012 23:25:11 +0200 (EET)

>> Could you try to give this a quick spin and see if something
>> turns up. If you have time it would be good to try on a box
>> that worked before and one that was fixed by the patch from Tejun.
> 
> Neither of the machines - the already working one and the "fixed with
> the rounding patch" one - emits any prop: messages.

I think the issue is that OF writes past the end of the buffer even
though the length it reports is smaller than what it writes.

That's why we really need to fill the memblock memory with magic
numbers and scan every allocation for free memory with corrupted
magic values.
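
To illustrate why a plain length comparison stays silent in that case, here
is a rough userspace sketch.  fake_getproperty() is a hypothetical stand-in
for a firmware call that reports @len but writes more; nothing here is the
real OF interface, and REDZONE/CANARY are values picked for the example.

```c
#include <stddef.h>
#include <string.h>

#define REDZONE 64	/* guard bytes placed after the buffer */
#define CANARY  0xff	/* value the guard bytes are filled with */

/*
 * Hypothetical misbehaving firmware call: claims to have written @len
 * bytes but really writes @len + @extra.  @extra must not exceed REDZONE.
 */
static int fake_getproperty(unsigned char *buf, int len, int extra)
{
	memset(buf, 'x', (size_t)(len + extra));
	return len;
}

/*
 * Fetch into @buf, which must hold @len + REDZONE bytes.  Stores the
 * length the "firmware" reported in *reported and returns the number of
 * guard bytes that were clobbered.
 */
static int fetch_with_redzone(unsigned char *buf, int len, int extra,
			      int *reported)
{
	int i, clobbered = 0;

	memset(buf + len, CANARY, REDZONE);
	*reported = fake_getproperty(buf, len, extra);

	for (i = 0; i < REDZONE; i++)
		if (buf[len + i] != CANARY)
			clobbered++;
	return clobbered;
}
```

For a 16-byte property with 4 extra bytes written, the reported length is
still 16, so comparing lengths sees nothing wrong, while the guard check
counts 4 clobbered bytes.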


* [PATCH v3.3-rc5] memblock: Fix size aligning of memblock_alloc_base_nid()
@ 2012-02-28 20:56 Tejun Heo
  2012-02-13  7:45 ` OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88 Meelis Roos
  2012-02-28 22:16 ` [PATCH v3.3-rc5] " Sam Ravnborg
  0 siblings, 2 replies; 46+ messages in thread
From: Tejun Heo @ 2012-02-28 20:56 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin
  Cc: David S. Miller, linux-kernel, Meelis Roos, Grant Likely,
	Rob Herring, sparclinux, sam

memblock allocator aligns @size to @align to reduce the amount of
fragmentation.  7bd0b0f0da "memblock: Reimplement memblock allocation
using reverse free area iterator" broke it by incorrectly relocating
@size aligning to memblock_find_in_range_node().  As the aligned size
is not propagated back to memblock_alloc_base_nid(), the actually
reserved size isn't aligned.

While this increases memory use for memblock reserved array, this
shouldn't cause any critical failure; however, it seems that the size
aligning was hiding a use-beyond-allocation bug in sparc64 and losing
the aligning causes boot failure.

The underlying problem is currently being debugged but this is a
proper fix in itself, it's already pretty late in -rc cycle for boot
failures and reverting the change for debugging isn't difficult.
Restore the size aligning moving it to memblock_alloc_base_nid().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <alpine.SOC.1.00.1202130942030.1488@math.ut.ee>
---
 mm/memblock.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 77b5f22..99f2855 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -99,9 +99,6 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 	phys_addr_t this_start, this_end, cand;
 	u64 i;
 
-	/* align @size to avoid excessive fragmentation on reserved array */
-	size = round_up(size, align);
-
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
@@ -731,6 +728,9 @@ static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
 {
 	phys_addr_t found;
 
+	/* align @size to avoid excessive fragmentation on reserved array */
+	size = round_up(size, align);
+
 	found = memblock_find_in_range_node(0, max_addr, size, align, nid);
 	if (found && !memblock_reserve(found, size))
 		return found;
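
The effect described in the changelog can be reproduced in a few lines of
userspace C.  This is a simplified sketch, not the memblock code: the
helper names, struct region and the fixed 0x1000 return value are invented
for the example, and round_up_pow2() merely stands in for the kernel's
round_up() under the assumption that @align is a power of two.

```c
#include <stdint.h>

struct region {
	uint64_t base;
	uint64_t size;
};

/* Round @x up to a multiple of @align (power of two assumed). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Finder that aligns its own by-value copy of @size; the caller never sees it. */
static uint64_t find_aligning_locally(uint64_t size, uint64_t align)
{
	size = round_up_pow2(size, align);
	(void)size;
	return 0x1000;	/* pretend a suitable free range starts here */
}

/* Finder that uses @size exactly as passed in. */
static uint64_t find_plain(uint64_t size, uint64_t align)
{
	(void)size;
	(void)align;
	return 0x1000;
}

/* Before the fix: the reservation is made with the caller's unaligned size. */
static struct region alloc_before_fix(uint64_t size, uint64_t align)
{
	struct region r;

	r.base = find_aligning_locally(size, align);
	r.size = size;
	return r;
}

/* After the fix: align first, so find and reserve agree on the size. */
static struct region alloc_after_fix(uint64_t size, uint64_t align)
{
	struct region r;

	size = round_up_pow2(size, align);
	r.base = find_plain(size, align);
	r.size = size;
	return r;
}
```

Asking for 13 bytes with 8-byte alignment reserves 13 bytes in the first
variant and 16 in the second, so only the second leaves slack after the
requested length.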


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-27 21:30                                   ` David Miller
@ 2012-02-28 21:10                                     ` David Miller
  2012-02-28 21:36                                       ` Meelis Roos
  0 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2012-02-28 21:10 UTC (permalink / raw)
  To: mroos; +Cc: sam, tj, grant.likely, rob.herring, sparclinux, linux-kernel

From: David Miller <davem@davemloft.net>
Date: Mon, 27 Feb 2012 16:30:44 -0500 (EST)

> I think the issue is that OF writes past the end of the buffer even
> though the length it reports is smaller than what it writes.

Meelis, can you get your tree back into a state where the crash happens
and then add the following debugging patch and see what happens?

Thanks!

diff --git a/drivers/of/pdt.c b/drivers/of/pdt.c
index 07cc1d6..367ef33 100644
--- a/drivers/of/pdt.c
+++ b/drivers/of/pdt.c
@@ -125,12 +125,31 @@ static struct property * __init of_pdt_build_one_prop(phandle node, char *prev,
 		} else {
 			int len;
 
+#if 1
+			int i;
+			p->value = prom_early_alloc(p->length + 1 + 64);
+			for (i = p->length + 1; i < p->length + 1 + 64; i++)
+				((unsigned char *)p->value)[i] = 0xff;
+#else
 			p->value = prom_early_alloc(p->length + 1);
+#endif
 			len = of_pdt_prom_ops->getproperty(node, p->name,
 					p->value, p->length);
-			if (len <= 0)
+			if (len <= 0) {
+				pr_info("OF BUG: getproperty(%s, %d) returns %d\n",
+					p->name, p->length, len);
 				p->length = 0;
+			}
 			((unsigned char *)p->value)[p->length] = '\0';
+#if 1
+			for (i = p->length + 1; i < p->length + 1 + 64; i++) {
+				if (((unsigned char *)p->value)[i] != 0xff) {
+					pr_info("OF BUG: Write past end of property buffer\n");
+					pr_info("OF BUG: Property name [%s] length [%d] getprop len [%d]\n",
+						p->name, p->length, len);
+				}
+			}
+#endif
 		}
 	}
 	return p;
@@ -161,7 +180,11 @@ static char * __init of_pdt_get_one_property(phandle node, const char *name)
 
 	len = of_pdt_prom_ops->getproplen(node, name);
 	if (len > 0) {
+#if 1
+		buf = prom_early_alloc(len + 64);
+#else
 		buf = prom_early_alloc(len);
+#endif
 		len = of_pdt_prom_ops->getproperty(node, name, buf, len);
 	}
 


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-28 21:10                                     ` David Miller
@ 2012-02-28 21:36                                       ` Meelis Roos
  2012-02-28 22:56                                         ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-28 21:36 UTC (permalink / raw)
  To: David Miller; +Cc: sam, tj, grant.likely, rob.herring, sparclinux, linux-kernel

> Meelis, can you get your tree back into a state where the crash happens
> and then add the following debugging patch and see what happens?

Tried it, no obvious results in dmesg, except the crash is in a slightly 
different location.

[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 3.2.30 2002/10/25 14:03'
[    0.000000] PROMLIB: Root node compatible: 
[    0.000000] Linux version 3.2.0-rc3-00076-g7bd0b0f-dirty (mroos@korvits) (gcc version 4.6.2 (Debian 4.6.2-14) ) #84 SMP Tue Feb 28 23:28:49 EET 2012
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] bootconsole [earlyprom0] enabled
[    0.000000] ARCH: SUN4U
[    0.000000] Ethernet address: 08:00:20:b6:ee:e2
[    0.000000] Kernel: Using 4 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel... done.
[    0.000000] Unable to handle kernel paging request at virtual address 000000007fcf2000
[    0.000000] tsk->{mm,active_mm}->context = 0000000000000000
[    0.000000] tsk->{mm,active_mm}->pgd = fffff800007db7d0
[    0.000000]               \|/ ____ \|/
[    0.000000]               "@'/ .. \`@"
[    0.000000]               /_| \__/ |_\
[    0.000000]                  \__U_/
[    0.000000] swapper(0): Oops [#1]
[    0.000000] TSTATE: 0000008880e01600 TPC: 000000000057b4c8 TNPC: 000000000057b4cc Y: 00000037    Not tainted
[    0.000000] TPC: <strcmp+0x8/0x60>
[    0.000000] g0: 000000000077f7f0 g1: 0000000000000000 g2: 000000000000002f g3: 00000000000000f0
[    0.000000] g4: 000000000077f350 g5: 0000000000000000 g6: 0000000000760000 g7: 0000000000000050
[    0.000000] o0: 000000000079dbc8 o1: 0000000000000000 o2: 0000000000000000 o3: 0000000000000002
[    0.000000] o4: 0000000000000002 o5: 0000000000000000 sp: 0000000000763181 ret_pc: 00000000006a9984
[    0.000000] RPC: <_raw_read_lock+0x24/0x40>
[    0.000000] l0: 0000000001028000 l1: fffff8007fcbc380 l2: 8000000000000000 l3: 0800000000000000
[    0.000000] l4: 0000000000000080 l5: 0000000000000002 l6: 0000000000000000 l7: 0020280000000000
[    0.000000] i0: 000000007fcf3c80 i1: fffff8007fcec480 i2: 0000000001010101 i3: 0000000080808080
[    0.000000] i4: fffff8007fcb8ccd i5: 0000000000028337 i6: 0000000000763231 i7: 0000000000606250
[    0.000000] I7: <of_find_node_by_path+0x30/0x80>
[    0.000000] Call Trace:
[    0.000000]  [0000000000606250] of_find_node_by_path+0x30/0x80
[    0.000000]  [0000000000606e0c] of_alias_scan+0xcc/0x1c0
[    0.000000]  [00000000007c328c] of_pdt_build_devicetree+0x90/0xa0
[    0.000000]  [00000000007b0680] prom_build_devicetree+0x10/0x3c
[    0.000000]  [00000000007b4614] paging_init+0x59c/0x6bc
[    0.000000]  [00000000007afffc] setup_arch+0xf8/0x110
[    0.000000]  [00000000007ae514] start_kernel+0x84/0x32c
[    0.000000]  [00000000006918c8] tlb_fixup_done+0xa0/0xa8
[    0.000000]  [0000000000000000]           (null)
[    0.000000] Disabling lock debugging due to kernel taint
[    0.000000] Caller[0000000000606250]: of_find_node_by_path+0x30/0x80
[    0.000000] Caller[0000000000606e0c]: of_alias_scan+0xcc/0x1c0
[    0.000000] Caller[00000000007c328c]: of_pdt_build_devicetree+0x90/0xa0
[    0.000000] Caller[00000000007b0680]: prom_build_devicetree+0x10/0x3c
[    0.000000] Caller[00000000007b4614]: paging_init+0x59c/0x6bc
[    0.000000] Caller[00000000007afffc]: setup_arch+0xf8/0x110
[    0.000000] Caller[00000000007ae514]: start_kernel+0x84/0x32c
[    0.000000] Caller[00000000006918c8]: tlb_fixup_done+0xa0/0xa8
[    0.000000] Caller[0000000000000000]:           (null)
[    0.000000] Instruction DUMP: 01000000  9de3bf50  82102000 <c40e0001> c60e4001  80a08003  12400008  82006001  80a0a000 
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Call Trace:
[    0.000000]  [000000000069c7fc] panic+0x68/0x1e4
[    0.000000]  [0000000000461a30] do_exit+0x230/0x2c0
[    0.000000]  [00000000004292c0] die_if_kernel+0x180/0x260
[    0.000000]  [000000000069c224] unhandled_fault+0x8c/0x98
[    0.000000]  [0000000000445778] do_kernel_fault+0xd8/0x100
[    0.000000]  [000000000044584c] do_sparc64_fault+0xac/0x540
[    0.000000]  [0000000000407948] sparc64_realfault_common+0x10/0x20
[    0.000000]  [000000000057b4c8] strcmp+0x8/0x60
[    0.000000]  [0000000000606250] of_find_node_by_path+0x30/0x80
[    0.000000]  [0000000000606e0c] of_alias_scan+0xcc/0x1c0
[    0.000000]  [00000000007c328c] of_pdt_build_devicetree+0x90/0xa0
[    0.000000]  [00000000007b0680] prom_build_devicetree+0x10/0x3c
[    0.000000]  [00000000007b4614] paging_init+0x59c/0x6bc
[    0.000000]  [00000000007afffc] setup_arch+0xf8/0x110
[    0.000000]  [00000000007ae514] start_kernel+0x84/0x32c
[    0.000000]  [00000000006918c8] tlb_fixup_done+0xa0/0xa8
[    0.000000] Press Stop-A (L1-A) to return to the boot prom

-- 
Meelis Roos (mroos@linux.ee)


* Re: [PATCH v3.3-rc5] memblock: Fix size aligning of memblock_alloc_base_nid()
  2012-02-28 20:56 [PATCH v3.3-rc5] memblock: Fix size aligning of memblock_alloc_base_nid() Tejun Heo
  2012-02-13  7:45 ` OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88 Meelis Roos
@ 2012-02-28 22:16 ` Sam Ravnborg
  1 sibling, 0 replies; 46+ messages in thread
From: Sam Ravnborg @ 2012-02-28 22:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, H. Peter Anvin, David S. Miller, linux-kernel,
	Meelis Roos, Grant Likely, Rob Herring, sparclinux

On Wed, Feb 29, 2012 at 05:56:21AM +0900, Tejun Heo wrote:
> memblock allocator aligns @size to @align to reduce the amount of
> fragmentation.  7bd0b0f0da "memblock: Reimplement memblock allocation
> using reverse free area iterator" broke it by incorrectly relocating
> @size aligning to memblock_find_in_range_node().  As the aligned size
> is not propagated back to memblock_alloc_base_nid(), the actually
> reserved size isn't aligned.
> 
> While this increases memory use for memblock reserved array, this
> shouldn't cause any critical failure; however, it seems that the size
> aligning was hiding a use-beyond-allocation bug in sparc64 and losing
> the aligning causes boot failure.
> 
> The underlying problem is currently being debugged but this is a
> proper fix in itself, it's already pretty late in -rc cycle for boot
> failures and reverting the change for debugging isn't difficult.
> Restore the size aligning moving it to memblock_alloc_base_nid().
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Meelis Roos <mroos@linux.ee>

> Reported-by: Sam Ravnborg <sam@ravnborg.org>
Actually not :-(
I only fooled around with some clueless suggestions - I do
not have any sparc64 boxes. And my sparc32 box that is alive atm,
does not exhibit this problem.

	Sam


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-28 21:36                                       ` Meelis Roos
@ 2012-02-28 22:56                                         ` David Miller
  2012-02-29  6:15                                           ` Meelis Roos
  0 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2012-02-28 22:56 UTC (permalink / raw)
  To: mroos; +Cc: sam, tj, grant.likely, rob.herring, sparclinux, linux-kernel

From: Meelis Roos <mroos@linux.ee>
Date: Tue, 28 Feb 2012 23:36:07 +0200 (EET)

>> Meelis, can you get your tree back into a state where the crash happens
>> and then add the following debugging patch and see what happens?
> 
> Tried it, no obvious results in dmesg, except the crash is in a slightly 
> different location.

Interesting, the corruption is a little bit different this time, yet similar
to the ones we saw previously:

> [    0.000000] TPC: <strcmp+0x8/0x60>
 ...
> [    0.000000] i0: 000000007fcf3c80 i1: fffff8007fcec480 i2: 0000000001010101 i3: 0000000080808080
> [    0.000000] i4: fffff8007fcb8ccd i5: 0000000000028337 i6: 0000000000763231 i7: 0000000000606250

This is strcmp(0x000000007fcf3c80, 0xfffff8007fcec480), the first arg is
a bad pointer, somehow the top virtual address bits have been zero'd out.

It comes from dp->full_name, so something walked all over the beginning
of a device_node object.
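
The pointer arithmetic behind that reading can be checked with a small
userspace sketch.  It assumes that valid early-allocated pointers sit at or
above 0xfffff80000000000, as the second strcmp() argument does; LINEAR_BASE
and both helper names are invented for the example.

```c
#include <stdint.h>

/* Assumed lower bound of the region valid pointers are expected in. */
#define LINEAR_BASE 0xfffff80000000000ULL

/* Does @p fall in the range expected for such a pointer? */
static int looks_like_linear_ptr(uint64_t p)
{
	return p >= LINEAR_BASE;
}

/* What @p would have been if only its top bits had been cleared. */
static uint64_t restore_top_bits(uint64_t p)
{
	return p | LINEAR_BASE;
}
```

Under that assumption 0x000000007fcf3c80 fails the range check while
0xfffff8007fcec480 passes it, and putting the top bits back on the bad
value gives 0xfffff8007fcf3c80, in the same neighbourhood as the good
pointer.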

Let's see if we can figure out anything else about the nature of the
corruption, please add this patch on top.

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 133908a..7c0f7f4 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -376,6 +376,18 @@ struct device_node *of_find_node_by_path(const char *path)
 
 	read_lock(&devtree_lock);
 	for (; np; np = np->allnext) {
+		if (!np->full_name)
+			continue;
+
+		if ((unsigned long)np->full_name < 0xfffff80000000000) {
+			pr_info("OF BUG: Bogus full_name pointer [%p]\n",
+				np->full_name);
+			pr_info("OF BUG: np[%p] np->name[%p] np->type[%p] np->phandle[0x%08x]\n",
+				np, np->name, np->type, (unsigned int) np->phandle);
+			pr_info("OF BUG: np->name(%s) np->type(%s)\n",
+				np->name, np->type);
+		}
+
 		if (np->full_name && (of_node_cmp(np->full_name, path) == 0)
 		    && of_node_get(np))
 			break;


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-28 22:56                                         ` David Miller
@ 2012-02-29  6:15                                           ` Meelis Roos
  2012-02-29  6:27                                             ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Meelis Roos @ 2012-02-29  6:15 UTC (permalink / raw)
  To: David Miller; +Cc: sam, tj, grant.likely, rob.herring, sparclinux, linux-kernel

> > Tried it, no obvious results in dmesg, except the crash is in a slightly 
> > different location.
> 
> Interesting, the corruption is a little bit different this time, yet similar
> to the ones we saw previously:
> 
> > [    0.000000] TPC: <strcmp+0x8/0x60>
>  ...
> > [    0.000000] i0: 000000007fcf3c80 i1: fffff8007fcec480 i2: 0000000001010101 i3: 0000000080808080
> > [    0.000000] i4: fffff8007fcb8ccd i5: 0000000000028337 i6: 0000000000763231 i7: 0000000000606250
> 
> This is strcmp(0x000000007fcf3c80, 0xfffff8007fcec480), the first arg is
> a bad pointer, somehow the top virtual address bits have been zero'd out.
> 
> It comes from dp->full_name, so something walked all over the beginning
> of a device_node object.
> 
> Let's see if we can figure out anything else about the nature of the
> corruption, please add this patch on top.

Here it is - triggers this time:

[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 3.2.30 2002/10/25 14:03'
[    0.000000] PROMLIB: Root node compatible: 
[    0.000000] Linux version 3.2.0-rc3-00076-g7bd0b0f-dirty (mroos@korvits) (gcc version 4.6.2 (Debian 4.6.2-14) ) #85 SMP Wed Feb 29 08:06:38 EET 2012
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] bootconsole [earlyprom0] enabled
[    0.000000] ARCH: SUN4U
[    0.000000] Ethernet address: 08:00:20:b6:ee:e2
[    0.000000] Kernel: Using 4 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel... done.
[    0.000000] OF BUG: Bogus full_name pointer [0000000000730e08]
[    0.000000] OF BUG: np[fffff8007fcf3f40] np->name[fffff8007fcf3ec0] np->type[0000000000756bf8] np->phandle[0xf0029c88]
[    0.000000] OF BUG: np->name(SUNW,Ultra-Enterprise) np->type(<NULL>)
[    0.000000] OF BUG: Bogus full_name pointer [0000000000730e08]
[    0.000000] OF BUG: np[fffff8007fcf3f40] np->name[fffff8007fcf3ec0] np->type[0000000000756bf8] np->phandle[0xf0029c88]
[    0.000000] OF BUG: np->name(SUNW,Ultra-Enterprise) np->type(<NULL>)
[    0.000000] OF BUG: Bogus full_name pointer [0000000000730e08]
[    0.000000] OF BUG: np[fffff8007fcf3f40] np->name[fffff8007fcf3ec0] np->type[0000000000756bf8] np->phandle[0xf0029c88]
[    0.000000] OF BUG: np->name(SUNW,Ultra-Enterprise) np->type(<NULL>)
[    0.000000] OF BUG: Bogus full_name pointer [000000007fcf3c80]
[    0.000000] OF BUG: np[fffff8007fceacc0] np->name[          (null)] np->type[          (null)] np->phandle[0x00000001]
[    0.000000] OF BUG: np->name((null)) np->type((null))
[    0.000000] Unable to handle kernel paging request at virtual address 000000007fcf2000
[    0.000000] tsk->{mm,active_mm}->context = 0000000000000000
[    0.000000] tsk->{mm,active_mm}->pgd = fffff800007db7d0
[    0.000000]               \|/ ____ \|/
[    0.000000]               "@'/ .. \`@"
[    0.000000]               /_| \__/ |_\
[    0.000000]                  \__U_/
[    0.000000] swapper(0): Oops [#1]
[    0.000000] TSTATE: 0000004480e01600 TPC: 000000000057b4c8 TNPC: 000000000057b4cc Y: 00000037    Not tainted
[    0.000000] TPC: <strcmp+0x8/0x60>
[    0.000000] g0: 000000000077f7f0 g1: 0000000000000000 g2: 0000000000000000 g3: 0000000000787950
[    0.000000] g4: 000000000077f350 g5: 0000000000000000 g6: 0000000000760000 g7: 0000000000000040
[    0.000000] o0: 000000000000003f o1: 0000000000763930 o2: 0000000000000003 o3: 00000000007879e4
[    0.000000] o4: 000000000080ee45 o5: 000000000080ee1b sp: 0000000000763181 ret_pc: 000000000069cad0
[    0.000000] RPC: <printk+0x24/0x38>
[    0.000000] l0: 0000000001028000 l1: fffff8007fcbc380 l2: 8000000000000000 l3: 0800000000000000
[    0.000000] l4: 0000000000000080 l5: 0000000000000002 l6: 0000000000000000 l7: 0020280000000000
[    0.000000] i0: 000000007fcf3c80 i1: fffff8007fcec480 i2: 0000000000000000 i3: 0000000000000000
[    0.000000] i4: 0000000000000001 i5: 0000000000028337 i6: 0000000000763231 i7: 0000000000606278
[    0.000000] I7: <of_find_node_by_path+0x58/0xe0>
[    0.000000] Call Trace:
[    0.000000]  [0000000000606278] of_find_node_by_path+0x58/0xe0
[    0.000000]  [0000000000606e6c] of_alias_scan+0xcc/0x1c0
[    0.000000]  [00000000007c328c] of_pdt_build_devicetree+0x90/0xa0
[    0.000000]  [00000000007b0680] prom_build_devicetree+0x10/0x3c
[    0.000000]  [00000000007b4614] paging_init+0x59c/0x6bc
[    0.000000]  [00000000007afffc] setup_arch+0xf8/0x110
[    0.000000]  [00000000007ae514] start_kernel+0x84/0x32c
[    0.000000]  [0000000000691928] tlb_fixup_done+0xa0/0xa8
[    0.000000]  [0000000000000000]           (null)
[    0.000000] Disabling lock debugging due to kernel taint
[    0.000000] Caller[0000000000606278]: of_find_node_by_path+0x58/0xe0
[    0.000000] Caller[0000000000606e6c]: of_alias_scan+0xcc/0x1c0
[    0.000000] Caller[00000000007c328c]: of_pdt_build_devicetree+0x90/0xa0
[    0.000000] Caller[00000000007b0680]: prom_build_devicetree+0x10/0x3c
[    0.000000] Caller[00000000007b4614]: paging_init+0x59c/0x6bc
[    0.000000] Caller[00000000007afffc]: setup_arch+0xf8/0x110
[    0.000000] Caller[00000000007ae514]: start_kernel+0x84/0x32c
[    0.000000] Caller[0000000000691928]: tlb_fixup_done+0xa0/0xa8
[    0.000000] Caller[0000000000000000]:           (null)
[    0.000000] Instruction DUMP: 01000000  9de3bf50  82102000 <c40e0001> c60e4001  80a08003  12400008  82006001  80a0a000 
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Call Trace:
[    0.000000]  [000000000069c85c] panic+0x68/0x1e4
[    0.000000]  [0000000000461a30] do_exit+0x230/0x2c0
[    0.000000]  [00000000004292c0] die_if_kernel+0x180/0x260
[    0.000000]  [000000000069c284] unhandled_fault+0x8c/0x98
[    0.000000]  [0000000000445778] do_kernel_fault+0xd8/0x100
[    0.000000]  [000000000044584c] do_sparc64_fault+0xac/0x540
[    0.000000]  [0000000000407948] sparc64_realfault_common+0x10/0x20
[    0.000000]  [000000000057b4c8] strcmp+0x8/0x60
[    0.000000]  [0000000000606278] of_find_node_by_path+0x58/0xe0
[    0.000000]  [0000000000606e6c] of_alias_scan+0xcc/0x1c0
[    0.000000]  [00000000007c328c] of_pdt_build_devicetree+0x90/0xa0
[    0.000000]  [00000000007b0680] prom_build_devicetree+0x10/0x3c
[    0.000000]  [00000000007b4614] paging_init+0x59c/0x6bc
[    0.000000]  [00000000007afffc] setup_arch+0xf8/0x110
[    0.000000]  [00000000007ae514] start_kernel+0x84/0x32c
[    0.000000]  [0000000000691928] tlb_fixup_done+0xa0/0xa8
[    0.000000] Press Stop-A (L1-A) to return to the boot prom

-- 
Meelis Roos (mroos@linux.ee)


* Re: OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88
  2012-02-29  6:15                                           ` Meelis Roos
@ 2012-02-29  6:27                                             ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2012-02-29  6:27 UTC (permalink / raw)
  To: mroos; +Cc: sam, tj, grant.likely, rob.herring, sparclinux, linux-kernel

From: Meelis Roos <mroos@linux.ee>
Date: Wed, 29 Feb 2012 08:15:06 +0200 (EET)

> Here it is - triggers this time:

Thanks a lot.

I need to add some more diagnostics to further narrow it down,
I'll give you a patch for that when I get a chance.


* [tip:core/urgent] memblock: Fix size aligning of memblock_alloc_base_nid()
  2012-02-13  7:45 ` OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88 Meelis Roos
  2012-02-13  8:06   ` Grant Likely
@ 2012-03-01 12:24   ` tip-bot for Tejun Heo
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot for Tejun Heo @ 2012-03-01 12:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, grant.likely, hpa, mingo, torvalds, davem,
	rob.herring, akpm, tj, mroos, tglx, mingo

Commit-ID:  847854f5988a04fe7e02d2fdd4fa0df9f96360fe
Gitweb:     http://git.kernel.org/tip/847854f5988a04fe7e02d2fdd4fa0df9f96360fe
Author:     Tejun Heo <tj@kernel.org>
AuthorDate: Wed, 29 Feb 2012 05:56:21 +0900
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 1 Mar 2012 10:53:18 +0100

memblock: Fix size aligning of memblock_alloc_base_nid()

memblock allocator aligns @size to @align to reduce the amount
of fragmentation.  Commit:

 7bd0b0f0da ("memblock: Reimplement memblock allocation using reverse free area iterator")

Broke it by incorrectly relocating @size aligning to
memblock_find_in_range_node().  As the aligned size is not
propagated back to memblock_alloc_base_nid(), the actually
reserved size isn't aligned.

While this increases memory use for memblock reserved array,
this shouldn't cause any critical failure; however, it seems
that the size aligning was hiding a use-beyond-allocation bug in
sparc64 and losing the aligning causes boot failure.

The underlying problem is still being debugged, but this is a
proper fix in itself: it is already late in the -rc cycle for
boot failures, and reverting the change for debugging isn't
difficult.  Restore the size aligning by moving it to
memblock_alloc_base_nid().

Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120228205621.GC3252@dhcp-172-17-108-109.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <alpine.SOC.1.00.1202130942030.1488@math.ut.ee>
---
 mm/memblock.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 77b5f22..99f2855 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -99,9 +99,6 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 	phys_addr_t this_start, this_end, cand;
 	u64 i;
 
-	/* align @size to avoid excessive fragmentation on reserved array */
-	size = round_up(size, align);
-
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
@@ -731,6 +728,9 @@ static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
 {
 	phys_addr_t found;
 
+	/* align @size to avoid excessive fragmentation on reserved array */
+	size = round_up(size, align);
+
 	found = memblock_find_in_range_node(0, max_addr, size, align, nid);
 	if (found && !memblock_reserve(found, size))
 		return found;
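
The effect described in the changelog can be reproduced in a small
user-space sketch. This is not kernel code: every toy_* name and
round_up_pow2() are hypothetical stand-ins invented for illustration,
and the base address 0x1000 is arbitrary. It only shows that rounding
@size inside the finder changes a local copy, so the caller still
reserves the unrounded size, while rounding in the caller affects both
the search and the reservation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical stand-in for the kernel's round_up(); power-of-two align only. */
static phys_addr_t round_up_pow2(phys_addr_t x, phys_addr_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Hypothetical stand-in for memblock_reserve(): remembers the last size. */
static phys_addr_t toy_reserved_size;

static void toy_reserve(phys_addr_t base, phys_addr_t size)
{
	(void)base;
	toy_reserved_size = size;
}

/* Finder as it was before the fix: rounds its own by-value copy of size. */
static phys_addr_t toy_find_rounding(phys_addr_t size, phys_addr_t align)
{
	size = round_up_pow2(size, align);	/* never seen by the caller */
	(void)size;
	return 0x1000;				/* pretend a free area was found */
}

/* Finder as it is after the fix: uses size exactly as passed in. */
static phys_addr_t toy_find(phys_addr_t size, phys_addr_t align)
{
	(void)size;
	(void)align;
	return 0x1000;
}

/* Before the fix: the reservation uses the caller's unrounded size. */
static phys_addr_t toy_alloc_before(phys_addr_t size, phys_addr_t align)
{
	phys_addr_t found = toy_find_rounding(size, align);

	toy_reserve(found, size);
	return toy_reserved_size;
}

/* After the fix: size is rounded once, before both find and reserve. */
static phys_addr_t toy_alloc_after(phys_addr_t size, phys_addr_t align)
{
	phys_addr_t found;

	size = round_up_pow2(size, align);
	found = toy_find(size, align);
	toy_reserve(found, size);
	return toy_reserved_size;
}
```

With a request of 100 bytes at 64-byte alignment, toy_alloc_before()
records a 100-byte reservation while toy_alloc_after() records 128
bytes, which mirrors the difference the patch restores.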


Thread overview: 46+ messages
2012-02-28 20:56 [PATCH v3.3-rc5] memblock: Fix size aligning of memblock_alloc_base_nid() Tejun Heo
2012-02-13  7:45 ` OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88 Meelis Roos
2012-02-13  8:06   ` Grant Likely
2012-02-13  9:20     ` Meelis Roos
2012-02-13 21:46       ` Grant Likely
2012-02-14  0:58         ` David Miller
2012-02-14  2:30           ` Grant Likely
2012-02-14  2:41             ` Grant Likely
2012-02-16 21:08             ` mroos
2012-02-14  5:54           ` mroos
2012-02-16 19:53         ` Meelis Roos
2012-02-16 21:23           ` Sam Ravnborg
2012-02-20  9:11           ` Meelis Roos
2012-02-20 17:06             ` Tejun Heo
2012-02-20 20:04               ` Meelis Roos
2012-02-20 21:01                 ` Tejun Heo
2012-02-20 22:32               ` Meelis Roos
2012-02-21  1:05                 ` Tejun Heo
2012-02-22  0:36                   ` Meelis Roos
2012-02-22 17:48                     ` Tejun Heo
2012-02-22 18:25                       ` Meelis Roos
2012-02-23 18:55                         ` Tejun Heo
2012-02-23 23:31                           ` David Miller
2012-02-24  9:20                           ` Meelis Roos
2012-02-27 17:17                             ` Meelis Roos
2012-02-27 19:43                               ` Sam Ravnborg
2012-02-27 21:25                                 ` Meelis Roos
2012-02-27 21:30                                   ` David Miller
2012-02-28 21:10                                     ` David Miller
2012-02-28 21:36                                       ` Meelis Roos
2012-02-28 22:56                                         ` David Miller
2012-02-29  6:15                                           ` Meelis Roos
2012-02-29  6:27                                             ` David Miller
2012-02-22 20:44                       ` David Miller
2012-02-22 21:00                         ` Tejun Heo
2012-02-22 18:22                     ` Richard Mortimer
2012-02-22 20:26                       ` David Miller
2012-02-22 17:03                   ` Sam Ravnborg
2012-02-22 17:12                     ` Meelis Roos
2012-02-22 17:21                       ` Sam Ravnborg
2012-02-22 17:41                         ` Meelis Roos
2012-02-13  9:50     ` Meelis Roos
2012-02-13  9:51       ` Meelis Roos
2012-02-13 10:35       ` Meelis Roos
2012-03-01 12:24   ` [tip:core/urgent] memblock: Fix size aligning of memblock_alloc_base_nid() tip-bot for Tejun Heo
2012-02-28 22:16 ` [PATCH v3.3-rc5] " Sam Ravnborg
