* [next][PowerPC] RCU stalls while booting linux-next on PowerVM LPAR
@ 2019-06-24 14:09 Sachin Sant
2019-06-24 14:42 ` David Hildenbrand
0 siblings, 1 reply; 3+ messages in thread
From: Sachin Sant @ 2019-06-24 14:09 UTC (permalink / raw)
To: linuxppc-dev, david, Stephen Rothwell; +Cc: linux-next, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 3443 bytes --]
Latest -next fails to boot on POWER9 PowerVM LPAR due to RCU stalls.
This problem was introduced with next-20190620 (dc636f5d78).
next-20190619 was the last good kernel.
Reverting the following commit allows the kernel to boot.
2fd4aeea6b603 : mm/memory_hotplug: move and simplify walk_memory_blocks()
[ 0.014409] Using shared cache scheduler topology
[ 0.016302] devtmpfs: initialized
[ 0.031022] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.031034] futex hash table entries: 16384 (order: 5, 2097152 bytes, linear)
[ 0.031575] NET: Registered protocol family 16
[ 0.031724] audit: initializing netlink subsys (disabled)
[ 0.031796] audit: type=2000 audit(1561344029.030:1): state=initialized audit_enabled=0 res=1
[ 0.032249] cpuidle: using governor menu
[ 0.032403] pstore: Registered nvram as persistent store backend
[ 60.061246] rcu: INFO: rcu_sched self-detected stall on CPU
[ 60.061254] rcu: 0-....: (5999 ticks this GP) idle=1ea/1/0x4000000000000002 softirq=5/5 fqs=2999
[ 60.061261] (t=6000 jiffies g=-1187 q=0)
[ 60.061265] NMI backtrace for cpu 0
[ 60.061269] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-20190621-autotest-autotest #1
[ 60.061275] Call Trace:
[ 60.061280] [c0000018ee85f380] [c000000000b624ec] dump_stack+0xb0/0xf4 (unreliable)
[ 60.061287] [c0000018ee85f3c0] [c000000000b6d464] nmi_cpu_backtrace+0x144/0x150
[ 60.061293] [c0000018ee85f450] [c000000000b6d61c] nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
[ 60.061300] [c0000018ee85f4f0] [c0000000000692c8] arch_trigger_cpumask_backtrace+0x28/0x40
[ 60.061306] [c0000018ee85f510] [c0000000001c5f90] rcu_dump_cpu_stacks+0x10c/0x16c
[ 60.061313] [c0000018ee85f560] [c0000000001c4fe4] rcu_sched_clock_irq+0x744/0x990
[ 60.061318] [c0000018ee85f630] [c0000000001d5b58] update_process_times+0x48/0x90
[ 60.061325] [c0000018ee85f660] [c0000000001ea03c] tick_periodic+0x4c/0x120
[ 60.061330] [c0000018ee85f690] [c0000000001ea150] tick_handle_periodic+0x40/0xe0
[ 60.061336] [c0000018ee85f6d0] [c00000000002b5cc] timer_interrupt+0x10c/0x2e0
[ 60.061342] [c0000018ee85f730] [c000000000009204] decrementer_common+0x134/0x140
[ 60.061350] --- interrupt: 901 at replay_interrupt_return+0x0/0x4
[ 60.061350] LR = arch_local_irq_restore+0x84/0x90
[ 60.061357] [c0000018ee85fa30] [c0000018ee85fbac] 0xc0000018ee85fbac (unreliable)
[ 60.061364] [c0000018ee85fa50] [c000000000b88300] _raw_spin_unlock_irqrestore+0x50/0x80
[ 60.061369] [c0000018ee85fa70] [c000000000b69da4] klist_next+0xb4/0x150
[ 60.061376] [c0000018ee85fac0] [c000000000766ea0] subsys_find_device_by_id+0xf0/0x1a0
[ 60.061382] [c0000018ee85fb20] [c000000000797a94] walk_memory_blocks+0x84/0x100
[ 60.061388] [c0000018ee85fb80] [c000000000795ea0] link_mem_sections+0x40/0x60
[ 60.061395] [c0000018ee85fbb0] [c000000000f28c28] topology_init+0xa0/0x268
[ 60.061400] [c0000018ee85fc10] [c000000000010448] do_one_initcall+0x68/0x2c0
[ 60.061406] [c0000018ee85fce0] [c000000000f247dc] kernel_init_freeable+0x318/0x47c
[ 60.061411] [c0000018ee85fdb0] [c0000000000107c4] kernel_init+0x24/0x150
[ 60.061417] [c0000018ee85fe20] [c00000000000ba54] ret_from_kernel_thread+0x5c/0x68
[ 88.016563] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 88.016569] Modules linked in:
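To make the stuck code path easier to see, the frames printed after the "--- interrupt:" marker (i.e. what CPU 0 was running when the timer interrupt fired) can be pulled out of the trace mechanically. This is only a minimal sketch: the excerpt is copied from the log above with the printk timestamps dropped, and the names FRAME and interrupted_path are made up for this illustration, not part of any kernel tooling.

```python
import re

# A few frames copied from the report above (printk timestamps dropped).
TRACE = """\
[c0000018ee85f6d0] [c00000000002b5cc] timer_interrupt+0x10c/0x2e0
[c0000018ee85f730] [c000000000009204] decrementer_common+0x134/0x140
--- interrupt: 901 at replay_interrupt_return+0x0/0x4
[c0000018ee85fa30] [c0000018ee85fbac] 0xc0000018ee85fbac (unreliable)
[c0000018ee85fa50] [c000000000b88300] _raw_spin_unlock_irqrestore+0x50/0x80
[c0000018ee85fa70] [c000000000b69da4] klist_next+0xb4/0x150
[c0000018ee85fac0] [c000000000766ea0] subsys_find_device_by_id+0xf0/0x1a0
[c0000018ee85fb20] [c000000000797a94] walk_memory_blocks+0x84/0x100
[c0000018ee85fb80] [c000000000795ea0] link_mem_sections+0x40/0x60
[c0000018ee85fbb0] [c000000000f28c28] topology_init+0xa0/0x268
"""

# Hypothetical helper pattern: "[stack] [address] symbol+0xoffset/0xsize".
FRAME = re.compile(
    r"\[[0-9a-f]+\] \[[0-9a-f]+\] ([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def interrupted_path(trace):
    """Return the symbol names of the frames listed after the interrupt marker."""
    names = []
    seen_marker = False
    for line in trace.splitlines():
        if line.lstrip().startswith("--- interrupt:"):
            seen_marker = True
            continue
        if not seen_marker:
            continue  # still inside the interrupt handler's own frames
        match = FRAME.search(line)
        if match:  # frames without a resolved symbol are skipped
            names.append(match.group(1))
    return names


path = interrupted_path(TRACE)
print(" <- ".join(path))
```

Frames are listed innermost first, so the result reads from klist_next() outward through subsys_find_device_by_id() and walk_memory_blocks() to topology_init(), which is consistent with the commit named above.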
Thanks
-Sachin
[-- Attachment #2: boot.log --]
[-- Type: application/octet-stream, Size: 11416 bytes --]
[ 0.000000] hash-mmu: Page sizes from device-tree:
[ 0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
[ 0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
[ 0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
[ 0.000000] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
[ 0.000000] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
[ 0.000000] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
[ 0.000000] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
[ 0.000000] Using 1TB segments
[ 0.000000] hash-mmu: Initializing hash mmu with SLB
[ 0.000000] Linux version 5.2.0-rc5-next-20190620-autotest (root@ltczep10-lp3.aus.stglabs.ibm.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)) #1 SMP Mon Jun 24 06:38:12 PDT 2019
[ 0.000000] Found initrd at 0xc0000000032b0000:0xc0000000044c4efc
[ 0.000000] Using pSeries machine description
[ 0.000000] printk: bootconsole [udbg0] enabled
[ 0.000000] Partition configured for 40 cpus.
[ 0.000000] CPU maps initialized for 8 threads per core
[ 0.000000] -----------------------------------------------------
[ 0.000000] phys_mem_size = 0x1900000000
[ 0.000000] dcache_bsize = 0x80
[ 0.000000] icache_bsize = 0x80
[ 0.000000] cpu_features = 0x0000c07f8f5f91a7
[ 0.000000] possible = 0x0000fbffcf5fb1a7
[ 0.000000] always = 0x00000003800081a1
[ 0.000000] cpu_user_features = 0xdc0065c2 0xefe00000
[ 0.000000] mmu_features = 0x7c006001
[ 0.000000] firmware_features = 0x00000013c45bfc57
[ 0.000000] hash-mmu: ppc64_pft_size = 0x1e
[ 0.000000] hash-mmu: htab_hash_mask = 0x7fffff
[ 0.000000] hash-mmu: kernel vmalloc start = 0xc008000000000000
[ 0.000000] hash-mmu: kernel IO start = 0xc00a000000000000
[ 0.000000] hash-mmu: kernel vmemmap start = 0xc00c000000000000
[ 0.000000] -----------------------------------------------------
[ 0.000000] numa: NODE_DATA [mem 0x18ffe98980-0x18ffe9ffff]
[ 0.000000] numa: NODE_DATA(0) on node 3
[ 0.000000] numa: NODE_DATA [mem 0x18ffe91300-0x18ffe9897f]
[ 0.000000] rfi-flush: fallback displacement flush available
[ 0.000000] rfi-flush: mttrig type flush available
[ 0.000000] count-cache-flush: full software flush sequence enabled.
[ 0.000000] stf-barrier: eieio barrier available
[ 0.000000] PPC64 nvram contains 15360 bytes
[ 0.000000] barrier-nospec: using ORI speculation barrier
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x00000018ffffffff]
[ 0.000000] Device empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 3: [mem 0x0000000000000000-0x00000018ffffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]
[ 0.000000] Initmem setup node 3 [mem 0x0000000000000000-0x00000018ffffffff]
[ 0.000000] percpu: Embedded 11 pages/cpu s631960 r0 d88936 u1048576
[ 0.000000] node[0] zonelist: 3:Normal
[ 0.000000] node[3] zonelist: 3:Normal
[ 0.000000] Built 2 zonelists, mobility grouping on. Total pages: 1636800
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: root=UUID=5b503143-3264-4a73-bd1b-2b1cd7d250f6
[ 0.000000] Dentry cache hash table entries: 8388608 (order: 10, 67108864 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 4194304 (order: 9, 33554432 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 104547200K/104857600K available (11840K kernel code, 1792K rwdata, 3584K rodata, 4992K init, 3850K bss, 310400K reserved, 0K cma-reserved)
[ 0.000000] random: get_random_u64 called from cache_random_seq_create+0xa4/0x1c0 with crng_init=0
[ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=40, Nodes=32
[ 0.000000] ftrace: allocating 31746 entries in 12 pages
[ 0.000000] rcu: Hierarchical RCU implementation.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=2048 to nr_cpu_ids=40.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=40
[ 0.000000] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
[ 0.000000] rcu: Offload RCU callbacks from CPUs: (none).
[ 0.000002] time_init: 56 bit decrementer (max: 7fffffffffffff)
[ 0.000067] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x761537d007, max_idle_ns: 440795202126 ns
[ 0.000178] clocksource: timebase mult[1f40000] shift[24] registered
[ 0.000312] Console: colour dummy device 80x25
[ 0.000363] printk: console [hvc0] enabled
[ 0.000363] printk: console [hvc0] enabled
[ 0.000411] printk: bootconsole [udbg0] disabled
[ 0.000411] printk: bootconsole [udbg0] disabled
[ 0.000508] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[ 0.000521] pid_max: default: 40960 minimum: 320
[ 0.000673] LSM: Security Framework initializing
[ 0.000759] Yama: becoming mindful.
[ 0.000781] SELinux: Initializing.
[ 0.000944] *** VALIDATE SELinux ***
[ 0.001035] Mount-cache hash table entries: 131072 (order: 4, 1048576 bytes, linear)
[ 0.001073] Mountpoint-cache hash table entries: 131072 (order: 4, 1048576 bytes, linear)
[ 0.001370] *** VALIDATE proc ***
[ 0.001623] *** VALIDATE cgroup1 ***
[ 0.001629] *** VALIDATE cgroup2 ***
[ 0.002225] EEH: pSeries platform initialized
[ 0.002233] POWER9 performance monitor hardware support registered
[ 0.002272] rcu: Hierarchical SRCU implementation.
[ 0.003078] smp: Bringing up secondary CPUs ...
[ 0.014555] smp: Brought up 2 nodes, 40 CPUs
[ 0.014564] numa: Node 0 CPUs:
[ 0.014567] numa: Node 3 CPUs: 0-39
[ 0.014571] Using small cores at SMT level
[ 0.014575] Using shared cache scheduler topology
[ 0.016465] devtmpfs: initialized
[ 0.031046] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.031058] futex hash table entries: 16384 (order: 5, 2097152 bytes, linear)
[ 0.031602] NET: Registered protocol family 16
[ 0.031751] audit: initializing netlink subsys (disabled)
[ 0.031822] audit: type=2000 audit(1561384091.030:1): state=initialized audit_enabled=0 res=1
[ 0.032275] cpuidle: using governor menu
[ 0.032429] pstore: Registered nvram as persistent store backend
[ 60.061325] rcu: INFO: rcu_sched self-detected stall on CPU
[ 60.061333] rcu: 0-....: (5999 ticks this GP) idle=1ca/1/0x4000000000000002 softirq=5/5 fqs=2999
[ 60.061340] (t=6000 jiffies g=-1187 q=0)
[ 60.061343] NMI backtrace for cpu 0
[ 60.061349] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-20190620-autotest #1
[ 60.061354] Call Trace:
[ 60.061358] [c0000018ee813360] [c000000000b625cc] dump_stack+0xb0/0xf4 (unreliable)
[ 60.061365] [c0000018ee8133a0] [c000000000b6d544] nmi_cpu_backtrace+0x144/0x150
[ 60.061371] [c0000018ee813430] [c000000000b6d6fc] nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
[ 60.061378] [c0000018ee8134d0] [c0000000000692c8] arch_trigger_cpumask_backtrace+0x28/0x40
[ 60.061385] [c0000018ee8134f0] [c0000000001c6210] rcu_dump_cpu_stacks+0x10c/0x16c
[ 60.061391] [c0000018ee813540] [c0000000001c5264] rcu_sched_clock_irq+0x744/0x990
[ 60.061397] [c0000018ee813610] [c0000000001d5e38] update_process_times+0x48/0x90
[ 60.061403] [c0000018ee813640] [c0000000001ea31c] tick_periodic+0x4c/0x120
[ 60.061408] [c0000018ee813670] [c0000000001ea430] tick_handle_periodic+0x40/0xe0
[ 60.061414] [c0000018ee8136b0] [c00000000002b5cc] timer_interrupt+0x10c/0x2e0
[ 60.061420] [c0000018ee813710] [c000000000009204] decrementer_common+0x134/0x140
[ 60.061427] --- interrupt: 901 at replay_interrupt_return+0x0/0x4
[ 60.061427] LR = arch_local_irq_restore+0x84/0x90
[ 60.061434] [c0000018ee813a10] [c0000018ee813bac] 0xc0000018ee813bac (unreliable)
[ 60.061441] [c0000018ee813a30] [c000000000b883e0] _raw_spin_unlock_irqrestore+0x50/0x80
[ 60.061447] [c0000018ee813a50] [c000000000b69e84] klist_next+0xb4/0x150
[ 60.061453] [c0000018ee813aa0] [c000000000766f60] subsys_find_device_by_id+0xf0/0x1a0
[ 60.061459] [c0000018ee813b00] [c000000000796464] find_memory_block_by_id+0x94/0xb0
[ 60.061466] [c0000018ee813b30] [c000000000797b1c] walk_memory_blocks+0x7c/0xf0
[ 60.061472] [c0000018ee813b80] [c000000000795f60] link_mem_sections+0x40/0x60
[ 60.061478] [c0000018ee813bb0] [c000000000f28c28] topology_init+0xa0/0x268
[ 60.061483] [c0000018ee813c10] [c000000000010448] do_one_initcall+0x68/0x2c0
[ 60.061489] [c0000018ee813ce0] [c000000000f247dc] kernel_init_freeable+0x318/0x47c
[ 60.061495] [c0000018ee813db0] [c0000000000107c4] kernel_init+0x24/0x150
[ 60.061500] [c0000018ee813e20] [c00000000000ba54] ret_from_kernel_thread+0x5c/0x68
[ 88.016703] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 88.016709] Modules linked in:
[ 88.016713] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-20190620-autotest #1
[ 88.016718] NIP: c00000000000ab8c LR: c00000000001b434 CTR: 0000000000000000
[ 88.016724] REGS: c0000018ee813780 TRAP: 0901 Not tainted (5.2.0-rc5-next-20190620-autotest)
[ 88.016730] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 44008484 XER: 00000005
[ 88.016738] CFAR: c000000000b69e30 IRQMASK: 0
[ 88.016738] GPR00: c000000000b883e0 c0000018ee813a10 c000000001544d00 0000000000000900
[ 88.016738] GPR04: 0000000000000000 0000000000000000 0000000000000001 0000000000000004
[ 88.016738] GPR08: 0000000000000000 c0000018eb436d80 0000000000000001 0000000000000220
[ 88.016738] GPR12: 0000000000000000 c000000001990000
[ 88.016761] NIP [c00000000000ab8c] replay_interrupt_return+0x0/0x4
[ 88.016767] LR [c00000000001b434] arch_local_irq_restore+0x84/0x90
[ 88.016771] Call Trace:
[ 88.016774] [c0000018ee813a10] [c0000018ee813bac] 0xc0000018ee813bac (unreliable)
[ 88.016780] [c0000018ee813a30] [c000000000b883e0] _raw_spin_unlock_irqrestore+0x50/0x80
[ 88.016786] [c0000018ee813a50] [c000000000b69e84] klist_next+0xb4/0x150
[ 88.016792] [c0000018ee813aa0] [c000000000766f60] subsys_find_device_by_id+0xf0/0x1a0
[ 88.016798] [c0000018ee813b00] [c000000000796464] find_memory_block_by_id+0x94/0xb0
[ 88.016804] [c0000018ee813b30] [c000000000797b1c] walk_memory_blocks+0x7c/0xf0
[ 88.016810] [c0000018ee813b80] [c000000000795f60] link_mem_sections+0x40/0x60
[ 88.016816] [c0000018ee813bb0] [c000000000f28c28] topology_init+0xa0/0x268
[ 88.016821] [c0000018ee813c10] [c000000000010448] do_one_initcall+0x68/0x2c0
[ 88.016827] [c0000018ee813ce0] [c000000000f247dc] kernel_init_freeable+0x318/0x47c
[ 88.016833] [c0000018ee813db0] [c0000000000107c4] kernel_init+0x24/0x150
[ 88.016838] [c0000018ee813e20] [c00000000000ba54] ret_from_kernel_thread+0x5c/0x68
[ 88.016843] Instruction dump:
[ 88.016847] 7d200026 618c8000 2c030900 4182e568 2c030500 4182e010 2c030f00 4182f2d8
[ 88.016854] 2c030a00 4182ffb8 60000000 60000000 <4e800020> 7c781b78 480003e1 480003f9
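As a rough cross-check of the timing in boot.log, the stall report's jiffies count can be compared with the printk timestamp it was logged at. The sketch below assumes the RCU grace period started very close to boot (time zero), so the resulting tick rate is only an estimate and is not read from the kernel configuration. The variable and function names are illustrative.

```python
import re

# Lines copied from the attached boot.log.
STALL_AT = "[ 60.061325] rcu: INFO: rcu_sched self-detected stall on CPU"
STALL_T = "[ 60.061340] (t=6000 jiffies g=-1187 q=0)"
LOCKUP = "[ 88.016703] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]"


def timestamp(line):
    """Return the printk timestamp (seconds since boot) at the start of a log line."""
    return float(re.match(r"\[\s*(\d+\.\d+)\]", line).group(1))


# Jiffies elapsed in the stalled grace period, as reported by RCU.
jiffies = int(re.search(r"t=(\d+) jiffies", STALL_T).group(1))

# Assumption: the grace period began near t=0, so jiffies / seconds ~ tick rate.
approx_hz = round(jiffies / timestamp(STALL_T))

# Seconds between the RCU stall report and the soft-lockup report.
gap = timestamp(LOCKUP) - timestamp(STALL_AT)

print(f"approx. tick rate: {approx_hz} jiffies/s")
print(f"stall report to soft-lockup report: {gap:.1f} s")
```

Under that assumption the numbers suggest about 100 jiffies per second, so the 6000-jiffy stall corresponds to roughly the first 60 seconds of boot spent in the same initcall.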
* Re: [next][PowerPC] RCU stalls while booting linux-next on PowerVM LPAR
2019-06-24 14:09 [next][PowerPC] RCU stalls while booting linux-next on PowerVM LPAR Sachin Sant
@ 2019-06-24 14:42 ` David Hildenbrand
2019-06-24 17:09 ` Sachin Sant
0 siblings, 1 reply; 3+ messages in thread
From: David Hildenbrand @ 2019-06-24 14:42 UTC (permalink / raw)
To: Sachin Sant, linuxppc-dev, Stephen Rothwell; +Cc: linux-next, linux-kernel
On 24.06.19 16:09, Sachin Sant wrote:
> Latest -next fails to boot on POWER9 PowerVM LPAR due to RCU stalls.
>
> This problem was introduced with next-20190620 (dc636f5d78).
> next-20190619 was the last good kernel.
>
> Reverting the following commit allows the kernel to boot.
> 2fd4aeea6b603 : mm/memory_hotplug: move and simplify walk_memory_blocks()
>
Hi, thanks! Please see
https://lkml.org/lkml/2019/6/21/600
and especially
https://lkml.org/lkml/2019/6/21/908
Does this fix your problem? The fix is on its way to next.
Cheers!
--
Thanks,
David / dhildenb
* Re: [next][PowerPC] RCU stalls while booting linux-next on PowerVM LPAR
2019-06-24 14:42 ` David Hildenbrand
@ 2019-06-24 17:09 ` Sachin Sant
0 siblings, 0 replies; 3+ messages in thread
From: Sachin Sant @ 2019-06-24 17:09 UTC (permalink / raw)
To: David Hildenbrand
Cc: Stephen Rothwell, linux-next, linuxppc-dev, linux-kernel
> On 24-Jun-2019, at 8:12 PM, David Hildenbrand <david@redhat.com> wrote:
>
> On 24.06.19 16:09, Sachin Sant wrote:
>> Latest -next fails to boot on POWER9 PowerVM LPAR due to RCU stalls.
>>
>> This problem was introduced with next-20190620 (dc636f5d78).
>> next-20190619 was the last good kernel.
>>
>> Reverting the following commit allows the kernel to boot.
>> 2fd4aeea6b603 : mm/memory_hotplug: move and simplify walk_memory_blocks()
>>
>
> Hi, thanks! Please see
>
> https://lkml.org/lkml/2019/6/21/600
>
> and especially
>
> https://lkml.org/lkml/2019/6/21/908
>
> Does this fix your problem? The fix is on its way to next.
Yes, this patch fixes the problem for me.
Thanks
-Sachin