linuxppc-dev.lists.ozlabs.org archive mirror
* [next][PowerPC] RCU stalls while booting linux-next on PowerVM LPAR
@ 2019-06-24 14:09 Sachin Sant
  2019-06-24 14:42 ` David Hildenbrand
  0 siblings, 1 reply; 3+ messages in thread
From: Sachin Sant @ 2019-06-24 14:09 UTC (permalink / raw)
  To: linuxppc-dev, david, Stephen Rothwell; +Cc: linux-next, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3443 bytes --]

Latest -next fails to boot on POWER9 PowerVM LPAR due to RCU stalls.

This problem was introduced with next-20190620 (dc636f5d78).
next-20190619 was the last good kernel.

Reverting the following commit allows the kernel to boot.
2fd4aeea6b603 : mm/memory_hotplug: move and simplify walk_memory_blocks()


[    0.014409] Using shared cache scheduler topology
[    0.016302] devtmpfs: initialized
[    0.031022] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.031034] futex hash table entries: 16384 (order: 5, 2097152 bytes, linear)
[    0.031575] NET: Registered protocol family 16
[    0.031724] audit: initializing netlink subsys (disabled)
[    0.031796] audit: type=2000 audit(1561344029.030:1): state=initialized audit_enabled=0 res=1
[    0.032249] cpuidle: using governor menu
[    0.032403] pstore: Registered nvram as persistent store backend
[   60.061246] rcu: INFO: rcu_sched self-detected stall on CPU
[   60.061254] rcu: 	0-....: (5999 ticks this GP) idle=1ea/1/0x4000000000000002 softirq=5/5 fqs=2999 
[   60.061261] 	(t=6000 jiffies g=-1187 q=0)
[   60.061265] NMI backtrace for cpu 0
[   60.061269] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-20190621-autotest-autotest #1
[   60.061275] Call Trace:
[   60.061280] [c0000018ee85f380] [c000000000b624ec] dump_stack+0xb0/0xf4 (unreliable)
[   60.061287] [c0000018ee85f3c0] [c000000000b6d464] nmi_cpu_backtrace+0x144/0x150
[   60.061293] [c0000018ee85f450] [c000000000b6d61c] nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
[   60.061300] [c0000018ee85f4f0] [c0000000000692c8] arch_trigger_cpumask_backtrace+0x28/0x40
[   60.061306] [c0000018ee85f510] [c0000000001c5f90] rcu_dump_cpu_stacks+0x10c/0x16c
[   60.061313] [c0000018ee85f560] [c0000000001c4fe4] rcu_sched_clock_irq+0x744/0x990
[   60.061318] [c0000018ee85f630] [c0000000001d5b58] update_process_times+0x48/0x90
[   60.061325] [c0000018ee85f660] [c0000000001ea03c] tick_periodic+0x4c/0x120
[   60.061330] [c0000018ee85f690] [c0000000001ea150] tick_handle_periodic+0x40/0xe0
[   60.061336] [c0000018ee85f6d0] [c00000000002b5cc] timer_interrupt+0x10c/0x2e0
[   60.061342] [c0000018ee85f730] [c000000000009204] decrementer_common+0x134/0x140
[   60.061350] --- interrupt: 901 at replay_interrupt_return+0x0/0x4
[   60.061350]     LR = arch_local_irq_restore+0x84/0x90
[   60.061357] [c0000018ee85fa30] [c0000018ee85fbac] 0xc0000018ee85fbac (unreliable)
[   60.061364] [c0000018ee85fa50] [c000000000b88300] _raw_spin_unlock_irqrestore+0x50/0x80
[   60.061369] [c0000018ee85fa70] [c000000000b69da4] klist_next+0xb4/0x150
[   60.061376] [c0000018ee85fac0] [c000000000766ea0] subsys_find_device_by_id+0xf0/0x1a0
[   60.061382] [c0000018ee85fb20] [c000000000797a94] walk_memory_blocks+0x84/0x100
[   60.061388] [c0000018ee85fb80] [c000000000795ea0] link_mem_sections+0x40/0x60
[   60.061395] [c0000018ee85fbb0] [c000000000f28c28] topology_init+0xa0/0x268
[   60.061400] [c0000018ee85fc10] [c000000000010448] do_one_initcall+0x68/0x2c0
[   60.061406] [c0000018ee85fce0] [c000000000f247dc] kernel_init_freeable+0x318/0x47c
[   60.061411] [c0000018ee85fdb0] [c0000000000107c4] kernel_init+0x24/0x150
[   60.061417] [c0000018ee85fe20] [c00000000000ba54] ret_from_kernel_thread+0x5c/0x68
[   88.016563] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[   88.016569] Modules linked in:
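A hedged sketch of one mechanism consistent with this trace follows. It is an illustration, not the kernel's code or the reverted commit. The stall sits in walk_memory_blocks(), reached from topology_init() via link_mem_sections(). The attached boot log also reports an empty node 0 (`Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]`). Suppose a walker derives its last block id as (start + size - 1) / block_size in unsigned arithmetic. A zero-size range starting at address 0 then wraps to the maximum value, and the loop probes on the order of 2^36 block ids. In this trace, each probe is a linear device-list search (subsys_find_device_by_id() calling klist_next()). The 256 MiB block size and the `sketch_*` names are hypothetical assumptions chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory block size (256 MiB); the real value is
 * platform-dependent and is only an assumption here. */
#define SKETCH_BLOCK_SIZE ((uint64_t)256 << 20)

/* Id of the (hypothetical) memory block containing physical address addr. */
static uint64_t sketch_block_id(uint64_t addr)
{
	return addr / SKETCH_BLOCK_SIZE;
}

/*
 * Count how many block ids a loop of the form
 *     for (id = first; id <= last; id++)
 * would visit for the byte range [start, start + size).
 * With guard == false, size == 0 and start == 0, the expression
 * start + size - 1 wraps to UINT64_MAX, so last becomes huge.
 * With guard == true, an empty range returns before any lookup.
 */
static uint64_t sketch_blocks_visited(uint64_t start, uint64_t size, bool guard)
{
	if (guard && size == 0)
		return 0;

	uint64_t first = sketch_block_id(start);
	uint64_t last = sketch_block_id(start + size - 1);

	/* An empty range that does not wrap leaves last < first: no iterations. */
	if (last < first)
		return 0;
	return last - first + 1;
}
```

Under these assumptions, `sketch_blocks_visited(0, 0, false)` yields 2^36 (about 69 billion) iterations, while the guarded call yields 0. A loop that long, with a list walk per iteration, would easily outlast the roughly 60 seconds after which the stall was reported. Confirming the actual cause still requires the real source and the fix discussed in the replies.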


Thanks
-Sachin



[-- Attachment #2: boot.log --]
[-- Type: application/octet-stream, Size: 11416 bytes --]

[    0.000000] hash-mmu: Page sizes from device-tree:
[    0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
[    0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
[    0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
[    0.000000] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
[    0.000000] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
[    0.000000] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
[    0.000000] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
[    0.000000] Using 1TB segments
[    0.000000] hash-mmu: Initializing hash mmu with SLB
[    0.000000] Linux version 5.2.0-rc5-next-20190620-autotest (root@ltczep10-lp3.aus.stglabs.ibm.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)) #1 SMP Mon Jun 24 06:38:12 PDT 2019
[    0.000000] Found initrd at 0xc0000000032b0000:0xc0000000044c4efc
[    0.000000] Using pSeries machine description
[    0.000000] printk: bootconsole [udbg0] enabled
[    0.000000] Partition configured for 40 cpus.
[    0.000000] CPU maps initialized for 8 threads per core
[    0.000000] -----------------------------------------------------
[    0.000000] phys_mem_size     = 0x1900000000
[    0.000000] dcache_bsize      = 0x80
[    0.000000] icache_bsize      = 0x80
[    0.000000] cpu_features      = 0x0000c07f8f5f91a7
[    0.000000]   possible        = 0x0000fbffcf5fb1a7
[    0.000000]   always          = 0x00000003800081a1
[    0.000000] cpu_user_features = 0xdc0065c2 0xefe00000
[    0.000000] mmu_features      = 0x7c006001
[    0.000000] firmware_features = 0x00000013c45bfc57
[    0.000000] hash-mmu: ppc64_pft_size    = 0x1e
[    0.000000] hash-mmu: htab_hash_mask    = 0x7fffff
[    0.000000] hash-mmu: kernel vmalloc start   = 0xc008000000000000
[    0.000000] hash-mmu: kernel IO start        = 0xc00a000000000000
[    0.000000] hash-mmu: kernel vmemmap start   = 0xc00c000000000000
[    0.000000] -----------------------------------------------------
[    0.000000] numa:   NODE_DATA [mem 0x18ffe98980-0x18ffe9ffff]
[    0.000000] numa:     NODE_DATA(0) on node 3
[    0.000000] numa:   NODE_DATA [mem 0x18ffe91300-0x18ffe9897f]
[    0.000000] rfi-flush: fallback displacement flush available
[    0.000000] rfi-flush: mttrig type flush available
[    0.000000] count-cache-flush: full software flush sequence enabled.
[    0.000000] stf-barrier: eieio barrier available
[    0.000000] PPC64 nvram contains 15360 bytes
[    0.000000] barrier-nospec: using ORI speculation barrier
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x00000018ffffffff]
[    0.000000]   Device   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   3: [mem 0x0000000000000000-0x00000018ffffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]
[    0.000000] Initmem setup node 3 [mem 0x0000000000000000-0x00000018ffffffff]
[    0.000000] percpu: Embedded 11 pages/cpu s631960 r0 d88936 u1048576
[    0.000000] node[0] zonelist: 3:Normal 
[    0.000000] node[3] zonelist: 3:Normal 
[    0.000000] Built 2 zonelists, mobility grouping on.  Total pages: 1636800
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: root=UUID=5b503143-3264-4a73-bd1b-2b1cd7d250f6 
[    0.000000] Dentry cache hash table entries: 8388608 (order: 10, 67108864 bytes, linear)
[    0.000000] Inode-cache hash table entries: 4194304 (order: 9, 33554432 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 104547200K/104857600K available (11840K kernel code, 1792K rwdata, 3584K rodata, 4992K init, 3850K bss, 310400K reserved, 0K cma-reserved)
[    0.000000] random: get_random_u64 called from cache_random_seq_create+0xa4/0x1c0 with crng_init=0
[    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=40, Nodes=32
[    0.000000] ftrace: allocating 31746 entries in 12 pages
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=2048 to nr_cpu_ids=40.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=40
[    0.000000] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
[    0.000000] rcu: 	Offload RCU callbacks from CPUs: (none).
[    0.000002] time_init: 56 bit decrementer (max: 7fffffffffffff)
[    0.000067] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x761537d007, max_idle_ns: 440795202126 ns
[    0.000178] clocksource: timebase mult[1f40000] shift[24] registered
[    0.000312] Console: colour dummy device 80x25
[    0.000363] printk: console [hvc0] enabled
[    0.000363] printk: console [hvc0] enabled
[    0.000411] printk: bootconsole [udbg0] disabled
[    0.000411] printk: bootconsole [udbg0] disabled
[    0.000508] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    0.000521] pid_max: default: 40960 minimum: 320
[    0.000673] LSM: Security Framework initializing
[    0.000759] Yama: becoming mindful.
[    0.000781] SELinux:  Initializing.
[    0.000944] *** VALIDATE SELinux ***
[    0.001035] Mount-cache hash table entries: 131072 (order: 4, 1048576 bytes, linear)
[    0.001073] Mountpoint-cache hash table entries: 131072 (order: 4, 1048576 bytes, linear)
[    0.001370] *** VALIDATE proc ***
[    0.001623] *** VALIDATE cgroup1 ***
[    0.001629] *** VALIDATE cgroup2 ***
[    0.002225] EEH: pSeries platform initialized
[    0.002233] POWER9 performance monitor hardware support registered
[    0.002272] rcu: Hierarchical SRCU implementation.
[    0.003078] smp: Bringing up secondary CPUs ...
[    0.014555] smp: Brought up 2 nodes, 40 CPUs
[    0.014564] numa: Node 0 CPUs:
[    0.014567] numa: Node 3 CPUs: 0-39
[    0.014571] Using small cores at SMT level
[    0.014575] Using shared cache scheduler topology
[    0.016465] devtmpfs: initialized
[    0.031046] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.031058] futex hash table entries: 16384 (order: 5, 2097152 bytes, linear)
[    0.031602] NET: Registered protocol family 16
[    0.031751] audit: initializing netlink subsys (disabled)
[    0.031822] audit: type=2000 audit(1561384091.030:1): state=initialized audit_enabled=0 res=1
[    0.032275] cpuidle: using governor menu
[    0.032429] pstore: Registered nvram as persistent store backend
[   60.061325] rcu: INFO: rcu_sched self-detected stall on CPU
[   60.061333] rcu: 	0-....: (5999 ticks this GP) idle=1ca/1/0x4000000000000002 softirq=5/5 fqs=2999 
[   60.061340] 	(t=6000 jiffies g=-1187 q=0)
[   60.061343] NMI backtrace for cpu 0
[   60.061349] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-20190620-autotest #1
[   60.061354] Call Trace:
[   60.061358] [c0000018ee813360] [c000000000b625cc] dump_stack+0xb0/0xf4 (unreliable)
[   60.061365] [c0000018ee8133a0] [c000000000b6d544] nmi_cpu_backtrace+0x144/0x150
[   60.061371] [c0000018ee813430] [c000000000b6d6fc] nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
[   60.061378] [c0000018ee8134d0] [c0000000000692c8] arch_trigger_cpumask_backtrace+0x28/0x40
[   60.061385] [c0000018ee8134f0] [c0000000001c6210] rcu_dump_cpu_stacks+0x10c/0x16c
[   60.061391] [c0000018ee813540] [c0000000001c5264] rcu_sched_clock_irq+0x744/0x990
[   60.061397] [c0000018ee813610] [c0000000001d5e38] update_process_times+0x48/0x90
[   60.061403] [c0000018ee813640] [c0000000001ea31c] tick_periodic+0x4c/0x120
[   60.061408] [c0000018ee813670] [c0000000001ea430] tick_handle_periodic+0x40/0xe0
[   60.061414] [c0000018ee8136b0] [c00000000002b5cc] timer_interrupt+0x10c/0x2e0
[   60.061420] [c0000018ee813710] [c000000000009204] decrementer_common+0x134/0x140
[   60.061427] --- interrupt: 901 at replay_interrupt_return+0x0/0x4
[   60.061427]     LR = arch_local_irq_restore+0x84/0x90
[   60.061434] [c0000018ee813a10] [c0000018ee813bac] 0xc0000018ee813bac (unreliable)
[   60.061441] [c0000018ee813a30] [c000000000b883e0] _raw_spin_unlock_irqrestore+0x50/0x80
[   60.061447] [c0000018ee813a50] [c000000000b69e84] klist_next+0xb4/0x150
[   60.061453] [c0000018ee813aa0] [c000000000766f60] subsys_find_device_by_id+0xf0/0x1a0
[   60.061459] [c0000018ee813b00] [c000000000796464] find_memory_block_by_id+0x94/0xb0
[   60.061466] [c0000018ee813b30] [c000000000797b1c] walk_memory_blocks+0x7c/0xf0
[   60.061472] [c0000018ee813b80] [c000000000795f60] link_mem_sections+0x40/0x60
[   60.061478] [c0000018ee813bb0] [c000000000f28c28] topology_init+0xa0/0x268
[   60.061483] [c0000018ee813c10] [c000000000010448] do_one_initcall+0x68/0x2c0
[   60.061489] [c0000018ee813ce0] [c000000000f247dc] kernel_init_freeable+0x318/0x47c
[   60.061495] [c0000018ee813db0] [c0000000000107c4] kernel_init+0x24/0x150
[   60.061500] [c0000018ee813e20] [c00000000000ba54] ret_from_kernel_thread+0x5c/0x68
[   88.016703] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[   88.016709] Modules linked in:
[   88.016713] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-20190620-autotest #1
[   88.016718] NIP:  c00000000000ab8c LR: c00000000001b434 CTR: 0000000000000000
[   88.016724] REGS: c0000018ee813780 TRAP: 0901   Not tainted  (5.2.0-rc5-next-20190620-autotest)
[   88.016730] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44008484  XER: 00000005
[   88.016738] CFAR: c000000000b69e30 IRQMASK: 0 
[   88.016738] GPR00: c000000000b883e0 c0000018ee813a10 c000000001544d00 0000000000000900 
[   88.016738] GPR04: 0000000000000000 0000000000000000 0000000000000001 0000000000000004 
[   88.016738] GPR08: 0000000000000000 c0000018eb436d80 0000000000000001 0000000000000220 
[   88.016738] GPR12: 0000000000000000 c000000001990000 
[   88.016761] NIP [c00000000000ab8c] replay_interrupt_return+0x0/0x4
[   88.016767] LR [c00000000001b434] arch_local_irq_restore+0x84/0x90
[   88.016771] Call Trace:
[   88.016774] [c0000018ee813a10] [c0000018ee813bac] 0xc0000018ee813bac (unreliable)
[   88.016780] [c0000018ee813a30] [c000000000b883e0] _raw_spin_unlock_irqrestore+0x50/0x80
[   88.016786] [c0000018ee813a50] [c000000000b69e84] klist_next+0xb4/0x150
[   88.016792] [c0000018ee813aa0] [c000000000766f60] subsys_find_device_by_id+0xf0/0x1a0
[   88.016798] [c0000018ee813b00] [c000000000796464] find_memory_block_by_id+0x94/0xb0
[   88.016804] [c0000018ee813b30] [c000000000797b1c] walk_memory_blocks+0x7c/0xf0
[   88.016810] [c0000018ee813b80] [c000000000795f60] link_mem_sections+0x40/0x60
[   88.016816] [c0000018ee813bb0] [c000000000f28c28] topology_init+0xa0/0x268
[   88.016821] [c0000018ee813c10] [c000000000010448] do_one_initcall+0x68/0x2c0
[   88.016827] [c0000018ee813ce0] [c000000000f247dc] kernel_init_freeable+0x318/0x47c
[   88.016833] [c0000018ee813db0] [c0000000000107c4] kernel_init+0x24/0x150
[   88.016838] [c0000018ee813e20] [c00000000000ba54] ret_from_kernel_thread+0x5c/0x68
[   88.016843] Instruction dump:
[   88.016847] 7d200026 618c8000 2c030900 4182e568 2c030500 4182e010 2c030f00 4182f2d8 
[   88.016854] 2c030a00 4182ffb8 60000000 60000000 <4e800020> 7c781b78 480003e1 480003f9 




* Re: [next][PowerPC] RCU stalls while booting linux-next on PowerVM LPAR
  2019-06-24 14:09 [next][PowerPC] RCU stalls while booting linux-next on PowerVM LPAR Sachin Sant
@ 2019-06-24 14:42 ` David Hildenbrand
  2019-06-24 17:09   ` Sachin Sant
  0 siblings, 1 reply; 3+ messages in thread
From: David Hildenbrand @ 2019-06-24 14:42 UTC (permalink / raw)
  To: Sachin Sant, linuxppc-dev, Stephen Rothwell; +Cc: linux-next, linux-kernel

On 24.06.19 16:09, Sachin Sant wrote:
> Latest -next fails to boot on POWER9 PowerVM LPAR due to RCU stalls.
> 
> This problem was introduced with next-20190620 (dc636f5d78).
> next-20190619 was the last good kernel.
> 
> Reverting the following commit allows the kernel to boot.
> 2fd4aeea6b603 : mm/memory_hotplug: move and simplify walk_memory_blocks()
> 
> 

Hi, thanks! Please see

https://lkml.org/lkml/2019/6/21/600

and especially

https://lkml.org/lkml/2019/6/21/908

Does this fix your problem? The fix is on its way to next.

Cheers!

-- 

Thanks,

David / dhildenb


* Re: [next][PowerPC] RCU stalls while booting linux-next on PowerVM LPAR
  2019-06-24 14:42 ` David Hildenbrand
@ 2019-06-24 17:09   ` Sachin Sant
  0 siblings, 0 replies; 3+ messages in thread
From: Sachin Sant @ 2019-06-24 17:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Stephen Rothwell, linux-next, linuxppc-dev, linux-kernel



> On 24-Jun-2019, at 8:12 PM, David Hildenbrand <david@redhat.com> wrote:
> 
> On 24.06.19 16:09, Sachin Sant wrote:
>> Latest -next fails to boot on POWER9 PowerVM LPAR due to RCU stalls.
>> 
>> This problem was introduced with next-20190620 (dc636f5d78).
>> next-20190619 was the last good kernel.
>> 
>> Reverting the following commit allows the kernel to boot.
>> 2fd4aeea6b603 : mm/memory_hotplug: move and simplify walk_memory_blocks()
>> 
>> 
> 
> Hi, thanks! Please see
> 
> https://lkml.org/lkml/2019/6/21/600
> 
> and especially
> 
> https://lkml.org/lkml/2019/6/21/908
> 
> Does this fix your problem? The fix is on its way to next.

Yes, this patch fixes the problem for me.

Thanks
-Sachin

