* 4.10 sparc64 regression: BUG: soft lockup - CPU#0 stuck for 23s! [khugepaged:153]
From: Meelis Roos @ 2017-03-10 17:42 UTC
To: sparclinux
I am seeing the following soft lockup on multiple Ultrasparc II era
sparc machines - Ultra 2, Netra X1, Blade 100 at least. 4.9 was fine.
Will attempt to bisect some time.

The lockups are detected routinely (every 2-3 files when compiling new
kernel) but the machine keeps running.

CC added because of NMI changes - no idea if it is related.

I have THP enabled and always on in most machines' kernel config.

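The kernel config only sets the boot-time default; the THP mode a machine is actually running is reported in /sys/kernel/mm/transparent_hugepage/enabled, where the active choice is shown in square brackets (for example `always [madvise] never`). A minimal Python sketch for checking that on each affected box, assuming the single-line bracketed format just described; `active_thp_mode` and `read_thp_mode` are hypothetical helper names, not anything from this thread:

```python
# Hypothetical helper (not from the thread): report the runtime THP mode.
# Assumes the sysfs file holds one line such as "always [madvise] never",
# with the active mode wrapped in square brackets.
THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(text: str) -> str:
    """Return the bracketed (currently active) mode from the sysfs text."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active THP mode marked in {text!r}")


def read_thp_mode(path: str = THP_ENABLED) -> str:
    """Read the sysfs file (Linux only) and return the active mode."""
    with open(path) as f:
        return active_thp_mode(f.read())


if __name__ == "__main__":
    print(active_thp_mode("[always] madvise never"))  # prints: always
```

On a machine configured as described above, `read_thp_mode()` would be expected to return `"always"` unless the default was overridden with the `transparent_hugepage=` boot parameter or changed at runtime.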
[503216.376072] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [khugepaged:153]
[503216.470787] Modules linked in: ipv6 loop snd_sun_cs4231 snd_pcm snd_timer sr_mod sg evdev cdrom snd soundcore sunhme parport_sunbpp parport flash
[503216.636654] CPU: 0 PID: 153 Comm: khugepaged Tainted: G L 4.10.0-11624-g0710f3f #162
[503216.747322] task: fffff8006c57e180 task.stack: fffff8006c5e0000
[503216.822501] TSTATE: 0000004480001607 TPC: 00000000007dfa68 TNPC: 00000000007dfa6c Y: 01ceecf2 Tainted: G L
[503216.959282] TPC: <_raw_spin_unlock_irqrestore+0x8/0x20>
[503217.026217] g0: 00000000007dfa68 g1: 0000400000000000 g2: 0000000000008000 g3: 000000006c014000
[503217.134716] g4: fffff8006c57e180 g5: fffff8006eef8000 g6: fffff8006c5e0000 g7: 0000000000000016
[503217.243359] o0: fffff8006cd2d6c0 o1: 0000000000000000 o2: 0000000000000000 o3: 0000000000000000
[503217.351857] o4: 000000007143c000 o5: fffff8006cd2d400 sp: fffff8006c5e3191 ret_pc: 0000000000442efc
[503217.464434] RPC: <tlb_batch_add_one+0x5c/0x100>
[503217.522754] l0: fffff8006eef8000 l1: 0000000000000002 l2: 00000000009160c0 l3: 000000000000000e
[503217.631244] l4: 0000000000000001 l5: 0000000000000001 l6: fffff8006da38000 l7: 00000000000003ef
[503217.739733] i0: fffff8006cd2d400 i1: 000000007143c000 i2: 0000000000000000 i3: 0000000000000000
[503217.848185] i4: fffff8006f8006e0 i5: 00000000009086e0 i6: fffff8006c5e3241 i7: 0000000000443230
[503217.956690] I7: <set_pmd_at+0x130/0x1a0>
[503218.007747] Call Trace:
[503218.040947] [0000000000443230] set_pmd_at+0x130/0x1a0
[503218.106503] [00000000005060f4] pmdp_collapse_flush+0x14/0x40
[503218.179404] [0000000000529f38] khugepaged_scan_pmd+0x3d8/0xac0
[503218.254370] [000000000052acd0] khugepaged+0x6b0/0xa80
[503218.319827] [0000000000470ef0] kthread+0xd0/0x120
[503218.381186] [0000000000406044] ret_from_fork+0x1c/0x2c
[503218.447700] [0000000000000000] (null)
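Each Call Trace line above has the same shape: a printk timestamp, the frame address in brackets, then `symbol+offset/size` with offset and size in hex. Read top to bottom, the symbols run from the innermost frame (set_pmd_at) out to ret_from_fork. A small Python sketch for pulling that chain out of a pasted log, assuming exactly the line layout in this report (other kernels and architectures may format traces differently); `parse_trace_line` and `call_chain` are hypothetical names:

```python
import re

# One Call Trace line as printed in this report, e.g.
#   [503218.040947] [0000000000443230] set_pmd_at+0x130/0x1a0
# = optional printk timestamp, address in brackets, symbol+offset/size (hex).
TRACE_LINE = re.compile(
    r"^(?:\[\s*[\d.]+\]\s+)?"  # optional "[seconds.micros]" timestamp
    r"\[(?P<addr>[0-9a-f]+)\]\s+"  # frame address
    r"(?P<sym>[\w.]+)"  # symbol name (may contain dots, e.g. ".isra.0")
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)$"  # +offset/size
)


def parse_trace_line(line: str):
    """Return (addr, symbol, offset, size), or None if the line is not a frame."""
    m = TRACE_LINE.match(line.strip())
    if m is None:
        return None
    return (int(m["addr"], 16), m["sym"], int(m["off"], 16), int(m["size"], 16))


def call_chain(log: str) -> list:
    """Symbol names in trace order, innermost frame first."""
    frames = (parse_trace_line(line) for line in log.splitlines())
    return [frame[1] for frame in frames if frame is not None]


if __name__ == "__main__":
    sample = (
        "[503218.040947] [0000000000443230] set_pmd_at+0x130/0x1a0\n"
        "[503218.106503] [00000000005060f4] pmdp_collapse_flush+0x14/0x40\n"
        "[503218.447700] [0000000000000000] (null)\n"
    )
    print(call_chain(sample))  # ['set_pmd_at', 'pmdp_collapse_flush']
```

Fed the seven trace lines above, `call_chain` should return the six symbol names from set_pmd_at to ret_from_fork, skipping the final `(null)` entry and the "Call Trace:" header.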
--
Meelis Roos (mroos@linux.ee)
* Re: 4.10 sparc64 regression: BUG: soft lockup - CPU#0 stuck for 23s! [khugepaged:153]
From: Meelis Roos @ 2017-03-10 17:46 UTC
To: sparclinux
> I am seeing the following soft lockup on multiple Ultrasparc II era
> sparc machines - Ultra 2, Netra X1, Blade 100 at least. 4.9 was fine.
> Will attempt to bisect some time.
>
> The lockups are detected routinely (every 2-3 files when compiling new
> kernel) but the machine keeps running.
>
> CC added because of NMI changes - no idea if it is related.
>
> I have THP enabled and always on in most machines' kernel config.
>
> [503216.376072] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [khugepaged:153]
Occasionally, I also get this on the U2 (Ultra 2):
[500680.792223] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 39s!
[500680.891127] Showing busy workqueues and worker pools:
[500680.955396] workqueue events: flags=0x0
[500681.005072]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[500681.080732]     pending: vmstat_shepherd
[500681.131337] workqueue events_freezable_power_: flags=0x84
[500681.199644]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[500681.275233]     pending: disk_events_workfn
[500681.328847] workqueue kblockd: flags=0x18
[500681.380364]   pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=2/256
[500681.457924]     pending: blk_mq_timeout_work, blk_mq_timeout_work
[500681.534525] workqueue vmstat: flags=0xc
[500681.583956]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[500681.659357]     pending: vmstat_update
* Re: 4.10 sparc64 regression: BUG: soft lockup - CPU#0 stuck for 23s! [khugepaged:153]
From: Meelis Roos @ 2017-03-10 19:54 UTC
To: sparclinux
> I am seeing the following soft lockup on multiple Ultrasparc II era
> sparc machines - Ultra 2, Netra X1, Blade 100 at least. 4.9 was fine.
> Will attempt to bisect some time.
Also seeing both warnings on T2000, so not specific to UltraSparc II.
* Re: 4.10 sparc64 regression: BUG: soft lockup - CPU#0 stuck for 23s! [khugepaged:153]
From: Meelis Roos @ 2017-04-09 12:09 UTC
To: sparclinux
> > I am seeing the following soft lockup on multiple Ultrasparc II era
> > sparc machines - Ultra 2, Netra X1, Blade 100 at least. 4.9 was fine.
> > Will attempt to bisect some time.
>
> Also seeing both warnings on T2000, so not specific to UltraSparc II.
Yesterday evening's git has fixed that - at first glance, all affected
servers seem OK.
I do not know when it was fixed - did not have time to test anything
more since I found the problem.
* Re: 4.10 sparc64 regression: BUG: soft lockup - CPU#0 stuck for 23s! [khugepaged:153]
From: David Miller @ 2017-04-12 2:14 UTC
To: sparclinux
From: Meelis Roos <mroos@linux.ee>
Date: Sun, 9 Apr 2017 15:09:07 +0300 (EEST)
>> > I am seeing the following soft lockup on multiple Ultrasparc II era
>> > sparc machines - Ultra 2, Netra X1, Blade 100 at least. 4.9 was fine.
>> > Will attempt to bisect some time.
>>
>> Also seeing both warnings on T2000, so not specific to UltraSparc II.
>
> Yesterday evening's git has fixed that - at first glance, all affected
> servers seem OK.
>
> I do not know when it was fixed - did not have time to test anything
> more since I found the problem.
It was probably the THP bug fix:
commit 76811263b3fa6347699a446cddeb63badf3e6095
Author: Nitin Gupta <nitin.m.gupta@oracle.com>
Date:   Fri Mar 31 15:48:53 2017 -0700

    sparc64: Fix memory corruption when THP is enabled

    The memory corruption was happening due to incorrect
    TLB/TSB flushing of hugepages.

    Reported-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>