linux-kernel.vger.kernel.org archive mirror
* kernel BUG at kernel/smpboot.c:134!
From: Dave Hansen @ 2013-04-05 21:43 UTC
  To: Srivatsa S. Bhat, linux-kernel, Thomas Gleixner

Hey Thomas,

I seem to be running into smpboot_thread_fn()'s

	BUG_ON(td->cpu != smp_processor_id());

pretty regularly, both at boot and when I boot with maxcpus=x and then
online the CPUs from sysfs after boot.  It's a 160-logical-CPU system,
so it's quite a beast.  I _seem_ to be hitting it more often at higher
CPU counts, but as far as I can tell it doesn't trigger on bringing up
any particular CPU.
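The BUG_ON() asserts a simple invariant: a per-CPU smpboot thread must only ever execute on the CPU recorded in td->cpu. The oops below violates it: its header shows migration/135 executing on CPU 81. As a hedged userspace illustration (not kernel code), the same invariant can be expressed with glibc's `_GNU_SOURCE` affinity extensions: bind a thread to one CPU before it first runs, then compare against `sched_getcpu()`, the userspace counterpart of `smp_processor_id()`. The names `struct pcpu_thread`, `first_allowed_cpu()` and `run_pinned()` are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical userspace stand-in for smpboot's per-thread data. */
struct pcpu_thread {
	int cpu;    /* CPU the thread is bound to (like td->cpu) */
	int ran_on; /* CPU the thread observed itself running on */
};

/* Thread body: record where we are actually running.  sched_getcpu()
 * plays the role of smp_processor_id(); the kernel BUG_ON()s a mismatch. */
static void *pcpu_thread_fn(void *arg)
{
	struct pcpu_thread *td = arg;

	td->ran_on = sched_getcpu();
	return NULL;
}

/* Return the lowest-numbered CPU this process may run on, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/* Create a thread that is bound to td->cpu before it first runs, wait
 * for it, and check the invariant.  Returns 1 if it held, 0 if not,
 * -1 on error. */
static int run_pinned(struct pcpu_thread *td)
{
	pthread_attr_t attr;
	pthread_t tid;
	cpu_set_t set;
	int rc;

	CPU_ZERO(&set);
	CPU_SET(td->cpu, &set);
	if (pthread_attr_init(&attr) != 0)
		return -1;
	rc = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
	if (rc == 0)
		rc = pthread_create(&tid, &attr, pcpu_thread_fn, td);
	pthread_attr_destroy(&attr);
	/* pthread_join() is only reached if the thread was created. */
	if (rc != 0 || pthread_join(tid, NULL) != 0)
		return -1;
	return td->ran_on == td->cpu;
}
```

Setting the affinity on the attribute, rather than after pthread_create(), mirrors binding a kthread before its first wakeup: the thread never gets a chance to run anywhere else. The oops is what a broken version of that guarantee looks like in the kernel.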

This is on a pull of mainline from today, e0a77f263.  Any ideas?

> [  790.223270] ------------[ cut here ]------------
> [  790.223966] kernel BUG at kernel/smpboot.c:134!
> [  790.224739] invalid opcode: 0000 [#1] SMP 
> [  790.225671] Modules linked in:
> [  790.226428] CPU 81 
> [  790.226909] Pid: 3909, comm: migration/135 Tainted: G        W    3.9.0-rc5-00184-gb6a9b7f-dirty #118 FUJITSU-SV PRIMEQUEST 1800E2/SB
> [  790.228775] RIP: 0010:[<ffffffff8110bee8>]  [<ffffffff8110bee8>] smpboot_thread_fn+0x258/0x280
> [  790.230205] RSP: 0018:ffff88bfef9c1e08  EFLAGS: 00010202
> [  790.231090] RAX: 0000000000000051 RBX: ffff88bfefb82000 RCX: 000000000000b888
> [  790.231653] RDX: ffff88bfef9c1fd8 RSI: ffff881fff000000 RDI: 0000000000000087
> [  790.232085] RBP: ffff88bfef9c1e38 R08: 0000000000000001 R09: 0000000000000000
> [  790.232850] R10: 0000000000000018 R11: 0000000000000000 R12: ffff88bfec9e22e0
> [  790.233561] R13: ffffffff81e587a0 R14: ffff88bfec9e22e0 R15: 0000000000000000
> [  790.234004] FS:  0000000000000000(0000) GS:ffff881fff000000(0000) knlGS:0000000000000000
> [  790.234918] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  790.235602] CR2: 00007fa89a333c62 CR3: 0000000001e0b000 CR4: 00000000000007e0
> [  790.236110] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  790.236584] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  790.237329] Process migration/135 (pid: 3909, threadinfo ffff88bfef9c0000, task ffff88bfec9e22e0)
> [  790.238321] Stack:
> [  790.238882]  ffff88bfef9c1e38 0000000000000000 ffff88ffef421cc0 ffff88bfef9c1ec0
> [  790.245415]  ffff88bfefb82000 ffffffff8110bc90 ffff88bfef9c1f48 ffffffff810ff1df
> [  790.250755]  0000000000000001 0000000000000087 ffff88bfefb82000 0000000000000000
> [  790.253365] Call Trace:
> [  790.254121]  [<ffffffff8110bc90>] ? __smpboot_create_thread+0x180/0x180
> [  790.255428]  [<ffffffff810ff1df>] kthread+0xef/0x100
> [  790.256071]  [<ffffffff819cb1a4>] ? wait_for_completion+0x124/0x180
> [  790.256697]  [<ffffffff810ff0f0>] ? __init_kthread_worker+0x80/0x80
> [  790.257325]  [<ffffffff819dba9c>] ret_from_fork+0x7c/0xb0
> [  790.258233]  [<ffffffff810ff0f0>] ? __init_kthread_worker+0x80/0x80
> [  790.258942] Code: ef 3d 01 01 48 89 df e8 87 b0 16 00 48 83 05 67 ef 3d 01 01 48 83 c4 10 31 c0 5b 41 5c 41 5d 41 5e 5d c3 48 83 05 90 ef 3d 01 01 <0f> 0b 48 83 05 96 ef 3d 01 01 48 83 05 56 ef 3d 01 01 0f 0b 48 
> [  790.276178] RIP  [<ffffffff8110bee8>] smpboot_thread_fn+0x258/0x280
> [  790.276735]  RSP <ffff88bfef9c1e08>
> [  790.278348] ---[ end trace 84baa2bee1434240 ]---



* kernel BUG at kernel/smpboot.c:134!
From: Brian Norris @ 2014-09-23  0:09 UTC
  To: Linux Kernel; +Cc: linux-arm-kernel, Brian Norris

Hi all,

I'm asking here just to see if anyone has any good suggestions, or if
the BUG() I hit looks familiar to anyone; I'm not expecting anyone to
solve my problem for me.

I've been testing CPU hotplug [1] on an ARMv7 Cortex-A15-based SMP
system, and I've seen various oopses and crashes, most of which seem
to be in the scheduler code. Sometimes my $PC is somewhere out in the
weeds (jumping to 0x00000000, or to some address outside the kernel
text region).

Anyway, the most promising result (for debugging purposes) came when I
was toggling CPU1 on and off with a loop like this:

	while :
	do
		echo 0 > /sys/devices/system/cpu/cpu1/online
		echo 1 > /sys/devices/system/cpu/cpu1/online
	done

which triggered the following BUG() after about 4700 cycles (oddly,
on multiple occasions the failures have occurred at around 4700 to
4800 cycles):

...
[  164.737561] CPU1: Booted secondary processor
[  164.785821] CPU1: shutdown
[  164.788883] ------------[ cut here ]------------
[  164.793537] kernel BUG at kernel/smpboot.c:134!
[  164.793540] Internal error: Oops - BUG: 0 [#1] SMP ARM
[  164.793547] Modules linked in:
[  164.793553] CPU: 2 PID: 3 Comm: ksoftirqd/0 Not tainted 3.14.13-1.0pre-00342-g95275cee3dcd #220
[  164.793557] task: cd087140 ti: cd09a000 task.ti: cd09a000
[  164.793569] PC is at smpboot_thread_fn+0x174/0x17c
[  164.793572] LR is at smpboot_thread_fn+0x40/0x17c
[  164.793576] pc : [<c0046bb4>]    lr : [<c0046a80>]    psr: 800f0013
[  164.793576] sp : cd09bf40  ip : 00000000  fp : 00000000
[  164.793577] r10: cd09a000  r9 : 00000002  r8 : 00000000
[  164.793580] r7 : 00000001  r6 : c0f89548  r5 : cd09a000  r4 : cd03abc0
[  164.793582] r3 : 00000002  r2 : cd09bf40  r1 : 00000000  r0 : 00000000
[  164.793586] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[  164.793589] Control: 30c5387d  Table: 0d35b5c0  DAC: 55555555
[  164.793592] Process ksoftirqd/0 (pid: 3, stack limit = 0xcd09a240)
[  164.793594] Stack: (0xcd09bf40 to 0xcd09c000)
[  164.793600] bf40: cd087140 cd03ab80 00000000 cd03abc0 c0046a40 00000000 00000000 00000000
[  164.793604] bf60: 00000000 c0040424 52bdbfb4 00000001 00000000 cd03abc0 00000000 00030003
[  164.793608] bf80: cd09bf80 cd09bf80 00000000 00000000 cd09bf90 cd09bf90 cd09bfac cd03ab80
[  164.793611] bfa0: c0040350 00000000 00000000 c000edb8 00000000 00000000 00000000 00000000
[  164.793614] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  164.793617] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 87fd818d 2b1a63eb
[  164.793630] [<c0046bb4>] (smpboot_thread_fn) from [<c0040424>] (kthread+0xd4/0xec)
[  164.793639] [<c0040424>] (kthread) from [<c000edb8>] (ret_from_fork+0x14/0x3c)
[  164.793644] Code: e1a00004 eb0204ba e3a00000 e8bd8ff8 (e7f001f2) 
[  164.793651] ---[ end trace d7127a76ecca6b80 ]---

This test is on a 3.14.13-based kernel, but I retested on a more recent
kernel (around 3.17-rc3), and I see very similar corruption and failures
(although I haven't yet triggered this specific BUG() in my limited
testing).
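The shell loop above runs until the machine falls over, so cycle counts such as "about 4700" have to be inferred. A minimal C sketch of a counted variant, assuming the same /sys/devices/system/cpu/cpu1/online attribute (writing it requires root); `write_attr()` and `toggle_online()` are hypothetical helper names, and the path is a parameter so the loop can be dry-run against an ordinary file.

```c
#include <stdio.h>

/* Hypothetical helper: write one value to a sysfs-style attribute file.
 * Returns 0 on success, -1 if opening, writing or closing fails. */
static int write_attr(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(val, f) >= 0;
	/* fclose() flushes the buffer, so an error the kernel returns for
	 * the write (e.g. the CPU cannot be offlined) is reported here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Counted version of the offline/online loop: toggles the attribute at
 * 'path' up to max_cycles times (0 means forever) and returns the number
 * of completed cycles, stopping early at the first failed write. */
static long toggle_online(const char *path, long max_cycles)
{
	long done = 0;

	while (max_cycles == 0 || done < max_cycles) {
		if (write_attr(path, "0\n") != 0 || write_attr(path, "1\n") != 0)
			break;
		done++;
		/* stderr is unbuffered, so the last line printed before an
		 * oops brackets the failing cycle. */
		if (done % 100 == 0)
			fprintf(stderr, "completed %ld cycles\n", done);
	}
	return done;
}
```

For example, `toggle_online("/sys/devices/system/cpu/cpu1/online", 0)` run as root behaves like the original endless loop while reporting progress every 100 cycles.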

Any comments are welcome. I'll try to remember to update here if I
figure anything out.

Thanks,
Brian

[1] I actually encountered errors while testing suspend-to-RAM, but I
    (correctly) suspected the problems were occurring in the hotplug /
    disable_nonboot_cpus() code path.


Thread overview: 38+ messages
2013-04-05 21:43 kernel BUG at kernel/smpboot.c:134! Dave Hansen
2013-04-06  7:12 ` Srivatsa S. Bhat
2013-04-06  8:31   ` Thomas Gleixner
2013-04-07  9:20     ` Thomas Gleixner
2013-04-07  9:50       ` Borislav Petkov
2013-04-08  9:24         ` Thomas Gleixner
2013-04-08 11:55           ` Borislav Petkov
2013-04-08 12:17             ` Thomas Gleixner
2013-04-09 14:38               ` [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu Thomas Gleixner
2013-04-09 15:55                 ` Dave Hansen
2013-04-09 18:43                   ` Thomas Gleixner
2013-04-09 19:30                     ` Thomas Gleixner
2013-04-09 20:38                       ` Dave Hansen
2013-04-09 20:54                         ` Dave Hansen
2013-04-10  8:29                         ` Thomas Gleixner
2013-04-10 10:51                           ` Thomas Gleixner
2013-04-10 19:41                             ` Dave Hansen
2013-04-11 10:19                               ` Thomas Gleixner
2013-04-11 10:48                                 ` Srivatsa S. Bhat
2013-04-11 11:43                                   ` Srivatsa S. Bhat
2013-04-11 11:59                                     ` Srivatsa S. Bhat
2013-04-11 12:51                                     ` Thomas Gleixner
2013-04-11 12:54                                     ` Thomas Gleixner
2013-04-11 13:46                                   ` Thomas Gleixner
2013-04-11 18:07                                 ` Dave Hansen
2013-04-11 19:48                                   ` Thomas Gleixner
2013-04-10 14:03                   ` [PATCH] CPU hotplug, smpboot: Fix crash in smpboot_thread_fn() Srivatsa S. Bhat
2013-04-11  8:10                     ` Thomas Gleixner
2013-04-11 10:19                       ` Srivatsa S. Bhat
2013-04-11 19:16                 ` [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu Srivatsa S. Bhat
2013-04-11 20:47                   ` Thomas Gleixner
2013-04-11 21:19                     ` Srivatsa S. Bhat
2013-04-12 10:59                       ` Thomas Gleixner
2013-04-12 11:26                         ` Srivatsa S. Bhat
2013-04-15 19:49                         ` Dave Hansen
2013-04-12 10:41                 ` Peter Zijlstra
2013-04-12 12:32                 ` [tip:core/urgent] " tip-bot for Thomas Gleixner
2014-09-23  0:09 kernel BUG at kernel/smpboot.c:134! Brian Norris
