LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Dave Hansen <dave@sr71.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Jones <davej@redhat.com>,
	dhillf@gmail.com, Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu
Date: Tue, 09 Apr 2013 08:55:11 -0700
Message-ID: <516439DF.3050901@sr71.net> (raw)
In-Reply-To: <alpine.LFD.2.02.1304091635430.21884@ionos>

Hey Thomas,

I don't think the patch helped my case.  Looks like the same BUG_ON().

I accidentally booted with possible_cpus=10 instead of 160.  I wasn't
able to trigger this in that case, even repeatedly on/offlining them.
But, once I booted with possible_cpus=160, it triggered in a jiffy.

Two oopses below (bottom one has cpu numbers):

> [  467.106219] ------------[ cut here ]------------
> [  467.106400] kernel BUG at kernel/smpboot.c:134!
> [  467.106556] invalid opcode: 0000 [#1] SMP 
> [  467.106831] Modules linked in:
> [  467.107039] CPU 0 
> [  467.107109] Pid: 3095, comm: migration/115 Tainted: G        W    3.9.0-rc6-00020-g84ee980-dirty #132 FUJITSU-SV PRIMEQUEST 1800E2/SB
> [  467.107507] RIP: 0010:[<ffffffff8110bed8>]  [<ffffffff8110bed8>] smpboot_thread_fn+0x258/0x280
> [  467.107820] RSP: 0018:ffff887ff0561e08  EFLAGS: 00010202
> [  467.107980] RAX: 0000000000000000 RBX: ffff887ff04ef010 RCX: 000000000000b888
> [  467.108142] RDX: ffff887ff0561fd8 RSI: ffff881ffda00000 RDI: 0000000000000073
> [  467.108303] RBP: ffff887ff0561e38 R08: 0000000000000001 R09: 0000000000000000
> [  467.108465] R10: 0000000000000018 R11: 0000000000000000 R12: ffff887ff053c5c0
> [  467.108629] R13: ffffffff81e587a0 R14: ffff887ff053c5c0 R15: 0000000000000000
> [  467.108791] FS:  0000000000000000(0000) GS:ffff881ffda00000(0000) knlGS:0000000000000000
> [  467.109037] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  467.109194] CR2: 000000000117c278 CR3: 0000000001e0b000 CR4: 00000000000007f0
> [  467.109357] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  467.109519] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  467.109684] Process migration/115 (pid: 3095, threadinfo ffff887ff0560000, task ffff887ff053c5c0)
> [  467.109930] Stack:
> [  467.110075]  ffff887ff0561e38 0000000000000000 ffff881fe60adcc0 ffff887ff0561ec0
> [  467.110580]  ffff887ff04ef010 ffffffff8110bc80 ffff887ff0561f48 ffffffff810ff1df
> [  467.111075]  0000000000000001 ffff881f00000073 ffff887ff04ef010 ffff887f00000001
> [  467.111568] Call Trace:
> [  467.111726]  [<ffffffff8110bc80>] ? __smpboot_create_thread+0x180/0x180
> [  467.111893]  [<ffffffff810ff1df>] kthread+0xef/0x100
> [  467.112057]  [<ffffffff8110e340>] ? complete+0x30/0x80
> [  467.112216]  [<ffffffff810ff0f0>] ? __init_kthread_worker+0x80/0x80
> [  467.112386]  [<ffffffff819db99c>] ret_from_fork+0x7c/0xb0
> [  467.112548]  [<ffffffff810ff0f0>] ? __init_kthread_worker+0x80/0x80
> [  467.112708] Code: ef 3d 01 01 48 89 df e8 c7 af 16 00 48 83 05 97 ef 3d 01 01 48 83 c4 10 31 c0 5b 41 5c 41 5d 41 5e 5d c3 48 83 05 c0 ef 3d 01 01 <0f> 0b 48 83 05 c6 ef 3d 01 01 48 83 05 86 ef 3d 01 01 0f 0b 48 
> [  467.117014] RIP  [<ffffffff8110bed8>] smpboot_thread_fn+0x258/0x280
> [  467.117233]  RSP <ffff887ff0561e08>
> [  467.117414] ---[ end trace d851dfb0bce51ca2 ]---

Here's the same oops, but with the line numbers munged because I added
some printks:

> [  161.551788] smpboot_thread_fn():
> [  161.551807] td->cpu: 132
> [  161.551808] smp_processor_id(): 121
> [  161.551811] comm: migration/%u
> [  161.551840] ------------[ cut here ]------------
> [  161.551939] kernel BUG at kernel/smpboot.c:149!
> [  161.552030] invalid opcode: 0000 [#1] SMP 
> [  161.552255] Modules linked in:
> [  161.552397] CPU 121 
> [  161.552474] Pid: 2957, comm: migration/132 Tainted: G        W    3.9.0-rc6-00020-g84ee980-dirty #136 FUJITSU-SV PRIMEQUEST 1800E2/SB
> [  161.552655] RIP: 0010:[<ffffffff8110bf29>]  [<ffffffff8110bf29>] smpboot_thread_fn+0x409/0x560
> [  161.552852] RSP: 0018:ffff88bff0403de8  EFLAGS: 00010202
> [  161.552935] RAX: 0000000000000079 RBX: ffff88bff02ac070 RCX: 0000000000000006
> [  161.553025] RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff889ffec0d190
> [  161.553115] RBP: ffff88bff0403e38 R08: 0000000000000001 R09: 0000000000000001
> [  161.553204] R10: 0000000000000000 R11: 0000000000000b09 R12: ffff88bff04745c0
> [  161.553319] R13: ffffffff81e587a0 R14: ffffffff8110bb20 R15: ffff88bff04745c0
> [  161.553411] FS:  0000000000000000(0000) GS:ffff889ffec00000(0000) knlGS:0000000000000000
> [  161.553534] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  161.553619] CR2: 00007f0c4155c6d0 CR3: 0000000001e0b000 CR4: 00000000000007e0
> [  161.553709] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  161.553799] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  161.553889] Process migration/132 (pid: 2957, threadinfo ffff88bff0402000, task ffff88bff04745c0)
> [  161.554156] Stack:
> [  161.554312]  ffffffff8110bb20 ffff88bff04745c0 ffff88bff0403e08 0000000000000000
> [  161.554839]  ffff88bff0403e38 ffff881fef323cc0 ffff88bff0403ec0 ffff88bff02ac070
> [  161.555370]  ffffffff8110bb20 0000000000000000 ffff88bff0403f48 ffffffff810ff08f
> [  161.555891] Call Trace:
> [  161.556055]  [<ffffffff8110bb20>] ? __smpboot_create_thread+0x180/0x180
> [  161.556230]  [<ffffffff8110bb20>] ? __smpboot_create_thread+0x180/0x180
> [  161.556409]  [<ffffffff810ff08f>] kthread+0xef/0x100
> [  161.556590]  [<ffffffff819d5154>] ? wait_for_completion+0x124/0x180
> [  161.556761]  [<ffffffff810fefa0>] ? __init_kthread_worker+0x80/0x80
> [  161.556982]  [<ffffffff819e59dc>] ret_from_fork+0x7c/0xb0
> [  161.557148]  [<ffffffff810fefa0>] ? __init_kthread_worker+0x80/0x80
> [  161.557316] Code: 05 e4 f1 3d 01 01 e8 2b cf 8b 00 48 83 05 df f1 3d 01 01 65 8b 04 25 64 b0 00 00 39 03 0f 84 0c fd ff ff 48 83 05 cf f1 3d 01 01 <0f> 0b 48 83 05 cd f1 3d 01 01 0f 1f 44 00 00 b9 8b 00 00 00 48 
> [  161.561934] RIP  [<ffffffff8110bf29>] smpboot_thread_fn+0x409/0x560
> [  161.562171]  RSP <ffff88bff0403de8>
> [  161.562352] ---[ end trace 6a3b5261afedf7da ]---


  reply index

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-05 21:43 kernel BUG at kernel/smpboot.c:134! Dave Hansen
2013-04-06  7:12 ` Srivatsa S. Bhat
2013-04-06  8:31   ` Thomas Gleixner
2013-04-07  9:20     ` Thomas Gleixner
2013-04-07  9:50       ` Borislav Petkov
2013-04-08  9:24         ` Thomas Gleixner
2013-04-08 11:55           ` Borislav Petkov
2013-04-08 12:17             ` Thomas Gleixner
2013-04-09 14:38               ` [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu Thomas Gleixner
2013-04-09 15:55                 ` Dave Hansen [this message]
2013-04-09 18:43                   ` Thomas Gleixner
2013-04-09 19:30                     ` Thomas Gleixner
2013-04-09 20:38                       ` Dave Hansen
2013-04-09 20:54                         ` Dave Hansen
2013-04-10  8:29                         ` Thomas Gleixner
2013-04-10 10:51                           ` Thomas Gleixner
2013-04-10 19:41                             ` Dave Hansen
2013-04-11 10:19                               ` Thomas Gleixner
2013-04-11 10:48                                 ` Srivatsa S. Bhat
2013-04-11 11:43                                   ` Srivatsa S. Bhat
2013-04-11 11:59                                     ` Srivatsa S. Bhat
2013-04-11 12:51                                     ` Thomas Gleixner
2013-04-11 12:54                                     ` Thomas Gleixner
2013-04-11 13:46                                   ` Thomas Gleixner
2013-04-11 18:07                                 ` Dave Hansen
2013-04-11 19:48                                   ` Thomas Gleixner
2013-04-10 14:03                   ` [PATCH] CPU hotplug, smpboot: Fix crash in smpboot_thread_fn() Srivatsa S. Bhat
2013-04-11  8:10                     ` Thomas Gleixner
2013-04-11 10:19                       ` Srivatsa S. Bhat
2013-04-11 19:16                 ` [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu Srivatsa S. Bhat
2013-04-11 20:47                   ` Thomas Gleixner
2013-04-11 21:19                     ` Srivatsa S. Bhat
2013-04-12 10:59                       ` Thomas Gleixner
2013-04-12 11:26                         ` Srivatsa S. Bhat
2013-04-15 19:49                         ` Dave Hansen
2013-04-12 10:41                 ` Peter Zijlstra
2013-04-12 12:32                 ` [tip:core/urgent] " tip-bot for Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=516439DF.3050901@sr71.net \
    --to=dave@sr71.net \
    --cc=bp@alien8.de \
    --cc=davej@redhat.com \
    --cc=dhillf@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git