From: Guenter Roeck <linux@roeck-us.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
	x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org, Borislav Petkov <bp@alien8.de>
Subject: Re: sched: Unexpected reschedule of offline CPU#2!
Date: Fri, 16 Aug 2019 12:32:08 -0700
Message-ID: <20190816193208.GA29478@roeck-us.net>
In-Reply-To: <alpine.DEB.2.21.1908161217380.1873@nanos.tec.linutronix.de>

On Fri, Aug 16, 2019 at 12:22:22PM +0200, Thomas Gleixner wrote:
> On Mon, 29 Jul 2019, Guenter Roeck wrote:
> > On Mon, Jul 29, 2019 at 12:47:45PM +0200, Peter Zijlstra wrote:
> > > On Mon, Jul 29, 2019 at 12:38:30PM +0200, Thomas Gleixner wrote:
> > > > Reboot has two modes:
> > > > 
> > > >  - Regular reboot initiated from user space
> > > > 
> > > >  - Panic reboot
> > > > 
> > > > For the regular reboot we can make it go through proper hotplug, 
> > > 
> > > That seems sensible.
> > > 
> > > > for the panic case not so much.
> > > 
> > > It's panic, shit has already hit fan, one or two more pieces shouldn't
> > > be something anybody cares about.
> > > 
> > 
> > Some more digging shows that this happens a lot with Google GCE instances,
> > typically after a panic. The problem with that, if I understand correctly,
> > is that it may prevent coredumps from being written. So, while of course
> > the panic is what needs to be fixed, it is still quite annoying, and it
> > would help if this can be fixed for panic handling as well.
> > 
> > How about the patch suggested by Hillf Danton? Would that help for the
> > panic case?
> 
> I have no idea what that patch looks like, but the quick hack is below.
> 
> Thanks,
> 
> 	tglx
> 
> 8<---------------
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 75fea0d48c0e..625627b1457c 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -601,6 +601,7 @@ void stop_this_cpu(void *dummy)
>  	/*
>  	 * Remove this CPU:
>  	 */
> +	set_cpu_active(smp_processor_id(), false);
>  	set_cpu_online(smp_processor_id(), false);
>  	disable_local_APIC();
>  	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
> 
No luck. The problem is still seen with this patch applied on top of
the mainline kernel (commit a69e90512d9def6).

Guenter

---
[   22.315834] e1000e: EEE TX LPI TIMER: 00000000
[   22.323624] reboot: Restarting system
[   22.324260] reboot: machine restart
[   22.325885] ------------[ cut here ]------------
[   22.330425] sched: Unexpected reschedule of offline CPU#3!
ILLOPC: ffffffffb524403f: 0f 0b
[   22.330926] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/smp.c:126 native_smp_send_reschedule+0x2f/0x40
[   22.331238] Modules linked in:
[   22.331427] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.3.0-rc4+ #1
[   22.331626] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[   22.331971] RIP: 0010:native_smp_send_reschedule+0x2f/0x40
[   22.332164] Code: 05 de 81 95 01 73 15 48 8b 05 bd fa 61 01 be fd 00 00 00 48 8b 40 30 e9 6f d0 fb 00 89 fe 48 c7 c7 88 da 74 b6 e8 7f 6c 02 00 <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 53 48 83 ec
[   22.332705] RSP: 0018:ffffa457800d0d68 EFLAGS: 00000086
[   22.332884] RAX: 0000000000000000 RBX: ffff9a8cbb9ba000 RCX: 0000000000000103
[   22.333109] RDX: 0000000080000103 RSI: 0000000000000000 RDI: 00000000ffffffff
[   22.333327] RBP: ffffa457800d0e90 R08: 0000000000000000 R09: 0000000000000000
[   22.333546] R10: 0000000000000000 R11: ffffa457800d0c10 R12: 000000000000a1b9
[   22.333767] R13: ffff9a8cbae26030 R14: ffff9a8cbae25f80 R15: ffff9a8cbb83a000
[   22.334045] FS:  0000000000000000(0000) GS:ffff9a8cbb880000(0000) knlGS:0000000000000000
[   22.334321] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   22.334520] CR2: 00007fba66a35010 CR3: 0000000176cd6000 CR4: 00000000007406e0
[   22.334794] PKRU: 55555554
[   22.334915] Call Trace:
[   22.335062]  <IRQ>
[   22.335148]  check_preempt_curr+0x7f/0xc0
[   22.335295]  load_balance+0x589/0xc50
[   22.335513]  rebalance_domains+0x30d/0x410
[   22.335684]  _nohz_idle_balance+0x1bd/0x200
[   22.335854]  __do_softirq+0xe5/0x478
[   22.336023]  irq_exit+0xa9/0xc0
[   22.336163]  reschedule_interrupt+0xf/0x20
[   22.336317]  </IRQ>
[   22.336409] RIP: 0010:default_idle+0x23/0x180
[   22.336561] Code: ff 90 90 90 90 90 90 41 55 41 54 55 53 e8 45 75 7c ff 0f 1f 44 00 00 e8 0b aa 40 ff e9 07 00 00 00 0f 00 2d 31 94 4a 00 fb f4 <e8> 28 75 7c ff 89 c5 0f 1f 44 00 00 5b 5d 41 5c 41 5d c3 65 8b 05
[   22.337102] RSP: 0018:ffffa4578006bec0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff02
[   22.337342] RAX: ffff9a8cbae23fc0 RBX: 0000000000000001 RCX: 0000000000000001
[   22.337561] RDX: 0000000000000046 RSI: 0000000000000006 RDI: ffffffffb6852dd6
[   22.337780] RBP: ffffffffb6b9c1f8 R08: 0000000000000001 R09: 0000000000000000
[   22.337996] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   22.338229] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   22.338501]  do_idle+0x1df/0x260
[   22.338588]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
[   22.338706]  cpu_startup_entry+0x14/0x20
[   22.338793]  start_secondary+0x151/0x180
[   22.338885]  secondary_startup_64+0xa4/0xb0
[   22.339060] irq event stamp: 61631
[   22.339176] hardirqs last  enabled at (61630): [<ffffffffb5f5c6dc>] _raw_spin_unlock_irqrestore+0x4c/0x60
[   22.339373] hardirqs last disabled at (61631): [<ffffffffb5f5c46d>] _raw_spin_lock_irqsave+0xd/0x50
[   22.339568] softirqs last  enabled at (61626): [<ffffffffb5272bc8>] irq_enter+0x58/0x60
[   22.339726] softirqs last disabled at (61627): [<ffffffffb5272c79>] irq_exit+0xa9/0xc0
[   22.339897] ---[ end trace 8ad53445879058cc ]---
[   22.340384] ACPI MEMORY or I/O RESET_REG.
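
For readers following the trace, here is a minimal userspace sketch of the
interaction between the quick hack and the warning. It is a toy model, not
kernel code: every model_* name and the two bitmasks are hypothetical
stand-ins, and the offline check in model_send_reschedule() is only assumed
from the WARNING location in native_smp_send_reschedule() shown above. In
this model the warning depends solely on the target CPU's online bit at
send time, so clearing the active bit first can only help if every sender
consults the active mask again after it was cleared. That is one possible
reading of the result, not a diagnosis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4 /* assumption: small fixed CPU count for the model */

/* Toy stand-ins for the kernel's online and active CPU masks. */
unsigned int online_mask = (1u << NR_CPUS) - 1;
unsigned int active_mask = (1u << NR_CPUS) - 1;

void model_set_cpu_online(int cpu, bool on)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (on)
		online_mask |= 1u << cpu;
	else
		online_mask &= ~(1u << cpu);
}

void model_set_cpu_active(int cpu, bool on)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (on)
		active_mask |= 1u << cpu;
	else
		active_mask &= ~(1u << cpu);
}

bool model_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return online_mask & (1u << cpu);
}

/* Mask updates in the order used by the quoted patch: active, then online. */
void model_stop_this_cpu(int cpu)
{
	model_set_cpu_active(cpu, false);
	model_set_cpu_online(cpu, false);
}

/*
 * Returns true if a reschedule IPI would be sent, false if the
 * "offline CPU" warning would fire instead. Only the online bit is
 * checked here; the active bit plays no role at this point.
 */
bool model_send_reschedule(int cpu)
{
	if (!model_cpu_online(cpu)) {
		fprintf(stderr,
			"sched: Unexpected reschedule of offline CPU#%d!\n",
			cpu);
		return false;
	}
	return true;
}
```

Calling model_stop_this_cpu(3) and then model_send_reschedule(3) takes the
warning path in this model, while the other CPUs are unaffected.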
