From: Guenter Roeck <linux@roeck-us.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org, Borislav Petkov <bp@alien8.de>
Subject: Re: sched: Unexpected reschedule of offline CPU#2!
Date: Mon, 29 Jul 2019 13:50:59 -0700
Message-ID: <20190729205059.GA1127@roeck-us.net>
In-Reply-To: <20190729104745.GA31398@hirez.programming.kicks-ass.net>

On Mon, Jul 29, 2019 at 12:47:45PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 29, 2019 at 12:38:30PM +0200, Thomas Gleixner wrote:
> > On Mon, 29 Jul 2019, Peter Zijlstra wrote:
> > > On Mon, Jul 29, 2019 at 11:58:24AM +0200, Thomas Gleixner wrote:
> > > > On Mon, 29 Jul 2019, Peter Zijlstra wrote:
> > > > > On Sat, Jul 27, 2019 at 09:44:50AM -0700, Guenter Roeck wrote:
> > > > > > [   61.348866] Call Trace:
> > > > > > [   61.349392]  kick_ilb+0x90/0xa0
> > > > > > [   61.349629]  trigger_load_balance+0xf0/0x5c0
> > > > > > [   61.349859]  ? check_preempt_wakeup+0x1b0/0x1b0
> > > > > > [   61.350057]  scheduler_tick+0xa7/0xd0
> > > > > 
> > > > > kick_ilb() iterates nohz.idle_cpus_mask to find itself an idle_cpu().
> > > > > 
> > > > > nohz.idle_cpus_mask is set from nohz_balance_enter_idle() and cleared from
> > > > > nohz_balance_exit_idle(). nohz_balance_enter_idle() is called from
> > > > > __tick_nohz_idle_stop_tick() when entering nohz idle, this includes the
> > > > > cpu_is_offline() clause of the idle loop.
> > > > > 
> > > > > However, when offline, cpu_active() should also be false, and this
> > > > > function should no-op.
> > > > 
> > > > Ha. That reboot mess is not clearing cpu active as it's not going through
> > > > the regular cpu hotplug path. It's using reboot IPI which 'stops' the cpus
> > > > dead in their tracks after clearing cpu online....
> > > 
> > > $string-of-cock-compliant-curses
> > > 
> > > What a trainwreck...
> > > 
> > > So if it doesn't play by the normal rules; how does it expect to work?
> > > 
> > > So what do we do? 'Fix' reboot or extend the rules?
> > 
> > Reboot has two modes:
> > 
> >  - Regular reboot initiated from user space
> > 
> >  - Panic reboot
> > 
> > For the regular reboot we can make it go through proper hotplug, 
> 
> That seems sensible.
> 
> > for the panic case not so much.
> 
> It's panic, shit has already hit fan, one or two more pieces shouldn't
> be something anybody cares about.
> 

Some more digging shows that this happens a lot with Google GCE instances,
typically after a panic. The problem with that, if I understand correctly,
is that it may prevent coredumps from being written. So, while of course
the panic is what needs to be fixed, it is still quite annoying, and it
would help if this can be fixed for panic handling as well.

How about the patch suggested by Hillf Danton? Would that help for the
panic case?

Thanks,
Guenter
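
Editorial illustration, not part of the original message: the quoted
analysis can be sketched as a small user-space model. This is not kernel
code. The struct, the model_* names and the bitmask representation are all
hypothetical, and the hotplug path is assumed to clear the active bit and
the nohz idle-mask bit before the CPU goes offline. The reboot-IPI path
clears only the online bit, as described above, so a scan of the idle mask
can still return a CPU that is no longer online.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Toy stand-ins for cpu_online_mask, cpu_active_mask and nohz.idle_cpus_mask. */
struct model_masks {
	unsigned int online;
	unsigned int active;
	unsigned int nohz_idle;
};

static unsigned int model_bit(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return 1u << cpu;
}

/* Models nohz_balance_enter_idle(): a no-op unless the CPU is active. */
static void model_enter_idle(struct model_masks *m, int cpu)
{
	if (m->active & model_bit(cpu))
		m->nohz_idle |= model_bit(cpu);
}

/* Assumed regular hotplug teardown: active, idle-mask and online bits cleared. */
static void model_hotplug_offline(struct model_masks *m, int cpu)
{
	m->active &= ~model_bit(cpu);
	m->nohz_idle &= ~model_bit(cpu);
	m->online &= ~model_bit(cpu);
}

/* Reboot IPI as described in the thread: only the online bit is cleared. */
static void model_reboot_ipi_stop(struct model_masks *m, int cpu)
{
	m->online &= ~model_bit(cpu);
}

/* Simplified kick_ilb() scan: first idle-mask CPU other than the caller, or -1. */
static int model_find_ilb(const struct model_masks *m, int self)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (cpu != self && (m->nohz_idle & model_bit(cpu)))
			return cpu;
	}
	return -1;
}

static bool model_cpu_online(const struct model_masks *m, int cpu)
{
	return (m->online & model_bit(cpu)) != 0;
}
```

With all four CPUs online and active and CPU 2 in nohz idle, calling
model_hotplug_offline() for CPU 2 leaves model_find_ilb() with nothing to
pick, while model_reboot_ipi_stop() leaves CPU 2 in the idle mask even
though model_cpu_online() reports it offline. That is the state in which
the "Unexpected reschedule of offline CPU" warning would be expected.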
