From: Peter Zijlstra <peterz@infradead.org>
To: Guenter Roeck <linux@roeck-us.net>
Cc: x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, Borislav Petkov <bp@alien8.de>
Subject: Re: sched: Unexpected reschedule of offline CPU#2!
Date: Mon, 29 Jul 2019 11:35:45 +0200
Message-ID: <20190729093545.GV31381@hirez.programming.kicks-ass.net>
In-Reply-To: <20190727164450.GA11726@roeck-us.net>

On Sat, Jul 27, 2019 at 09:44:50AM -0700, Guenter Roeck wrote:
> Hi,
> 
> I see the following traceback (or similar tracebacks) once in a while
> during my boot tests. In this specific case it is with mainline
> (v5.3-rc1-195-g3ea54d9b0d65), but I have seen it with other branches
> as well. This isn't a new problem; I have seen it for quite some time.
> There is no specific action required to make it appear; just running
> reboot loops is sufficient. The problem doesn't happen a lot;
> non-scientifically I would say I see it maybe once every few hundred
> boots.
> 
> No specific action requested or asked for; this is just informational.
> 
> A complete log is at:
> https://kerneltests.org/builders/qemu-x86-master/builds/1285/steps/qemubuildcommand/logs/stdio
> 
> Guenter
> 
> ---
> [   61.248329] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [   61.268277] e1000e: EEE TX LPI TIMER: 00000000
> [   61.311435] reboot: Restarting system
> [   61.312321] reboot: machine restart
> [   61.342193] ------------[ cut here ]------------
> [   61.342660] sched: Unexpected reschedule of offline CPU#2!
> ILLOPC: ce241f83: 0f 0b
> [   61.344323] WARNING: CPU: 1 PID: 15 at arch/x86/kernel/smp.c:126 native_smp_send_reschedule+0x33/0x40
> [   61.344836] Modules linked in:
> [   61.345694] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 5.3.0-rc1+ #1
> [   61.345998] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [   61.346569] EIP: native_smp_send_reschedule+0x33/0x40
> [   61.347099] Code: cf 73 1c 8b 15 60 54 2b cf 8b 4a 18 ba fd 00 00 00 e8 05 65 c7 00 c9 c3 8d b4 26 00 00 00 00 50 68 04 ca 1a cf e8 fe e3 01 00 <0f> 0b 58 5a c9 c3 8d b4 26 00 00 00 00 55 89 e5 56 53 83 ec 0c 65
> [   61.347726] EAX: 0000002e EBX: 00000002 ECX: 00000000 EDX: cdd64140
> [   61.347977] ESI: 00000002 EDI: 00000000 EBP: cdd73c88 ESP: cdd73c80
> [   61.348234] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00000096
> [   61.348514] CR0: 80050033 CR2: b7ee7048 CR3: 0c28f000 CR4: 000006d0
> [   61.348866] Call Trace:
> [   61.349392]  kick_ilb+0x90/0xa0
> [   61.349629]  trigger_load_balance+0xf0/0x5c0
> [   61.349859]  ? check_preempt_wakeup+0x1b0/0x1b0
> [   61.350057]  scheduler_tick+0xa7/0xd0

kick_ilb() iterates nohz.idle_cpus_mask to find itself an idle_cpu().
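
To make the selection step concrete, here is a minimal userspace C sketch
of the lookup described above. It is a toy model, not the kernel code:
NR_CPUS, NO_CPU, the plain-bitmask idle_cpus_mask, cpu_idle_state[],
model_mark() and model_find_ilb() are all hypothetical stand-ins for
nohz.idle_cpus_mask, idle_cpu() and the search done on behalf of kick_ilb().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8        /* hypothetical CPU count for this model */
#define NO_CPU  NR_CPUS  /* "nothing found" sentinel (assumption) */

/* Toy stand-ins for nohz.idle_cpus_mask and the per-CPU idle state. */
static unsigned int idle_cpus_mask;
static bool cpu_idle_state[NR_CPUS];

/* Set or clear a CPU's bit in the mask and record whether it is idle. */
static void model_mark(int cpu, bool in_mask, bool idle)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (in_mask)
		idle_cpus_mask |= 1u << cpu;
	else
		idle_cpus_mask &= ~(1u << cpu);
	cpu_idle_state[cpu] = idle;
}

/* Walk the mask and return the first CPU that is also idle. */
static int model_find_ilb(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(idle_cpus_mask & (1u << cpu)))
			continue;	/* not in the idle mask */
		if (cpu_idle_state[cpu])
			return cpu;	/* candidate to kick */
	}
	return NO_CPU;
}
```

In this model the search trusts the mask entirely: a CPU whose bit is still
set and which looks idle gets picked, so whether it is online depends only
on the mask having been kept up to date.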

nohz.idle_cpus_mask is set from nohz_balance_enter_idle() and cleared from
nohz_balance_exit_idle(). nohz_balance_enter_idle() is called from
__tick_nohz_idle_stop_tick() when entering nohz idle; this includes the
cpu_is_offline() clause of the idle loop.

However, when the CPU is offline, cpu_active() should also be false, so
nohz_balance_enter_idle() should be a no-op.

Then we have nohz_balance_exit_idle() from sched_cpu_dying(), which
should explicitly clear the CPU from the mask when going offline.
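
The two safeguards described above can be sketched as a small userspace C
model. Everything here is an assumption made for illustration:
cpu_active_state[], the bitmask idle_cpus_mask, and the model_*() helpers
are hypothetical stand-ins for cpu_active(), nohz.idle_cpus_mask,
nohz_balance_enter_idle(), nohz_balance_exit_idle() and the
sched_cpu_dying() path, and the ordering inside model_cpu_offline() is a
simplification of the real hotplug sequence.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8  /* hypothetical CPU count for this model */

/* Toy stand-ins for nohz.idle_cpus_mask and cpu_active(). */
static unsigned int idle_cpus_mask;
static bool cpu_active_state[NR_CPUS];

/* Models nohz_balance_enter_idle(): no-op for an inactive CPU. */
static void model_enter_idle(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!cpu_active_state[cpu])
		return;			/* going down: nothing to do */
	idle_cpus_mask |= 1u << cpu;
}

/* Models nohz_balance_exit_idle(): always clears the CPU's bit. */
static void model_exit_idle(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	idle_cpus_mask &= ~(1u << cpu);
}

/* Simplified offline path (ordering is an assumption of this sketch). */
static void model_cpu_offline(int cpu)
{
	cpu_active_state[cpu] = false;	/* cpu_active() becomes false */
	model_exit_idle(cpu);		/* the sched_cpu_dying() step */
	model_enter_idle(cpu);		/* idle loop's offline clause: no-op */
}

static bool model_in_mask(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return idle_cpus_mask & (1u << cpu);
}
```

With both checks in place, the model never leaves an offline CPU in the
mask. It only covers a strictly sequential ordering and says nothing about
what happens if these steps interleave with a concurrent mask walk.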

So I'm not immediately seeing how we can select an offline CPU to kick.



