linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tick/nohz: Fix softlockup on scheduler stalls in kvm guest
@ 2016-09-02  6:38 Wanpeng Li
  2016-09-02  8:31 ` [tip:timers/urgent] " tip-bot for Wanpeng Li
  0 siblings, 1 reply; 2+ messages in thread
From: Wanpeng Li @ 2016-09-02  6:38 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Paolo Bonzini, Radim Krčmář

From: Wanpeng Li <wanpeng.li@hotmail.com>

tick_nohz_start_idle() is prevented to be called if the idle tick can't 
be stopped since commit 1f3b0f8243cb934 ("tick/nohz: Optimize nohz idle 
enter"). As a result, after suspend/resume the host machine, full dynticks 
kvm guest will softlockup:

 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:0]
 Call Trace:
  default_idle+0x31/0x1a0
  arch_cpu_idle+0xf/0x20
  default_idle_call+0x2a/0x50
  cpu_startup_entry+0x39b/0x4d0
  rest_init+0x138/0x140
  ? rest_init+0x5/0x140
  start_kernel+0x4c1/0x4ce
  ? set_init_arg+0x55/0x55
  ? early_idt_handler_array+0x120/0x120
  x86_64_start_reservations+0x24/0x26
  x86_64_start_kernel+0x142/0x14f

In addition, cat /proc/stat | grep cpu in guest or host:

cpu  398 16 5049 15754 5490 0 1 46 0 0
cpu0 206 5 450 0 0 0 1 14 0 0
cpu1 81 0 3937 3149 1514 0 0 9 0 0
cpu2 45 6 332 6052 2243 0 0 11 0 0
cpu3 65 2 328 6552 1732 0 0 11 0 0

The idle and iowait states are weird 0 for cpu0(housekeeping). 

The bug is present in both guest and host kernels, and they both have 
cpu0's idle and iowait states issue, however, host kernel's suspend/resume 
path etc will touch watchdog to avoid the softlockup.

- The watchdog will not be touched in tick_nohz_stop_idle path (need be 
  touched since the scheduler stall is expected) if idle_active flags are 
  not detected.
- The idle and iowait states will not be accounted when exit idle loop 
  (resched or interrupt) if idle start time and idle_active flags are 
  not set. 

This patch fixes it by reverting commit 1f3b0f8243cb934 since can't stop 
idle tick doesn't mean can't be idle.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 kernel/time/tick-sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 204fdc8..2ec7c00 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -908,10 +908,11 @@ static void __tick_nohz_idle_enter(struct tick_sched *ts)
 	ktime_t now, expires;
 	int cpu = smp_processor_id();
 
+	now = tick_nohz_start_idle(ts);
+
 	if (can_stop_idle_tick(cpu, ts)) {
 		int was_stopped = ts->tick_stopped;
 
-		now = tick_nohz_start_idle(ts);
 		ts->idle_calls++;
 
 		expires = tick_nohz_stop_sched_tick(ts, now, cpu);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [tip:timers/urgent] tick/nohz: Fix softlockup on scheduler stalls in kvm guest
  2016-09-02  6:38 [PATCH] tick/nohz: Fix softlockup on scheduler stalls in kvm guest Wanpeng Li
@ 2016-09-02  8:31 ` tip-bot for Wanpeng Li
  0 siblings, 0 replies; 2+ messages in thread
From: tip-bot for Wanpeng Li @ 2016-09-02  8:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, wanpeng.li, pbonzini, rkrcmar, sanjeev.yadav,
	peterz, mingo, tglx, gaurav.jindal, hpa

Commit-ID:  08d072599234c959b0b82b63fa252c129225a899
Gitweb:     http://git.kernel.org/tip/08d072599234c959b0b82b63fa252c129225a899
Author:     Wanpeng Li <wanpeng.li@hotmail.com>
AuthorDate: Fri, 2 Sep 2016 14:38:23 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 2 Sep 2016 10:25:40 +0200

tick/nohz: Fix softlockup on scheduler stalls in kvm guest

tick_nohz_start_idle() is prevented to be called if the idle tick can't 
be stopped since commit 1f3b0f8243cb934 ("tick/nohz: Optimize nohz idle 
enter"). As a result, after suspend/resume the host machine, full dynticks 
kvm guest will softlockup:

 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:0]
 Call Trace:
  default_idle+0x31/0x1a0
  arch_cpu_idle+0xf/0x20
  default_idle_call+0x2a/0x50
  cpu_startup_entry+0x39b/0x4d0
  rest_init+0x138/0x140
  ? rest_init+0x5/0x140
  start_kernel+0x4c1/0x4ce
  ? set_init_arg+0x55/0x55
  ? early_idt_handler_array+0x120/0x120
  x86_64_start_reservations+0x24/0x26
  x86_64_start_kernel+0x142/0x14f

In addition, cat /proc/stat | grep cpu in guest or host:

cpu  398 16 5049 15754 5490 0 1 46 0 0
cpu0 206 5 450 0 0 0 1 14 0 0
cpu1 81 0 3937 3149 1514 0 0 9 0 0
cpu2 45 6 332 6052 2243 0 0 11 0 0
cpu3 65 2 328 6552 1732 0 0 11 0 0

The idle and iowait states are weird 0 for cpu0(housekeeping). 

The bug is present in both guest and host kernels, and they both have 
cpu0's idle and iowait states issue, however, host kernel's suspend/resume 
path etc will touch watchdog to avoid the softlockup.

- The watchdog will not be touched in tick_nohz_stop_idle path (need be 
  touched since the scheduler stall is expected) if idle_active flags are 
  not detected.
- The idle and iowait states will not be accounted when exit idle loop 
  (resched or interrupt) if idle start time and idle_active flags are 
  not set. 

This patch fixes it by reverting commit 1f3b0f8243cb934 since can't stop 
idle tick doesn't mean can't be idle.

Fixes: 1f3b0f8243cb934 ("tick/nohz: Optimize nohz idle enter")
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Sanjeev Yadav<sanjeev.yadav@spreadtrum.com>
Cc: Gaurav Jindal<gaurav.jindal@spreadtrum.com>
Cc: stable@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1472798303-4154-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/tick-sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 204fdc8..2ec7c00 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -908,10 +908,11 @@ static void __tick_nohz_idle_enter(struct tick_sched *ts)
 	ktime_t now, expires;
 	int cpu = smp_processor_id();
 
+	now = tick_nohz_start_idle(ts);
+
 	if (can_stop_idle_tick(cpu, ts)) {
 		int was_stopped = ts->tick_stopped;
 
-		now = tick_nohz_start_idle(ts);
 		ts->idle_calls++;
 
 		expires = tick_nohz_stop_sched_tick(ts, now, cpu);

^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2016-09-02  8:32 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-02  6:38 [PATCH] tick/nohz: Fix softlockup on scheduler stalls in kvm guest Wanpeng Li
2016-09-02  8:31 ` [tip:timers/urgent] " tip-bot for Wanpeng Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).