From: "Paul E. McKenney" <paulmck@kernel.org>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Josh Triplett <josh@joshtriplett.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv2 1/2] rcu/tree: handle VM stoppage in stall detection
Date: Fri, 21 May 2021 11:01:27 -0700
Message-ID: <20210521180127.GD4441@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20210521155624.174524-1-senozhatsky@chromium.org>

On Sat, May 22, 2021 at 12:56:23AM +0900, Sergey Senozhatsky wrote:
> The soft lockup watchdog timer function checks whether the virtual
> machine was suspended, in which case what looks like a lockup is
> in fact a false positive.
> 
> This is what kvm_check_and_clear_guest_paused() does: it
> tests the guest's PVCLOCK_GUEST_STOPPED flag (which is set by
> the host) and, if the flag is set, clears it and touches all
> watchdogs so that the caller can bail out.
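A minimal userspace model of the test-and-clear behavior described above may help. It is a sketch, not the kernel code: the flags word, GUEST_STOPPED_BIT, host_stop_guest(), check_and_clear_guest_paused() and the watchdog_touches counter are all hypothetical stand-ins, and the real helper works on per-vCPU pvclock data rather than on a single C11 atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the host-set "guest stopped" bit. */
#define GUEST_STOPPED_BIT (1u << 1)

/* Hypothetical per-vCPU flags word; the "host" sets bits in it. */
static _Atomic unsigned int pvti_flags;

/* Counts how often the (hypothetical) watchdogs were touched. */
static int watchdog_touches;

/* Simulates the host stopping the VM. */
static void host_stop_guest(void)
{
	atomic_fetch_or(&pvti_flags, GUEST_STOPPED_BIT);
}

/*
 * Test-and-clear: returns true exactly once per host stop. Clearing
 * the bit means the next caller sees false, so only one path
 * consumes a given stop event.
 */
static bool check_and_clear_guest_paused(void)
{
	unsigned int old = atomic_fetch_and(&pvti_flags, ~GUEST_STOPPED_BIT);

	if (!(old & GUEST_STOPPED_BIT))
		return false;

	watchdog_touches++;	/* stands in for touching all watchdogs */
	return true;
}
```

Before any stop, check_and_clear_guest_paused() returns false; after host_stop_guest(), the first call returns true and the next one returns false again.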
> 
> The watchdog timer function runs from IRQ context, so the
> PVCLOCK_GUEST_STOPPED check works fine there.
> 
> There is, however, one more watchdog that runs from IRQ context,
> the RCU stall detector, so the watchdog timer fn races with it,
> and the RCU stall detector is not aware of PVCLOCK_GUEST_STOPPED.
> 
> apic_timer_interrupt()
>  smp_apic_timer_interrupt()
>   hrtimer_interrupt()
>    __hrtimer_run_queues()
>     tick_sched_timer()
>      tick_sched_handle()
>       update_process_times()
>        rcu_sched_clock_irq()
> 
> This triggers false-positive RCU stall reports on our devices
> during VM resume.
> 
> If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> before watchdog_timer_fn()->kvm_check_and_clear_guest_paused(),
> then nothing on this VCPU has touched the watchdogs yet, so RCU
> reads a stale grace-period stall timestamp together with the new
> jiffies value, which makes it think that RCU has stalled.
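The ordering problem can be reproduced in a small single-threaded C model. Everything here is hypothetical and simplified: jiffies, jiffies_stall and guest_stopped are plain globals, STALL_TIMEOUT is an arbitrary 100 ticks, and "touching" the watchdogs is modeled as pushing the stall deadline forward. Only the ULONG_CMP_GE() macro follows the kernel's wrap-safe comparison.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a is at or after b", in the style of the kernel's macro. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

#define STALL_TIMEOUT 100UL	/* hypothetical stall timeout, in ticks */

static unsigned long jiffies;		/* simulated tick counter */
static unsigned long jiffies_stall;	/* simulated RCU stall deadline */
static bool guest_stopped;		/* simulated host-set flag */

/* Soft lockup watchdog path: consumes the flag and touches watchdogs,
 * modeled here as moving the RCU stall deadline forward. */
static void watchdog_timer_fn(void)
{
	if (guest_stopped) {
		guest_stopped = false;
		jiffies_stall = jiffies + STALL_TIMEOUT;
	}
}

/* Unpatched RCU path: only compares jiffies against the deadline. */
static bool rcu_stall_check(void)
{
	return ULONG_CMP_GE(jiffies, jiffies_stall);
}

/*
 * Arm a deadline, let the host stop the VM for @suspended_for ticks,
 * then run both handlers from one tick in the requested order.
 * Returns true if the RCU check would report a stall.
 */
static bool resume_and_tick(unsigned long suspended_for, bool watchdog_first)
{
	bool stall;

	jiffies = 0;
	jiffies_stall = jiffies + STALL_TIMEOUT;
	guest_stopped = true;		/* host stops the VM ... */
	jiffies += suspended_for;	/* ... and time jumps on resume */

	if (watchdog_first) {
		watchdog_timer_fn();
		stall = rcu_stall_check();
	} else {
		stall = rcu_stall_check();	/* reads the stale deadline */
		watchdog_timer_fn();
	}
	return stall;
}
```

With these definitions, resume_and_tick(1000, true) reports no stall, while resume_and_tick(1000, false) does: when the RCU check runs first, it compares the stale deadline against the jumped jiffies value.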
> 
> Make the RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> do not report RCU stalls when the VM is resumed.
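The approach of checking the pause flag inside the stall path, right before reporting, can be sketched the same way. Again, this is a simplified, hypothetical model: check_cpu_stall() here keeps only the deadline test, the cmpxchg() that elects a single reporter (modeled with C11 atomic_compare_exchange_strong()), and the new guard. The 3 * STALL_TIMEOUT back-off and the run_scenario() driver are assumptions standing in for the kernel's own logic.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define STALL_TIMEOUT 100UL	/* hypothetical stall timeout, in ticks */

static unsigned long jiffies;			/* simulated tick counter */
static _Atomic unsigned long jiffies_stall;	/* simulated stall deadline */
static bool guest_stopped;			/* simulated host-set flag */
static int stalls_reported;

/* Hypothetical stand-in for kvm_check_and_clear_guest_paused(). */
static bool check_and_clear_guest_paused(void)
{
	if (!guest_stopped)
		return false;
	guest_stopped = false;
	return true;
}

static void check_cpu_stall(void)
{
	unsigned long js = atomic_load(&jiffies_stall);
	unsigned long jn = jiffies + 3 * STALL_TIMEOUT;	/* next deadline */

	if (!ULONG_CMP_GE(jiffies, js))
		return;
	/* Only the caller that wins the exchange goes on to report. */
	if (!atomic_compare_exchange_strong(&jiffies_stall, &js, jn))
		return;
	/* The guard: a host-initiated stop is not a stall. */
	if (check_and_clear_guest_paused())
		return;
	stalls_reported++;
}

/* Arm a deadline at tick 0, optionally pause the guest, resume at
 * @resume_at and run one stall check; returns the number of reports. */
static int run_scenario(bool host_paused, unsigned long resume_at)
{
	jiffies = 0;
	stalls_reported = 0;
	guest_stopped = host_paused;
	atomic_store(&jiffies_stall, jiffies + STALL_TIMEOUT);
	jiffies = resume_at;
	check_cpu_stall();
	return stalls_reported;
}
```

Here run_scenario(true, 1000) returns 0 and run_scenario(false, 1000) returns 1, so a host stop is swallowed while a genuine stall past the deadline is still reported.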
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>

I have queued both for testing and further review, thank you!

							Thanx, Paul

> ---
> 
> v2: fixed powerpc build breakage
> 
>  kernel/rcu/tree_stall.h | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index d574e3bbd929..bc689911a81d 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -7,6 +7,8 @@
>   * Author: Paul E. McKenney <paulmck@linux.ibm.com>
>   */
>  
> +#include <linux/kvm_para.h>
> +
>  //////////////////////////////////////////////////////////////////////////////
>  //
>  // Controlling CPU stall warnings, including delay calculation.
> @@ -698,6 +700,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
>  	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>  
> +		/*
> +		 * If a virtual machine is stopped by the host it can look to
> +		 * the watchdog like an RCU stall. Check to see if the host
> +		 * stopped the vm.
> +		 */
> +		if (kvm_check_and_clear_guest_paused())
> +			return;
> +
>  		/* We haven't checked in, so go dump stack. */
>  		print_cpu_stall(gps);
>  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> @@ -707,6 +717,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
>  		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>  
> +		/*
> +		 * If a virtual machine is stopped by the host it can look to
> +		 * the watchdog like an RCU stall. Check to see if the host
> +		 * stopped the vm.
> +		 */
> +		if (kvm_check_and_clear_guest_paused())
> +			return;
> +
>  		/* They had a few time units to dump stack, so complain. */
>  		print_other_cpu_stall(gs2, gps);
>  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> -- 
> 2.31.1.818.g46aad6cb9e-goog
> 


Thread overview: 10+ messages
2021-05-21 15:56 [PATCHv2 1/2] rcu/tree: handle VM stoppage in stall detection Sergey Senozhatsky
2021-05-21 15:56 ` [PATCH 2/2] rcu: do not disable gp stall detection in rcu_cpu_stall_reset() Sergey Senozhatsky
2021-05-21 18:01 ` Paul E. McKenney [this message]
2021-05-21 21:38   ` [PATCHv2 1/2] rcu/tree: handle VM stoppage in stall detection Paul E. McKenney
2021-05-24  1:56     ` Sergey Senozhatsky
2021-05-24  3:46       ` Paul E. McKenney
2021-05-24  4:00         ` Sergey Senozhatsky
2021-07-15  9:09 ` Sergey Senozhatsky
2021-07-15 13:32   ` Paul E. McKenney
2021-07-15 14:08     ` Sergey Senozhatsky
