All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] rcu/tree: consider time a VM was suspended
@ 2021-05-16 10:27 Sergey Senozhatsky
  2021-05-17 16:23 ` Paul E. McKenney
  2021-05-21  6:36 ` Sergey Senozhatsky
  0 siblings, 2 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2021-05-16 10:27 UTC (permalink / raw)
  To: Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes
  Cc: rcu, linux-kernel, Sergey Senozhatsky

Soft watchdog timer function checks if a virtual machine
was suspended and hence what looks like a lockup in fact
is a false positive.

This is what kvm_check_and_clear_guest_paused() does: it
tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
and if it's set then we need to touch all watchdogs and bail
out.

Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
check works fine.

There is, however, one more watchdog that runs from IRQ, so
watchdog timer fn races with it, and that watchdog is not aware
of PVCLOCK_GUEST_STOPPED - RCU stall detector.

apic_timer_interrupt()
 smp_apic_timer_interrupt()
  hrtimer_interrupt()
   __hrtimer_run_queues()
    tick_sched_timer()
     tick_sched_handle()
      update_process_times()
       rcu_sched_clock_irq()

This triggers RCU stalls on our devices during VM resume.

If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
then there is nothing on this VCPU that touches watchdogs and
RCU reads stale gp stall timestamp and new jiffies value, which
makes it think that RCU has stalled.

Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
don't report RCU stalls when we resume the VM.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 kernel/rcu/tree_stall.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 59b95cc5cbdf..cb233c79f0bc 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -7,6 +7,8 @@
  * Author: Paul E. McKenney <paulmck@linux.ibm.com>
  */
 
+#include <asm/kvm_para.h>
+
 //////////////////////////////////////////////////////////////////////////////
 //
 // Controlling CPU stall warnings, including delay calculation.
@@ -695,6 +697,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
 	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like an RCU stall. Check to see if the host
+		 * stopped the vm.
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return;
+
 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall(gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
@@ -704,6 +714,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
 		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like an RCU stall. Check to see if the host
+		 * stopped the vm.
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return;
+
 		/* They had a few time units to dump stack, so complain. */
 		print_other_cpu_stall(gs2, gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
-- 
2.31.1.751.gd2f1c929bd-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-16 10:27 [PATCH] rcu/tree: consider time a VM was suspended Sergey Senozhatsky
@ 2021-05-17 16:23 ` Paul E. McKenney
  2021-05-18  1:41   ` Sergey Senozhatsky
  2021-05-21  6:36 ` Sergey Senozhatsky
  1 sibling, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2021-05-17 16:23 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel

On Sun, May 16, 2021 at 07:27:16PM +0900, Sergey Senozhatsky wrote:
> Soft watchdog timer function checks if a virtual machine
> was suspended and hence what looks like a lockup in fact
> is a false positive.
> 
> This is what kvm_check_and_clear_guest_paused() does: it
> tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
> and if it's set then we need to touch all watchdogs and bail
> out.
> 
> Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
> check works fine.
> 
> There is, however, one more watchdog that runs from IRQ, so
> watchdog timer fn races with it, and that watchdog is not aware
> of PVCLOCK_GUEST_STOPPED - RCU stall detector.
> 
> apic_timer_interrupt()
>  smp_apic_timer_interrupt()
>   hrtimer_interrupt()
>    __hrtimer_run_queues()
>     tick_sched_timer()
>      tick_sched_handle()
>       update_process_times()
>        rcu_sched_clock_irq()
> 
> This triggers RCU stalls on our devices during VM resume.
> 
> If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
> then there is nothing on this VCPU that touches watchdogs and
> RCU reads stale gp stall timestamp and new jiffies value, which
> makes it think that RCU has stalled.
> 
> Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> don't report RCU stalls when we resume the VM.

Good point!

But if I understand your patch correctly, if the virtual machine is
stopped at any point during a grace period, even if only for a short time,
stall warnings are suppressed for that grace period forever, courtesy of
the call to rcu_cpu_stall_reset().  So, if something really is stalling,
and the virtual machine just happens to stop for a few milliseconds, the
stall warning is completely suppressed.  Which would make it difficult
to debug the underlying stall condition.

Is it possible to provide RCU with information on the duration of the
virtual-machine stoppage so that RCU could adjust the timing of the
stall warning?  Maybe by having something like rcu_cpu_stall_reset()
that takes the duration of the stoppage in jiffies?

Please note that I am not completely opposed to your patch below, but
I figured that this was a good time to ask if we can do better.

							Thanx, Paul

> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  kernel/rcu/tree_stall.h | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 59b95cc5cbdf..cb233c79f0bc 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -7,6 +7,8 @@
>   * Author: Paul E. McKenney <paulmck@linux.ibm.com>
>   */
>  
> +#include <asm/kvm_para.h>
> +
>  //////////////////////////////////////////////////////////////////////////////
>  //
>  // Controlling CPU stall warnings, including delay calculation.
> @@ -695,6 +697,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
>  	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>  
> +		/*
> +		 * If a virtual machine is stopped by the host it can look to
> +		 * the watchdog like an RCU stall. Check to see if the host
> +		 * stopped the vm.
> +		 */
> +		if (kvm_check_and_clear_guest_paused())
> +			return;
> +
>  		/* We haven't checked in, so go dump stack. */
>  		print_cpu_stall(gps);
>  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> @@ -704,6 +714,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
>  		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>  
> +		/*
> +		 * If a virtual machine is stopped by the host it can look to
> +		 * the watchdog like an RCU stall. Check to see if the host
> +		 * stopped the vm.
> +		 */
> +		if (kvm_check_and_clear_guest_paused())
> +			return;
> +
>  		/* They had a few time units to dump stack, so complain. */
>  		print_other_cpu_stall(gs2, gps);
>  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> -- 
> 2.31.1.751.gd2f1c929bd-goog
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-17 16:23 ` Paul E. McKenney
@ 2021-05-18  1:41   ` Sergey Senozhatsky
  2021-05-18 23:15     ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2021-05-18  1:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Sergey Senozhatsky, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Suleiman Souhlal, rcu, linux-kernel

On (21/05/17 09:23), Paul E. McKenney wrote:
> On Sun, May 16, 2021 at 07:27:16PM +0900, Sergey Senozhatsky wrote:
> > Soft watchdog timer function checks if a virtual machine
> > was suspended and hence what looks like a lockup in fact
> > is a false positive.
> > 
> > This is what kvm_check_and_clear_guest_paused() does: it
> > tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
> > and if it's set then we need to touch all watchdogs and bail
> > out.
> > 
> > Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
> > check works fine.
> > 
> > There is, however, one more watchdog that runs from IRQ, so
> > watchdog timer fn races with it, and that watchdog is not aware
> > of PVCLOCK_GUEST_STOPPED - RCU stall detector.
> > 
> > apic_timer_interrupt()
> >  smp_apic_timer_interrupt()
> >   hrtimer_interrupt()
> >    __hrtimer_run_queues()
> >     tick_sched_timer()
> >      tick_sched_handle()
> >       update_process_times()
> >        rcu_sched_clock_irq()
> > 
> > This triggers RCU stalls on our devices during VM resume.
> > 
> > If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> > before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
> > then there is nothing on this VCPU that touches watchdogs and
> > RCU reads stale gp stall timestamp and new jiffies value, which
> > makes it think that RCU has stalled.
> > 
> > Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> > don't report RCU stalls when we resume the VM.
> 
> Good point!
> 
> But if I understand your patch correctly, if the virtual machine is
> stopped at any point during a grace period, even if only for a short time,
> stall warnings are suppressed for that grace period forever, courtesy of
> the call to rcu_cpu_stall_reset().  So, if something really is stalling,
> and the virtual machine just happens to stop for a few milliseconds, the
> stall warning is completely suppressed.  Which would make it difficult
> to debug the underlying stall condition.
> 
> Is it possible to provide RCU with information on the duration of the
> virtual-machine stoppage so that RCU could adjust the timing of the
> stall warning?  Maybe by having something like rcu_cpu_stall_reset()
> that takes the duration of the stoppage in jiffies?

Good questions!

And I think I've some bad news and some good news.

As far as I can tell, none of the PVCLOCK_GUEST_STOPPED handlers take
the stoppage duration into consideration. For instance, as soon as
watchdog timer IRQ detects a potential softlockup it checks
PVCLOCK_GUEST_STOPPED and touches all watchdogs, including RCU:

watchdog_timer_fn()
 kvm_check_and_clear_guest_paused()
  pvclock_touch_watchdogs()
   rcu_cpu_stall_reset()                 // + the remaining watchdogs

But things get more complex.

pvclock_clocksource_read() also checks PVCLOCK_GUEST_STOPPED and calls
pvclock_touch_watchdogs(). And this path is executed rather often.

For instance,

apic_timer_interrupt()
 smp_apic_timer_interrupt()
  hrtimer_interrupt()
   __hrtimer_run_queues()
    hrtimer_wakeup()
     try_to_wake_up()
      update_rq_clock()
       sched_clock_cpu()
        sched_clock()
	 kvm_sched_clock_read()
	  kvm_clock_read()
	   pvclock_clocksource_read()
	    pvclock_touch_watchdogs()
	     rcu_cpu_stall_reset()       // + the remaining watchdogs

Or

do_IRQ
 irq_exit
  sched_clock_cpu
   sched_clock
    kvm_sched_clock_read
     kvm_clock_read
      pvclock_clocksource_read
       pvclock_touch_watchdogs
        rcu_cpu_stall_reset()            // + the remaining watchdogs

And so on...

You may wonder what are the good news then.

Well. I'd say that my patch (is not beautiful but it) does not add
a lot of additional or new damage. And it still fixes the valid race
condition, as far as I'm concerned.

I think we need to rework how pvclock_touch_watchdogs() does things
internally, basically what you suggested, and this can be a separate
effort.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-18  1:41   ` Sergey Senozhatsky
@ 2021-05-18 23:15     ` Paul E. McKenney
  2021-05-20  5:50       ` Sergey Senozhatsky
  2021-05-20  6:18       ` Sergey Senozhatsky
  0 siblings, 2 replies; 16+ messages in thread
From: Paul E. McKenney @ 2021-05-18 23:15 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Suleiman Souhlal, rcu, linux-kernel

On Tue, May 18, 2021 at 10:41:21AM +0900, Sergey Senozhatsky wrote:
> On (21/05/17 09:23), Paul E. McKenney wrote:
> > On Sun, May 16, 2021 at 07:27:16PM +0900, Sergey Senozhatsky wrote:
> > > Soft watchdog timer function checks if a virtual machine
> > > was suspended and hence what looks like a lockup in fact
> > > is a false positive.
> > > 
> > > This is what kvm_check_and_clear_guest_paused() does: it
> > > tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
> > > and if it's set then we need to touch all watchdogs and bail
> > > out.
> > > 
> > > Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
> > > check works fine.
> > > 
> > > There is, however, one more watchdog that runs from IRQ, so
> > > watchdog timer fn races with it, and that watchdog is not aware
> > > of PVCLOCK_GUEST_STOPPED - RCU stall detector.
> > > 
> > > apic_timer_interrupt()
> > >  smp_apic_timer_interrupt()
> > >   hrtimer_interrupt()
> > >    __hrtimer_run_queues()
> > >     tick_sched_timer()
> > >      tick_sched_handle()
> > >       update_process_times()
> > >        rcu_sched_clock_irq()
> > > 
> > > This triggers RCU stalls on our devices during VM resume.
> > > 
> > > If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> > > before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
> > > then there is nothing on this VCPU that touches watchdogs and
> > > RCU reads stale gp stall timestamp and new jiffies value, which
> > > makes it think that RCU has stalled.
> > > 
> > > Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> > > don't report RCU stalls when we resume the VM.
> > 
> > Good point!
> > 
> > But if I understand your patch correctly, if the virtual machine is
> > stopped at any point during a grace period, even if only for a short time,
> > stall warnings are suppressed for that grace period forever, courtesy of
> > the call to rcu_cpu_stall_reset().  So, if something really is stalling,
> > and the virtual machine just happens to stop for a few milliseconds, the
> > stall warning is completely suppressed.  Which would make it difficult
> > to debug the underlying stall condition.
> > 
> > Is it possible to provide RCU with information on the duration of the
> > virtual-machine stoppage so that RCU could adjust the timing of the
> > stall warning?  Maybe by having something like rcu_cpu_stall_reset()
> > that takes the duration of the stoppage in jiffies?
> 
> Good questions!
> 
> And I think I've some bad news and some good news.
> 
> As far as I can tell, none of the PVCLOCK_GUEST_STOPPED handlers take
> the stoppage duration into consideration. For instance, as soon as
> watchdog timer IRQ detects a potential softlockup it checks
> PVCLOCK_GUEST_STOPPED and touches all watchdogs, including RCU:
> 
> watchdog_timer_fn()
>  kvm_check_and_clear_guest_paused()
>   pvclock_touch_watchdogs()
>    rcu_cpu_stall_reset()                 // + the remaining watchdogs
> 
> But things get more complex.
> 
> pvclock_clocksource_read() also checks PVCLOCK_GUEST_STOPPED and calls
> pvclock_touch_watchdogs(). And this path is executed rather often.
> 
> For instance,
> 
> apic_timer_interrupt()
>  smp_apic_timer_interrupt()
>   hrtimer_interrupt()
>    __hrtimer_run_queues()
>     hrtimer_wakeup()
>      try_to_wake_up()
>       update_rq_clock()
>        sched_clock_cpu()
>         sched_clock()
> 	 kvm_sched_clock_read()
> 	  kvm_clock_read()
> 	   pvclock_clocksource_read()
> 	    pvclock_touch_watchdogs()
> 	     rcu_cpu_stall_reset()       // + the remaining watchdogs
> 
> Or
> 
> do_IRQ
>  irq_exit
>   sched_clock_cpu
>    sched_clock
>     kvm_sched_clock_read
>      kvm_clock_read
>       pvclock_clocksource_read
>        pvclock_touch_watchdogs
>         rcu_cpu_stall_reset()            // + the remaining watchdogs
> 
> And so on...
> 
> You may wonder what are the good news then.
> 
> Well. I'd say that my patch (is not beautiful but it) does not add
> a lot of additional or new damage. And it still fixes the valid race
> condition, as far as I'm concerned.

I have tentatively pulled it in for review and testing.

> I think we need to rework how pvclock_touch_watchdogs() does things
> internally, basically what you suggested, and this can be a separate
> effort.

In the shorter term...  PVCLOCK_GUEST_STOPPED is mostly for things like
guest migration and debugger breakpoints, correct?  Either way, I am
wondering if rcu_cpu_stall_reset() should take a lighter touch.  Right
now, it effectively disables all stalls for the current grace period.
Why not make it restart the stall timeout when the stoppage is detected?

The strange thing is that unless something is updating the jiffies counter
to make it look like the system was up during the stoppage time interval,
there should be no reason to tell RCU anything.  Is the jiffies counter
updated in this manner?  (Not seeing it right offhand, but I don't claim
to be familiar with this code.)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-18 23:15     ` Paul E. McKenney
@ 2021-05-20  5:50       ` Sergey Senozhatsky
  2021-05-20 14:57         ` Paul E. McKenney
  2021-05-20  6:18       ` Sergey Senozhatsky
  1 sibling, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2021-05-20  5:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Sergey Senozhatsky, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Suleiman Souhlal, rcu, linux-kernel

On (21/05/18 16:15), Paul E. McKenney wrote:
> 
> In the shorter term...  PVCLOCK_GUEST_STOPPED is mostly for things like
> guest migration and debugger breakpoints, correct?

Our use case is a bit different. We suspend VM when user puts the host
system into sleep (which can happen multiple times a day).

> Either way, I am wondering if rcu_cpu_stall_reset() should take a lighter
> touch.  Right now, it effectively disables all stalls for the current grace
> period. Why not make it restart the stall timeout when the stoppage is detected?

Sounds good. I can cook a patch and run some tests.
Or do you want to send a patch?

> The strange thing is that unless something is updating the jiffies counter
> to make it look like the system was up during the stoppage time interval,
> there should be no reason to tell RCU anything.  Is the jiffies counter
> updated in this manner?  (Not seeing it right offhand, but I don't claim
> to be familiar with this code.)

VCPUs are not resumed all at once. It's up to the host to schedule VCPUs
for execution. So, for example, when we resume VCPU-3 and it discovers
this_cpu PVCLOCK_GUEST_STOPPED, other VCPUs, e.g. VCPU-0, can already be
resumed, up and running processing timer interrupts and adding ticks to
jiffies.

I can reproduce it.
While VCPU-2 has PVCLOCK_GUEST_STOPPED set (resuming) and is in
check_cpu_stall(), the VCPU-3 is executing:

	apic_timer_interrupt()
	 tick_irq_enter()
	  tick_do_update_jiffies64()
	   do_timer()

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-18 23:15     ` Paul E. McKenney
  2021-05-20  5:50       ` Sergey Senozhatsky
@ 2021-05-20  6:18       ` Sergey Senozhatsky
  2021-05-20 14:53         ` Paul E. McKenney
  1 sibling, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2021-05-20  6:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Sergey Senozhatsky, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Suleiman Souhlal, rcu, linux-kernel

On (21/05/18 16:15), Paul E. McKenney wrote:
> 
> In the shorter term...  PVCLOCK_GUEST_STOPPED is mostly for things like
> guest migration and debugger breakpoints, correct?  Either way, I am
> wondering if rcu_cpu_stall_reset() should take a lighter touch.  Right
> now, it effectively disables all stalls for the current grace period.
> Why not make it restart the stall timeout when the stoppage is detected?

rcu_cpu_stall_reset() is used in many other places, not sure if we can
change its behaviour rcu_cpu_stall_reset().

Maybe it'll be possible to just stop calling it from PV-clock and do
something like this

---

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index eda37df016f0..2d2489eda8e6 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -40,7 +40,7 @@ void pvclock_touch_watchdogs(void)
 {
 	touch_softlockup_watchdog_sync();
 	clocksource_touch_watchdog();
-	rcu_cpu_stall_reset();
+	record_gp_stall_check_time();
 	reset_hung_task_detector();
 }
 
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index cb233c79f0bc..6b3165c7a2c3 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -137,7 +137,7 @@ void rcu_cpu_stall_reset(void)
 // Interaction with RCU grace periods
 
 /* Start of new grace period, so record stall time (and forcing times). */
-static void record_gp_stall_check_time(void)
+void record_gp_stall_check_time(void)
 {
 	unsigned long j = jiffies;
 	unsigned long j1;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-20  6:18       ` Sergey Senozhatsky
@ 2021-05-20 14:53         ` Paul E. McKenney
  2021-05-20 22:24           ` Sergey Senozhatsky
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2021-05-20 14:53 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Suleiman Souhlal, rcu, linux-kernel

On Thu, May 20, 2021 at 03:18:07PM +0900, Sergey Senozhatsky wrote:
> On (21/05/18 16:15), Paul E. McKenney wrote:
> > 
> > In the shorter term...  PVCLOCK_GUEST_STOPPED is mostly for things like
> > guest migration and debugger breakpoints, correct?  Either way, I am
> > wondering if rcu_cpu_stall_reset() should take a lighter touch.  Right
> > now, it effectively disables all stalls for the current grace period.
> > Why not make it restart the stall timeout when the stoppage is detected?
> 
> rcu_cpu_stall_reset() is used in many other places, not sure if we can
> change its behaviour rcu_cpu_stall_reset().

There was some use case back in the day where they wanted an indefinite
suppression of RCU CPU stall warnings for the current grace period, but
all the current use cases look fine with restarting the stall timeout.

However, please see below.

> Maybe it'll be possible to just stop calling it from PV-clock and do
> something like this

This was in fact one of the things I was considering, at least until
I convinced myself that I needed to ask some questions.

One point of possibly unnecessary nervousness on my part is resetting
the start of the grace period, which might confuse diagnostics.

But how about something like this?

void rcu_cpu_stall_reset(void)
{
	WRITE_ONCE(rcu_state.jiffies_stall,
		   jiffies + rcu_jiffies_till_stall_check());
}

Would something like that work?

(One issue with this is if there has already been an RCU CPU stall
warning, in which case the timeout for repeat warnings is tripled.  But in
the current use cases, I don't see that this matters.  If it turns out to
matter, I would just add a flag to rcu_state saying whether the current
grace period had seen a stall, and use that to decide whether or not
to triple the timeout.)

							Thanx, Paul

> ---
> 
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index eda37df016f0..2d2489eda8e6 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -40,7 +40,7 @@ void pvclock_touch_watchdogs(void)
>  {
>  	touch_softlockup_watchdog_sync();
>  	clocksource_touch_watchdog();
> -	rcu_cpu_stall_reset();
> +	record_gp_stall_check_time();
>  	reset_hung_task_detector();
>  }
>  
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index cb233c79f0bc..6b3165c7a2c3 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -137,7 +137,7 @@ void rcu_cpu_stall_reset(void)
>  // Interaction with RCU grace periods
>  
>  /* Start of new grace period, so record stall time (and forcing times). */
> -static void record_gp_stall_check_time(void)
> +void record_gp_stall_check_time(void)
>  {
>  	unsigned long j = jiffies;
>  	unsigned long j1;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-20  5:50       ` Sergey Senozhatsky
@ 2021-05-20 14:57         ` Paul E. McKenney
  2021-05-20 22:34           ` Sergey Senozhatsky
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2021-05-20 14:57 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Suleiman Souhlal, rcu, linux-kernel

On Thu, May 20, 2021 at 02:50:49PM +0900, Sergey Senozhatsky wrote:
> On (21/05/18 16:15), Paul E. McKenney wrote:
> > 
> > In the shorter term...  PVCLOCK_GUEST_STOPPED is mostly for things like
> > guest migration and debugger breakpoints, correct?
> 
> Our use case is a bit different. We suspend VM when user puts the host
> system into sleep (which can happen multiple times a day).

OK, that is an interesting use case that I don't see.

> > Either way, I am wondering if rcu_cpu_stall_reset() should take a lighter
> > touch.  Right now, it effectively disables all stalls for the current grace
> > period. Why not make it restart the stall timeout when the stoppage is detected?
> 
> Sounds good. I can cook a patch and run some tests.
> Or do you want to send a patch?

Given that you have the test setup, things might go faster if you do
the patch, especially taking timezones into consideration.  Of course,
if you run into difficulties, you know where to find me.

> > The strange thing is that unless something is updating the jiffies counter
> > to make it look like the system was up during the stoppage time interval,
> > there should be no reason to tell RCU anything.  Is the jiffies counter
> > updated in this manner?  (Not seeing it right offhand, but I don't claim
> > to be familiar with this code.)
> 
> VCPUs are not resumed all at once. It's up to the host to schedule VCPUs
> for execution. So, for example, when we resume VCPU-3 and it discovers
> this_cpu PVCLOCK_GUEST_STOPPED, other VCPUs, e.g. VCPU-0, can already be
> resumed, up and running processing timer interrupts and adding ticks to
> jiffies.
> 
> I can reproduce it.
> While VCPU-2 has PVCLOCK_GUEST_STOPPED set (resuming) and is in
> check_cpu_stall(), the VCPU-3 is executing:
> 
> 	apic_timer_interrupt()
> 	 tick_irq_enter()
> 	  tick_do_update_jiffies64()
> 	   do_timer()

OK, but the normal grace period time is way less than one second, and
the stall timeout in mainline is 21 seconds, so that would be a -lot-
of jiffies of skew.  Or does the restarting really take that long a time?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-20 14:53         ` Paul E. McKenney
@ 2021-05-20 22:24           ` Sergey Senozhatsky
  2021-05-21  0:14             ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2021-05-20 22:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Sergey Senozhatsky, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Suleiman Souhlal, rcu, linux-kernel

On (21/05/20 07:53), Paul E. McKenney wrote:
> On Thu, May 20, 2021 at 03:18:07PM +0900, Sergey Senozhatsky wrote:
> > On (21/05/18 16:15), Paul E. McKenney wrote:
> > > 
> > > In the shorter term...  PVCLOCK_GUEST_STOPPED is mostly for things like
> > > guest migration and debugger breakpoints, correct?  Either way, I am
> > > wondering if rcu_cpu_stall_reset() should take a lighter touch.  Right
> > > now, it effectively disables all stalls for the current grace period.
> > > Why not make it restart the stall timeout when the stoppage is detected?
> > 
> > rcu_cpu_stall_reset() is used in many other places, not sure if we can
> > change its behaviour rcu_cpu_stall_reset().
> 
> There was some use case back in the day where they wanted an indefinite
> suppression of RCU CPU stall warnings for the current grace period, but
> all the current use cases look fine with restarting the stall timeout.
> 
> However, please see below.
> 
> > Maybe it'll be possible to just stop calling it from PV-clock and do
> > something like this
> 
> This was in fact one of the things I was considering, at least until
> I convinced myself that I needed to ask some questions.
> 
> One point of possibly unnecessary nervousness on my part is resetting
> the start of the grace period, which might confuse diagnostics.
> 
> But how about something like this?
> 
> void rcu_cpu_stall_reset(void)
> {
> 	WRITE_ONCE(rcu_state.jiffies_stall,
> 		   jiffies + rcu_jiffies_till_stall_check());
> }
> 
> Would something like that work?

This should work.

On a side note.

I wish we didn't have to put kvm_check_and_clear_guest_paused() all
over the place.

We do load jiffies at the start of check_cpu_stall(). So, in theory,
we can just use that captured jiffies (which can become obsolete but
so will grace period timestamps) value and never read current system
jiffies because they can jump way forward. IOW

	jn = j + 3 * rcu_jiffies_till_stall_check() + 3;

instead of

	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;

Then we probably can remove kvm_check_and_clear_guest_paused().

But that "don't load current jiffies" is rather fragile.

kvm_check_and_clear_guest_paused() is not pretty, but at least it's
explicit.

---

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 49dda86a0e84..24f749bc1f90 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -695,19 +695,11 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	    ULONG_CMP_GE(gps, js))
 		return; /* No stall or GP completed since entering function. */
 	rnp = rdp->mynode;
-	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
+	jn = j + 3 * rcu_jiffies_till_stall_check() + 3;
 	if (rcu_gp_in_progress() &&
 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
 	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
-		/*
-		 * If a virtual machine is stopped by the host it can look to
-		 * the watchdog like an RCU stall. Check to see if the host
-		 * stopped the vm.
-		 */
-		if (kvm_check_and_clear_guest_paused())
-			return;
-
 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall(gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
@@ -717,14 +709,6 @@ static void check_cpu_stall(struct rcu_data *rdp)
 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
 		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
-		/*
-		 * If a virtual machine is stopped by the host it can look to
-		 * the watchdog like an RCU stall. Check to see if the host
-		 * stopped the vm.
-		 */
-		if (kvm_check_and_clear_guest_paused())
-			return;
-
 		/* They had a few time units to dump stack, so complain. */
 		print_other_cpu_stall(gs2, gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-20 14:57         ` Paul E. McKenney
@ 2021-05-20 22:34           ` Sergey Senozhatsky
  2021-05-21  0:15             ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2021-05-20 22:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Sergey Senozhatsky, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Suleiman Souhlal, rcu, linux-kernel

On (21/05/20 07:57), Paul E. McKenney wrote:
> > 
> > Sounds good. I can cook a patch and run some tests.
> > Or do you want to send a patch?
> 
> Given that you have the test setup, things might go faster if you do
> the patch, especially taking timezones into consideration.  Of course,
> if you run into difficulties, you know where to find me.

OK. Sounds good to me.

> > While VCPU-2 has PVCLOCK_GUEST_STOPPED set (resuming) and is in
> > check_cpu_stall(), the VCPU-3 is executing:
> > 
> > 	apic_timer_interrupt()
> > 	 tick_irq_enter()
> > 	  tick_do_update_jiffies64()
> > 	   do_timer()
> 
> OK, but the normal grace period time is way less than one second, and
> the stall timeout in mainline is 21 seconds, so that would be a -lot-
> of jiffies of skew.  Or does the restarting really take that long a time?

That's a good question. I see huge jiffies spike in the logs.
I suspect that resuming a VM can take some time, especially on a "not
powerful at all" overcommitted host (more virtual CPUs than physical
ones).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-20 22:24           ` Sergey Senozhatsky
@ 2021-05-21  0:14             ` Paul E. McKenney
  2021-05-21  6:42               ` Sergey Senozhatsky
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2021-05-21  0:14 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Suleiman Souhlal, rcu, linux-kernel

On Fri, May 21, 2021 at 07:24:03AM +0900, Sergey Senozhatsky wrote:
> On (21/05/20 07:53), Paul E. McKenney wrote:
> > On Thu, May 20, 2021 at 03:18:07PM +0900, Sergey Senozhatsky wrote:
> > > On (21/05/18 16:15), Paul E. McKenney wrote:
> > > > 
> > > > In the shorter term...  PVCLOCK_GUEST_STOPPED is mostly for things like
> > > > guest migration and debugger breakpoints, correct?  Either way, I am
> > > > wondering if rcu_cpu_stall_reset() should take a lighter touch.  Right
> > > > now, it effectively disables all stalls for the current grace period.
> > > > Why not make it restart the stall timeout when the stoppage is detected?
> > > 
> > > rcu_cpu_stall_reset() is used in many other places, not sure if we can
> > > change its behaviour rcu_cpu_stall_reset().
> > 
> > There was some use case back in the day where they wanted an indefinite
> > suppression of RCU CPU stall warnings for the current grace period, but
> > all the current use cases look fine with restarting the stall timeout.
> > 
> > However, please see below.
> > 
> > > Maybe it'll be possible to just stop calling it from PV-clock and do
> > > something like this
> > 
> > This was in fact one of the things I was considering, at least until
> > I convinced myself that I needed to ask some questions.
> > 
> > One point of possibly unnecessary nervousness on my part is resetting
> > the start of the grace period, which might confuse diagnostics.
> > 
> > But how about something like this?
> > 
> > void rcu_cpu_stall_reset(void)
> > {
> > 	WRITE_ONCE(rcu_state.jiffies_stall,
> > 		   jiffies + rcu_jiffies_till_stall_check());
> > }
> > 
> > Would something like that work?
> 
> This should work.
> 
> On a side note.
> 
> I wish we didn't have to put kvm_check_and_clear_guest_paused() all
> over the place.
> 
> We do load jiffies at the start of check_cpu_stall(). So, in theory,
> we can just use that captured jiffies (which can become obsolete but
> so will grace period timestamps) value and never read current system
> jiffies because they can jump way forward. IOW
> 
> 	jn = j + 3 * rcu_jiffies_till_stall_check() + 3;
> 
> instead of
> 
> 	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> 
> Then we probably can remove kvm_check_and_clear_guest_paused().
> 
> But that "don't load current jiffies" is rather fragile.
> 
> kvm_check_and_clear_guest_paused() is not pretty, but at least it's
> explicit.

If this works for you, I am very much in favor!

							Thanx, Paul

> ---
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 49dda86a0e84..24f749bc1f90 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -695,19 +695,11 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  	    ULONG_CMP_GE(gps, js))
>  		return; /* No stall or GP completed since entering function. */
>  	rnp = rdp->mynode;
> -	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> +	jn = j + 3 * rcu_jiffies_till_stall_check() + 3;
>  	if (rcu_gp_in_progress() &&
>  	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
>  	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>  
> -		/*
> -		 * If a virtual machine is stopped by the host it can look to
> -		 * the watchdog like an RCU stall. Check to see if the host
> -		 * stopped the vm.
> -		 */
> -		if (kvm_check_and_clear_guest_paused())
> -			return;
> -
>  		/* We haven't checked in, so go dump stack. */
>  		print_cpu_stall(gps);
>  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> @@ -717,14 +709,6 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
>  		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>  
> -		/*
> -		 * If a virtual machine is stopped by the host it can look to
> -		 * the watchdog like an RCU stall. Check to see if the host
> -		 * stopped the vm.
> -		 */
> -		if (kvm_check_and_clear_guest_paused())
> -			return;
> -
>  		/* They had a few time units to dump stack, so complain. */
>  		print_other_cpu_stall(gs2, gps);
>  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-20 22:34           ` Sergey Senozhatsky
@ 2021-05-21  0:15             ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2021-05-21  0:15 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Suleiman Souhlal, rcu, linux-kernel

On Fri, May 21, 2021 at 07:34:41AM +0900, Sergey Senozhatsky wrote:
> On (21/05/20 07:57), Paul E. McKenney wrote:
> > > 
> > > Sounds good. I can cook a patch and run some tests.
> > > Or do you want to send a patch?
> > 
> > Given that you have the test setup, things might go faster if you do
> > the patch, especially taking timezones into consideration.  Of course,
> > if you run into difficulties, you know where to find me.
> 
> OK. Sounds good to me.
> 
> > > While VCPU-2 has PVCLOCK_GUEST_STOPPED set (resuming) and is in
> > > check_cpu_stall(), the VCPU-3 is executing:
> > > 
> > > 	apic_timer_interrupt()
> > > 	 tick_irq_enter()
> > > 	  tick_do_update_jiffies64()
> > > 	   do_timer()
> > 
> > OK, but the normal grace period time is way less than one second, and
> > the stall timeout in mainline is 21 seconds, so that would be a -lot-
> > of jiffies of skew.  Or does the restarting really take that long a time?
> 
> That's a good question. I see huge jiffies spike in the logs.
> I suspect that resuming a VM can take some time, especially on a "not
> powerful at all" overcommitted host (more virtual CPUs than physical
> ones).

I really am just asking the question.  ;-)

After all, if restarting a VM can take that long, then it can take
that long.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-16 10:27 [PATCH] rcu/tree: consider time a VM was suspended Sergey Senozhatsky
  2021-05-17 16:23 ` Paul E. McKenney
@ 2021-05-21  6:36 ` Sergey Senozhatsky
  2021-05-21 14:01   ` Paul E. McKenney
  1 sibling, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2021-05-21  6:36 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, rcu,
	linux-kernel

On (21/05/16 19:27), Sergey Senozhatsky wrote:
>  kernel/rcu/tree_stall.h | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 59b95cc5cbdf..cb233c79f0bc 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -7,6 +7,8 @@
>   * Author: Paul E. McKenney <paulmck@linux.ibm.com>
>   */
>  
> +#include <asm/kvm_para.h>
> +

D'oh, why did I do this...

Paul, I've a trivial fixup. How do you want to handle it?

---

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index cd815b19740a..b9b52f91e5b8 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -7,7 +7,7 @@
  * Author: Paul E. McKenney <paulmck@linux.ibm.com>
  */
 
-#include <asm/kvm_para.h>
+#include <linux/kvm_para.h>
 
 //////////////////////////////////////////////////////////////////////////////
 //

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-21  0:14             ` Paul E. McKenney
@ 2021-05-21  6:42               ` Sergey Senozhatsky
  2021-05-21 14:02                 ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2021-05-21  6:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Sergey Senozhatsky, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Suleiman Souhlal, rcu, linux-kernel

On (21/05/20 17:14), Paul E. McKenney wrote:
> > On a side note.
> > 
> > I wish we didn't have to put kvm_check_and_clear_guest_paused() all
> > over the place.
> > 
> > We do load jiffies at the start of check_cpu_stall(). So, in theory,
> > we can just use that captured jiffies (which can become obsolete but
> > so will grace period timestamps) value and never read current system
> > jiffies because they can jump way forward. IOW
> > 
> > 	jn = j + 3 * rcu_jiffies_till_stall_check() + 3;
> > 
> > instead of
> > 
> > 	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> > 
> > Then we probably can remove kvm_check_and_clear_guest_paused().
> > 
> > But that "don't load current jiffies" is rather fragile.
> > 
> > kvm_check_and_clear_guest_paused() is not pretty, but at least it's
> > explicit.
> 
> If this works for you, I am very much in favor!

Oh, no. Sorry for the noise. This is racy and won't work.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-21  6:36 ` Sergey Senozhatsky
@ 2021-05-21 14:01   ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2021-05-21 14:01 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, rcu, linux-kernel

On Fri, May 21, 2021 at 03:36:01PM +0900, Sergey Senozhatsky wrote:
> On (21/05/16 19:27), Sergey Senozhatsky wrote:
> >  kernel/rcu/tree_stall.h | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index 59b95cc5cbdf..cb233c79f0bc 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -7,6 +7,8 @@
> >   * Author: Paul E. McKenney <paulmck@linux.ibm.com>
> >   */
> >  
> > +#include <asm/kvm_para.h>
> > +
> 
> D'oh, why did I do this...
> 
> Paul, I've a trivial fixup. How do you want to handle it?

I dropped your earlier patch due to the warning and also because it
looked like you might have a different approach.

So please just send the fixed-up patch and I will pull it in and see
how it does.

							Thanx, Paul

> ---
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index cd815b19740a..b9b52f91e5b8 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -7,7 +7,7 @@
>   * Author: Paul E. McKenney <paulmck@linux.ibm.com>
>   */
>  
> -#include <asm/kvm_para.h>
> +#include <linux/kvm_para.h>
>  
>  //////////////////////////////////////////////////////////////////////////////
>  //

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] rcu/tree: consider time a VM was suspended
  2021-05-21  6:42               ` Sergey Senozhatsky
@ 2021-05-21 14:02                 ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2021-05-21 14:02 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Suleiman Souhlal, rcu, linux-kernel

On Fri, May 21, 2021 at 03:42:31PM +0900, Sergey Senozhatsky wrote:
> On (21/05/20 17:14), Paul E. McKenney wrote:
> > > On a side note.
> > > 
> > > I wish we didn't have to put kvm_check_and_clear_guest_paused() all
> > > over the place.
> > > 
> > > We do load jiffies at the start of check_cpu_stall(). So, in theory,
> > > we can just use that captured jiffies (which can become obsolete but
> > > so will grace period timestamps) value and never read current system
> > > jiffies because they can jump way forward. IOW
> > > 
> > > 	jn = j + 3 * rcu_jiffies_till_stall_check() + 3;
> > > 
> > > instead of
> > > 
> > > 	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> > > 
> > > Then we probably can remove kvm_check_and_clear_guest_paused().
> > > 
> > > But that "don't load current jiffies" is rather fragile.
> > > 
> > > kvm_check_and_clear_guest_paused() is not pretty, but at least it's
> > > explicit.
> > 
> > If this works for you, I am very much in favor!
> 
> Oh, no. Sorry for the noise. This is racy and won't work.

I had been hoping!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-05-21 14:02 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-16 10:27 [PATCH] rcu/tree: consider time a VM was suspended Sergey Senozhatsky
2021-05-17 16:23 ` Paul E. McKenney
2021-05-18  1:41   ` Sergey Senozhatsky
2021-05-18 23:15     ` Paul E. McKenney
2021-05-20  5:50       ` Sergey Senozhatsky
2021-05-20 14:57         ` Paul E. McKenney
2021-05-20 22:34           ` Sergey Senozhatsky
2021-05-21  0:15             ` Paul E. McKenney
2021-05-20  6:18       ` Sergey Senozhatsky
2021-05-20 14:53         ` Paul E. McKenney
2021-05-20 22:24           ` Sergey Senozhatsky
2021-05-21  0:14             ` Paul E. McKenney
2021-05-21  6:42               ` Sergey Senozhatsky
2021-05-21 14:02                 ` Paul E. McKenney
2021-05-21  6:36 ` Sergey Senozhatsky
2021-05-21 14:01   ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.