linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH v2 6/7] watchdog: Cleanup handling of false positives
@ 2021-05-16 10:46 Sergey Senozhatsky
  2021-05-17 15:01 ` Petr Mladek
From: Sergey Senozhatsky @ 2021-05-16 10:46 UTC
  To: Petr Mladek
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Laurence Oberman, Vincent Whitchurch, Michal Hocko, linux-kernel

Hi,

// This was never in my inbox, so sorry if I mess up the "Reply-to"
// Original message:  https://lore.kernel.org/lkml/20210311122130.6788-7-pmladek@suse.com/


>@@ -375,7 +375,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> 	/* .. and repeat */
> 	hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
>
> -	/* Reset the interval when touched externally by a known slow code. */
> +	/*
> +	 * If a virtual machine is stopped by the host it can look to
> +	 * the watchdog like a soft lockup. Check to see if the host
> +	 * stopped the vm before we process the timestamps.
> +	 */
> +	kvm_check_and_clear_guest_paused();
> +
[..]
>@@ -401,14 +405,6 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> 	 */
> 	duration = is_softlockup(touch_ts, period_ts);
> 	if (unlikely(duration)) {
> -		/*
> -		 * If a virtual machine is stopped by the host it can look to
> -		 * the watchdog like a soft lockup, check to see if the host
> -		 * stopped the vm before we issue the warning
> -		 */
> -		if (kvm_check_and_clear_guest_paused())
> -			return HRTIMER_RESTART;

This looks racy to me. I believe kvm_check_and_clear_guest_paused()
was in the right place.

A VCPU can be scheduled out/preempted at any point; and then the guest
VM (or even the entire system) can be suspended. When the VM is resumed,
we continue from where we were preempted (from the VCPU's POV).

So what the old code did

watchdog_timer_fn()
{
	...
	<<!!>>

	// Suppose we are suspended here. When we get resumed,
	// jiffies jump forward, which may look like a soft lockup.
	duration = is_softlockup(touch_ts, period_ts);
	if (unlikely(duration)) {
		// And this is where kvm_check_and_clear_guest_paused()
		// comes in. We already know that jiffies have jumped,
		// but not whether they jumped because the VM was
		// suspended. That is what we figure out here, and we
		// bail out if so.
		if (kvm_check_and_clear_guest_paused())
			return HRTIMER_RESTART;
	}
}

The new code does the following

watchdog_timer_fn()
{
	...
	kvm_check_and_clear_guest_paused(); // PVCLOCK_GUEST_STOPPED is not set

	<<!!>>

	// Suppose the VM gets suspended at this point. PVCLOCK_GUEST_STOPPED
	// is set, but we never check it again. jiffies will jump and this
	// will look like a lockup, yet we never find out that jiffies jumped
	// only because the VM was suspended.
	duration = is_softlockup(touch_ts, period_ts);
	if (unlikely(duration)) {
		// report the lockup and perhaps panic the system,
		// depending on the configuration
	}
}
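
In other words, to keep that window closed the check would have to stay
(or be repeated) at the point where the lockup has actually been
detected. Just a sketch of what I mean, not a tested patch:

	duration = is_softlockup(touch_ts, period_ts);
	if (unlikely(duration)) {
		/*
		 * Sketch: re-check PVCLOCK_GUEST_STOPPED right before
		 * reporting, so a host pause that happened after the
		 * early check at the top of watchdog_timer_fn() is
		 * still caught.
		 */
		if (kvm_check_and_clear_guest_paused())
			return HRTIMER_RESTART;

		/* ... report the soft lockup ... */
	}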

What am I missing?


* Re: [PATCH v2 6/7] watchdog: Cleanup handling of false positives
  2021-05-16 10:46 [PATCH v2 6/7] watchdog: Cleanup handling of false positives Sergey Senozhatsky
@ 2021-05-17 15:01 ` Petr Mladek
From: Petr Mladek @ 2021-05-17 15:01 UTC
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Laurence Oberman, Vincent Whitchurch, Michal Hocko, linux-kernel

On Sun 2021-05-16 19:46:21, Sergey Senozhatsky wrote:
> Hi,
> 
> // This was never in my inbox, so sorry if I mess up the "Reply-to"
> // Original message:  https://lore.kernel.org/lkml/20210311122130.6788-7-pmladek@suse.com/
> 
> 
> >@@ -375,7 +375,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > 	/* .. and repeat */
> > 	hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
> >
> > -	/* Reset the interval when touched externally by a known slow code. */
> > +	/*
> > +	 * If a virtual machine is stopped by the host it can look to
> > +	 * the watchdog like a soft lockup. Check to see if the host
> > +	 * stopped the vm before we process the timestamps.
> > +	 */
> > +	kvm_check_and_clear_guest_paused();
> > +
> [..]
> >@@ -401,14 +405,6 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > 	 */
> > 	duration = is_softlockup(touch_ts, period_ts);
> > 	if (unlikely(duration)) {
> > -		/*
> > -		 * If a virtual machine is stopped by the host it can look to
> > -		 * the watchdog like a soft lockup, check to see if the host
> > -		 * stopped the vm before we issue the warning
> > -		 */
> > -		if (kvm_check_and_clear_guest_paused())
> > -			return HRTIMER_RESTART;
> 
> This looks racy to me. I believe kvm_check_and_clear_guest_paused()
> was in the right place.
> 
> A VCPU can be scheduled out/preempted at any point; and then the guest
> VM (or even the entire system) can be suspended. When the VM is resumed,
> we continue from where we were preempted (from the VCPU's POV).
> 
> So what the old code did
> 
> watchdog_timer_fn()
> {
> 	...
> 	<<!!>>
> 
> 	// Suppose we are suspended here. When we get resumed,
> 	// jiffies jump forward, which may look like a soft lockup.
> 	duration = is_softlockup(touch_ts, period_ts);
> 	if (unlikely(duration)) {
> 		// And this is where kvm_check_and_clear_guest_paused()
> 		// comes in. We already know that jiffies have jumped,
> 		// but not whether they jumped because the VM was
> 		// suspended. That is what we figure out here, and we
> 		// bail out if so.
> 		if (kvm_check_and_clear_guest_paused())
> 			return HRTIMER_RESTART;
> 	}
> }
> 
> The new code does the following
> 
> watchdog_timer_fn()
> {
> 	...
> 	kvm_check_and_clear_guest_paused(); // PVCLOCK_GUEST_STOPPED is not set
> 
> 	<<!!>>
> 
> 	// Suppose the VM gets suspended at this point. PVCLOCK_GUEST_STOPPED
> 	// is set, but we never check it again. jiffies will jump and this
> 	// will look like a lockup, yet we never find out that jiffies jumped
> 	// only because the VM was suspended.
> 	duration = is_softlockup(touch_ts, period_ts);
> 	if (unlikely(duration)) {
> 		// report the lockup and perhaps panic the system,
> 		// depending on the configuration
> 	}
> }
> 
> What am I missing?

Great catch! You have a point.

Well, I think that the entire code is racy. touch_ts and period_ts are
set by:

	unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
	unsigned long period_ts = __this_cpu_read(watchdog_report_ts);

They are neither volatile nor are there any barriers. It means that
period_ts might be re-read in these two checks:

	/* Reset the interval when touched by known problematic code. */
	if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
		update_report_ts();
		return HRTIMER_RESTART;
	}

and

	duration = is_softlockup(touch_ts, period_ts);


where:

static int is_softlockup(unsigned long touch_ts, unsigned long period_ts)
{
	unsigned long now = get_timestamp();

	if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh) {
		/* Warn about unreasonable delays. */
		if (time_after(now, period_ts + get_softlockup_thresh()))
			return now - touch_ts;
	}
	return 0;
}

Now, if the watchdog is touched from NMI, period_ts might be
SOFTLOCKUP_DELAY_REPORT, which is ULONG_MAX.

As a result, period_ts + get_softlockup_thresh() would overflow and
we could report a softlockup even when there is none.
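
Just to illustrate the wrap-around in userspace terms (a sketch only,
assuming the default get_softlockup_thresh() == 20):

	#include <stdio.h>
	#include <limits.h>

	int main(void)
	{
		unsigned long period_ts = ULONG_MAX;	/* SOFTLOCKUP_DELAY_REPORT */
		unsigned long thresh = 20;		/* assumed get_softlockup_thresh() */
		unsigned long now = 1000;		/* ~16 minutes of uptime, in seconds */

		/* time_after(a, b) is roughly ((long)((b) - (a)) < 0) */
		printf("period_ts + thresh = %lu\n", period_ts + thresh);	/* wraps to 19 */
		printf("bogus lockup reported: %d\n",
		       (long)((period_ts + thresh) - now) < 0);			/* prints 1 */
		return 0;
	}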

It probably does not happen because the per-CPU variable is read only
once. And watchdogs are not typically touched from NMI. Except that
show_regs() actually touches the watchdog.

That said, this patch most likely made things worse and should be
reverted. But an even better solution would be to remove this race,
i.e. to make the code safe and sane at the same time.
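
For example, something like this (just a sketch, not a tested patch)
should prevent the compiler from re-reading the per-CPU values between
the two checks:

	/* Sketch: read the per-CPU timestamps exactly once. */
	unsigned long touch_ts = READ_ONCE(*this_cpu_ptr(&watchdog_touch_ts));
	unsigned long period_ts = READ_ONCE(*this_cpu_ptr(&watchdog_report_ts));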

Best Regards,
Petr


* [PATCH v2 6/7] watchdog: Cleanup handling of false positives
  2021-03-11 12:21 [PATCH v2 0/7] watchdog/softlockup: Report overall time and some cleanup Petr Mladek
@ 2021-03-11 12:21 ` Petr Mladek
From: Petr Mladek @ 2021-03-11 12:21 UTC
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andrew Morton
  Cc: Laurence Oberman, Vincent Whitchurch, Michal Hocko, linux-kernel,
	Petr Mladek

The commit d6ad3e286d2c075 ("softlockup: Add sched_clock_tick() to avoid
kernel warning on kgdb resume") introduced touch_softlockup_watchdog_sync().

It solved a problem where the watchdog was touched in an atomic context,
the timer callback proceeded right after interrupts were enabled again,
and the local clock had not been updated yet. In this case,
sched_clock_tick() was called in watchdog_timer_fn() before
updating the timer.

So far so good.

Later the commit 5d1c0f4a80a6df73 ("watchdog: add check for suspended vm
in softlockup detector") added two kvm_check_and_clear_guest_paused()
calls. They touch the watchdog when the guest has been sleeping.

The code makes my head spin around.

Scenario 1:

    + guest did sleep:
	+ PVCLOCK_GUEST_STOPPED is set

    + 1st watchdog_timer_fn() invocation:
	+ the watchdog is not touched yet
	+ is_softlockup() returns too big delay
	+ kvm_check_and_clear_guest_paused():
	   + clear PVCLOCK_GUEST_STOPPED
	   + call touch_softlockup_watchdog_sync()
		+ set SOFTLOCKUP_DELAY_REPORT
		+ set softlockup_touch_sync
	+ return from the timer callback

      + 2nd watchdog_timer_fn() invocation:

	+ call sched_clock_tick() even though it is not needed.
	  The timer callback was invoked again only because the clock
	  has already been updated in the meantime.

	+ call kvm_check_and_clear_guest_paused() that does nothing
	  because PVCLOCK_GUEST_STOPPED has been cleared already.

	+ call update_report_ts() and return. This is fine. Except
	  that sched_clock_tick() might have allowed setting it already
	  during the 1st invocation.

Scenario 2:

	+ guest did sleep

	+ 1st watchdog_timer_fn() invocation
	    + same as in 1st scenario

	+ guest did sleep again:
	    + set PVCLOCK_GUEST_STOPPED again

	+ 2nd watchdog_timer_fn() invocation
	    + SOFTLOCKUP_DELAY_REPORT is set from 1st invocation
	    + call sched_clock_tick()
	    + call kvm_check_and_clear_guest_paused()
		+ clear PVCLOCK_GUEST_STOPPED
		+ call touch_softlockup_watchdog_sync()
		    + set SOFTLOCKUP_DELAY_REPORT
		    + set softlockup_touch_sync
	    + call update_report_ts() (set real timestamp immediately)
	    + return from the timer callback

	+ 3rd watchdog_timer_fn() invocation
	    + timestamp is set from 2nd invocation
	    + softlockup_touch_sync is set but not checked because
	      the real timestamp is already set

Make the code more straightforward:

1. Always call kvm_check_and_clear_guest_paused() at the very
   beginning to handle PVCLOCK_GUEST_STOPPED. It touches the watchdog
   when the guest did sleep.

2. Handle the situation when the watchdog has been touched
   (SOFTLOCKUP_DELAY_REPORT is set).

   Call sched_clock_tick() when a touch_*sync() variant was used. It makes
   sure that the timestamp will be up to date even when it has been
   touched in an atomic context or the guest did sleep.

As a result, kvm_check_and_clear_guest_paused() is called in a single
location. And the right timestamp is always set when returning from
the timer callback.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/watchdog.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6dc1f79e36aa..c050323fcd33 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -375,7 +375,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	/* .. and repeat */
 	hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
 
-	/* Reset the interval when touched externally by a known slow code. */
+	/*
+	 * If a virtual machine is stopped by the host it can look to
+	 * the watchdog like a soft lockup. Check to see if the host
+	 * stopped the vm before we process the timestamps.
+	 */
+	kvm_check_and_clear_guest_paused();
+
+	/* Reset the interval when touched by known problematic code. */
 	if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
 		if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
 			/*
@@ -386,10 +393,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			sched_clock_tick();
 		}
 
-		/* Clear the guest paused flag on watchdog reset */
-		kvm_check_and_clear_guest_paused();
 		update_report_ts();
-
 		return HRTIMER_RESTART;
 	}
 
@@ -401,14 +405,6 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	 */
 	duration = is_softlockup(touch_ts, period_ts);
 	if (unlikely(duration)) {
-		/*
-		 * If a virtual machine is stopped by the host it can look to
-		 * the watchdog like a soft lockup, check to see if the host
-		 * stopped the vm before we issue the warning
-		 */
-		if (kvm_check_and_clear_guest_paused())
-			return HRTIMER_RESTART;
-
 		/*
 		 * Prevent multiple soft-lockup reports if one cpu is already
 		 * engaged in dumping all cpu back traces.
-- 
2.26.2



* [PATCH v2 6/7] watchdog: Cleanup handling of false positives
  2020-12-10 16:00 [PATCH v2 0/7] watchdog/softlockup: Report overall time and some cleanup Petr Mladek
@ 2020-12-10 16:00 ` Petr Mladek
From: Petr Mladek @ 2020-12-10 16:00 UTC
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra
  Cc: Laurence Oberman, Vincent Whitchurch, Michal Hocko, linux-kernel,
	Petr Mladek

The commit d6ad3e286d2c075 ("softlockup: Add sched_clock_tick() to avoid
kernel warning on kgdb resume") introduced touch_softlockup_watchdog_sync().

It solved a problem where the watchdog was touched in an atomic context,
the timer callback proceeded right after interrupts were enabled again,
and the local clock had not been updated yet. In this case,
sched_clock_tick() was called in watchdog_timer_fn() before
updating the timer.

So far so good.

Later the commit 5d1c0f4a80a6df73 ("watchdog: add check for suspended vm
in softlockup detector") added two kvm_check_and_clear_guest_paused()
calls. They touch the watchdog when the guest has been sleeping.

The code makes my head spin around.

Scenario 1:

    + guest did sleep:
	+ PVCLOCK_GUEST_STOPPED is set

    + 1st watchdog_timer_fn() invocation:
	+ the watchdog is not touched yet
	+ is_softlockup() returns too big delay
	+ kvm_check_and_clear_guest_paused():
	   + clear PVCLOCK_GUEST_STOPPED
	   + call touch_softlockup_watchdog_sync()
		+ set SOFTLOCKUP_DELAY_REPORT
		+ set softlockup_touch_sync
	+ return from the timer callback

      + 2nd watchdog_timer_fn() invocation:

	+ call sched_clock_tick() even though it is not needed.
	  The timer callback was invoked again only because the clock
	  has already been updated in the meantime.

	+ call kvm_check_and_clear_guest_paused() that does nothing
	  because PVCLOCK_GUEST_STOPPED has been cleared already.

	+ call update_report_ts() and return. This is fine. Except
	  that sched_clock_tick() might have allowed setting it already
	  during the 1st invocation.

Scenario 2:

	+ guest did sleep

	+ 1st watchdog_timer_fn() invocation
	    + same as in 1st scenario

	+ guest did sleep again:
	    + set PVCLOCK_GUEST_STOPPED again

	+ 2nd watchdog_timer_fn() invocation
	    + SOFTLOCKUP_DELAY_REPORT is set from 1st invocation
	    + call sched_clock_tick()
	    + call kvm_check_and_clear_guest_paused()
		+ clear PVCLOCK_GUEST_STOPPED
		+ call touch_softlockup_watchdog_sync()
		    + set SOFTLOCKUP_DELAY_REPORT
		    + set softlockup_touch_sync
	    + call update_report_ts() (set real timestamp immediately)
	    + return from the timer callback

	+ 3rd watchdog_timer_fn() invocation
	    + timestamp is set from 2nd invocation
	    + softlockup_touch_sync is set but not checked because
	      the real timestamp is already set

Make the code more straightforward:

1. Always call kvm_check_and_clear_guest_paused() at the very
   beginning to handle PVCLOCK_GUEST_STOPPED. It touches the watchdog
   when the guest did sleep.

2. Handle the situation when the watchdog has been touched
   (SOFTLOCKUP_DELAY_REPORT is set).

   Call sched_clock_tick() when a touch_*sync() variant was used. It makes
   sure that the timestamp will be up to date even when it has been
   touched in an atomic context or the guest did sleep.

As a result, kvm_check_and_clear_guest_paused() is called in a single
location. And the right timestamp is always set when returning from
the timer callback.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/watchdog.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6dc1f79e36aa..c050323fcd33 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -375,7 +375,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	/* .. and repeat */
 	hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
 
-	/* Reset the interval when touched externally by a known slow code. */
+	/*
+	 * If a virtual machine is stopped by the host it can look to
+	 * the watchdog like a soft lockup. Check to see if the host
+	 * stopped the vm before we process the timestamps.
+	 */
+	kvm_check_and_clear_guest_paused();
+
+	/* Reset the interval when touched by known problematic code. */
 	if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
 		if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
 			/*
@@ -386,10 +393,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			sched_clock_tick();
 		}
 
-		/* Clear the guest paused flag on watchdog reset */
-		kvm_check_and_clear_guest_paused();
 		update_report_ts();
-
 		return HRTIMER_RESTART;
 	}
 
@@ -401,14 +405,6 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	 */
 	duration = is_softlockup(touch_ts, period_ts);
 	if (unlikely(duration)) {
-		/*
-		 * If a virtual machine is stopped by the host it can look to
-		 * the watchdog like a soft lockup, check to see if the host
-		 * stopped the vm before we issue the warning
-		 */
-		if (kvm_check_and_clear_guest_paused())
-			return HRTIMER_RESTART;
-
 		/*
 		 * Prevent multiple soft-lockup reports if one cpu is already
 		 * engaged in dumping all cpu back traces.
-- 
2.26.2



