linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chao Gao <chao.gao@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Feng Tang <feng.tang@intel.com>,
	kernel test robot <oliver.sang@intel.com>,
	John Stultz <john.stultz@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
	Mark Rutland <Mark.Rutland@arm.com>,
	Marc Zyngier <maz@kernel.org>, Andi Kleen <ak@linux.intel.com>,
	Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	Chris Mason <clm@fb.com>, LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	lkp@lists.01.org, lkp@intel.com, ying.huang@intel.com,
	zhengjun.xing@intel.com
Subject: Re: [clocksource]  8901ecc231:  stress-ng.lockbus.ops_per_sec -9.5% regression
Date: Mon, 2 Aug 2021 14:20:09 +0800	[thread overview]
Message-ID: <20210802062008.GA24720@gao-cwp> (raw)
In-Reply-To: <20210527182959.GA437082@paulmck-ThinkPad-P17-Gen-1>

[snip]
>commit 48ebcfbfd877f5d9cddcc03c91352a8ca7b190af
>Author: Paul E. McKenney <paulmck@kernel.org>
>Date:   Thu May 27 11:03:28 2021 -0700
>
>    clocksource: Forgive repeated long-latency watchdog clocksource reads
>    
>    Currently, the clocksource watchdog reacts to repeated long-latency
>    clocksource reads by marking that clocksource unstable on the theory that
>    these long-latency reads are a sign of a serious problem.  And this theory
>    does in fact have real-world support in the form of firmware issues [1].
>    
>    However, it is also possible to trigger this using stress-ng on what
>    the stress-ng man page terms "poorly designed hardware" [2].  And it
>    is not necessarily a bad thing for the kernel to diagnose cases where
>    high-stress workloads are being run on hardware that is not designed
>    for this sort of use.
>    
>    Nevertheless, it is quite possible that real-world use will result in
>    some situation requiring that high-stress workloads run on hardware
>    not designed to accommodate them, and also requiring that the kernel
>    refrain from marking clocksources unstable.
>    
>    Therefore, provide an out-of-tree patch that reacts to this situation
>    by leaving the clocksource alone, but using the old 62.5-millisecond
>    skew-detection threshold in response persistent long-latency reads.
>    In addition, the offending clocksource is marked for re-initialization
>    in this case, which both restarts that clocksource with a clean bill of
>    health and avoids false-positive skew reports on later watchdog checks.

Hi Paul,

Sorry to dig out this old thread.

I am testing with this patch in a VM, but I find sometimes re-initialization
after coarse-grained skew check may not happen as expected because ...

>    
>    Link: https://lore.kernel.org/lkml/20210513155515.GB23902@xsang-OptiPlex-9020/ # [1]
>    Link: https://lore.kernel.org/lkml/20210521083322.GG25531@xsang-OptiPlex-9020/ # [2]
>    Link: https://lore.kernel.org/lkml/20210521084405.GH25531@xsang-OptiPlex-9020/
>    Link: https://lore.kernel.org/lkml/20210511233403.GA2896757@paulmck-ThinkPad-P17-Gen-1/
>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
>diff --git a/kernel/time/clocksource-wdtest.c b/kernel/time/clocksource-wdtest.c
>index 01df12395c0e..b72a969f7b93 100644
>--- a/kernel/time/clocksource-wdtest.c
>+++ b/kernel/time/clocksource-wdtest.c
>@@ -146,13 +146,12 @@ static int wdtest_func(void *arg)
> 		else if (i <= max_cswd_read_retries)
> 			s = ", expect message";
> 		else
>-			s = ", expect clock skew";
>+			s = ", expect coarse-grained clock skew check and re-initialization";
> 		pr_info("--- Watchdog with %dx error injection, %lu retries%s.\n", i, max_cswd_read_retries, s);
> 		WRITE_ONCE(wdtest_ktime_read_ndelays, i);
> 		schedule_timeout_uninterruptible(2 * HZ);
> 		WARN_ON_ONCE(READ_ONCE(wdtest_ktime_read_ndelays));
>-		WARN_ON_ONCE((i <= max_cswd_read_retries) !=
>-			     !(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE));
>+		WARN_ON_ONCE(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE);
> 		wdtest_ktime_clocksource_reset();
> 	}
> 
>diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>index 4485635b69f5..6c0820779bd3 100644
>--- a/kernel/time/clocksource.c
>+++ b/kernel/time/clocksource.c
>@@ -225,13 +225,13 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
> 				pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
> 					smp_processor_id(), watchdog->name, nretries);
> 			}
>-			return true;
>+			return false;
> 		}
> 	}
> 
>-	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n",
>+	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, coarse-grained skew check followed by re-initialization\n",
> 		smp_processor_id(), watchdog->name, wd_delay, nretries);
>-	return false;
>+	return true;
> }
> 
> static u64 csnow_mid;
>@@ -355,6 +355,7 @@ static void clocksource_watchdog(struct timer_list *unused)
> 	int next_cpu, reset_pending;
> 	int64_t wd_nsec, cs_nsec;
> 	struct clocksource *cs;
>+	bool coarse;
> 	u32 md;
> 
> 	spin_lock(&watchdog_lock);
>@@ -372,11 +373,7 @@ static void clocksource_watchdog(struct timer_list *unused)
> 			continue;
> 		}
> 
>-		if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
>-			/* Clock readout unreliable, so give it up. */
>-			__clocksource_unstable(cs);
>-			continue;
>-		}
>+		coarse = cs_watchdog_read(cs, &csnow, &wdnow);
> 
> 		/* Clocksource initialized ? */
> 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
>@@ -402,7 +399,13 @@ static void clocksource_watchdog(struct timer_list *unused)
> 			continue;
> 
> 		/* Check the deviation from the watchdog clocksource. */
>-		md = cs->uncertainty_margin + watchdog->uncertainty_margin;
>+		if (coarse) {
>+			md = 62500 * NSEC_PER_USEC;
>+			cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
>+			pr_warn("timekeeping watchdog on CPU%d: %s coarse-grained %lu.%03lu ms clock-skew check followed by re-initialization\n", smp_processor_id(), watchdog->name, md / NSEC_PER_MSEC, md % NSEC_PER_MSEC / NSEC_PER_USEC);

... this message on CPU5 doesn't show up in below kernel logs.
Do you think it is a bug? if yes, any idea to resolve it?

[  498.571086] clocksource: timekeeping watchdog on CPU1: hpet read-back delay of 432490ns, attempt 4, coarse-grained skew check followed by re-initialization
[  498.572867] clocksource: timekeeping watchdog on CPU1: hpet coarse-grained 62.500 ms clock-skew check followed by re-initialization
[  504.071959] clocksource: timekeeping watchdog on CPU4: hpet read-back delay of 1679880ns, attempt 4, coarse-grained skew check followed by re-initialization
[  504.073817] clocksource: timekeeping watchdog on CPU4: hpet coarse-grained 62.500 ms clock-skew check followed by re-initialization
[  504.568821] clocksource: timekeeping watchdog on CPU5: hpet read-back delay of 554880ns, attempt 4, coarse-grained skew check followed by re-initialization
[  505.067666] clocksource: timekeeping watchdog on CPU6: hpet retried 3 times before success
[  505.068593] clocksource: timekeeping watchdog on CPU6: Marking clocksource 'tsc' as unstable because the skew is too large:
[  505.069596] clocksource:                       'hpet' wd_nsec: 499376790 wd_now: be2f200d wd_last: bb3522fe mask: ffffffff
[  505.071131] clocksource:                       'tsc' cs_nsec: 498867307 cs_now: 103895c060a cs_last: 1034aea96ea mask: ffffffffffffffff
[  505.072994] clocksource:                       'tsc' is current clocksource.
[  505.074748] tsc: Marking TSC unstable due to clocksource watchdog

Thanks
-Chao

>+		} else {
>+			md = cs->uncertainty_margin + watchdog->uncertainty_margin;
>+		}
> 		if (abs(cs_nsec - wd_nsec) > md) {
> 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
> 				smp_processor_id(), cs->name);

  parent reply	other threads:[~2021-08-02  6:12 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-21  8:33 [clocksource] 8901ecc231: stress-ng.lockbus.ops_per_sec -9.5% regression kernel test robot
2021-05-21 13:56 ` Paul E. McKenney
2021-05-22 16:08   ` Paul E. McKenney
2021-05-26  6:49     ` Feng Tang
2021-05-26 13:49       ` Paul E. McKenney
2021-05-27 18:29         ` Paul E. McKenney
2021-05-27 19:01           ` Andi Kleen
2021-05-27 19:19             ` Paul E. McKenney
2021-05-27 19:29               ` Matthew Wilcox
2021-05-27 21:05                 ` Paul E. McKenney
2021-05-28  0:58                   ` Andi Kleen
2021-06-01 17:10                     ` Paul E. McKenney
2021-08-02  6:20           ` Chao Gao [this message]
2021-08-02 17:02             ` Paul E. McKenney
2021-08-03  8:58               ` Chao Gao
2021-08-03 13:48                 ` Paul E. McKenney
2021-08-05  2:16                   ` Chao Gao
2021-08-05  4:03                     ` Paul E. McKenney
2021-08-05  4:34                       ` Andi Kleen
2021-08-05 15:33                         ` Paul E. McKenney
2021-08-05  5:39                       ` Chao Gao
2021-08-05 15:37                         ` Paul E. McKenney
2021-08-06  2:10                           ` Chao Gao
2021-08-06  4:15                             ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210802062008.GA24720@gao-cwp \
    --to=chao.gao@intel.com \
    --cc=Mark.Rutland@arm.com \
    --cc=ak@linux.intel.com \
    --cc=clm@fb.com \
    --cc=corbet@lwn.net \
    --cc=feng.tang@intel.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@intel.com \
    --cc=lkp@lists.01.org \
    --cc=maz@kernel.org \
    --cc=oliver.sang@intel.com \
    --cc=paulmck@kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=ying.huang@intel.com \
    --cc=zhengjun.xing@intel.com \
    --cc=zhengjun.xing@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).