linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Feng Tang <feng.tang@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: tglx@linutronix.de, linux-kernel@vger.kernel.org,
	john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net,
	Mark.Rutland@arm.com, maz@kernel.org, kernel-team@fb.com,
	neeraju@codeaurora.org, ak@linux.intel.com,
	zhengjun.xing@intel.com,
	Xing Zhengjun <zhengjun.xing@linux.intel.com>
Subject: Re: [PATCH v10 clocksource 6/7] clocksource: Forgive tsc_early pre-calibration drift
Date: Tue, 27 Apr 2021 12:16:37 +0800	[thread overview]
Message-ID: <20210427041637.GA66694@shbuild999.sh.intel.com> (raw)
In-Reply-To: <20210427034640.GF975577@paulmck-ThinkPad-P17-Gen-1>

On Mon, Apr 26, 2021 at 08:46:40PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 27, 2021 at 09:13:55AM +0800, Feng Tang wrote:
> > On Mon, Apr 26, 2021 at 11:26:52AM -0700, Paul E. McKenney wrote:
> > > On Mon, Apr 26, 2021 at 11:36:05PM +0800, Feng Tang wrote:
> > > > On Mon, Apr 26, 2021 at 08:25:29AM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Apr 26, 2021 at 11:01:27PM +0800, Feng Tang wrote:
> > > > > > Hi Paul,
> > > > > > 
> > > > > > On Sun, Apr 25, 2021 at 03:47:07PM -0700, Paul E. McKenney wrote:
> > > > > > > Because the x86 tsc_early clocksource is given a quick and semi-accurate
> > > > > > > calibration (by design!), it might have drift rates well in excess of
> > > > > > > the 0.1% limit that is in the process of being adopted.
> > > > > > > 
> > > > > > > Therefore, add a max_drift field to the clocksource structure that, when
> > > > > > > non-zero, specifies the maximum allowable drift rate in nanoseconds over
> > > > > > > a half-second period.  The tsc_early clocksource initializes this to five
> > > > > > > miliseconds, which corresponds to the 1% drift rate limit suggested by
> > > > > > > Xing Zhengjun.  This max_drift field is intended only for early boot,
> > > > > > > so clocksource_watchdog() splats if it encounters a non-zero value in
> > > > > > > this field more than 60 seconds after boot, inspired by a suggestion by
> > > > > > > Thomas Gleixner.
> > > > > > > 
> > > > > > > This was tested by setting the clocksource_tsc ->max_drift field to 1,
> > > > > > > which, as expected, resulted in a clock-skew event.
> > > > > > 
> > > > > > We've run the same last for this v10, and those 'unstable' thing [1] can
> > > > > > not be reproduced!
> > > > > 
> > > > > Good to hear!  ;-)
> > > > > 
> > > > > > We've reported one case that tsc can be wrongly judged as 'unstable'
> > > > > > by 'refined-jiffies' watchdog [1], while reducing the threshold could
> > > > > > make it easier to be triggered.
> > > > > > 
> > > > > > It could be reproduced on the a plaform with a 115200 serial console,
> > > > > > and hpet been disabled (several x86 platforms has this), add 
> > > > > > 'initcall_debug' cmdline parameter to get more debug message, we can
> > > > > > see:
> > > > > > 
> > > > > > [    1.134197] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc-early' as unstable because the skew is too large:
> > > > > > [    1.134214] clocksource:                       'refined-jiffies' wd_nesc: 500000000 wd_now: ffff8b35 wd_last: ffff8b03 mask: ffffffff
> > > > > > [    1.134217] clocksource:                       'tsc-early' cs_nsec: 507537855 cs_now: 4e63c9d09 cs_last: 4bebd81f5 mask: ffffffffffffffff
> > > > > > [    1.134220] clocksource:                       No current clocksource.
> > > > > > [    1.134222] tsc: Marking TSC unstable due to clocksource watchdog
> > > > > 
> > > > > Just to make sure I understand: "could be reproduced" as in this is the
> > > > > result from v9, and v10 avoids this, correct?
> > > > 
> > > > Sorry I didn't make it clear. This is a rarely happened case, and can
> > > > be reproduced with upstream kerenl, which has 62.5 ms threshold. 6/7 &
> > > > 7/7 patch of reducing the threshold can make it easier to be triggered.
> > > 
> > > Ah, OK, so this could be considered to be a benefit of this series, then.
> > > 
> > > Does this happen only for tsc-early, or for tsc as well?
> > > 
> > > Has it already been triggered on v10 of this series?  (I understand that
> > > it certainly should be easier to trigger, just curious whether this has
> > > already happened.)
> > 
> > Yes, it has. The upper log is from v10 (actually it's the 'dev' branch
> > of your linux-rcu git, which I didn't find obvious difference) on a
> > client platform 
> > 
> >  [    1.134214] clocksource:    'refined-jiffies' wd_nesc: 500000000 wd_now: ffff8b35 wd_last: ffff8b03 mask: ffffffff
> >  [    1.134217] clocksource:    'tsc-early' cs_nsec: 507537855 cs_now: 4e63c9d09 cs_last: 4bebd81f5 mask: ffffffffffffffff
> > 
> > The deviation is 7537855 ns (7.5 ms). And as said before, it needs many
> > pre-conditions to be triggered.
> > 
> > Also I found the debug patch is useful, which prints out the direct
> > nanoseconds info when 'unstable' is detected.
> 
> Looks good to me!
> 
> If you give me a Signed-off-by, I would be happy to queue it.

Sure, here it is. thanks!

- Feng

From ff23cc5589c84c4d5f9a009867f21ac5ce96c9e3 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Tue, 27 Apr 2021 12:14:30 +0800
Subject: [PATCH] clocksource: print deviation in nanoseconds for unstable case

Currently when an unstable clocksource is detected, the raw counter
of that clocksource and watchdog will be printed, which can only be
understood after some math calculation. So print the existing delta
in nanoseconds for easier check.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 kernel/time/clocksource.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index a374cf7..5370f0c 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -443,10 +443,10 @@ static void clocksource_watchdog(struct timer_list *unused)
 		if (abs(cs_nsec - wd_nsec) > md) {
 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				smp_processor_id(), cs->name);
-			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
-				watchdog->name, wdnow, wdlast, watchdog->mask);
-			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
-				cs->name, csnow, cslast, cs->mask);
+			pr_warn("                      '%s' wd_nesc: %lld wd_now: %llx wd_last: %llx mask: %llx\n",
+				watchdog->name, wd_nsec, wdnow, wdlast, watchdog->mask);
+			pr_warn("                      '%s' cs_nsec: %lld cs_now: %llx cs_last: %llx mask: %llx\n",
+				cs->name, cs_nsec, csnow, cslast, cs->mask);
 			if (curr_clocksource == cs)
 				pr_warn("                      '%s' is current clocksource.\n", cs->name);
 			else if (curr_clocksource)
-- 
2.7.4


  reply	other threads:[~2021-04-27  4:16 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-25 22:45 [PATCH v10 clocksource 0/7] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
2021-04-25 22:47 ` [PATCH v10 clocksource 1/7] clocksource: Provide module parameters to inject delays in watchdog Paul E. McKenney
2021-04-26  4:07   ` Andi Kleen
2021-04-26  7:13     ` Thomas Gleixner
2021-04-26 15:28     ` Paul E. McKenney
2021-04-26 16:00       ` Andi Kleen
2021-04-26 16:14         ` Paul E. McKenney
2021-04-26 17:56           ` Andi Kleen
2021-04-26 18:24             ` Paul E. McKenney
2021-04-28  4:49               ` Luming Yu
2021-04-28 13:57                 ` Paul E. McKenney
2021-04-28 14:24                   ` Luming Yu
2021-04-28 14:37                     ` Thomas Gleixner
2021-04-25 22:47 ` [PATCH v10 clocksource 2/7] clocksource: Retry clock read if long delays detected Paul E. McKenney
2021-04-27  1:44   ` Feng Tang
2021-04-25 22:47 ` [PATCH v10 clocksource 3/7] clocksource: Check per-CPU clock synchronization when marked unstable Paul E. McKenney
2021-04-26  4:12   ` Andi Kleen
2021-04-26  7:16     ` Thomas Gleixner
2021-04-25 22:47 ` [PATCH v10 clocksource 4/7] clocksource: Provide a module parameter to fuzz per-CPU clock checking Paul E. McKenney
2021-04-25 22:47 ` [PATCH v10 clocksource 5/7] clocksource: Limit number of CPUs checked for clock synchronization Paul E. McKenney
2021-04-25 22:47 ` [PATCH v10 clocksource 6/7] clocksource: Forgive tsc_early pre-calibration drift Paul E. McKenney
2021-04-26 15:01   ` Feng Tang
2021-04-26 15:25     ` Paul E. McKenney
2021-04-26 15:36       ` Feng Tang
2021-04-26 18:26         ` Paul E. McKenney
2021-04-27  1:13           ` Feng Tang
2021-04-27  3:46             ` Paul E. McKenney
2021-04-27  4:16               ` Feng Tang [this message]
2021-04-26 15:28     ` Thomas Gleixner
2021-04-27 21:03     ` Thomas Gleixner
2021-04-27  7:27   ` [clocksource] 8c30ace35d: WARNING:at_kernel/time/clocksource.c:#clocksource_watchdog kernel test robot
2021-04-27  8:45     ` Feng Tang
2021-04-27 13:37       ` Paul E. McKenney
2021-04-27 17:50         ` Paul E. McKenney
2021-04-27 21:09           ` Thomas Gleixner
2021-04-28  1:48             ` Paul E. McKenney
2021-04-28 10:14               ` Thomas Gleixner
2021-04-28 18:31                 ` Paul E. McKenney
2021-04-28 13:34             ` Thomas Gleixner
2021-04-28 15:39               ` Peter Zijlstra
2021-04-28 17:00                 ` Thomas Gleixner
2021-04-29  7:38                   ` Feng Tang
2021-04-28 18:31               ` Paul E. McKenney
2021-04-29  8:27                 ` Thomas Gleixner
2021-04-29 14:26                   ` Paul E. McKenney
2021-04-29 17:30                     ` Thomas Gleixner
2021-04-29 23:04                       ` Andi Kleen
2021-04-30  0:24                         ` Paul E. McKenney
2021-04-30  0:59                           ` Paul E. McKenney
2021-04-30  5:08                       ` Paul E. McKenney
2021-04-25 22:47 ` [PATCH v9 clocksource 6/6] clocksource: Reduce WATCHDOG_THRESHOLD Paul E. McKenney
2021-04-25 22:47 ` [PATCH v10 clocksource 7/7] " Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210427041637.GA66694@shbuild999.sh.intel.com \
    --to=feng.tang@intel.com \
    --cc=Mark.Rutland@arm.com \
    --cc=ak@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=john.stultz@linaro.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=neeraju@codeaurora.org \
    --cc=paulmck@kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=zhengjun.xing@intel.com \
    --cc=zhengjun.xing@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).