All of lore.kernel.org
 help / color / mirror / Atom feed
From: Feng Tang <feng.tang@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	<linux-kernel@vger.kernel.org>, <john.stultz@linaro.org>,
	<sboyd@kernel.org>, <corbet@lwn.net>, <Mark.Rutland@arm.com>,
	<maz@kernel.org>, <kernel-team@meta.com>,
	<neeraju@codeaurora.org>, <ak@linux.intel.com>,
	<zhengjun.xing@intel.com>, Chris Mason <clm@meta.com>,
	John Stultz <jstultz@google.com>,
	Waiman Long <longman@redhat.com>
Subject: Re: [PATCH clocksource 1/3] clocksource: Reject bogus watchdog clocksource measurements
Date: Mon, 28 Nov 2022 10:15:43 +0800	[thread overview]
Message-ID: <Y4QZzzk+FdGj4AXm@feng-clx> (raw)
In-Reply-To: <20221123212348.GI4001@paulmck-ThinkPad-P17-Gen-1>

On Wed, Nov 23, 2022 at 01:23:48PM -0800, Paul E. McKenney wrote:
> On Wed, Nov 23, 2022 at 10:36:04AM +0800, Feng Tang wrote:
> > On Tue, Nov 22, 2022 at 02:07:12PM -0800, Paul E. McKenney wrote:
> > [...] 
> > > > > If PM_TIMER was involved, I would expect 'acpi_pm' instead of
> > > > > refined-jiffies.  Or am I misinterpreting the output and/or code?
> > > > 
> > > > It's about timing. On a typical server platform, the clocksources
> > > > init order could be:
> > > >   refined-jiffies --> hpet --> tsc-early --> acpi_pm --> tsc 
> > > > 
> > > > >From your log, TSC('tsc-early') is disabled before 'acpi_pm' get
> > > > initialized, so 'acpi_pm' timer (if exist) had no chance to watchdog
> > > > the tsc.
> > > > 
> > > > > Either way, would it make sense to add CLOCK_SOURCE_MUST_VERIFY to
> > > > > clocksource_hpet.flags?
> > > > 
> > > > Maybe try below patch, which will skip watchdog for 'tsc-early',
> > > > while giving 'acpi_pm' timer a chance to watchdog 'tsc'.
> > > > 
> > > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > > > index cafacb2e58cc..9840f0131764 100644
> > > > --- a/arch/x86/kernel/tsc.c
> > > > +++ b/arch/x86/kernel/tsc.c
> > > > @@ -1131,8 +1131,7 @@ static struct clocksource clocksource_tsc_early = {
> > > >  	.uncertainty_margin	= 32 * NSEC_PER_MSEC,
> > > >  	.read			= read_tsc,
> > > >  	.mask			= CLOCKSOURCE_MASK(64),
> > > > -	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
> > > > -				  CLOCK_SOURCE_MUST_VERIFY,
> > > > +	.flags			= CLOCK_SOURCE_IS_CONTINUOUS,
> > > >  	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
> > > >  	.enable			= tsc_cs_enable,
> > > >  	.resume			= tsc_resume,
> > > 
> > > Your mainline patch b50db7095fe0 ("x86/tsc: Disable clocksource watchdog
> > > for TSC on qualified platorms") mitigates the issue so we are good for
> > > the immediate future, at least assuming reliable TSC.
> > > 
> > > But it also disables checking against HPET, hence my question about
> > > marking clocksource_hpet.flags with CLOCK_SOURCE_MUST_VERIFY at boot time
> > > on systems whose CPUs have constant_tsc, nonstop_tsc, and tsc_adjust.
> > 
> > IIUC, this will make TSC to watchdog HPET every 500 ms. We have got
> > report that the 500ms watchdog timer had big impact on some parallel
> > workload on big servers, that was another factor for us to seek
> > stopping the timer.
> 
> Another approach would be to slow it down.  Given the tighter bounds
> on skew, it could be done every (say) 10 seconds while allowing
> 2 milliseconds skew instead of the current 100 microseconds.

Yes, this can reduce the OS noise much. One problem is if we make it
a general interface, there is some clocksource whose warp time is
less than 10 seconds, like ACPI PM_TIMER (3-4 seconds), and I don't
know if other ARCHs have similar cases.

> 
> > Is this about the concern of possible TSC frequency calibration
> > issue, as the 40 ms per second drift between HPET and TSC? With 
> > b50db7095fe0 backported, we also have another patch to force TSC
> > calibration for those platforms which get the TSC freq directly
> > from CPUID or MSR and don't have such info in dmesg:
> >  "tsc: Refined TSC clocksource calibration: 2693.509 MHz" 
> > 
> > https://lore.kernel.org/lkml/20220509144110.9242-1-feng.tang@intel.com/
> > 
> > We did met tsc calibration issue due to some firmware issue, and
> > this can help to catch it. You can try it if you think it's relevant.
> 
> I am giving this a go, thank you!

Thanks for spending time testing it!

Thanks,
Feng

  reply	other threads:[~2022-11-28  2:19 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-14 23:28 [PATCH clocksource 0/3] Reject bogus watchdog clocksource measurements Paul E. McKenney
2022-11-14 23:28 ` [PATCH clocksource 1/3] clocksource: " Paul E. McKenney
2022-11-17 21:57   ` Thomas Gleixner
2022-11-17 23:09     ` Paul E. McKenney
2022-11-21  0:55       ` Feng Tang
2022-11-21 15:21         ` Paul E. McKenney
2022-11-21 18:14         ` Paul E. McKenney
2022-11-22 15:55           ` Feng Tang
2022-11-22 22:07             ` Paul E. McKenney
2022-11-23  2:36               ` Feng Tang
2022-11-23 21:23                 ` Paul E. McKenney
2022-11-28  2:15                   ` Feng Tang [this message]
2022-11-29 19:29                     ` Paul E. McKenney
2022-11-30  1:38                       ` Feng Tang
2022-11-30  4:12                         ` Paul E. McKenney
2022-11-30  4:49                           ` Feng Tang
2022-11-30  5:16                             ` Paul E. McKenney
2022-11-30  5:35                               ` Feng Tang
2022-11-30  5:50                                 ` Paul E. McKenney
2022-11-30  6:00                                   ` Feng Tang
2022-12-01 17:24                                     ` Paul E. McKenney
2022-12-02  1:10                                       ` Feng Tang
2022-12-02  1:44                                         ` Paul E. McKenney
2022-12-02  2:02                                           ` Feng Tang
2022-12-02 22:24                                             ` Paul E. McKenney
2022-12-03  2:51                                               ` Feng Tang
2022-11-14 23:28 ` [PATCH clocksource 2/3] clocksource: Add comments to classify bogus measurements Paul E. McKenney
2022-11-14 23:28 ` [PATCH clocksource 3/3] clocksource: Exponential backoff for load-induced bogus watchdog reads Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y4QZzzk+FdGj4AXm@feng-clx \
    --to=feng.tang@intel.com \
    --cc=Mark.Rutland@arm.com \
    --cc=ak@linux.intel.com \
    --cc=clm@meta.com \
    --cc=corbet@lwn.net \
    --cc=john.stultz@linaro.org \
    --cc=jstultz@google.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=maz@kernel.org \
    --cc=neeraju@codeaurora.org \
    --cc=paulmck@kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=zhengjun.xing@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.