From: Feng Tang <feng.tang@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Waiman Long <longman@redhat.com>,
	John Stultz <jstultz@google.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>, <x86@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	<linux-kernel@vger.kernel.org>, Tim Chen <tim.c.chen@intel.com>
Subject: Re: [RFC PATCH] clocksource: Suspend the watchdog temporarily when high read lantency detected
Date: Fri, 23 Dec 2022 10:09:04 +0800
Message-ID: <Y6UNwAOQIGWyDfsX@feng-clx>
In-Reply-To: <20221222214219.GA1875767@paulmck-ThinkPad-P17-Gen-1>

On Thu, Dec 22, 2022 at 01:42:19PM -0800, Paul E. McKenney wrote:
[...]
> > > 
> > > > > Also I've run the patch on an Alderlake system, with a fine acpi pm_timer
> > > > > and a fake broken pm_timer, and they both work without errors.
> > > > 
> > > > Thank you!  Did it correctly identify the fake broken pm_timer as being
> > > > broken?  If so, may I have your Tested-by?
> > > 
> > > On that Alderlake system, HPET will be disabled by the kernel, and I
> > > manually increased the tsc frequency by about 1/256 to make pm_timer
> > > look to have a 1/256 deviation. And got dmesg like:
> > > 
> > > [    2.738554] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'acpi_pm' as unstable because the skew is too large:
> > > [    2.738558] clocksource:                       'tsc' wd_nsec: 275956624 wd_now: 13aab38d0d wd_last: 1382c1144d mask: ffffffffffffffff
> > > [    2.738564] clocksource:                       'acpi_pm' cs_nsec: 277034651 cs_now: 731575 cs_last: 63f3cb mask: ffffff
> > > [    2.738568] clocksource:                       'tsc' (not 'acpi_pm') is current clocksource.
> > > 
> > > The deviation is indeed about 1/256. And pm_timer won't be shown in /sys/:
> > > 
> > > /sys/devices/system/clocksource/clocksource0/available_clocksource:tsc 
> > > /sys/devices/system/clocksource/clocksource0/current_clocksource:tsc
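
For reference, the 1/256 figure can be cross-checked from the dmesg values
quoted above. The following is only a small userspace sketch; the helper
names masked_delta() and skew_ppm() are made up for illustration and are
not kernel functions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: difference between two raw counter reads,
 * truncated to the counter width given by mask.
 */
static uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/*
 * Hypothetical helper: skew of the clocksource interval against the
 * watchdog interval, in parts per million (truncated toward zero).
 */
static int64_t skew_ppm(int64_t wd_nsec, int64_t cs_nsec)
{
	return (cs_nsec - wd_nsec) * 1000000 / wd_nsec;
}
```

With wd_nsec = 275956624 and cs_nsec = 277034651 from the log, skew_ppm()
gives 3906, against 3906.25 ppm for an exact 1/256. masked_delta() on the
acpi_pm values (cs_now 0x731575, cs_last 0x63f3cb, mask 0xffffff) gives
991658 timer ticks for the same interval.
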
> > > 
> > > So feel free to add:
> > > 
> > > 	Tested-by: Feng Tang <feng.tang@intel.com>
> > 
> > Thank you very much!  I will apply this on my next rebase.
> 
> But first, here is a prototype of the limited-time clocksource watchdog.
> Thoughts?
> 
> 						Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit dfbf67806c4c7f2bdd79cdefe86a2bea6e7afcab
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Thu Dec 22 13:21:47 2022 -0800
> 
>     clocksource: Permit limited-duration clocksource watchdogging
>     
>     Some systems want regular clocksource checking, but their workloads
>     cannot tolerate the overhead of full-time clocksource checking.
>     Therefore, add a CLOCKSOURCE_WATCHDOG_DURATION Kconfig option and a
>     clocksource.watchdog_duration kernel-boot parameter that limits the
>     clocksource watchdog to the specified number of minutes past boot.
>     Values of zero disable checking, and a value of -1 restores the
>     traditional behavior of always checking.
>     
>     This does change behavior compared to older kernels, but recent kernels
>     disable the clocksource watchdog completely in the common case where the
>     TSC is judged to be trustworthy.  This change in behavior is therefore
>     not a real issue.
    
Yes, this changes the general semantics. Last year I posted a
patch to limit the watchdog to running for 10 minutes, and at that time
Thomas mentioned that one of his machines may show TSC issues after
running for a day, depending on workload [1].

As the intention is to validate HPET/PMTIMER, which are not as
delicate as TSC, maybe we can add a per-clocksource verify-period
field, and only set it for HPET/PMTIMER?
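
To make the idea concrete, here is a rough userspace model of what I have
in mind. It is only a sketch: struct cs_model, the verify_period_min field
and cs_needs_check() are hypothetical names, not existing kernel code or
part of any posted patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct clocksource. */
struct cs_model {
	const char *name;
	/* Assumed field: 0 means "verify forever", >0 is a limit in minutes. */
	int verify_period_min;
};

/*
 * Decide whether the watchdog should still verify this clocksource,
 * given how many minutes have passed since verification started.
 */
static bool cs_needs_check(const struct cs_model *cs,
			   unsigned long minutes_since_start)
{
	if (cs->verify_period_min <= 0)
		return true;	/* no limit set, e.g. for TSC */
	return minutes_since_start < (unsigned long)cs->verify_period_min;
}
```

With this, a TSC entry that leaves the field at 0 keeps being checked after
a day of uptime, while an HPET or PMTIMER entry with the field set to 10
drops out of verification once 10 minutes have passed.
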

[1]. https://lore.kernel.org/lkml/875z286xtk.fsf@nanos.tec.linutronix.de/

Thanks,
Feng

>     Link: https://lore.kernel.org/all/a82092f5-abc8-584f-b2ba-f06c82ffbe7d@redhat.com/
>     Reported-by: Waiman Long <longman@redhat.com>
>     Reported-by: Feng Tang <feng.tang@intel.com>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
> index bae8f11070bef..2676e011673d5 100644
> --- a/kernel/time/Kconfig
> +++ b/kernel/time/Kconfig
> @@ -209,5 +209,17 @@ config CLOCKSOURCE_WATCHDOG_MAX_SKEW_US
>  	  per million.	If the clocksource is good enough for NTP,
>  	  it is good enough for the clocksource watchdog!
>  
> +config CLOCKSOURCE_WATCHDOG_DURATION
> +	int "Default time to run clocksource watchdog (in minutes)"
> +	depends on CLOCKSOURCE_WATCHDOG
> +	range -1 1440
> +	default 10
> +	help
> +	  Specify the default number of minutes that the clocksource
> +	  watchdog should run after being started.  Specify -1 to run
> +	  indefinitely or zero to not run at all.  This value may be
> +	  overridden using the clocksource.watchdog_duration kernel
> +	  boot parameter.
> +
>  endmenu
>  endif
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index b153fce8c0e4b..c920c6c22e0fb 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -213,6 +213,9 @@ module_param(max_cswd_read_retries, ulong, 0644);
>  EXPORT_SYMBOL_GPL(max_cswd_read_retries);
>  static int verify_n_cpus = 8;
>  module_param(verify_n_cpus, int, 0644);
> +static int watchdog_duration = CONFIG_CLOCKSOURCE_WATCHDOG_DURATION;
> +module_param(watchdog_duration, int, 0444);
> +static unsigned long watchdog_end_jiffies;
>  
>  enum wd_read_status {
>  	WD_READ_SUCCESS,
> @@ -549,7 +552,9 @@ static void clocksource_watchdog(struct timer_list *unused)
>  	 * Arm timer if not already pending: could race with concurrent
>  	 * pair clocksource_stop_watchdog() clocksource_start_watchdog().
>  	 */
> -	if (!timer_pending(&watchdog_timer)) {
> +	if (!timer_pending(&watchdog_timer) &&
> +	    (watchdog_duration < 0 ||
> +	     (watchdog_duration >= 0 && time_before(jiffies, watchdog_end_jiffies)))) {
>  		watchdog_timer.expires += WATCHDOG_INTERVAL + extra_wait;
>  		add_timer_on(&watchdog_timer, next_cpu);
>  	}
> @@ -559,8 +564,10 @@ static void clocksource_watchdog(struct timer_list *unused)
>  
>  static inline void clocksource_start_watchdog(void)
>  {
> -	if (watchdog_running || !watchdog || list_empty(&watchdog_list))
> +	if (watchdog_running || !watchdog || list_empty(&watchdog_list) || !watchdog_duration)
>  		return;
> +	if (watchdog_duration > 0)
> +		watchdog_end_jiffies = jiffies + watchdog_duration * 60 * HZ;
>  	timer_setup(&watchdog_timer, clocksource_watchdog, 0);
>  	watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
>  	add_timer_on(&watchdog_timer, cpumask_first(cpu_online_mask));
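
As I read the patch, the duration semantics come down to the small
userspace model below. This is a sketch under assumptions: HZ is fixed at
1000 here, ticks_before() only imitates the wrap-safe comparison that
time_before() performs, and the helper names are made up. The patch
handles a duration of zero by never starting the watchdog; the model
folds that into the rearm decision for brevity.

```c
#include <stdbool.h>

#define HZ 1000UL	/* assumed tick rate for this model only */

/* Wrap-safe "a is earlier than b" for a free-running tick counter. */
static bool ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Tick value at which a watchdog started at 'now' stops rearming. */
static unsigned long watchdog_deadline(unsigned long now, int duration_min)
{
	return now + (unsigned long)duration_min * 60 * HZ;
}

/*
 * Negative duration: always rearm (traditional behavior).
 * Zero: never run.  Positive: rearm only until the deadline.
 */
static bool watchdog_should_rearm(int duration_min, unsigned long now,
				  unsigned long deadline)
{
	if (duration_min < 0)
		return true;
	if (duration_min == 0)
		return false;
	return ticks_before(now, deadline);
}
```

The signed-difference comparison keeps working when the tick counter wraps
between start and deadline, which is why the patch uses time_before()
rather than a plain "<".
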
