linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Feng Tang <feng.tang@intel.com>,
	John Stultz <john.stultz@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
	"Paul E . McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org, Tim Chen <tim.c.chen@intel.com>
Subject: Re: [RFC PATCH] clocksource: Suspend the watchdog temporarily when high read lantency detected
Date: Tue, 20 Dec 2022 11:11:08 -0500	[thread overview]
Message-ID: <6fb04ee9-ce77-4835-2ad1-b7f8419cfb77@redhat.com> (raw)
In-Reply-To: <20221220082512.186283-1-feng.tang@intel.com>

On 12/20/22 03:25, Feng Tang wrote:
> There were bug reported on 8 sockets x86 machines that TSC was wrongly
> disabled when system is under heavy workload.
>
>   [ 818.380354] clocksource: timekeeping watchdog on CPU336: hpet wd-wd read-back delay of 1203520ns
>   [ 818.436160] clocksource: wd-tsc-wd read-back delay of 181880ns, clock-skew test skipped!
>   [ 819.402962] clocksource: timekeeping watchdog on CPU338: hpet wd-wd read-back delay of 324000ns
>   [ 819.448036] clocksource: wd-tsc-wd read-back delay of 337240ns, clock-skew test skipped!
>   [ 819.880863] clocksource: timekeeping watchdog on CPU339: hpet read-back delay of 150280ns, attempt 3, marking unstable
>   [ 819.936243] tsc: Marking TSC unstable due to clocksource watchdog
>   [ 820.068173] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
>   [ 820.092382] sched_clock: Marking unstable (818769414384, 1195404998)
>   [ 820.643627] clocksource: Checking clocksource tsc synchronization from CPU 267 to CPUs 0,4,25,70,126,430,557,564.
>   [ 821.067990] clocksource: Switched to clocksource hpet
>
> This can be reproduced when system is running memory intensive 'stream'
> test, or some stress-ng subcases like 'ioport'.
>
> The reason is when system is under heavy load, the read latency of
> clocksource can be very high, it can be seen even with lightweight
> TSC read, and is much worse on MMIO or IO port read based external
> clocksource. Causing the watchdog check to be inaccurate.
>
> As the clocksource watchdog is a lifetime check with frequency of
> twice a second, there is no need to rush doing it when the system
> is under heavy load and the clocksource read latency is very high,
> suspend the watchdog timer for 5 minutes.
>
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> ---
>   kernel/time/clocksource.c | 45 ++++++++++++++++++++++++++++-----------
>   1 file changed, 32 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 9cf32ccda715..8cd74b89d577 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -384,6 +384,15 @@ void clocksource_verify_percpu(struct clocksource *cs)
>   }
>   EXPORT_SYMBOL_GPL(clocksource_verify_percpu);
>   
> +static inline void clocksource_reset_watchdog(void)
> +{
> +	struct clocksource *cs;
> +
> +	list_for_each_entry(cs, &watchdog_list, wd_list)
> +		cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
> +}
> +
> +
>   static void clocksource_watchdog(struct timer_list *unused)
>   {
>   	u64 csnow, wdnow, cslast, wdlast, delta;
> @@ -391,6 +400,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>   	int64_t wd_nsec, cs_nsec;
>   	struct clocksource *cs;
>   	enum wd_read_status read_ret;
> +	unsigned long extra_wait = 0;
>   	u32 md;
>   
>   	spin_lock(&watchdog_lock);
> @@ -410,13 +420,30 @@ static void clocksource_watchdog(struct timer_list *unused)
>   
>   		read_ret = cs_watchdog_read(cs, &csnow, &wdnow);
>   
> -		if (read_ret != WD_READ_SUCCESS) {
> -			if (read_ret == WD_READ_UNSTABLE)
> -				/* Clock readout unreliable, so give it up. */
> -				__clocksource_unstable(cs);
> +		if (read_ret == WD_READ_UNSTABLE) {
> +			/* Clock readout unreliable, so give it up. */
> +			__clocksource_unstable(cs);
>   			continue;
>   		}
>   
> +		/*
> +		 * When WD_READ_SKIP is returned, it means the system is likely
> +		 * under very heavy load, where the latency of reading
> +		 * watchdog/clocksource is very big, and affect the accuracy of
> +		 * watchdog check. So give system some space and suspend the
> +		 * watchdog check for 5 minutes.
> +		 */
> +		if (read_ret == WD_READ_SKIP) {
> +			/*
> +			 * As the watchdog timer will be suspended, and
> +			 * cs->last could keep unchanged for 5 minutes, reset
> +			 * the counters.
> +			 */
> +			clocksource_reset_watchdog();
> +			extra_wait = HZ * 300;
> +			break;
> +		}
> +
>   		/* Clocksource initialized ? */
>   		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
>   		    atomic_read(&watchdog_reset_pending)) {
> @@ -512,7 +539,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>   	 * pair clocksource_stop_watchdog() clocksource_start_watchdog().
>   	 */
>   	if (!timer_pending(&watchdog_timer)) {
> -		watchdog_timer.expires += WATCHDOG_INTERVAL;
> +		watchdog_timer.expires += WATCHDOG_INTERVAL + extra_wait;
>   		add_timer_on(&watchdog_timer, next_cpu);
>   	}
>   out:
> @@ -537,14 +564,6 @@ static inline void clocksource_stop_watchdog(void)
>   	watchdog_running = 0;
>   }
>   
> -static inline void clocksource_reset_watchdog(void)
> -{
> -	struct clocksource *cs;
> -
> -	list_for_each_entry(cs, &watchdog_list, wd_list)
> -		cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
> -}
> -
>   static void clocksource_resume_watchdog(void)
>   {
>   	atomic_inc(&watchdog_reset_pending);

It looks reasonable to me. Thanks for the patch.

Acked-by: Waiman Long <longman@redhat.com>


  reply	other threads:[~2022-12-20 16:12 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-20  8:25 [RFC PATCH] clocksource: Suspend the watchdog temporarily when high read lantency detected Feng Tang
2022-12-20 16:11 ` Waiman Long [this message]
2022-12-20 18:34   ` Paul E. McKenney
2022-12-21  1:01     ` Feng Tang
2022-12-21  3:26       ` Waiman Long
2022-12-22  0:40         ` Paul E. McKenney
2022-12-22  3:39           ` Waiman Long
2022-12-22  5:55             ` Paul E. McKenney
2022-12-22  6:00               ` Feng Tang
2022-12-22  6:14                 ` Paul E. McKenney
2022-12-22  6:37                   ` Feng Tang
2022-12-22 18:24                     ` Paul E. McKenney
2022-12-22 21:42                       ` Paul E. McKenney
2022-12-22 23:28                         ` Paul E. McKenney
2022-12-23  2:09                         ` Feng Tang
2022-12-23  3:37                           ` Paul E. McKenney
     [not found]                             ` <ad71008d-4acc-d211-dc19-c33bb25ff42c@redhat.com>
2022-12-23  4:14                               ` Feng Tang
2022-12-27 18:38                                 ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6fb04ee9-ce77-4835-2ad1-b7f8419cfb77@redhat.com \
    --to=longman@redhat.com \
    --cc=feng.tang@intel.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).