linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH clocksource 0/2] Clocksource-watchdog classification and backoff
@ 2022-11-02 18:40 Paul E. McKenney
  2022-11-02 18:40 ` [PATCH clocksource 1/2] clocksource: Add comments to classify bogus measurements Paul E. McKenney
  2022-11-02 18:40 ` [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads Paul E. McKenney
  0 siblings, 2 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-02 18:40 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing

Hello!

This series adds comments classifying bogus measurements and adds capped
exponential backoff for messages that are likely caused by overly busy
systems.

1.	Add comments to classify bogus measurements.

2.	Exponential backoff for load-induced bogus watchdog reads.

						Thanx, Paul

------------------------------------------------------------------------

 b/include/linux/clocksource.h |    4 ++++
 b/kernel/time/clocksource.c   |    2 ++
 kernel/time/clocksource.c     |   31 +++++++++++++++++++++++++------
 3 files changed, 31 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH clocksource 1/2] clocksource: Add comments to classify bogus measurements
  2022-11-02 18:40 [PATCH clocksource 0/2] Clocksource-watchdog classification and backoff Paul E. McKenney
@ 2022-11-02 18:40 ` Paul E. McKenney
  2022-11-03  2:23   ` Waiman Long
  2022-11-04  2:13   ` Feng Tang
  2022-11-02 18:40 ` [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads Paul E. McKenney
  1 sibling, 2 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-02 18:40 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	Paul E. McKenney, John Stultz, Waiman Long

An extremely busy system can delay the clocksource watchdog, so that
the corresponding too-long bogus-measurement error does not necessarily
imply an error in the system.  However, a too-short bogus-measurement
error likely indicates a bug in hardware, firmware or software.

Therefore, add comments clarifying these bogus-measurement pr_warn()s.

Reported-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Waiman Long <longman@redhat.com>
---
 kernel/time/clocksource.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index dcaf38c062161..3f5317faf891f 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -443,10 +443,12 @@ static void clocksource_watchdog(struct timer_list *unused)
 		/* Check for bogus measurements. */
 		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
 		if (wd_nsec < (wdi >> 2)) {
+			/* This usually indicates broken timer code or hardware. */
 			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
 			continue;
 		}
 		if (wd_nsec > (wdi << 2)) {
+			/* This can happen on busy systems, which can delay the watchdog. */
 			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
 			continue;
 		}
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-02 18:40 [PATCH clocksource 0/2] Clocksource-watchdog classification and backoff Paul E. McKenney
  2022-11-02 18:40 ` [PATCH clocksource 1/2] clocksource: Add comments to classify bogus measurements Paul E. McKenney
@ 2022-11-02 18:40 ` Paul E. McKenney
  2022-11-03  2:28   ` Waiman Long
  2022-11-04  2:16   ` Feng Tang
  1 sibling, 2 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-02 18:40 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	Paul E. McKenney, Waiman Long, John Stultz

The clocksource watchdog will reject measurements that are excessively
delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
watchdog interval.  On an extremely busy system, this can result in a
console message being printed every two seconds.  This is excessively
noisy for a non-error condition.

Therefore, apply exponential backoff to these messages.  This exponential
backoff is capped at 1024 times the watchdog interval, which comes to
not quite one message per ten minutes.

Please note that the bogus watchdog reads that occur when the watchdog
interval is less than 0.125 seconds are still printed unconditionally
because these likely correspond to a serious error condition in the
timer code or hardware.

[ paulmck: Apply Feng Tang feedback. ]

Reported-by: Waiman Long <longman@redhat.com>
Reported-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Waiman Long <longman@redhat.com>
---
 include/linux/clocksource.h |  4 ++++
 kernel/time/clocksource.c   | 31 +++++++++++++++++++++++++------
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 1d42d4b173271..daac05aedf56a 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -125,6 +125,10 @@ struct clocksource {
 	struct list_head	wd_list;
 	u64			cs_last;
 	u64			wd_last;
+	u64			wd_last_bogus;
+	int			wd_bogus_shift;
+	unsigned long		wd_bogus_count;
+	unsigned long		wd_bogus_count_last;
 #endif
 	struct module		*owner;
 };
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f5317faf891f..de8047b6720f5 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		/* Check for bogus measurements. */
 		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
-		if (wd_nsec < (wdi >> 2)) {
-			/* This usually indicates broken timer code or hardware. */
-			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
+			bool needwarn = false;
+			u64 wd_lb;
+
+			cs->wd_bogus_count++;
+			if (!cs->wd_bogus_shift) {
+				needwarn = true;
+			} else {
+				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
+				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
+					needwarn = true;
+			}
+			if (needwarn) {
+				/* This can happen on busy systems, which can delay the watchdog. */
+				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last);
+				cs->wd_last_bogus = wdnow;
+				if (cs->wd_bogus_shift < 10)
+					cs->wd_bogus_shift++;
+				cs->wd_bogus_count_last = cs->wd_bogus_count;
+			}
 			continue;
 		}
-		if (wd_nsec > (wdi << 2)) {
-			/* This can happen on busy systems, which can delay the watchdog. */
-			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+		/* Check too-short measurements second to handle wrap. */
+		if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) {
+			/* This usually indicates broken timer code or hardware. */
+			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
 			continue;
 		}
 
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 1/2] clocksource: Add comments to classify bogus measurements
  2022-11-02 18:40 ` [PATCH clocksource 1/2] clocksource: Add comments to classify bogus measurements Paul E. McKenney
@ 2022-11-03  2:23   ` Waiman Long
  2022-11-03 20:47     ` Paul E. McKenney
  2022-11-04  2:13   ` Feng Tang
  1 sibling, 1 reply; 18+ messages in thread
From: Waiman Long @ 2022-11-03  2:23 UTC (permalink / raw)
  To: Paul E. McKenney, tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing, John Stultz

On 11/2/22 14:40, Paul E. McKenney wrote:
> An extremely busy system can delay the clocksource watchdog, so that
> the corresponding too-long bogus-measurement error does not necessarily
> imply an error in the system.  However, a too-short bogus-measurement
> error likely indicates a bug in hardware, firmware or software.
>
> Therefore, add comments clarifying these bogus-measurement pr_warn()s.
>
> Reported-by: Feng Tang <feng.tang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Waiman Long <longman@redhat.com>
> ---
>   kernel/time/clocksource.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index dcaf38c062161..3f5317faf891f 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -443,10 +443,12 @@ static void clocksource_watchdog(struct timer_list *unused)
>   		/* Check for bogus measurements. */
>   		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
>   		if (wd_nsec < (wdi >> 2)) {
> +			/* This usually indicates broken timer code or hardware. */
>   			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
>   			continue;
>   		}
>   		if (wd_nsec > (wdi << 2)) {
> +			/* This can happen on busy systems, which can delay the watchdog. */
>   			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
>   			continue;
>   		}

Looks good.

Reviewed-by: Waiman Long <longman@redhat.com>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-02 18:40 ` [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads Paul E. McKenney
@ 2022-11-03  2:28   ` Waiman Long
  2022-11-03 20:49     ` Paul E. McKenney
  2022-11-04  2:16   ` Feng Tang
  1 sibling, 1 reply; 18+ messages in thread
From: Waiman Long @ 2022-11-03  2:28 UTC (permalink / raw)
  To: Paul E. McKenney, tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing, John Stultz

On 11/2/22 14:40, Paul E. McKenney wrote:
> The clocksource watchdog will reject measurements that are excessively
> delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
> watchdog interval.  On an extremely busy system, this can result in a
> console message being printed every two seconds.  This is excessively
> noisy for a non-error condition.
>
> Therefore, apply exponential backoff to these messages.  This exponential
> backoff is capped at 1024 times the watchdog interval, which comes to
> not quite one message per ten minutes.
>
> Please note that the bogus watchdog reads that occur when the watchdog
> interval is less than 0.125 seconds are still printed unconditionally
> because these likely correspond to a serious error condition in the
> timer code or hardware.
>
> [ paulmck: Apply Feng Tang feedback. ]
>
> Reported-by: Waiman Long <longman@redhat.com>
> Reported-by: Feng Tang <feng.tang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Waiman Long <longman@redhat.com>
> ---
>   include/linux/clocksource.h |  4 ++++
>   kernel/time/clocksource.c   | 31 +++++++++++++++++++++++++------
>   2 files changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 1d42d4b173271..daac05aedf56a 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -125,6 +125,10 @@ struct clocksource {
>   	struct list_head	wd_list;
>   	u64			cs_last;
>   	u64			wd_last;
> +	u64			wd_last_bogus;
> +	int			wd_bogus_shift;
> +	unsigned long		wd_bogus_count;
> +	unsigned long		wd_bogus_count_last;
>   #endif
>   	struct module		*owner;
>   };
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 3f5317faf891f..de8047b6720f5 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
>   
>   		/* Check for bogus measurements. */
>   		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> -		if (wd_nsec < (wdi >> 2)) {
> -			/* This usually indicates broken timer code or hardware. */
> -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> +			bool needwarn = false;
> +			u64 wd_lb;
> +
> +			cs->wd_bogus_count++;
> +			if (!cs->wd_bogus_shift) {
> +				needwarn = true;
> +			} else {
> +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> +					needwarn = true;
> +			}
> +			if (needwarn) {
> +				/* This can happen on busy systems, which can delay the watchdog. */
> +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last);
> +				cs->wd_last_bogus = wdnow;
> +				if (cs->wd_bogus_shift < 10)
> +					cs->wd_bogus_shift++;
> +				cs->wd_bogus_count_last = cs->wd_bogus_count;

I don't think you need to keep 2 counters to store the number of skipped 
warnings. How about just use a single wd_bogus_skip_count like

If (needwarn) {
     pr_warn("... %lu additonal ...", ... cs->wd_bogus_skip_count);
     cs->wd_bogus_skip_count = 0:
} else {
     cs->wd_bogus_skip_count++;
}

Cheers,
Longman



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 1/2] clocksource: Add comments to classify bogus measurements
  2022-11-03  2:23   ` Waiman Long
@ 2022-11-03 20:47     ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-03 20:47 UTC (permalink / raw)
  To: Waiman Long
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	John Stultz

On Wed, Nov 02, 2022 at 10:23:25PM -0400, Waiman Long wrote:
> On 11/2/22 14:40, Paul E. McKenney wrote:
> > An extremely busy system can delay the clocksource watchdog, so that
> > the corresponding too-long bogus-measurement error does not necessarily
> > imply an error in the system.  However, a too-short bogus-measurement
> > error likely indicates a bug in hardware, firmware or software.
> > 
> > Therefore, add comments clarifying these bogus-measurement pr_warn()s.
> > 
> > Reported-by: Feng Tang <feng.tang@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: John Stultz <jstultz@google.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Stephen Boyd <sboyd@kernel.org>
> > Cc: Feng Tang <feng.tang@intel.com>
> > Cc: Waiman Long <longman@redhat.com>
> > ---
> >   kernel/time/clocksource.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index dcaf38c062161..3f5317faf891f 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -443,10 +443,12 @@ static void clocksource_watchdog(struct timer_list *unused)
> >   		/* Check for bogus measurements. */
> >   		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> >   		if (wd_nsec < (wdi >> 2)) {
> > +			/* This usually indicates broken timer code or hardware. */
> >   			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> >   			continue;
> >   		}
> >   		if (wd_nsec > (wdi << 2)) {
> > +			/* This can happen on busy systems, which can delay the watchdog. */
> >   			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> >   			continue;
> >   		}
> 
> Looks good.
> 
> Reviewed-by: Waiman Long <longman@redhat.com>

Applied, thank you!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-03  2:28   ` Waiman Long
@ 2022-11-03 20:49     ` Paul E. McKenney
  2022-11-04  0:20       ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-03 20:49 UTC (permalink / raw)
  To: Waiman Long
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	John Stultz

On Wed, Nov 02, 2022 at 10:28:05PM -0400, Waiman Long wrote:
> On 11/2/22 14:40, Paul E. McKenney wrote:
> > The clocksource watchdog will reject measurements that are excessively
> > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
> > watchdog interval.  On an extremely busy system, this can result in a
> > console message being printed every two seconds.  This is excessively
> > noisy for a non-error condition.
> > 
> > Therefore, apply exponential backoff to these messages.  This exponential
> > backoff is capped at 1024 times the watchdog interval, which comes to
> > not quite one message per ten minutes.
> > 
> > Please note that the bogus watchdog reads that occur when the watchdog
> > interval is less than 0.125 seconds are still printed unconditionally
> > because these likely correspond to a serious error condition in the
> > timer code or hardware.
> > 
> > [ paulmck: Apply Feng Tang feedback. ]
> > 
> > Reported-by: Waiman Long <longman@redhat.com>
> > Reported-by: Feng Tang <feng.tang@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: John Stultz <jstultz@google.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Stephen Boyd <sboyd@kernel.org>
> > Cc: Feng Tang <feng.tang@intel.com>
> > Cc: Waiman Long <longman@redhat.com>
> > ---
> >   include/linux/clocksource.h |  4 ++++
> >   kernel/time/clocksource.c   | 31 +++++++++++++++++++++++++------
> >   2 files changed, 29 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > index 1d42d4b173271..daac05aedf56a 100644
> > --- a/include/linux/clocksource.h
> > +++ b/include/linux/clocksource.h
> > @@ -125,6 +125,10 @@ struct clocksource {
> >   	struct list_head	wd_list;
> >   	u64			cs_last;
> >   	u64			wd_last;
> > +	u64			wd_last_bogus;
> > +	int			wd_bogus_shift;
> > +	unsigned long		wd_bogus_count;
> > +	unsigned long		wd_bogus_count_last;
> >   #endif
> >   	struct module		*owner;
> >   };
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 3f5317faf891f..de8047b6720f5 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
> >   		/* Check for bogus measurements. */
> >   		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> > -		if (wd_nsec < (wdi >> 2)) {
> > -			/* This usually indicates broken timer code or hardware. */
> > -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> > +			bool needwarn = false;
> > +			u64 wd_lb;
> > +
> > +			cs->wd_bogus_count++;
> > +			if (!cs->wd_bogus_shift) {
> > +				needwarn = true;
> > +			} else {
> > +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> > +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> > +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> > +					needwarn = true;
> > +			}
> > +			if (needwarn) {
> > +				/* This can happen on busy systems, which can delay the watchdog. */
> > +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last);
> > +				cs->wd_last_bogus = wdnow;
> > +				if (cs->wd_bogus_shift < 10)
> > +					cs->wd_bogus_shift++;
> > +				cs->wd_bogus_count_last = cs->wd_bogus_count;
> 
> I don't think you need to keep 2 counters to store the number of skipped
> warnings. How about just use a single wd_bogus_skip_count like
> 
> If (needwarn) {
>     pr_warn("... %lu additonal ...", ... cs->wd_bogus_skip_count);
>     cs->wd_bogus_skip_count = 0:
> } else {
>     cs->wd_bogus_skip_count++;
> }

Good point!  Perhaps I am too used to there being multiple entities taking
snapshots?  Anyway, does the updated version shown below look better?

							Thanx, Paul

------------------------------------------------------------------------

commit da44b8af99222ff8761a98ca8c00837a7d607d28
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri Oct 28 10:38:58 2022 -0700

    clocksource: Exponential backoff for load-induced bogus watchdog reads
    
    The clocksource watchdog will reject measurements that are excessively
    delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
    watchdog interval.  On an extremely busy system, this can result in a
    console message being printed every two seconds.  This is excessively
    noisy for a non-error condition.
    
    Therefore, apply exponential backoff to these messages.  This exponential
    backoff is capped at 1024 times the watchdog interval, which comes to
    not quite one message per ten minutes.
    
    Please note that the bogus watchdog reads that occur when the watchdog
    interval is less than 0.125 seconds are still printed unconditionally
    because these likely correspond to a serious error condition in the
    timer code or hardware.
    
    [ paulmck: Apply Feng Tang feedback. ]
    [ paulmck: Apply Waiman Long feedback. ]
    
    Reported-by: Waiman Long <longman@redhat.com>
    Reported-by: Feng Tang <feng.tang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: John Stultz <jstultz@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Stephen Boyd <sboyd@kernel.org>
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Waiman Long <longman@redhat.com>

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 1d42d4b173271..23b73f2293d6d 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -125,6 +125,9 @@ struct clocksource {
 	struct list_head	wd_list;
 	u64			cs_last;
 	u64			wd_last;
+	u64			wd_last_bogus;
+	int			wd_bogus_shift;
+	unsigned long		wd_bogus_count;
 #endif
 	struct module		*owner;
 };
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f5317faf891f..1eefb56505350 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		/* Check for bogus measurements. */
 		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
-		if (wd_nsec < (wdi >> 2)) {
-			/* This usually indicates broken timer code or hardware. */
-			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
+			bool needwarn = false;
+			u64 wd_lb;
+
+			cs->wd_bogus_count++;
+			if (!cs->wd_bogus_shift) {
+				needwarn = true;
+			} else {
+				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
+				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
+					needwarn = true;
+			}
+			if (needwarn) {
+				/* This can happen on busy systems, which can delay the watchdog. */
+				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
+				cs->wd_last_bogus = wdnow;
+				if (cs->wd_bogus_shift < 10)
+					cs->wd_bogus_shift++;
+				cs->wd_bogus_count = 0;
+			}
 			continue;
 		}
-		if (wd_nsec > (wdi << 2)) {
-			/* This can happen on busy systems, which can delay the watchdog. */
-			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+		/* Check too-short measurements second to handle wrap. */
+		if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) {
+			/* This usually indicates broken timer code or hardware. */
+			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
 			continue;
 		}
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-03 20:49     ` Paul E. McKenney
@ 2022-11-04  0:20       ` Waiman Long
  2022-11-04  0:26         ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2022-11-04  0:20 UTC (permalink / raw)
  To: paulmck
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	John Stultz

On 11/3/22 16:49, Paul E. McKenney wrote:
> commit da44b8af99222ff8761a98ca8c00837a7d607d28
> Author: Paul E. McKenney<paulmck@kernel.org>
> Date:   Fri Oct 28 10:38:58 2022 -0700
>
>      clocksource: Exponential backoff for load-induced bogus watchdog reads
>      
>      The clocksource watchdog will reject measurements that are excessively
>      delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
>      watchdog interval.  On an extremely busy system, this can result in a
>      console message being printed every two seconds.  This is excessively
>      noisy for a non-error condition.
>      
>      Therefore, apply exponential backoff to these messages.  This exponential
>      backoff is capped at 1024 times the watchdog interval, which comes to
>      not quite one message per ten minutes.
>      
>      Please note that the bogus watchdog reads that occur when the watchdog
>      interval is less than 0.125 seconds are still printed unconditionally
>      because these likely correspond to a serious error condition in the
>      timer code or hardware.
>      
>      [ paulmck: Apply Feng Tang feedback. ]
>      [ paulmck: Apply Waiman Long feedback. ]
>      
>      Reported-by: Waiman Long<longman@redhat.com>
>      Reported-by: Feng Tang<feng.tang@intel.com>
>      Signed-off-by: Paul E. McKenney<paulmck@kernel.org>
>      Cc: John Stultz<jstultz@google.com>
>      Cc: Thomas Gleixner<tglx@linutronix.de>
>      Cc: Stephen Boyd<sboyd@kernel.org>
>      Cc: Feng Tang<feng.tang@intel.com>
>      Cc: Waiman Long<longman@redhat.com>
>
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 1d42d4b173271..23b73f2293d6d 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -125,6 +125,9 @@ struct clocksource {
>   	struct list_head	wd_list;
>   	u64			cs_last;
>   	u64			wd_last;
> +	u64			wd_last_bogus;
> +	int			wd_bogus_shift;
> +	unsigned long		wd_bogus_count;
>   #endif
>   	struct module		*owner;
>   };
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 3f5317faf891f..1eefb56505350 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
>   
>   		/* Check for bogus measurements. */
>   		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> -		if (wd_nsec < (wdi >> 2)) {
> -			/* This usually indicates broken timer code or hardware. */
> -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> +			bool needwarn = false;
> +			u64 wd_lb;
> +
> +			cs->wd_bogus_count++;
> +			if (!cs->wd_bogus_shift) {
> +				needwarn = true;
> +			} else {
> +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> +					needwarn = true;
> +			}
> +			if (needwarn) {
> +				/* This can happen on busy systems, which can delay the watchdog. */
> +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);

Just one question, does "%lu additional" means the number of bogus count 
that doesn't meet the needwarn requirement and hence skipped. If so, I 
think you have to use "cs->wd_bogus_cnt - 1". Other than that, the 
change looks good to me.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-04  0:20       ` Waiman Long
@ 2022-11-04  0:26         ` Paul E. McKenney
  2022-11-04  1:01           ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-04  0:26 UTC (permalink / raw)
  To: Waiman Long
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	John Stultz

On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote:
> On 11/3/22 16:49, Paul E. McKenney wrote:
> > commit da44b8af99222ff8761a98ca8c00837a7d607d28
> > Author: Paul E. McKenney<paulmck@kernel.org>
> > Date:   Fri Oct 28 10:38:58 2022 -0700
> > 
> >      clocksource: Exponential backoff for load-induced bogus watchdog reads
> >      The clocksource watchdog will reject measurements that are excessively
> >      delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
> >      watchdog interval.  On an extremely busy system, this can result in a
> >      console message being printed every two seconds.  This is excessively
> >      noisy for a non-error condition.
> >      Therefore, apply exponential backoff to these messages.  This exponential
> >      backoff is capped at 1024 times the watchdog interval, which comes to
> >      not quite one message per ten minutes.
> >      Please note that the bogus watchdog reads that occur when the watchdog
> >      interval is less than 0.125 seconds are still printed unconditionally
> >      because these likely correspond to a serious error condition in the
> >      timer code or hardware.
> >      [ paulmck: Apply Feng Tang feedback. ]
> >      [ paulmck: Apply Waiman Long feedback. ]
> >      Reported-by: Waiman Long<longman@redhat.com>
> >      Reported-by: Feng Tang<feng.tang@intel.com>
> >      Signed-off-by: Paul E. McKenney<paulmck@kernel.org>
> >      Cc: John Stultz<jstultz@google.com>
> >      Cc: Thomas Gleixner<tglx@linutronix.de>
> >      Cc: Stephen Boyd<sboyd@kernel.org>
> >      Cc: Feng Tang<feng.tang@intel.com>
> >      Cc: Waiman Long<longman@redhat.com>
> > 
> > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > index 1d42d4b173271..23b73f2293d6d 100644
> > --- a/include/linux/clocksource.h
> > +++ b/include/linux/clocksource.h
> > @@ -125,6 +125,9 @@ struct clocksource {
> >   	struct list_head	wd_list;
> >   	u64			cs_last;
> >   	u64			wd_last;
> > +	u64			wd_last_bogus;
> > +	int			wd_bogus_shift;
> > +	unsigned long		wd_bogus_count;
> >   #endif
> >   	struct module		*owner;
> >   };
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 3f5317faf891f..1eefb56505350 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
> >   		/* Check for bogus measurements. */
> >   		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> > -		if (wd_nsec < (wdi >> 2)) {
> > -			/* This usually indicates broken timer code or hardware. */
> > -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> > +			bool needwarn = false;
> > +			u64 wd_lb;
> > +
> > +			cs->wd_bogus_count++;
> > +			if (!cs->wd_bogus_shift) {
> > +				needwarn = true;
> > +			} else {
> > +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> > +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> > +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> > +					needwarn = true;
> > +			}
> > +			if (needwarn) {
> > +				/* This can happen on busy systems, which can delay the watchdog. */
> > +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
> 
> Just one question, does "%lu additional" means the number of bogus count
> that doesn't meet the needwarn requirement and hence skipped. If so, I think
> you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks
> good to me.

It means the number since the last report, or, for the first report,
the number since boot.

Does that work for you?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-04  0:26         ` Paul E. McKenney
@ 2022-11-04  1:01           ` Waiman Long
  2022-11-04  2:23             ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2022-11-04  1:01 UTC (permalink / raw)
  To: paulmck
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	John Stultz

On 11/3/22 20:26, Paul E. McKenney wrote:
> On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote:
>> On 11/3/22 16:49, Paul E. McKenney wrote:
>>> commit da44b8af99222ff8761a98ca8c00837a7d607d28
>>> Author: Paul E. McKenney<paulmck@kernel.org>
>>> Date:   Fri Oct 28 10:38:58 2022 -0700
>>>
>>>       clocksource: Exponential backoff for load-induced bogus watchdog reads
>>>       The clocksource watchdog will reject measurements that are excessively
>>>       delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
>>>       watchdog interval.  On an extremely busy system, this can result in a
>>>       console message being printed every two seconds.  This is excessively
>>>       noisy for a non-error condition.
>>>       Therefore, apply exponential backoff to these messages.  This exponential
>>>       backoff is capped at 1024 times the watchdog interval, which comes to
>>>       not quite one message per ten minutes.
>>>       Please note that the bogus watchdog reads that occur when the watchdog
>>>       interval is less than 0.125 seconds are still printed unconditionally
>>>       because these likely correspond to a serious error condition in the
>>>       timer code or hardware.
>>>       [ paulmck: Apply Feng Tang feedback. ]
>>>       [ paulmck: Apply Waiman Long feedback. ]
>>>       Reported-by: Waiman Long<longman@redhat.com>
>>>       Reported-by: Feng Tang<feng.tang@intel.com>
>>>       Signed-off-by: Paul E. McKenney<paulmck@kernel.org>
>>>       Cc: John Stultz<jstultz@google.com>
>>>       Cc: Thomas Gleixner<tglx@linutronix.de>
>>>       Cc: Stephen Boyd<sboyd@kernel.org>
>>>       Cc: Feng Tang<feng.tang@intel.com>
>>>       Cc: Waiman Long<longman@redhat.com>
>>>
>>> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
>>> index 1d42d4b173271..23b73f2293d6d 100644
>>> --- a/include/linux/clocksource.h
>>> +++ b/include/linux/clocksource.h
>>> @@ -125,6 +125,9 @@ struct clocksource {
>>>    	struct list_head	wd_list;
>>>    	u64			cs_last;
>>>    	u64			wd_last;
>>> +	u64			wd_last_bogus;
>>> +	int			wd_bogus_shift;
>>> +	unsigned long		wd_bogus_count;
>>>    #endif
>>>    	struct module		*owner;
>>>    };
>>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>>> index 3f5317faf891f..1eefb56505350 100644
>>> --- a/kernel/time/clocksource.c
>>> +++ b/kernel/time/clocksource.c
>>> @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
>>>    		/* Check for bogus measurements. */
>>>    		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
>>> -		if (wd_nsec < (wdi >> 2)) {
>>> -			/* This usually indicates broken timer code or hardware. */
>>> -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
>>> +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
>>> +			bool needwarn = false;
>>> +			u64 wd_lb;
>>> +
>>> +			cs->wd_bogus_count++;
>>> +			if (!cs->wd_bogus_shift) {
>>> +				needwarn = true;
>>> +			} else {
>>> +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
>>> +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
>>> +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
>>> +					needwarn = true;
>>> +			}
>>> +			if (needwarn) {
>>> +				/* This can happen on busy systems, which can delay the watchdog. */
>>> +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
>> Just one question, does "%lu additional" means the number of bogus count
>> that doesn't meet the needwarn requirement and hence skipped. If so, I think
>> you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks
>> good to me.
> It means the number since the last report, or, for the first report,
> the number since boot.
>
> Does that work for you?

OK, I think the word "additional" tricks me into thinking about extra 
bogus messages in additional to the current one. Using another word like 
"total" may be less confusing.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 1/2] clocksource: Add comments to classify bogus measurements
  2022-11-02 18:40 ` [PATCH clocksource 1/2] clocksource: Add comments to classify bogus measurements Paul E. McKenney
  2022-11-03  2:23   ` Waiman Long
@ 2022-11-04  2:13   ` Feng Tang
  1 sibling, 0 replies; 18+ messages in thread
From: Feng Tang @ 2022-11-04  2:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing, John Stultz,
	Waiman Long

On Wed, Nov 02, 2022 at 11:40:08AM -0700, Paul E. McKenney wrote:
> An extremely busy system can delay the clocksource watchdog, so that
> the corresponding too-long bogus-measurement error does not necessarily
> imply an error in the system.  However, a too-short bogus-measurement
> error likely indicates a bug in hardware, firmware or software.
> 
> Therefore, add comments clarifying these bogus-measurement pr_warn()s.

Looks good to me.

Reviewed-by: Feng Tang <feng.tang@intel.com>

> 
> Reported-by: Feng Tang <feng.tang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Waiman Long <longman@redhat.com>
> ---
>  kernel/time/clocksource.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index dcaf38c062161..3f5317faf891f 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -443,10 +443,12 @@ static void clocksource_watchdog(struct timer_list *unused)
>  		/* Check for bogus measurements. */
>  		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
>  		if (wd_nsec < (wdi >> 2)) {
> +			/* This usually indicates broken timer code or hardware. */
>  			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
>  			continue;
>  		}
>  		if (wd_nsec > (wdi << 2)) {
> +			/* This can happen on busy systems, which can delay the watchdog. */
>  			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
>  			continue;
>  		}
> -- 
> 2.31.1.189.g2e36527f23
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-02 18:40 ` [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads Paul E. McKenney
  2022-11-03  2:28   ` Waiman Long
@ 2022-11-04  2:16   ` Feng Tang
  2022-11-04  2:25     ` Paul E. McKenney
  1 sibling, 1 reply; 18+ messages in thread
From: Feng Tang @ 2022-11-04  2:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing, Waiman Long,
	John Stultz

On Wed, Nov 02, 2022 at 11:40:09AM -0700, Paul E. McKenney wrote:
> The clocksource watchdog will reject measurements that are excessively
> delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
> watchdog interval.  On an extremely busy system, this can result in a
> console message being printed every two seconds.  This is excessively
> noisy for a non-error condition.
> 
> Therefore, apply exponential backoff to these messages.  This exponential
> backoff is capped at 1024 times the watchdog interval, which comes to
> not quite one message per ten minutes.
> 
> Please note that the bogus watchdog reads that occur when the watchdog
> interval is less than 0.125 seconds are still printed unconditionally
> because these likely correspond to a serious error condition in the
> timer code or hardware.

I saw there is ongoing discussion about some wording, overall it
looks good to me, thanks!

Reviewed-by: Feng Tang <feng.tang@intel.com>

> [ paulmck: Apply Feng Tang feedback. ]
> 
> Reported-by: Waiman Long <longman@redhat.com>
> Reported-by: Feng Tang <feng.tang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Waiman Long <longman@redhat.com>
> ---
>  include/linux/clocksource.h |  4 ++++
>  kernel/time/clocksource.c   | 31 +++++++++++++++++++++++++------
>  2 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 1d42d4b173271..daac05aedf56a 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -125,6 +125,10 @@ struct clocksource {
>  	struct list_head	wd_list;
>  	u64			cs_last;
>  	u64			wd_last;
> +	u64			wd_last_bogus;
> +	int			wd_bogus_shift;
> +	unsigned long		wd_bogus_count;
> +	unsigned long		wd_bogus_count_last;
>  #endif
>  	struct module		*owner;
>  };
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 3f5317faf891f..de8047b6720f5 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
>  
>  		/* Check for bogus measurements. */
>  		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> -		if (wd_nsec < (wdi >> 2)) {
> -			/* This usually indicates broken timer code or hardware. */
> -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> +			bool needwarn = false;
> +			u64 wd_lb;
> +
> +			cs->wd_bogus_count++;
> +			if (!cs->wd_bogus_shift) {
> +				needwarn = true;
> +			} else {
> +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> +					needwarn = true;
> +			}
> +			if (needwarn) {
> +				/* This can happen on busy systems, which can delay the watchdog. */
> +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last);
> +				cs->wd_last_bogus = wdnow;
> +				if (cs->wd_bogus_shift < 10)
> +					cs->wd_bogus_shift++;
> +				cs->wd_bogus_count_last = cs->wd_bogus_count;
> +			}
>  			continue;
>  		}
> -		if (wd_nsec > (wdi << 2)) {
> -			/* This can happen on busy systems, which can delay the watchdog. */
> -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +		/* Check too-short measurements second to handle wrap. */
> +		if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) {
> +			/* This usually indicates broken timer code or hardware. */
> +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
>  			continue;
>  		}
>  
> -- 
> 2.31.1.189.g2e36527f23
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-04  1:01           ` Waiman Long
@ 2022-11-04  2:23             ` Paul E. McKenney
  2022-11-04 13:55               ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-04  2:23 UTC (permalink / raw)
  To: Waiman Long
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	John Stultz

On Thu, Nov 03, 2022 at 09:01:45PM -0400, Waiman Long wrote:
> On 11/3/22 20:26, Paul E. McKenney wrote:
> > On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote:
> > > On 11/3/22 16:49, Paul E. McKenney wrote:
> > > > commit da44b8af99222ff8761a98ca8c00837a7d607d28
> > > > Author: Paul E. McKenney<paulmck@kernel.org>
> > > > Date:   Fri Oct 28 10:38:58 2022 -0700
> > > > 
> > > >       clocksource: Exponential backoff for load-induced bogus watchdog reads
> > > >       The clocksource watchdog will reject measurements that are excessively
> > > >       delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
> > > >       watchdog interval.  On an extremely busy system, this can result in a
> > > >       console message being printed every two seconds.  This is excessively
> > > >       noisy for a non-error condition.
> > > >       Therefore, apply exponential backoff to these messages.  This exponential
> > > >       backoff is capped at 1024 times the watchdog interval, which comes to
> > > >       not quite one message per ten minutes.
> > > >       Please note that the bogus watchdog reads that occur when the watchdog
> > > >       interval is less than 0.125 seconds are still printed unconditionally
> > > >       because these likely correspond to a serious error condition in the
> > > >       timer code or hardware.
> > > >       [ paulmck: Apply Feng Tang feedback. ]
> > > >       [ paulmck: Apply Waiman Long feedback. ]
> > > >       Reported-by: Waiman Long<longman@redhat.com>
> > > >       Reported-by: Feng Tang<feng.tang@intel.com>
> > > >       Signed-off-by: Paul E. McKenney<paulmck@kernel.org>
> > > >       Cc: John Stultz<jstultz@google.com>
> > > >       Cc: Thomas Gleixner<tglx@linutronix.de>
> > > >       Cc: Stephen Boyd<sboyd@kernel.org>
> > > >       Cc: Feng Tang<feng.tang@intel.com>
> > > >       Cc: Waiman Long<longman@redhat.com>
> > > > 
> > > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > > > index 1d42d4b173271..23b73f2293d6d 100644
> > > > --- a/include/linux/clocksource.h
> > > > +++ b/include/linux/clocksource.h
> > > > @@ -125,6 +125,9 @@ struct clocksource {
> > > >    	struct list_head	wd_list;
> > > >    	u64			cs_last;
> > > >    	u64			wd_last;
> > > > +	u64			wd_last_bogus;
> > > > +	int			wd_bogus_shift;
> > > > +	unsigned long		wd_bogus_count;
> > > >    #endif
> > > >    	struct module		*owner;
> > > >    };
> > > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > > > index 3f5317faf891f..1eefb56505350 100644
> > > > --- a/kernel/time/clocksource.c
> > > > +++ b/kernel/time/clocksource.c
> > > > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
> > > >    		/* Check for bogus measurements. */
> > > >    		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> > > > -		if (wd_nsec < (wdi >> 2)) {
> > > > -			/* This usually indicates broken timer code or hardware. */
> > > > -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > > > +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> > > > +			bool needwarn = false;
> > > > +			u64 wd_lb;
> > > > +
> > > > +			cs->wd_bogus_count++;
> > > > +			if (!cs->wd_bogus_shift) {
> > > > +				needwarn = true;
> > > > +			} else {
> > > > +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> > > > +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> > > > +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> > > > +					needwarn = true;
> > > > +			}
> > > > +			if (needwarn) {
> > > > +				/* This can happen on busy systems, which can delay the watchdog. */
> > > > +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
> > > Just one question, does "%lu additional" means the number of bogus count
> > > that doesn't meet the needwarn requirement and hence skipped. If so, I think
> > > you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks
> > > good to me.
> > It means the number since the last report, or, for the first report,
> > the number since boot.
> > 
> > Does that work for you?
> 
> OK, I think the word "additional" tricks me into thinking about extra bogus
> messages in additional to the current one. Using another word like "total"
> may be less confusing.

My concern with "total" is that people might think that the numbers
meant the total number of instances since boot.

So how about "(9 since last message)" or similar?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-04  2:16   ` Feng Tang
@ 2022-11-04  2:25     ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-04  2:25 UTC (permalink / raw)
  To: Feng Tang
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing, Waiman Long,
	John Stultz

On Fri, Nov 04, 2022 at 10:16:53AM +0800, Feng Tang wrote:
> On Wed, Nov 02, 2022 at 11:40:09AM -0700, Paul E. McKenney wrote:
> > The clocksource watchdog will reject measurements that are excessively
> > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
> > watchdog interval.  On an extremely busy system, this can result in a
> > console message being printed every two seconds.  This is excessively
> > noisy for a non-error condition.
> > 
> > Therefore, apply exponential backoff to these messages.  This exponential
> > backoff is capped at 1024 times the watchdog interval, which comes to
> > not quite one message per ten minutes.
> > 
> > Please note that the bogus watchdog reads that occur when the watchdog
> > interval is less than 0.125 seconds are still printed unconditionally
> > because these likely correspond to a serious error condition in the
> > timer code or hardware.
> 
> I saw there is ongoing discussion about some wording, overall it
> looks good to me, thanks!
> 
> Reviewed-by: Feng Tang <feng.tang@intel.com>

Thank you!  I will apply both of your Reviewed-by tags on the next rebase,
by which time hopefully Waiman and I come to agreement on wording.  ;-)

						Thanx, Paul

> > [ paulmck: Apply Feng Tang feedback. ]
> > 
> > Reported-by: Waiman Long <longman@redhat.com>
> > Reported-by: Feng Tang <feng.tang@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: John Stultz <jstultz@google.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Stephen Boyd <sboyd@kernel.org>
> > Cc: Feng Tang <feng.tang@intel.com>
> > Cc: Waiman Long <longman@redhat.com>
> > ---
> >  include/linux/clocksource.h |  4 ++++
> >  kernel/time/clocksource.c   | 31 +++++++++++++++++++++++++------
> >  2 files changed, 29 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > index 1d42d4b173271..daac05aedf56a 100644
> > --- a/include/linux/clocksource.h
> > +++ b/include/linux/clocksource.h
> > @@ -125,6 +125,10 @@ struct clocksource {
> >  	struct list_head	wd_list;
> >  	u64			cs_last;
> >  	u64			wd_last;
> > +	u64			wd_last_bogus;
> > +	int			wd_bogus_shift;
> > +	unsigned long		wd_bogus_count;
> > +	unsigned long		wd_bogus_count_last;
> >  #endif
> >  	struct module		*owner;
> >  };
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 3f5317faf891f..de8047b6720f5 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
> >  
> >  		/* Check for bogus measurements. */
> >  		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> > -		if (wd_nsec < (wdi >> 2)) {
> > -			/* This usually indicates broken timer code or hardware. */
> > -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> > +			bool needwarn = false;
> > +			u64 wd_lb;
> > +
> > +			cs->wd_bogus_count++;
> > +			if (!cs->wd_bogus_shift) {
> > +				needwarn = true;
> > +			} else {
> > +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> > +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> > +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> > +					needwarn = true;
> > +			}
> > +			if (needwarn) {
> > +				/* This can happen on busy systems, which can delay the watchdog. */
> > +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last);
> > +				cs->wd_last_bogus = wdnow;
> > +				if (cs->wd_bogus_shift < 10)
> > +					cs->wd_bogus_shift++;
> > +				cs->wd_bogus_count_last = cs->wd_bogus_count;
> > +			}
> >  			continue;
> >  		}
> > -		if (wd_nsec > (wdi << 2)) {
> > -			/* This can happen on busy systems, which can delay the watchdog. */
> > -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > +		/* Check too-short measurements second to handle wrap. */
> > +		if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) {
> > +			/* This usually indicates broken timer code or hardware. */
> > +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> >  			continue;
> >  		}
> >  
> > -- 
> > 2.31.1.189.g2e36527f23
> > 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-04  2:23             ` Paul E. McKenney
@ 2022-11-04 13:55               ` Waiman Long
  2022-11-05  2:38                 ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2022-11-04 13:55 UTC (permalink / raw)
  To: paulmck
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	John Stultz

On 11/3/22 22:23, Paul E. McKenney wrote:
> On Thu, Nov 03, 2022 at 09:01:45PM -0400, Waiman Long wrote:
>> On 11/3/22 20:26, Paul E. McKenney wrote:
>>> On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote:
>>>> On 11/3/22 16:49, Paul E. McKenney wrote:
>>>>> commit da44b8af99222ff8761a98ca8c00837a7d607d28
>>>>> Author: Paul E. McKenney<paulmck@kernel.org>
>>>>> Date:   Fri Oct 28 10:38:58 2022 -0700
>>>>>
>>>>>        clocksource: Exponential backoff for load-induced bogus watchdog reads
>>>>>        The clocksource watchdog will reject measurements that are excessively
>>>>>        delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
>>>>>        watchdog interval.  On an extremely busy system, this can result in a
>>>>>        console message being printed every two seconds.  This is excessively
>>>>>        noisy for a non-error condition.
>>>>>        Therefore, apply exponential backoff to these messages.  This exponential
>>>>>        backoff is capped at 1024 times the watchdog interval, which comes to
>>>>>        not quite one message per ten minutes.
>>>>>        Please note that the bogus watchdog reads that occur when the watchdog
>>>>>        interval is less than 0.125 seconds are still printed unconditionally
>>>>>        because these likely correspond to a serious error condition in the
>>>>>        timer code or hardware.
>>>>>        [ paulmck: Apply Feng Tang feedback. ]
>>>>>        [ paulmck: Apply Waiman Long feedback. ]
>>>>>        Reported-by: Waiman Long<longman@redhat.com>
>>>>>        Reported-by: Feng Tang<feng.tang@intel.com>
>>>>>        Signed-off-by: Paul E. McKenney<paulmck@kernel.org>
>>>>>        Cc: John Stultz<jstultz@google.com>
>>>>>        Cc: Thomas Gleixner<tglx@linutronix.de>
>>>>>        Cc: Stephen Boyd<sboyd@kernel.org>
>>>>>        Cc: Feng Tang<feng.tang@intel.com>
>>>>>        Cc: Waiman Long<longman@redhat.com>
>>>>>
>>>>> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
>>>>> index 1d42d4b173271..23b73f2293d6d 100644
>>>>> --- a/include/linux/clocksource.h
>>>>> +++ b/include/linux/clocksource.h
>>>>> @@ -125,6 +125,9 @@ struct clocksource {
>>>>>     	struct list_head	wd_list;
>>>>>     	u64			cs_last;
>>>>>     	u64			wd_last;
>>>>> +	u64			wd_last_bogus;
>>>>> +	int			wd_bogus_shift;
>>>>> +	unsigned long		wd_bogus_count;
>>>>>     #endif
>>>>>     	struct module		*owner;
>>>>>     };
>>>>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>>>>> index 3f5317faf891f..1eefb56505350 100644
>>>>> --- a/kernel/time/clocksource.c
>>>>> +++ b/kernel/time/clocksource.c
>>>>> @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
>>>>>     		/* Check for bogus measurements. */
>>>>>     		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
>>>>> -		if (wd_nsec < (wdi >> 2)) {
>>>>> -			/* This usually indicates broken timer code or hardware. */
>>>>> -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
>>>>> +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
>>>>> +			bool needwarn = false;
>>>>> +			u64 wd_lb;
>>>>> +
>>>>> +			cs->wd_bogus_count++;
>>>>> +			if (!cs->wd_bogus_shift) {
>>>>> +				needwarn = true;
>>>>> +			} else {
>>>>> +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
>>>>> +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
>>>>> +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
>>>>> +					needwarn = true;
>>>>> +			}
>>>>> +			if (needwarn) {
>>>>> +				/* This can happen on busy systems, which can delay the watchdog. */
>>>>> +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
>>>> Just one question, does "%lu additional" means the number of bogus count
>>>> that doesn't meet the needwarn requirement and hence skipped. If so, I think
>>>> you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks
>>>> good to me.
>>> It means the number since the last report, or, for the first report,
>>> the number since boot.
>>>
>>> Does that work for you?
>> OK, I think the word "additional" tricks me into thinking about extra bogus
>> messages in additional to the current one. Using another word like "total"
>> may be less confusing.
> My concern with "total" is that people might think that the numbers
> meant the total number of instances since boot.
>
> So how about "(9 since last message)" or similar?
>
> 							Thanx, Paul

Yes, that looks good to me.

Thanks!
Longman


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-04 13:55               ` Waiman Long
@ 2022-11-05  2:38                 ` Paul E. McKenney
  2022-11-05  2:49                   ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-05  2:38 UTC (permalink / raw)
  To: Waiman Long
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	John Stultz

On Fri, Nov 04, 2022 at 09:55:02AM -0400, Waiman Long wrote:
> On 11/3/22 22:23, Paul E. McKenney wrote:
> > On Thu, Nov 03, 2022 at 09:01:45PM -0400, Waiman Long wrote:
> > > On 11/3/22 20:26, Paul E. McKenney wrote:
> > > > On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote:
> > > > > On 11/3/22 16:49, Paul E. McKenney wrote:

[ . . . ]

> > > > > > +					needwarn = true;
> > > > > > +			}
> > > > > > +			if (needwarn) {
> > > > > > +				/* This can happen on busy systems, which can delay the watchdog. */
> > > > > > +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
> > > > > Just one question, does "%lu additional" means the number of bogus count
> > > > > that doesn't meet the needwarn requirement and hence skipped. If so, I think
> > > > > you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks
> > > > > good to me.
> > > > It means the number since the last report, or, for the first report,
> > > > the number since boot.
> > > > 
> > > > Does that work for you?
> > > OK, I think the word "additional" tricks me into thinking about extra bogus
> > > messages in additional to the current one. Using another word like "total"
> > > may be less confusing.
> > My concern with "total" is that people might think that the numbers
> > meant the total number of instances since boot.
> > 
> > So how about "(9 since last message)" or similar?
> > 
> > 							Thanx, Paul
> 
> Yes, that looks good to me.

Thank you, and please see below for the updated patch.

							Thanx, Paul

------------------------------------------------------------------------

commit a7dc308e8359eafb58df360e06b66ecbf79a4d0b
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri Oct 28 10:38:58 2022 -0700

    clocksource: Exponential backoff for load-induced bogus watchdog reads
    
    The clocksource watchdog will reject measurements that are excessively
    delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
    watchdog interval.  On an extremely busy system, this can result in a
    console message being printed every two seconds.  This is excessively
    noisy for a non-error condition.
    
    Therefore, apply exponential backoff to these messages.  This exponential
    backoff is capped at 1024 times the watchdog interval, which comes to
    not quite one message per ten minutes.
    
    Please note that the bogus watchdog reads that occur when the watchdog
    interval is less than 0.125 seconds are still printed unconditionally
    because these likely correspond to a serious error condition in the
    timer code or hardware.
    
    [ paulmck: Apply Feng Tang feedback. ]
    [ paulmck: Apply Waiman Long feedback. ]
    
    Reported-by: Waiman Long <longman@redhat.com>
    Reported-by: Feng Tang <feng.tang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Feng Tang <feng.tang@intel.com>
    Cc: John Stultz <jstultz@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Stephen Boyd <sboyd@kernel.org>
    Cc: Waiman Long <longman@redhat.com>

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 1d42d4b173271..23b73f2293d6d 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -125,6 +125,9 @@ struct clocksource {
 	struct list_head	wd_list;
 	u64			cs_last;
 	u64			wd_last;
+	u64			wd_last_bogus;
+	int			wd_bogus_shift;
+	unsigned long		wd_bogus_count;
 #endif
 	struct module		*owner;
 };
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f5317faf891f..4015ec6503a52 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		/* Check for bogus measurements. */
 		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
-		if (wd_nsec < (wdi >> 2)) {
-			/* This usually indicates broken timer code or hardware. */
-			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
+			bool needwarn = false;
+			u64 wd_lb;
+
+			cs->wd_bogus_count++;
+			if (!cs->wd_bogus_shift) {
+				needwarn = true;
+			} else {
+				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
+				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
+					needwarn = true;
+			}
+			if (needwarn) {
+				/* This can happen on busy systems, which can delay the watchdog. */
+				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu since last message), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
+				cs->wd_last_bogus = wdnow;
+				if (cs->wd_bogus_shift < 10)
+					cs->wd_bogus_shift++;
+				cs->wd_bogus_count = 0;
+			}
 			continue;
 		}
-		if (wd_nsec > (wdi << 2)) {
-			/* This can happen on busy systems, which can delay the watchdog. */
-			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+		/* Check too-short measurements second to handle wrap. */
+		if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) {
+			/* This usually indicates broken timer code or hardware. */
+			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
 			continue;
 		}
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-05  2:38                 ` Paul E. McKenney
@ 2022-11-05  2:49                   ` Waiman Long
  2022-11-05 19:31                     ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2022-11-05  2:49 UTC (permalink / raw)
  To: paulmck
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	John Stultz


On 11/4/22 22:38, Paul E. McKenney wrote:
> On Fri, Nov 04, 2022 at 09:55:02AM -0400, Waiman Long wrote:
>> On 11/3/22 22:23, Paul E. McKenney wrote:
>>> On Thu, Nov 03, 2022 at 09:01:45PM -0400, Waiman Long wrote:
>>>> On 11/3/22 20:26, Paul E. McKenney wrote:
>>>>> On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote:
>>>>>> On 11/3/22 16:49, Paul E. McKenney wrote:
> [ . . . ]
>
>>>>>>> +					needwarn = true;
>>>>>>> +			}
>>>>>>> +			if (needwarn) {
>>>>>>> +				/* This can happen on busy systems, which can delay the watchdog. */
>>>>>>> +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
>>>>>> Just one question, does "%lu additional" means the number of bogus count
>>>>>> that doesn't meet the needwarn requirement and hence skipped. If so, I think
>>>>>> you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks
>>>>>> good to me.
>>>>> It means the number since the last report, or, for the first report,
>>>>> the number since boot.
>>>>>
>>>>> Does that work for you?
>>>> OK, I think the word "additional" tricks me into thinking about extra bogus
>>>> messages in additional to the current one. Using another word like "total"
>>>> may be less confusing.
>>> My concern with "total" is that people might think that the numbers
>>> meant the total number of instances since boot.
>>>
>>> So how about "(9 since last message)" or similar?
>>>
>>> 							Thanx, Paul
>> Yes, that looks good to me.
> Thank you, and please see below for the updated patch.
>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit a7dc308e8359eafb58df360e06b66ecbf79a4d0b
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Fri Oct 28 10:38:58 2022 -0700
>
>      clocksource: Exponential backoff for load-induced bogus watchdog reads
>      
>      The clocksource watchdog will reject measurements that are excessively
>      delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
>      watchdog interval.  On an extremely busy system, this can result in a
>      console message being printed every two seconds.  This is excessively
>      noisy for a non-error condition.
>      
>      Therefore, apply exponential backoff to these messages.  This exponential
>      backoff is capped at 1024 times the watchdog interval, which comes to
>      not quite one message per ten minutes.
>      
>      Please note that the bogus watchdog reads that occur when the watchdog
>      interval is less than 0.125 seconds are still printed unconditionally
>      because these likely correspond to a serious error condition in the
>      timer code or hardware.
>      
>      [ paulmck: Apply Feng Tang feedback. ]
>      [ paulmck: Apply Waiman Long feedback. ]
>      
>      Reported-by: Waiman Long <longman@redhat.com>
>      Reported-by: Feng Tang <feng.tang@intel.com>
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>      Reviewed-by: Feng Tang <feng.tang@intel.com>
>      Cc: John Stultz <jstultz@google.com>
>      Cc: Thomas Gleixner <tglx@linutronix.de>
>      Cc: Stephen Boyd <sboyd@kernel.org>
>      Cc: Waiman Long <longman@redhat.com>
>
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 1d42d4b173271..23b73f2293d6d 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -125,6 +125,9 @@ struct clocksource {
>   	struct list_head	wd_list;
>   	u64			cs_last;
>   	u64			wd_last;
> +	u64			wd_last_bogus;
> +	int			wd_bogus_shift;
> +	unsigned long		wd_bogus_count;
>   #endif
>   	struct module		*owner;
>   };
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 3f5317faf891f..4015ec6503a52 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
>   
>   		/* Check for bogus measurements. */
>   		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> -		if (wd_nsec < (wdi >> 2)) {
> -			/* This usually indicates broken timer code or hardware. */
> -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> +			bool needwarn = false;
> +			u64 wd_lb;
> +
> +			cs->wd_bogus_count++;
> +			if (!cs->wd_bogus_shift) {
> +				needwarn = true;
> +			} else {
> +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> +					needwarn = true;
> +			}
> +			if (needwarn) {
> +				/* This can happen on busy systems, which can delay the watchdog. */
> +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu since last message), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
> +				cs->wd_last_bogus = wdnow;
> +				if (cs->wd_bogus_shift < 10)
> +					cs->wd_bogus_shift++;
> +				cs->wd_bogus_count = 0;
> +			}
>   			continue;
>   		}
> -		if (wd_nsec > (wdi << 2)) {
> -			/* This can happen on busy systems, which can delay the watchdog. */
> -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +		/* Check too-short measurements second to handle wrap. */
> +		if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) {
> +			/* This usually indicates broken timer code or hardware. */
> +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
>   			continue;
>   		}
>   
>
Looks good to me.

Reviewed-by: Waiman Long <longman@redhat.com>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads
  2022-11-05  2:49                   ` Waiman Long
@ 2022-11-05 19:31                     ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-05 19:31 UTC (permalink / raw)
  To: Waiman Long
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	John Stultz

On Fri, Nov 04, 2022 at 10:49:13PM -0400, Waiman Long wrote:
> 
> On 11/4/22 22:38, Paul E. McKenney wrote:
> > On Fri, Nov 04, 2022 at 09:55:02AM -0400, Waiman Long wrote:
> > > On 11/3/22 22:23, Paul E. McKenney wrote:
> > > > On Thu, Nov 03, 2022 at 09:01:45PM -0400, Waiman Long wrote:
> > > > > On 11/3/22 20:26, Paul E. McKenney wrote:
> > > > > > On Thu, Nov 03, 2022 at 08:20:27PM -0400, Waiman Long wrote:
> > > > > > > On 11/3/22 16:49, Paul E. McKenney wrote:
> > [ . . . ]
> > 
> > > > > > > > +					needwarn = true;
> > > > > > > > +			}
> > > > > > > > +			if (needwarn) {
> > > > > > > > +				/* This can happen on busy systems, which can delay the watchdog. */
> > > > > > > > +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
> > > > > > > Just one question, does "%lu additional" means the number of bogus count
> > > > > > > that doesn't meet the needwarn requirement and hence skipped. If so, I think
> > > > > > > you have to use "cs->wd_bogus_cnt - 1". Other than that, the change looks
> > > > > > > good to me.
> > > > > > It means the number since the last report, or, for the first report,
> > > > > > the number since boot.
> > > > > > 
> > > > > > Does that work for you?
> > > > > OK, I think the word "additional" tricks me into thinking about extra bogus
> > > > > messages in additional to the current one. Using another word like "total"
> > > > > may be less confusing.
> > > > My concern with "total" is that people might think that the numbers
> > > > meant the total number of instances since boot.
> > > > 
> > > > So how about "(9 since last message)" or similar?
> > > > 
> > > > 							Thanx, Paul
> > > Yes, that looks good to me.
> > Thank you, and please see below for the updated patch.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit a7dc308e8359eafb58df360e06b66ecbf79a4d0b
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Fri Oct 28 10:38:58 2022 -0700
> > 
> >      clocksource: Exponential backoff for load-induced bogus watchdog reads
> >      The clocksource watchdog will reject measurements that are excessively
> >      delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
> >      watchdog interval.  On an extremely busy system, this can result in a
> >      console message being printed every two seconds.  This is excessively
> >      noisy for a non-error condition.
> >      Therefore, apply exponential backoff to these messages.  This exponential
> >      backoff is capped at 1024 times the watchdog interval, which comes to
> >      not quite one message per ten minutes.
> >      Please note that the bogus watchdog reads that occur when the watchdog
> >      interval is less than 0.125 seconds are still printed unconditionally
> >      because these likely correspond to a serious error condition in the
> >      timer code or hardware.
> >      [ paulmck: Apply Feng Tang feedback. ]
> >      [ paulmck: Apply Waiman Long feedback. ]
> >      Reported-by: Waiman Long <longman@redhat.com>
> >      Reported-by: Feng Tang <feng.tang@intel.com>
> >      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >      Reviewed-by: Feng Tang <feng.tang@intel.com>
> >      Cc: John Stultz <jstultz@google.com>
> >      Cc: Thomas Gleixner <tglx@linutronix.de>
> >      Cc: Stephen Boyd <sboyd@kernel.org>
> >      Cc: Waiman Long <longman@redhat.com>
> > 
> > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > index 1d42d4b173271..23b73f2293d6d 100644
> > --- a/include/linux/clocksource.h
> > +++ b/include/linux/clocksource.h
> > @@ -125,6 +125,9 @@ struct clocksource {
> >   	struct list_head	wd_list;
> >   	u64			cs_last;
> >   	u64			wd_last;
> > +	u64			wd_last_bogus;
> > +	int			wd_bogus_shift;
> > +	unsigned long		wd_bogus_count;
> >   #endif
> >   	struct module		*owner;
> >   };
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 3f5317faf891f..4015ec6503a52 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
> >   		/* Check for bogus measurements. */
> >   		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> > -		if (wd_nsec < (wdi >> 2)) {
> > -			/* This usually indicates broken timer code or hardware. */
> > -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> > +			bool needwarn = false;
> > +			u64 wd_lb;
> > +
> > +			cs->wd_bogus_count++;
> > +			if (!cs->wd_bogus_shift) {
> > +				needwarn = true;
> > +			} else {
> > +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> > +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> > +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> > +					needwarn = true;
> > +			}
> > +			if (needwarn) {
> > +				/* This can happen on busy systems, which can delay the watchdog. */
> > +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu since last message), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
> > +				cs->wd_last_bogus = wdnow;
> > +				if (cs->wd_bogus_shift < 10)
> > +					cs->wd_bogus_shift++;
> > +				cs->wd_bogus_count = 0;
> > +			}
> >   			continue;
> >   		}
> > -		if (wd_nsec > (wdi << 2)) {
> > -			/* This can happen on busy systems, which can delay the watchdog. */
> > -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > +		/* Check too-short measurements second to handle wrap. */
> > +		if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) {
> > +			/* This usually indicates broken timer code or hardware. */
> > +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> >   			continue;
> >   		}
> > 
> Looks good to me.
> 
> Reviewed-by: Waiman Long <longman@redhat.com>

Thank you!  I will apply this on my next rebase.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-11-05 19:32 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-02 18:40 [PATCH clocksource 0/2] Clocksource-watchdog classification and backoff Paul E. McKenney
2022-11-02 18:40 ` [PATCH clocksource 1/2] clocksource: Add comments to classify bogus measurements Paul E. McKenney
2022-11-03  2:23   ` Waiman Long
2022-11-03 20:47     ` Paul E. McKenney
2022-11-04  2:13   ` Feng Tang
2022-11-02 18:40 ` [PATCH clocksource 2/2] clocksource: Exponential backoff for load-induced bogus watchdog reads Paul E. McKenney
2022-11-03  2:28   ` Waiman Long
2022-11-03 20:49     ` Paul E. McKenney
2022-11-04  0:20       ` Waiman Long
2022-11-04  0:26         ` Paul E. McKenney
2022-11-04  1:01           ` Waiman Long
2022-11-04  2:23             ` Paul E. McKenney
2022-11-04 13:55               ` Waiman Long
2022-11-05  2:38                 ` Paul E. McKenney
2022-11-05  2:49                   ` Waiman Long
2022-11-05 19:31                     ` Paul E. McKenney
2022-11-04  2:16   ` Feng Tang
2022-11-04  2:25     ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).