* [PATCH v8 clocksource] Do not mark clocks unstable due to delays for v5.13
@ 2021-04-14  4:34 Paul E. McKenney
  2021-04-14  4:35 ` [PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog Paul E. McKenney
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-14  4:34 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

Hello!

If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own.  This series checks for such delays, doing a limited
number of retries in order to get a good set of clock reads.  If a clock
is marked unstable and is also marked as per-CPU, its cross-CPU
synchronization is checked.  This series also provides delay injection,
which may be enabled via kernel boot parameters to test the checking for
delays.

Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
vCPU preemption.
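
As a rough stand-alone illustration of the failure mode (a hedged sketch
with made-up nanosecond values, not code from this series): the watchdog
compares elapsed time as seen by the two clocks, so any delay landing
between the two reads shows up as apparent skew, and 70ms comfortably
exceeds the current 62.5ms threshold.

#include <stdio.h>

int main(void)
{
	long long wd_last = 0, cs_last = 0;	/* reads from the previous pass */
	long long wd_now = 500000000;		/* watchdog read, 0.5s later */
	long long delay = 70000000;		/* 70ms SMI/vCPU-preemption gap */
	long long cs_now = wd_now + delay;	/* clock under test, read late */

	/* Both clocks are perfect, yet the apparent skew equals the delay. */
	printf("apparent skew: %lld ns\n",
	       (cs_now - cs_last) - (wd_now - wd_last));
	return 0;
}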

1.	Provide module parameters to inject delays in watchdog.

2.	Retry clock read if long delays detected.

3.	Check per-CPU clock synchronization when marked unstable.

4.	Provide a module parameter to fuzz per-CPU clock checking.

5.	Limit number of CPUs checked for clock synchronization.

Changes since v7, based on Thomas Gleixner feedback:

o	Fix embarrassing git-format-patch operator error.

o	Merge pairwise clock-desynchronization checking into the checking
	of per-CPU clock synchronization when marked unstable.

o	Do selective per-CPU checking rather than blindly checking all
	CPUs.  Provide a clocksource.verify_n_cpus kernel boot parameter
	to control this behavior, with the value -1 choosing the old
	check-all-CPUs behavior.  The default is to randomly check 8 CPUs.

o	Fix the clock-desynchronization checking to avoid a potential
	use-after-free error for dynamically allocated clocksource
	structures.

o	Remove redundant "wdagain_nsec < 0" check from clocksource_watchdog()
	clocksource skew checking.

o	Update commit logs and do code-style updates.

Changes since v5:

o	Rebased to v5.12-rc5.

Changes since v4:

o	Rebased to v5.12-rc1.

Changes since v3:

o	Rebased to v5.11.

o	Apply Randy Dunlap feedback.

Changes since v2:

o	Rebased to v5.11-rc6.

o	Updated Cc: list.

Changes since v1:

o	Applied feedback from Rik van Riel.

o	Rebased to v5.11-rc3.

o	Stripped "RFC" from the subject lines.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt   |   26 +++
 b/Documentation/admin-guide/kernel-parameters.txt |   22 ++
 b/arch/x86/kernel/kvmclock.c                      |    2 
 b/arch/x86/kernel/tsc.c                           |    3 
 b/include/linux/clocksource.h                     |    2 
 b/kernel/time/clocksource.c                       |   27 +++
 kernel/time/clocksource.c                         |  171 +++++++++++++++++++++-
 7 files changed, 243 insertions(+), 10 deletions(-)


* [PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog
  2021-04-14  4:34 [PATCH v8 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
@ 2021-04-14  4:35 ` Paul E. McKenney
  2021-04-16 20:10   ` Thomas Gleixner
  2021-04-14  4:35 ` [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected Paul E. McKenney
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-14  4:35 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Paul E. McKenney, Chris Mason

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there is no shortage of things that can
delay interrupts-disabled regions of code ranging from SMI handlers to
vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

The first step is a way of injecting such delays.  Therefore, provide
clocksource.inject_delay_freq and clocksource.inject_delay_run kernel boot
parameters that specify that sufficient delay be injected to cause the
clocksource_watchdog() function to mark a clock unstable.  This delay is
injected every Nth set of M calls to clocksource_watchdog(), where N is
the value specified for the inject_delay_freq boot parameter and M is the
value specified for the inject_delay_run boot parameter.  Values of zero
or less for either parameter disable delay injection, and the default for
clocksource.inject_delay_freq is zero, that is, disabled.  The default for
clocksource.inject_delay_run is the value one, that is, single-call runs.

This facility is intended for diagnostic use only, and should be avoided
on production systems.
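
For concreteness, a stand-alone sketch of the cadence just described
(illustrative only; the kernel patch below keeps its state in a
persistent injectfail counter, which this demo does not reproduce):

#include <stdio.h>

static int inject_delay_freq = 2;	/* N: delay every 2nd group */
static int inject_delay_run = 3;	/* M: groups of 3 calls */

/* Would call number "call" (counting from 0) inject a delay? */
static int would_inject(int call)
{
	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
		return 0;
	return (call / inject_delay_run) % inject_delay_freq ==
	       inject_delay_freq - 1;
}

int main(void)
{
	for (int call = 0; call < 12; call++)
		printf("call %2d: %s\n", call,
		       would_inject(call) ? "delay" : "no delay");
	return 0;	/* calls 3-5 and 9-11 report "delay" */
}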

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
[ paulmck: Apply Rik van Riel feedback. ]
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         | 22 +++++++++++++++
 kernel/time/clocksource.c                     | 27 +++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..fc57952dcba0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -583,6 +583,28 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.inject_delay_freq= [KNL]
+			Number of runs of calls to clocksource_watchdog()
+			before delays are injected between reads from the
+			two clocksources.  Values less than or equal to
+			zero disable this delay injection.  These delays
+			can cause clocks to be marked unstable, so use
+			of this parameter should be avoided on
+			production systems.  Defaults to zero (disabled).
+
+	clocksource.inject_delay_run= [KNL]
+			Run lengths of clocksource_watchdog() delay
+			injections.  For example, specifying the value
+			8 results in runs of eight consecutive delayed
+			calls, with the length of the intervening
+			non-delayed stretch controlled by
+			inject_delay_freq.
+
+	clocksource.max_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable.  Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a2cc7c..4be4391aa72f 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/delay.h>
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -184,6 +185,31 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static int inject_delay_freq;
+module_param(inject_delay_freq, int, 0644);
+static int inject_delay_run = 1;
+module_param(inject_delay_run, int, 0644);
+static int max_read_retries = 3;
+module_param(max_read_retries, int, 0644);
+
+static void clocksource_watchdog_inject_delay(void)
+{
+	int i;
+	static int injectfail = -1;
+
+	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
+		return;
+	if (injectfail < 0 || injectfail > INT_MAX / 2)
+		injectfail = inject_delay_run;
+	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
+		pr_warn("%s(): Injecting delay.\n", __func__);
+		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
+			udelay(1000);
+		pr_warn("%s(): Done injecting delay.\n", __func__);
+	}
+	WARN_ON_ONCE(injectfail < 0);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -208,6 +234,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		local_irq_disable();
 		csnow = cs->read(cs);
+		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
 		local_irq_enable();
 
-- 
2.25.1



* [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-04-14  4:34 [PATCH v8 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  2021-04-14  4:35 ` [PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog Paul E. McKenney
@ 2021-04-14  4:35 ` Paul E. McKenney
  2021-04-16 20:45   ` Thomas Gleixner
  2021-04-17 12:24   ` Thomas Gleixner
  2021-04-14  4:36 ` [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable Paul E. McKenney
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-14  4:35 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Paul E. McKenney, Chris Mason

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code ranging from SMI handlers
to vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

Therefore, re-read the watchdog clock on either side of the read from
the clock under test.  If the watchdog clock shows an excessive time
delta between its pair of reads, the reads are retried.  The maximum
number of retries is specified by a new kernel boot parameter
clocksource.max_read_retries, which defaults to three, that is,
up to four reads: one initial read plus up to three retries.  If any
retries were required, a message is printed on the console.  If the number of
retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small.  In addition, the reason (clock-read delays)
for the unstable marking will be apparent.
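
For reference, the cycle-count delta is converted to nanoseconds in the
usual clocksource way, (cycles * mult) >> shift.  A hedged stand-alone
sketch with made-up mult/shift values for a hypothetical 10MHz watchdog:

#include <stdio.h>
#include <stdint.h>

/* Same form as the kernel's clocksource_cyc2ns(). */
static int64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (int64_t)((cycles * mult) >> shift);
}

int main(void)
{
	uint32_t mult = 100 << 10, shift = 10;	/* 100ns per cycle */
	uint64_t wd_delta = 200000;		/* cycles between the two reads */

	/* 20ms of read-back delay: far beyond any sane skew budget. */
	printf("read-back delay: %lld ns\n",
	       (long long)cyc2ns(wd_delta, mult, shift));
	return 0;
}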

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
[ paulmck: Apply Thomas Gleixner feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4be4391aa72f..3ac19a7859f5 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,7 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
-	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
+	u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
+	int64_t wd_nsec, wdagain_delta, wderr_nsec = 0, cs_nsec;
 	int next_cpu, reset_pending;
+	int nretries;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	reset_pending = atomic_read(&watchdog_reset_pending);
 
 	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		nretries = 0;
 
 		/* Clocksource already marked unstable? */
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
@@ -232,11 +235,24 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
+retry:
 		local_irq_disable();
-		csnow = cs->read(cs);
-		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
+		clocksource_watchdog_inject_delay();
+		csnow = cs->read(cs);
+		wdagain = watchdog->read(watchdog);
 		local_irq_enable();
+		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
+		wdagain_delta = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+		if (wdagain_delta > WATCHDOG_MAX_SKEW) {
+			wderr_nsec = wdagain_delta;
+			if (nretries++ < max_read_retries)
+				goto retry;
+		}
+		if (nretries) {
+			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
+				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
+		}
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.25.1



* [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-14  4:34 [PATCH v8 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  2021-04-14  4:35 ` [PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog Paul E. McKenney
  2021-04-14  4:35 ` [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected Paul E. McKenney
@ 2021-04-14  4:36 ` Paul E. McKenney
  2021-04-17 12:28   ` Thomas Gleixner
  2021-04-17 12:47   ` Thomas Gleixner
  2021-04-14  4:36 ` [PATCH v8 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking Paul E. McKenney
  2021-04-14  4:36 ` [PATCH v8 clocksource 5/5] clocksource: Limit number of CPUs checked for clock synchronization Paul E. McKenney
  4 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-14  4:36 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Paul E. McKenney, Chris Mason

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

Therefore, apply CPU-to-CPU synchronization checking to newly unstable
clocksources that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag.
Lists of desynchronized CPUs are printed, with the caveat that if it
is the reporting CPU that is itself desynchronized, it will appear that
all the other clocks are wrong.  Just like in real life.
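
The check itself is a three-read dance: one read on the testing CPU, one
on the remote CPU, and another on the testing CPU, with masked deltas
interpreted as signed values to flag a remote clock that is behind or
ahead.  A hedged stand-alone sketch of that sign test (made-up values;
the real code is in the patch below):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t mask = ~0ULL;		/* CLOCKSOURCE_MASK(64) */
	uint64_t begin = 1000000;	/* read on testing CPU */
	uint64_t mid = 999000;		/* read on remote CPU */
	uint64_t end = 1002000;		/* read on testing CPU again */

	if ((int64_t)((mid - begin) & mask) < 0)
		printf("remote CPU behind\n");	/* this case fires here */
	if ((int64_t)((end - mid) & mask) < 0)
		printf("remote CPU ahead\n");
	return 0;
}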

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
[ paulmck: Apply Thomas Gleixner feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c       |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 63 +++++++++++++++++++++++++++++++++++++
 4 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 1fc0962c89c0..97eeaf164296 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
 	.read	= kvm_clock_get_cycles,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
 	.enable	= kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc2771f..56289170753c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143db6523..83a3ebff7456 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3ac19a7859f5..69311deab428 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,63 @@ static void clocksource_watchdog_inject_delay(void)
 	WARN_ON_ONCE(injectfail < 0);
 }
 
+static u64 csnow_mid;
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	csnow_mid = cs->read(cs);
+}
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+	int64_t cs_nsec, cs_nsec_max, cs_nsec_min;
+	u64 csnow_begin, csnow_end;
+	bool firsttime = 1;
+	int testcpu;
+	s64 delta;
+	int cpu;
+
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	preempt_disable();
+	testcpu = smp_processor_id();
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu);
+	for_each_online_cpu(cpu) {
+		if (cpu == testcpu)
+			continue;
+		csnow_begin = cs->read(cs);
+		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		csnow_end = cs->read(cs);
+		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+		if (delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - csnow_mid) & cs->mask;
+		if (delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		if (firsttime || cs_nsec > cs_nsec_max)
+			cs_nsec_max = cs_nsec;
+		if (firsttime || cs_nsec < cs_nsec_min)
+			cs_nsec_min = cs_nsec;
+		firsttime = 0;
+	}
+	preempt_enable();
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_ahead), testcpu, cs->name);
+	if (!cpumask_empty(&cpus_behind))
+		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_behind), testcpu, cs->name);
+	if (!firsttime && (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)))
+		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
+			testcpu, cs_nsec_min, cs_nsec_max, cs->name);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -450,6 +507,12 @@ static int __clocksource_watchdog_kthread(void)
 	unsigned long flags;
 	int select = 0;
 
+	/* Do any required per-CPU skew verification. */
+	if (curr_clocksource &&
+	    curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE &&
+	    curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU)
+		clocksource_verify_percpu(curr_clocksource);
+
 	spin_lock_irqsave(&watchdog_lock, flags);
 	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
-- 
2.25.1



* [PATCH v8 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-04-14  4:34 [PATCH v8 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2021-04-14  4:36 ` [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable Paul E. McKenney
@ 2021-04-14  4:36 ` Paul E. McKenney
  2021-04-14  4:36 ` [PATCH v8 clocksource 5/5] clocksource: Limit number of CPUs checked for clock synchronization Paul E. McKenney
  4 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-14  4:36 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Paul E. McKenney, Chris Mason

Code that checks for clock desynchronization must itself be tested, so
create a new clocksource.inject_delay_shift_percpu= kernel boot parameter
that adds or subtracts one second from the clock value read during the
per-CPU check, using the specified bit of the CPU number to determine
whether to add or to subtract.
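
A quick stand-alone sketch of the resulting partitioning (illustrative
only; the sign expression matches the one in the patch below):

#include <stdio.h>

int main(void)
{
	int inject_delay_shift_percpu = 4;	/* use bit 4 of the CPU number */

	for (int cpu = 0; cpu < 48; cpu += 8) {
		int sign = ((cpu >> inject_delay_shift_percpu) & 0x1) * 2 - 1;

		printf("CPU %2d: clock shifted %s by one second\n",
		       cpu, sign < 0 ? "behind" : "ahead");
	}
	return 0;	/* CPUs 0-15 behind, 16-31 ahead, 32-47 behind */
}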

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Apply Randy Dunlap feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 16 ++++++++++++++++
 kernel/time/clocksource.c                       | 10 +++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fc57952dcba0..f9da90f2791f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -599,6 +599,22 @@
 			times the value specified for inject_delay_freq
 			of consecutive non-delays.
 
+	clocksource.inject_delay_shift_percpu= [KNL]
+			Clocksource delay injection partitions the CPUs
+			into two sets, one whose clocks are moved ahead
+			and the other whose clocks are moved behind.
+			This kernel parameter selects the CPU-number
+			bit that determines which of these two sets the
+			corresponding CPU is placed into.  For example,
+			setting this parameter to the value 4 will result
+			in the first set containing alternating groups
+			of 16 CPUs whose clocks are moved ahead, while
+			the second set will contain the rest of the CPUs,
+			whose clocks are moved behind.
+
+			The default value of -1 disables this type of
+			error injection.
+
 	clocksource.max_read_retries= [KNL]
 			Number of clocksource_watchdog() retries due to
 			external delays before the clock will be marked
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 69311deab428..5f641b15ec6c 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -190,6 +190,8 @@ static int inject_delay_freq;
 module_param(inject_delay_freq, int, 0644);
 static int inject_delay_run = 1;
 module_param(inject_delay_run, int, 0644);
+static int inject_delay_shift_percpu = -1;
+module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
 
@@ -218,8 +220,14 @@ static cpumask_t cpus_behind;
 static void clocksource_verify_one_cpu(void *csin)
 {
 	struct clocksource *cs = (struct clocksource *)csin;
+	s64 delta = 0;
+	int sign;
 
-	csnow_mid = cs->read(cs);
+	if (inject_delay_shift_percpu >= 0) {
+		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
+		delta = sign * NSEC_PER_SEC;
+	}
+	csnow_mid = cs->read(cs) + delta;
 }
 
 static void clocksource_verify_percpu(struct clocksource *cs)
-- 
2.25.1



* [PATCH v8 clocksource 5/5] clocksource: Limit number of CPUs checked for clock synchronization
  2021-04-14  4:34 [PATCH v8 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2021-04-14  4:36 ` [PATCH v8 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking Paul E. McKenney
@ 2021-04-14  4:36 ` Paul E. McKenney
  4 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-14  4:36 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Paul E. McKenney

Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU,
that clock is checked on all CPUs.  This is thorough, but might not be
what you want on a system with a few tens of CPUs, let alone a few hundred
of them.

Therefore, by default check only up to eight randomly chosen CPUs.
Also provide a new clocksource.verify_n_cpus kernel boot parameter.
A value of -1 says to check all of the CPUs, and a non-negative value
says to randomly select that number of CPUs, without concern for
selecting the same CPU multiple times.  However, a cpumask is used, so
that any given CPU will be checked at most once.
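
A hedged user-space sketch of this select-into-a-cpumask policy (toy
64-bit mask, rand() standing in for prandom_u32(); the patch below
additionally maps each draw onto the online CPUs):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	int nr_cpus = 64, verify_n_cpus = 8;
	unsigned long long chosen = 0;		/* toy cpumask */

	for (int i = 0; i < verify_n_cpus; i++)
		chosen |= 1ULL << (rand() % nr_cpus);

	/* The popcount may be less than 8 if a CPU was drawn twice. */
	printf("chosen mask: %#llx (%d CPUs)\n", chosen,
	       __builtin_popcountll(chosen));
	return 0;
}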

Suggested-by: Thomas Gleixner <tglx@linutronix.de> # For verify_n_cpus=1.
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         | 10 +++
 kernel/time/clocksource.c                     | 74 ++++++++++++++++++-
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f9da90f2791f..c09be8392bd6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -621,6 +621,16 @@
 			unstable.  Defaults to three retries, that is,
 			four attempts to read the clock under test.
 
+	clocksource.verify_n_cpus= [KNL]
+			Limit the number of CPUs checked for clocksources
+			marked with CLOCK_SOURCE_VERIFY_PERCPU that
+			are marked unstable due to excessive skew.
+			A negative value says to check all CPUs, while
+			zero says not to check any.  Values larger than
+			nr_cpu_ids are silently truncated to nr_cpu_ids.
+			The actual CPUs are chosen randomly; if the
+			same CPU is chosen more than once, no replacement
+			CPU is selected, so fewer CPUs might be checked.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 5f641b15ec6c..8f4967e59b05 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -15,6 +15,8 @@
 #include <linux/tick.h>
 #include <linux/kthread.h>
 #include <linux/delay.h>
+#include <linux/prandom.h>
+#include <linux/cpu.h>
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -194,6 +196,8 @@ static int inject_delay_shift_percpu = -1;
 module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
+static int verify_n_cpus = 8;
+module_param(verify_n_cpus, int, 0644);
 
 static void clocksource_watchdog_inject_delay(void)
 {
@@ -216,6 +220,55 @@ static void clocksource_watchdog_inject_delay(void)
 static u64 csnow_mid;
 static cpumask_t cpus_ahead;
 static cpumask_t cpus_behind;
+static cpumask_t cpus_chosen;
+
+static void clocksource_verify_choose_cpus(void)
+{
+	int cpu, i, n = verify_n_cpus;
+
+	if (n < 0) {
+		/* Check all of the CPUs. */
+		cpumask_copy(&cpus_chosen, cpu_online_mask);
+		cpumask_clear_cpu(smp_processor_id(), &cpus_chosen);
+		return;
+	}
+
+	/* If no checking desired, or no other CPU to check, leave. */
+	cpumask_clear(&cpus_chosen);
+	if (n == 0 || num_online_cpus() <= 1)
+		return;
+
+	/* Make sure to select at least one CPU other than the current CPU. */
+	cpu = cpumask_next(-1, cpu_online_mask);
+	if (cpu == smp_processor_id())
+		cpu = cpumask_next(cpu, cpu_online_mask);
+	if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
+		return;
+	cpumask_set_cpu(cpu, &cpus_chosen);
+
+	/* Force a sane value for the boot parameter. */
+	if (n > nr_cpu_ids)
+		n = nr_cpu_ids - 1;
+
+	/*
+	 * Randomly select the specified number of CPUs.  If the same
+	 * CPU is selected multiple times, that CPU is checked only once,
+	 * and no replacement CPU is selected.  This gracefully handles
+	 * situations where verify_n_cpus is greater than the number of
+	 * CPUs that are currently online.
+	 */
+	for (i = 1; i < n; i++) {
+		cpu = prandom_u32() % n;
+		cpu = cpumask_next(cpu - 1, cpu_online_mask);
+		if (cpu >= nr_cpu_ids)
+			cpu = cpumask_next(-1, cpu_online_mask);
+		if (!WARN_ON_ONCE(cpu >= nr_cpu_ids))
+			cpumask_set_cpu(cpu, &cpus_chosen);
+	}
+
+	/* Don't verify ourselves. */
+	cpumask_clear_cpu(smp_processor_id(), &cpus_chosen);
+}
 
 static void clocksource_verify_one_cpu(void *csin)
 {
@@ -239,12 +292,22 @@ static void clocksource_verify_percpu(struct clocksource *cs)
 	s64 delta;
 	int cpu;
 
+	if (verify_n_cpus == 0)
+		return;
 	cpumask_clear(&cpus_ahead);
 	cpumask_clear(&cpus_behind);
+	get_online_cpus();
 	preempt_disable();
+	clocksource_verify_choose_cpus();
+	if (cpumask_weight(&cpus_chosen) == 0) {
+		preempt_enable();
+		put_online_cpus();
+		pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name);
+		return;
+	}
 	testcpu = smp_processor_id();
-	pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu);
-	for_each_online_cpu(cpu) {
+	pr_warn("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n", cs->name, testcpu, cpumask_pr_args(&cpus_chosen));
+	for_each_cpu(cpu, &cpus_chosen) {
 		if (cpu == testcpu)
 			continue;
 		csnow_begin = cs->read(cs);
@@ -265,6 +328,7 @@ static void clocksource_verify_percpu(struct clocksource *cs)
 		firsttime = 0;
 	}
 	preempt_enable();
+	put_online_cpus();
 	if (!cpumask_empty(&cpus_ahead))
 		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_ahead), testcpu, cs->name);
@@ -350,6 +414,12 @@ static void clocksource_watchdog(struct timer_list *unused)
 				watchdog->name, wdnow, wdlast, watchdog->mask);
 			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
 				cs->name, csnow, cslast, cs->mask);
+			if (curr_clocksource == cs)
+				pr_warn("                      '%s' is current clocksource.\n", cs->name);
+			else if (curr_clocksource)
+				pr_warn("                      '%s' (not '%s') is current clocksource.\n", curr_clocksource->name, cs->name);
+			else
+				pr_warn("                      No current clocksource.\n");
 			__clocksource_unstable(cs);
 			continue;
 		}
-- 
2.25.1



* Re: [PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog
  2021-04-14  4:35 ` [PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog Paul E. McKenney
@ 2021-04-16 20:10   ` Thomas Gleixner
  2021-04-16 22:38     ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2021-04-16 20:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Paul E. McKenney, Chris Mason

On Tue, Apr 13 2021 at 21:35, Paul E. McKenney wrote:
>  
> +static int inject_delay_freq;
> +module_param(inject_delay_freq, int, 0644);
> +static int inject_delay_run = 1;
> +module_param(inject_delay_run, int, 0644);

int? Can't we just make them 'unsigned int'? Negative values are not
that useful.

> +static int max_read_retries = 3;
> +module_param(max_read_retries, int, 0644);

max_read_retries is unused here. Should be in the patch which actually
uses it.

> +static void clocksource_watchdog_inject_delay(void)
> +{
> +	int i;
> +	static int injectfail = -1;
> +
> +	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
> +		return;
> +	if (injectfail < 0 || injectfail > INT_MAX / 2)
> +		injectfail = inject_delay_run;
> +	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {

Operator precedence based cleverness is really easy to parse - NOT!

> +		pr_warn("%s(): Injecting delay.\n", __func__);
> +		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
> +			udelay(1000);
> +		pr_warn("%s(): Done injecting delay.\n", __func__);
> +	}
> +
> +	WARN_ON_ONCE(injectfail < 0);
> +}

Brain melt stage reached by now.

        static unsigned int invocations, injections;

        if (!inject_delay_period || !inject_delay_repeat)
        	return;

        if (!(invocations % inject_delay_period)) {
        	mdelay(2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC);
                if (++injections < inject_delay_repeat)
                	return;
                injections = 0;
        }

        invocations++;
}

Hmm?

Thanks,

        tglx
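
For reference, a hedged stand-alone trace of the cadence suggested above,
wrapping the snippet in a test harness with printf standing in for
mdelay(): with period=2 and repeat=3 it produces runs of three
consecutive delays separated by a single non-delayed call.

#include <stdio.h>

static unsigned int inject_delay_period = 2;
static unsigned int inject_delay_repeat = 3;
static unsigned int invocations, injections;

/* Returns 1 when the suggested logic would have called mdelay(). */
static int watchdog_call(void)
{
	int delayed = 0;

	if (!inject_delay_period || !inject_delay_repeat)
		return 0;

	if (!(invocations % inject_delay_period)) {
		delayed = 1;			/* mdelay() in the real code */
		if (++injections < inject_delay_repeat)
			return delayed;
		injections = 0;
	}
	invocations++;
	return delayed;
}

int main(void)
{
	for (int i = 0; i < 12; i++)
		printf("call %2d: %s\n", i,
		       watchdog_call() ? "delay" : "no delay");
	return 0;	/* runs of three delays, one plain call between */
}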


* Re: [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-04-14  4:35 ` [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected Paul E. McKenney
@ 2021-04-16 20:45   ` Thomas Gleixner
  2021-04-17  0:25     ` Paul E. McKenney
  2021-04-17 12:24   ` Thomas Gleixner
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2021-04-16 20:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Paul E. McKenney, Chris Mason

On Tue, Apr 13 2021 at 21:35, Paul E. McKenney wrote:
>  #define WATCHDOG_INTERVAL (HZ >> 1)
>  #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)

Didn't we discuss that the threshold is too big ?



* Re: [PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog
  2021-04-16 20:10   ` Thomas Gleixner
@ 2021-04-16 22:38     ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-16 22:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Chris Mason

On Fri, Apr 16, 2021 at 10:10:51PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 13 2021 at 21:35, Paul E. McKenney wrote:
> >  
> > +static int inject_delay_freq;
> > +module_param(inject_delay_freq, int, 0644);
> > +static int inject_delay_run = 1;
> > +module_param(inject_delay_run, int, 0644);
> 
> int? Can't we just make them 'unsigned int'? Negative values are not
> that useful.
> 
> > +static int max_read_retries = 3;
> > +module_param(max_read_retries, int, 0644);
> 
> max_read_retries is unused here. Should be in the patch which actually
> uses it.

Good point, I will make all three unsigned int and move max_read_retries
to 2/5 ("clocksource: Retry clock read if long delays detected").

> > +static void clocksource_watchdog_inject_delay(void)
> > +{
> > +	int i;
> > +	static int injectfail = -1;
> > +
> > +	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
> > +		return;
> > +	if (injectfail < 0 || injectfail > INT_MAX / 2)
> > +		injectfail = inject_delay_run;
> > +	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
> 
> Operator precedence based cleverness is really easy to parse - NOT!
> 
> > +		pr_warn("%s(): Injecting delay.\n", __func__);
> > +		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
> > +			udelay(1000);
> > +		pr_warn("%s(): Done injecting delay.\n", __func__);
> > +	}
> > +
> > +	WARN_ON_ONCE(injectfail < 0);
> > +}
> 
> Brain melt stage reached by now.
> 
>         static unsigned int invocations, injections;
> 
>         if (!inject_delay_period || !inject_delay_repeat)
>         	return;
> 
>         if (!(invocations % inject_delay_period)) {
>         	mdelay(2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC);
>                 if (++injections < inject_delay_repeat)
>                 	return;
>                 injections = 0;
>         }
> 
>         invocations++;
> }
> 
> Hmm?

That is quite a bit nicer than the interacting parameters that I
had.  I will rework along these lines.

							Thanx, Paul


* Re: [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-04-16 20:45   ` Thomas Gleixner
@ 2021-04-17  0:25     ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-17  0:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Chris Mason

On Fri, Apr 16, 2021 at 10:45:28PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 13 2021 at 21:35, Paul E. McKenney wrote:
> >  #define WATCHDOG_INTERVAL (HZ >> 1)
> >  #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
> 
> Didn't we discuss that the threshold is too big ?

Indeed we did!  How about like this, so that WATCHDOG_THRESHOLD is at 500
microseconds and we tolerate up to 125 microseconds of delay during the
timer-read process?

I am firing up overnight tests.

							Thanx, Paul

------------------------------------------------------------------------

commit 6c52b5f3cfefd6e429efc4413fd25e3c394e959f
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri Apr 16 16:19:43 2021 -0700

    clocksource: Reduce WATCHDOG_THRESHOLD
    
    Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in
    a 500-millisecond WATCHDOG_INTERVAL.  This requires that clocks be skewed
    by more than 12.5% in order to be marked unstable.  Except that a clock
    that is skewed by that much is probably destroying unsuspecting software
    right and left.  And given that there are now checks for false-positive
    skews due to delays between reading the two clocks, and given that current
    hardware clocks all increment well in excess of 1MHz, it should be possible
    to greatly decrease WATCHDOG_THRESHOLD.
    
    Therefore, decrease WATCHDOG_THRESHOLD from the current 62.5 milliseconds
    down to 500 microseconds.
    
    Suggested-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: John Stultz <john.stultz@linaro.org>
    Cc: Stephen Boyd <sboyd@kernel.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Mark Rutland <Mark.Rutland@arm.com>
    Cc: Marc Zyngier <maz@kernel.org>
    Cc: Andi Kleen <ak@linux.intel.com>
    [ paulmck: Apply Rik van Riel feedback. ]
    Reported-by: Chris Mason <clm@fb.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 8f4967e59b05..4ec19a13dcf0 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -125,8 +125,8 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
  * Interval: 0.5sec Threshold: 0.0625s
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
-#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
-#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
+#define WATCHDOG_THRESHOLD (500 * NSEC_PER_USEC)
+#define WATCHDOG_MAX_SKEW (WATCHDOG_THRESHOLD / 4)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
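
For reference, a quick stand-alone check of the numbers in play here
(hedged arithmetic only, derived from the defines above):

#include <stdio.h>

#define NSEC_PER_SEC	1000000000LL
#define NSEC_PER_USEC	1000LL

int main(void)
{
	long long interval = NSEC_PER_SEC / 2;		/* WATCHDOG_INTERVAL: 0.5s */
	long long old_thresh = NSEC_PER_SEC >> 4;	/* 62500000 ns */
	long long new_thresh = 500 * NSEC_PER_USEC;	/* 500000 ns */

	printf("old: %lld ns (%.2f%% of interval)\n",
	       old_thresh, 100.0 * old_thresh / interval);
	printf("new: %lld ns (%.2f%% of interval), max skew %lld ns\n",
	       new_thresh, 100.0 * new_thresh / interval, new_thresh / 4);
	return 0;	/* old: 12.50%; new: 0.10%, max skew 125000 ns */
}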


* Re: [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-04-14  4:35 ` [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected Paul E. McKenney
  2021-04-16 20:45   ` Thomas Gleixner
@ 2021-04-17 12:24   ` Thomas Gleixner
  2021-04-17 22:54     ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2021-04-17 12:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Paul E. McKenney, Chris Mason

On Tue, Apr 13 2021 at 21:35, Paul E. McKenney wrote:
>  #define WATCHDOG_INTERVAL (HZ >> 1)
>  #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
> +#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)

That's ~15ms which is a tad large I'd say...
  
>  static void clocksource_watchdog_work(struct work_struct *work)
>  {
> @@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
>  static void clocksource_watchdog(struct timer_list *unused)
>  {
>  	struct clocksource *cs;
> -	u64 csnow, wdnow, cslast, wdlast, delta;
> -	int64_t wd_nsec, cs_nsec;
> +	u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
> +	int64_t wd_nsec, wdagain_delta, wderr_nsec = 0, cs_nsec;
>  	int next_cpu, reset_pending;
> +	int nretries;
>  
>  	spin_lock(&watchdog_lock);
>  	if (!watchdog_running)
> @@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>  	reset_pending = atomic_read(&watchdog_reset_pending);
>  
>  	list_for_each_entry(cs, &watchdog_list, wd_list) {
> +		nretries = 0;
>  
>  		/* Clocksource already marked unstable? */
>  		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
> @@ -232,11 +235,24 @@ static void clocksource_watchdog(struct timer_list *unused)
>  			continue;
>  		}
>  
> +retry:
>  		local_irq_disable();
> -		csnow = cs->read(cs);
> -		clocksource_watchdog_inject_delay();
>  		wdnow = watchdog->read(watchdog);
> +		clocksource_watchdog_inject_delay();
> +		csnow = cs->read(cs);
> +		wdagain = watchdog->read(watchdog);
>  		local_irq_enable();
> +		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
> +		wdagain_delta = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> +		if (wdagain_delta > WATCHDOG_MAX_SKEW) {
> +			wderr_nsec = wdagain_delta;
> +			if (nretries++ < max_read_retries)
> +				goto retry;
> +		}
> +		if (nretries) {
> +			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
> +				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
> +		}
>  
>  		/* Clocksource initialized ? */
>  		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||

This can nicely be split out into a read function which avoids brain
overload when reading. Something like the uncompiled below.

I so wish we could just delete all of this horror instead of making it
more horrible.

Thanks,

        tglx
---
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,12 @@ static void __clocksource_change_rating(
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
 
+/*
+ * The maximum delay between two consecutive readouts of the watchdog
+ * clocksource to detect SMI, NMI, and vCPU preemption.
+ */
+#define WATCHDOG_MAX_DELAY (100 * NSEC_PER_USEC)
+
 static void clocksource_watchdog_work(struct work_struct *work)
 {
 	/*
@@ -184,12 +190,37 @@ void clocksource_mark_unstable(struct cl
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
+{
+	unsigned int nretries;
+	u64 wd_end, wd_delta;
+	int64_t wd_delay;
+
+	for (nretries = 0; nretries < max_read_retries; nretries++) {
+		local_irq_disable();
+		*wdnow = watchdog->read(watchdog);
+		clocksource_watchdog_inject_delay();
+		*csnow = cs->read(cs);
+		wd_end = watchdog->read(watchdog);
+		local_irq_enable();
+
+		wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask);
+		wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, watchdog->shift);
+		if (wd_delay < WATCHDOG_MAX_DELAY)
+			return true;
+	}
+
+	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, %d attempts\n",
+		smp_processor_id(), watchdog->name, wd_delay, nretries);
+	return false;
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
-	struct clocksource *cs;
 	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
 	int next_cpu, reset_pending;
+	int64_t wd_nsec, cs_nsec;
+	struct clocksource *cs;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -206,10 +237,14 @@ static void clocksource_watchdog(struct
 			continue;
 		}
 
-		local_irq_disable();
-		csnow = cs->read(cs);
-		wdnow = watchdog->read(watchdog);
-		local_irq_enable();
+		if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
+			/*
+			 * No point to continue if the watchdog readout is
+			 * unreliable.
+			 */
+			__clocksource_unstable(cs);
+			continue;
+		}
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||


* Re: [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-14  4:36 ` [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable Paul E. McKenney
@ 2021-04-17 12:28   ` Thomas Gleixner
  2021-04-17 23:42     ` Paul E. McKenney
  2021-04-17 12:47   ` Thomas Gleixner
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2021-04-17 12:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Paul E. McKenney, Chris Mason

On Tue, Apr 13 2021 at 21:36, Paul E. McKenney wrote:
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 1fc0962c89c0..97eeaf164296 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
>  	.read	= kvm_clock_get_cycles,
>  	.rating	= 400,
>  	.mask	= CLOCKSOURCE_MASK(64),
> -	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
> +	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,

kvm_clock is not marked with CLOCK_SOURCE_MUST_VERIFY, so what's the
point of adding this here? It's not subject to be monitored by the
watchdog muck.

> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index f70dffc2771f..56289170753c 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
>  	.mask			= CLOCKSOURCE_MASK(64),
>  	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
>  				  CLOCK_SOURCE_VALID_FOR_HRES |
> -				  CLOCK_SOURCE_MUST_VERIFY,
> +				  CLOCK_SOURCE_MUST_VERIFY |
> +				  CLOCK_SOURCE_VERIFY_PERCPU,

While this one is part of the horror show.

> +static u64 csnow_mid;
> +static cpumask_t cpus_ahead;
> +static cpumask_t cpus_behind;
> +
> +static void clocksource_verify_one_cpu(void *csin)
> +{
> +	struct clocksource *cs = (struct clocksource *)csin;
> +
> +	csnow_mid = cs->read(cs);
> +}
> +
> +static void clocksource_verify_percpu(struct clocksource *cs)
> +{
> +	int64_t cs_nsec, cs_nsec_max, cs_nsec_min;
> +	u64 csnow_begin, csnow_end;
> +	bool firsttime = 1;
> +	int testcpu;
> +	s64 delta;
> +	int cpu;

        int testcpu, cpu; :)



* Re: [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-14  4:36 ` [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable Paul E. McKenney
  2021-04-17 12:28   ` Thomas Gleixner
@ 2021-04-17 12:47   ` Thomas Gleixner
  2021-04-17 23:51     ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2021-04-17 12:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Paul E. McKenney, Chris Mason

On Tue, Apr 13 2021 at 21:36, Paul E. McKenney wrote:

Bah, hit send too quick.

> +	cpumask_clear(&cpus_ahead);
> +	cpumask_clear(&cpus_behind);
> +	preempt_disable();

Daft. 

> +	testcpu = smp_processor_id();
> +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu);
> +	for_each_online_cpu(cpu) {
> +		if (cpu == testcpu)
> +			continue;
> +		csnow_begin = cs->read(cs);
> +		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
> +		csnow_end = cs->read(cs);

As this must run with interrupts enabled, that's a pretty rough
approximation like measuring wind speed with a wet thumb.

Wouldn't it be smarter to let the remote CPU do the watchdog dance and
take that result? i.e. split out more of the watchdog code so that you
can get the nanoseconds delta on that remote CPU to the watchdog.

> +		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
> +		if (delta < 0)
> +			cpumask_set_cpu(cpu, &cpus_behind);
> +		delta = (csnow_end - csnow_mid) & cs->mask;
> +		if (delta < 0)
> +			cpumask_set_cpu(cpu, &cpus_ahead);
> +		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
> +		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);

> +		if (firsttime || cs_nsec > cs_nsec_max)
> +			cs_nsec_max = cs_nsec;
> +		if (firsttime || cs_nsec < cs_nsec_min)
> +			cs_nsec_min = cs_nsec;
> +		firsttime = 0;

  int64_t cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;

and then the firsttime muck is not needed at all.

Thanks,

        tglx



* Re: [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-04-17 12:24   ` Thomas Gleixner
@ 2021-04-17 22:54     ` Paul E. McKenney
  2021-04-17 23:15       ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-17 22:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Chris Mason

On Sat, Apr 17, 2021 at 02:24:23PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 13 2021 at 21:35, Paul E. McKenney wrote:
> >  #define WATCHDOG_INTERVAL (HZ >> 1)
> >  #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
> > +#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
> 
> That's ~15ms which is a tad large I'd say...

And I have a separate patch queued to crank this down to 125us.

I ran 2,664 hours of rcutorture on this patch, and saw only 18 instances
of delay detected.  So 125us works well, but I am not inclined to decrease
it much.

I will try 50us this evening though, just for grins.  After all, only
those who have gone too far can possibly tell you how far you can go.

> >  static void clocksource_watchdog_work(struct work_struct *work)
> >  {
> > @@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
> >  static void clocksource_watchdog(struct timer_list *unused)
> >  {
> >  	struct clocksource *cs;
> > -	u64 csnow, wdnow, cslast, wdlast, delta;
> > -	int64_t wd_nsec, cs_nsec;
> > +	u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
> > +	int64_t wd_nsec, wdagain_delta, wderr_nsec = 0, cs_nsec;
> >  	int next_cpu, reset_pending;
> > +	int nretries;
> >  
> >  	spin_lock(&watchdog_lock);
> >  	if (!watchdog_running)
> > @@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
> >  	reset_pending = atomic_read(&watchdog_reset_pending);
> >  
> >  	list_for_each_entry(cs, &watchdog_list, wd_list) {
> > +		nretries = 0;
> >  
> >  		/* Clocksource already marked unstable? */
> >  		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
> > @@ -232,11 +235,24 @@ static void clocksource_watchdog(struct timer_list *unused)
> >  			continue;
> >  		}
> >  
> > +retry:
> >  		local_irq_disable();
> > -		csnow = cs->read(cs);
> > -		clocksource_watchdog_inject_delay();
> >  		wdnow = watchdog->read(watchdog);
> > +		clocksource_watchdog_inject_delay();
> > +		csnow = cs->read(cs);
> > +		wdagain = watchdog->read(watchdog);
> >  		local_irq_enable();
> > +		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
> > +		wdagain_delta = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> > +		if (wdagain_delta > WATCHDOG_MAX_SKEW) {
> > +			wderr_nsec = wdagain_delta;
> > +			if (nretries++ < max_read_retries)
> > +				goto retry;
> > +		}
> > +		if (nretries) {
> > +			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
> > +				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
> > +		}
> >  
> >  		/* Clocksource initialized ? */
> >  		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
> 
> This can nicely be split out into a read function which avoids brain
> overload when reading. Something like the uncompiled below.

Good point, done, with minor adjustments.

> I so wish we could just delete all of this horror instead of making it
> more horrible.

Revisit deleting it in five years if there are no issues, whatever
"issue" might mean in this context?  Five years should give time for
this to propagate to the enterprise distros.

							Thanx, Paul

> Thanks,
> 
>         tglx
> ---
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -124,6 +124,12 @@ static void __clocksource_change_rating(
>  #define WATCHDOG_INTERVAL (HZ >> 1)
>  #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
>  
> +/*
> + * The maximum delay between two consecutive readouts of the watchdog
> + * clocksource to detect SMI, NMI, and vCPU preemption.
> + */
> +#define WATCHDOG_MAX_DELAY (100 * NSEC_PER_USEC)
> +
>  static void clocksource_watchdog_work(struct work_struct *work)
>  {
>  	/*
> @@ -184,12 +190,37 @@ void clocksource_mark_unstable(struct cl
>  	spin_unlock_irqrestore(&watchdog_lock, flags);
>  }
>  
> +static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
> +{
> +	unsigned int nretries;
> +	u64 wd_end, wd_delta;
> +	int64_t wd_delay;
> +
> +	for (nretries = 0; nretries < max_read_retries; nretries++) {
> +		local_irq_disable();
> +		*wdnow = watchdog->read(watchdog);
> +		clocksource_watchdog_inject_delay();
> +		*csnow = cs->read(cs);
> +		wd_end = watchdog->read(watchdog);
> +		local_irq_enable();
> +
> +		wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask);
> +		wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, watchdog->shift);
> +		if (wd_delay < WATCHDOG_MAX_DELAY)
> +			return true;
> +	}
> +
> +	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, %d attempts\n",
> +		smp_processor_id(), watchdog->name, wd_delay, nretries);
> +	return false;
> +}
> +
>  static void clocksource_watchdog(struct timer_list *unused)
>  {
> -	struct clocksource *cs;
>  	u64 csnow, wdnow, cslast, wdlast, delta;
> -	int64_t wd_nsec, cs_nsec;
>  	int next_cpu, reset_pending;
> +	int64_t wd_nsec, cs_nsec;
> +	struct clocksource *cs;
>  
>  	spin_lock(&watchdog_lock);
>  	if (!watchdog_running)
> @@ -206,10 +237,14 @@ static void clocksource_watchdog(struct
>  			continue;
>  		}
>  
> -		local_irq_disable();
> -		csnow = cs->read(cs);
> -		wdnow = watchdog->read(watchdog);
> -		local_irq_enable();
> +		if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
> +			/*
> +			 * No point to continue if the watchdog readout is
> +			 * unreliable.
> +			 */
> +			__clocksource_unstable(cs);
> +			continue;
> +		}
>  
>  		/* Clocksource initialized ? */
>  		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||


* Re: [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-04-17 22:54     ` Paul E. McKenney
@ 2021-04-17 23:15       ` Thomas Gleixner
  2021-04-17 23:40         ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2021-04-17 23:15 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Chris Mason

On Sat, Apr 17 2021 at 15:54, Paul E. McKenney wrote:
> On Sat, Apr 17, 2021 at 02:24:23PM +0200, Thomas Gleixner wrote:
>> I so wish we could just delete all of this horror instead of making it
>> more horrible.
>
> Revisit deleting it in five years if there are no issues, whatever
> "issue" might mean in this context?  Five years should give time for
> this to propagate to the enterprise distros.

That does not help with the fact that the broken hardware will still be
around in 5 years from now. Which is what prevents me from deleting the
whole watchdog muck ...

Thanks,

        tglx


* Re: [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-04-17 23:15       ` Thomas Gleixner
@ 2021-04-17 23:40         ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-17 23:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Chris Mason

On Sun, Apr 18, 2021 at 01:15:26AM +0200, Thomas Gleixner wrote:
> On Sat, Apr 17 2021 at 15:54, Paul E. McKenney wrote:
> > On Sat, Apr 17, 2021 at 02:24:23PM +0200, Thomas Gleixner wrote:
> >> I so wish we could just delete all of this horror instead of making it
> >> more horrible.
> >
> > Revisit deleting it in five years if there are no issues, whatever
> > "issue" might mean in this context?  Five years should give time for
> > this to propagate to the enterprise distros.
> 
> That does not help with the fact that the broken hardware will still be
> around five years from now, which is what prevents me from deleting the
> whole watchdog muck ...

Ouch!  Sigh...

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-17 12:28   ` Thomas Gleixner
@ 2021-04-17 23:42     ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-17 23:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Chris Mason

On Sat, Apr 17, 2021 at 02:28:22PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 13 2021 at 21:36, Paul E. McKenney wrote:
> > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> > index 1fc0962c89c0..97eeaf164296 100644
> > --- a/arch/x86/kernel/kvmclock.c
> > +++ b/arch/x86/kernel/kvmclock.c
> > @@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
> >  	.read	= kvm_clock_get_cycles,
> >  	.rating	= 400,
> >  	.mask	= CLOCKSOURCE_MASK(64),
> > -	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
> > +	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
> 
> kvm_clock is not marked with CLOCK_SOURCE_MUST_VERIFY, so what's the
> point of adding this here? It's not subject to be monitored by the
> watchdog muck.

Good point, removed.
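
For anyone following along at home: the watchdog only ever watches
clocksources flagged CLOCK_SOURCE_MUST_VERIFY, so without that flag
CLOCK_SOURCE_VERIFY_PERCPU is dead weight.  From memory, the
enqueue-time check is roughly this (paraphrased from
kernel/time/clocksource.c of that era, locking elided):

	static void clocksource_enqueue_watchdog(struct clocksource *cs)
	{
		INIT_LIST_HEAD(&cs->wd_list);

		if (cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
			/* cs is a clocksource to be watched. */
			list_add(&cs->wd_list, &watchdog_list);
			cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
		} else {
			/* cs is a watchdog candidate. */
			if (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS)
				cs->flags |= CLOCK_SOURCE_VALID_FOR_HRES;
		}
	}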

> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index f70dffc2771f..56289170753c 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
> >  	.mask			= CLOCKSOURCE_MASK(64),
> >  	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
> >  				  CLOCK_SOURCE_VALID_FOR_HRES |
> > -				  CLOCK_SOURCE_MUST_VERIFY,
> > +				  CLOCK_SOURCE_MUST_VERIFY |
> > +				  CLOCK_SOURCE_VERIFY_PERCPU,
> 
> While this one is part of the horror show.

It would probably be good to decorate other architectures' per-CPU clocks
with CLOCK_SOURCE_VERIFY_PERCPU, but there have been no takers thus far.

							Thanx, Paul

> > +static u64 csnow_mid;
> > +static cpumask_t cpus_ahead;
> > +static cpumask_t cpus_behind;
> > +
> > +static void clocksource_verify_one_cpu(void *csin)
> > +{
> > +	struct clocksource *cs = (struct clocksource *)csin;
> > +
> > +	csnow_mid = cs->read(cs);
> > +}
> > +
> > +static void clocksource_verify_percpu(struct clocksource *cs)
> > +{
> > +	int64_t cs_nsec, cs_nsec_max, cs_nsec_min;
> > +	u64 csnow_begin, csnow_end;
> > +	bool firsttime = 1;
> > +	int testcpu;
> > +	s64 delta;
> > +	int cpu;
> 
>         int testcpu, cpu; :)
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-17 12:47   ` Thomas Gleixner
@ 2021-04-17 23:51     ` Paul E. McKenney
  2021-04-18 16:20       ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-17 23:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Chris Mason

On Sat, Apr 17, 2021 at 02:47:18PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 13 2021 at 21:36, Paul E. McKenney wrote:
> 
> Bah, hit send too quick.
> 
> > +	cpumask_clear(&cpus_ahead);
> > +	cpumask_clear(&cpus_behind);
> > +	preempt_disable();
> 
> Daft. 

Would migrate_disable() be better?

Yes, I know that in virtual environments the hypervisor can migrate the
vCPU anyway, but this does limit the potential damage to one of the two
schedulers.
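
Concretely, something like this (sketch only, untested):

	migrate_disable();
	testcpu = smp_processor_id();
	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
		cs->name, testcpu);
	for_each_online_cpu(cpu) {
		/* ... per-CPU read-back checks as before ... */
	}
	migrate_enable();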

> > +	testcpu = smp_processor_id();
> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu);
> > +	for_each_online_cpu(cpu) {
> > +		if (cpu == testcpu)
> > +			continue;
> > +		csnow_begin = cs->read(cs);
> > +		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
> > +		csnow_end = cs->read(cs);
> 
> As this must run with interrupts enabled, that's a pretty rough
> approximation, like measuring wind speed with a wet thumb.
> 
> Wouldn't it be smarter to let the remote CPU do the watchdog dance and
> take that result? i.e. split out more of the watchdog code so that you
> can get the nanoseconds delta on that remote CPU to the watchdog.

First, an interrupt, NMI, SMI, vCPU preemption, or the like cannot
cause a false positive.  A false negative, perhaps, but no false
positives.  Second, such delays are rare in normal operation, so
checking the (eventual) default of eight CPUs is very likely to result
in tight bounds on the delay-based error for most of those CPUs.
Third, we really need to compare the TSC on one CPU directly to the
TSC on another in order to have a clear indication of a problem,
should a real TSC-synchronization issue arise.  In contrast,
comparisons against the watchdog timer would be more complicated, and
any errors detected would be quite hard to prove to be due to TSC
issues.

Or am I once again missing something?
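
To make that first point concrete, here is a toy userspace model of
the bracketing, with made-up cycle counts and the cs->mask handling
elided.  The remote read is bracketed by two local reads, so any delay
merely widens the window, and no delay can push a synchronized remote
clock outside of it:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		/* Made-up values; an NMI on the test CPU widened the window. */
		uint64_t csnow_begin = 1000;	/* test CPU, before the IPI */
		uint64_t csnow_mid   = 1234;	/* remote CPU, inside the window */
		uint64_t csnow_end   = 1500;	/* test CPU, after the IPI */

		if ((int64_t)(csnow_mid - csnow_begin) < 0)
			printf("remote clock behind\n");
		else if ((int64_t)(csnow_end - csnow_mid) < 0)
			printf("remote clock ahead\n");
		else
			printf("no false positive, however long the window\n");
		return 0;
	}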

> > +		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
> > +		if (delta < 0)
> > +			cpumask_set_cpu(cpu, &cpus_behind);
> > +		delta = (csnow_end - csnow_mid) & cs->mask;
> > +		if (delta < 0)
> > +			cpumask_set_cpu(cpu, &cpus_ahead);
> > +		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
> > +		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
> 
> > +		if (firsttime || cs_nsec > cs_nsec_max)
> > +			cs_nsec_max = cs_nsec;
> > +		if (firsttime || cs_nsec < cs_nsec_min)
> > +			cs_nsec_min = cs_nsec;
> > +		firsttime = 0;
> 
>   int64_t cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;
> 
> and then the firsttime muck is not needed at all.

Good point, will fix!
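
Something like this, then (sketch, untested):

	int64_t cs_nsec, cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;
	...
	cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
	if (cs_nsec > cs_nsec_max)
		cs_nsec_max = cs_nsec;
	if (cs_nsec < cs_nsec_min)
		cs_nsec_min = cs_nsec;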

And again, thank you for looking all of this over.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-17 23:51     ` Paul E. McKenney
@ 2021-04-18 16:20       ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-04-18 16:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, Chris Mason

On Sat, Apr 17, 2021 at 04:51:36PM -0700, Paul E. McKenney wrote:
> On Sat, Apr 17, 2021 at 02:47:18PM +0200, Thomas Gleixner wrote:

[ . . . ]

> > > +		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
> > > +		if (delta < 0)
> > > +			cpumask_set_cpu(cpu, &cpus_behind);
> > > +		delta = (csnow_end - csnow_mid) & cs->mask;
> > > +		if (delta < 0)
> > > +			cpumask_set_cpu(cpu, &cpus_ahead);
> > > +		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
> > > +		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
> > 
> > > +		if (firsttime || cs_nsec > cs_nsec_max)
> > > +			cs_nsec_max = cs_nsec;
> > > +		if (firsttime || cs_nsec < cs_nsec_min)
> > > +			cs_nsec_min = cs_nsec;
> > > +		firsttime = 0;
> > 
> >   int64_t cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;
> > 
> > and then the firsttime muck is not needed at all.
> 
> Good point, will fix!
> 
> And again, thank you for looking all of this over.

And overnight testing with a 50-microsecond WATCHDOG_MAX_SKEW was
uneventful.  However, I managed to omit the message that prints when a
retry was necessary, so all that run shows is that no more than three
retries were ever required.  I have therefore added test code that
prints a message whenever two or more retries are required and
restarted the tests.  It is a shorter run, but on more systems, so
hopefully the coverage is similar.
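
The added diagnostic is nothing fancy, something like this on the
success path of the cs_watchdog_read() retry loop (sketch, untested):

	if (wd_delay < WATCHDOG_MAX_DELAY) {
		if (nretries > 1)
			pr_warn("timekeeping watchdog on CPU%d: %s retried %u times\n",
				smp_processor_id(), watchdog->name, nretries);
		return true;
	}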

If that works OK, I will resend the series this evening, Pacific Time.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-04-18 16:20 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-14  4:34 [PATCH v8 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
2021-04-14  4:35 ` [PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog Paul E. McKenney
2021-04-16 20:10   ` Thomas Gleixner
2021-04-16 22:38     ` Paul E. McKenney
2021-04-14  4:35 ` [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected Paul E. McKenney
2021-04-16 20:45   ` Thomas Gleixner
2021-04-17  0:25     ` Paul E. McKenney
2021-04-17 12:24   ` Thomas Gleixner
2021-04-17 22:54     ` Paul E. McKenney
2021-04-17 23:15       ` Thomas Gleixner
2021-04-17 23:40         ` Paul E. McKenney
2021-04-14  4:36 ` [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable Paul E. McKenney
2021-04-17 12:28   ` Thomas Gleixner
2021-04-17 23:42     ` Paul E. McKenney
2021-04-17 12:47   ` Thomas Gleixner
2021-04-17 23:51     ` Paul E. McKenney
2021-04-18 16:20       ` Paul E. McKenney
2021-04-14  4:36 ` [PATCH v8 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking Paul E. McKenney
2021-04-14  4:36 ` [PATCH v8 clocksource 5/5] clocksource: Limit number of CPUs checked for clock synchronization Paul E. McKenney
