linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC clocksource] Do not mark clocks unstable due to delays
@ 2021-01-06  0:40 Paul E. McKenney
  2021-01-06  0:41 ` [PATCH RFC clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
                   ` (5 more replies)
  0 siblings, 6 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-01-06  0:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju

Hello!

If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own.  This series checks for this, doing limited retries
to get a good set of clock reads.  If the clock is marked unstable
and is marked as being per-CPU, cross-CPU synchronization is checked.
This series also provides delay injection, which may be enabled via
kernel boot parameters to test the checking for delays.

1.	Provide module parameters to inject delays in watchdog.

2.	Retry clock read if long delays detected.

3.	Check per-CPU clock synchronization when marked unstable.

4.	Provide a module parameter to fuzz per-CPU clock checking.

5.	Do pairwise clock-desynchronization checking.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt |   31 ++++
 arch/x86/kernel/kvmclock.c                      |    2 
 arch/x86/kernel/tsc.c                           |    3 
 include/linux/clocksource.h                     |    2 
 kernel/time/clocksource.c                       |  176 +++++++++++++++++++++---
 5 files changed, 189 insertions(+), 25 deletions(-)


* [PATCH RFC clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog
  2021-01-06  0:40 [PATCH RFC clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
@ 2021-01-06  0:41 ` paulmck
  2021-01-06  0:41 ` [PATCH RFC clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-01-06  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Paul E. McKenney, John Stultz, Thomas Gleixner,
	Stephen Boyd, Jonathan Corbet, Mark Rutland, Marc Zyngier

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there is no shortage of things that can
delay interrupts-disabled regions of code, ranging from SMI handlers to
vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

The first step is a way of injecting such delays, and this
commit therefore provides a clocksource.inject_delay_freq and
clocksource.inject_delay_run kernel boot parameters that specify that
sufficient delay be injected to cause the clocksource_watchdog()
function to mark a clock unstable.  This delay is injected every
Nth set of M calls to clocksource_watchdog(), where N is the value
specified for the inject_delay_freq boot parameter and M is the value
specified for the inject_delay_run boot parameter.  Values of zero or
less for either parameter disable delay injection, and the default for
clocksource.inject_delay_freq is zero, that is, disabled.  The default for
clocksource.inject_delay_run is the value one, that is, single-call runs.

This facility is intended for diagnostic use only, and should be avoided
on production systems.
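
For reference, the injection cadence can be modeled in ordinary userspace
C.  The sketch below copies the counter arithmetic from the patch (the
parameter names are the patch's); it models the intended cadence, i.e.
without the injectfail reset in this version that is flagged later in
this thread:

#include <stdio.h>

/*
 * Userspace model of the injection cadence: freq is N, run is M.  A
 * call injects when (counter / M) is a multiple of N, giving runs of
 * M consecutive injections separated by (N-1)*M non-injections.
 */
int main(void)
{
	int inject_delay_freq = 2;		/* N */
	int inject_delay_run = 3;		/* M */
	int injectfail = inject_delay_run;	/* initialized as in the patch */
	int call;

	for (call = 0; call < 18; call++) {
		int inject = !(++injectfail / inject_delay_run % inject_delay_freq);

		printf("call %2d: %s\n", call, inject ? "inject delay" : "no delay");
	}
	return 0;
}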

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 22 +++++++++++++++++++
 kernel/time/clocksource.c                       | 28 +++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c722ec1..15aa3fe 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -577,6 +577,28 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.inject_delay_freq= [KNL]
+			Number of runs of calls to clocksource_watchdog()
+			before delays are injected between reads from the
+			two clocksources.  Values less than or equal to
+			zero disable this delay injection.  These delays
+			can cause clocks to be marked unstable, so use
+			of this parameter should be avoided on
+			production systems.  Defaults to zero (disabled).
+
+	clocksource.inject_delay_run= [KNL]
+			Run lengths of clocksource_watchdog() delay
+			injections.  Specifying the value 8 will result
+			in eight consecutive delays followed by eight
+			times the value specified for inject_delay_freq
+			of consecutive non-delays.
+
+	clocksource.max_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable.  Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a..a0d9d36 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/delay.h>
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -184,6 +185,32 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static int inject_delay_freq;
+module_param(inject_delay_freq, int, 0644);
+static int inject_delay_run = 1;
+module_param(inject_delay_run, int, 0644);
+static int max_read_retries = 3;
+module_param(max_read_retries, int, 0644);
+
+static void clocksource_watchdog_inject_delay(void)
+{
+	int i;
+	static int injectfail = -1;
+
+	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
+		return;
+	if (injectfail < 0 || injectfail > INT_MAX / 2)
+		injectfail = inject_delay_run;
+	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
+		printk("%s(): Injecting delay.\n", __func__);
+		injectfail = 0;
+		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
+			udelay(1000);
+		printk("%s(): Done injecting delay.\n", __func__);
+	}
+	WARN_ON_ONCE(injectfail < 0);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -208,6 +235,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		local_irq_disable();
 		csnow = cs->read(cs);
+		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
 		local_irq_enable();
 
-- 
2.9.5



* [PATCH RFC clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-01-06  0:40 [PATCH RFC clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  2021-01-06  0:41 ` [PATCH RFC clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
@ 2021-01-06  0:41 ` paulmck
  2021-01-06 16:28   ` Rik van Riel
  2021-01-06  0:41 ` [PATCH RFC clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 72+ messages in thread
From: paulmck @ 2021-01-06  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Paul E. McKenney, John Stultz, Thomas Gleixner,
	Stephen Boyd, Jonathan Corbet, Mark Rutland, Marc Zyngier

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code, ranging from SMI handlers
to vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

This commit therefore re-reads the watchdog clock on either side of
the read from the clock under test.  If the watchdog clock shows an
excessive time delta between its pair of reads, the reads are retried.
The maximum number of retries is specified by a new kernel boot
parameter clocksource.max_read_retries, which defaults to three, that
is, up to four reads, one initial and up to three retries.  If retries
were required, a message is printed on the console.  If the number of
retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small.  In addition, the reason (clock-read delays)
for the unstable marking will be apparent.
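
The bounded-retry read sequence can be sketched in userspace as follows;
the names (wdnow, wdagain, max_read_retries) mirror the patch, and
clock_gettime(CLOCK_MONOTONIC) merely stands in for both the watchdog
clock and the clock under test, so this is an illustration of the
pattern rather than the kernel code itself:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define MAX_SKEW_NSEC (1000000000LL >> 6)	/* mirrors WATCHDOG_MAX_SKEW, ~15.6ms */

static int64_t now_nsec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	int max_read_retries = 3, nretries = 0;
	int64_t wdnow, wdagain, csnow;

	do {
		wdnow = now_nsec();	/* first watchdog read */
		csnow = now_nsec();	/* read of the clock under test goes here */
		wdagain = now_nsec();	/* watchdog read-back */
	} while (wdagain - wdnow > MAX_SKEW_NSEC && nretries++ < max_read_retries);
	(void)csnow;

	printf("read-back delay %lldns, %d retries\n",
	       (long long)(wdagain - wdnow), nretries);
	return 0;
}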

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index a0d9d36..4663b86 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,7 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -203,7 +204,6 @@ static void clocksource_watchdog_inject_delay(void)
 		injectfail = inject_delay_run;
 	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
 		printk("%s(): Injecting delay.\n", __func__);
-		injectfail = 0;
 		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
 			udelay(1000);
 		printk("%s(): Done injecting delay.\n", __func__);
@@ -214,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
-	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
+	u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
+	int64_t wd_nsec, wdagain_nsec, wderr_nsec = 0, cs_nsec;
 	int next_cpu, reset_pending;
+	int nretries;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -225,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	reset_pending = atomic_read(&watchdog_reset_pending);
 
 	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		nretries = 0;
 
 		/* Clocksource already marked unstable? */
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
@@ -233,11 +235,23 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
+retry:
 		local_irq_disable();
-		csnow = cs->read(cs);
-		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
+		clocksource_watchdog_inject_delay();
+		csnow = cs->read(cs);
+		wdagain = watchdog->read(watchdog);
 		local_irq_enable();
+		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
+		wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+		if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {
+			wderr_nsec = wdagain_nsec;
+			if (nretries++ < max_read_retries)
+				goto retry;
+		}
+		if (nretries)
+			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
+				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.9.5



* [PATCH RFC clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-01-06  0:40 [PATCH RFC clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  2021-01-06  0:41 ` [PATCH RFC clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
  2021-01-06  0:41 ` [PATCH RFC clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
@ 2021-01-06  0:41 ` paulmck
  2021-01-06  0:41 ` [PATCH RFC clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-01-06  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Paul E. McKenney, John Stultz, Thomas Gleixner,
	Stephen Boyd, Jonathan Corbet, Mark Rutland, Marc Zyngier

From: "Paul E. McKenney" <paulmck@kernel.org>

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

This commit therefore adds CPU-to-CPU synchronization checking
for newly unstable clocksources that are marked with the new
CLOCK_SOURCE_VERIFY_PERCPU flag.  Lists of desynchronized CPUs are
printed, with the caveat that if it is the reporting CPU that is itself
desynchronized, it will appear that all the other clocks are wrong.
Just like in real life.
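
The ahead/behind determination in the patch relies on wraparound-safe
signed deltas.  A small userspace illustration of that comparison (the
mask matches CLOCKSOURCE_MASK(64); the counter values are invented for
the example):

#include <stdint.h>
#include <stdio.h>

/*
 * A reading "a" precedes "b" when (a - b), masked to the clock's width
 * and interpreted as signed, is negative -- valid as long as the true
 * separation is less than half the counter's period.
 */
static int precedes(uint64_t a, uint64_t b, uint64_t mask)
{
	return (int64_t)((a - b) & mask) < 0;
}

int main(void)
{
	uint64_t mask = ~0ULL;		/* CLOCKSOURCE_MASK(64) */
	uint64_t begin = 1000, end = 2000;

	/* CPU behind: its mid-reading precedes csnow_begin. */
	printf("mid=900:  behind=%d\n", precedes(900, begin, mask));
	/* CPU ahead: csnow_end precedes its mid-reading. */
	printf("mid=2100: ahead=%d\n", precedes(end, 2100, mask));
	/* An in-range mid-reading trips neither check. */
	printf("mid=1500: behind=%d ahead=%d\n",
	       precedes(1500, begin, mask), precedes(end, 1500, mask));
	return 0;
}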

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c       |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 73 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index aa59374..337bb2c 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
 	.read	= kvm_clock_get_cycles,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
 	.enable	= kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..5628917 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143d..83a3ebf 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4663b86..23bcefe 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,78 @@ static void clocksource_watchdog_inject_delay(void)
 	WARN_ON_ONCE(injectfail < 0);
 }
 
+static struct clocksource *clocksource_verify_work_cs;
+static DEFINE_PER_CPU(u64, csnow_mid);
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	__this_cpu_write(csnow_mid, cs->read(cs));
+}
+
+static void clocksource_verify_percpu_wq(struct work_struct *unused)
+{
+	int cpu;
+	struct clocksource *cs;
+	int64_t cs_nsec;
+	u64 csnow_begin;
+	u64 csnow_end;
+	u64 delta;
+
+	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
+	if (WARN_ON_ONCE(!cs))
+		return;
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
+		cs->name, smp_processor_id());
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	csnow_begin = cs->read(cs);
+	smp_call_function(clocksource_verify_one_cpu, cs, 1);
+	csnow_end = cs->read(cs);
+	for_each_online_cpu(cpu) {
+		if (cpu == smp_processor_id())
+			continue;
+		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+	}
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_ahead),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_behind))
+		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_behind),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec, cs->name);
+	}
+	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
+}
+
+static DECLARE_WORK(clocksource_verify_work, clocksource_verify_percpu_wq);
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+	if (!(cs->flags & CLOCK_SOURCE_VERIFY_PERCPU))
+		return;
+	if (smp_load_acquire(&clocksource_verify_work_cs)) { // pairs with release.
+		pr_warn("Previous clocksource synchronization still in flight.\n");
+		return;
+	}
+	smp_store_release(&clocksource_verify_work_cs, cs); //pairs with acquire.
+	queue_work(system_highpri_wq, &clocksource_verify_work);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -284,6 +356,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 				watchdog->name, wdnow, wdlast, watchdog->mask);
 			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
 				cs->name, csnow, cslast, cs->mask);
+			clocksource_verify_percpu(cs);
 			__clocksource_unstable(cs);
 			continue;
 		}
-- 
2.9.5



* [PATCH RFC clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-01-06  0:40 [PATCH RFC clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                   ` (2 preceding siblings ...)
  2021-01-06  0:41 ` [PATCH RFC clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
@ 2021-01-06  0:41 ` paulmck
  2021-01-06  0:41 ` [PATCH RFC clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
  2021-01-12  0:42 ` [PATCH v2 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-01-06  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Paul E. McKenney, John Stultz, Thomas Gleixner,
	Stephen Boyd, Jonathan Corbet, Mark Rutland, Marc Zyngier

From: "Paul E. McKenney" <paulmck@kernel.org>

Code that checks for clock desynchronization must itself be tested, so
this commit creates a new clocksource.inject_delay_shift_percpu= kernel
boot parameter that adds or subtracts a large value from the check read,
using the specified bit of the CPU ID to determine whether to add or
to subtract.
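
The add-or-subtract decision is just a bit test on the CPU number; here
is a quick standalone check of that arithmetic for the shift count of
four used as the example in the documentation below:

#include <stdio.h>

#define NSEC_PER_SEC 1000000000LL

int main(void)
{
	int inject_delay_shift_percpu = 4;	/* the example value from the docs */
	int cpu;

	for (cpu = 0; cpu < 64; cpu += 8) {
		/* Map the selected CPU-number bit {0,1} to a sign {-1,+1}. */
		int sign = ((cpu >> inject_delay_shift_percpu) & 0x1) * 2 - 1;

		printf("CPU %2d: delta %+lld ns\n", cpu, sign * NSEC_PER_SEC);
	}
	return 0;
}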

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  9 +++++++++
 kernel/time/clocksource.c                       | 10 +++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 15aa3fe..613ce32 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -593,6 +593,15 @@
 			times the value specified for inject_delay_freq
 			of consecutive non-delays.
 
+	clocksource.inject_delay_shift_percpu= [KNL]
+			Shift count to obtain bit from CPU number to
+			determine whether to shift the time of the per-CPU
+			clock under test ahead or behind.  For example,
+			setting this to the value four will result in
+			alternating groups of 16 CPUs shifting ahead and
+			the rest of the CPUs shifting behind.  The default
+			value of -1 disables this type of error injection.
+
 	clocksource.max_read_retries= [KNL]
 			Number of clocksource_watchdog() retries due to
 			external delays before the clock will be marked
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 23bcefe..67cf41c 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -190,6 +190,8 @@ static int inject_delay_freq;
 module_param(inject_delay_freq, int, 0644);
 static int inject_delay_run = 1;
 module_param(inject_delay_run, int, 0644);
+static int inject_delay_shift_percpu = -1;
+module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
 
@@ -219,8 +221,14 @@ static cpumask_t cpus_behind;
 static void clocksource_verify_one_cpu(void *csin)
 {
 	struct clocksource *cs = (struct clocksource *)csin;
+	s64 delta = 0;
+	int sign;
 
-	__this_cpu_write(csnow_mid, cs->read(cs));
+	if (inject_delay_shift_percpu >= 0) {
+		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
+		delta = sign * NSEC_PER_SEC;
+	}
+	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
-- 
2.9.5



* [PATCH RFC clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking
  2021-01-06  0:40 [PATCH RFC clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                   ` (3 preceding siblings ...)
  2021-01-06  0:41 ` [PATCH RFC clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
@ 2021-01-06  0:41 ` paulmck
  2021-01-12  0:42 ` [PATCH v2 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-01-06  0:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Paul E. McKenney, John Stultz, Thomas Gleixner,
	Stephen Boyd, Jonathan Corbet, Mark Rutland, Marc Zyngier

From: "Paul E. McKenney" <paulmck@kernel.org>

Although smp_call_function() has the advantage of simplicity, using
it to check for cross-CPU clock desynchronization means that any CPU
being slow reduces the sensitivity of the checking across all CPUs.
And it is not uncommon for smp_call_function() latencies to be in the
hundreds of microseconds.

This commit therefore switches to smp_call_function_single(), so that
delays from a given CPU affect only those measurements involving that
particular CPU.
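
To see why this matters, consider a toy model of the measurement
windows (the latency numbers are invented for illustration): with a
single broadcast, one begin/end pair brackets the slowest responder for
every CPU, whereas per-CPU calls bracket each CPU with only its own
latency:

#include <stdio.h>

int main(void)
{
	int latency_us[] = { 20, 25, 30, 800 };	/* CPU 3 responds slowly */
	int n = sizeof(latency_us) / sizeof(latency_us[0]);
	int i, max = 0;

	for (i = 0; i < n; i++)
		if (latency_us[i] > max)
			max = latency_us[i];
	for (i = 0; i < n; i++)
		printf("CPU %d: broadcast window %4dus, pairwise window %4dus\n",
		       i, max, latency_us[i]);
	return 0;
}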

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 67cf41c..31560c6 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -214,7 +214,7 @@ static void clocksource_watchdog_inject_delay(void)
 }
 
 static struct clocksource *clocksource_verify_work_cs;
-static DEFINE_PER_CPU(u64, csnow_mid);
+static u64 csnow_mid;
 static cpumask_t cpus_ahead;
 static cpumask_t cpus_behind;
 
@@ -228,7 +228,7 @@ static void clocksource_verify_one_cpu(void *csin)
 		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
 		delta = sign * NSEC_PER_SEC;
 	}
-	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
+	csnow_mid = cs->read(cs) + delta;
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
@@ -236,9 +236,12 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 	int cpu;
 	struct clocksource *cs;
 	int64_t cs_nsec;
+	int64_t cs_nsec_max;
+	int64_t cs_nsec_min;
 	u64 csnow_begin;
 	u64 csnow_end;
-	u64 delta;
+	s64 delta;
+	bool firsttime = 1;
 
 	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
 	if (WARN_ON_ONCE(!cs))
@@ -247,19 +250,28 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		cs->name, smp_processor_id());
 	cpumask_clear(&cpus_ahead);
 	cpumask_clear(&cpus_behind);
-	csnow_begin = cs->read(cs);
-	smp_call_function(clocksource_verify_one_cpu, cs, 1);
-	csnow_end = cs->read(cs);
+	preempt_disable();
 	for_each_online_cpu(cpu) {
 		if (cpu == smp_processor_id())
 			continue;
-		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
-		if ((s64)delta < 0)
+		csnow_begin = cs->read(cs);
+		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		csnow_end = cs->read(cs);
+		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_behind);
-		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
-		if ((s64)delta < 0)
+		delta = (csnow_end - csnow_mid) & cs->mask;
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_ahead);
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		if (firsttime || cs_nsec > cs_nsec_max)
+			cs_nsec_max = cs_nsec;
+		if (firsttime || cs_nsec < cs_nsec_min)
+			cs_nsec_min = cs_nsec;
+		firsttime = 0;
 	}
+	preempt_enable();
 	if (!cpumask_empty(&cpus_ahead))
 		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_ahead),
@@ -268,12 +280,9 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_behind),
 			smp_processor_id(), cs->name);
-	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
-		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
-		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
-		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
-			smp_processor_id(), cs_nsec, cs->name);
-	}
+	if (!firsttime && (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)))
+		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec_min, cs_nsec_max, cs->name);
 	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
 }
 
-- 
2.9.5



* Re: [PATCH RFC clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-01-06  0:41 ` [PATCH RFC clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
@ 2021-01-06 16:28   ` Rik van Riel
  2021-01-06 19:53     ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Rik van Riel @ 2021-01-06 16:28 UTC (permalink / raw)
  To: paulmck, linux-kernel
  Cc: kernel-team, John Stultz, Thomas Gleixner, Stephen Boyd,
	Jonathan Corbet, Mark Rutland, Marc Zyngier

On Tue, 2021-01-05 at 16:41 -0800, paulmck@kernel.org wrote:
> 
> @@ -203,7 +204,6 @@ static void clocksource_watchdog_inject_delay(void)
>  		injectfail = inject_delay_run;
>  	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
>  		printk("%s(): Injecting delay.\n", __func__);
> -		injectfail = 0;
>  		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
>  			udelay(1000);

Wait, patch 1 just added that line?

Should patch 1 not add it and this patch go without this removal? :)

+               wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+               if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {
+                       wderr_nsec = wdagain_nsec;
+                       if (nretries++ < max_read_retries)
+                               goto retry;
+               }

Given that clocksource_cyc2ns uses unsigned multiplication
followed by a right shift, do we need to test for <0?



* Re: [PATCH RFC clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-01-06 16:28   ` Rik van Riel
@ 2021-01-06 19:53     ` Paul E. McKenney
  2021-01-06 20:59       ` Rik van Riel
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2021-01-06 19:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, John Stultz, Thomas Gleixner,
	Stephen Boyd, Jonathan Corbet, Mark Rutland, Marc Zyngier

On Wed, Jan 06, 2021 at 11:28:00AM -0500, Rik van Riel wrote:
> On Tue, 2021-01-05 at 16:41 -0800, paulmck@kernel.org wrote:
> > 
> > @@ -203,7 +204,6 @@ static void clocksource_watchdog_inject_delay(void)
> >  		injectfail = inject_delay_run;
> >  	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
> >  		printk("%s(): Injecting delay.\n", __func__);
> > -		injectfail = 0;
> >  		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
> >  			udelay(1000);
> 
> Wait, patch 1 just added that line?
> 
> Should patch 1 not add it and this patch go without this removal? :)

Good catch, will fix.  ;-)

> +               wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> +               if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {
> +                       wderr_nsec = wdagain_nsec;
> +                       if (nretries++ < max_read_retries)
> +                               goto retry;
> +               }
> 
> Given that clocksource_cyc2ns uses unsigned multiplication
> followed by a right shift, do we need to test for <0?

I am worried about the possibility of the "shift" argument to
clocksource_cyc2ns() being zero.  For example, unless I am missing
something, clocksource_tsc has a zero .shift field.
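
For illustration, here is a standalone copy of the clocksource_cyc2ns()
arithmetic (((u64)cycles * mult) >> shift, returned as s64), fed a
wrapped cycle delta; the mult of 1 is an invented value for the
example:

#include <stdint.h>
#include <stdio.h>

static int64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (int64_t)((cycles * mult) >> shift);
}

int main(void)
{
	/* A misordered pair of reads yields a near-2^64 cycle delta. */
	uint64_t delta = (uint64_t)-3;	/* "second" read 3 cycles earlier */

	/* With shift == 0 the sign bit survives, so the < 0 check fires... */
	printf("shift=0:  %lld\n", (long long)cyc2ns(delta, 1, 0));
	/* ...while a nonzero shift yields a huge positive value instead,
	 * which the > WATCHDOG_MAX_SKEW check catches. */
	printf("shift=10: %lld\n", (long long)cyc2ns(delta, 1, 10));
	return 0;
}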

							Thanx, Paul


* Re: [PATCH RFC clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-01-06 19:53     ` Paul E. McKenney
@ 2021-01-06 20:59       ` Rik van Riel
  0 siblings, 0 replies; 72+ messages in thread
From: Rik van Riel @ 2021-01-06 20:59 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, kernel-team, John Stultz, Thomas Gleixner,
	Stephen Boyd, Jonathan Corbet, Mark Rutland, Marc Zyngier

On Wed, 2021-01-06 at 11:53 -0800, Paul E. McKenney wrote:
> On Wed, Jan 06, 2021 at 11:28:00AM -0500, Rik van Riel wrote:
> 
> > +               wdagain_nsec = clocksource_cyc2ns(delta, watchdog-
> > > mult, watchdog->shift);
> > +               if (wdagain_nsec < 0 || wdagain_nsec >
> > WATCHDOG_MAX_SKEW) {
> > +                       wderr_nsec = wdagain_nsec;
> > +                       if (nretries++ < max_read_retries)
> > +                               goto retry;
> > +               }
> > 
> > Given that clocksource_cyc2ns uses unsigned multiplication
> > followed by a right shift, do we need to test for <0?
> 
> I am worried about the possibility of the "shift" argument to
> clocksource_cyc2ns() being zero.  For example, unless I am missing
> something, clocksource_tsc has a zero .shift field.

Oh, good point!



* [PATCH v2 clocksource] Do not mark clocks unstable due to delays
  2021-01-06  0:40 [PATCH RFC clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                   ` (4 preceding siblings ...)
  2021-01-06  0:41 ` [PATCH RFC clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
@ 2021-01-12  0:42 ` Paul E. McKenney
  2021-01-12  0:45   ` [PATCH v2 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
                     ` (5 more replies)
  5 siblings, 6 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-01-12  0:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju

Hello!

If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own.  This series checks for this, doing limited retries
to get a good set of clock reads.  If the clock is marked unstable
and is marked as being per-CPU, cross-CPU synchronization is checked.
This series also provides delay injection, which may be enabled via
kernel boot parameters to test the checking for delays.

1.	Provide module parameters to inject delays in watchdog.

2.	Retry clock read if long delays detected.

3.	Check per-CPU clock synchronization when marked unstable.

4.	Provide a module parameter to fuzz per-CPU clock checking.

5.	Do pairwise clock-desynchronization checking.

Changes since v1:

o	Applied feedback from Rik van Riel.

o	Rebased to v5.11-rc3.

o	Stripped "RFC" from the subject lines.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt |   31 ++++
 arch/x86/kernel/kvmclock.c                      |    2 
 arch/x86/kernel/tsc.c                           |    3 
 include/linux/clocksource.h                     |    2 
 kernel/time/clocksource.c                       |  174 +++++++++++++++++++++---
 5 files changed, 188 insertions(+), 24 deletions(-)


* [PATCH v2 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog
  2021-01-12  0:42 ` [PATCH v2 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
@ 2021-01-12  0:45   ` paulmck
  2021-01-12  0:45   ` [PATCH v2 clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-01-12  0:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there is no shortage of things that can
delay interrupts-disabled regions of code, ranging from SMI handlers to
vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

The first step is a way of injecting such delays, and this
commit therefore provides a clocksource.inject_delay_freq and
clocksource.inject_delay_run kernel boot parameters that specify that
sufficient delay be injected to cause the clocksource_watchdog()
function to mark a clock unstable.  This delay is injected every
Nth set of M calls to clocksource_watchdog(), where N is the value
specified for the inject_delay_freq boot parameter and M is the value
specified for the inject_delay_run boot parameter.  Values of zero or
less for either parameter disable delay injection, and the default for
clocksource.inject_delay_freq is zero, that is, disabled.  The default for
clocksource.inject_delay_run is the value one, that is, single-call runs.

This facility is intended for diagnostic use only, and should be avoided
on production systems.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
[ paulmck: Apply Rik van Riel feedback. ]
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 22 ++++++++++++++++++++
 kernel/time/clocksource.c                       | 27 +++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e3cdb2..4c59813 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -577,6 +577,28 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.inject_delay_freq= [KNL]
+			Number of runs of calls to clocksource_watchdog()
+			before delays are injected between reads from the
+			two clocksources.  Values less than or equal to
+			zero disable this delay injection.  These delays
+			can cause clocks to be marked unstable, so use
+			of this parameter should be avoided on
+			production systems.  Defaults to zero (disabled).
+
+	clocksource.inject_delay_run= [KNL]
+			Run lengths of clocksource_watchdog() delay
+			injections.  Specifying the value 8 will result
+			in eight consecutive delays followed by eight
+			times the value specified for inject_delay_freq
+			of consecutive non-delays.
+
+	clocksource.max_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable.  Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a..545889c 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/delay.h>
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -184,6 +185,31 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static int inject_delay_freq;
+module_param(inject_delay_freq, int, 0644);
+static int inject_delay_run = 1;
+module_param(inject_delay_run, int, 0644);
+static int max_read_retries = 3;
+module_param(max_read_retries, int, 0644);
+
+static void clocksource_watchdog_inject_delay(void)
+{
+	int i;
+	static int injectfail = -1;
+
+	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
+		return;
+	if (injectfail < 0 || injectfail > INT_MAX / 2)
+		injectfail = inject_delay_run;
+	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
+		printk("%s(): Injecting delay.\n", __func__);
+		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
+			udelay(1000);
+		printk("%s(): Done injecting delay.\n", __func__);
+	}
+	WARN_ON_ONCE(injectfail < 0);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -208,6 +234,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		local_irq_disable();
 		csnow = cs->read(cs);
+		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
 		local_irq_enable();
 
-- 
2.9.5



* [PATCH v2 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-01-12  0:42 ` [PATCH v2 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  2021-01-12  0:45   ` [PATCH v2 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
@ 2021-01-12  0:45   ` paulmck
  2021-01-12  0:45   ` [PATCH v2 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-01-12  0:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code, ranging from SMI handlers
to vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

This commit therefore re-reads the watchdog clock on either side of
the read from the clock under test.  If the watchdog clock shows an
excessive time delta between its pair of reads, the reads are retried.
The maximum number of retries is specified by a new kernel boot
parameter clocksource.max_read_retries, which defaults to three, that
is, up to four reads, one initial and up to three retries.  If retries
were required, a message is printed on the console.  If the number of
retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small.  In addition, the reason (clock-read delays)
for the unstable marking will be apparent.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 545889c..4663b86 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,7 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
-	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
+	u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
+	int64_t wd_nsec, wdagain_nsec, wderr_nsec = 0, cs_nsec;
 	int next_cpu, reset_pending;
+	int nretries;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	reset_pending = atomic_read(&watchdog_reset_pending);
 
 	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		nretries = 0;
 
 		/* Clocksource already marked unstable? */
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
@@ -232,11 +235,23 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
+retry:
 		local_irq_disable();
-		csnow = cs->read(cs);
-		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
+		clocksource_watchdog_inject_delay();
+		csnow = cs->read(cs);
+		wdagain = watchdog->read(watchdog);
 		local_irq_enable();
+		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
+		wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+		if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {
+			wderr_nsec = wdagain_nsec;
+			if (nretries++ < max_read_retries)
+				goto retry;
+		}
+		if (nretries)
+			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
+				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.9.5



* [PATCH v2 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-01-12  0:42 ` [PATCH v2 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  2021-01-12  0:45   ` [PATCH v2 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
  2021-01-12  0:45   ` [PATCH v2 clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
@ 2021-01-12  0:45   ` paulmck
  2021-01-12  0:45   ` [PATCH v2 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-01-12  0:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

This commit therefore adds CPU-to-CPU synchronization checking
for newly unstable clocksources that are marked with the new
CLOCK_SOURCE_VERIFY_PERCPU flag.  Lists of desynchronized CPUs are
printed, with the caveat that if it is the reporting CPU that is itself
desynchronized, it will appear that all the other clocks are wrong.
Just like in real life.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c       |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 73 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index aa59374..337bb2c 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
 	.read	= kvm_clock_get_cycles,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
 	.enable	= kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..5628917 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143d..83a3ebf 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4663b86..23bcefe 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,78 @@ static void clocksource_watchdog_inject_delay(void)
 	WARN_ON_ONCE(injectfail < 0);
 }
 
+static struct clocksource *clocksource_verify_work_cs;
+static DEFINE_PER_CPU(u64, csnow_mid);
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	__this_cpu_write(csnow_mid, cs->read(cs));
+}
+
+static void clocksource_verify_percpu_wq(struct work_struct *unused)
+{
+	int cpu;
+	struct clocksource *cs;
+	int64_t cs_nsec;
+	u64 csnow_begin;
+	u64 csnow_end;
+	u64 delta;
+
+	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
+	if (WARN_ON_ONCE(!cs))
+		return;
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
+		cs->name, smp_processor_id());
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	csnow_begin = cs->read(cs);
+	smp_call_function(clocksource_verify_one_cpu, cs, 1);
+	csnow_end = cs->read(cs);
+	for_each_online_cpu(cpu) {
+		if (cpu == smp_processor_id())
+			continue;
+		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+	}
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_ahead),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_behind))
+		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_behind),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec, cs->name);
+	}
+	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
+}
+
+static DECLARE_WORK(clocksource_verify_work, clocksource_verify_percpu_wq);
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+	if (!(cs->flags & CLOCK_SOURCE_VERIFY_PERCPU))
+		return;
+	if (smp_load_acquire(&clocksource_verify_work_cs)) { // pairs with release.
+		pr_warn("Previous clocksource synchronization still in flight.\n");
+		return;
+	}
+	smp_store_release(&clocksource_verify_work_cs, cs); //pairs with acquire.
+	queue_work(system_highpri_wq, &clocksource_verify_work);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -284,6 +356,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 				watchdog->name, wdnow, wdlast, watchdog->mask);
 			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
 				cs->name, csnow, cslast, cs->mask);
+			clocksource_verify_percpu(cs);
 			__clocksource_unstable(cs);
 			continue;
 		}
-- 
2.9.5



* [PATCH v2 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-01-12  0:42 ` [PATCH v2 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                     ` (2 preceding siblings ...)
  2021-01-12  0:45   ` [PATCH v2 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
@ 2021-01-12  0:45   ` paulmck
  2021-01-12  0:45   ` [PATCH v2 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
  2021-02-02 17:04   ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-01-12  0:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Code that checks for clock desynchronization must itself be tested, so
this commit creates a new clocksource.inject_delay_shift_percpu= kernel
boot parameter that adds or subtracts a large value from the check read,
using the specified bit of the CPU ID to determine whether to add or
to subtract.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  9 +++++++++
 kernel/time/clocksource.c                       | 10 +++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4c59813..ca64b0c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -593,6 +593,15 @@
 			times the value specified for inject_delay_freq
 			of consecutive non-delays.
 
+	clocksource.inject_delay_shift_percpu= [KNL]
+			Shift count to obtain bit from CPU number to
+			determine whether to shift the time of the per-CPU
+			clock under test ahead or behind.  For example,
+			setting this to the value four will result in
+			alternating groups of 16 CPUs shifting ahead and
+			the rest of the CPUs shifting behind.  The default
+			value of -1 disables this type of error injection.
+
 	clocksource.max_read_retries= [KNL]
 			Number of clocksource_watchdog() retries due to
 			external delays before the clock will be marked
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 23bcefe..67cf41c 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -190,6 +190,8 @@ static int inject_delay_freq;
 module_param(inject_delay_freq, int, 0644);
 static int inject_delay_run = 1;
 module_param(inject_delay_run, int, 0644);
+static int inject_delay_shift_percpu = -1;
+module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
 
@@ -219,8 +221,14 @@ static cpumask_t cpus_behind;
 static void clocksource_verify_one_cpu(void *csin)
 {
 	struct clocksource *cs = (struct clocksource *)csin;
+	s64 delta = 0;
+	int sign;
 
-	__this_cpu_write(csnow_mid, cs->read(cs));
+	if (inject_delay_shift_percpu >= 0) {
+		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
+		delta = sign * NSEC_PER_SEC;
+	}
+	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
-- 
2.9.5



* [PATCH v2 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking
  2021-01-12  0:42 ` [PATCH v2 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                     ` (3 preceding siblings ...)
  2021-01-12  0:45   ` [PATCH v2 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
@ 2021-01-12  0:45   ` paulmck
  2021-02-02 17:04   ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-01-12  0:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Although smp_call_function() has the advantage of simplicity, using
it to check for cross-CPU clock desynchronization means that any CPU
being slow reduces the sensitivity of the checking across all CPUs.
And it is not uncommon for smp_call_function() latencies to be in the
hundreds of microseconds.

This commit therefore switches to smp_call_function_single(), so that
delays from a given CPU affect only those measurements involving that
particular CPU.
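
Schematically, the per-CPU measurement in the patch below becomes the
following (a simplified sketch, error injection omitted):

	for_each_online_cpu(cpu) {
		csnow_begin = cs->read(cs);	/* local read */
		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
		csnow_end = cs->read(cs);	/* local read */
		/*
		 * On a synchronized clock, the remote read (csnow_mid)
		 * falls within [csnow_begin, csnow_end], so a slow CPU
		 * now widens only its own measurement window.
		 */
	}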

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 67cf41c..3bae5fb 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -214,7 +214,7 @@ static void clocksource_watchdog_inject_delay(void)
 }
 
 static struct clocksource *clocksource_verify_work_cs;
-static DEFINE_PER_CPU(u64, csnow_mid);
+static u64 csnow_mid;
 static cpumask_t cpus_ahead;
 static cpumask_t cpus_behind;
 
@@ -228,7 +228,7 @@ static void clocksource_verify_one_cpu(void *csin)
 		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
 		delta = sign * NSEC_PER_SEC;
 	}
-	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
+	csnow_mid = cs->read(cs) + delta;
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
@@ -236,9 +236,12 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 	int cpu;
 	struct clocksource *cs;
 	int64_t cs_nsec;
+	int64_t cs_nsec_max;
+	int64_t cs_nsec_min;
 	u64 csnow_begin;
 	u64 csnow_end;
-	u64 delta;
+	s64 delta;
+	bool firsttime = 1;
 
 	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
 	if (WARN_ON_ONCE(!cs))
@@ -247,19 +250,28 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		cs->name, smp_processor_id());
 	cpumask_clear(&cpus_ahead);
 	cpumask_clear(&cpus_behind);
-	csnow_begin = cs->read(cs);
-	smp_call_function(clocksource_verify_one_cpu, cs, 1);
-	csnow_end = cs->read(cs);
+	preempt_disable();
 	for_each_online_cpu(cpu) {
 		if (cpu == smp_processor_id())
 			continue;
-		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
-		if ((s64)delta < 0)
+		csnow_begin = cs->read(cs);
+		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		csnow_end = cs->read(cs);
+		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_behind);
-		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
-		if ((s64)delta < 0)
+		delta = (csnow_end - csnow_mid) & cs->mask;
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_ahead);
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		if (firsttime || cs_nsec > cs_nsec_max)
+			cs_nsec_max = cs_nsec;
+		if (firsttime || cs_nsec < cs_nsec_min)
+			cs_nsec_min = cs_nsec;
+		firsttime = 0;
 	}
+	preempt_enable();
 	if (!cpumask_empty(&cpus_ahead))
 		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_ahead),
@@ -268,12 +280,9 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_behind),
 			smp_processor_id(), cs->name);
-	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
-		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
-		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
-		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
-			smp_processor_id(), cs_nsec, cs->name);
-	}
+	if (!firsttime && (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)))
+		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec_min, cs_nsec_max, cs->name);
 	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v3 clocksource] Do not mark clocks unstable due to delays
  2021-01-12  0:42 ` [PATCH v2 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                     ` (4 preceding siblings ...)
  2021-01-12  0:45   ` [PATCH v2 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
@ 2021-02-02 17:04   ` Paul E. McKenney
  2021-02-02 17:06     ` [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
                       ` (5 more replies)
  5 siblings, 6 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-02-02 17:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak

Hello!

If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own.  This series checks for this, doing limited retries
to get a good set of clock reads.  If the clock is marked unstable
and is marked as being per-CPU, cross-CPU synchronization is checked.
This series also provides delay injection, which may be enabled via
kernel boot parameters to test the checking for delays.

1.	Provide module parameters to inject delays in watchdog.

2.	Retry clock read if long delays detected.

3.	Check per-CPU clock synchronization when marked unstable.

4.	Provide a module parameter to fuzz per-CPU clock checking.

5.	Do pairwise clock-desynchronization checking.

Changes since v2:

o	Rebased to v5.11-rc6.

o	Updated Cc: list.

Changes since v1:

o	Applied feedback from Rik van Riel.

o	Rebased to v5.11-rc3.

o	Stripped "RFC" from the subject lines.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt |   31 ++++
 arch/x86/kernel/kvmclock.c                      |    2 
 arch/x86/kernel/tsc.c                           |    3 
 include/linux/clocksource.h                     |    2 
 kernel/time/clocksource.c                       |  174 +++++++++++++++++++++---
 5 files changed, 188 insertions(+), 24 deletions(-)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog
  2021-02-02 17:04   ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
@ 2021-02-02 17:06     ` paulmck
  2021-02-02 17:06     ` [PATCH clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-02-02 17:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there is no shortage of things that can
delay interrupts-disabled regions of code ranging from SMI handlers to
vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

The first step is a way of injecting such delays, and this
commit therefore provides clocksource.inject_delay_freq and
clocksource.inject_delay_run kernel boot parameters that specify that
sufficient delay be injected to cause the clocksource_watchdog()
function to mark a clock unstable.  This delay is injected every
Nth set of M calls to clocksource_watchdog(), where N is the value
specified for the inject_delay_freq boot parameter and M is the value
specified for the inject_delay_run boot parameter.  Values of zero or
less for either parameter disable delay injection, and the default for
clocksource.inject_delay_freq is zero, that is, disabled.  The default for
clocksource.inject_delay_run is the value one, that is, single-call runs.

This facility is intended for diagnostic use only, and should be avoided
on production systems.
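
As a worked illustration of this schedule (a hypothetical stand-alone
simulation, not part of the patch): with inject_delay_run=2 and
inject_delay_freq=3, delays land on two consecutive calls out of every
six.

	#include <stdio.h>

	int main(void)
	{
		int injectfail = 2;	/* primed to inject_delay_run */
		int call;

		for (call = 1; call <= 12; call++)
			if (!(++injectfail / 2 % 3))	/* run = 2, freq = 3 */
				printf("call %d: delay\n", call);
		return 0;	/* prints calls 4, 5, 10, and 11 */
	}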

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
[ paulmck: Apply Rik van Riel feedback. ]
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 22 ++++++++++++++++++++
 kernel/time/clocksource.c                       | 27 +++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a10b545..9965266 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -577,6 +577,28 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.inject_delay_freq= [KNL]
+			Number of runs of calls to clocksource_watchdog()
+			before delays are injected between reads from the
+			two clocksources.  Values less than or equal to
+			zero disable this delay injection.  These delays
+			can cause clocks to be marked unstable, so use
+			of this parameter should therefore be avoided on
+			production systems.  Defaults to zero (disabled).
+
+	clocksource.inject_delay_run= [KNL]
+			Run lengths of clocksource_watchdog() delay
+			injections.  Specifying the value 8 will result
+			in eight consecutive delays followed by eight
+			times the value specified for inject_delay_freq
+			of consecutive non-delays.
+
+	clocksource.max_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable.  Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a..545889c 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/delay.h>
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -184,6 +185,31 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static int inject_delay_freq;
+module_param(inject_delay_freq, int, 0644);
+static int inject_delay_run = 1;
+module_param(inject_delay_run, int, 0644);
+static int max_read_retries = 3;
+module_param(max_read_retries, int, 0644);
+
+static void clocksource_watchdog_inject_delay(void)
+{
+	int i;
+	static int injectfail = -1;
+
+	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
+		return;
+	if (injectfail < 0 || injectfail > INT_MAX / 2)
+		injectfail = inject_delay_run;
+	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
+		printk("%s(): Injecting delay.\n", __func__);
+		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
+			udelay(1000);
+		printk("%s(): Done injecting delay.\n", __func__);
+	}
+	WARN_ON_ONCE(injectfail < 0);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -208,6 +234,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		local_irq_disable();
 		csnow = cs->read(cs);
+		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
 		local_irq_enable();
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-02-02 17:04   ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  2021-02-02 17:06     ` [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
@ 2021-02-02 17:06     ` paulmck
  2021-02-02 17:06     ` [PATCH clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-02-02 17:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code ranging from SMI handlers
to vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

This commit therefore re-reads the watchdog clock on either side of
the read from the clock under test.  If the watchdog clock shows an
excessive time delta between its pair of reads, the reads are retried.
The maximum number of retries is specified by a new kernel boot
parameter clocksource.max_read_retries, which defaults to three, that
is, up to four reads, one initial and up to three retries.  If retries
were required, a message is printed on the console.  If the number of
retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small.  In addition, the reason (clock-read delays)
for the unstable marking will be apparent.
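
Schematically, the retry logic in the patch below brackets the read of
the clock under test with a pair of watchdog reads, retrying whenever
those two reads are too far apart (WATCHDOG_MAX_SKEW is
NSEC_PER_SEC >> 6, roughly 15.6 ms):

	retry:
		wdnow = watchdog->read(watchdog);
		csnow = cs->read(cs);
		wdagain = watchdog->read(watchdog);
		/* If wdagain - wdnow exceeds WATCHDOG_MAX_SKEW, go back
		 * to retry, up to max_read_retries times. */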

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 545889c..4663b86 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,7 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
-	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
+	u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
+	int64_t wd_nsec, wdagain_nsec, wderr_nsec = 0, cs_nsec;
 	int next_cpu, reset_pending;
+	int nretries;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	reset_pending = atomic_read(&watchdog_reset_pending);
 
 	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		nretries = 0;
 
 		/* Clocksource already marked unstable? */
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
@@ -232,11 +235,23 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
+retry:
 		local_irq_disable();
-		csnow = cs->read(cs);
-		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
+		clocksource_watchdog_inject_delay();
+		csnow = cs->read(cs);
+		wdagain = watchdog->read(watchdog);
 		local_irq_enable();
+		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
+		wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+		if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {
+			wderr_nsec = wdagain_nsec;
+			if (nretries++ < max_read_retries)
+				goto retry;
+		}
+		if (nretries)
+			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
+				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-02-02 17:04   ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  2021-02-02 17:06     ` [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
  2021-02-02 17:06     ` [PATCH clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
@ 2021-02-02 17:06     ` paulmck
  2021-02-02 17:06     ` [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-02-02 17:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

This commit therefore adds CPU-to-CPU synchronization checking
for newly unstable clocksources that are marked with the new
CLOCK_SOURCE_VERIFY_PERCPU flag.  Lists of desynchronized CPUs are
printed, with the caveat that if it is the reporting CPU that is itself
desynchronized, it will appear that all the other clocks are wrong.
Just like in real life.
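
As a rough numeric illustration of the ahead/behind classification
(hypothetical values, assuming a full 64-bit mask): if the reporting
CPU reads csnow_begin = 1000 and csnow_end = 1100 around a remote read
of csnow_mid = 990, then

	(s64)((csnow_mid - csnow_begin) & cs->mask)	/* = -10 */

is negative, so that CPU is flagged as behind; the symmetric test on
csnow_end - csnow_mid flags CPUs that appear ahead.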

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c       |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 73 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index aa59374..337bb2c 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
 	.read	= kvm_clock_get_cycles,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
 	.enable	= kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..5628917 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143d..83a3ebf 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4663b86..23bcefe 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,78 @@ static void clocksource_watchdog_inject_delay(void)
 	WARN_ON_ONCE(injectfail < 0);
 }
 
+static struct clocksource *clocksource_verify_work_cs;
+static DEFINE_PER_CPU(u64, csnow_mid);
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	__this_cpu_write(csnow_mid, cs->read(cs));
+}
+
+static void clocksource_verify_percpu_wq(struct work_struct *unused)
+{
+	int cpu;
+	struct clocksource *cs;
+	int64_t cs_nsec;
+	u64 csnow_begin;
+	u64 csnow_end;
+	u64 delta;
+
+	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
+	if (WARN_ON_ONCE(!cs))
+		return;
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
+		cs->name, smp_processor_id());
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	csnow_begin = cs->read(cs);
+	smp_call_function(clocksource_verify_one_cpu, cs, 1);
+	csnow_end = cs->read(cs);
+	for_each_online_cpu(cpu) {
+		if (cpu == smp_processor_id())
+			continue;
+		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+	}
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_ahead),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_behind))
+		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_behind),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec, cs->name);
+	}
+	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
+}
+
+static DECLARE_WORK(clocksource_verify_work, clocksource_verify_percpu_wq);
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+	if (!(cs->flags & CLOCK_SOURCE_VERIFY_PERCPU))
+		return;
+	if (smp_load_acquire(&clocksource_verify_work_cs)) { // pairs with release.
+		pr_warn("Previous clocksource synchronization still in flight.\n");
+		return;
+	}
+	smp_store_release(&clocksource_verify_work_cs, cs); //pairs with acquire.
+	queue_work(system_highpri_wq, &clocksource_verify_work);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -284,6 +356,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 				watchdog->name, wdnow, wdlast, watchdog->mask);
 			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
 				cs->name, csnow, cslast, cs->mask);
+			clocksource_verify_percpu(cs);
 			__clocksource_unstable(cs);
 			continue;
 		}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-02-02 17:04   ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                       ` (2 preceding siblings ...)
  2021-02-02 17:06     ` [PATCH clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
@ 2021-02-02 17:06     ` paulmck
  2021-02-02 19:51       ` Randy Dunlap
  2021-02-02 17:06     ` [PATCH clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
  2021-02-17 21:28     ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  5 siblings, 1 reply; 72+ messages in thread
From: paulmck @ 2021-02-02 17:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Code that checks for clock desynchronization must itself be tested, so
this commit creates a new clocksource.inject_delay_shift_percpu= kernel
boot parameter that adds or subtracts a large value from the check read,
using the specified bit of the CPU ID to determine whether to add or
to subtract.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  9 +++++++++
 kernel/time/clocksource.c                       | 10 +++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9965266..f561e94 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -593,6 +593,15 @@
 			times the value specified for inject_delay_freq
 			of consecutive non-delays.
 
+	clocksource.inject_delay_shift_percpu= [KNL]
+			Shift count to obtain bit from CPU number to
+			determine whether to shift the time of the per-CPU
+			clock under test ahead or behind.  For example,
+			setting this to the value four will result in
+			alternating groups of 16 CPUs shifting ahead and
+			the rest of the CPUs shifting behind.  The default
+			value of -1 disable this type of error injection.
+
 	clocksource.max_read_retries= [KNL]
 			Number of clocksource_watchdog() retries due to
 			external delays before the clock will be marked
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 23bcefe..67cf41c 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -190,6 +190,8 @@ static int inject_delay_freq;
 module_param(inject_delay_freq, int, 0644);
 static int inject_delay_run = 1;
 module_param(inject_delay_run, int, 0644);
+static int inject_delay_shift_percpu = -1;
+module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
 
@@ -219,8 +221,14 @@ static cpumask_t cpus_behind;
 static void clocksource_verify_one_cpu(void *csin)
 {
 	struct clocksource *cs = (struct clocksource *)csin;
+	s64 delta = 0;
+	int sign;
 
-	__this_cpu_write(csnow_mid, cs->read(cs));
+	if (inject_delay_shift_percpu >= 0) {
+		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
+		delta = sign * NSEC_PER_SEC;
+	}
+	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking
  2021-02-02 17:04   ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                       ` (3 preceding siblings ...)
  2021-02-02 17:06     ` [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
@ 2021-02-02 17:06     ` paulmck
  2021-02-17 21:28     ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-02-02 17:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Although smp_call_function() has the advantage of simplicity, using
it to check for cross-CPU clock desynchronization means that any CPU
being slow reduces the sensitivity of the checking across all CPUs.
And it is not uncommon for smp_call_function() latencies to be in the
hundreds of microseconds.

This commit therefore switches to smp_call_function_single(), so that
delays from a given CPU affect only those measurements involving that
particular CPU.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 67cf41c..3bae5fb 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -214,7 +214,7 @@ static void clocksource_watchdog_inject_delay(void)
 }
 
 static struct clocksource *clocksource_verify_work_cs;
-static DEFINE_PER_CPU(u64, csnow_mid);
+static u64 csnow_mid;
 static cpumask_t cpus_ahead;
 static cpumask_t cpus_behind;
 
@@ -228,7 +228,7 @@ static void clocksource_verify_one_cpu(void *csin)
 		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
 		delta = sign * NSEC_PER_SEC;
 	}
-	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
+	csnow_mid = cs->read(cs) + delta;
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
@@ -236,9 +236,12 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 	int cpu;
 	struct clocksource *cs;
 	int64_t cs_nsec;
+	int64_t cs_nsec_max;
+	int64_t cs_nsec_min;
 	u64 csnow_begin;
 	u64 csnow_end;
-	u64 delta;
+	s64 delta;
+	bool firsttime = 1;
 
 	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
 	if (WARN_ON_ONCE(!cs))
@@ -247,19 +250,28 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		cs->name, smp_processor_id());
 	cpumask_clear(&cpus_ahead);
 	cpumask_clear(&cpus_behind);
-	csnow_begin = cs->read(cs);
-	smp_call_function(clocksource_verify_one_cpu, cs, 1);
-	csnow_end = cs->read(cs);
+	preempt_disable();
 	for_each_online_cpu(cpu) {
 		if (cpu == smp_processor_id())
 			continue;
-		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
-		if ((s64)delta < 0)
+		csnow_begin = cs->read(cs);
+		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		csnow_end = cs->read(cs);
+		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_behind);
-		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
-		if ((s64)delta < 0)
+		delta = (csnow_end - csnow_mid) & cs->mask;
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_ahead);
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		if (firsttime || cs_nsec > cs_nsec_max)
+			cs_nsec_max = cs_nsec;
+		if (firsttime || cs_nsec < cs_nsec_min)
+			cs_nsec_min = cs_nsec;
+		firsttime = 0;
 	}
+	preempt_enable();
 	if (!cpumask_empty(&cpus_ahead))
 		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_ahead),
@@ -268,12 +280,9 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_behind),
 			smp_processor_id(), cs->name);
-	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
-		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
-		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
-		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
-			smp_processor_id(), cs_nsec, cs->name);
-	}
+	if (!firsttime && (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)))
+		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec_min, cs_nsec_max, cs->name);
 	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-02-02 17:06     ` [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
@ 2021-02-02 19:51       ` Randy Dunlap
  2021-02-03  0:50         ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Randy Dunlap @ 2021-02-02 19:51 UTC (permalink / raw)
  To: paulmck, linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm

On 2/2/21 9:06 AM, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> Code that checks for clock desynchronization must itself be tested, so
> this commit creates a new clocksource.inject_delay_shift_percpu= kernel
> boot parameter that adds or subtracts a large value from the check read,
> using the specified bit of the CPU ID to determine whether to add or
> to subtract.
> 
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Mark Rutland <Mark.Rutland@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> Reported-by: Chris Mason <clm@fb.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  9 +++++++++
>  kernel/time/clocksource.c                       | 10 +++++++++-
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 9965266..f561e94 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -593,6 +593,15 @@
>  			times the value specified for inject_delay_freq
>  			of consecutive non-delays.
>  
> +	clocksource.inject_delay_shift_percpu= [KNL]
> +			Shift count to obtain bit from CPU number to
> +			determine whether to shift the time of the per-CPU
> +			clock under test ahead or behind.  For example,

It's a good thing that you give an example -- it helps a little bit.
That sentence above needs to be rewritten...

> +			setting this to the value four will result in
> +			alternating groups of 16 CPUs shifting ahead and
> +			the rest of the CPUs shifting behind.  The default
> +			value of -1 disable this type of error injection.

			            disables

> +
>  	clocksource.max_read_retries= [KNL]
>  			Number of clocksource_watchdog() retries due to
>  			external delays before the clock will be marked


-- 
~Randy


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-02-02 19:51       ` Randy Dunlap
@ 2021-02-03  0:50         ` Paul E. McKenney
  2021-02-03  1:31           ` Randy Dunlap
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2021-02-03  0:50 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, kernel-team, john.stultz, tglx, sboyd, corbet,
	Mark.Rutland, maz, ak, clm

On Tue, Feb 02, 2021 at 11:51:02AM -0800, Randy Dunlap wrote:
> On 2/2/21 9:06 AM, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > Code that checks for clock desynchronization must itself be tested, so
> > this commit creates a new clocksource.inject_delay_shift_percpu= kernel
> > boot parameter that adds or subtracts a large value from the check read,
> > using the specified bit of the CPU ID to determine whether to add or
> > to subtract.
> > 
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Stephen Boyd <sboyd@kernel.org>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Mark Rutland <Mark.Rutland@arm.com>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Andi Kleen <ak@linux.intel.com>
> > Reported-by: Chris Mason <clm@fb.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt |  9 +++++++++
> >  kernel/time/clocksource.c                       | 10 +++++++++-
> >  2 files changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 9965266..f561e94 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -593,6 +593,15 @@
> >  			times the value specified for inject_delay_freq
> >  			of consecutive non-delays.
> >  
> > +	clocksource.inject_delay_shift_percpu= [KNL]
> > +			Shift count to obtain bit from CPU number to
> > +			determine whether to shift the time of the per-CPU
> > +			clock under test ahead or behind.  For example,
> 
> It's a good thing that you give an example -- it helps a little bit.
> That sentence above needs to be rewritten...

That is a bit obscure, now that you mention it.

> > +			setting this to the value four will result in
> > +			alternating groups of 16 CPUs shifting ahead and
> > +			the rest of the CPUs shifting behind.  The default
> > +			value of -1 disable this type of error injection.
> 
> 			            disables

Good eyes!

So how about like this?

	clocksource.inject_delay_shift_percpu= [KNL]
			Clocksource delay injection partitions the CPUs
			into two sets, one whose clocks are moved ahead
			and the other whose clocks are moved behind.
			This kernel parameter selects the CPU-number
			bit that determines which of these two sets the
			corresponding CPU is placed into.  For example,
			setting this parameter to the value four will
			result in the first set containing alternating
			groups of 16 CPUs whose clocks are moved ahead,
			while the second set will contain the rest of
			the CPUs, whose clocks are moved behind.

			The default value of -1 disables this type of
			error injection.
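
(As a hypothetical usage illustration: booting with

	clocksource.inject_delay_shift_percpu=0

selects bit zero of the CPU number, so the even-numbered CPUs would be
placed in the shifted-behind set and the odd-numbered CPUs in the
shifted-ahead set.)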

							Thanx, Paul

> > +
> >  	clocksource.max_read_retries= [KNL]
> >  			Number of clocksource_watchdog() retries due to
> >  			external delays before the clock will be marked
> 
> 
> -- 
> ~Randy
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-02-03  0:50         ` Paul E. McKenney
@ 2021-02-03  1:31           ` Randy Dunlap
  2021-02-03  1:40             ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Randy Dunlap @ 2021-02-03  1:31 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, kernel-team, john.stultz, tglx, sboyd, corbet,
	Mark.Rutland, maz, ak, clm

On 2/2/21 4:50 PM, Paul E. McKenney wrote:
> On Tue, Feb 02, 2021 at 11:51:02AM -0800, Randy Dunlap wrote:
>> On 2/2/21 9:06 AM, paulmck@kernel.org wrote:
>>> From: "Paul E. McKenney" <paulmck@kernel.org>
>>>
>>> Code that checks for clock desynchronization must itself be tested, so
>>> this commit creates a new clocksource.inject_delay_shift_percpu= kernel
>>> boot parameter that adds or subtracts a large value from the check read,
>>> using the specified bit of the CPU ID to determine whether to add or
>>> to subtract.
>>>
>>> Cc: John Stultz <john.stultz@linaro.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Stephen Boyd <sboyd@kernel.org>
>>> Cc: Jonathan Corbet <corbet@lwn.net>
>>> Cc: Mark Rutland <Mark.Rutland@arm.com>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Andi Kleen <ak@linux.intel.com>
>>> Reported-by: Chris Mason <clm@fb.com>
>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>> ---
>>>  Documentation/admin-guide/kernel-parameters.txt |  9 +++++++++
>>>  kernel/time/clocksource.c                       | 10 +++++++++-
>>>  2 files changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index 9965266..f561e94 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -593,6 +593,15 @@
>>>  			times the value specified for inject_delay_freq
>>>  			of consecutive non-delays.
>>>  
>>> +	clocksource.inject_delay_shift_percpu= [KNL]
>>> +			Shift count to obtain bit from CPU number to
>>> +			determine whether to shift the time of the per-CPU
>>> +			clock under test ahead or behind.  For example,
>>
>> It's a good think that you give an example -- it helps a little bit.
>> That sentence above needs to be rewritten...
> 
> That is a bit obscure, now that you mention it.
> 
>>> +			setting this to the value four will result in
>>> +			alternating groups of 16 CPUs shifting ahead and
>>> +			the rest of the CPUs shifting behind.  The default
>>> +			value of -1 disable this type of error injection.
>>
>> 			            disables
> 
> Good eyes!
> 
> So how about like this?

Much better, thanks.

> 	clocksource.inject_delay_shift_percpu= [KNL]
> 			Clocksource delay injection partitions the CPUs
> 			into two sets, one whose clocks are moved ahead
> 			and the other whose clocks are moved behind.
> 			This kernel parameter selects the CPU-number
> 			bit that determines which of these two sets the
> 			corresponding CPU is placed into.  For example,
> 			setting this parameter to the value four will

I know that in "writing," "four" should be written out as you have it,
but IMO using "4" here would be much better.  FWIW.

> 			result in the first set containing alternating
> 			groups of 16 CPUs whose clocks are moved ahead,
> 			while the second set will contain the rest of
> 			the CPUs, whose clocks are moved behind.
> 
> 			The default value of -1 disables this type of
> 			error injection.
> 
> 							Thanx, Paul
> 
>>> +
>>>  	clocksource.max_read_retries= [KNL]
>>>  			Number of clocksource_watchdog() retries due to
>>>  			external delays before the clock will be marked


thanks!
-- 
~Randy


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-02-03  1:31           ` Randy Dunlap
@ 2021-02-03  1:40             ` Paul E. McKenney
  0 siblings, 0 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-02-03  1:40 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, kernel-team, john.stultz, tglx, sboyd, corbet,
	Mark.Rutland, maz, ak, clm

On Tue, Feb 02, 2021 at 05:31:55PM -0800, Randy Dunlap wrote:
> On 2/2/21 4:50 PM, Paul E. McKenney wrote:
> > On Tue, Feb 02, 2021 at 11:51:02AM -0800, Randy Dunlap wrote:
> >> On 2/2/21 9:06 AM, paulmck@kernel.org wrote:
> >>> From: "Paul E. McKenney" <paulmck@kernel.org>
> >>>
> >>> Code that checks for clock desynchronization must itself be tested, so
> >>> this commit creates a new clocksource.inject_delay_shift_percpu= kernel
> >>> boot parameter that adds or subtracts a large value from the check read,
> >>> using the specified bit of the CPU ID to determine whether to add or
> >>> to subtract.
> >>>
> >>> Cc: John Stultz <john.stultz@linaro.org>
> >>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>> Cc: Stephen Boyd <sboyd@kernel.org>
> >>> Cc: Jonathan Corbet <corbet@lwn.net>
> >>> Cc: Mark Rutland <Mark.Rutland@arm.com>
> >>> Cc: Marc Zyngier <maz@kernel.org>
> >>> Cc: Andi Kleen <ak@linux.intel.com>
> >>> Reported-by: Chris Mason <clm@fb.com>
> >>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >>> ---
> >>>  Documentation/admin-guide/kernel-parameters.txt |  9 +++++++++
> >>>  kernel/time/clocksource.c                       | 10 +++++++++-
> >>>  2 files changed, 18 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >>> index 9965266..f561e94 100644
> >>> --- a/Documentation/admin-guide/kernel-parameters.txt
> >>> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >>> @@ -593,6 +593,15 @@
> >>>  			times the value specified for inject_delay_freq
> >>>  			of consecutive non-delays.
> >>>  
> >>> +	clocksource.inject_delay_shift_percpu= [KNL]
> >>> +			Shift count to obtain bit from CPU number to
> >>> +			determine whether to shift the time of the per-CPU
> >>> +			clock under test ahead or behind.  For example,
> >>
> >> It's a good think that you give an example -- it helps a little bit.
> >> That sentence above needs to be rewritten...
> > 
> > That is a bit obscure, now that you mention it.
> > 
> >>> +			setting this to the value four will result in
> >>> +			alternating groups of 16 CPUs shifting ahead and
> >>> +			the rest of the CPUs shifting behind.  The default
> >>> +			value of -1 disable this type of error injection.
> >>
> >> 			            disables
> > 
> > Good eyes!
> > 
> > So how about like this?
> 
> Much better, thanks.
> 
> > 	clocksource.inject_delay_shift_percpu= [KNL]
> > 			Clocksource delay injection partitions the CPUs
> > 			into two sets, one whose clocks are moved ahead
> > 			and the other whose clocks are moved behind.
> > 			This kernel parameter selects the CPU-number
> > 			bit that determines which of these two sets the
> > 			corresponding CPU is placed into.  For example,
> > 			setting this parameter to the value four will
> 
> I know that in "writing," "four" should be written out as you have it,
> but IMO using "4" here would be much better.  FWIW.

As long as it is "four" and not "fore"!  Updated as requested.  ;-)

						Thanx, Paul

> > 			result in the first set containing alternating
> > 			groups of 16 CPUs whose clocks are moved ahead,
> > 			while the second set will contain the rest of
> > 			the CPUs, whose clocks are moved behind.
> > 
> > 			The default value of -1 disables this type of
> > 			error injection.
> > 
> > 							Thanx, Paul
> > 
> >>> +
> >>>  	clocksource.max_read_retries= [KNL]
> >>>  			Number of clocksource_watchdog() retries due to
> >>>  			external delays before the clock will be marked
> 
> 
> thanks!
> -- 
> ~Randy
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v3 clocksource] Do not mark clocks unstable due to delays
  2021-02-02 17:04   ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                       ` (4 preceding siblings ...)
  2021-02-02 17:06     ` [PATCH clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
@ 2021-02-17 21:28     ` Paul E. McKenney
  2021-02-17 21:29       ` [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
                         ` (5 more replies)
  5 siblings, 6 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-02-17 21:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak

Hello!

If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own.  This series checks for this, doing limited retries
to get a good set of clock reads.  If the clock is marked unstable
and is marked as being per-CPU, cross-CPU synchronization is checked.
This series also provides delay injection, which may be enabled via
kernel boot parameters to test the checking for delays.

1.	Provide module parameters to inject delays in watchdog.

2.	Retry clock read if long delays detected.

3.	Check per-CPU clock synchronization when marked unstable.

4.	Provide a module parameter to fuzz per-CPU clock checking.

5.	Do pairwise clock-desynchronization checking.

Changes since v3:

o	Rebased to v5.11.

o	Apply Randy Dunlap feedback.

Changes since v2:

o	Rebased to v5.11-rc6.

o	Updated Cc: list.

Changes since v1:

o	Applied feedback from Rik van Riel.

o	Rebased to v5.11-rc3.

o	Stripped "RFC" from the subject lines.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt |   38 +++++
 arch/x86/kernel/kvmclock.c                      |    2 
 arch/x86/kernel/tsc.c                           |    3 
 include/linux/clocksource.h                     |    2 
 kernel/time/clocksource.c                       |  174 +++++++++++++++++++++---
 5 files changed, 195 insertions(+), 24 deletions(-)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog
  2021-02-17 21:28     ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
@ 2021-02-17 21:29       ` paulmck
  2021-02-17 21:29       ` [PATCH clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-02-17 21:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there is no shortage of things that can
delay interrupts-disabled regions of code ranging from SMI handlers to
vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

The first step is a way of injecting such delays, and this
commit therefore provides clocksource.inject_delay_freq and
clocksource.inject_delay_run kernel boot parameters that specify that
sufficient delay be injected to cause the clocksource_watchdog()
function to mark a clock unstable.  This delay is injected every
Nth set of M calls to clocksource_watchdog(), where N is the value
specified for the inject_delay_freq boot parameter and M is the value
specified for the inject_delay_run boot parameter.  Values of zero or
less for either parameter disable delay injection, and the default for
clocksource.inject_delay_freq is zero, that is, disabled.  The default for
clocksource.inject_delay_run is the value one, that is, single-call runs.

This facility is intended for diagnostic use only, and should be avoided
on production systems.
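
Because these are module_param()s created with mode 0644, the settings
can also be adjusted at run time; a hypothetical shell illustration:

	# Inject runs of eight consecutive delays, one run per
	# sixteen calls to clocksource_watchdog().
	echo 2 > /sys/module/clocksource/parameters/inject_delay_freq
	echo 8 > /sys/module/clocksource/parameters/inject_delay_run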

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
[ paulmck: Apply Rik van Riel feedback. ]
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 22 ++++++++++++++++++++
 kernel/time/clocksource.c                       | 27 +++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a10b545..9965266 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -577,6 +577,28 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.inject_delay_freq= [KNL]
+			Number of runs of calls to clocksource_watchdog()
+			before delays are injected between reads from the
+			two clocksources.  Values less than or equal to
+			zero disable this delay injection.  These delays
+			can cause clocks to be marked unstable, so use
+			of this parameter should therefore be avoided on
+			production systems.  Defaults to zero (disabled).
+
+	clocksource.inject_delay_run= [KNL]
+			Run lengths of clocksource_watchdog() delay
+			injections.  Specifying the value 8 will result
+			in eight consecutive delays followed by eight
+			times the value specified for inject_delay_freq
+			of consecutive non-delays.
+
+	clocksource.max_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable.  Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a..4be4391 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/delay.h>
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -184,6 +185,31 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static int inject_delay_freq;
+module_param(inject_delay_freq, int, 0644);
+static int inject_delay_run = 1;
+module_param(inject_delay_run, int, 0644);
+static int max_read_retries = 3;
+module_param(max_read_retries, int, 0644);
+
+static void clocksource_watchdog_inject_delay(void)
+{
+	int i;
+	static int injectfail = -1;
+
+	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
+		return;
+	if (injectfail < 0 || injectfail > INT_MAX / 2)
+		injectfail = inject_delay_run;
+	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
+		pr_warn("%s(): Injecting delay.\n", __func__);
+		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
+			udelay(1000);
+		pr_warn("%s(): Done injecting delay.\n", __func__);
+	}
+	WARN_ON_ONCE(injectfail < 0);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -208,6 +234,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		local_irq_disable();
 		csnow = cs->read(cs);
+		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
 		local_irq_enable();
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-02-17 21:28     ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  2021-02-17 21:29       ` [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
@ 2021-02-17 21:29       ` paulmck
  2021-02-17 21:29       ` [PATCH clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-02-17 21:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code ranging from SMI handlers
to vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

This commit therefore re-reads the watchdog clock on either side of
the read from the clock under test.  If the watchdog clock shows an
excessive time delta between its pair of reads, the reads are retried.
The maximum number of retries is specified by a new kernel boot
parameter clocksource.max_read_retries, which defaults to three, that
is, up to four reads, one initial and up to three retries.  If retries
were required, a message is printed on the console.  If the number of
retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small.  In addition, the reason (clock-read delays)
for the unstable marking will be apparent.
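
The read-back pattern described above can be mimicked in user space.  A
minimal sketch, assuming CLOCK_MONOTONIC_RAW stands in for the watchdog
clock and CLOCK_MONOTONIC for the clock under test (the threshold and
retry count echo, but are not, the kernel's):

#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define MAX_SKEW_NS	15625000LL	/* roughly NSEC_PER_SEC >> 6 */
#define MAX_RETRIES	3

static int64_t ns(const struct timespec *t)
{
	return (int64_t)t->tv_sec * 1000000000LL + t->tv_nsec;
}

int main(void)
{
	struct timespec wd1, wd2, cs;
	int64_t delta;
	int nretries = 0;

	do {
		clock_gettime(CLOCK_MONOTONIC_RAW, &wd1);	/* watchdog */
		clock_gettime(CLOCK_MONOTONIC, &cs);		/* clock under test */
		clock_gettime(CLOCK_MONOTONIC_RAW, &wd2);	/* watchdog re-read */
		delta = ns(&wd2) - ns(&wd1);
	} while (delta > MAX_SKEW_NS && ++nretries <= MAX_RETRIES);

	if (nretries)
		printf("read-back delay %lldns after %d retries\n",
		       (long long)delta, nretries);
	return 0;
}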

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4be4391..3f734c6 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,7 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
-	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
+	u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
+	int64_t wd_nsec, wdagain_nsec, wderr_nsec = 0, cs_nsec;
 	int next_cpu, reset_pending;
+	int nretries;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	reset_pending = atomic_read(&watchdog_reset_pending);
 
 	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		nretries = 0;
 
 		/* Clocksource already marked unstable? */
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
@@ -232,11 +235,23 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
+retry:
 		local_irq_disable();
-		csnow = cs->read(cs);
-		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
+		clocksource_watchdog_inject_delay();
+		csnow = cs->read(cs);
+		wdagain = watchdog->read(watchdog);
 		local_irq_enable();
+		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
+		wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+		if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {
+			wderr_nsec = wdagain_nsec;
+			if (nretries++ < max_read_retries)
+				goto retry;
+		}
+		if (nretries)
+			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
+				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-02-17 21:28     ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
  2021-02-17 21:29       ` [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
  2021-02-17 21:29       ` [PATCH clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
@ 2021-02-17 21:29       ` paulmck
  2021-02-17 21:29       ` [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-02-17 21:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

This commit therefore adds CPU-to-CPU synchronization checking
for newly unstable clocksources that are marked with the new
CLOCK_SOURCE_VERIFY_PERCPU flag.  Lists of desynchronized CPUs are
printed, with the caveat that if it is the reporting CPU that is itself
desynchronized, it will appear that all the other clocks are wrong.
Just like in real life.
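
The ahead/behind decision in the hunk below turns on masked differences
reinterpreted as signed values, which stays correct across counter wrap.
A standalone sketch (is_behind() is a hypothetical helper, not a kernel
function; the all-ones mask mirrors CLOCKSOURCE_MASK(64)):

#include <stdio.h>
#include <stdint.h>

/* Sample "mid" is behind "begin" when the masked difference, viewed as
 * a signed value, is negative -- even if the counter wrapped in between.
 */
static int is_behind(uint64_t mid, uint64_t begin, uint64_t mask)
{
	return (int64_t)((mid - begin) & mask) < 0;
}

int main(void)
{
	uint64_t mask = ~0ULL;	/* 64-bit clock, as with the TSC */

	/* Wrapped but ahead: begin near the top, mid just past zero. */
	printf("%d\n", is_behind(5, UINT64_MAX - 1, mask));	/* 0 */
	/* Genuinely behind: mid reads seven counts before begin. */
	printf("%d\n", is_behind(UINT64_MAX - 1, 5, mask));	/* 1 */
	return 0;
}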

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c       |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 73 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index aa59374..337bb2c 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
 	.read	= kvm_clock_get_cycles,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
 	.enable	= kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..5628917 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143d..83a3ebf 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f734c6..663bc53 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,78 @@ static void clocksource_watchdog_inject_delay(void)
 	WARN_ON_ONCE(injectfail < 0);
 }
 
+static struct clocksource *clocksource_verify_work_cs;
+static DEFINE_PER_CPU(u64, csnow_mid);
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	__this_cpu_write(csnow_mid, cs->read(cs));
+}
+
+static void clocksource_verify_percpu_wq(struct work_struct *unused)
+{
+	int cpu;
+	struct clocksource *cs;
+	int64_t cs_nsec;
+	u64 csnow_begin;
+	u64 csnow_end;
+	u64 delta;
+
+	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
+	if (WARN_ON_ONCE(!cs))
+		return;
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
+		cs->name, smp_processor_id());
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	csnow_begin = cs->read(cs);
+	smp_call_function(clocksource_verify_one_cpu, cs, 1);
+	csnow_end = cs->read(cs);
+	for_each_online_cpu(cpu) {
+		if (cpu == smp_processor_id())
+			continue;
+		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+	}
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_ahead),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_behind))
+		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_behind),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec, cs->name);
+	}
+	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
+}
+
+static DECLARE_WORK(clocksource_verify_work, clocksource_verify_percpu_wq);
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+	if (!(cs->flags & CLOCK_SOURCE_VERIFY_PERCPU))
+		return;
+	if (smp_load_acquire(&clocksource_verify_work_cs)) { // pairs with release.
+		pr_warn("Previous clocksource synchronization still in flight.\n");
+		return;
+	}
+	smp_store_release(&clocksource_verify_work_cs, cs); // pairs with acquire.
+	queue_work(system_highpri_wq, &clocksource_verify_work);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -284,6 +356,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 				watchdog->name, wdnow, wdlast, watchdog->mask);
 			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
 				cs->name, csnow, cslast, cs->mask);
+			clocksource_verify_percpu(cs);
 			__clocksource_unstable(cs);
 			continue;
 		}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-02-17 21:28     ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                         ` (2 preceding siblings ...)
  2021-02-17 21:29       ` [PATCH clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
@ 2021-02-17 21:29       ` paulmck
  2021-02-17 21:29       ` [PATCH clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
  2021-03-04  0:49       ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-02-17 21:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Code that checks for clock desynchronization must itself be tested, so
this commit creates a new clocksource.inject_delay_shift_percpu= kernel
boot parameter that adds or subtracts a large value from the check read,
using the specified bit of the CPU ID to determine whether to add or
to subtract.
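
The add-or-subtract selection is easy to see in isolation.  A sketch, with
cpu and shift standing in for smp_processor_id() and the boot parameter:

#include <stdio.h>

int main(void)
{
	int shift = 4;	/* stands in for inject_delay_shift_percpu */

	for (int cpu = 0; cpu < 48; cpu++) {
		/* Bit "shift" of the CPU number maps to -1 or +1. */
		int sign = ((cpu >> shift) & 0x1) * 2 - 1;

		printf("cpu %2d: clock nudged %s\n", cpu,
		       sign > 0 ? "ahead" : "behind");
	}
	return 0;
}

With shift equal to 4, CPUs 0-15 and 32-47 are nudged behind while CPUs
16-31 are nudged ahead, matching the alternating groups of 16 described
in the documentation hunk below.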

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Apply Randy Dunlap feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 16 ++++++++++++++++
 kernel/time/clocksource.c                       | 10 +++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9965266..628e87f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -593,6 +593,22 @@
 			runs are separated by eight times
 			(inject_delay_freq - 1) consecutive non-delays.
 
+	clocksource.inject_delay_shift_percpu= [KNL]
+			Clocksource delay injection partitions the CPUs
+			into two sets, one whose clocks are moved ahead
+			and the other whose clocks are moved behind.
+			This kernel parameter selects the CPU-number
+			bit that determines which of these two sets the
+			corresponding CPU is placed into.  For example,
+			setting this parameter to the value 4 will result
+			in the first set containing alternating groups
+			of 16 CPUs whose clocks are moved ahead, while
+			the second set will contain the rest of the CPUs,
+			whose clocks are moved behind.
+
+			The default value of -1 disables this type of
+			error injection.
+
 	clocksource.max_read_retries= [KNL]
 			Number of clocksource_watchdog() retries due to
 			external delays before the clock will be marked
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 663bc53..df48416 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -190,6 +190,8 @@ static int inject_delay_freq;
 module_param(inject_delay_freq, int, 0644);
 static int inject_delay_run = 1;
 module_param(inject_delay_run, int, 0644);
+static int inject_delay_shift_percpu = -1;
+module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
 
@@ -219,8 +221,14 @@ static cpumask_t cpus_behind;
 static void clocksource_verify_one_cpu(void *csin)
 {
 	struct clocksource *cs = (struct clocksource *)csin;
+	s64 delta = 0;
+	int sign;
 
-	__this_cpu_write(csnow_mid, cs->read(cs));
+	if (inject_delay_shift_percpu >= 0) {
+		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
+		delta = sign * NSEC_PER_SEC;
+	}
+	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking
  2021-02-17 21:28     ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                         ` (3 preceding siblings ...)
  2021-02-17 21:29       ` [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
@ 2021-02-17 21:29       ` paulmck
  2021-03-04  0:49       ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-02-17 21:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz,
	ak, clm, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Although smp_call_function() has the advantage of simplicity, using
it to check for cross-CPU clock desynchronization means that any CPU
being slow reduces the sensitivity of the checking across all CPUs.
And it is not uncommon for smp_call_function() latencies to be in the
hundreds of microseconds.

This commit therefore switches to smp_call_function_single(), so that
delays from a given CPU affect only those measurements involving that
particular CPU.
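
The effect on the measurement bounds can be illustrated numerically.  A
sketch with synthetic latencies (the array values are invented; only the
bookkeeping mirrors the patch):

#include <stdio.h>

int main(void)
{
	/* Synthetic per-CPU IPI latencies in microseconds; CPU 3 is slow. */
	int lat[] = { 5, 7, 6, 800, 5 };
	int n = sizeof(lat) / sizeof(lat[0]);
	int total = 0;

	for (int i = 0; i < n; i++)
		total += lat[i];

	/* smp_call_function(): one window bounds every CPU's sample. */
	printf("broadcast: every CPU's bound = %d us\n", total);

	/* smp_call_function_single(): each sample gets its own window. */
	for (int i = 0; i < n; i++)
		printf("pairwise: cpu %d bound = %d us\n", i, lat[i]);
	return 0;
}

With the broadcast approach the slow CPU inflates every sample's
uncertainty window; with per-CPU calls it inflates only its own.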

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index df48416..4161c84 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -214,7 +214,7 @@ static void clocksource_watchdog_inject_delay(void)
 }
 
 static struct clocksource *clocksource_verify_work_cs;
-static DEFINE_PER_CPU(u64, csnow_mid);
+static u64 csnow_mid;
 static cpumask_t cpus_ahead;
 static cpumask_t cpus_behind;
 
@@ -228,7 +228,7 @@ static void clocksource_verify_one_cpu(void *csin)
 		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
 		delta = sign * NSEC_PER_SEC;
 	}
-	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
+	csnow_mid = cs->read(cs) + delta;
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
@@ -236,9 +236,12 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 	int cpu;
 	struct clocksource *cs;
 	int64_t cs_nsec;
+	int64_t cs_nsec_max;
+	int64_t cs_nsec_min;
 	u64 csnow_begin;
 	u64 csnow_end;
-	u64 delta;
+	s64 delta;
+	bool firsttime = true;
 
 	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
 	if (WARN_ON_ONCE(!cs))
@@ -247,19 +250,28 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		cs->name, smp_processor_id());
 	cpumask_clear(&cpus_ahead);
 	cpumask_clear(&cpus_behind);
-	csnow_begin = cs->read(cs);
-	smp_call_function(clocksource_verify_one_cpu, cs, 1);
-	csnow_end = cs->read(cs);
+	preempt_disable();
 	for_each_online_cpu(cpu) {
 		if (cpu == smp_processor_id())
 			continue;
-		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
-		if ((s64)delta < 0)
+		csnow_begin = cs->read(cs);
+		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		csnow_end = cs->read(cs);
+		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_behind);
-		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
-		if ((s64)delta < 0)
+		delta = (csnow_end - csnow_mid) & cs->mask;
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_ahead);
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		if (firsttime || cs_nsec > cs_nsec_max)
+			cs_nsec_max = cs_nsec;
+		if (firsttime || cs_nsec < cs_nsec_min)
+			cs_nsec_min = cs_nsec;
+		firsttime = false;
 	}
+	preempt_enable();
 	if (!cpumask_empty(&cpus_ahead))
 		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_ahead),
@@ -268,12 +280,9 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_behind),
 			smp_processor_id(), cs->name);
-	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
-		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
-		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
-		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
-			smp_processor_id(), cs_nsec, cs->name);
-	}
+	if (!firsttime && (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)))
+		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec_min, cs_nsec_max, cs->name);
 	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13
  2021-02-17 21:28     ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
                         ` (4 preceding siblings ...)
  2021-02-17 21:29       ` [PATCH clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
@ 2021-03-04  0:49       ` Paul E. McKenney
  2021-03-04  0:53         ` [PATCH kernel/time 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
                           ` (5 more replies)
  5 siblings, 6 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-03-04  0:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak

Hello!

If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own.  This series checks for this, doing limited retries
to get a good set of clock reads.  If the clock is marked unstable
and is marked as being per-CPU, cross-CPU synchronization is checked.
This series also provides delay injection, which may be enabled via
kernel boot parameters to test the checking for delays.

Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
vCPU preemption.

1.	Provide module parameters to inject delays in watchdog.

2.	Retry clock read if long delays detected.

3.	Check per-CPU clock synchronization when marked unstable.

4.	Provide a module parameter to fuzz per-CPU clock checking.

5.	Do pairwise clock-desynchronization checking.

Changes since v4:

o	Rebased to v5.12-rc1.

Changes since v3:

o	Rebased to v5.11.

o	Apply Randy Dunlap feedback.

Changes since v2:

o	Rebased to v5.11-rc6.

o	Updated Cc: list.

Changes since v1:

o	Applied feedback from Rik van Riel.

o	Rebased to v5.11-rc3.

o	Stripped "RFC" from the subject lines.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt |   38 +++++
 arch/x86/kernel/kvmclock.c                      |    2 
 arch/x86/kernel/tsc.c                           |    3 
 include/linux/clocksource.h                     |    2 
 kernel/time/clocksource.c                       |  174 +++++++++++++++++++++---
 5 files changed, 195 insertions(+), 24 deletions(-)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH kernel/time 1/5] clocksource: Provide module parameters to inject delays in watchdog
  2021-03-04  0:49       ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
@ 2021-03-04  0:53         ` paulmck
  2021-03-04  0:53         ` [PATCH kernel/time 2/5] clocksource: Retry clock read if long delays detected paulmck
                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-03-04  0:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there is no shortage of things that can
delay interrupts-disabled regions of code ranging from SMI handlers to
vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

The first step is a way of injecting such delays, and this
commit therefore provides a clocksource.inject_delay_freq and
clocksource.inject_delay_run kernel boot parameters that specify that
sufficient delay be injected to cause the clocksource_watchdog()
function to mark a clock unstable.  This delay is injected every
Nth set of M calls to clocksource_watchdog(), where N is the value
specified for the inject_delay_freq boot parameter and M is the value
specified for the inject_delay_run boot parameter.  Values of zero or
less for either parameter disable delay injection, and the default for
clocksource.inject_delay_freq is zero, that is, disabled.  The default for
clocksource.inject_delay_run is the value one, that is single-call runs.

This facility is intended for diagnostic use only, and should be avoided
on production systems.
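
For example, booting a test kernel with the hypothetical settings

	clocksource.inject_delay_freq=2 clocksource.inject_delay_run=8

should inject eight consecutive delayed watchdog passes, skip the next
eight passes, and repeat.  Because the parameters are declared with mode
0644, they should also be adjustable at runtime under
/sys/module/clocksource/parameters/ when this code is built in.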

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
[ paulmck: Apply Rik van Riel feedback. ]
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 22 ++++++++++++++++++++
 kernel/time/clocksource.c                       | 27 +++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0454572..fc57952 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -583,6 +583,28 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.inject_delay_freq= [KNL]
+			Number of runs of calls to clocksource_watchdog()
+			before delays are injected between reads from the
+			two clocksources.  Values less than or equal to
+			zero disable this delay injection.  These delays
+			can cause clocks to be marked unstable, so use
+			of this parameter should therefore be avoided on
+			production systems.  Defaults to zero (disabled).
+
+	clocksource.inject_delay_run= [KNL]
+			Run lengths of clocksource_watchdog() delay
+			injections.  Specifying the value 8 will result
+			in runs of eight consecutive delays; successive
+			runs are separated by eight times
+			(inject_delay_freq - 1) consecutive non-delays.
+
+	clocksource.max_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable.  Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a..4be4391 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/delay.h>
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -184,6 +185,31 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static int inject_delay_freq;
+module_param(inject_delay_freq, int, 0644);
+static int inject_delay_run = 1;
+module_param(inject_delay_run, int, 0644);
+static int max_read_retries = 3;
+module_param(max_read_retries, int, 0644);
+
+static void clocksource_watchdog_inject_delay(void)
+{
+	int i;
+	static int injectfail = -1;
+
+	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
+		return;
+	if (injectfail < 0 || injectfail > INT_MAX / 2)
+		injectfail = inject_delay_run;
+	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
+		pr_warn("%s(): Injecting delay.\n", __func__);
+		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
+			udelay(1000);
+		pr_warn("%s(): Done injecting delay.\n", __func__);
+	}
+	WARN_ON_ONCE(injectfail < 0);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -208,6 +234,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		local_irq_disable();
 		csnow = cs->read(cs);
+		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
 		local_irq_enable();
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH kernel/time 2/5] clocksource: Retry clock read if long delays detected
  2021-03-04  0:49       ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  2021-03-04  0:53         ` [PATCH kernel/time 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
@ 2021-03-04  0:53         ` paulmck
  2021-03-04  0:53         ` [PATCH kernel/time 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-03-04  0:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code ranging from SMI handlers
to vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

This commit therefore re-reads the watchdog clock on either side of
the read from the clock under test.  If the watchdog clock shows an
excessive time delta between its pair of reads, the reads are retried.
The maximum number of retries is specified by a new kernel boot
parameter clocksource.max_read_retries, which defaults to three, that
is, up to four reads, one initial and up to three retries.  If retries
were required, a message is printed on the console.  If the number of
retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small.  In addition, the reason (clock-read delays)
for the unstable marking will be apparent.
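
For reference, the two thresholds used below work out as follows:

	WATCHDOG_THRESHOLD = NSEC_PER_SEC >> 4 = 62,500,000 ns (62.5 ms)
	WATCHDOG_MAX_SKEW  = NSEC_PER_SEC >> 6 = 15,625,000 ns (~15.6 ms)

so a retried read-back pair must complete within about a quarter of the
skew that would otherwise mark the clock under test unstable.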

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4be4391..3f734c6 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,7 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
-	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
+	u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
+	int64_t wd_nsec, wdagain_nsec, wderr_nsec = 0, cs_nsec;
 	int next_cpu, reset_pending;
+	int nretries;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	reset_pending = atomic_read(&watchdog_reset_pending);
 
 	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		nretries = 0;
 
 		/* Clocksource already marked unstable? */
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
@@ -232,11 +235,23 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
+retry:
 		local_irq_disable();
-		csnow = cs->read(cs);
-		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
+		clocksource_watchdog_inject_delay();
+		csnow = cs->read(cs);
+		wdagain = watchdog->read(watchdog);
 		local_irq_enable();
+		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
+		wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+		if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {
+			wderr_nsec = wdagain_nsec;
+			if (nretries++ < max_read_retries)
+				goto retry;
+		}
+		if (nretries)
+			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
+				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH kernel/time 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-03-04  0:49       ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  2021-03-04  0:53         ` [PATCH kernel/time 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
  2021-03-04  0:53         ` [PATCH kernel/time 2/5] clocksource: Retry clock read if long delays detected paulmck
@ 2021-03-04  0:53         ` paulmck
  2021-03-04  0:53         ` [PATCH kernel/time 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-03-04  0:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

This commit therefore adds CPU-to-CPU synchronization checking
for newly unstable clocksources that are marked with the new
CLOCK_SOURCE_VERIFY_PERCPU flag.  Lists of desynchronized CPUs are
printed, with the caveat that if it is the reporting CPU that is itself
desynchronized, it will appear that all the other clocks are wrong.
Just like in real life.
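
The handoff through clocksource_verify_work_cs below is a single-slot
acquire/release pattern.  A user-space C11 analogue (slot, submit(), and
work() are hypothetical names, not kernel API):

#include <stdatomic.h>
#include <stdio.h>

static _Atomic(void *) slot;	/* analogue of clocksource_verify_work_cs */

/* Submitter: claim the slot only if the previous job has drained. */
static int submit(void *job)
{
	if (atomic_load_explicit(&slot, memory_order_acquire))
		return 0;	/* previous verification still in flight */
	atomic_store_explicit(&slot, job, memory_order_release);
	return 1;
}

/* Worker: take the job, then empty the slot once the work is done. */
static void work(void)
{
	void *job = atomic_load_explicit(&slot, memory_order_acquire);

	if (!job)
		return;
	/* ... verify clocks using "job" ... */
	atomic_store_explicit(&slot, NULL, memory_order_release);
}

int main(void)
{
	int token;

	if (submit(&token))
		work();
	printf("slot empty: %d\n",
	       !atomic_load_explicit(&slot, memory_order_acquire));
	return 0;
}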

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c       |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 73 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index aa59374..337bb2c 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
 	.read	= kvm_clock_get_cycles,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
 	.enable	= kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..5628917 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143d..83a3ebf 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f734c6..663bc53 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,78 @@ static void clocksource_watchdog_inject_delay(void)
 	WARN_ON_ONCE(injectfail < 0);
 }
 
+static struct clocksource *clocksource_verify_work_cs;
+static DEFINE_PER_CPU(u64, csnow_mid);
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	__this_cpu_write(csnow_mid, cs->read(cs));
+}
+
+static void clocksource_verify_percpu_wq(struct work_struct *unused)
+{
+	int cpu;
+	struct clocksource *cs;
+	int64_t cs_nsec;
+	u64 csnow_begin;
+	u64 csnow_end;
+	u64 delta;
+
+	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
+	if (WARN_ON_ONCE(!cs))
+		return;
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
+		cs->name, smp_processor_id());
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	csnow_begin = cs->read(cs);
+	smp_call_function(clocksource_verify_one_cpu, cs, 1);
+	csnow_end = cs->read(cs);
+	for_each_online_cpu(cpu) {
+		if (cpu == smp_processor_id())
+			continue;
+		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+	}
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_ahead),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_behind))
+		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_behind),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec, cs->name);
+	}
+	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
+}
+
+static DECLARE_WORK(clocksource_verify_work, clocksource_verify_percpu_wq);
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+	if (!(cs->flags & CLOCK_SOURCE_VERIFY_PERCPU))
+		return;
+	if (smp_load_acquire(&clocksource_verify_work_cs)) { // pairs with release.
+		pr_warn("Previous clocksource synchronization still in flight.\n");
+		return;
+	}
+	smp_store_release(&clocksource_verify_work_cs, cs); // pairs with acquire.
+	queue_work(system_highpri_wq, &clocksource_verify_work);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -284,6 +356,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 				watchdog->name, wdnow, wdlast, watchdog->mask);
 			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
 				cs->name, csnow, cslast, cs->mask);
+			clocksource_verify_percpu(cs);
 			__clocksource_unstable(cs);
 			continue;
 		}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH kernel/time 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-03-04  0:49       ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
                           ` (2 preceding siblings ...)
  2021-03-04  0:53         ` [PATCH kernel/time 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
@ 2021-03-04  0:53         ` paulmck
  2021-03-04  0:53         ` [PATCH kernel/time 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
  2021-04-02 20:29         ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-03-04  0:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Code that checks for clock desynchronization must itself be tested, so
this commit creates a new clocksource.inject_delay_shift_percpu= kernel
boot parameter that adds or subtracts a large value from the check read,
using the specified bit of the CPU ID to determine whether to add or
to subtract.
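
Note that the injected offset is NSEC_PER_SEC counts in either direction,
on the order of a second for typical clocks, which dwarfs
WATCHDOG_MAX_SKEW (~15.6 ms); a correctly functioning checker should
therefore flag every remote CPU as either ahead or behind.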

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Apply Randy Dunlap feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 16 ++++++++++++++++
 kernel/time/clocksource.c                       | 10 +++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fc57952..f9da90f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -599,6 +599,22 @@
 			runs are separated by eight times
 			(inject_delay_freq - 1) consecutive non-delays.
 
+	clocksource.inject_delay_shift_percpu= [KNL]
+			Clocksource delay injection partitions the CPUs
+			into two sets, one whose clocks are moved ahead
+			and the other whose clocks are moved behind.
+			This kernel parameter selects the CPU-number
+			bit that determines which of these two sets the
+			corresponding CPU is placed into.  For example,
+			setting this parameter to the value 4 will result
+			in the first set containing alternating groups
+			of 16 CPUs whose clocks are moved ahead, while
+			the second set will contain the rest of the CPUs,
+			whose clocks are moved behind.
+
+			The default value of -1 disables this type of
+			error injection.
+
 	clocksource.max_read_retries= [KNL]
 			Number of clocksource_watchdog() retries due to
 			external delays before the clock will be marked
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 663bc53..df48416 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -190,6 +190,8 @@ static int inject_delay_freq;
 module_param(inject_delay_freq, int, 0644);
 static int inject_delay_run = 1;
 module_param(inject_delay_run, int, 0644);
+static int inject_delay_shift_percpu = -1;
+module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
 
@@ -219,8 +221,14 @@ static cpumask_t cpus_behind;
 static void clocksource_verify_one_cpu(void *csin)
 {
 	struct clocksource *cs = (struct clocksource *)csin;
+	s64 delta = 0;
+	int sign;
 
-	__this_cpu_write(csnow_mid, cs->read(cs));
+	if (inject_delay_shift_percpu >= 0) {
+		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
+		delta = sign * NSEC_PER_SEC;
+	}
+	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH kernel/time 5/5] clocksource: Do pairwise clock-desynchronization checking
  2021-03-04  0:49       ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
                           ` (3 preceding siblings ...)
  2021-03-04  0:53         ` [PATCH kernel/time 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
@ 2021-03-04  0:53         ` paulmck
  2021-04-02 20:29         ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-03-04  0:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Although smp_call_function() has the advantage of simplicity, using
it to check for cross-CPU clock desynchronization means that any CPU
being slow reduces the sensitivity of the checking across all CPUs.
And it is not uncommon for smp_call_function() latencies to be in the
hundreds of microseconds.

This commit therefore switches to smp_call_function_single(), so that
delays from a given CPU affect only those measurements involving that
particular CPU.
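
Note also that the change below wraps the measurement loop in
preempt_disable()/preempt_enable(), keeping the initiating task pinned to
one CPU so that its own begin/end reads consistently frame each remote
sample.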

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index df48416..4161c84 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -214,7 +214,7 @@ static void clocksource_watchdog_inject_delay(void)
 }
 
 static struct clocksource *clocksource_verify_work_cs;
-static DEFINE_PER_CPU(u64, csnow_mid);
+static u64 csnow_mid;
 static cpumask_t cpus_ahead;
 static cpumask_t cpus_behind;
 
@@ -228,7 +228,7 @@ static void clocksource_verify_one_cpu(void *csin)
 		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
 		delta = sign * NSEC_PER_SEC;
 	}
-	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
+	csnow_mid = cs->read(cs) + delta;
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
@@ -236,9 +236,12 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 	int cpu;
 	struct clocksource *cs;
 	int64_t cs_nsec;
+	int64_t cs_nsec_max;
+	int64_t cs_nsec_min;
 	u64 csnow_begin;
 	u64 csnow_end;
-	u64 delta;
+	s64 delta;
+	bool firsttime = true;
 
 	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
 	if (WARN_ON_ONCE(!cs))
@@ -247,19 +250,28 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		cs->name, smp_processor_id());
 	cpumask_clear(&cpus_ahead);
 	cpumask_clear(&cpus_behind);
-	csnow_begin = cs->read(cs);
-	smp_call_function(clocksource_verify_one_cpu, cs, 1);
-	csnow_end = cs->read(cs);
+	preempt_disable();
 	for_each_online_cpu(cpu) {
 		if (cpu == smp_processor_id())
 			continue;
-		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
-		if ((s64)delta < 0)
+		csnow_begin = cs->read(cs);
+		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		csnow_end = cs->read(cs);
+		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_behind);
-		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
-		if ((s64)delta < 0)
+		delta = (csnow_end - csnow_mid) & cs->mask;
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_ahead);
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		if (firsttime || cs_nsec > cs_nsec_max)
+			cs_nsec_max = cs_nsec;
+		if (firsttime || cs_nsec < cs_nsec_min)
+			cs_nsec_min = cs_nsec;
+		firsttime = false;
 	}
+	preempt_enable();
 	if (!cpumask_empty(&cpus_ahead))
 		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_ahead),
@@ -268,12 +280,9 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_behind),
 			smp_processor_id(), cs->name);
-	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
-		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
-		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
-		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
-			smp_processor_id(), cs_nsec, cs->name);
-	}
+	if (!firsttime && (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)))
+		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec_min, cs_nsec_max, cs->name);
 	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13
  2021-03-04  0:49       ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
                           ` (4 preceding siblings ...)
  2021-03-04  0:53         ` [PATCH kernel/time 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
@ 2021-04-02 20:29         ` Paul E. McKenney
  2021-04-02 20:31           ` [PATCH v6 clocksource] clocksource: Provide module parameters to inject delays in watchdog paulmck
                             ` (4 more replies)
  5 siblings, 5 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-02 20:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak

Hello!

If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own.  This series checks for this, doing limited retries
to get a good set of clock reads.  If the clock is marked unstable
and is marked as being per-CPU, cross-CPU synchronization is checked.
This series also provides delay injection, which may be enabled via
kernel boot parameters to test the checking for delays.

Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
vCPU preemption.

1.	Provide module parameters to inject delays in watchdog.

2.	Retry clock read if long delays detected.

3.	Check per-CPU clock synchronization when marked unstable.

4.	Provide a module parameter to fuzz per-CPU clock checking.

5.	Do pairwise clock-desynchronization checking.

Changes since v5:

o	Rebased to v5.12-rc5.

Changes since v4:

o	Rebased to v5.12-rc1.

Changes since v3:

o	Rebased to v5.11.

o	Apply Randy Dunlap feedback.

Changes since v2:

o	Rebased to v5.11-rc6.

o	Updated Cc: list.

Changes since v1:

o	Applied feedback from Rik van Riel.

o	Rebased to v5.11-rc3.

o	Stripped "RFC" from the subject lines.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt |   38 +++++
 arch/x86/kernel/kvmclock.c                      |    2 
 arch/x86/kernel/tsc.c                           |    3 
 include/linux/clocksource.h                     |    2 
 kernel/time/clocksource.c                       |  174 +++++++++++++++++++++---
 5 files changed, 195 insertions(+), 24 deletions(-)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v6 clocksource] clocksource: Provide module parameters to inject delays in watchdog
  2021-04-02 20:29         ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
@ 2021-04-02 20:31           ` paulmck
  2021-04-02 22:22             ` Thomas Gleixner
  2021-04-02 20:31           ` [PATCH v6 clocksource] clocksource: Retry clock read if long delays detected paulmck
                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 72+ messages in thread
From: paulmck @ 2021-04-02 20:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there is no shortage of things that can
delay interrupts-disabled regions of code ranging from SMI handlers to
vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

The first step is a way of injecting such delays, and this
commit therefore provides a clocksource.inject_delay_freq and
clocksource.inject_delay_run kernel boot parameters that specify that
sufficient delay be injected to cause the clocksource_watchdog()
function to mark a clock unstable.  This delay is injected every
Nth set of M calls to clocksource_watchdog(), where N is the value
specified for the inject_delay_freq boot parameter and M is the value
specified for the inject_delay_run boot parameter.  Values of zero or
less for either parameter disable delay injection, and the default for
clocksource.inject_delay_freq is zero, that is, disabled.  The default for
clocksource.inject_delay_run is the value one, that is single-call runs.

This facility is intended for diagnostic use only, and should be avoided
on production systems.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
[ paulmck: Apply Rik van Riel feedback. ]
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 22 ++++++++++++++++++++
 kernel/time/clocksource.c                       | 27 +++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0454572..fc57952 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -583,6 +583,28 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.inject_delay_freq= [KNL]
+			Number of runs of calls to clocksource_watchdog()
+			before delays are injected between reads from the
+			two clocksources.  Values less than or equal to
+			zero disable this delay injection.  These delays
+			can cause clocks to be marked unstable, so use
+			of this parameter should therefore be avoided on
+			production systems.  Defaults to zero (disabled).
+
+	clocksource.inject_delay_run= [KNL]
+			Run lengths of clocksource_watchdog() delay
+			injections.  Specifying the value 8 will result
+			in runs of eight consecutive delays; successive
+			runs are separated by eight times
+			(inject_delay_freq - 1) consecutive non-delays.
+
+	clocksource.max_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable.  Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a..4be4391 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/delay.h>
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -184,6 +185,31 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static int inject_delay_freq;
+module_param(inject_delay_freq, int, 0644);
+static int inject_delay_run = 1;
+module_param(inject_delay_run, int, 0644);
+static int max_read_retries = 3;
+module_param(max_read_retries, int, 0644);
+
+static void clocksource_watchdog_inject_delay(void)
+{
+	int i;
+	static int injectfail = -1;
+
+	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
+		return;
+	if (injectfail < 0 || injectfail > INT_MAX / 2)
+		injectfail = inject_delay_run;
+	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
+		pr_warn("%s(): Injecting delay.\n", __func__);
+		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
+			udelay(1000);
+		pr_warn("%s(): Done injecting delay.\n", __func__);
+	}
+	WARN_ON_ONCE(injectfail < 0);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -208,6 +234,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		local_irq_disable();
 		csnow = cs->read(cs);
+		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
 		local_irq_enable();
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Retry clock read if long delays detected
  2021-04-02 20:29         ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide module parameters to inject delays in watchdog paulmck
@ 2021-04-02 20:31           ` paulmck
  2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Check per-CPU clock synchronization when marked unstable paulmck
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-04-02 20:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code ranging from SMI handlers
to vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

This commit therefore re-reads the watchdog clock on either side of
the read from the clock under test.  If the watchdog clock shows an
excessive time delta between its pair of reads, the reads are retried.
The maximum number of retries is specified by a new kernel boot
parameter clocksource.max_read_retries, which defaults to three, that
is, up to four reads, one initial and up to three retries.  If retries
were required, a message is printed on the console.  If the number of
retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small.  In addition, the reason (clock-read delays)
for the unstable marking will be apparent.
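
The bracketed-read technique itself can be sketched in user space.  In
this minimal demo, CLOCK_MONOTONIC_RAW stands in for the watchdog clock
and CLOCK_MONOTONIC for the clock under test (both assumptions of the
demo, not part of the patch); the bound mirrors the patch's
WATCHDOG_MAX_SKEW of NSEC_PER_SEC >> 6, roughly 15.6ms.

#include <stdio.h>
#include <time.h>

static long long nsec(const struct timespec *t)
{
	return t->tv_sec * 1000000000LL + t->tv_nsec;
}

int main(void)
{
	const long long max_skew = 1000000000LL >> 6;	/* ~15.6ms */
	const int max_read_retries = 3;
	struct timespec wd1, cs, wd2;
	int nretries = 0;
	long long delay;

	do {
		clock_gettime(CLOCK_MONOTONIC_RAW, &wd1);	/* watchdog read */
		clock_gettime(CLOCK_MONOTONIC, &cs);		/* clock under test */
		clock_gettime(CLOCK_MONOTONIC_RAW, &wd2);	/* watchdog re-read */
		delay = nsec(&wd2) - nsec(&wd1);
	} while (delay > max_skew && nretries++ < max_read_retries);

	printf("read-back delay %lldns, %d retries\n", delay, nretries);
	return 0;
}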

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4be4391..3f734c6 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,7 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
-	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
+	u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
+	int64_t wd_nsec, wdagain_nsec, wderr_nsec = 0, cs_nsec;
 	int next_cpu, reset_pending;
+	int nretries;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	reset_pending = atomic_read(&watchdog_reset_pending);
 
 	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		nretries = 0;
 
 		/* Clocksource already marked unstable? */
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
@@ -232,11 +235,23 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
+retry:
 		local_irq_disable();
-		csnow = cs->read(cs);
-		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
+		clocksource_watchdog_inject_delay();
+		csnow = cs->read(cs);
+		wdagain = watchdog->read(watchdog);
 		local_irq_enable();
+		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
+		wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+		if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {
+			wderr_nsec = wdagain_nsec;
+			if (nretries++ < max_read_retries)
+				goto retry;
+		}
+		if (nretries)
+			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
+				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-02 20:29         ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide module parameters to inject delays in watchdog paulmck
  2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Retry clock read if long delays detected paulmck
@ 2021-04-02 20:31           ` paulmck
  2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
  2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Do pairwise clock-desynchronization checking paulmck
  4 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-04-02 20:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

This commit therefore adds CPU-to-CPU synchronization checking
for newly unstable clocksources that are marked with the new
CLOCK_SOURCE_VERIFY_PERCPU flag.  Lists of desynchronized CPUs are
printed, with the caveat that if it is the reporting CPU that is itself
desynchronized, it will appear that all the other clocks are wrong.
Just like in real life.
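
The ahead/behind classification in this patch relies on a masked
subtraction whose sign indicates ordering.  A stand-alone sketch of
that check, using made-up cycle counts and the 64-bit mask that the
TSC uses:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t mask = ~0ULL;		/* CLOCKSOURCE_MASK(64), as for TSC */
	uint64_t csnow_begin = 1000;	/* made-up cycle counts */
	uint64_t csnow_mid = 990;	/* sampled CPU read an earlier value */
	int64_t delta = (int64_t)((csnow_mid - csnow_begin) & mask);

	/* A negative delta means the sampled CPU's read precedes the
	 * reporting CPU's earlier read, so that CPU is marked behind. */
	printf("delta = %lld: %s\n", (long long)delta,
	       delta < 0 ? "behind" : "not behind");
	return 0;
}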

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c       |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 73 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 1fc0962..97eeaf1 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
 	.read	= kvm_clock_get_cycles,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
 	.enable	= kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..5628917 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143d..83a3ebf 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f734c6..663bc53 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,78 @@ static void clocksource_watchdog_inject_delay(void)
 	WARN_ON_ONCE(injectfail < 0);
 }
 
+static struct clocksource *clocksource_verify_work_cs;
+static DEFINE_PER_CPU(u64, csnow_mid);
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	__this_cpu_write(csnow_mid, cs->read(cs));
+}
+
+static void clocksource_verify_percpu_wq(struct work_struct *unused)
+{
+	int cpu;
+	struct clocksource *cs;
+	int64_t cs_nsec;
+	u64 csnow_begin;
+	u64 csnow_end;
+	u64 delta;
+
+	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
+	if (WARN_ON_ONCE(!cs))
+		return;
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
+		cs->name, smp_processor_id());
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	csnow_begin = cs->read(cs);
+	smp_call_function(clocksource_verify_one_cpu, cs, 1);
+	csnow_end = cs->read(cs);
+	for_each_online_cpu(cpu) {
+		if (cpu == smp_processor_id())
+			continue;
+		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+	}
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_ahead),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_behind))
+		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_behind),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec, cs->name);
+	}
+	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
+}
+
+static DECLARE_WORK(clocksource_verify_work, clocksource_verify_percpu_wq);
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+	if (!(cs->flags & CLOCK_SOURCE_VERIFY_PERCPU))
+		return;
+	if (smp_load_acquire(&clocksource_verify_work_cs)) { // pairs with release.
+		pr_warn("Previous clocksource synchronization still in flight.\n");
+		return;
+	}
+	smp_store_release(&clocksource_verify_work_cs, cs); //pairs with acquire.
+	queue_work(system_highpri_wq, &clocksource_verify_work);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -284,6 +356,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 				watchdog->name, wdnow, wdlast, watchdog->mask);
 			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
 				cs->name, csnow, cslast, cs->mask);
+			clocksource_verify_percpu(cs);
 			__clocksource_unstable(cs);
 			continue;
 		}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-04-02 20:29         ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
                             ` (2 preceding siblings ...)
  2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Check per-CPU clock synchronization when marked unstable paulmck
@ 2021-04-02 20:31           ` paulmck
  2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Do pairwise clock-desynchronization checking paulmck
  4 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-04-02 20:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Code that checks for clock desynchronization must itself be tested, so
this commit creates a new clocksource.inject_delay_shift_percpu= kernel
boot parameter that adds or subtracts a large value from the check read,
using the specified bit of the CPU ID to determine whether to add or
to subtract.
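
A stand-alone sketch of the resulting partitioning, using the
bit-selection arithmetic from this patch with an illustrative shift
of 4:

#include <stdio.h>

int main(void)
{
	int inject_delay_shift_percpu = 4;	/* illustrative setting */
	int cpu;

	for (cpu = 0; cpu < 48; cpu += 8) {
		int sign = ((cpu >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
		printf("CPU %2d: clock nudged %s\n", cpu,
		       sign > 0 ? "ahead" : "behind");
	}
	return 0;
}

With a shift of 4, CPUs 0-15 are nudged behind, CPUs 16-31 ahead, and
so on in alternating groups of 16.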

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Apply Randy Dunlap feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 16 ++++++++++++++++
 kernel/time/clocksource.c                       | 10 +++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fc57952..f9da90f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -599,6 +599,22 @@
 			times the value specified for inject_delay_freq
 			of consecutive non-delays.
 
+	clocksource.inject_delay_shift_percpu= [KNL]
+			Clocksource delay injection partitions the CPUs
+			into two sets, one whose clocks are moved ahead
+			and the other whose clocks are moved behind.
+			This kernel parameter selects the CPU-number
+			bit that determines which of these two sets the
+			corresponding CPU is placed into.  For example,
+			setting this parameter to the value 4 will result
+			in the first set containing alternating groups
+			of 16 CPUs whose clocks are moved ahead, while
+			the second set will contain the rest of the CPUs,
+			whose clocks are moved behind.
+
+			The default value of -1 disables this type of
+			error injection.
+
 	clocksource.max_read_retries= [KNL]
 			Number of clocksource_watchdog() retries due to
 			external delays before the clock will be marked
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 663bc53..df48416 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -190,6 +190,8 @@ static int inject_delay_freq;
 module_param(inject_delay_freq, int, 0644);
 static int inject_delay_run = 1;
 module_param(inject_delay_run, int, 0644);
+static int inject_delay_shift_percpu = -1;
+module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
 
@@ -219,8 +221,14 @@ static cpumask_t cpus_behind;
 static void clocksource_verify_one_cpu(void *csin)
 {
 	struct clocksource *cs = (struct clocksource *)csin;
+	s64 delta = 0;
+	int sign;
 
-	__this_cpu_write(csnow_mid, cs->read(cs));
+	if (inject_delay_shift_percpu >= 0) {
+		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
+		delta = sign * NSEC_PER_SEC;
+	}
+	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Do pairwise clock-desynchronization checking
  2021-04-02 20:29         ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
                             ` (3 preceding siblings ...)
  2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
@ 2021-04-02 20:31           ` paulmck
  4 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-04-02 20:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Although smp_call_function() has the advantage of simplicity, using
it to check for cross-CPU clock desynchronization means that any CPU
being slow reduces the sensitivity of the checking across all CPUs.
And it is not uncommon for smp_call_function() latencies to be in the
hundreds of microseconds.

This commit therefore switches to smp_call_function_single(), so that
delays from a given CPU affect only those measurements involving that
particular CPU.
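
Condensed from the diff below, the per-target measurement pattern is as
follows; each begin/end bracket now spans only a single
smp_call_function_single() round trip rather than a broadcast to all
CPUs:

	for_each_online_cpu(cpu) {
		if (cpu == smp_processor_id())
			continue;
		csnow_begin = cs->read(cs);	/* bracket start */
		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
		csnow_end = cs->read(cs);	/* bracket end */
		/* csnow_mid, written by the IPI handler on the target
		 * CPU, is compared only against this CPU's bracket. */
	}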

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index df48416..4161c84 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -214,7 +214,7 @@ static void clocksource_watchdog_inject_delay(void)
 }
 
 static struct clocksource *clocksource_verify_work_cs;
-static DEFINE_PER_CPU(u64, csnow_mid);
+static u64 csnow_mid;
 static cpumask_t cpus_ahead;
 static cpumask_t cpus_behind;
 
@@ -228,7 +228,7 @@ static void clocksource_verify_one_cpu(void *csin)
 		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
 		delta = sign * NSEC_PER_SEC;
 	}
-	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
+	csnow_mid = cs->read(cs) + delta;
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
@@ -236,9 +236,12 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 	int cpu;
 	struct clocksource *cs;
 	int64_t cs_nsec;
+	int64_t cs_nsec_max;
+	int64_t cs_nsec_min;
 	u64 csnow_begin;
 	u64 csnow_end;
-	u64 delta;
+	s64 delta;
+	bool firsttime = 1;
 
 	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
 	if (WARN_ON_ONCE(!cs))
@@ -247,19 +250,28 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		cs->name, smp_processor_id());
 	cpumask_clear(&cpus_ahead);
 	cpumask_clear(&cpus_behind);
-	csnow_begin = cs->read(cs);
-	smp_call_function(clocksource_verify_one_cpu, cs, 1);
-	csnow_end = cs->read(cs);
+	preempt_disable();
 	for_each_online_cpu(cpu) {
 		if (cpu == smp_processor_id())
 			continue;
-		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
-		if ((s64)delta < 0)
+		csnow_begin = cs->read(cs);
+		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		csnow_end = cs->read(cs);
+		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_behind);
-		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
-		if ((s64)delta < 0)
+		delta = (csnow_end - csnow_mid) & cs->mask;
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_ahead);
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		if (firsttime || cs_nsec > cs_nsec_max)
+			cs_nsec_max = cs_nsec;
+		if (firsttime || cs_nsec < cs_nsec_min)
+			cs_nsec_min = cs_nsec;
+		firsttime = 0;
 	}
+	preempt_enable();
 	if (!cpumask_empty(&cpus_ahead))
 		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_ahead),
@@ -268,12 +280,9 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_behind),
 			smp_processor_id(), cs->name);
-	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
-		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
-		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
-		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
-			smp_processor_id(), cs_nsec, cs->name);
-	}
+	if (!firsttime && (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)))
+		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec_min, cs_nsec_max, cs->name);
 	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide module parameters to inject delays in watchdog
  2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide module parameters to inject delays in watchdog paulmck
@ 2021-04-02 22:22             ` Thomas Gleixner
  2021-04-02 22:37               ` Paul E. McKenney
  2021-04-02 22:48               ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  0 siblings, 2 replies; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-02 22:22 UTC (permalink / raw)
  To: paulmck, linux-kernel
  Cc: john.stultz, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

On Fri, Apr 02 2021 at 13:31, paulmck wrote:

The subsystem prefix does not parse:

    [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide module parameters to inject delays in watchdog

I look at the actual code changes after the easter break.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide module parameters to inject delays in watchdog
  2021-04-02 22:22             ` Thomas Gleixner
@ 2021-04-02 22:37               ` Paul E. McKenney
  2021-04-02 22:48               ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  1 sibling, 0 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-02 22:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sat, Apr 03, 2021 at 12:22:40AM +0200, Thomas Gleixner wrote:
> On Fri, Apr 02 2021 at 13:31, paulmck wrote:
> 
> The subsystem prefix does not parse:
> 
>     [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide module parameters to inject delays in watchdog

Where is that brown bag when I need it?  :-/

> I look at the actual code changes after the easter break.

I will resend a clean copy as a reply to your message.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13
  2021-04-02 22:22             ` Thomas Gleixner
  2021-04-02 22:37               ` Paul E. McKenney
@ 2021-04-02 22:48               ` Paul E. McKenney
  2021-04-02 22:49                 ` [PATCH v7 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
                                   ` (5 more replies)
  1 sibling, 6 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-02 22:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

Hello!

If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own.  This series checks for this, doing limited retries
to get a good set of clock reads.  If the clock is marked unstable
and is marked as being per-CPU, cross-CPU synchronization is checked.
This series also provides delay injection, which may be enabled via
kernel boot parameters to test the checking for delays.

Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
vCPU preemption.

1.	Provide module parameters to inject delays in watchdog.

2.	Retry clock read if long delays detected.

3.	Check per-CPU clock synchronization when marked unstable.

4.	Provide a module parameter to fuzz per-CPU clock checking.

5.	Do pairwise clock-desynchronization checking.

Changes since v6:

o	Fix embarrassing git-format-patch operator error.

Changes since v5:

o	Rebased to v5.12-rc5.

Changes since v4:

o	Rebased to v5.12-rc1.

Changes since v3:

o	Rebased to v5.11.

o	Apply Randy Dunlap feedback.

Changes since v2:

o	Rebased to v5.11-rc6.

o	Updated Cc: list.

Changes since v1:

o	Applied feedback from Rik van Riel.

o	Rebased to v5.11-rc3.

o	Stripped "RFC" from the subject lines.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt |   38 +++++
 arch/x86/kernel/kvmclock.c                      |    2 
 arch/x86/kernel/tsc.c                           |    3 
 include/linux/clocksource.h                     |    2 
 kernel/time/clocksource.c                       |  174 +++++++++++++++++++++---
 5 files changed, 195 insertions(+), 24 deletions(-)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v7 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog
  2021-04-02 22:48               ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
@ 2021-04-02 22:49                 ` paulmck
  2021-04-02 22:49                 ` [PATCH v7 clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
                                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-04-02 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there is no shortage of things that can
delay interrupts-disabled regions of code ranging from SMI handlers to
vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

The first step is a way of injecting such delays, and this
commit therefore provides clocksource.inject_delay_freq and
clocksource.inject_delay_run kernel boot parameters that specify that
sufficient delay be injected to cause the clocksource_watchdog()
function to mark a clock unstable.  This delay is injected every
Nth set of M calls to clocksource_watchdog(), where N is the value
specified for the inject_delay_freq boot parameter and M is the value
specified for the inject_delay_run boot parameter.  Values of zero or
less for either parameter disable delay injection, and the default for
clocksource.inject_delay_freq is zero, that is, disabled.  The default for
clocksource.inject_delay_run is the value one, that is, single-call runs.

This facility is intended for diagnostic use only, and should be avoided
on production systems.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
[ paulmck: Apply Rik van Riel feedback. ]
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 22 ++++++++++++++++++++
 kernel/time/clocksource.c                       | 27 +++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0454572..fc57952 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -583,6 +583,28 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.inject_delay_freq= [KNL]
+			Number of runs of calls to clocksource_watchdog()
+			before delays are injected between reads from the
+			two clocksources.  Values less than or equal to
+			zero disable this delay injection.  These delays
+			can cause clocks to be marked unstable, so use
+			of this parameter should therefore be avoided on
+			production systems.  Defaults to zero (disabled).
+
+	clocksource.inject_delay_run= [KNL]
+			Run lengths of clocksource_watchdog() delay
+			injections.  Specifying the value 8 will result
+			in eight consecutive delays followed by eight
+			times the value specified for inject_delay_freq
+			of consecutive non-delays.
+
+	clocksource.max_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable.  Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a..4be4391 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/delay.h>
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -184,6 +185,31 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static int inject_delay_freq;
+module_param(inject_delay_freq, int, 0644);
+static int inject_delay_run = 1;
+module_param(inject_delay_run, int, 0644);
+static int max_read_retries = 3;
+module_param(max_read_retries, int, 0644);
+
+static void clocksource_watchdog_inject_delay(void)
+{
+	int i;
+	static int injectfail = -1;
+
+	if (inject_delay_freq <= 0 || inject_delay_run <= 0)
+		return;
+	if (injectfail < 0 || injectfail > INT_MAX / 2)
+		injectfail = inject_delay_run;
+	if (!(++injectfail / inject_delay_run % inject_delay_freq)) {
+		pr_warn("%s(): Injecting delay.\n", __func__);
+		for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
+			udelay(1000);
+		pr_warn("%s(): Done injecting delay.\n", __func__);
+	}
+	WARN_ON_ONCE(injectfail < 0);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -208,6 +234,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		local_irq_disable();
 		csnow = cs->read(cs);
+		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
 		local_irq_enable();
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v7 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-04-02 22:48               ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  2021-04-02 22:49                 ` [PATCH v7 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
@ 2021-04-02 22:49                 ` paulmck
  2021-04-10  8:41                   ` Thomas Gleixner
  2021-04-02 22:49                 ` [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
                                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 72+ messages in thread
From: paulmck @ 2021-04-02 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code ranging from SMI handlers
to vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

This commit therefore re-reads the watchdog clock on either side of
the read from the clock under test.  If the watchdog clock shows an
excessive time delta between its pair of reads, the reads are retried.
The maximum number of retries is specified by a new kernel boot
parameter clocksource.max_read_retries, which defaults to three, that
is, up to four reads, one initial and up to three retries.  If retries
were required, a message is printed on the console.  If the number of
retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small.  In addition, the reason (clock-read delays)
for the unstable marking will be apparent.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4be4391..3f734c6 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,7 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
-	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
+	u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
+	int64_t wd_nsec, wdagain_nsec, wderr_nsec = 0, cs_nsec;
 	int next_cpu, reset_pending;
+	int nretries;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	reset_pending = atomic_read(&watchdog_reset_pending);
 
 	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		nretries = 0;
 
 		/* Clocksource already marked unstable? */
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
@@ -232,11 +235,23 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
+retry:
 		local_irq_disable();
-		csnow = cs->read(cs);
-		clocksource_watchdog_inject_delay();
 		wdnow = watchdog->read(watchdog);
+		clocksource_watchdog_inject_delay();
+		csnow = cs->read(cs);
+		wdagain = watchdog->read(watchdog);
 		local_irq_enable();
+		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
+		wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+		if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {
+			wderr_nsec = wdagain_nsec;
+			if (nretries++ < max_read_retries)
+				goto retry;
+		}
+		if (nretries)
+			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
+				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-02 22:48               ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
  2021-04-02 22:49                 ` [PATCH v7 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
  2021-04-02 22:49                 ` [PATCH v7 clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
@ 2021-04-02 22:49                 ` paulmck
  2021-04-10  9:00                   ` Thomas Gleixner
  2021-04-02 22:49                 ` [PATCH v7 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
                                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 72+ messages in thread
From: paulmck @ 2021-04-02 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

This commit therefore adds CPU-to-CPU synchronization checking
for newly unstable clocksources that are marked with the new
CLOCK_SOURCE_VERIFY_PERCPU flag.  Lists of desynchronized CPUs are
printed, with the caveat that if it is the reporting CPU that is itself
desynchronized, it will appear that all the other clocks are wrong.
Just like in real life.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c       |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 73 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 1fc0962..97eeaf1 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
 	.read	= kvm_clock_get_cycles,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
 	.enable	= kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..5628917 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143d..83a3ebf 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f734c6..663bc53 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,78 @@ static void clocksource_watchdog_inject_delay(void)
 	WARN_ON_ONCE(injectfail < 0);
 }
 
+static struct clocksource *clocksource_verify_work_cs;
+static DEFINE_PER_CPU(u64, csnow_mid);
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	__this_cpu_write(csnow_mid, cs->read(cs));
+}
+
+static void clocksource_verify_percpu_wq(struct work_struct *unused)
+{
+	int cpu;
+	struct clocksource *cs;
+	int64_t cs_nsec;
+	u64 csnow_begin;
+	u64 csnow_end;
+	u64 delta;
+
+	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
+	if (WARN_ON_ONCE(!cs))
+		return;
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
+		cs->name, smp_processor_id());
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	csnow_begin = cs->read(cs);
+	smp_call_function(clocksource_verify_one_cpu, cs, 1);
+	csnow_end = cs->read(cs);
+	for_each_online_cpu(cpu) {
+		if (cpu == smp_processor_id())
+			continue;
+		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+	}
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_ahead),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_behind))
+		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_behind),
+			smp_processor_id(), cs->name);
+	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec, cs->name);
+	}
+	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
+}
+
+static DECLARE_WORK(clocksource_verify_work, clocksource_verify_percpu_wq);
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+	if (!(cs->flags & CLOCK_SOURCE_VERIFY_PERCPU))
+		return;
+	if (smp_load_acquire(&clocksource_verify_work_cs)) { // pairs with release.
+		pr_warn("Previous clocksource synchronization still in flight.\n");
+		return;
+	}
+	smp_store_release(&clocksource_verify_work_cs, cs); //pairs with acquire.
+	queue_work(system_highpri_wq, &clocksource_verify_work);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	struct clocksource *cs;
@@ -284,6 +356,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 				watchdog->name, wdnow, wdlast, watchdog->mask);
 			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
 				cs->name, csnow, cslast, cs->mask);
+			clocksource_verify_percpu(cs);
 			__clocksource_unstable(cs);
 			continue;
 		}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v7 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking
  2021-04-02 22:48               ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
                                   ` (2 preceding siblings ...)
  2021-04-02 22:49                 ` [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
@ 2021-04-02 22:49                 ` paulmck
  2021-04-02 22:49                 ` [PATCH v7 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
  2021-04-10  8:01                 ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Thomas Gleixner
  5 siblings, 0 replies; 72+ messages in thread
From: paulmck @ 2021-04-02 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Code that checks for clock desynchronization must itself be tested, so
this commit creates a new clocksource.inject_delay_shift_percpu= kernel
boot parameter that adds or subtracts a large value from the check read,
using the specified bit of the CPU ID to determine whether to add or
to subtract.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Apply Randy Dunlap feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 16 ++++++++++++++++
 kernel/time/clocksource.c                       | 10 +++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fc57952..f9da90f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -599,6 +599,22 @@
 			times the value specified for inject_delay_freq
 			of consecutive non-delays.
 
+	clocksource.inject_delay_shift_percpu= [KNL]
+			Clocksource delay injection partitions the CPUs
+			into two sets, one whose clocks are moved ahead
+			and the other whose clocks are moved behind.
+			This kernel parameter selects the CPU-number
+			bit that determines which of these two sets the
+			corresponding CPU is placed into.  For example,
+			setting this parameter to the value 4 will result
+			in the first set containing alternating groups
+			of 16 CPUs whose clocks are moved ahead, while
+			the second set will contain the rest of the CPUs,
+			whose clocks are moved behind.
+
+			The default value of -1 disables this type of
+			error injection.
+
 	clocksource.max_read_retries= [KNL]
 			Number of clocksource_watchdog() retries due to
 			external delays before the clock will be marked
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 663bc53..df48416 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -190,6 +190,8 @@ static int inject_delay_freq;
 module_param(inject_delay_freq, int, 0644);
 static int inject_delay_run = 1;
 module_param(inject_delay_run, int, 0644);
+static int inject_delay_shift_percpu = -1;
+module_param(inject_delay_shift_percpu, int, 0644);
 static int max_read_retries = 3;
 module_param(max_read_retries, int, 0644);
 
@@ -219,8 +221,14 @@ static cpumask_t cpus_behind;
 static void clocksource_verify_one_cpu(void *csin)
 {
 	struct clocksource *cs = (struct clocksource *)csin;
+	s64 delta = 0;
+	int sign;
 
-	__this_cpu_write(csnow_mid, cs->read(cs));
+	if (inject_delay_shift_percpu >= 0) {
+		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
+		delta = sign * NSEC_PER_SEC;
+	}
+	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v7 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking
  2021-04-02 22:48               ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
                                   ` (3 preceding siblings ...)
  2021-04-02 22:49                 ` [PATCH v7 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
@ 2021-04-02 22:49                 ` paulmck
  2021-04-10  9:04                   ` Thomas Gleixner
  2021-04-10  8:01                 ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Thomas Gleixner
  5 siblings, 1 reply; 72+ messages in thread
From: paulmck @ 2021-04-02 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: john.stultz, tglx, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@kernel.org>

Although smp_call_function() has the advantage of simplicity, using
it to check for cross-CPU clock desynchronization means that any CPU
being slow reduces the sensitivity of the checking across all CPUs.
And it is not uncommon for smp_call_function() latencies to be in the
hundreds of microseconds.

This commit therefore switches to smp_call_function_single(), so that
delays from a given CPU affect only those measurements involving that
particular CPU.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index df48416..4161c84 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -214,7 +214,7 @@ static void clocksource_watchdog_inject_delay(void)
 }
 
 static struct clocksource *clocksource_verify_work_cs;
-static DEFINE_PER_CPU(u64, csnow_mid);
+static u64 csnow_mid;
 static cpumask_t cpus_ahead;
 static cpumask_t cpus_behind;
 
@@ -228,7 +228,7 @@ static void clocksource_verify_one_cpu(void *csin)
 		sign = ((smp_processor_id() >> inject_delay_shift_percpu) & 0x1) * 2 - 1;
 		delta = sign * NSEC_PER_SEC;
 	}
-	__this_cpu_write(csnow_mid, cs->read(cs) + delta);
+	csnow_mid = cs->read(cs) + delta;
 }
 
 static void clocksource_verify_percpu_wq(struct work_struct *unused)
@@ -236,9 +236,12 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 	int cpu;
 	struct clocksource *cs;
 	int64_t cs_nsec;
+	int64_t cs_nsec_max;
+	int64_t cs_nsec_min;
 	u64 csnow_begin;
 	u64 csnow_end;
-	u64 delta;
+	s64 delta;
+	bool firsttime = 1;
 
 	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
 	if (WARN_ON_ONCE(!cs))
@@ -247,19 +250,28 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		cs->name, smp_processor_id());
 	cpumask_clear(&cpus_ahead);
 	cpumask_clear(&cpus_behind);
-	csnow_begin = cs->read(cs);
-	smp_call_function(clocksource_verify_one_cpu, cs, 1);
-	csnow_end = cs->read(cs);
+	preempt_disable();
 	for_each_online_cpu(cpu) {
 		if (cpu == smp_processor_id())
 			continue;
-		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
-		if ((s64)delta < 0)
+		csnow_begin = cs->read(cs);
+		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		csnow_end = cs->read(cs);
+		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_behind);
-		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
-		if ((s64)delta < 0)
+		delta = (csnow_end - csnow_mid) & cs->mask;
+		if (delta < 0)
 			cpumask_set_cpu(cpu, &cpus_ahead);
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		if (firsttime || cs_nsec > cs_nsec_max)
+			cs_nsec_max = cs_nsec;
+		if (firsttime || cs_nsec < cs_nsec_min)
+			cs_nsec_min = cs_nsec;
+		firsttime = 0;
 	}
+	preempt_enable();
 	if (!cpumask_empty(&cpus_ahead))
 		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_ahead),
@@ -268,12 +280,9 @@ static void clocksource_verify_percpu_wq(struct work_struct *unused)
 		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_behind),
 			smp_processor_id(), cs->name);
-	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
-		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
-		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
-		pr_warn("        CPU %d duration %lldns for clocksource %s.\n",
-			smp_processor_id(), cs_nsec, cs->name);
-	}
+	if (!firsttime && (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)))
+		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
+			smp_processor_id(), cs_nsec_min, cs_nsec_max, cs->name);
 	smp_store_release(&clocksource_verify_work_cs, NULL); // pairs with acquire.
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13
  2021-04-02 22:48               ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
                                   ` (4 preceding siblings ...)
  2021-04-02 22:49                 ` [PATCH v7 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
@ 2021-04-10  8:01                 ` Thomas Gleixner
  2021-04-10 23:26                   ` Paul E. McKenney
  5 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-10  8:01 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Fri, Apr 02 2021 at 15:48, Paul E. McKenney wrote:
> Hello!
>
> If there is a sufficient delay between reading the watchdog clock and the
> clock under test, the clock under test will be marked unstable through no
> fault of its own.  This series checks for this, doing limited retries
> to get a good set of clock reads.  If the clock is marked unstable
> and is marked as being per-CPU, cross-CPU synchronization is checked.
> This series also provides delay injection, which may be enabled via
> kernel boot parameters to test the checking for delays.
>
> Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
> vCPU preemption.

I buy the vCPU preemption part and TBH guests should not have that
watchdog thing active at all for exactly this reason.

SMI, NMI injecting 62.5ms delay? If that happens then the performance of
the clocksource is the least of your worries.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-04-02 22:49                 ` [PATCH v7 clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
@ 2021-04-10  8:41                   ` Thomas Gleixner
  2021-04-10 23:50                     ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-10  8:41 UTC (permalink / raw)
  To: paulmck, linux-kernel
  Cc: john.stultz, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

On Fri, Apr 02 2021 at 15:49, paulmck wrote:
> This commit therefore re-reads the watchdog clock on either side of

'This commit' is not any better than 'This patch' and this sentence
makes no sense. I might be missing something, but how exactly does "the
commit" re-read the watchdog clock?

 git grep 'This patch' Documentation/process/

> the read from the clock under test.  If the watchdog clock shows an
> +retry:
>  		local_irq_disable();
> -		csnow = cs->read(cs);
> -		clocksource_watchdog_inject_delay();
>  		wdnow = watchdog->read(watchdog);
> +		clocksource_watchdog_inject_delay();
> +		csnow = cs->read(cs);
> +		wdagain = watchdog->read(watchdog);
>  		local_irq_enable();
> +		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
> +		wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);

That variable naming is confusing as hell. This is about the delta and
not about the second readout of the watchdog.

> +		if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {

How exactly is this going negative especially with clocksources which
have a limited bitwidth? See clocksource_delta().

> +			wderr_nsec = wdagain_nsec;
> +			if (nretries++ < max_read_retries)
> +				goto retry;
> +		}
> +		if (nretries)
> +			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
> +				smp_processor_id(), watchdog->name, wderr_nsec, nretries);

Lacks curly braces around the pr_warn() simply because it's not a single
line. Breaks my parser :)

But if this ever happens to exceed max_read_retries, then what's the
point of continuing at all? The data is known to be crap already.

>  		/* Clocksource initialized ? */
>  		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-02 22:49                 ` [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
@ 2021-04-10  9:00                   ` Thomas Gleixner
  2021-04-11  0:20                     ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-10  9:00 UTC (permalink / raw)
  To: paulmck, linux-kernel
  Cc: john.stultz, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

On Fri, Apr 02 2021 at 15:49, paulmck wrote:
>
> +static void clocksource_verify_percpu_wq(struct work_struct *unused)
> +{
> +	int cpu;
> +	struct clocksource *cs;
> +	int64_t cs_nsec;
> +	u64 csnow_begin;
> +	u64 csnow_end;
> +	u64 delta;

Please use reverse fir tree ordering and stick variables of the same
type together:

	u64 csnow_begin, csnow_end, delta;
	struct clocksource *cs;
	s64 cs_nsec;
        int cpu;
> +
> +	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release

Please don't use tail comments. They are a horrible distraction.

> +	if (WARN_ON_ONCE(!cs))
> +		return;
> +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> +		cs->name, smp_processor_id());
> +	cpumask_clear(&cpus_ahead);
> +	cpumask_clear(&cpus_behind);
> +	csnow_begin = cs->read(cs);

So this is invoked via work and the actual clocksource change is done
via work too. Once the clocksource is no longer actively used for
timekeeping it can go away. What's guaranteeing that this runs prior to
the clocksource change and 'cs' is valid throughout this function?

> +	queue_work(system_highpri_wq, &clocksource_verify_work);

This does not guarantee anything. So why does this need an extra work
function which is scheduled separately?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking
  2021-04-02 22:49                 ` [PATCH v7 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
@ 2021-04-10  9:04                   ` Thomas Gleixner
  2021-04-11  0:21                     ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-10  9:04 UTC (permalink / raw)
  To: paulmck, linux-kernel
  Cc: john.stultz, sboyd, corbet, Mark.Rutland, maz, kernel-team,
	neeraju, ak, Paul E. McKenney

On Fri, Apr 02 2021 at 15:49, paulmck wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
>
> Although smp_call_function() has the advantage of simplicity, using
> it to check for cross-CPU clock desynchronization means that any CPU
> being slow reduces the sensitivity of the checking across all CPUs.
> And it is not uncommon for smp_call_function() latencies to be in the
> hundreds of microseconds.
>
> This commit therefore switches to smp_call_function_single(), so that
> delays from a given CPU affect only those measurements involving that
> particular CPU.

Is there any reason I'm missing why this is not done right in patch 3/5
which introduces this synchronization check?

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13
  2021-04-10  8:01                 ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Thomas Gleixner
@ 2021-04-10 23:26                   ` Paul E. McKenney
  2021-04-11 10:58                     ` Thomas Gleixner
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-10 23:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sat, Apr 10, 2021 at 10:01:58AM +0200, Thomas Gleixner wrote:
> On Fri, Apr 02 2021 at 15:48, Paul E. McKenney wrote:
> > Hello!
> >
> > If there is a sufficient delay between reading the watchdog clock and the
> > clock under test, the clock under test will be marked unstable through no
> > fault of its own.  This series checks for this, doing limited retries
> > to get a good set of clock reads.  If the clock is marked unstable
> > and is marked as being per-CPU, cross-CPU synchronization is checked.
> > This series also provides delay injection, which may be enabled via
> > kernel boot parameters to test the checking for delays.
> >
> > Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
> > vCPU preemption.
> 
> I buy the vCPU preemption part and TBH guests should not have that
> watchdog thing active at all for exactly this reason.

Agreed, one approach is to enable the clocksource watchdog only in
the hypervisor, and have some action on the guests triggered when the
host detects clock skew.

This works quite well, at least until something breaks in a way that
messes up clock reads from the guest but not from the host.  And I
am sure that any number of hardware guys will tell me that this just
isn't possible, but if failing hardware operated according to their
expectations, that hardware wouldn't be considered to be failing.
Or it wouldn't be hardware, firmware, or clock-driver bringup, as the
case may be.

> SMI, NMI injecting 62.5ms delay? If that happens then the performance of
> the clocksource is the least of your worries.

I was kind of hoping that you would tell me why the skew must be all the
way up to 62.5ms before the clock is disabled.  The watchdog currently
is quite happy with more than 10% skew between clocks.

100HZ clocks or some such?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 2/5] clocksource: Retry clock read if long delays detected
  2021-04-10  8:41                   ` Thomas Gleixner
@ 2021-04-10 23:50                     ` Paul E. McKenney
  0 siblings, 0 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-10 23:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sat, Apr 10, 2021 at 10:41:21AM +0200, Thomas Gleixner wrote:
> On Fri, Apr 02 2021 at 15:49, paulmck wrote:
> > This commit therefore re-reads the watchdog clock on either side of
> 
> 'This commit' is not any better than 'This patch' and this sentence
> makes no sense. I might be missing something, but how exactly does "the
> commit" re-read the watchdog clock?
> 
>  git grep 'This patch' Documentation/process/

I will rework this.

> > the read from the clock under test.  If the watchdog clock shows an
> > +retry:
> >  		local_irq_disable();
> > -		csnow = cs->read(cs);
> > -		clocksource_watchdog_inject_delay();
> >  		wdnow = watchdog->read(watchdog);
> > +		clocksource_watchdog_inject_delay();
> > +		csnow = cs->read(cs);
> > +		wdagain = watchdog->read(watchdog);
> >  		local_irq_enable();
> > +		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
> > +		wdagain_nsec = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> 
> That variable naming is confusing as hell. This is about the delta and
> not about the second readout of the watchdog.

How about wdagain_delta?

> > +		if (wdagain_nsec < 0 || wdagain_nsec > WATCHDOG_MAX_SKEW) {
> 
> How exactly is this going negative especially with clocksources which
> have a limited bitwidth? See clocksource_delta().

I thought that I had actually seen this happen, though it is of course
quite possible that it was due to a bug in an early version of my changes.

What I will do is to remove the less-than comparison and test with
a WARN_ON().  If that doesn't trigger, I will drop the WARN_ON().
If it does trigger, I will figure out why.

> > +			wderr_nsec = wdagain_nsec;
> > +			if (nretries++ < max_read_retries)
> > +				goto retry;
> > +		}
> > +		if (nretries)
> > +			pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d\n",
> > +				smp_processor_id(), watchdog->name, wderr_nsec, nretries);
> 
> Lacks curly braces around the pr_warn() simply because it's not a single
> line. Breaks my parser :)

OK, will fix.  ;-)

> But if this ever happens to exceed max_read_retries, then what's the
> point of continuing at all? The data is known to be crap already.

If there are four delays in four consecutive attempts to read out the
clocks -- with interrupts disabled -- then it is quite possible that the
delay is actually caused by the attempt to read the clock.  In which case,
marking the clock bad due to skew is a reasonable choice.

On the other hand, if the four consecutive delays are caused by something
like an NMI storm, then as you say, you have worse problems.
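
For reference, here is the shape of the bounded retry under discussion,
unrolled from the goto form in the quoted hunk and using the
wdagain_delta name proposed above -- a simplified sketch, not the patch
itself:

	for (nretries = 0; nretries <= max_read_retries; nretries++) {
		local_irq_disable();
		wdnow = watchdog->read(watchdog);
		csnow = cs->read(cs);
		wdagain = watchdog->read(watchdog);
		local_irq_enable();
		delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
		wdagain_delta = clocksource_cyc2ns(delta, watchdog->mult,
						   watchdog->shift);
		/* A short watchdog bracket means neither read was delayed. */
		if (wdagain_delta <= WATCHDOG_MAX_SKEW)
			break;
	}
	if (nretries) {
		pr_warn("timekeeping watchdog on CPU%d: %s read-back delay, %d retries\n",
			smp_processor_id(), watchdog->name, nretries);
	}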

								Thanx, Paul

> >  		/* Clocksource initialized ? */
> >  		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
> 
> Thanks,
> 
>         tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-10  9:00                   ` Thomas Gleixner
@ 2021-04-11  0:20                     ` Paul E. McKenney
  2021-04-11 10:33                       ` Thomas Gleixner
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-11  0:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
> On Fri, Apr 02 2021 at 15:49, paulmck wrote:
> >
> > +static void clocksource_verify_percpu_wq(struct work_struct *unused)
> > +{
> > +	int cpu;
> > +	struct clocksource *cs;
> > +	int64_t cs_nsec;
> > +	u64 csnow_begin;
> > +	u64 csnow_end;
> > +	u64 delta;
> 
> Please use reverse fir tree ordering and stick variables of the same
> type together:
> 
> 	u64 csnow_begin, csnow_end, delta;
> 	struct clocksource *cs;
> 	s64 cs_nsec;
>         int cpu;

Will do.

> > +
> > +	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
> 
> Please don't use tail comments. They are a horrible distraction.

I will remove it.

> > +	if (WARN_ON_ONCE(!cs))
> > +		return;
> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> > +		cs->name, smp_processor_id());
> > +	cpumask_clear(&cpus_ahead);
> > +	cpumask_clear(&cpus_behind);
> > +	csnow_begin = cs->read(cs);
> 
> So this is invoked via work and the actual clocksource change is done
> via work too. Once the clocksource is no longer actively used for
> timekeeping it can go away. What's guaranteeing that this runs prior to
> the clocksource change and 'cs' is valid throughout this function?

From what I can see, cs->read() doesn't care whether or not the
clocksource has been marked unstable.  So it should be OK to call
cs->read() before, during, or after the call to __clocksource_unstable().

Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU,
so any clocksource that did not like cs->read() being invoked during
or after the call to __clocksource_unstable() should leave off the
CLOCK_SOURCE_VERIFY_PERCPU bit.

Or did I take a wrong turn somewhere in the pointers to functions?

> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
> 
> This does not guarantee anything. So why does this need an extra work
> function which is scheduled separately?

Because I was concerned about doing smp_call_function() while holding
watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave().
And it still looks like on x86 that spin_lock_irqsave() spins with irqs
disabled, which could result in deadlock.  The smp_call_function_single()
would wait for the target CPU to enable interrupts, which would not
happen until after the smp_call_function_single() returned due to its
caller holding watchdog_lock.
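
Spelled out as a timeline, the deadlock I am worried about looks like
this (illustrative only, not code from the patch):

	CPU 0                                 CPU 1
	-----                                 -----
	spin_lock_irqsave(&watchdog_lock);
	smp_call_function_single(1, ...);
	  waits for CPU 1 to run the IPI...   spin_lock_irqsave(&watchdog_lock);
	                                        spins with irqs disabled, so
	                                        the IPI never runs <- deadlock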

Or is there something that I am missing that prevents this deadlock
from occurring?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking
  2021-04-10  9:04                   ` Thomas Gleixner
@ 2021-04-11  0:21                     ` Paul E. McKenney
  0 siblings, 0 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-11  0:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sat, Apr 10, 2021 at 11:04:54AM +0200, Thomas Gleixner wrote:
> On Fri, Apr 02 2021 at 15:49, paulmck wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> >
> > Although smp_call_function() has the advantage of simplicity, using
> > it to check for cross-CPU clock desynchronization means that any CPU
> > being slow reduces the sensitivity of the checking across all CPUs.
> > And it is not uncommon for smp_call_function() latencies to be in the
> > hundreds of microseconds.
> >
> > This commit therefore switches to smp_call_function_single(), so that
> > delays from a given CPU affect only those measurements involving that
> > particular CPU.
> 
> Is there any reason I'm missing why this is not done right in patch 3/5
> which introduces this synchronization check?

None at all.  I will merge this into 3/5.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-11  0:20                     ` Paul E. McKenney
@ 2021-04-11 10:33                       ` Thomas Gleixner
  2021-04-11 16:46                         ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-11 10:33 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sat, Apr 10 2021 at 17:20, Paul E. McKenney wrote:
> On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
>> > +	if (WARN_ON_ONCE(!cs))
>> > +		return;
>> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
>> > +		cs->name, smp_processor_id());
>> > +	cpumask_clear(&cpus_ahead);
>> > +	cpumask_clear(&cpus_behind);
>> > +	csnow_begin = cs->read(cs);
>> 
>> So this is invoked via work and the actual clocksource change is done
>> via work too. Once the clocksource is no longer actively used for
>> timekeeping it can go away. What's guaranteeing that this runs prior to
>> the clocksource change and 'cs' is valid throughout this function?
>
> From what I can see, cs->read() doesn't care whether or not the
> clocksource has been marked unstable.  So it should be OK to call
> cs->read() before, during, or after the call to __clocksource_unstable().
>
> Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU,
> so any clocksource that did not like cs->read() being invoked during
> or after the call to __clocksource_unstable() should leave off the
> CLOCK_SOURCE_VERIFY_PERCPU bit.
>
> Or did I take a wrong turn somewhere in the pointers to functions?

Right. cs->read() does not care, but what guarantees that cs is valid
and not freed yet? It's not an issue with TSC and KVMCLOCK, but
conceptually the following is possible:

watchdog()   
  queue_work(synccheck);
  queue_work(clocksource_change);

work:                                   
  synccheck()                             clocksource_change()
    preemption                            ...
                                          ...
                                          some_other_code():
                                             unregister_clocksource(cs)
                                             free(cs)
  cs->read()   <- UAF

>> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
>> 
>> This does not guarantee anything. So why does this need an extra work
>> function which is scheduled separately?
>
> Because I was concerned about doing smp_call_function() while holding
> watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave().
> And it still looks like on x86 that spin_lock_irqsave() spins with irqs
> disabled, which could result in deadlock.  The smp_call_function_single()
> would wait for the target CPU to enable interrupts, which would not
> happen until after the smp_call_function_single() returned due to its
> caller holding watchdog_lock.
>
> Or is there something that I am missing that prevents this deadlock
> from occurring?

The unstable mechanism is:

watchdog()
  __clocksource_unstable()
    schedule_work(&watchdog_work);

watchdog_work()
  kthread_run(clocksource_watchdog_thread);

cs_watchdog_thread()
  mutex_lock(&clocksource_mutex);
  if (__clocksource_watchdog_kthread())
	clocksource_select();
  mutex_unlock(&clocksource_mutex);

So what prevents you from doing that right in watchdog_work() or even in
cs_watchdog_thread() properly ordered against the actual clocksource
switch?

Hmm?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13
  2021-04-10 23:26                   ` Paul E. McKenney
@ 2021-04-11 10:58                     ` Thomas Gleixner
  2021-04-11 16:50                       ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-11 10:58 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sat, Apr 10 2021 at 16:26, Paul E. McKenney wrote:
> On Sat, Apr 10, 2021 at 10:01:58AM +0200, Thomas Gleixner wrote:
>> On Fri, Apr 02 2021 at 15:48, Paul E. McKenney wrote:
>> I buy the vCPU preemption part and TBH guests should not have that
>> watchdog thing active at all for exactly this reason.
>
> Agreed, one approach is to enable the clocksource watchdog only in
> the hypervisor, and have some action on the guests triggered when the
> host detects clock skew.
>
> This works quite well, at least until something breaks in a way that
> messes up clock reads from the guest but not from the host.  And I
> am sure that any number of hardware guys will tell me that this just
> isn't possible, but if failing hardware operated according to their
> expectations, that hardware wouldn't be considered to be failing.
> Or it wouldn't be hardware, firmware, or clock-driver bringup, as the
> case may be.

Don't tell me. The fact that this code exists at all is a horror on its
own.

>> SMI, NMI injecting 62.5ms delay? If that happens then the performance of
>> the clocksource is the least of your worries.
>
> I was kind of hoping that you would tell me why the skew must be all the
> way up to 62.5ms before the clock is disabled.  The watchdog currently
> is quite happy with more than 10% skew between clocks.
>
> 100HZ clocks or some such?

Histerical raisins. When the clocksource watchdog was introduced it
replaced a x86 specific validation which was jiffies based. I have faint
memories that we wanted to have at least jiffies based checks preserved
in absence of other hardware, which had other problems and we gave up on
it. But obviously nobody thought about revisiting the threshold.

Yes, it's way too big. The slowest watchdog frequency on x86 is ~3.5 MHz
(ACPI PMtimer). Don't know about the reference frequency on MIPS which
is the only other user of this.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-11 10:33                       ` Thomas Gleixner
@ 2021-04-11 16:46                         ` Paul E. McKenney
  2021-04-12  4:21                           ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-11 16:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sun, Apr 11, 2021 at 12:33:44PM +0200, Thomas Gleixner wrote:
> On Sat, Apr 10 2021 at 17:20, Paul E. McKenney wrote:
> > On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
> >> > +	if (WARN_ON_ONCE(!cs))
> >> > +		return;
> >> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> >> > +		cs->name, smp_processor_id());
> >> > +	cpumask_clear(&cpus_ahead);
> >> > +	cpumask_clear(&cpus_behind);
> >> > +	csnow_begin = cs->read(cs);
> >> 
> >> So this is invoked via work and the actual clocksource change is done
> >> via work too. Once the clocksource is no longer actively used for
> >> timekeeping it can go away. What's guaranteeing that this runs prior to
> >> the clocksource change and 'cs' is valid throughout this function?
> >
> > From what I can see, cs->read() doesn't care whether or not the
> > clocksource has been marked unstable.  So it should be OK to call
> > cs->read() before, during, or after the call to __clocksource_unstable().
> >
> > Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU,
> > so any clocksource that did not like cs->read() being invoked during
> > or after the call to __clocksource_unstable() should leave off the
> > CLOCK_SOURCE_VERIFY_PERCPU bit.
> >
> > Or did I take a wrong turn somewhere in the pointers to functions?
> 
> Right. cs->read() does not care, but what guarantees that cs is valid
> and not freed yet? It's not an issue with TSC and KVMCLOCK, but
> conceptually the following is possible:
> 
> watchdog()   
>   queue_work(synccheck);
>   queue_work(clocksource_change);
> 
> work:                                   
>   synccheck()                             clocksource_change()
>     preemption                            ...
>                                           ...
>                                           some_other_code():
>                                              unregister_clocksource(cs)
>                                              free(cs)
>   cs->read()   <- UAF

Got it, with the ingenic_tcu_init() function being a case in point.
It invokes clocksource_unregister() shortly followed by clk_put(), which,
if I found the correct clk_put(), can kfree() it.

Thank you!

> >> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
> >> 
> >> This does not guarantee anything. So why does this need an extra work
> >> function which is scheduled separately?
> >
> > Because I was concerned about doing smp_call_function() while holding
> > watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave().
> > And it still looks like on x86 that spin_lock_irqsave() spins with irqs
> > disabled, which could result in deadlock.  The smp_call_function_single()
> > would wait for the target CPU to enable interrupts, which would not
> > happen until after the smp_call_function_single() returned due to its
> > caller holding watchdog_lock.
> >
> > Or is there something that I am missing that prevents this deadlock
> > from occurring?
> 
> The unstable mechanism is:
> 
> watchdog()
>   __clocksource_unstable()
>     schedule_work(&watchdog_work);
> 
> watchdog_work()
>   kthread_run(clocksource_watchdog_thread);
> 
> cs_watchdog_thread()
>   mutex_lock(&clocksource_mutex);
>   if (__clocksource_watchdog_kthread())
> 	clocksource_select();
>   mutex_unlock(&clocksource_mutex);
> 
> So what prevents you from doing that right in watchdog_work() or even in
> cs_watchdog_thread() properly ordered against the actual clocksource
> switch?
> 
> Hmm?

My own confusion, apparently.  :-/

So what I need to do is inline clocksource_verify_percpu_wq()
into clocksource_verify_percpu() and then move the call to
clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right
before the existing call to list_del_init().  Will do!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13
  2021-04-11 10:58                     ` Thomas Gleixner
@ 2021-04-11 16:50                       ` Paul E. McKenney
  0 siblings, 0 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-11 16:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sun, Apr 11, 2021 at 12:58:31PM +0200, Thomas Gleixner wrote:
> On Sat, Apr 10 2021 at 16:26, Paul E. McKenney wrote:
> > On Sat, Apr 10, 2021 at 10:01:58AM +0200, Thomas Gleixner wrote:
> >> On Fri, Apr 02 2021 at 15:48, Paul E. McKenney wrote:
> >> I buy the vCPU preemption part and TBH guests should not have that
> >> watchdog thing active at all for exactly this reason.
> >
> > Agreed, one approach is to enable the clocksource watchdog only in
> > the hypervisor, and have some action on the guests triggered when the
> > host detects clock skew.
> >
> > This works quite well, at least until something breaks in a way that
> > messes up clock reads from the guest but not from the host.  And I
> > am sure that any number of hardware guys will tell me that this just
> > isn't possible, but if failing hardware operated according to their
> > expectations, that hardware wouldn't be considered to be failing.
> > Or it wouldn't be hardware, firmware, or clock-driver bringup, as the
> > case may be.
> 
> Don't tell me. The fact that this code exists at all is a horror on its
> own.

Let's just say that I really did consider the option of just disabling
the watchdog for guest OSes, but it will likely be at least a few years
before per-CPU hardware clocks regain my trust.  :-/

> >> SMI, NMI injecting 62.5ms delay? If that happens then the performance of
> >> the clocksource is the least of your worries.
> >
> > I was kind of hoping that you would tell me why the skew must be all the
> > way up to 62.5ms before the clock is disabled.  The watchdog currently
> > is quite happy with more than 10% skew between clocks.
> >
> > 100HZ clocks or some such?
> 
> Histerical raisins. When the clocksource watchdog was introduced it
> replaced a x86 specific validation which was jiffies based. I have faint
> memories that we wanted to have at least jiffies based checks preserved
> in absence of other hardware, which had other problems and we gave up on
> it. But obviously nobody thought about revisiting the threshold.
> 
> Yes, it's way too big. The slowest watchdog frequency on x86 is ~3.5 MHz
> (ACPI PMtimer). Don't know about the reference frequency on MIPS which
> is the only other user of this.

Whew!  I will try reducing the permitted skew.  My (perhaps naive) guess
is that with the delay rejection, there is no reason that it cannot
be decreased at least to 500us.  If that goes well, I will send along
another patch.
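
For scale, assuming the limit stays a single macro as in current
mainline: the existing threshold of 62.5ms is more than 200,000 cycles
of the ACPI PM timer at ~3.58MHz, while 500us would still be roughly
1,800 cycles.  The change would then be on the order of the following
(illustrative value only, pending testing):

	/* Today: clocks may skew by up to NSEC_PER_SEC >> 4 = 62.5ms. */
	#define WATCHDOG_THRESHOLD	(NSEC_PER_SEC >> 4)

becoming something like:

	#define WATCHDOG_THRESHOLD	(500 * NSEC_PER_USEC)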

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-11 16:46                         ` Paul E. McKenney
@ 2021-04-12  4:21                           ` Paul E. McKenney
  2021-04-12 13:08                             ` Thomas Gleixner
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-12  4:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
> On Sun, Apr 11, 2021 at 12:33:44PM +0200, Thomas Gleixner wrote:
> > On Sat, Apr 10 2021 at 17:20, Paul E. McKenney wrote:
> > > On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
> > >> > +	if (WARN_ON_ONCE(!cs))
> > >> > +		return;
> > >> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> > >> > +		cs->name, smp_processor_id());
> > >> > +	cpumask_clear(&cpus_ahead);
> > >> > +	cpumask_clear(&cpus_behind);
> > >> > +	csnow_begin = cs->read(cs);
> > >> 
> > >> So this is invoked via work and the actual clocksource change is done
> > >> via work too. Once the clocksource is no longer actively used for
> > >> timekeeping it can go away. What's guaranteeing that this runs prior to
> > >> the clocksource change and 'cs' is valid throughout this function?
> > >
> > > From what I can see, cs->read() doesn't care whether or not the
> > > clocksource has been marked unstable.  So it should be OK to call
> > > cs->read() before, during, or after the call to __clocksource_unstable().
> > >
> > > Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU,
> > > so any clocksource that did not like cs->read() being invoked during
> > > or after the call to __clocksource_unstable() should leave off the
> > > CLOCK_SOURCE_VERIFY_PERCPU bit.
> > >
> > > Or did I take a wrong turn somewhere in the pointers to functions?
> > 
> > Right. cs->read() does not care, but what guarantees that cs is valid
> > and not freed yet? It's not an issue with TSC and KVMCLOCK, but
> > conceptually the following is possible:
> > 
> > watchdog()   
> >   queue_work(synccheck);
> >   queue_work(clocksource_change);
> > 
> > work:                                   
> >   synccheck()                             clocksource_change()
> >     preemption                            ...
> >                                           ...
> >                                           some_other_code():
> >                                              unregister_clocksource(cs)
> >                                              free(cs)
> >   cs->read()   <- UAF
> 
> Got it, with the ingenic_tcu_init() function being a case in point.
> It invokes clocksource_unregister() shortly followed by clk_put(), which,
> if I found the correct clk_put(), can kfree() it.
> 
> Thank you!
> 
> > >> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
> > >> 
> > >> This does not guarantee anything. So why does this need an extra work
> > >> function which is scheduled separately?
> > >
> > > Because I was concerned about doing smp_call_function() while holding
> > > watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave().
> > > And it still looks like on x86 that spin_lock_irqsave() spins with irqs
> > > disabled, which could result in deadlock.  The smp_call_function_single()
> > > would wait for the target CPU to enable interrupts, which would not
> > > happen until after the smp_call_function_single() returned due to its
> > > caller holding watchdog_lock.
> > >
> > > Or is there something that I am missing that prevents this deadlock
> > > from occurring?
> > 
> > The unstable mechanism is:
> > 
> > watchdog()
> >   __clocksource_unstable()
> >     schedule_work(&watchdog_work);
> > 
> > watchdog_work()
> >   kthread_run(clocksource_watchdog_thread);
> > 
> > cs_watchdog_thread()
> >   mutex_lock(&clocksource_mutex);
> >   if (__clocksource_watchdog_kthread())
> > 	clocksource_select();
> >   mutex_unlock(&clocksource_mutex);
> > 
> > So what prevents you from doing that right in watchdog_work() or even in
> > cs_watchdog_thread() properly ordered against the actual clocksource
> > switch?
> > 
> > Hmm?
> 
> My own confusion, apparently.  :-/
> 
> So what I need to do is inline clocksource_verify_percpu_wq()
> into clocksource_verify_percpu() and then move the call to
> clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right
> before the existing call to list_del_init().  Will do!

Except that this triggers the WARN_ON_ONCE() in smp_call_function_single()
due to interrupts being disabled across that list_del_init().

Possibilities include:

1.	Figure out why interrupts must be disabled only sometimes while
	holding watchdog_lock, in the hope that they need not be across
	the entire critical section for __clocksource_watchdog_kthread().
	As in:

		local_irq_restore(flags);
		clocksource_verify_percpu(cs);
		local_irq_save(flags);

	Trying this first with lockdep enabled.  Might be spectacular.

2.	Invoke clocksource_verify_percpu() from its original
	location in clocksource_watchdog(), just before the call to
	__clocksource_unstable().  This relies on the fact that
	clocksource_watchdog() acquires watchdog_lock without
	disabling interrupts.

3.	Restrict CLOCK_SOURCE_VERIFY_PERCPU to clocksource structures
	that are statically allocated, thus avoiding the use-after-free
	problem.  Rely on KASAN to enforce this restriction.

4.	Add reference counting or some such to clock sources.

5.	Your ideas here.

I will give this more thought, but #2 is looking pretty good at this point.
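
For concreteness, option 2 would amount to roughly the following in
clocksource_watchdog(), next to the mainline skew check (a sketch, not
a tested patch):

	/* Check the deviation from the watchdog clocksource. */
	if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
		/* Probe the other CPUs before giving up on this clock. */
		if (cs->flags & CLOCK_SOURCE_VERIFY_PERCPU)
			clocksource_verify_percpu(cs);
		__clocksource_unstable(cs);
		continue;
	}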

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-12  4:21                           ` Paul E. McKenney
@ 2021-04-12 13:08                             ` Thomas Gleixner
  2021-04-12 18:20                               ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-12 13:08 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Sun, Apr 11 2021 at 21:21, Paul E. McKenney wrote:
> On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
>> So what I need to do is inline clocksource_verify_percpu_wq()
>> into clocksource_verify_percpu() and then move the call to
>> clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right
>> before the existing call to list_del_init().  Will do!
>
> Except that this triggers the WARN_ON_ONCE() in smp_call_function_single()
> due to interrupts being disabled across that list_del_init().
>
> Possibilities include:
>
> 1.	Figure out why interrupts must be disabled only sometimes while
> 	holding watchdog_lock, in the hope that they need not be across
> 	the entire critical section for __clocksource_watchdog_kthread().
> 	As in:
>
> 		local_irq_restore(flags);
> 		clocksource_verify_percpu(cs);
> 		local_irq_save(flags);
>
> 	Trying this first with lockdep enabled.  Might be spectacular.

Yes, it's a possible deadlock against the watchdog timer firing ...

The reason for irqsave is again historical AFAICT and nobody bothered to
clean it up. spin_lock_bh() should be sufficient to serialize against
the watchdog timer, though I haven't looked at all possible scenarios.

> 2.	Invoke clocksource_verify_percpu() from its original
> 	location in clocksource_watchdog(), just before the call to
> 	__clocksource_unstable().  This relies on the fact that
> 	clocksource_watchdog() acquires watchdog_lock without
> 	disabling interrupts.

That should be fine, but this might cause the softirq to 'run' for a
very long time which is not pretty either.

Aside from that, do we really need to check _all_ online CPUs? What you
are trying to figure out is whether the wreckage is CPU local or global,
right?

Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good
enough? Either the other CPU has the same wreckage, in which case it's
global, or it hasn't, which points to a per-CPU local issue.

Sure it does not catch the case where a subset (>1) of all CPUs is
affected, but I'm not seeing how that really buys us anything.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-12 13:08                             ` Thomas Gleixner
@ 2021-04-12 18:20                               ` Paul E. McKenney
  2021-04-12 18:54                                 ` Thomas Gleixner
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-12 18:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
> On Sun, Apr 11 2021 at 21:21, Paul E. McKenney wrote:
> > On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
> >> So what I need to do is inline clocksource_verify_percpu_wq()
> >> into clocksource_verify_percpu() and then move the call to
> >> clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right
> >> before the existing call to list_del_init().  Will do!
> >
> > Except that this triggers the WARN_ON_ONCE() in smp_call_function_single()
> > due to interrupts being disabled across that list_del_init().
> >
> > Possibilities include:
> >
> > 1.	Figure out why interrupts must be disabled only sometimes while
> > 	holding watchdog_lock, in the hope that they need not be across
> > 	the entire critical section for __clocksource_watchdog_kthread().
> > 	As in:
> >
> > 		local_irq_restore(flags);
> > 		clocksource_verify_percpu(cs);
> > 		local_irq_save(flags);
> >
> > 	Trying this first with lockdep enabled.  Might be spectacular.
> 
> Yes, it's a possible deadlock against the watchdog timer firing ...

And lockdep most emphatically agrees with you.  ;-)

> The reason for irqsave is again historical AFAICT and nobody bothered to
> clean it up. spin_lock_bh() should be sufficient to serialize against
> the watchdog timer, though I haven't looked at all possible scenarios.

Though if BH is disabled, there is not so much advantage to
invoking it from __clocksource_watchdog_kthread().  Might as
well just invoke it directly from clocksource_watchdog().

> > 2.	Invoke clocksource_verify_percpu() from its original
> > 	location in clocksource_watchdog(), just before the call to
> > 	__clocksource_unstable().  This relies on the fact that
> > 	clocksource_watchdog() acquires watchdog_lock without
> > 	disabling interrupts.
> 
> That should be fine, but this might cause the softirq to 'run' for a
> very long time which is not pretty either.
> 
> Aside from that, do we really need to check _all_ online CPUs? What you
> are trying to figure out is whether the wreckage is CPU local or global,
> right?
> 
> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good
> enough? Either the other CPU has the same wreckage, in which case it's
> global, or it hasn't, which points to a per-CPU local issue.
> 
> Sure it does not catch the case where a subset (>1) of all CPUs is
> affected, but I'm not seeing how that really buys us anything.

Good point!  My thought is to randomly pick eight CPUs to keep the
duration reasonable while having a good chance of hitting "interesting"
CPU choices in multicore and multisocket systems.

However, if a hard-to-reproduce problem occurred, it would be good to take
the hit and scan all the CPUs.  Additionally, there are some workloads
for which the switch from TSC to HPET is fatal anyway due to increased
overhead.  For these workloads, the full CPU scan is no additional pain.

So I am thinking in terms of a default that probes eight randomly selected
CPUs without worrying about duplicates (as in there would be some chance
that fewer CPUs would actually be probed), but with a boot-time flag
that does all CPUs.  I would add the (default) random selection as a
separate patch.
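
A sketch of that default policy, with prandom_u32() for the random pick
and cpus_chosen as an assumed mask name (duplicates simply shrink the
sample, as noted above):

	static cpumask_t cpus_chosen;
	int cpu, i;

	/* Pick up to eight online CPUs at random. */
	cpumask_clear(&cpus_chosen);
	for (i = 0; i < 8; i++) {
		cpu = prandom_u32() % nr_cpu_ids;
		if (cpu_online(cpu))
			cpumask_set_cpu(cpu, &cpus_chosen);
	}
	/* Then verify only the CPUs set in cpus_chosen. */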

I will send a new series out later today, Pacific Time.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-12 18:20                               ` Paul E. McKenney
@ 2021-04-12 18:54                                 ` Thomas Gleixner
  2021-04-12 19:57                                   ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-12 18:54 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

Paul,

On Mon, Apr 12 2021 at 11:20, Paul E. McKenney wrote:
> On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
>> The reason for irqsave is again historical AFAICT and nobody bothered to
>> clean it up. spin_lock_bh() should be sufficient to serialize against
>> the watchdog timer, though I haven't looked at all possible scenarios.
>
> Though if BH is disabled, there is not so much advantage to
> invoking it from __clocksource_watchdog_kthread().  Might as
> well just invoke it directly from clocksource_watchdog().
>
>> > 2.	Invoke clocksource_verify_percpu() from its original
>> > 	location in clocksource_watchdog(), just before the call to
>> > 	__clocksource_unstable().  This relies on the fact that
>> > 	clocksource_watchdog() acquires watchdog_lock without
>> > 	disabling interrupts.
>> 
>> That should be fine, but this might cause the softirq to 'run' for a
>> very long time which is not pretty either.
>> 
>> Aside from that, do we really need to check _all_ online CPUs? What you
>> are trying to figure out is whether the wreckage is CPU local or global,
>> right?
>> 
>> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good
>> enough? Either the other CPU has the same wreckage, in which case it's
>> global, or it hasn't, which points to a per-CPU local issue.
>> 
>> Sure it does not catch the case where a subset (>1) of all CPUs is
>> affected, but I'm not seeing how that really buys us anything.
>
> Good point!  My thought is to randomly pick eight CPUs to keep the
> duration reasonable while having a good chance of hitting "interesting"
> CPU choices in multicore and multisocket systems.
>
> However, if a hard-to-reproduce problem occurred, it would be good to take
> the hit and scan all the CPUs.  Additionally, there are some workloads
> for which the switch from TSC to HPET is fatal anyway due to increased
> overhead.  For these workloads, the full CPU scan is no additional pain.
>
> So I am thinking in terms of a default that probes eight randomly selected
> CPUs without worrying about duplicates (as in there would be some chance
> that fewer CPUs would actually be probed), but with a boot-time flag
> that does all CPUs.  I would add the (default) random selection as a
> separate patch.

You can't do without making it complex, right? Keep it simple is not an
option for an RCU hacker, it seems :)

> I will send a new series out later today, Pacific Time.

Can you do me a favour and send it standalone and not as yet another
reply to this existing thread maze. A trivial lore link to the previous
version gives enough context.

Thanks,

        tglx
 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-12 18:54                                 ` Thomas Gleixner
@ 2021-04-12 19:57                                   ` Paul E. McKenney
  2021-04-12 20:37                                     ` Thomas Gleixner
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-12 19:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
> Paul,
> 
> On Mon, Apr 12 2021 at 11:20, Paul E. McKenney wrote:
> > On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
> >> The reason for irqsave is again historical AFAICT and nobody bothered to
> >> clean it up. spin_lock_bh() should be sufficient to serialize against
> >> the watchdog timer, though I haven't looked at all possible scenarios.
> >
> > Though if BH is disabled, there is not so much advantage to
> > invoking it from __clocksource_watchdog_kthread().  Might as
> > well just invoke it directly from clocksource_watchdog().
> >
> >> > 2.	Invoke clocksource_verify_percpu() from its original
> >> > 	location in clocksource_watchdog(), just before the call to
> >> > 	__clocksource_unstable().  This relies on the fact that
> >> > 	clocksource_watchdog() acquires watchdog_lock without
> >> > 	disabling interrupts.
> >> 
> >> That should be fine, but this might cause the softirq to 'run' for a
> >> very long time which is not pretty either.
> >> 
> >> Aside from that, do we really need to check _all_ online CPUs? What you
> >> are trying to figure out is whether the wreckage is CPU local or global,
> >> right?
> >> 
> >> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good
> >> enough? Either the other CPU has the same wreckage, in which case it's
> >> global, or it hasn't, which points to a per-CPU local issue.
> >> 
> >> Sure it does not catch the case where a subset (>1) of all CPUs is
> >> affected, but I'm not seeing how that really buys us anything.
> >
> > Good point!  My thought is to randomly pick eight CPUs to keep the
> > duration reasonable while having a good chance of hitting "interesting"
> > CPU choices in multicore and multisocket systems.
> >
> > However, if a hard-to-reproduce problem occurred, it would be good to take
> > the hit and scan all the CPUs.  Additionally, there are some workloads
> > for which the switch from TSC to HPET is fatal anyway due to increased
> > overhead.  For these workloads, the full CPU scan is no additional pain.
> >
> > So I am thinking in terms of a default that probes eight randomly selected
> > CPUs without worrying about duplicates (as in there would be some chance
> > that fewer CPUs would actually be probed), but with a boot-time flag
> > that does all CPUs.  I would add the (default) random selection as a
> > separate patch.
> 
> You can't do without making it complex, right? Keep it simple is not an
> option for an RCU hacker, it seems :)

But it was simple!  It just hit all the CPUs.

However, you (quite rightly) pointed out that this simple approach had
a few shortcomings.  ;-)

> > I will send a new series out later today, Pacific Time.
> 
> Can you do me a favour and send it standalone and not as yet another
> reply to this existing thread maze. A trivial lore link to the previous
> version gives enough context.

Will do!

Of course, it turns out that lockdep also doesn't like waited-on
smp_call_function_single() invocations from timer handlers,
so I am currently looking at other options for dealing with that
potential use-after-free.  I am starting to like the looks of "only set
CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
and let KASAN enforce this restriction", but I have not quite given up
on making it more general.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-12 19:57                                   ` Paul E. McKenney
@ 2021-04-12 20:37                                     ` Thomas Gleixner
  2021-04-12 23:18                                       ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-12 20:37 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
> On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
>> > I will send a new series out later today, Pacific Time.
>> 
>> Can you do me a favour and send it standalone and not as yet another
>> reply to this existing thread maze. A trivial lore link to the previous
>> version gives enough context.
>
> Will do!
>
> Of course, it turns out that lockdep also doesn't like waited-on
> smp_call_function_single() invocations from timer handlers,
> so I am currently looking at other options for dealing with that
> potential use-after-free.  I am starting to like the looks of "only set
> CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
> and let KASAN enforce this restriction", but I have not quite given up
> on making it more general.

The simplest point is in the thread under the clocksource_mutex which
prevents anything from vanishing under your feet.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-12 20:37                                     ` Thomas Gleixner
@ 2021-04-12 23:18                                       ` Paul E. McKenney
  2021-04-13 20:49                                         ` Thomas Gleixner
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-12 23:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote:
> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
> >> > I will send a new series out later today, Pacific Time.
> >> 
> >> Can you do me a favour and send it standalone and not as yet another
> >> reply to this existing thread maze. A trivial lore link to the previous
> >> version gives enough context.
> >
> > Will do!
> >
> > Of course, it turns out that lockdep also doesn't like waited-on
> > smp_call_function_single() invocations from timer handlers,
> > so I am currently looking at other options for dealing with that
> > potential use-after-free.  I am starting to like the looks of "only set
> > CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
> > and let KASAN enforce this restriction", but I have not quite given up
> > on making it more general.
> 
> The simplest point is in the thread under the clocksource_mutex which
> prevents anything from vanishing under your feet.

And lockdep is -much- happier with the setup shown below, so thank
you again!

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index f047c6cb056c..34dc38b6b923 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void)
 	unsigned long flags;
 	int select = 0;
 
+	/* Do any required per-CPU skew verification. */
+	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU)) ==
+		    (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU))
+			clocksource_verify_percpu(cs);
+	}
+
 	spin_lock_irqsave(&watchdog_lock, flags);
 	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-12 23:18                                       ` Paul E. McKenney
@ 2021-04-13 20:49                                         ` Thomas Gleixner
  2021-04-14  4:48                                           ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2021-04-13 20:49 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

Paul,

On Mon, Apr 12 2021 at 16:18, Paul E. McKenney wrote:
> On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote:
>> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
>> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
>> >> > I will send a new series out later today, Pacific Time.
>> >> 
>> >> Can you do me a favour and send it standalone and not as yet another
>> >> reply to this existing thread maze. A trivial lore link to the previous
>> >> version gives enough context.
>> >
>> > Will do!
>> >
>> > Of course, it turns out that lockdep also doesn't like waited-on
>> > smp_call_function_single() invocations from timer handlers,
>> > so I am currently looking at other options for dealing with that
>> > potential use-after-free.  I am starting to like the looks of "only set
>> > CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
>> > and let KASAN enforce this restriction", but I have not quite given up
>> > on making it more general.
>> 
>> The simplest point is in the thread under the clocksource_mutex which
>> prevents anything from vanishing under your feet.
>
> And lockdep is -much- happier with the setup shown below, so thank
> you again!

But it is too simple now :) ...

> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index f047c6cb056c..34dc38b6b923 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void)
>  	unsigned long flags;
>  	int select = 0;
>  
> +	/* Do any required per-CPU skew verification. */
> +	list_for_each_entry(cs, &watchdog_list, wd_list) {
> +		if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU)) ==
> +		    (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU))
> +			clocksource_verify_percpu(cs);
> +	}

because that list is _NOT_ protected by the clocksource_mutex as you
noticed yourself already.

But you don't have to walk that list at all because the only interesting
thing is the currently active clocksource, which is about to be changed
in case the watchdog marked it unstable and cannot be changed by any
other code concurrently because clocksource_mutex is held.

So all you need is:

	if (curr_clocksource &&
	    curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE &&
	    curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU)
		clocksource_verify_percpu_wreckage(curr_clocksource);

Hmm?

Thanks,

        tglx
        


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
  2021-04-13 20:49                                         ` Thomas Gleixner
@ 2021-04-14  4:48                                           ` Paul E. McKenney
  0 siblings, 0 replies; 72+ messages in thread
From: Paul E. McKenney @ 2021-04-14  4:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak

On Tue, Apr 13, 2021 at 10:49:11PM +0200, Thomas Gleixner wrote:
> Paul,
> 
> On Mon, Apr 12 2021 at 16:18, Paul E. McKenney wrote:
> > On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote:
> >> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
> >> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
> >> >> > I will send a new series out later today, Pacific Time.
> >> >> 
> >> >> Can you do me a favour and send it standalone and not as yet another
> >> >> reply to this existing thread maze. A trivial lore link to the previous
> >> >> version gives enough context.
> >> >
> >> > Will do!
> >> >
> >> > Of course, it turns out that lockdep also doesn't like waited-on
> >> > smp_call_function_single() invocations from timer handlers,
> >> > so I am currently looking at other options for dealing with that
> >> > potential use-after-free.  I am starting to like the looks of "only set
> >> > CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
> >> > and let KASAN enforce this restriction", but I have not quite given up
> >> > on making it more general.
> >> 
> >> The simplest point is in the thread under the clocksource_mutex which
> >> prevents anything from vanishing under your feet.
> >
> > And lockdep is -much- happier with the setup shown below, so thank
> > you again!
> 
> But it is too simple now :) ...
> 
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index f047c6cb056c..34dc38b6b923 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void)
> >  	unsigned long flags;
> >  	int select = 0;
> >  
> > +	/* Do any required per-CPU skew verification. */
> > +	list_for_each_entry(cs, &watchdog_list, wd_list) {
> > +		if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU)) ==
> > +		    (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU))
> > +			clocksource_verify_percpu(cs);
> > +	}
> 
> because that list is _NOT_ protected by the clocksource_mutex as you
> noticed yourself already.
> 
> But you don't have to walk that list at all, because the only interesting
> thing is the currently active clocksource. That one is about to be changed
> in case the watchdog marked it unstable, and it cannot be changed
> concurrently by any other code because clocksource_mutex is held.
> 
> So all you need is:
> 
> 	if (curr_clocksource &&
> 	    curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE &&
> 	    curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU)
> 		clocksource_verify_percpu(curr_clocksource);
> 
> Hmm?

With the addition of a clocksource=tsc boot parameter, this approach
does appear to work, thank you!  I sent out the updated series.
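
For anyone wanting to reproduce this, a boot command line along the
following lines should select the TSC while enabling the watchdog delay
injection provided by the first patch of this series.  This is a sketch:
the inject_delay values here are arbitrary, chosen only to force
injection on every watchdog pass:

	clocksource=tsc clocksource.inject_delay_freq=1 clocksource.inject_delay_run=1

With injection forced, the watchdog should mark the TSC unstable, which
in turn exercises the per-CPU synchronization check that this patch adds.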

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread

Thread overview: 72+ messages
2021-01-06  0:40 [PATCH RFC clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
2021-01-06  0:41 ` [PATCH RFC clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-01-06  0:41 ` [PATCH RFC clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-01-06 16:28   ` Rik van Riel
2021-01-06 19:53     ` Paul E. McKenney
2021-01-06 20:59       ` Rik van Riel
2021-01-06  0:41 ` [PATCH RFC clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-01-06  0:41 ` [PATCH RFC clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-01-06  0:41 ` [PATCH RFC clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-01-12  0:42 ` [PATCH v2 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
2021-01-12  0:45   ` [PATCH v2 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-01-12  0:45   ` [PATCH v2 clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-01-12  0:45   ` [PATCH v2 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-01-12  0:45   ` [PATCH v2 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-01-12  0:45   ` [PATCH v2 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-02-02 17:04   ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
2021-02-02 17:06     ` [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-02-02 17:06     ` [PATCH clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-02-02 17:06     ` [PATCH clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-02-02 17:06     ` [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-02-02 19:51       ` Randy Dunlap
2021-02-03  0:50         ` Paul E. McKenney
2021-02-03  1:31           ` Randy Dunlap
2021-02-03  1:40             ` Paul E. McKenney
2021-02-02 17:06     ` [PATCH clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-02-17 21:28     ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
2021-02-17 21:29       ` [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-02-17 21:29       ` [PATCH clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-02-17 21:29       ` [PATCH clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-02-17 21:29       ` [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-02-17 21:29       ` [PATCH clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-03-04  0:49       ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
2021-03-04  0:53         ` [PATCH kernel/time 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-03-04  0:53         ` [PATCH kernel/time 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-03-04  0:53         ` [PATCH kernel/time 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-03-04  0:53         ` [PATCH kernel/time 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-03-04  0:53         ` [PATCH kernel/time 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-04-02 20:29         ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide module parameters to inject delays in watchdog paulmck
2021-04-02 22:22             ` Thomas Gleixner
2021-04-02 22:37               ` Paul E. McKenney
2021-04-02 22:48               ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
2021-04-02 22:49                 ` [PATCH v7 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-04-02 22:49                 ` [PATCH v7 clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-04-10  8:41                   ` Thomas Gleixner
2021-04-10 23:50                     ` Paul E. McKenney
2021-04-02 22:49                 ` [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-04-10  9:00                   ` Thomas Gleixner
2021-04-11  0:20                     ` Paul E. McKenney
2021-04-11 10:33                       ` Thomas Gleixner
2021-04-11 16:46                         ` Paul E. McKenney
2021-04-12  4:21                           ` Paul E. McKenney
2021-04-12 13:08                             ` Thomas Gleixner
2021-04-12 18:20                               ` Paul E. McKenney
2021-04-12 18:54                                 ` Thomas Gleixner
2021-04-12 19:57                                   ` Paul E. McKenney
2021-04-12 20:37                                     ` Thomas Gleixner
2021-04-12 23:18                                       ` Paul E. McKenney
2021-04-13 20:49                                         ` Thomas Gleixner
2021-04-14  4:48                                           ` Paul E. McKenney
2021-04-02 22:49                 ` [PATCH v7 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-04-02 22:49                 ` [PATCH v7 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-04-10  9:04                   ` Thomas Gleixner
2021-04-11  0:21                     ` Paul E. McKenney
2021-04-10  8:01                 ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Thomas Gleixner
2021-04-10 23:26                   ` Paul E. McKenney
2021-04-11 10:58                     ` Thomas Gleixner
2021-04-11 16:50                       ` Paul E. McKenney
2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Retry clock read if long delays detected paulmck
2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Do pairwise clock-desynchronization checking paulmck
