linux-kernel.vger.kernel.org archive mirror
* [PATCH v14 clocksource 0/6] Do not mark clocks unstable due to delays for v5.14
@ 2021-05-11 23:34 Paul E. McKenney
  2021-05-11 23:34 ` [PATCH v14 clocksource 1/6] clocksource: Retry clock read if long delays detected Paul E. McKenney
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Paul E. McKenney @ 2021-05-11 23:34 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing

Hello!

If there is a sufficient delay between reading the watchdog clock and the
clock under test, the clock under test will be marked unstable through no
fault of its own.  This series checks for such delays, doing limited
retries to get a good set of clock reads.  If the clock is nevertheless
marked unstable and is flagged as per-CPU, cross-CPU synchronization is
checked.
This series also provides a clocksource-watchdog-test kernel module that
tests this new ability of distinguishing delay-induced clock skew from
true clock skew.

Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
vCPU preemption.
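
To make the failure mode concrete, here is a small userspace Python
sketch (a simulation only, not kernel code; the constants mirror the
0.5s WATCHDOG_INTERVAL and 62.5ms WATCHDOG_THRESHOLD in
kernel/time/clocksource.c) showing how a delay between the two reads
masquerades as clock skew:

```python
NSEC_PER_SEC = 1_000_000_000
WATCHDOG_THRESHOLD = NSEC_PER_SEC >> 4   # 62.5 ms skew cutoff

def apparent_skew_ns(interval_ns, delay_ns):
    # The watchdog clock is read, then the clock under test delay_ns
    # later.  Both clocks tick perfectly, yet the comparison attributes
    # the delay to the clock under test.
    wd_nsec = interval_ns
    cs_nsec = interval_ns + delay_ns
    return abs(cs_nsec - wd_nsec)

# A 70 ms vCPU preemption between the reads looks like 70 ms of skew,
# crossing the threshold and marking a perfectly healthy clock unstable.
print(apparent_skew_ns(500_000_000, 70_000_000) > WATCHDOG_THRESHOLD)  # True
```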

The patches in this series are as follows:

1.	Retry clock read if long delays detected.

2.	Check per-CPU clock synchronization when marked unstable.

3.	Limit number of CPUs checked for clock synchronization.

4.	Reduce clocksource-skew threshold for TSC.

5.	Provide kernel module to test clocksource watchdog.

6.	Print deviation in nanoseconds for unstable case, courtesy of
	Feng Tang.

Changes since v13:

o	Forward-port to v5.13-rc1.

o	Add patch 6 from Feng Tang.

Changes since v12, based on feedback from kernel test robot and Stephen
Rothwell:

o	Export clocksource_verify_percpu().

Link: https://lore.kernel.org/lkml/20210501003204.GA2447938@paulmck-ThinkPad-P17-Gen-1/

Changes since v11, based on feedback from Thomas Gleixner:

o	Remove the fault-injection code from clocksource.c.

o	Create a kernel/time/clocksource-wdtest.c kernel module that
	creates its own clocksource structures and injects delays
	as part of their ->read() functions.

o	Make this kernel module splat upon error, for example, when
	a clocksource is not marked unstable but should have been.

o	Apply a couple more "Link:" fields to all patches.

Changes since v10 based on feedback from Thomas Gleixner, Feng Tang,
and Andi Kleen:

o	Add an uncertainty_margin field to the clocksource structure to
	allow skew cutoffs to be tailored to the pair of clocksources
	that the watchdog is comparing.

o	Manually initialize this uncertainty_margin field for
	clocksource_tsc_early and clocksource_jiffies, thus avoiding
	the need for special-case code to allow for the unusually
	large skews inherent to these clocksources.

Changes since v9:

o	Forgive tsc_early drift, based on feedback from Feng Tang; Xing,
	Zhengjun; and Thomas Gleixner.

o	Improve CPU selection for clock-synchronization checking.

Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/

Changes since v8, based on Thomas Gleixner feedback:

o	Reduced clock-skew threshold to 200us and delay limit to 50us.

o	Split out a cs_watchdog_read() function.

o	Removed the pointless CLOCK_SOURCE_VERIFY_PERCPU from kvm_clock.

o	Initialized cs_nsec_max and cs_nsec_min to avoid first-time checks.

Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/

Changes since v7, based on Thomas Gleixner feedback:

o	Fix embarrassing git-format-patch operator error.

o	Merge pairwise clock-desynchronization checking into the checking
	of per-CPU clock synchronization when marked unstable.

o	Do selective per-CPU checking rather than blindly checking all
	CPUs.  Provide a clocksource.verify_n_cpus kernel boot parameter
	to control this behavior, with the value -1 choosing the old
	check-all-CPUs behavior.  The default is to randomly check 8 CPUs.

o	Fix the clock-desynchronization checking to avoid a potential
	use-after-free error for dynamically allocated clocksource
	structures.

o	Remove redundant "wdagain_nsec < 0" check from clocksource_watchdog()
	clocksource skew checking.

o	Update commit logs and do code-style updates.

Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/

Changes since v5:

o	Rebased to v5.12-rc5.

Changes since v4:

o	Rebased to v5.12-rc1.

Changes since v3:

o	Rebased to v5.11.

o	Apply Randy Dunlap feedback.

Changes since v2:

o	Rebased to v5.11-rc6.

o	Updated Cc: list.

Changes since v1:

o	Applied feedback from Rik van Riel.

o	Rebased to v5.11-rc3.

o	Stripped "RFC" from the subject lines.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/admin-guide/kernel-parameters.txt   |   16 +
 arch/x86/kernel/tsc.c                             |    1 
 b/Documentation/admin-guide/kernel-parameters.txt |    6 
 b/arch/x86/kernel/tsc.c                           |    3 
 b/include/linux/clocksource.h                     |    2 
 b/kernel/time/Makefile                            |    1 
 b/kernel/time/clocksource-wdtest.c                |  202 ++++++++++++++++++++++
 b/kernel/time/clocksource.c                       |   52 +++++
 b/kernel/time/jiffies.c                           |   15 -
 b/lib/Kconfig.debug                               |   12 +
 include/linux/clocksource.h                       |    6 
 kernel/time/clocksource.c                         |  202 +++++++++++++++++++---
 12 files changed, 482 insertions(+), 36 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v14 clocksource 1/6] clocksource: Retry clock read if long delays detected
From: Paul E. McKenney @ 2021-05-11 23:34 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	Paul E. McKenney, Xing Zhengjun, Chris Mason

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  Yes, interrupts are
disabled across those two reads, but there is no shortage of things that
can delay interrupts-disabled regions of code, ranging from SMI handlers
to vCPU preemption.  It would be good to have some indication as to why
the clock was marked unstable.

Therefore, re-read the watchdog clock on either side of the read
from the clock under test.  If the watchdog clock shows an excessive
time delta between its pair of reads, the reads are retried.  The
maximum number of retries is specified by a new kernel boot parameter
clocksource.max_read_retries, which defaults to three, that is, up to four
reads, one initial and up to three retries.  If more than one retry was
required, a message is printed on the console (the occasional single retry
is expected behavior, especially in guest OSes).  If the maximum number
of retries is exceeded, the clock under test will be marked unstable.
However, the probability of this happening due to various sorts of
delays is quite small.  In addition, the reason (clock-read delays)
for the unstable marking will be apparent.
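
The retry scheme can be sketched in userspace Python (a simulation
only; the authoritative code is the cs_watchdog_read() function in the
patch below, and WATCHDOG_MAX_SKEW is its 100-microsecond delay limit):

```python
WATCHDOG_MAX_SKEW = 100 * 1000   # 100 us delay limit, matching the patch below

def make_reader(samples):
    # Scripted clock: each call returns the next timestamp (in ns).
    it = iter(samples)
    return lambda: next(it)

def cs_watchdog_read(read_wd, read_cs, max_read_retries=3):
    # Accept a sample only when the two watchdog reads bracketing the
    # clock-under-test read are close together; otherwise retry.
    for nretries in range(max_read_retries + 1):
        wd_start = read_wd()
        cs_now = read_cs()
        wd_end = read_wd()
        if wd_end - wd_start <= WATCHDOG_MAX_SKEW:
            return cs_now, wd_start          # good, undelayed sample
    return None  # persistent delays: the caller marks the clock unstable

# The first attempt hits a simulated 1 ms SMI between the watchdog reads
# and is rejected; the second attempt is clean and is accepted.
wd = make_reader([0, 1_000_000, 2_000_000, 2_000_050])
cs = make_reader([500, 2_000_025])
print(cs_watchdog_read(wd, cs))  # (2000025, 2000000)
```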

Link: https://lore.kernel.org/lkml/202104291438.PuHsxRkl-lkp@intel.com/
Link: https://lore.kernel.org/lkml/20210429140440.GT975577@paulmck-ThinkPad-P17-Gen-1
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Feng Tang <feng.tang@intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Per-clocksource retries per Neeraj Upadhyay feedback. ]
[ paulmck: Don't reset injectfail per Neeraj Upadhyay feedback. ]
[ paulmck: Apply Thomas Gleixner feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  6 +++
 kernel/time/clocksource.c                     | 52 ++++++++++++++++---
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbdedc46..c6886a44f679 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -581,6 +581,12 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.max_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable.  Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 2cd902592fc1..2df3b281bb35 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,13 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
 
+/*
+ * Maximum permissible delay between two readouts of the watchdog
+ * clocksource surrounding a read of the clocksource being validated.
+ * This delay could be due to SMIs, NMIs, or to VCPU preemptions.
+ */
+#define WATCHDOG_MAX_SKEW (100 * NSEC_PER_USEC)
+
 static void clocksource_watchdog_work(struct work_struct *work)
 {
 	/*
@@ -184,12 +191,44 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static ulong max_read_retries = 3;
+module_param(max_read_retries, ulong, 0644);
+
+static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
+{
+	unsigned int nretries;
+	u64 wd_end, wd_delta;
+	int64_t wd_delay;
+
+	for (nretries = 0; nretries <= max_read_retries; nretries++) {
+		local_irq_disable();
+		*wdnow = watchdog->read(watchdog);
+		*csnow = cs->read(cs);
+		wd_end = watchdog->read(watchdog);
+		local_irq_enable();
+
+		wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask);
+		wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, watchdog->shift);
+		if (wd_delay <= WATCHDOG_MAX_SKEW) {
+			if (nretries > 1 || nretries >= max_read_retries) {
+				pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
+					smp_processor_id(), watchdog->name, nretries);
+			}
+			return true;
+		}
+	}
+
+	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n",
+		smp_processor_id(), watchdog->name, wd_delay, nretries);
+	return false;
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
-	struct clocksource *cs;
 	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
 	int next_cpu, reset_pending;
+	int64_t wd_nsec, cs_nsec;
+	struct clocksource *cs;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -206,10 +245,11 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
-		local_irq_disable();
-		csnow = cs->read(cs);
-		wdnow = watchdog->read(watchdog);
-		local_irq_enable();
+		if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
+			/* Clock readout unreliable, so give it up. */
+			__clocksource_unstable(cs);
+			continue;
+		}
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.31.1.189.g2e36527f23



* [PATCH v14 clocksource 2/6] clocksource: Check per-CPU clock synchronization when marked unstable
From: Paul E. McKenney @ 2021-05-11 23:34 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	Paul E. McKenney, Xing Zhengjun, Chris Mason

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

Therefore, apply CPU-to-CPU synchronization checking to newly unstable
clocksources that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag.
Lists of desynchronized CPUs are printed, with the caveat that if it
is the reporting CPU that is itself desynchronized, it will appear that
all the other clocks are wrong.  Just like in real life.
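
The ahead/behind classification can be sketched as follows (userspace
Python, assuming a 64-bit clocksource mask; the kernel performs the
equivalent signed masked-delta tests in clocksource_verify_percpu() in
the patch below, setting both cpumasks independently rather than
returning early):

```python
MASK64 = (1 << 64) - 1

def classify_cpu(csnow_begin, csnow_mid, csnow_end):
    # csnow_begin and csnow_end are read on the reporting CPU, csnow_mid
    # on the CPU being checked.  A synchronized remote clock must fall
    # within the bracketing pair of local reads.
    def s64(v):
        # Interpret a masked 64-bit difference as a signed value.
        v &= MASK64
        return v - (1 << 64) if v & (1 << 63) else v
    if s64(csnow_mid - csnow_begin) < 0:
        return "behind"   # remote read precedes the local read before it
    if s64(csnow_end - csnow_mid) < 0:
        return "ahead"    # remote read follows the local read after it
    return "in sync"

print(classify_cpu(100, 90, 110))   # behind
print(classify_cpu(100, 120, 110))  # ahead
```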

Link: https://lore.kernel.org/lkml/202104291438.PuHsxRkl-lkp@intel.com/
Link: https://lore.kernel.org/lkml/20210429140440.GT975577@paulmck-ThinkPad-P17-Gen-1
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
[ paulmck: Apply Thomas Gleixner feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/tsc.c       |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 60 +++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 57ec01192180..6eb1b097e97e 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1152,7 +1152,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index d6ab416ee2d2..7f83d51c0fd7 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -137,7 +137,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 2df3b281bb35..d994762fcdca 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -223,6 +223,60 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
 	return false;
 }
 
+static u64 csnow_mid;
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	csnow_mid = cs->read(cs);
+}
+
+static void clocksource_verify_percpu(struct clocksource *cs)
+{
+	int64_t cs_nsec, cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;
+	u64 csnow_begin, csnow_end;
+	int cpu, testcpu;
+	s64 delta;
+
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	preempt_disable();
+	testcpu = smp_processor_id();
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu);
+	for_each_online_cpu(cpu) {
+		if (cpu == testcpu)
+			continue;
+		csnow_begin = cs->read(cs);
+		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
+		csnow_end = cs->read(cs);
+		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
+		if (delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - csnow_mid) & cs->mask;
+		if (delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
+		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		if (cs_nsec > cs_nsec_max)
+			cs_nsec_max = cs_nsec;
+		if (cs_nsec < cs_nsec_min)
+			cs_nsec_min = cs_nsec;
+	}
+	preempt_enable();
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_ahead), testcpu, cs->name);
+	if (!cpumask_empty(&cpus_behind))
+		pr_warn("        CPUs %*pbl behind CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_behind), testcpu, cs->name);
+	if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind))
+		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
+			testcpu, cs_nsec_min, cs_nsec_max, cs->name);
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	u64 csnow, wdnow, cslast, wdlast, delta;
@@ -447,6 +501,12 @@ static int __clocksource_watchdog_kthread(void)
 	unsigned long flags;
 	int select = 0;
 
+	/* Do any required per-CPU skew verification. */
+	if (curr_clocksource &&
+	    curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE &&
+	    curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU)
+		clocksource_verify_percpu(curr_clocksource);
+
 	spin_lock_irqsave(&watchdog_lock, flags);
 	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
-- 
2.31.1.189.g2e36527f23



* [PATCH v14 clocksource 3/6] clocksource: Limit number of CPUs checked for clock synchronization
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	Paul E. McKenney, Xing Zhengjun

Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU,
that clock is checked on all CPUs.  This is thorough, but might not be
what you want on a system with a few tens of CPUs, let alone a few hundred
of them.

Therefore, by default check only up to eight randomly chosen CPUs.
Also provide a new clocksource.verify_n_cpus kernel boot parameter.
A value of -1 says to check all of the CPUs, and a non-negative value says
to randomly select that number of CPUs, without concern about selecting
the same CPU multiple times.  However, make use of a cpumask so that a
given CPU will be checked at most once.
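
The selection policy can be sketched in userspace Python (an
illustration only; the kernel implements it with a cpumask and
prandom_u32() in clocksource_verify_choose_cpus() in the patch below):

```python
import random

def choose_cpus(online, current, verify_n_cpus):
    if verify_n_cpus < 0:                # check every online CPU but this one
        return set(online) - {current}
    if verify_n_cpus == 0 or len(online) <= 1:
        return set()                     # nothing to check
    chosen = set()
    chosen.add(next(c for c in online if c != current))  # at least one other CPU
    n = min(verify_n_cpus, len(online))  # silently truncate oversized requests
    for _ in range(1, n):
        # A duplicate pick collapses into the set: selection without
        # replacement, so a given CPU is checked at most once.
        chosen.add(random.choice(online))
    chosen.discard(current)              # never verify ourselves
    return chosen

print(choose_cpus(list(range(16)), 0, -1) == set(range(1, 16)))  # True
```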

Link: https://lore.kernel.org/lkml/202104291438.PuHsxRkl-lkp@intel.com/
Link: https://lore.kernel.org/lkml/20210429140440.GT975577@paulmck-ThinkPad-P17-Gen-1
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Suggested-by: Thomas Gleixner <tglx@linutronix.de> # For verify_n_cpus=1.
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         | 10 +++
 kernel/time/clocksource.c                     | 74 ++++++++++++++++++-
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c6886a44f679..0ef0ee743f65 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -587,6 +587,16 @@
 			unstable.  Defaults to three retries, that is,
 			four attempts to read the clock under test.
 
+	clocksource.verify_n_cpus= [KNL]
+			Limit the number of CPUs checked for clocksources
+			marked with CLOCK_SOURCE_VERIFY_PERCPU that
+			are marked unstable due to excessive skew.
+			A negative value says to check all CPUs, while
+			zero says not to check any.  Values larger than
+			nr_cpu_ids are silently truncated to nr_cpu_ids.
+			The actual CPUs are chosen randomly, with
+			no replacement if the same CPU is chosen twice.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index d994762fcdca..66243da2dadb 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -14,6 +14,8 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/prandom.h>
+#include <linux/cpu.h>
 
 #include "tick-internal.h"
 #include "timekeeping_internal.h"
@@ -193,6 +195,8 @@ void clocksource_mark_unstable(struct clocksource *cs)
 
 static ulong max_read_retries = 3;
 module_param(max_read_retries, ulong, 0644);
+static int verify_n_cpus = 8;
+module_param(verify_n_cpus, int, 0644);
 
 static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
 {
@@ -226,6 +230,55 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
 static u64 csnow_mid;
 static cpumask_t cpus_ahead;
 static cpumask_t cpus_behind;
+static cpumask_t cpus_chosen;
+
+static void clocksource_verify_choose_cpus(void)
+{
+	int cpu, i, n = verify_n_cpus;
+
+	if (n < 0) {
+		/* Check all of the CPUs. */
+		cpumask_copy(&cpus_chosen, cpu_online_mask);
+		cpumask_clear_cpu(smp_processor_id(), &cpus_chosen);
+		return;
+	}
+
+	/* If no checking desired, or no other CPU to check, leave. */
+	cpumask_clear(&cpus_chosen);
+	if (n == 0 || num_online_cpus() <= 1)
+		return;
+
+	/* Make sure to select at least one CPU other than the current CPU. */
+	cpu = cpumask_next(-1, cpu_online_mask);
+	if (cpu == smp_processor_id())
+		cpu = cpumask_next(cpu, cpu_online_mask);
+	if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
+		return;
+	cpumask_set_cpu(cpu, &cpus_chosen);
+
+	/* Force a sane value for the boot parameter. */
+	if (n > nr_cpu_ids)
+		n = nr_cpu_ids;
+
+	/*
+	 * Randomly select the specified number of CPUs.  If the same
+	 * CPU is selected multiple times, that CPU is checked only once,
+	 * and no replacement CPU is selected.  This gracefully handles
+	 * situations where verify_n_cpus is greater than the number of
+	 * CPUs that are currently online.
+	 */
+	for (i = 1; i < n; i++) {
+		cpu = prandom_u32() % nr_cpu_ids;
+		cpu = cpumask_next(cpu - 1, cpu_online_mask);
+		if (cpu >= nr_cpu_ids)
+			cpu = cpumask_next(-1, cpu_online_mask);
+		if (!WARN_ON_ONCE(cpu >= nr_cpu_ids))
+			cpumask_set_cpu(cpu, &cpus_chosen);
+	}
+
+	/* Don't verify ourselves. */
+	cpumask_clear_cpu(smp_processor_id(), &cpus_chosen);
+}
 
 static void clocksource_verify_one_cpu(void *csin)
 {
@@ -241,12 +294,22 @@ static void clocksource_verify_percpu(struct clocksource *cs)
 	int cpu, testcpu;
 	s64 delta;
 
+	if (verify_n_cpus == 0)
+		return;
 	cpumask_clear(&cpus_ahead);
 	cpumask_clear(&cpus_behind);
+	get_online_cpus();
 	preempt_disable();
+	clocksource_verify_choose_cpus();
+	if (cpumask_weight(&cpus_chosen) == 0) {
+		preempt_enable();
+		put_online_cpus();
+		pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name);
+		return;
+	}
 	testcpu = smp_processor_id();
-	pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu);
-	for_each_online_cpu(cpu) {
+	pr_warn("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n", cs->name, testcpu, cpumask_pr_args(&cpus_chosen));
+	for_each_cpu(cpu, &cpus_chosen) {
 		if (cpu == testcpu)
 			continue;
 		csnow_begin = cs->read(cs);
@@ -266,6 +329,7 @@ static void clocksource_verify_percpu(struct clocksource *cs)
 			cs_nsec_min = cs_nsec;
 	}
 	preempt_enable();
+	put_online_cpus();
 	if (!cpumask_empty(&cpus_ahead))
 		pr_warn("        CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
 			cpumask_pr_args(&cpus_ahead), testcpu, cs->name);
@@ -336,6 +400,12 @@ static void clocksource_watchdog(struct timer_list *unused)
 				watchdog->name, wdnow, wdlast, watchdog->mask);
 			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
 				cs->name, csnow, cslast, cs->mask);
+			if (curr_clocksource == cs)
+				pr_warn("                      '%s' is current clocksource.\n", cs->name);
+			else if (curr_clocksource)
+				pr_warn("                      '%s' (not '%s') is current clocksource.\n", curr_clocksource->name, cs->name);
+			else
+				pr_warn("                      No current clocksource.\n");
 			__clocksource_unstable(cs);
 			continue;
 		}
-- 
2.31.1.189.g2e36527f23



* [PATCH v14 clocksource 4/6] clocksource: Reduce clocksource-skew threshold for TSC
From: Paul E. McKenney @ 2021-05-11 23:34 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	Paul E. McKenney, Xing Zhengjun

Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in
a 500-millisecond WATCHDOG_INTERVAL.  This requires that clocks be skewed
by more than 12.5% in order to be marked unstable.  Except that a clock
that is skewed by that much is probably destroying unsuspecting software
right and left.  And given that there are now checks for false-positive
skews due to delays between reading the two clocks, it should be possible
to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks
such as TSC.

Therefore, add a new uncertainty_margin field to the clocksource
structure that contains the maximum uncertainty in nanoseconds for
the corresponding clock.  This field may be initialized manually,
as it is for clocksource_tsc_early and clocksource_jiffies, which
is copied to refined_jiffies.  If the field is not initialized
manually, it will be computed at clock-registry time as the period
of the clock in question, based on the scale and freq parameters to
the __clocksource_update_freq_scale() function.  If either of those two
parameters is zero, the tens-of-milliseconds WATCHDOG_THRESHOLD is
used as a cowardly alternative to dividing by zero.  No matter how the
uncertainty_margin field is calculated, it is bounded below by twice
WATCHDOG_MAX_SKEW, that is, by 100 microseconds.

Note that manually initialized uncertainty_margin fields are not adjusted,
but there is a WARN_ON_ONCE() that triggers if any such field is less than
twice WATCHDOG_MAX_SKEW.  This WARN_ON_ONCE() is intended to discourage
production use of the one-nanosecond uncertainty_margin values that are
used to test the clock-skew code itself.

The actual clock-skew check uses the sum of the uncertainty_margin fields
of the two clocksource structures being compared.  Integer overflow is
avoided because the largest computed value of the uncertainty_margin
fields is one billion (10^9), and double that value fits into an
unsigned int.  However, if someone manually specifies (say) UINT_MAX,
they will get what they deserve.

Note that the refined_jiffies uncertainty_margin field is initialized to
TICK_NSEC, which means that skew checks involving this clocksource will
be sufficiently forgiving.  In a similar vein, the clocksource_tsc_early
uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which
replicates the current behavior and allows custom setting if needed
in order to address the rare skews detected for this clocksource in
current mainline.
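
The margin rule described above can be summarized in a short userspace
Python sketch (an illustration of the stated policy, assuming the
computed period is NSEC_PER_SEC / (scale * freq); the constants match
those in the patches):

```python
NSEC_PER_SEC = 1_000_000_000
WATCHDOG_MAX_SKEW = 100 * 1000           # 100 us, from patch 1
WATCHDOG_THRESHOLD = NSEC_PER_SEC >> 5   # 0.0312 s, from this patch

def uncertainty_margin(scale, freq, preset=0):
    # Manually initialized margins win and are not adjusted (the kernel
    # only warns if they are smaller than 2 * WATCHDOG_MAX_SKEW).
    if preset:
        return preset
    # Otherwise use one clock period, falling back to WATCHDOG_THRESHOLD
    # when scale or freq is zero, and never go below 2 * WATCHDOG_MAX_SKEW.
    if scale and freq:
        margin = NSEC_PER_SEC // (scale * freq)
    else:
        margin = WATCHDOG_THRESHOLD
    return max(margin, 2 * WATCHDOG_MAX_SKEW)

# A 1 GHz clock's one-nanosecond period is clamped to twice the 100 us floor:
print(uncertainty_margin(1, 10**9))  # 200000
```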

Link: https://lore.kernel.org/lkml/202104291438.PuHsxRkl-lkp@intel.com/
Link: https://lore.kernel.org/lkml/20210429140440.GT975577@paulmck-ThinkPad-P17-Gen-1
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/tsc.c       |  1 +
 include/linux/clocksource.h |  3 +++
 kernel/time/clocksource.c   | 48 +++++++++++++++++++++++++++++--------
 kernel/time/jiffies.c       | 15 ++++++------
 4 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 6eb1b097e97e..2e076a459a0c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1128,6 +1128,7 @@ static int tsc_cs_enable(struct clocksource *cs)
 static struct clocksource clocksource_tsc_early = {
 	.name			= "tsc-early",
 	.rating			= 299,
+	.uncertainty_margin	= 32 * NSEC_PER_MSEC,
 	.read			= read_tsc,
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 7f83d51c0fd7..895203727cb5 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -43,6 +43,8 @@ struct module;
  * @shift:		Cycle to nanosecond divisor (power of two)
  * @max_idle_ns:	Maximum idle time permitted by the clocksource (nsecs)
  * @maxadj:		Maximum adjustment value to mult (~11%)
+ * @uncertainty_margin:	Maximum uncertainty in nanoseconds per half second.
+ *			Zero says to use default WATCHDOG_THRESHOLD.
  * @archdata:		Optional arch-specific data
  * @max_cycles:		Maximum safe cycle value which won't overflow on
  *			multiplication
@@ -98,6 +100,7 @@ struct clocksource {
 	u32			shift;
 	u64			max_idle_ns;
 	u32			maxadj;
+	u32			uncertainty_margin;
 #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
 	struct arch_clocksource_data archdata;
 #endif
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 66243da2dadb..9ebf9931f3d6 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -95,6 +95,20 @@ static char override_name[CS_NAME_LEN];
 static int finished_booting;
 static u64 suspend_start;
 
+/*
+ * Threshold: 0.0312s, when doubled: 0.0625s.
+ * Also a default for cs->uncertainty_margin when registering clocks.
+ */
+#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 5)
+
+/*
+ * Maximum permissible delay between two readouts of the watchdog
+ * clocksource surrounding a read of the clocksource being validated.
+ * This delay could be due to SMIs, NMIs, or to VCPU preemptions.  Used as
+ * a lower bound for cs->uncertainty_margin values when registering clocks.
+ */
+#define WATCHDOG_MAX_SKEW (50 * NSEC_PER_USEC)
+
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
 static void clocksource_watchdog_work(struct work_struct *work);
 static void clocksource_select(void);
@@ -121,17 +135,9 @@ static int clocksource_watchdog_kthread(void *data);
 static void __clocksource_change_rating(struct clocksource *cs, int rating);
 
 /*
- * Interval: 0.5sec Threshold: 0.0625s
+ * Interval: 0.5sec.
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
-#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
-
-/*
- * Maximum permissible delay between two readouts of the watchdog
- * clocksource surrounding a read of the clocksource being validated.
- * This delay could be due to SMIs, NMIs, or to VCPU preemptions.
- */
-#define WATCHDOG_MAX_SKEW (100 * NSEC_PER_USEC)
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -347,6 +353,7 @@ static void clocksource_watchdog(struct timer_list *unused)
 	int next_cpu, reset_pending;
 	int64_t wd_nsec, cs_nsec;
 	struct clocksource *cs;
+	u32 md;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -393,7 +400,8 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 
 		/* Check the deviation from the watchdog clocksource. */
-		if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
+		md = cs->uncertainty_margin + watchdog->uncertainty_margin;
+		if (abs(cs_nsec - wd_nsec) > md) {
 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				smp_processor_id(), cs->name);
 			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
@@ -1046,6 +1054,26 @@ void __clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq
 		clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
 				       NSEC_PER_SEC / scale, sec * scale);
 	}
+
+	/*
+	 * If the uncertainty margin is not specified, calculate it.
+	 * If both scale and freq are non-zero, calculate the clock
+	 * period, but bound below at 2*WATCHDOG_MAX_SKEW.  However,
+	 * if either of scale or freq is zero, be very conservative and
+	 * take the tens-of-milliseconds WATCHDOG_THRESHOLD value for the
+	 * uncertainty margin.  Allow stupidly small uncertainty margins
+	 * to be specified by the caller for testing purposes, but warn
+	 * to discourage production use of this capability.
+	 */
+	if (scale && freq && !cs->uncertainty_margin) {
+		cs->uncertainty_margin = NSEC_PER_SEC / (scale * freq);
+		if (cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW)
+			cs->uncertainty_margin = 2 * WATCHDOG_MAX_SKEW;
+	} else if (!cs->uncertainty_margin) {
+		cs->uncertainty_margin = WATCHDOG_THRESHOLD;
+	}
+	WARN_ON_ONCE(cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW);
+
 	/*
 	 * Ensure clocksources that have large 'mult' values don't overflow
 	 * when adjusted.
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index a492e4da69ba..b3f608c2b936 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -49,13 +49,14 @@ static u64 jiffies_read(struct clocksource *cs)
  * for "tick-less" systems.
  */
 static struct clocksource clocksource_jiffies = {
-	.name		= "jiffies",
-	.rating		= 1, /* lowest valid rating*/
-	.read		= jiffies_read,
-	.mask		= CLOCKSOURCE_MASK(32),
-	.mult		= TICK_NSEC << JIFFIES_SHIFT, /* details above */
-	.shift		= JIFFIES_SHIFT,
-	.max_cycles	= 10,
+	.name			= "jiffies",
+	.rating			= 1, /* lowest valid rating*/
+	.uncertainty_margin	= TICK_NSEC,
+	.read			= jiffies_read,
+	.mask			= CLOCKSOURCE_MASK(32),
+	.mult			= TICK_NSEC << JIFFIES_SHIFT, /* details above */
+	.shift			= JIFFIES_SHIFT,
+	.max_cycles		= 10,
 };
 
 __cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock);
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v14 clocksource 5/6] clocksource: Provide kernel module to test clocksource watchdog
  2021-05-11 23:34 [PATCH v14 clocksource 0/5] Do not mark clocks unstable due to delays for v5.14 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2021-05-11 23:34 ` [PATCH v14 clocksource 4/6] clocksource: Reduce clocksource-skew threshold for TSC Paul E. McKenney
@ 2021-05-11 23:34 ` Paul E. McKenney
  2021-05-13  3:29   ` Feng Tang
  2021-05-11 23:34 ` [PATCH v14 clocksource 6/6] clocksource: Print deviation in nanoseconds for unstable case Paul E. McKenney
  5 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2021-05-11 23:34 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	Paul E. McKenney, Xing Zhengjun, Chris Mason

When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  It would be good
to have a way of testing the clocksource watchdog's ability to
distinguish between these two causes of clock skew and instability.

Therefore, provide a new clocksource-wdtest module selected by a new
TEST_CLOCKSOURCE_WATCHDOG Kconfig option.  This module has a single module
parameter named "holdoff" that provides the number of seconds of delay
before testing should start, which defaults to zero when built as a module
and to 10 seconds when built directly into the kernel.  Very large systems
that boot slowly may need to increase the value of this module parameter.

This module uses hand-crafted clocksource structures to do its testing,
thus avoiding messing up timing for the rest of the kernel and for user
applications.  This module first verifies that the ->uncertainty_margin
fields of the clocksource structures are set sanely.  It then tests the
delay-detection capability of the clocksource watchdog, increasing the
number of consecutive delays injected, first provoking console messages
complaining about the delays and finally forcing a clock-skew event.
Unexpected test results cause at least one WARN_ON_ONCE() console splat.
If there are no splats, the test has passed.  Finally, it fuzzes the
value returned from a clocksource to test the clocksource watchdog's
ability to detect time skew.

This module checks the state of its clocksource after each test, and
uses WARN_ON_ONCE() to emit a console splat if there are any failures.
This should enable all types of test frameworks to detect any such
failures.

This facility is intended for diagnostic use only, and should be avoided
on production systems.

Link: https://lore.kernel.org/lkml/202104291438.PuHsxRkl-lkp@intel.com/
Link: https://lore.kernel.org/lkml/20210429140440.GT975577@paulmck-ThinkPad-P17-Gen-1
Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Reported-by: Chris Mason <clm@fb.com>
[ paulmck: Export clocksource_verify_percpu per kernel test robot and Stephen Rothwell. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |   6 +
 include/linux/clocksource.h                   |   3 +
 kernel/time/Makefile                          |   1 +
 kernel/time/clocksource-wdtest.c              | 202 ++++++++++++++++++
 kernel/time/clocksource.c                     |  12 +-
 lib/Kconfig.debug                             |  12 ++
 6 files changed, 231 insertions(+), 5 deletions(-)
 create mode 100644 kernel/time/clocksource-wdtest.c

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0ef0ee743f65..9da285a98aa6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -597,6 +597,12 @@
 			The actual CPUs are chosen randomly, with
 			no replacement if the same CPU is chosen twice.
 
+	clocksource-wdtest.holdoff= [KNL]
+			Set the time in seconds that the clocksource
+			watchdog test waits before commencing its tests.
+			Defaults to zero when built as a module and to
+			10 seconds when built into the kernel.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 895203727cb5..1d42d4b17327 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -291,4 +291,7 @@ static inline void timer_probe(void) {}
 #define TIMER_ACPI_DECLARE(name, table_id, fn)		\
 	ACPI_DECLARE_PROBE_ENTRY(timer, name, table_id, 0, NULL, 0, fn)
 
+extern ulong max_cswd_read_retries;
+void clocksource_verify_percpu(struct clocksource *cs);
+
 #endif /* _LINUX_CLOCKSOURCE_H */
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index 1fb1c1ef6a19..1ed85b25b096 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -21,3 +21,4 @@ obj-$(CONFIG_HAVE_GENERIC_VDSO)			+= vsyscall.o
 obj-$(CONFIG_DEBUG_FS)				+= timekeeping_debug.o
 obj-$(CONFIG_TEST_UDELAY)			+= test_udelay.o
 obj-$(CONFIG_TIME_NS)				+= namespace.o
+obj-$(CONFIG_TEST_CLOCKSOURCE_WATCHDOG)		+= clocksource-wdtest.o
diff --git a/kernel/time/clocksource-wdtest.c b/kernel/time/clocksource-wdtest.c
new file mode 100644
index 000000000000..01df12395c0e
--- /dev/null
+++ b/kernel/time/clocksource-wdtest.c
@@ -0,0 +1,202 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Unit test for the clocksource watchdog.
+ *
+ * Copyright (C) 2021 Facebook, Inc.
+ *
+ * Author: Paul E. McKenney <paulmck@kernel.org>
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/clocksource.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
+#include <linux/tick.h>
+#include <linux/kthread.h>
+#include <linux/delay.h>
+#include <linux/prandom.h>
+#include <linux/cpu.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Paul E. McKenney <paulmck@kernel.org>");
+
+static int holdoff = IS_BUILTIN(CONFIG_TEST_CLOCKSOURCE_WATCHDOG) ? 10 : 0;
+module_param(holdoff, int, 0444);
+MODULE_PARM_DESC(holdoff, "Time to wait to start test (s).");
+
+/* Watchdog kthread's task_struct pointer for debug purposes. */
+static struct task_struct *wdtest_task;
+
+static u64 wdtest_jiffies_read(struct clocksource *cs)
+{
+	return (u64)jiffies;
+}
+
+/* Assume HZ > 100. */
+#define JIFFIES_SHIFT	8
+
+static struct clocksource clocksource_wdtest_jiffies = {
+	.name			= "wdtest-jiffies",
+	.rating			= 1, /* lowest valid rating*/
+	.uncertainty_margin	= TICK_NSEC,
+	.read			= wdtest_jiffies_read,
+	.mask			= CLOCKSOURCE_MASK(32),
+	.flags			= CLOCK_SOURCE_MUST_VERIFY,
+	.mult			= TICK_NSEC << JIFFIES_SHIFT, /* details above */
+	.shift			= JIFFIES_SHIFT,
+	.max_cycles		= 10,
+};
+
+static int wdtest_ktime_read_ndelays;
+static bool wdtest_ktime_read_fuzz;
+
+static u64 wdtest_ktime_read(struct clocksource *cs)
+{
+	int wkrn = READ_ONCE(wdtest_ktime_read_ndelays);
+	static int sign = 1;
+	u64 ret;
+
+	if (wkrn) {
+		udelay(cs->uncertainty_margin / 250);
+		WRITE_ONCE(wdtest_ktime_read_ndelays, wkrn - 1);
+	}
+	ret = ktime_get_real_fast_ns();
+	if (READ_ONCE(wdtest_ktime_read_fuzz)) {
+		sign = -sign;
+		ret = ret + sign * 100 * NSEC_PER_MSEC;
+	}
+	return ret;
+}
+
+static void wdtest_ktime_cs_mark_unstable(struct clocksource *cs)
+{
+	pr_info("--- Marking %s unstable due to clocksource watchdog.\n", cs->name);
+}
+
+#define KTIME_FLAGS (CLOCK_SOURCE_IS_CONTINUOUS | \
+		     CLOCK_SOURCE_VALID_FOR_HRES | \
+		     CLOCK_SOURCE_MUST_VERIFY | \
+		     CLOCK_SOURCE_VERIFY_PERCPU)
+
+static struct clocksource clocksource_wdtest_ktime = {
+	.name			= "wdtest-ktime",
+	.rating			= 300,
+	.read			= wdtest_ktime_read,
+	.mask			= CLOCKSOURCE_MASK(64),
+	.flags			= KTIME_FLAGS,
+	.mark_unstable		= wdtest_ktime_cs_mark_unstable,
+	.list			= LIST_HEAD_INIT(clocksource_wdtest_ktime.list),
+};
+
+/* Reset the clocksource if needed. */
+static void wdtest_ktime_clocksource_reset(void)
+{
+	if (clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE) {
+		clocksource_unregister(&clocksource_wdtest_ktime);
+		clocksource_wdtest_ktime.flags = KTIME_FLAGS;
+		schedule_timeout_uninterruptible(HZ / 10);
+		clocksource_register_khz(&clocksource_wdtest_ktime, 1000 * 1000);
+	}
+}
+
+/* Run the specified series of watchdog tests. */
+static int wdtest_func(void *arg)
+{
+	unsigned long j1, j2;
+	char *s;
+	int i;
+
+	schedule_timeout_uninterruptible(holdoff * HZ);
+
+	/*
+	 * Verify that jiffies-like clocksources get the manually
+	 * specified uncertainty margin.
+	 */
+	pr_info("--- Verify jiffies-like uncertainty margin.\n");
+	__clocksource_register(&clocksource_wdtest_jiffies);
+	WARN_ON_ONCE(clocksource_wdtest_jiffies.uncertainty_margin != TICK_NSEC);
+
+	j1 = clocksource_wdtest_jiffies.read(&clocksource_wdtest_jiffies);
+	schedule_timeout_uninterruptible(HZ);
+	j2 = clocksource_wdtest_jiffies.read(&clocksource_wdtest_jiffies);
+	WARN_ON_ONCE(j1 == j2);
+
+	clocksource_unregister(&clocksource_wdtest_jiffies);
+
+	/*
+	 * Verify that tsc-like clocksources are assigned a reasonable
+	 * uncertainty margin.
+	 */
+	pr_info("--- Verify tsc-like uncertainty margin.\n");
+	clocksource_register_khz(&clocksource_wdtest_ktime, 1000 * 1000);
+	WARN_ON_ONCE(clocksource_wdtest_ktime.uncertainty_margin < NSEC_PER_USEC);
+
+	j1 = clocksource_wdtest_ktime.read(&clocksource_wdtest_ktime);
+	udelay(1);
+	j2 = clocksource_wdtest_ktime.read(&clocksource_wdtest_ktime);
+	pr_info("--- tsc-like times: %lu - %lu = %lu.\n", j2, j1, j2 - j1);
+	WARN_ON_ONCE(time_before(j2, j1 + NSEC_PER_USEC));
+
+	/* Verify tsc-like stability with various numbers of errors injected. */
+	for (i = 0; i <= max_cswd_read_retries + 1; i++) {
+		if (i <= 1 && i < max_cswd_read_retries)
+			s = "";
+		else if (i <= max_cswd_read_retries)
+			s = ", expect message";
+		else
+			s = ", expect clock skew";
+		pr_info("--- Watchdog with %dx error injection, %lu retries%s.\n", i, max_cswd_read_retries, s);
+		WRITE_ONCE(wdtest_ktime_read_ndelays, i);
+		schedule_timeout_uninterruptible(2 * HZ);
+		WARN_ON_ONCE(READ_ONCE(wdtest_ktime_read_ndelays));
+		WARN_ON_ONCE((i <= max_cswd_read_retries) !=
+			     !(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE));
+		wdtest_ktime_clocksource_reset();
+	}
+
+	/* Verify tsc-like stability with clock-value-fuzz error injection. */
+	pr_info("--- Watchdog clock-value-fuzz error injection, expect clock skew and per-CPU mismatches.\n");
+	WRITE_ONCE(wdtest_ktime_read_fuzz, true);
+	schedule_timeout_uninterruptible(2 * HZ);
+	WARN_ON_ONCE(!(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE));
+	clocksource_verify_percpu(&clocksource_wdtest_ktime);
+	WRITE_ONCE(wdtest_ktime_read_fuzz, false);
+
+	clocksource_unregister(&clocksource_wdtest_ktime);
+
+	pr_info("--- Done with test.\n");
+	return 0;
+}
+
+static void wdtest_print_module_parms(void)
+{
+	pr_alert("--- holdoff=%d\n", holdoff);
+}
+
+/* Cleanup function. */
+static void clocksource_wdtest_cleanup(void)
+{
+}
+
+static int __init clocksource_wdtest_init(void)
+{
+	int ret = 0;
+
+	wdtest_print_module_parms();
+
+	/* Create watchdog-test task. */
+	wdtest_task = kthread_run(wdtest_func, NULL, "wdtest");
+	if (IS_ERR(wdtest_task)) {
+		ret = PTR_ERR(wdtest_task);
+		pr_warn("%s: Failed to create wdtest kthread.\n", __func__);
+		wdtest_task = NULL;
+		return ret;
+	}
+
+	return 0;
+}
+
+module_init(clocksource_wdtest_init);
+module_exit(clocksource_wdtest_cleanup);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 9ebf9931f3d6..bbe1bcf44ffa 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -199,8 +199,9 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
-static ulong max_read_retries = 3;
-module_param(max_read_retries, ulong, 0644);
+ulong max_cswd_read_retries = 3;
+module_param(max_cswd_read_retries, ulong, 0644);
+EXPORT_SYMBOL_GPL(max_cswd_read_retries);
 static int verify_n_cpus = 8;
 module_param(verify_n_cpus, int, 0644);
 
@@ -210,7 +211,7 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
 	u64 wd_end, wd_delta;
 	int64_t wd_delay;
 
-	for (nretries = 0; nretries <= max_read_retries; nretries++) {
+	for (nretries = 0; nretries <= max_cswd_read_retries; nretries++) {
 		local_irq_disable();
 		*wdnow = watchdog->read(watchdog);
 		*csnow = cs->read(cs);
@@ -220,7 +221,7 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
 		wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask);
 		wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, watchdog->shift);
 		if (wd_delay <= WATCHDOG_MAX_SKEW) {
-			if (nretries > 1 || nretries >= max_read_retries) {
+			if (nretries > 1 || nretries >= max_cswd_read_retries) {
 				pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
 					smp_processor_id(), watchdog->name, nretries);
 			}
@@ -293,7 +294,7 @@ static void clocksource_verify_one_cpu(void *csin)
 	csnow_mid = cs->read(cs);
 }
 
-static void clocksource_verify_percpu(struct clocksource *cs)
+void clocksource_verify_percpu(struct clocksource *cs)
 {
 	int64_t cs_nsec, cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;
 	u64 csnow_begin, csnow_end;
@@ -346,6 +347,7 @@ static void clocksource_verify_percpu(struct clocksource *cs)
 		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
 			testcpu, cs_nsec_min, cs_nsec_max, cs->name);
 }
+EXPORT_SYMBOL_GPL(clocksource_verify_percpu);
 
 static void clocksource_watchdog(struct timer_list *unused)
 {
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 678c13967580..0a5a70c742e6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2571,6 +2571,18 @@ config TEST_FPU
 
 	  If unsure, say N.
 
+config TEST_CLOCKSOURCE_WATCHDOG
+	tristate "Test clocksource watchdog in kernel space"
+	depends on CLOCKSOURCE_WATCHDOG
+	help
+	  Enable this option to create a kernel module that will trigger
+	  a test of the clocksource watchdog.  This module may be loaded
+	  via modprobe or insmod in which case it will run upon being
+	  loaded, or it may be built in, in which case it will run
+	  shortly after boot.
+
+	  If unsure, say N.
+
 endif # RUNTIME_TESTING_MENU
 
 config ARCH_USE_MEMTEST
-- 
2.31.1.189.g2e36527f23



* [PATCH v14 clocksource 6/6] clocksource: Print deviation in nanoseconds for unstable case
  2021-05-11 23:34 [PATCH v14 clocksource 0/5] Do not mark clocks unstable due to delays for v5.14 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2021-05-11 23:34 ` [PATCH v14 clocksource 5/6] clocksource: Provide kernel module to test clocksource watchdog Paul E. McKenney
@ 2021-05-11 23:34 ` Paul E. McKenney
  2021-05-12  2:21   ` Feng Tang
  5 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2021-05-11 23:34 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland, maz,
	kernel-team, neeraju, ak, feng.tang, zhengjun.xing,
	Paul E. McKenney

From: Feng Tang <feng.tang@intel.com>

Currently, when an unstable clocksource is detected, the raw counter
values of that clocksource and of the watchdog are printed, and these
can be interpreted only after some manual calculation.  Therefore, also
print the computed deltas in nanoseconds to make it easier for humans
to check the results.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/time/clocksource.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index bbe1bcf44ffa..9c45b98e60e2 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -406,10 +406,10 @@ static void clocksource_watchdog(struct timer_list *unused)
 		if (abs(cs_nsec - wd_nsec) > md) {
 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				smp_processor_id(), cs->name);
-			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
-				watchdog->name, wdnow, wdlast, watchdog->mask);
-			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
-				cs->name, csnow, cslast, cs->mask);
+			pr_warn("                      '%s' wd_nsec: %lld wd_now: %llx wd_last: %llx mask: %llx\n",
+				watchdog->name, wd_nsec, wdnow, wdlast, watchdog->mask);
+			pr_warn("                      '%s' cs_nsec: %lld cs_now: %llx cs_last: %llx mask: %llx\n",
+				cs->name, cs_nsec, csnow, cslast, cs->mask);
 			if (curr_clocksource == cs)
 				pr_warn("                      '%s' is current clocksource.\n", cs->name);
 			else if (curr_clocksource)
-- 
2.31.1.189.g2e36527f23



* Re: [PATCH v14 clocksource 4/6] clocksource: Reduce clocksource-skew threshold for TSC
  2021-05-11 23:34 ` [PATCH v14 clocksource 4/6] clocksource: Reduce clocksource-skew threshold for TSC Paul E. McKenney
@ 2021-05-12  2:18   ` Feng Tang
  2021-05-12  3:51     ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Feng Tang @ 2021-05-12  2:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing, Xing Zhengjun

Hi Paul,

On Tue, May 11, 2021 at 04:34:53PM -0700, Paul E. McKenney wrote:
> Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in
> a 500-millisecond WATCHDOG_INTERVAL.  This requires that clocks be skewed
> by more than 12.5% in order to be marked unstable.  Except that a clock
> that is skewed by that much is probably destroying unsuspecting software
> right and left.  And given that there are now checks for false-positive
> skews due to delays between reading the two clocks, it should be possible
> to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks
> such as TSC.
> 
> Therefore, add a new uncertainty_margin field to the clocksource
> structure that contains the maximum uncertainty in nanoseconds for
> the corresponding clock.  This field may be initialized manually,
> as it is for clocksource_tsc_early and clocksource_jiffies, which
> is copied to refined_jiffies.  If the field is not initialized
> manually, it will be computed at clock-registry time as the period
> of the clock in question based on the scale and freq parameters to
> __clocksource_update_freq_scale() function.  If either of those two
> parameters are zero, the tens-of-milliseconds WATCHDOG_THRESHOLD is
> used as a cowardly alternative to dividing by zero.  No matter how the
> uncertainty_margin field is calculated, it is bounded below by twice
> WATCHDOG_MAX_SKEW, that is, by 100 microseconds.
> 
> Note that manually initialized uncertainty_margin fields are not adjusted,
> but there is a WARN_ON_ONCE() that triggers if any such field is less than
> twice WATCHDOG_MAX_SKEW.  This WARN_ON_ONCE() is intended to discourage
> production use of the one-nanosecond uncertainty_margin values that are
> used to test the clock-skew code itself.
> 
> The actual clock-skew check uses the sum of the uncertainty_margin fields
> of the two clocksource structures being compared.  Integer overflow is
> avoided because the largest computed value of the uncertainty_margin
> fields is one billion (10^9), and double that value fits into an
> unsigned int.  However, if someone manually specifies (say) UINT_MAX,
> they will get what they deserve.
> 
> Note that the refined_jiffies uncertainty_margin field is initialized to
> TICK_NSEC, which means that skew checks involving this clocksource will
> be sufficiently forgiving.  In a similar vein, the clocksource_tsc_early
> uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which
> replicates the current behavior and allows custom setting if needed
> in order to address the rare skews detected for this clocksource in
> current mainline.
> 
> Link: https://lore.kernel.org/lkml/202104291438.PuHsxRkl-lkp@intel.com/
> Link: https://lore.kernel.org/lkml/20210429140440.GT975577@paulmck-ThinkPad-P17-Gen-1
> Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
> Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
> Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
> Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
> Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Mark Rutland <Mark.Rutland@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
> Cc: Feng Tang <feng.tang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  arch/x86/kernel/tsc.c       |  1 +
>  include/linux/clocksource.h |  3 +++
>  kernel/time/clocksource.c   | 48 +++++++++++++++++++++++++++++--------
>  kernel/time/jiffies.c       | 15 ++++++------
>  4 files changed, 50 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 6eb1b097e97e..2e076a459a0c 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1128,6 +1128,7 @@ static int tsc_cs_enable(struct clocksource *cs)
>  static struct clocksource clocksource_tsc_early = {
>  	.name			= "tsc-early",
>  	.rating			= 299,
> +	.uncertainty_margin	= 32 * NSEC_PER_MSEC,
>  	.read			= read_tsc,
>  	.mask			= CLOCKSOURCE_MASK(64),
>  	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 7f83d51c0fd7..895203727cb5 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -43,6 +43,8 @@ struct module;
>   * @shift:		Cycle to nanosecond divisor (power of two)
>   * @max_idle_ns:	Maximum idle time permitted by the clocksource (nsecs)
>   * @maxadj:		Maximum adjustment value to mult (~11%)
> + * @uncertainty_margin:	Maximum uncertainty in nanoseconds per half second.
> + *			Zero says to use default WATCHDOG_THRESHOLD.
>   * @archdata:		Optional arch-specific data
>   * @max_cycles:		Maximum safe cycle value which won't overflow on
>   *			multiplication
> @@ -98,6 +100,7 @@ struct clocksource {
>  	u32			shift;
>  	u64			max_idle_ns;
>  	u32			maxadj;
> +	u32			uncertainty_margin;
>  #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
>  	struct arch_clocksource_data archdata;
>  #endif
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 66243da2dadb..9ebf9931f3d6 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -95,6 +95,20 @@ static char override_name[CS_NAME_LEN];
>  static int finished_booting;
>  static u64 suspend_start;
>  
> +/*
> + * Threshold: 0.0312s, when doubled: 0.0625s.
> + * Also a default for cs->uncertainty_margin when registering clocks.
> + */
> +#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 5)
> +
> +/*
> + * Maximum permissible delay between two readouts of the watchdog
> + * clocksource surrounding a read of the clocksource being validated.
> + * This delay could be due to SMIs, NMIs, or to VCPU preemptions.  Used as
> + * a lower bound for cs->uncertainty_margin values when registering clocks.
> + */
> +#define WATCHDOG_MAX_SKEW (50 * NSEC_PER_USEC)
> +
>  #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
>  static void clocksource_watchdog_work(struct work_struct *work);
>  static void clocksource_select(void);
> @@ -121,17 +135,9 @@ static int clocksource_watchdog_kthread(void *data);
>  static void __clocksource_change_rating(struct clocksource *cs, int rating);
>  
>  /*
> - * Interval: 0.5sec Threshold: 0.0625s
> + * Interval: 0.5sec.
>   */
>  #define WATCHDOG_INTERVAL (HZ >> 1)
> -#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
> -
> -/*
> - * Maximum permissible delay between two readouts of the watchdog
> - * clocksource surrounding a read of the clocksource being validated.
> - * This delay could be due to SMIs, NMIs, or to VCPU preemptions.
> - */
> -#define WATCHDOG_MAX_SKEW (100 * NSEC_PER_USEC)
>  
>  static void clocksource_watchdog_work(struct work_struct *work)
>  {
> @@ -347,6 +353,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>  	int next_cpu, reset_pending;
>  	int64_t wd_nsec, cs_nsec;
>  	struct clocksource *cs;
> +	u32 md;
>  
>  	spin_lock(&watchdog_lock);
>  	if (!watchdog_running)
> @@ -393,7 +400,8 @@ static void clocksource_watchdog(struct timer_list *unused)
>  			continue;
>  
>  		/* Check the deviation from the watchdog clocksource. */
> -		if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
> +		md = cs->uncertainty_margin + watchdog->uncertainty_margin;
> +		if (abs(cs_nsec - wd_nsec) > md) {
>  			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
>  				smp_processor_id(), cs->name);
>  			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
> @@ -1046,6 +1054,26 @@ void __clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq
>  		clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
>  				       NSEC_PER_SEC / scale, sec * scale);
>  	}
> +
> +	/*
> +	 * If the uncertainty margin is not specified, calculate it.
> +	 * If both scale and freq are non-zero, calculate the clock
> +	 * period, but bound below at 2*WATCHDOG_MAX_SKEW.  However,
> +	 * if either of scale or freq is zero, be very conservative and
> +	 * take the tens-of-milliseconds WATCHDOG_THRESHOLD value for the
> +	 * uncertainty margin.  Allow stupidly small uncertainty margins
> +	 * to be specified by the caller for testing purposes, but warn
> +	 * to discourage production use of this capability.
> +	 */
> +	if (scale && freq && !cs->uncertainty_margin) {
> +		cs->uncertainty_margin = NSEC_PER_SEC / (scale * freq);
> +		if (cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW)
> +			cs->uncertainty_margin = 2 * WATCHDOG_MAX_SKEW;
> +	} else if (!cs->uncertainty_margin) {
> +		cs->uncertainty_margin = WATCHDOG_THRESHOLD;
> +	}
> +	WARN_ON_ONCE(cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW);
> +
>  	/*
>  	 * Ensure clocksources that have large 'mult' values don't overflow
>  	 * when adjusted.
> diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> index a492e4da69ba..b3f608c2b936 100644
> --- a/kernel/time/jiffies.c
> +++ b/kernel/time/jiffies.c
> @@ -49,13 +49,14 @@ static u64 jiffies_read(struct clocksource *cs)
>   * for "tick-less" systems.
>   */
>  static struct clocksource clocksource_jiffies = {
> -	.name		= "jiffies",
> -	.rating		= 1, /* lowest valid rating*/
> -	.read		= jiffies_read,
> -	.mask		= CLOCKSOURCE_MASK(32),
> -	.mult		= TICK_NSEC << JIFFIES_SHIFT, /* details above */
> -	.shift		= JIFFIES_SHIFT,
> -	.max_cycles	= 10,
> +	.name			= "jiffies",
> +	.rating			= 1, /* lowest valid rating*/
> +	.uncertainty_margin	= TICK_NSEC,

'jiffies' is known to be very bad as a watchdog ("worse bandaid" in
Thomas' words :)), and TICK_NSEC just turns to 1ms for HZ=1000 case. 
Maybe we should give it a bigger margin, like the 32ms margin for 
'tsc-early'?

Other than this, it looks good to me, thanks!

- Feng

> +	.read			= jiffies_read,
> +	.mask			= CLOCKSOURCE_MASK(32),
> +	.mult			= TICK_NSEC << JIFFIES_SHIFT, /* details above */
> +	.shift			= JIFFIES_SHIFT,
> +	.max_cycles		= 10,
>  };
>  
>  __cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock);
> -- 
> 2.31.1.189.g2e36527f23

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v14 clocksource 6/6] clocksource: Print deviation in nanoseconds for unstable case
  2021-05-11 23:34 ` [PATCH v14 clocksource 6/6] clocksource: Print deviation in nanoseconds for unstable case Paul E. McKenney
@ 2021-05-12  2:21   ` Feng Tang
  2021-05-12  3:38     ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Feng Tang @ 2021-05-12  2:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing

On Tue, May 11, 2021 at 04:34:55PM -0700, Paul E. McKenney wrote:
> From: Feng Tang <feng.tang@intel.com>
> 
> Currently when an unstable clocksource is detected, the raw counters
> of that clocksource and watchdog will be printed, which can only be
> understood after some math calculation.  So print the existing delta in
> nanoseconds to make it easier for humans to check the results.
> 
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  kernel/time/clocksource.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index bbe1bcf44ffa..9c45b98e60e2 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -406,10 +406,10 @@ static void clocksource_watchdog(struct timer_list *unused)
>  		if (abs(cs_nsec - wd_nsec) > md) {
>  			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
>  				smp_processor_id(), cs->name);
> -			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
> -				watchdog->name, wdnow, wdlast, watchdog->mask);
> -			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
> -				cs->name, csnow, cslast, cs->mask);
> +			pr_warn("                      '%s' wd_nesc: %lld wd_now: %llx wd_last: %llx mask: %llx\n",

There is a typo in the message, 'wd_nesc' should be 'wd_nsec' , 
sorry for that.

Thanks,
Feng


> +				watchdog->name, wd_nsec, wdnow, wdlast, watchdog->mask);
> +			pr_warn("                      '%s' cs_nsec: %lld cs_now: %llx cs_last: %llx mask: %llx\n",
> +				cs->name, cs_nsec, csnow, cslast, cs->mask);
>  			if (curr_clocksource == cs)
>  				pr_warn("                      '%s' is current clocksource.\n", cs->name);
>  			else if (curr_clocksource)
> -- 
> 2.31.1.189.g2e36527f23


* Re: [PATCH v14 clocksource 6/6] clocksource: Print deviation in nanoseconds for unstable case
  2021-05-12  2:21   ` Feng Tang
@ 2021-05-12  3:38     ` Paul E. McKenney
  0 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2021-05-12  3:38 UTC (permalink / raw)
  To: Feng Tang
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing

On Wed, May 12, 2021 at 10:21:36AM +0800, Feng Tang wrote:
> On Tue, May 11, 2021 at 04:34:55PM -0700, Paul E. McKenney wrote:
> > From: Feng Tang <feng.tang@intel.com>
> > 
> > Currently when an unstable clocksource is detected, the raw counters
> > of that clocksource and watchdog will be printed, which can only be
> > understood after some math calculation.  So print the existing delta in
> > nanoseconds to make it easier for humans to check the results.
> > 
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  kernel/time/clocksource.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index bbe1bcf44ffa..9c45b98e60e2 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -406,10 +406,10 @@ static void clocksource_watchdog(struct timer_list *unused)
> >  		if (abs(cs_nsec - wd_nsec) > md) {
> >  			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
> >  				smp_processor_id(), cs->name);
> > -			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
> > -				watchdog->name, wdnow, wdlast, watchdog->mask);
> > -			pr_warn("                      '%s' cs_now: %llx cs_last: %llx mask: %llx\n",
> > -				cs->name, csnow, cslast, cs->mask);
> > +			pr_warn("                      '%s' wd_nesc: %lld wd_now: %llx wd_last: %llx mask: %llx\n",
> 
> There is a typo in the message, 'wd_nesc' should be 'wd_nsec' , 
> sorry for that.

No problem!  I already have you covered with bfa55d346b23 ("squash!
clocksource: Print deviation in nanoseconds for unstable case").

I will merge it on the next rebase.

							Thanx, Paul

> Thanks,
> Feng
> 
> 
> > +				watchdog->name, wd_nsec, wdnow, wdlast, watchdog->mask);
> > +			pr_warn("                      '%s' cs_nsec: %lld cs_now: %llx cs_last: %llx mask: %llx\n",
> > +				cs->name, cs_nsec, csnow, cslast, cs->mask);
> >  			if (curr_clocksource == cs)
> >  				pr_warn("                      '%s' is current clocksource.\n", cs->name);
> >  			else if (curr_clocksource)
> > -- 
> > 2.31.1.189.g2e36527f23


* Re: [PATCH v14 clocksource 4/6] clocksource: Reduce clocksource-skew threshold for TSC
  2021-05-12  2:18   ` Feng Tang
@ 2021-05-12  3:51     ` Paul E. McKenney
  2021-05-12 13:18       ` Feng Tang
  0 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2021-05-12  3:51 UTC (permalink / raw)
  To: Feng Tang
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing, Xing Zhengjun

On Wed, May 12, 2021 at 10:18:49AM +0800, Feng Tang wrote:
> Hi Paul,
> 
> On Tue, May 11, 2021 at 04:34:53PM -0700, Paul E. McKenney wrote:
> > Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in
> > a 500-millisecond WATCHDOG_INTERVAL.  This requires that clocks be skewed
> > by more than 12.5% in order to be marked unstable.  Except that a clock
> > that is skewed by that much is probably destroying unsuspecting software
> > right and left.  And given that there are now checks for false-positive
> > skews due to delays between reading the two clocks, it should be possible
> > to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks
> > such as TSC.
> > 
> > Therefore, add a new uncertainty_margin field to the clocksource
> > structure that contains the maximum uncertainty in nanoseconds for
> > the corresponding clock.  This field may be initialized manually,
> > as it is for clocksource_tsc_early and clocksource_jiffies, which
> > is copied to refined_jiffies.  If the field is not initialized
> > manually, it will be computed at clock-registry time as the period
> > of the clock in question based on the scale and freq parameters to
> > __clocksource_update_freq_scale() function.  If either of those two
> > parameters are zero, the tens-of-milliseconds WATCHDOG_THRESHOLD is
> > used as a cowardly alternative to dividing by zero.  No matter how the
> > uncertainty_margin field is calculated, it is bounded below by twice
> > WATCHDOG_MAX_SKEW, that is, by 100 microseconds.
> > 
> > Note that manually initialized uncertainty_margin fields are not adjusted,
> > but there is a WARN_ON_ONCE() that triggers if any such field is less than
> > twice WATCHDOG_MAX_SKEW.  This WARN_ON_ONCE() is intended to discourage
> > production use of the one-nanosecond uncertainty_margin values that are
> > used to test the clock-skew code itself.
> > 
> > The actual clock-skew check uses the sum of the uncertainty_margin fields
> > of the two clocksource structures being compared.  Integer overflow is
> > avoided because the largest computed value of the uncertainty_margin
> > fields is one billion (10^9), and double that value fits into an
> > unsigned int.  However, if someone manually specifies (say) UINT_MAX,
> > they will get what they deserve.
> > 
> > Note that the refined_jiffies uncertainty_margin field is initialized to
> > TICK_NSEC, which means that skew checks involving this clocksource will
> > be sufficiently forgiving.  In a similar vein, the clocksource_tsc_early
> > uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which
> > replicates the current behavior and allows custom setting if needed
> > in order to address the rare skews detected for this clocksource in
> > current mainline.
> > 
> > Link: https://lore.kernel.org/lkml/202104291438.PuHsxRkl-lkp@intel.com/
> > Link: https://lore.kernel.org/lkml/20210429140440.GT975577@paulmck-ThinkPad-P17-Gen-1
> > Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
> > Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
> > Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
> > Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
> > Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: Stephen Boyd <sboyd@kernel.org>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Mark Rutland <Mark.Rutland@arm.com>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Andi Kleen <ak@linux.intel.com>
> > Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
> > Cc: Feng Tang <feng.tang@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  arch/x86/kernel/tsc.c       |  1 +
> >  include/linux/clocksource.h |  3 +++
> >  kernel/time/clocksource.c   | 48 +++++++++++++++++++++++++++++--------
> >  kernel/time/jiffies.c       | 15 ++++++------
> >  4 files changed, 50 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 6eb1b097e97e..2e076a459a0c 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1128,6 +1128,7 @@ static int tsc_cs_enable(struct clocksource *cs)
> >  static struct clocksource clocksource_tsc_early = {
> >  	.name			= "tsc-early",
> >  	.rating			= 299,
> > +	.uncertainty_margin	= 32 * NSEC_PER_MSEC,
> >  	.read			= read_tsc,
> >  	.mask			= CLOCKSOURCE_MASK(64),
> >  	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
> > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > index 7f83d51c0fd7..895203727cb5 100644
> > --- a/include/linux/clocksource.h
> > +++ b/include/linux/clocksource.h
> > @@ -43,6 +43,8 @@ struct module;
> >   * @shift:		Cycle to nanosecond divisor (power of two)
> >   * @max_idle_ns:	Maximum idle time permitted by the clocksource (nsecs)
> >   * @maxadj:		Maximum adjustment value to mult (~11%)
> > + * @uncertainty_margin:	Maximum uncertainty in nanoseconds per half second.
> > + *			Zero says to use default WATCHDOG_THRESHOLD.
> >   * @archdata:		Optional arch-specific data
> >   * @max_cycles:		Maximum safe cycle value which won't overflow on
> >   *			multiplication
> > @@ -98,6 +100,7 @@ struct clocksource {
> >  	u32			shift;
> >  	u64			max_idle_ns;
> >  	u32			maxadj;
> > +	u32			uncertainty_margin;
> >  #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
> >  	struct arch_clocksource_data archdata;
> >  #endif
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 66243da2dadb..9ebf9931f3d6 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -95,6 +95,20 @@ static char override_name[CS_NAME_LEN];
> >  static int finished_booting;
> >  static u64 suspend_start;
> >  
> > +/*
> > + * Threshold: 0.0312s, when doubled: 0.0625s.
> > + * Also a default for cs->uncertainty_margin when registering clocks.
> > + */
> > +#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 5)
> > +
> > +/*
> > + * Maximum permissible delay between two readouts of the watchdog
> > + * clocksource surrounding a read of the clocksource being validated.
> > + * This delay could be due to SMIs, NMIs, or to VCPU preemptions.  Used as
> > + * a lower bound for cs->uncertainty_margin values when registering clocks.
> > + */
> > +#define WATCHDOG_MAX_SKEW (50 * NSEC_PER_USEC)
> > +
> >  #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
> >  static void clocksource_watchdog_work(struct work_struct *work);
> >  static void clocksource_select(void);
> > @@ -121,17 +135,9 @@ static int clocksource_watchdog_kthread(void *data);
> >  static void __clocksource_change_rating(struct clocksource *cs, int rating);
> >  
> >  /*
> > - * Interval: 0.5sec Threshold: 0.0625s
> > + * Interval: 0.5sec.
> >   */
> >  #define WATCHDOG_INTERVAL (HZ >> 1)
> > -#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
> > -
> > -/*
> > - * Maximum permissible delay between two readouts of the watchdog
> > - * clocksource surrounding a read of the clocksource being validated.
> > - * This delay could be due to SMIs, NMIs, or to VCPU preemptions.
> > - */
> > -#define WATCHDOG_MAX_SKEW (100 * NSEC_PER_USEC)
> >  
> >  static void clocksource_watchdog_work(struct work_struct *work)
> >  {
> > @@ -347,6 +353,7 @@ static void clocksource_watchdog(struct timer_list *unused)
> >  	int next_cpu, reset_pending;
> >  	int64_t wd_nsec, cs_nsec;
> >  	struct clocksource *cs;
> > +	u32 md;
> >  
> >  	spin_lock(&watchdog_lock);
> >  	if (!watchdog_running)
> > @@ -393,7 +400,8 @@ static void clocksource_watchdog(struct timer_list *unused)
> >  			continue;
> >  
> >  		/* Check the deviation from the watchdog clocksource. */
> > -		if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
> > +		md = cs->uncertainty_margin + watchdog->uncertainty_margin;
> > +		if (abs(cs_nsec - wd_nsec) > md) {
> >  			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
> >  				smp_processor_id(), cs->name);
> >  			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
> > @@ -1046,6 +1054,26 @@ void __clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq
> >  		clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
> >  				       NSEC_PER_SEC / scale, sec * scale);
> >  	}
> > +
> > +	/*
> > +	 * If the uncertainty margin is not specified, calculate it.
> > +	 * If both scale and freq are non-zero, calculate the clock
> > +	 * period, but bound below at 2*WATCHDOG_MAX_SKEW.  However,
> > +	 * if either of scale or freq is zero, be very conservative and
> > +	 * take the tens-of-milliseconds WATCHDOG_THRESHOLD value for the
> > +	 * uncertainty margin.  Allow stupidly small uncertainty margins
> > +	 * to be specified by the caller for testing purposes, but warn
> > +	 * to discourage production use of this capability.
> > +	 */
> > +	if (scale && freq && !cs->uncertainty_margin) {
> > +		cs->uncertainty_margin = NSEC_PER_SEC / (scale * freq);
> > +		if (cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW)
> > +			cs->uncertainty_margin = 2 * WATCHDOG_MAX_SKEW;
> > +	} else if (!cs->uncertainty_margin) {
> > +		cs->uncertainty_margin = WATCHDOG_THRESHOLD;
> > +	}
> > +	WARN_ON_ONCE(cs->uncertainty_margin < 2 * WATCHDOG_MAX_SKEW);
> > +
> >  	/*
> >  	 * Ensure clocksources that have large 'mult' values don't overflow
> >  	 * when adjusted.
> > diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> > index a492e4da69ba..b3f608c2b936 100644
> > --- a/kernel/time/jiffies.c
> > +++ b/kernel/time/jiffies.c
> > @@ -49,13 +49,14 @@ static u64 jiffies_read(struct clocksource *cs)
> >   * for "tick-less" systems.
> >   */
> >  static struct clocksource clocksource_jiffies = {
> > -	.name		= "jiffies",
> > -	.rating		= 1, /* lowest valid rating*/
> > -	.read		= jiffies_read,
> > -	.mask		= CLOCKSOURCE_MASK(32),
> > -	.mult		= TICK_NSEC << JIFFIES_SHIFT, /* details above */
> > -	.shift		= JIFFIES_SHIFT,
> > -	.max_cycles	= 10,
> > +	.name			= "jiffies",
> > +	.rating			= 1, /* lowest valid rating*/
> > +	.uncertainty_margin	= TICK_NSEC,
> 
> 'jiffies' is known to be very bad as a watchdog ("worse bandaid" in
> Thomas' words :)), and TICK_NSEC just turns to 1ms for HZ=1000 case. 
> Maybe we should give it a bigger margin, like the 32ms margin for 
> 'tsc-early'?
> 
> Other than this, it looks good to me, thanks!

As in the fixup diff below?

Would you be willing to provide an ack or tested-by for the rest
of the series?  (I have your ack on 1/6.)

						Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index b3f608c2b936..01935aafdb46 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -51,7 +51,7 @@ static u64 jiffies_read(struct clocksource *cs)
 static struct clocksource clocksource_jiffies = {
 	.name			= "jiffies",
 	.rating			= 1, /* lowest valid rating*/
-	.uncertainty_margin	= TICK_NSEC,
+	.uncertainty_margin	= 32 * NSEC_PER_MSEC,
 	.read			= jiffies_read,
 	.mask			= CLOCKSOURCE_MASK(32),
 	.mult			= TICK_NSEC << JIFFIES_SHIFT, /* details above */


* Re: [PATCH v14 clocksource 4/6] clocksource: Reduce clocksource-skew threshold for TSC
  2021-05-12  3:51     ` Paul E. McKenney
@ 2021-05-12 13:18       ` Feng Tang
  2021-05-12 17:14         ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Feng Tang @ 2021-05-12 13:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing, Xing Zhengjun

Hi Paul,

On Tue, May 11, 2021 at 08:51:56PM -0700, Paul E. McKenney wrote:
> On Wed, May 12, 2021 at 10:18:49AM +0800, Feng Tang wrote:
> > Hi Paul,
> > 
> > On Tue, May 11, 2021 at 04:34:53PM -0700, Paul E. McKenney wrote:

[SNIP]
> > >  	 * Ensure clocksources that have large 'mult' values don't overflow
> > >  	 * when adjusted.
> > > diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> > > index a492e4da69ba..b3f608c2b936 100644
> > > --- a/kernel/time/jiffies.c
> > > +++ b/kernel/time/jiffies.c
> > > @@ -49,13 +49,14 @@ static u64 jiffies_read(struct clocksource *cs)
> > >   * for "tick-less" systems.
> > >   */
> > >  static struct clocksource clocksource_jiffies = {
> > > -	.name		= "jiffies",
> > > -	.rating		= 1, /* lowest valid rating*/
> > > -	.read		= jiffies_read,
> > > -	.mask		= CLOCKSOURCE_MASK(32),
> > > -	.mult		= TICK_NSEC << JIFFIES_SHIFT, /* details above */
> > > -	.shift		= JIFFIES_SHIFT,
> > > -	.max_cycles	= 10,
> > > +	.name			= "jiffies",
> > > +	.rating			= 1, /* lowest valid rating*/
> > > +	.uncertainty_margin	= TICK_NSEC,
> > 
> > 'jiffies' is known to be very bad as a watchdog ("worse bandaid" in
> > Thomas' words :)), and TICK_NSEC just turns to 1ms for HZ=1000 case. 
> > Maybe we should give it a bigger margin, like the 32ms margin for 
> > 'tsc-early'?
> > 
> > Other than this, it looks good to me, thanks!
> 
> As in the fixup diff below?

Yep, thanks,

> 
> Would you be willing to provide an ack or tested-by for the rest
> of the series?  (I have your ack on 1/6.)

I haven't figured out a way to check 5/6 patch yet, but feel free
to add my ack for the first 4 patches (1/6 ~ 4/6) 

	Acked-by: Feng Tang <feng.tang@intel.com>

Thanks,
Feng

> 
> 						Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> index b3f608c2b936..01935aafdb46 100644
> --- a/kernel/time/jiffies.c
> +++ b/kernel/time/jiffies.c
> @@ -51,7 +51,7 @@ static u64 jiffies_read(struct clocksource *cs)
>  static struct clocksource clocksource_jiffies = {
>  	.name			= "jiffies",
>  	.rating			= 1, /* lowest valid rating*/
> -	.uncertainty_margin	= TICK_NSEC,
> +	.uncertainty_margin	= 32 * NSEC_PER_MSEC,
>  	.read			= jiffies_read,
>  	.mask			= CLOCKSOURCE_MASK(32),
>  	.mult			= TICK_NSEC << JIFFIES_SHIFT, /* details above */


* Re: [PATCH v14 clocksource 4/6] clocksource: Reduce clocksource-skew threshold for TSC
  2021-05-12 13:18       ` Feng Tang
@ 2021-05-12 17:14         ` Paul E. McKenney
  0 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2021-05-12 17:14 UTC (permalink / raw)
  To: Feng Tang
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing, Xing Zhengjun

On Wed, May 12, 2021 at 09:18:15PM +0800, Feng Tang wrote:
> Hi Paul,
> 
> On Tue, May 11, 2021 at 08:51:56PM -0700, Paul E. McKenney wrote:
> > On Wed, May 12, 2021 at 10:18:49AM +0800, Feng Tang wrote:
> > > Hi Paul,
> > > 
> > > On Tue, May 11, 2021 at 04:34:53PM -0700, Paul E. McKenney wrote:
> 
> [SNIP]
> > > >  	 * Ensure clocksources that have large 'mult' values don't overflow
> > > >  	 * when adjusted.
> > > > diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> > > > index a492e4da69ba..b3f608c2b936 100644
> > > > --- a/kernel/time/jiffies.c
> > > > +++ b/kernel/time/jiffies.c
> > > > @@ -49,13 +49,14 @@ static u64 jiffies_read(struct clocksource *cs)
> > > >   * for "tick-less" systems.
> > > >   */
> > > >  static struct clocksource clocksource_jiffies = {
> > > > -	.name		= "jiffies",
> > > > -	.rating		= 1, /* lowest valid rating*/
> > > > -	.read		= jiffies_read,
> > > > -	.mask		= CLOCKSOURCE_MASK(32),
> > > > -	.mult		= TICK_NSEC << JIFFIES_SHIFT, /* details above */
> > > > -	.shift		= JIFFIES_SHIFT,
> > > > -	.max_cycles	= 10,
> > > > +	.name			= "jiffies",
> > > > +	.rating			= 1, /* lowest valid rating*/
> > > > +	.uncertainty_margin	= TICK_NSEC,
> > > 
> > > 'jiffies' is known to be very bad as a watchdog ("worse bandaid" in
> > > Thomas' words :)), and TICK_NSEC just turns to 1ms for HZ=1000 case. 
> > > Maybe we should give it a bigger margin, like the 32ms margin for 
> > > 'tsc-early'?
> > > 
> > > Other than this, it looks good to me, thanks!
> > 
> > As in the fixup diff below?
> 
> Yep, thanks,
> 
> > Would you be willing to provide an ack or tested-by for the rest
> > of the series?  (I have your ack on 1/6.)
> 
> I haven't figured out a way to check 5/6 patch yet, but feel free
> to add my ack for the first 4 patches (1/6 ~ 4/6) 
> 
> 	Acked-by: Feng Tang <feng.tang@intel.com>

Thank you, and I will apply this.

On the unlikely off-chance that this works in your environment, here
is how I test it:

	tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --kconfig "CONFIG_TEST_CLOCKSOURCE_WATCHDOG=y" --bootargs "clocksource.max_cswd_read_retries=1" --trust-make > /tmp/kvm.sh.out 2>&1

	tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --kconfig "CONFIG_TEST_CLOCKSOURCE_WATCHDOG=y" --trust-make > /tmp/kvm.sh.out 2>&1

	tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --trust-make > /tmp/kvm.sh.out 2>&1

If there are no splats, the test passed.  The third is tests running the
kernel without the module loaded, so it is probably redundant with your
other testing.  The test completes in a few minutes, give or take your
kernel build time compared to mine.

These create VMs running rcutorture, and test the clocksource watchdog
as a side effect.
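One way to turn the "no splats" criterion above into a scriptable check (a sketch only; the grep patterns are the generic kernel-splat markers, nothing kvm.sh-specific, and the log path is the one used in the commands above):

```shell
log=/tmp/kvm.sh.out
if [ -f "$log" ] && grep -Eq 'WARNING:|Call Trace:' "$log"; then
	echo "clocksource watchdog test: FAIL (splat in $log)"
	exit 1
else
	echo "clocksource watchdog test: PASS (no splats in $log)"
fi
```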

							Thanx, Paul


* Re: [PATCH v14 clocksource 5/6] clocksource: Provide kernel module to test clocksource watchdog
  2021-05-11 23:34 ` [PATCH v14 clocksource 5/6] clocksource: Provide kernel module to test clocksource watchdog Paul E. McKenney
@ 2021-05-13  3:29   ` Feng Tang
  2021-05-13  4:01     ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Feng Tang @ 2021-05-13  3:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing, Xing Zhengjun,
	Chris Mason

On Tue, May 11, 2021 at 04:34:54PM -0700, Paul E. McKenney wrote:
> When the clocksource watchdog marks a clock as unstable, this might
> be due to that clock being unstable or it might be due to delays that
> happen to occur between the reads of the two clocks.  It would be good
> to have a way of testing the clocksource watchdog's ability to
> distinguish between these two causes of clock skew and instability.
> 
> Therefore, provide a new clocksource-wdtest module selected by a new
> TEST_CLOCKSOURCE_WATCHDOG Kconfig option.  This module has a single module
> parameter named "holdoff" that provides the number of seconds of delay
> before testing should start, which defaults to zero when built as a module
> and to 10 seconds when built directly into the kernel.  Very large systems
> that boot slowly may need to increase the value of this module parameter.
> 
> This module uses hand-crafted clocksource structures to do its testing,
> thus avoiding messing up timing for the rest of the kernel and for user
> applications.  This module first verifies that the ->uncertainty_margin
> field of the clocksource structures are set sanely.  It then tests the
> delay-detection capability of the clocksource watchdog, increasing the
> number of consecutive delays injected, first provoking console messages
> complaining about the delays and finally forcing a clock-skew event.
> Unexpected test results cause at least one WARN_ON_ONCE() console splat.
> If there are no splats, the test has passed.  Finally, it fuzzes the
> value returned from a clocksource to test the clocksource watchdog's
> ability to detect time skew.
> 
> This module checks the state of its clocksource after each test, and
> uses WARN_ON_ONCE() to emit a console splat if there are any failures.
> This should enable all types of test frameworks to detect any such
> failures.
> 
> This facility is intended for diagnostic use only, and should be avoided
> on production systems.

I tried the kvm rcutorture way:
 "tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s
 --configs TREE03 --kconfig "CONFIG_TEST_CLOCKSOURCE_WATCHDOG=y"
 --bootargs "clocksource.max_cswd_read_retries=1" --trust-make > /tmp/kvm.sh.out 2>&1"

I also ran it on a bare-metal laptop with and without "nohpet" on the
kernel cmdline. The debug messages from this module all look as expected,
so

	Tested-by: Feng Tang <feng.tang@intel.com>

Thanks,
Feng


> Link: https://lore.kernel.org/lkml/202104291438.PuHsxRkl-lkp@intel.com/
> Link: https://lore.kernel.org/lkml/20210429140440.GT975577@paulmck-ThinkPad-P17-Gen-1
> Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
> Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
> Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
> Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
> Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Mark Rutland <Mark.Rutland@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
> Reported-by: Chris Mason <clm@fb.com>
> [ paulmck: Export clocksource_verify_percpu per kernel test robot and Stephen Rothwell. ]
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  .../admin-guide/kernel-parameters.txt         |   6 +
>  include/linux/clocksource.h                   |   3 +
>  kernel/time/Makefile                          |   1 +
>  kernel/time/clocksource-wdtest.c              | 202 ++++++++++++++++++
>  kernel/time/clocksource.c                     |  12 +-
>  lib/Kconfig.debug                             |  12 ++
>  6 files changed, 231 insertions(+), 5 deletions(-)
>  create mode 100644 kernel/time/clocksource-wdtest.c
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 0ef0ee743f65..9da285a98aa6 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -597,6 +597,12 @@
>  			The actual CPUs are chosen randomly, with
>  			no replacement if the same CPU is chosen twice.
>  
> +	clocksource-wdtest.holdoff= [KNL]
> +			Set the time in seconds that the clocksource
> +			watchdog test waits before commencing its tests.
> +			Defaults to zero when built as a module and to
> +			10 seconds when built into the kernel.
> +
>  	clearcpuid=BITNUM[,BITNUM...] [X86]
>  			Disable CPUID feature X for the kernel. See
>  			arch/x86/include/asm/cpufeatures.h for the valid bit
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 895203727cb5..1d42d4b17327 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -291,4 +291,7 @@ static inline void timer_probe(void) {}
>  #define TIMER_ACPI_DECLARE(name, table_id, fn)		\
>  	ACPI_DECLARE_PROBE_ENTRY(timer, name, table_id, 0, NULL, 0, fn)
>  
> +extern ulong max_cswd_read_retries;
> +void clocksource_verify_percpu(struct clocksource *cs);
> +
>  #endif /* _LINUX_CLOCKSOURCE_H */
> diff --git a/kernel/time/Makefile b/kernel/time/Makefile
> index 1fb1c1ef6a19..1ed85b25b096 100644
> --- a/kernel/time/Makefile
> +++ b/kernel/time/Makefile
> @@ -21,3 +21,4 @@ obj-$(CONFIG_HAVE_GENERIC_VDSO)			+= vsyscall.o
>  obj-$(CONFIG_DEBUG_FS)				+= timekeeping_debug.o
>  obj-$(CONFIG_TEST_UDELAY)			+= test_udelay.o
>  obj-$(CONFIG_TIME_NS)				+= namespace.o
> +obj-$(CONFIG_TEST_CLOCKSOURCE_WATCHDOG)		+= clocksource-wdtest.o
> diff --git a/kernel/time/clocksource-wdtest.c b/kernel/time/clocksource-wdtest.c
> new file mode 100644
> index 000000000000..01df12395c0e
> --- /dev/null
> +++ b/kernel/time/clocksource-wdtest.c
> @@ -0,0 +1,202 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Unit test for the clocksource watchdog.
> + *
> + * Copyright (C) 2021 Facebook, Inc.
> + *
> + * Author: Paul E. McKenney <paulmck@kernel.org>
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/device.h>
> +#include <linux/clocksource.h>
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
> +#include <linux/tick.h>
> +#include <linux/kthread.h>
> +#include <linux/delay.h>
> +#include <linux/prandom.h>
> +#include <linux/cpu.h>
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Paul E. McKenney <paulmck@kernel.org>");
> +
> +static int holdoff = IS_BUILTIN(CONFIG_TEST_CLOCKSOURCE_WATCHDOG) ? 10 : 0;
> +module_param(holdoff, int, 0444);
> +MODULE_PARM_DESC(holdoff, "Time to wait to start test (s).");
> +
> +/* Watchdog kthread's task_struct pointer for debug purposes. */
> +static struct task_struct *wdtest_task;
> +
> +static u64 wdtest_jiffies_read(struct clocksource *cs)
> +{
> +	return (u64)jiffies;
> +}
> +
> +/* Assume HZ > 100. */
> +#define JIFFIES_SHIFT	8
> +
> +static struct clocksource clocksource_wdtest_jiffies = {
> +	.name			= "wdtest-jiffies",
> +	.rating			= 1, /* lowest valid rating */
> +	.uncertainty_margin	= TICK_NSEC,
> +	.read			= wdtest_jiffies_read,
> +	.mask			= CLOCKSOURCE_MASK(32),
> +	.flags			= CLOCK_SOURCE_MUST_VERIFY,
> +	.mult			= TICK_NSEC << JIFFIES_SHIFT, /* details above */
> +	.shift			= JIFFIES_SHIFT,
> +	.max_cycles		= 10,
> +};
> +
> +static int wdtest_ktime_read_ndelays;
> +static bool wdtest_ktime_read_fuzz;
> +
> +static u64 wdtest_ktime_read(struct clocksource *cs)
> +{
> +	int wkrn = READ_ONCE(wdtest_ktime_read_ndelays);
> +	static int sign = 1;
> +	u64 ret;
> +
> +	if (wkrn) {
> +		udelay(cs->uncertainty_margin / 250);
> +		WRITE_ONCE(wdtest_ktime_read_ndelays, wkrn - 1);
> +	}
> +	ret = ktime_get_real_fast_ns();
> +	if (READ_ONCE(wdtest_ktime_read_fuzz)) {
> +		sign = -sign;
> +		ret = ret + sign * 100 * NSEC_PER_MSEC;
> +	}
> +	return ret;
> +}
> +
> +static void wdtest_ktime_cs_mark_unstable(struct clocksource *cs)
> +{
> +	pr_info("--- Marking %s unstable due to clocksource watchdog.\n", cs->name);
> +}
> +
> +#define KTIME_FLAGS (CLOCK_SOURCE_IS_CONTINUOUS | \
> +		     CLOCK_SOURCE_VALID_FOR_HRES | \
> +		     CLOCK_SOURCE_MUST_VERIFY | \
> +		     CLOCK_SOURCE_VERIFY_PERCPU)
> +
> +static struct clocksource clocksource_wdtest_ktime = {
> +	.name			= "wdtest-ktime",
> +	.rating			= 300,
> +	.read			= wdtest_ktime_read,
> +	.mask			= CLOCKSOURCE_MASK(64),
> +	.flags			= KTIME_FLAGS,
> +	.mark_unstable		= wdtest_ktime_cs_mark_unstable,
> +	.list			= LIST_HEAD_INIT(clocksource_wdtest_ktime.list),
> +};
> +
> +/* Reset the clocksource if needed. */
> +static void wdtest_ktime_clocksource_reset(void)
> +{
> +	if (clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE) {
> +		clocksource_unregister(&clocksource_wdtest_ktime);
> +		clocksource_wdtest_ktime.flags = KTIME_FLAGS;
> +		schedule_timeout_uninterruptible(HZ / 10);
> +		clocksource_register_khz(&clocksource_wdtest_ktime, 1000 * 1000);
> +	}
> +}
> +
> +/* Run the specified series of watchdog tests. */
> +static int wdtest_func(void *arg)
> +{
> +	unsigned long j1, j2;
> +	char *s;
> +	int i;
> +
> +	schedule_timeout_uninterruptible(holdoff * HZ);
> +
> +	/*
> +	 * Verify that jiffies-like clocksources get the manually
> +	 * specified uncertainty margin.
> +	 */
> +	pr_info("--- Verify jiffies-like uncertainty margin.\n");
> +	__clocksource_register(&clocksource_wdtest_jiffies);
> +	WARN_ON_ONCE(clocksource_wdtest_jiffies.uncertainty_margin != TICK_NSEC);
> +
> +	j1 = clocksource_wdtest_jiffies.read(&clocksource_wdtest_jiffies);
> +	schedule_timeout_uninterruptible(HZ);
> +	j2 = clocksource_wdtest_jiffies.read(&clocksource_wdtest_jiffies);
> +	WARN_ON_ONCE(j1 == j2);
> +
> +	clocksource_unregister(&clocksource_wdtest_jiffies);
> +
> +	/*
> +	 * Verify that tsc-like clocksources are assigned a reasonable
> +	 * uncertainty margin.
> +	 */
> +	pr_info("--- Verify tsc-like uncertainty margin.\n");
> +	clocksource_register_khz(&clocksource_wdtest_ktime, 1000 * 1000);
> +	WARN_ON_ONCE(clocksource_wdtest_ktime.uncertainty_margin < NSEC_PER_USEC);
> +
> +	j1 = clocksource_wdtest_ktime.read(&clocksource_wdtest_ktime);
> +	udelay(1);
> +	j2 = clocksource_wdtest_ktime.read(&clocksource_wdtest_ktime);
> +	pr_info("--- tsc-like times: %lu - %lu = %lu.\n", j2, j1, j2 - j1);
> +	WARN_ON_ONCE(time_before(j2, j1 + NSEC_PER_USEC));
> +
> +	/* Verify tsc-like stability with various numbers of errors injected. */
> +	for (i = 0; i <= max_cswd_read_retries + 1; i++) {
> +		if (i <= 1 && i < max_cswd_read_retries)
> +			s = "";
> +		else if (i <= max_cswd_read_retries)
> +			s = ", expect message";
> +		else
> +			s = ", expect clock skew";
> +		pr_info("--- Watchdog with %dx error injection, %lu retries%s.\n", i, max_cswd_read_retries, s);
> +		WRITE_ONCE(wdtest_ktime_read_ndelays, i);
> +		schedule_timeout_uninterruptible(2 * HZ);
> +		WARN_ON_ONCE(READ_ONCE(wdtest_ktime_read_ndelays));
> +		WARN_ON_ONCE((i <= max_cswd_read_retries) !=
> +			     !(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE));
> +		wdtest_ktime_clocksource_reset();
> +	}
> +
> +	/* Verify tsc-like stability with clock-value-fuzz error injection. */
> +	pr_info("--- Watchdog clock-value-fuzz error injection, expect clock skew and per-CPU mismatches.\n");
> +	WRITE_ONCE(wdtest_ktime_read_fuzz, true);
> +	schedule_timeout_uninterruptible(2 * HZ);
> +	WARN_ON_ONCE(!(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE));
> +	clocksource_verify_percpu(&clocksource_wdtest_ktime);
> +	WRITE_ONCE(wdtest_ktime_read_fuzz, false);
> +
> +	clocksource_unregister(&clocksource_wdtest_ktime);
> +
> +	pr_info("--- Done with test.\n");
> +	return 0;
> +}
> +
> +static void wdtest_print_module_parms(void)
> +{
> +	pr_alert("--- holdoff=%d\n", holdoff);
> +}
> +
> +/* Cleanup function. */
> +static void clocksource_wdtest_cleanup(void)
> +{
> +}
> +
> +static int __init clocksource_wdtest_init(void)
> +{
> +	int ret = 0;
> +
> +	wdtest_print_module_parms();
> +
> +	/* Create watchdog-test task. */
> +	wdtest_task = kthread_run(wdtest_func, NULL, "wdtest");
> +	if (IS_ERR(wdtest_task)) {
> +		ret = PTR_ERR(wdtest_task);
> +		pr_warn("%s: Failed to create wdtest kthread.\n", __func__);
> +		wdtest_task = NULL;
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +module_init(clocksource_wdtest_init);
> +module_exit(clocksource_wdtest_cleanup);
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 9ebf9931f3d6..bbe1bcf44ffa 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -199,8 +199,9 @@ void clocksource_mark_unstable(struct clocksource *cs)
>  	spin_unlock_irqrestore(&watchdog_lock, flags);
>  }
>  
> -static ulong max_read_retries = 3;
> -module_param(max_read_retries, ulong, 0644);
> +ulong max_cswd_read_retries = 3;
> +module_param(max_cswd_read_retries, ulong, 0644);
> +EXPORT_SYMBOL_GPL(max_cswd_read_retries);
>  static int verify_n_cpus = 8;
>  module_param(verify_n_cpus, int, 0644);
>  
> @@ -210,7 +211,7 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
>  	u64 wd_end, wd_delta;
>  	int64_t wd_delay;
>  
> -	for (nretries = 0; nretries <= max_read_retries; nretries++) {
> +	for (nretries = 0; nretries <= max_cswd_read_retries; nretries++) {
>  		local_irq_disable();
>  		*wdnow = watchdog->read(watchdog);
>  		*csnow = cs->read(cs);
> @@ -220,7 +221,7 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
>  		wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask);
>  		wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, watchdog->shift);
>  		if (wd_delay <= WATCHDOG_MAX_SKEW) {
> -			if (nretries > 1 || nretries >= max_read_retries) {
> +			if (nretries > 1 || nretries >= max_cswd_read_retries) {
>  				pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
>  					smp_processor_id(), watchdog->name, nretries);
>  			}
> @@ -293,7 +294,7 @@ static void clocksource_verify_one_cpu(void *csin)
>  	csnow_mid = cs->read(cs);
>  }
>  
> -static void clocksource_verify_percpu(struct clocksource *cs)
> +void clocksource_verify_percpu(struct clocksource *cs)
>  {
>  	int64_t cs_nsec, cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;
>  	u64 csnow_begin, csnow_end;
> @@ -346,6 +347,7 @@ static void clocksource_verify_percpu(struct clocksource *cs)
>  		pr_warn("        CPU %d check durations %lldns - %lldns for clocksource %s.\n",
>  			testcpu, cs_nsec_min, cs_nsec_max, cs->name);
>  }
> +EXPORT_SYMBOL_GPL(clocksource_verify_percpu);
>  
>  static void clocksource_watchdog(struct timer_list *unused)
>  {
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 678c13967580..0a5a70c742e6 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2571,6 +2571,18 @@ config TEST_FPU
>  
>  	  If unsure, say N.
>  
> +config TEST_CLOCKSOURCE_WATCHDOG
> +	tristate "Test clocksource watchdog in kernel space"
> +	depends on CLOCKSOURCE_WATCHDOG
> +	help
> +	  Enable this option to create a kernel module that will trigger
> +	  a test of the clocksource watchdog.  This module may be loaded
> +	  via modprobe or insmod, in which case it will run upon being
> +	  loaded, or it may be built in, in which case it will run
> +	  shortly after boot.
> +
> +	  If unsure, say N.
> +
>  endif # RUNTIME_TESTING_MENU
>  
>  config ARCH_USE_MEMTEST
> -- 
> 2.31.1.189.g2e36527f23
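[ Editorial aside: the retry logic this series adds to cs_watchdog_read() is easy to model outside the kernel. The read of the clock under test is bracketed by two watchdog reads, and the whole triple is retried when the bracketing interval itself exceeds the skew budget. A minimal Python sketch follows; it is illustrative only — the constants are assumptions, and the kernel derives its own WATCHDOG_MAX_SKEW. ]

```python
WATCHDOG_MAX_SKEW_NS = 50_000   # assumed skew budget; the kernel computes its own
MAX_CSWD_READ_RETRIES = 3       # mirrors the max_cswd_read_retries parameter default

def cs_watchdog_read(read_wd, read_cs):
    """Return (ok, wdnow, csnow); ok is False when every attempt was delayed."""
    for nretries in range(MAX_CSWD_READ_RETRIES + 1):
        wdnow = read_wd()               # watchdog clock before
        csnow = read_cs()               # clock under test
        wd_end = read_wd()              # watchdog clock after
        wd_delay = wd_end - wdnow       # time the paired reads themselves took
        if wd_delay <= WATCHDOG_MAX_SKEW_NS:
            if nretries > 1:
                print(f"retried {nretries} times before success")
            return True, wdnow, csnow
    # All attempts were delayed (SMI, NMI, vCPU preemption, ...): skip this
    # comparison rather than marking the clock under test unstable through
    # no fault of its own.
    return False, wdnow, csnow
```

[ A delayed run — say, vCPU preemption between the two watchdog reads — then causes the comparison to be skipped instead of producing a false unstable verdict. ]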


* Re: [PATCH v14 clocksource 5/6] clocksource: Provide kernel module to test clocksource watchdog
  2021-05-13  3:29   ` Feng Tang
@ 2021-05-13  4:01     ` Paul E. McKenney
  0 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2021-05-13  4:01 UTC (permalink / raw)
  To: Feng Tang
  Cc: tglx, linux-kernel, john.stultz, sboyd, corbet, Mark.Rutland,
	maz, kernel-team, neeraju, ak, zhengjun.xing, Xing Zhengjun,
	Chris Mason

On Thu, May 13, 2021 at 11:29:47AM +0800, Feng Tang wrote:
> On Tue, May 11, 2021 at 04:34:54PM -0700, Paul E. McKenney wrote:
> > When the clocksource watchdog marks a clock as unstable, this might
> > be due to that clock being unstable or it might be due to delays that
> > happen to occur between the reads of the two clocks.  It would be good
> > to have a way of testing the clocksource watchdog's ability to
> > distinguish between these two causes of clock skew and instability.
> > 
> > Therefore, provide a new clocksource-wdtest module selected by a new
> > TEST_CLOCKSOURCE_WATCHDOG Kconfig option.  This module has a single module
> > parameter named "holdoff" that provides the number of seconds of delay
> > before testing should start, which defaults to zero when built as a module
> > and to 10 seconds when built directly into the kernel.  Very large systems
> > that boot slowly may need to increase the value of this module parameter.
> > 
> > This module uses hand-crafted clocksource structures to do its testing,
> > thus avoiding messing up timing for the rest of the kernel and for user
> > applications.  This module first verifies that the ->uncertainty_margin
> > field of the clocksource structures are set sanely.  It then tests the
> > delay-detection capability of the clocksource watchdog, increasing the
> > number of consecutive delays injected, first provoking console messages
> > complaining about the delays and finally forcing a clock-skew event.
> > Unexpected test results cause at least one WARN_ON_ONCE() console splat.
> > If there are no splats, the test has passed.  Finally, it fuzzes the
> > value returned from a clocksource to test the clocksource watchdog's
> > ability to detect time skew.
> > 
> > This module checks the state of its clocksource after each test, and
> > uses WARN_ON_ONCE() to emit a console splat if there are any failures.
> > This should enable all types of test frameworks to detect any such
> > failures.
> > 
> > This facility is intended for diagnostic use only, and should be avoided
> > on production systems.
> 
> I tried the kvm rcutorture way:
>  "tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s
>  --configs TREE03 --kconfig "CONFIG_TEST_CLOCKSOURCE_WATCHDOG=y"
>  --bootargs "clocksource.max_cswd_read_retries=1" --trust-make > /tmp/kvm.sh.out 2>&1"
> 
> Also I ran it on a bare-metal laptop with and without "nohpet" on the
> kernel command line. The debug messages from this module all look as
> expected, so
> 
> 	Tested-by: Feng Tang <feng.tang@intel.com>

Thank you!  I will apply this on my next rebase.

							Thanx, Paul

> Thanks,
> Feng
> 
> 
> > Link: https://lore.kernel.org/lkml/202104291438.PuHsxRkl-lkp@intel.com/
> > Link: https://lore.kernel.org/lkml/20210429140440.GT975577@paulmck-ThinkPad-P17-Gen-1
> > Link: https://lore.kernel.org/lkml/20210425224540.GA1312438@paulmck-ThinkPad-P17-Gen-1/
> > Link: https://lore.kernel.org/lkml/20210420064934.GE31773@xsang-OptiPlex-9020/
> > Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
> > Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
> > Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: Stephen Boyd <sboyd@kernel.org>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Mark Rutland <Mark.Rutland@arm.com>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Andi Kleen <ak@linux.intel.com>
> > Cc: Feng Tang <feng.tang@intel.com>
> > Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
> > Reported-by: Chris Mason <clm@fb.com>
> > [ paulmck: Export clocksource_verify_percpu per kernel test robot and Stephen Rothwell. ]
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > [ diff snipped; identical to the patch quoted in full above ]
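[ Editorial aside: the WARN_ON_ONCE() checks in wdtest_func()'s error-injection loop encode a simple expectation for each injected-delay count. A hedged Python model of that classification follows; the default of 3 mirrors max_cswd_read_retries, and the names are illustrative only. ]

```python
def expected_outcome(ndelays, max_retries=3):
    """Outcome wdtest_func() expects for ndelays consecutive delayed reads."""
    if ndelays <= 1 and ndelays < max_retries:
        return "silent"        # delay absorbed without a console message
    if ndelays <= max_retries:
        return "message"       # "retried N times before success" warning
    return "unstable"          # every retry delayed: clock-skew event forced

# The test loop runs ndelays from 0 through max_retries + 1, so all three
# outcomes are exercised, matching the "expect message" / "expect clock
# skew" strings the module prints before each pass.
```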


end of thread, other threads:[~2021-05-13  4:01 UTC | newest]

Thread overview: 15+ messages
2021-05-11 23:34 [PATCH v14 clocksource 0/5] Do not mark clocks unstable due to delays for v5.14 Paul E. McKenney
2021-05-11 23:34 ` [PATCH v14 clocksource 1/6] clocksource: Retry clock read if long delays detected Paul E. McKenney
2021-05-11 23:34 ` [PATCH v14 clocksource 2/6] clocksource: Check per-CPU clock synchronization when marked unstable Paul E. McKenney
2021-05-11 23:34 ` [PATCH v14 clocksource 3/6] clocksource: Limit number of CPUs checked for clock synchronization Paul E. McKenney
2021-05-11 23:34 ` [PATCH v14 clocksource 4/6] clocksource: Reduce clocksource-skew threshold for TSC Paul E. McKenney
2021-05-12  2:18   ` Feng Tang
2021-05-12  3:51     ` Paul E. McKenney
2021-05-12 13:18       ` Feng Tang
2021-05-12 17:14         ` Paul E. McKenney
2021-05-11 23:34 ` [PATCH v14 clocksource 5/6] clocksource: Provide kernel module to test clocksource watchdog Paul E. McKenney
2021-05-13  3:29   ` Feng Tang
2021-05-13  4:01     ` Paul E. McKenney
2021-05-11 23:34 ` [PATCH v14 clocksource 6/6] clocksource: Print deviation in nanoseconds for unstable case Paul E. McKenney
2021-05-12  2:21   ` Feng Tang
2021-05-12  3:38     ` Paul E. McKenney
