All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: "Paul E. McKenney" <paulmck@kernel.org>, Chris Mason <clm@fb.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Feng Tang <feng.tang@intel.com>, Sasha Levin <sashal@kernel.org>,
	linux-doc@vger.kernel.org
Subject: [PATCH AUTOSEL 5.4 19/26] clocksource: Retry clock read if long delays detected
Date: Mon,  5 Jul 2021 11:30:32 -0400	[thread overview]
Message-ID: <20210705153039.1521781-19-sashal@kernel.org> (raw)
In-Reply-To: <20210705153039.1521781-1-sashal@kernel.org>

From: "Paul E. McKenney" <paulmck@kernel.org>

[ Upstream commit db3a34e17433de2390eb80d436970edcebd0ca3e ]

When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there are no shortage of things that can delay
interrupts-disabled regions of code ranging from SMI handlers to vCPU
preemption.  It would be good to have some indication as to why the clock
was marked unstable.

Therefore, re-read the watchdog clock on either side of the read from the
clock under test.  If the watchdog clock shows an excessive time delta
between its pair of reads, the reads are retried.

The maximum number of retries is specified by a new kernel boot parameter
clocksource.max_cswd_read_retries, which defaults to three, that is, up to
four reads, one initial and up to three retries.  If more than one retry
was required, a message is printed on the console (the occasional single
retry is expected behavior, especially in guest OSes).  If the maximum
number of retries is exceeded, the clock under test will be marked
unstable.  However, the probability of this happening due to various sorts
of delays is quite small.  In addition, the reason (clock-read delays) for
the unstable marking will be apparent.

Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-1-paulmck@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  6 +++
 kernel/time/clocksource.c                     | 53 ++++++++++++++++---
 2 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a19ae163c058..dbb68067ba4e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -567,6 +567,12 @@
 			loops can be debugged more effectively on production
 			systems.
 
+	clocksource.max_cswd_read_retries= [KNL]
+			Number of clocksource_watchdog() retries due to
+			external delays before the clock will be marked
+			unstable.  Defaults to three retries, that is,
+			four attempts to read the clock under test.
+
 	clearcpuid=BITNUM[,BITNUM...] [X86]
 			Disable CPUID feature X for the kernel. See
 			arch/x86/include/asm/cpufeatures.h for the valid bit
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 428beb69426a..6863a054c970 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,13 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
 
+/*
+ * Maximum permissible delay between two readouts of the watchdog
+ * clocksource surrounding a read of the clocksource being validated.
+ * This delay could be due to SMIs, NMIs, or to VCPU preemptions.
+ */
+#define WATCHDOG_MAX_SKEW (100 * NSEC_PER_USEC)
+
 static void clocksource_watchdog_work(struct work_struct *work)
 {
 	/*
@@ -184,12 +191,45 @@ void clocksource_mark_unstable(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static ulong max_cswd_read_retries = 3;
+module_param(max_cswd_read_retries, ulong, 0644);
+
+static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
+{
+	unsigned int nretries;
+	u64 wd_end, wd_delta;
+	int64_t wd_delay;
+
+	for (nretries = 0; nretries <= max_cswd_read_retries; nretries++) {
+		local_irq_disable();
+		*wdnow = watchdog->read(watchdog);
+		*csnow = cs->read(cs);
+		wd_end = watchdog->read(watchdog);
+		local_irq_enable();
+
+		wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask);
+		wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult,
+					      watchdog->shift);
+		if (wd_delay <= WATCHDOG_MAX_SKEW) {
+			if (nretries > 1 || nretries >= max_cswd_read_retries) {
+				pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
+					smp_processor_id(), watchdog->name, nretries);
+			}
+			return true;
+		}
+	}
+
+	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n",
+		smp_processor_id(), watchdog->name, wd_delay, nretries);
+	return false;
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
-	struct clocksource *cs;
 	u64 csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
 	int next_cpu, reset_pending;
+	int64_t wd_nsec, cs_nsec;
+	struct clocksource *cs;
 
 	spin_lock(&watchdog_lock);
 	if (!watchdog_running)
@@ -206,10 +246,11 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 
-		local_irq_disable();
-		csnow = cs->read(cs);
-		wdnow = watchdog->read(watchdog);
-		local_irq_enable();
+		if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
+			/* Clock readout unreliable, so give it up. */
+			__clocksource_unstable(cs);
+			continue;
+		}
 
 		/* Clocksource initialized ? */
 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
-- 
2.30.2


  parent reply	other threads:[~2021-07-05 15:34 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-05 15:30 [PATCH AUTOSEL 5.4 01/26] HID: do not use down_interruptible() when unbinding devices Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 02/26] EDAC/ti: Add missing MODULE_DEVICE_TABLE Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 03/26] ACPI: processor idle: Fix up C-state latency if not ordered Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 04/26] hv_utils: Fix passing zero to 'PTR_ERR' warning Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 05/26] lib: vsprintf: Fix handling of number field widths in vsscanf Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 06/26] ACPI: EC: Make more Asus laptops use ECDT _GPE Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 07/26] block_dump: remove block_dump feature in mark_inode_dirty() Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 08/26] fs: dlm: cancel work sync othercon Sasha Levin
2021-07-05 15:30   ` [Cluster-devel] " Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 09/26] random32: Fix implicit truncation warning in prandom_seed_state() Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 10/26] fs: dlm: fix memory leak when fenced Sasha Levin
2021-07-05 15:30   ` [Cluster-devel] " Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 11/26] ACPICA: Fix memory leak caused by _CID repair function Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 12/26] ACPI: bus: Call kobject_put() in acpi_init() error path Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 13/26] ACPI: resources: Add checks for ACPI IRQ override Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 14/26] block: fix race between adding/removing rq qos and normal IO Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 15/26] platform/x86: toshiba_acpi: Fix missing error code in toshiba_acpi_setup_keyboard() Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 16/26] nvmet-fc: do not check for invalid target port in nvmet_fc_handle_fcp_rqst() Sasha Levin
2021-07-05 15:30   ` Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 17/26] EDAC/Intel: Do not load EDAC driver when running as a guest Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 18/26] PCI: hv: Add check for hyperv_initialized in init_hv_pci_drv() Sasha Levin
2021-07-05 15:30 ` Sasha Levin [this message]
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 20/26] ACPI: tables: Add custom DSDT file as makefile prerequisite Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 21/26] HID: wacom: Correct base usage for capacitive ExpressKey status bits Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 22/26] cifs: fix missing spinlock around update to ses->status Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 23/26] block: fix discard request merge Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 24/26] kthread_worker: fix return value when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 25/26] ia64: mca_drv: fix incorrect array size calculation Sasha Levin
2021-07-05 15:30   ` Sasha Levin
2021-07-05 15:30 ` [PATCH AUTOSEL 5.4 26/26] writeback, cgroup: increment isw_nr_in_flight before grabbing an inode Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210705153039.1521781-19-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=clm@fb.com \
    --cc=feng.tang@intel.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.