stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Feng Tang <feng.tang@intel.com>,
	Jiri Wiesner <jwiesner@suse.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: [PATCH 6.1 64/64] clocksource: Skip watchdog check for large watchdog intervals
Date: Tue, 13 Feb 2024 18:21:50 +0100	[thread overview]
Message-ID: <20240213171846.748324516@linuxfoundation.org> (raw)
In-Reply-To: <20240213171844.702064831@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Wiesner <jwiesner@suse.de>

commit 644649553508b9bacf0fc7a5bdc4f9e0165576a5 upstream.

There have been reports of the watchdog marking clocksources unstable on
machines with 8 NUMA nodes:

  clocksource: timekeeping watchdog on CPU373:
  Marking clocksource 'tsc' as unstable because the skew is too large:
  clocksource:   'hpet' wd_nsec: 14523447520
  clocksource:   'tsc'  cs_nsec: 14524115132

The measured clocksource skew - the absolute difference between cs_nsec
and wd_nsec - was 668 microseconds:

  cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612

The kernel used 200 microseconds for the uncertainty_margin of both the
clocksource and watchdog, resulting in a threshold of 400 microseconds (the
md variable). Both the cs_nsec and the wd_nsec value indicate that the
readout interval was circa 14.5 seconds.  The observed behaviour is that
watchdog checks failed for large readout intervals on 8 NUMA node
machines. This indicates that the size of the skew was directly proportinal
to the length of the readout interval on those machines. The measured
clocksource skew, 668 microseconds, was evaluated against a threshold (the
md variable) that is suited for readout intervals of roughly
WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5 second.

The intention of 2e27e793e280 ("clocksource: Reduce clocksource-skew
threshold") was to tighten the threshold for evaluating skew and set the
lower bound for the uncertainty_margin of clocksources to twice
WATCHDOG_MAX_SKEW. Later in c37e85c135ce ("clocksource: Loosen clocksource
watchdog constraints"), the WATCHDOG_MAX_SKEW constant was increased to
125 microseconds to fit the limit of NTP, which is able to use a
clocksource that suffers from up to 500 microseconds of skew per second.
Both the TSC and the HPET use default uncertainty_margin. When the
readout interval gets stretched the default uncertainty_margin is no
longer a suitable lower bound for evaluating skew - it imposes a limit
that is far stricter than the skew with which NTP can deal.

The root causes of the skew being directly proportinal to the length of
the readout interval are:

  * the inaccuracy of the shift/mult pairs of clocksources and the watchdog
  * the conversion to nanoseconds is imprecise for large readout intervals

Prevent this by skipping the current watchdog check if the readout
interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout
interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin
(of the TSC and HPET) corresponds to a limit on clocksource skew of 250
ppm (microseconds of skew per second).  To keep the limit imposed by NTP
(500 microseconds of skew per second) for all possible readout intervals,
the margins would have to be scaled so that the threshold value is
proportional to the length of the actual readout interval.

As for why the readout interval may get stretched: Since the watchdog is
executed in softirq context the expiration of the watchdog timer can get
severely delayed on account of a ksoftirqd thread not getting to run in a
timely manner. Surely, a system with such belated softirq execution is not
working well and the scheduling issue should be looked into but the
clocksource watchdog should be able to deal with it accordingly.

Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold")
Suggested-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240122172350.GA740@incl
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/time/clocksource.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -126,6 +126,7 @@ static DECLARE_WORK(watchdog_work, clock
 static DEFINE_SPINLOCK(watchdog_lock);
 static int watchdog_running;
 static atomic_t watchdog_reset_pending;
+static int64_t watchdog_max_interval;
 
 static inline void clocksource_watchdog_lock(unsigned long *flags)
 {
@@ -144,6 +145,7 @@ static void __clocksource_change_rating(
  * Interval: 0.5sec.
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
+#define WATCHDOG_INTERVAL_MAX_NS ((2 * WATCHDOG_INTERVAL) * (NSEC_PER_SEC / HZ))
 
 static void clocksource_watchdog_work(struct work_struct *work)
 {
@@ -396,8 +398,8 @@ static inline void clocksource_reset_wat
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	u64 csnow, wdnow, cslast, wdlast, delta;
+	int64_t wd_nsec, cs_nsec, interval;
 	int next_cpu, reset_pending;
-	int64_t wd_nsec, cs_nsec;
 	struct clocksource *cs;
 	enum wd_read_status read_ret;
 	unsigned long extra_wait = 0;
@@ -467,6 +469,27 @@ static void clocksource_watchdog(struct
 		if (atomic_read(&watchdog_reset_pending))
 			continue;
 
+		/*
+		 * The processing of timer softirqs can get delayed (usually
+		 * on account of ksoftirqd not getting to run in a timely
+		 * manner), which causes the watchdog interval to stretch.
+		 * Skew detection may fail for longer watchdog intervals
+		 * on account of fixed margins being used.
+		 * Some clocksources, e.g. acpi_pm, cannot tolerate
+		 * watchdog intervals longer than a few seconds.
+		 */
+		interval = max(cs_nsec, wd_nsec);
+		if (unlikely(interval > WATCHDOG_INTERVAL_MAX_NS)) {
+			if (system_state > SYSTEM_SCHEDULING &&
+			    interval > 2 * watchdog_max_interval) {
+				watchdog_max_interval = interval;
+				pr_warn("Long readout interval, skipping watchdog check: cs_nsec: %lld wd_nsec: %lld\n",
+					cs_nsec, wd_nsec);
+			}
+			watchdog_timer.expires = jiffies;
+			continue;
+		}
+
 		/* Check the deviation from the watchdog clocksource. */
 		md = cs->uncertainty_margin + watchdog->uncertainty_margin;
 		if (abs(cs_nsec - wd_nsec) > md) {



  parent reply	other threads:[~2024-02-13 17:26 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-13 17:20 [PATCH 6.1 00/64] 6.1.78-rc1 review Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 01/64] ext4: regenerate buddy after block freeing failed if under fc replay Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 02/64] dmaengine: fsl-dpaa2-qdma: Fix the size of dma pools Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 03/64] dmaengine: ti: k3-udma: Report short packet errors Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 04/64] dmaengine: fsl-qdma: Fix a memory leak related to the status queue DMA Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 05/64] dmaengine: fsl-qdma: Fix a memory leak related to the queue command DMA Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 06/64] phy: renesas: rcar-gen3-usb2: Fix returning wrong error code Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 07/64] dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 08/64] phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 09/64] cifs: failure to add channel on iface should bump up weight Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 10/64] drm/msms/dp: fixed link clock divider bits be over written in BPC unknown case Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 11/64] drm/msm/dp: return correct Colorimetry for DP_TEST_DYNAMIC_RANGE_CEA case Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 12/64] drm/msm/dpu: check for valid hw_pp in dpu_encoder_helper_phys_cleanup Greg Kroah-Hartman
2024-02-13 17:20 ` [PATCH 6.1 13/64] net: stmmac: xgmac: fix handling of DPP safety error for DMA channels Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 14/64] wifi: mac80211: fix waiting for beacons logic Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 15/64] netdevsim: avoid potential loop in nsim_dev_trap_report_work() Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 16/64] net: atlantic: Fix DMA mapping for PTP hwts ring Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 17/64] selftests: net: cut more slack for gro fwd tests Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 18/64] selftests: net: avoid just another constant wait Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 19/64] tunnels: fix out of bounds access when building IPv6 PMTU error Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 20/64] atm: idt77252: fix a memleak in open_card_ubr0 Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 21/64] octeontx2-pf: Fix a memleak otx2_sq_init Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 22/64] hwmon: (aspeed-pwm-tacho) mutex for tach reading Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 23/64] hwmon: (coretemp) Fix out-of-bounds memory access Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 24/64] hwmon: (coretemp) Fix bogus core_id to attr name mapping Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 25/64] inet: read sk->sk_family once in inet_recv_error() Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 26/64] drm/i915/gvt: Fix uninitialized variable in handle_mmio() Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 27/64] rxrpc: Fix response to PING RESPONSE ACKs to a dead call Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 28/64] tipc: Check the bearer type before calling tipc_udp_nl_bearer_add() Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 29/64] af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 30/64] ppp_async: limit MRU to 64K Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 31/64] selftests: cmsg_ipv6: repeat the exact packet Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 32/64] netfilter: nft_compat: narrow down revision to unsigned 8-bits Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 33/64] netfilter: nft_compat: reject unused compat flag Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 34/64] netfilter: nft_compat: restrict match/target protocol to u16 Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 35/64] drm/amd/display: Implement bounds check for stream encoder creation in DCN301 Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 36/64] netfilter: nft_ct: reject direction for ct id Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 37/64] netfilter: nft_set_pipapo: store index in scratch maps Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 38/64] netfilter: nft_set_pipapo: add helper to release pcpu scratch area Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 39/64] netfilter: nft_set_pipapo: remove scratch_aligned pointer Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 40/64] fs/ntfs3: Fix an NULL dereference bug Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 41/64] scsi: core: Move scsi_host_busy() out of host lock if it is for per-command Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 42/64] blk-iocost: Fix an UBSAN shift-out-of-bounds warning Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 43/64] fs: dlm: dont put dlm_local_addrs on heap Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 44/64] mtd: parsers: ofpart: add workaround for #size-cells 0 Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 45/64] ALSA: usb-audio: Add delay quirk for MOTU M Series 2nd revision Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 46/64] ALSA: usb-audio: Add a quirk for Yamaha YIT-W12TX transmitter Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 47/64] ALSA: usb-audio: add quirk for RODE NT-USB+ Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 48/64] USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 49/64] USB: serial: option: add Fibocom FM101-GL variant Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 50/64] USB: serial: cp210x: add ID for IMST iM871A-USB Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 51/64] usb: dwc3: host: Set XHCI_SG_TRB_CACHE_SIZE_QUIRK Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 52/64] usb: host: xhci-plat: Add support for XHCI_SG_TRB_CACHE_SIZE_QUIRK Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 53/64] xhci: process isoc TD properly when there was a transaction error mid TD Greg Kroah-Hartman
2024-02-13 18:47   ` Michał Pecio
2024-02-14 14:23     ` Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 54/64] xhci: handle isoc Babble and Buffer Overrun events properly Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 55/64] hrtimer: Report offline hrtimer enqueue Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 56/64] Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 57/64] Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 58/64] io_uring/net: fix sr->len for IORING_OP_RECV with MSG_WAITALL and buffers Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 59/64] Revert "ASoC: amd: Add new dmi entries for acp5x platform" Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 60/64] vhost: use kzalloc() instead of kmalloc() followed by memset() Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 61/64] RDMA/irdma: Fix support for 64k pages Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 62/64] f2fs: add helper to check compression level Greg Kroah-Hartman
2024-02-13 17:21 ` [PATCH 6.1 63/64] block: treat poll queue enter similarly to timeouts Greg Kroah-Hartman
2024-02-13 17:21 ` Greg Kroah-Hartman [this message]
2024-02-13 19:03 ` [PATCH 6.1 00/64] 6.1.78-rc1 review SeongJae Park
2024-02-13 19:34 ` Pavel Machek
2024-02-13 21:04 ` Kelsey Steele
2024-02-13 21:15 ` Florian Fainelli
2024-02-13 22:46 ` Allen
2024-02-14  0:17 ` Shuah Khan
2024-02-14  9:03 ` Jon Hunter
2024-02-14 13:08   ` Greg Kroah-Hartman
2024-02-14 13:15     ` Jon Hunter
2024-02-14 14:23       ` Greg Kroah-Hartman
2024-02-14 11:07 ` Naresh Kamboju
2024-02-14 11:22 ` Yann Sionneau
2024-02-14 16:44 ` Sven Joachim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240213171846.748324516@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=feng.tang@intel.com \
    --cc=jwiesner@suse.de \
    --cc=patches@lists.linux.dev \
    --cc=paulmck@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).