From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Tejun Heo <tj@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 5.12 23/43] wq: handle VM suspension in stall detection
Date: Thu,  3 Jun 2021 13:07:13 -0400
Message-ID: <20210603170734.3168284-23-sashal@kernel.org>
In-Reply-To: <20210603170734.3168284-1-sashal@kernel.org>

From: Sergey Senozhatsky <senozhatsky@chromium.org>

[ Upstream commit 940d71c6462e8151c78f28e4919aa8882ff2054e ]

If a VCPU is suspended (VM suspend) while in wq_watchdog_timer_fn(),
then once the VCPU resumes it sees the new jiffies value, but it may
take a while before an IRQ detects PVCLOCK_GUEST_STOPPED on this VCPU
and updates all the watchdogs via pvclock_touch_watchdogs(). In the
meantime there is a small chance of misreported WQ stalls, because the
new jiffies value is time_after() the old 'ts + thresh'.

wq_watchdog_timer_fn()
{
	for_each_pool(pool, pi) {
		if (time_after(jiffies, ts + thresh)) {
			pr_emerg("BUG: workqueue lockup - pool");
		}
	}
}

Save jiffies at the beginning of this function and use that value
for stall detection. If the VM gets suspended, we continue using the
"old" jiffies value and the old WQ touch timestamps. If an IRQ at some
point restarts the stall detection cycle (pvclock_touch_watchdogs()),
the old jiffies value will always be before the new 'ts + thresh'.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/workqueue.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 79f2319543ce..994eafd25d64 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -50,6 +50,7 @@
 #include <linux/uaccess.h>
 #include <linux/sched/isolation.h>
 #include <linux/nmi.h>
+#include <linux/kvm_para.h>
 
 #include "workqueue_internal.h"
 
@@ -5772,6 +5773,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 {
 	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
 	bool lockup_detected = false;
+	unsigned long now = jiffies;
 	struct worker_pool *pool;
 	int pi;
 
@@ -5786,6 +5788,12 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		if (list_empty(&pool->worklist))
 			continue;
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a stall.
+		 */
+		kvm_check_and_clear_guest_paused();
+
 		/* get the latest of pool and touched timestamps */
 		if (pool->cpu >= 0)
 			touched = READ_ONCE(per_cpu(wq_watchdog_touched_cpu, pool->cpu));
@@ -5799,12 +5807,12 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 			ts = touched;
 
 		/* did we stall? */
-		if (time_after(jiffies, ts + thresh)) {
+		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
-				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+				jiffies_to_msecs(now - pool_ts) / 1000);
 		}
 	}
 
-- 
2.30.2

