From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Tejun Heo <tj@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 5.4 15/31] wq: handle VM suspension in stall detection
Date: Thu,  3 Jun 2021 13:09:03 -0400
Message-ID: <20210603170919.3169112-15-sashal@kernel.org>
In-Reply-To: <20210603170919.3169112-1-sashal@kernel.org>

From: Sergey Senozhatsky <senozhatsky@chromium.org>

[ Upstream commit 940d71c6462e8151c78f28e4919aa8882ff2054e ]

If a VCPU is suspended (VM suspend) while it is in wq_watchdog_timer_fn(),
then once this VCPU resumes it will see the new jiffies value, while it
may take a while before the IRQ detects PVCLOCK_GUEST_STOPPED on this
VCPU and updates all the watchdogs via pvclock_touch_watchdogs().
In the meantime there is a small chance of misreported WQ stalls,
because the new jiffies value is time_after() the old 'ts + thresh'.

wq_watchdog_timer_fn()
{
	for_each_pool(pool, pi) {
		if (time_after(jiffies, ts + thresh)) {
			pr_emerg("BUG: workqueue lockup - pool");
		}
	}
}
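A minimal userspace sketch of this failure mode follows. It is an illustration under stated assumptions, not kernel code: model_time_after() re-implements the wrap-safe comparison that the kernel's time_after() macro performs (without the macro's type checks), while model_stall_reported() and the tick values below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Wrap-safe "a is later than b", modelled on the kernel's time_after()
 * macro (which additionally type-checks that both arguments are
 * unsigned long). The unsigned difference is reinterpreted as signed,
 * so the result stays correct when the tick counter wraps around.
 */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical model of the pre-patch check: jiffies_read is whatever
 * the live jiffies counter holds when the comparison executes, ts is
 * the pool's latest progress/touch timestamp, thresh the limit in ticks.
 */
static bool model_stall_reported(unsigned long jiffies_read,
				 unsigned long ts, unsigned long thresh)
{
	assert(thresh > 0);
	return model_time_after(jiffies_read, ts + thresh);
}
```

For example, with ts = 1000 and thresh = 30, a live reading of 1005 reports nothing. If the VCPU is instead paused for 10000 ticks and resumes before pvclock_touch_watchdogs() has run, a live reading of 11005 is time_after() 1030 and a lockup is reported even though the pool never stalled.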

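Save jiffies at the beginning of this function and use that value
for stall detection. If the VM gets suspended, we continue using the
"old" jiffies value and the old WQ touch timestamps. If the IRQ at some
point restarts the stall detection cycle (pvclock_touch_watchdogs()),
then the old jiffies value will always be before the new 'ts + thresh'.

Also call kvm_check_and_clear_guest_paused() for each pool with pending
work before reading its timestamps: on a KVM guest this clears a pending
PVCLOCK_GUEST_STOPPED flag on the current VCPU and touches the watchdogs
via pvclock_touch_watchdogs() before the stall check runs.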
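The difference between re-reading the clock for every pool and using a single snapshot can be sketched the same way. Again this is a hypothetical userspace model, not kernel API: clock_at_iter[i] stands for the value the live jiffies counter would have when pool i is checked, and clock_at_iter[0] doubles as the snapshot taken just before the loop (the patch's `now`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a is later than b", modelled on the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical model of the scan in wq_watchdog_timer_fn().
 * clock_at_iter[i]: value the live jiffies counter has when pool i is
 * checked. use_snapshot selects the patched behaviour (compare every
 * pool against one value sampled before the loop) instead of the
 * pre-patch behaviour (re-read the live counter for each pool).
 * Returns the number of pools that would be reported as locked up.
 */
static size_t model_scan(const unsigned long *clock_at_iter,
			 const unsigned long *pool_ts, size_t npools,
			 unsigned long thresh, bool use_snapshot)
{
	size_t stalled = 0;
	unsigned long now;

	assert(npools > 0);
	now = clock_at_iter[0];	/* the snapshot, like 'now = jiffies' */

	for (size_t i = 0; i < npools; i++) {
		unsigned long t = use_snapshot ? now : clock_at_iter[i];

		if (model_time_after(t, pool_ts[i] + thresh))
			stalled++;
	}
	return stalled;
}
```

With pool timestamps of 1000, thresh = 30 and a 10000-tick pause before the last iteration (clock_at_iter = {1010, 1010, 11010}), the live variant reports one stalled pool while the snapshot variant reports none. The snapshot also stays before 'ts + thresh' after a touch moves a timestamp forward, and a pool whose timestamp is genuinely older than now - thresh is still reported.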

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/workqueue.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5d7092e32912..8f41499d8257 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -50,6 +50,7 @@
 #include <linux/uaccess.h>
 #include <linux/sched/isolation.h>
 #include <linux/nmi.h>
+#include <linux/kvm_para.h>
 
 #include "workqueue_internal.h"
 
@@ -5734,6 +5735,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 {
 	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
 	bool lockup_detected = false;
+	unsigned long now = jiffies;
 	struct worker_pool *pool;
 	int pi;
 
@@ -5748,6 +5750,12 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		if (list_empty(&pool->worklist))
 			continue;
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a stall.
+		 */
+		kvm_check_and_clear_guest_paused();
+
 		/* get the latest of pool and touched timestamps */
 		pool_ts = READ_ONCE(pool->watchdog_ts);
 		touched = READ_ONCE(wq_watchdog_touched);
@@ -5766,12 +5774,12 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		}
 
 		/* did we stall? */
-		if (time_after(jiffies, ts + thresh)) {
+		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
-				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+				jiffies_to_msecs(now - pool_ts) / 1000);
 		}
 	}
 
-- 
2.30.2


