linux-kernel.vger.kernel.org archive mirror
* [PATCH] wq: handle VM suspension in stall detection
@ 2021-05-20 10:14 Sergey Senozhatsky
  2021-05-20 16:59 ` Tejun Heo
  0 siblings, 1 reply; 2+ messages in thread
From: Sergey Senozhatsky @ 2021-05-20 10:14 UTC
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, Suleiman Souhlal, Sergey Senozhatsky

If a VCPU is suspended (VM suspend) while in wq_watchdog_timer_fn(),
then once this VCPU resumes it sees the new jiffies value, while it
may take a while before an IRQ detects PVCLOCK_GUEST_STOPPED on this
VCPU and updates all the watchdogs via pvclock_touch_watchdogs().
In the meantime there is a small chance of misreported WQ stalls,
because the new jiffies value is time_after() the old 'ts + thresh'.

wq_watchdog_timer_fn()
{
	for_each_pool(pool, pi) {
		if (time_after(jiffies, ts + thresh)) {
			pr_emerg("BUG: workqueue lockup - pool");
		}
	}
}

Save jiffies at the beginning of this function and use that value
for stall detection. If the VM gets suspended, we continue using the
"old" jiffies value and the old WQ touch timestamps. If an IRQ at some
point restarts the stall detection cycle (pvclock_touch_watchdogs()),
then the old jiffies value will always be before the new 'ts + thresh'.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 kernel/workqueue.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b19d759e55a5..50142fc08902 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -50,6 +50,7 @@
 #include <linux/uaccess.h>
 #include <linux/sched/isolation.h>
 #include <linux/nmi.h>
+#include <linux/kvm_para.h>
 
 #include "workqueue_internal.h"
 
@@ -5772,6 +5773,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 {
 	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
 	bool lockup_detected = false;
+	unsigned long now = jiffies;
 	struct worker_pool *pool;
 	int pi;
 
@@ -5786,6 +5788,12 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		if (list_empty(&pool->worklist))
 			continue;
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a stall.
+		 */
+		kvm_check_and_clear_guest_paused();
+
 		/* get the latest of pool and touched timestamps */
 		if (pool->cpu >= 0)
 			touched = READ_ONCE(per_cpu(wq_watchdog_touched_cpu, pool->cpu));
@@ -5799,12 +5807,12 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 			ts = touched;
 
 		/* did we stall? */
-		if (time_after(jiffies, ts + thresh)) {
+		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
-				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+				jiffies_to_msecs(now - pool_ts) / 1000);
 		}
 	}
 
-- 
2.31.1.751.gd2f1c929bd-goog



* Re: [PATCH] wq: handle VM suspension in stall detection
  2021-05-20 10:14 [PATCH] wq: handle VM suspension in stall detection Sergey Senozhatsky
@ 2021-05-20 16:59 ` Tejun Heo
  0 siblings, 0 replies; 2+ messages in thread
From: Tejun Heo @ 2021-05-20 16:59 UTC
  To: Sergey Senozhatsky; +Cc: Lai Jiangshan, linux-kernel, Suleiman Souhlal

On Thu, May 20, 2021 at 07:14:22PM +0900, Sergey Senozhatsky wrote:
> If VCPU is suspended (VM suspend) in wq_watchdog_timer_fn() then
> once this VCPU resumes it will see the new jiffies value, while it
> may take a while before IRQ detects PVCLOCK_GUEST_STOPPED on this
> VCPU and updates all the watchdogs via pvclock_touch_watchdogs().
> There is a small chance of misreported WQ stalls in the meantime,
> because new jiffies is time_after() old 'ts + thresh'.
> 
> wq_watchdog_timer_fn()
> {
> 	for_each_pool(pool, pi) {
> 		if (time_after(jiffies, ts + thresh)) {
> 			pr_emerg("BUG: workqueue lockup - pool");
> 		}
> 	}
> }
> 
> Save jiffies at the beginning of this function and use that value
> for stall detection. If VM gets suspended then we continue using
> "old" jiffies value and old WQ touch timestamps. If IRQ at some
> point restarts the stall detection cycle (pvclock_touch_watchdogs())
> then old jiffies will always be before new 'ts + thresh'.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>

Applied to wq/for-5.13-fixes.

Thanks.

-- 
tejun
