All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V3,RESEND] workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog()
@ 2021-03-24 11:40 Wang Qing
  2021-03-27  2:38 ` 王擎
  2021-04-04 17:27 ` [PATCH " Tejun Heo
  0 siblings, 2 replies; 3+ messages in thread
From: Wang Qing @ 2021-03-24 11:40 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, Andrew Morton, Wang Qing,
	Guilherme G. Piccoli, Andrey Ignatov, Vlastimil Babka,
	Santosh Sivaraj, linux-kernel

There are two workqueue-specific watchdog timestamps:

    + @wq_watchdog_touched_cpu (per-CPU) updated by
      touch_softlockup_watchdog()

    + @wq_watchdog_touched (global) updated by
      touch_all_softlockup_watchdogs()

watchdog_timer_fn() checks only the global @wq_watchdog_touched for
unbound workqueues. As a result, unbound workqueues are not aware
of touch_softlockup_watchdog(). The watchdog might report a stall
even when the unbound workqueues are blocked by a known slow code.

Solution:
touch_softlockup_watchdog() must touch also the global @wq_watchdog_touched 
timestamp.

The global timestamp can not longer be used for bound workqueues
because it is updated on all CPUs. Instead, bound workqueues
have to check only @wq_watchdog_touched_cpu and these timestamp
has to be updated for all CPUs in touch_all_softlockup_watchdogs().

Beware:
The change might cause the opposite problem. An unbound workqueue
might get blocked on CPU A because of a real softlockup. The workqueue
watchdog would miss it when the timestamp got touched on CPU B.

It is acceptable because softlockups are detected by softlockup
watchdog. The workqueue watchdog is there to detect stalls where
a work never finishes, for example, because of dependencies of works
queued into the same workqueue.

V3:
- Modify the commit message clearly according to Petr's suggestion.

Signed-off-by: Wang Qing <wangqing@vivo.com>
---
 kernel/watchdog.c  |  5 +++--
 kernel/workqueue.c | 17 ++++++-----------
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7110906..107bc38
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -278,9 +278,10 @@ void touch_all_softlockup_watchdogs(void)
 	 * update as well, the only side effect might be a cycle delay for
 	 * the softlockup check.
 	 */
-	for_each_cpu(cpu, &watchdog_allowed_mask)
+	for_each_cpu(cpu, &watchdog_allowed_mask) {
 		per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
-	wq_watchdog_touch(-1);
+		wq_watchdog_touch(cpu);
+	}
 }
 
 void touch_softlockup_watchdog_sync(void)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0d150da..be08295
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5787,22 +5787,17 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 			continue;
 
 		/* get the latest of pool and touched timestamps */
+		if (pool->cpu >= 0)
+			touched = READ_ONCE(per_cpu(wq_watchdog_touched_cpu, pool->cpu));
+		else
+			touched = READ_ONCE(wq_watchdog_touched);
 		pool_ts = READ_ONCE(pool->watchdog_ts);
-		touched = READ_ONCE(wq_watchdog_touched);
 
 		if (time_after(pool_ts, touched))
 			ts = pool_ts;
 		else
 			ts = touched;
 
-		if (pool->cpu >= 0) {
-			unsigned long cpu_touched =
-				READ_ONCE(per_cpu(wq_watchdog_touched_cpu,
-						  pool->cpu));
-			if (time_after(cpu_touched, ts))
-				ts = cpu_touched;
-		}
-
 		/* did we stall? */
 		if (time_after(jiffies, ts + thresh)) {
 			lockup_detected = true;
@@ -5826,8 +5821,8 @@ notrace void wq_watchdog_touch(int cpu)
 {
 	if (cpu >= 0)
 		per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
-	else
-		wq_watchdog_touched = jiffies;
+
+	wq_watchdog_touched = jiffies;
 }
 
 static void wq_watchdog_set_thresh(unsigned long thresh)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re:[PATCH V3,RESEND] workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog()
  2021-03-24 11:40 [PATCH V3,RESEND] workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog() Wang Qing
@ 2021-03-27  2:38 ` 王擎
  2021-04-04 17:27 ` [PATCH " Tejun Heo
  1 sibling, 0 replies; 3+ messages in thread
From: 王擎 @ 2021-03-27  2:38 UTC (permalink / raw)
  To: pmladek
  Cc: Tejun Heo, Lai Jiangshan, Andrew Morton, Guilherme G. Piccoli,
	Andrey Ignatov, Vlastimil Babka, Santosh Sivaraj, linux-kernel


>V3:
>- Modify the commit message clearly according to Petr's suggestion.
>
>Signed-off-by: Wang Qing <wangqing@vivo.com>
>---
> kernel/watchdog.c  |  5 +++--
> kernel/workqueue.c | 17 ++++++-----------
> 2 files changed, 9 insertions(+), 13 deletions(-)
>
>diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>index 7110906..107bc38
>--- a/kernel/watchdog.c
>+++ b/kernel/watchdog.c
>@@ -278,9 +278,10 @@ void touch_all_softlockup_watchdogs(void)
> 	 * update as well, the only side effect might be a cycle delay for
> 	 * the softlockup check.
> 	 */
>-	for_each_cpu(cpu, &watchdog_allowed_mask)
>+	for_each_cpu(cpu, &watchdog_allowed_mask) {
> 		per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
>-	wq_watchdog_touch(-1);
>+		wq_watchdog_touch(cpu);
>+	}
> }
> 
> void touch_softlockup_watchdog_sync(void)
>diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>index 0d150da..be08295
>--- a/kernel/workqueue.c
>+++ b/kernel/workqueue.c
>@@ -5787,22 +5787,17 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
> 			continue;
> 
> 		/* get the latest of pool and touched timestamps */
>+		if (pool->cpu >= 0)
>+			touched = READ_ONCE(per_cpu(wq_watchdog_touched_cpu, pool->cpu));
>+		else
>+			touched = READ_ONCE(wq_watchdog_touched);
> 		pool_ts = READ_ONCE(pool->watchdog_ts);
>-		touched = READ_ONCE(wq_watchdog_touched);
> 
> 		if (time_after(pool_ts, touched))
> 			ts = pool_ts;
> 		else
> 			ts = touched;
> 
>-		if (pool->cpu >= 0) {
>-			unsigned long cpu_touched =
>-				READ_ONCE(per_cpu(wq_watchdog_touched_cpu,
>-						  pool->cpu));
>-			if (time_after(cpu_touched, ts))
>-				ts = cpu_touched;
>-		}
>-
> 		/* did we stall? */
> 		if (time_after(jiffies, ts + thresh)) {
> 			lockup_detected = true;
>@@ -5826,8 +5821,8 @@ notrace void wq_watchdog_touch(int cpu)
> {
> 	if (cpu >= 0)
> 		per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
>-	else
>-		wq_watchdog_touched = jiffies;
>+
>+	wq_watchdog_touched = jiffies;
> }
> 
> static void wq_watchdog_set_thresh(unsigned long thresh)
>-- 
>2.7.4
>

Hi Petr:
Can you give a reviewed tag, or pick it up to workqueue tree?
Thanks,
Qing





^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH V3,RESEND] workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog()
  2021-03-24 11:40 [PATCH V3,RESEND] workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog() Wang Qing
  2021-03-27  2:38 ` 王擎
@ 2021-04-04 17:27 ` Tejun Heo
  1 sibling, 0 replies; 3+ messages in thread
From: Tejun Heo @ 2021-04-04 17:27 UTC (permalink / raw)
  To: Wang Qing
  Cc: Lai Jiangshan, Andrew Morton, Guilherme G. Piccoli,
	Andrey Ignatov, Vlastimil Babka, Santosh Sivaraj, linux-kernel

Applied to wq/for-5.12-fixes w/ minor description update.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-04-04 17:27 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-24 11:40 [PATCH V3,RESEND] workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog() Wang Qing
2021-03-27  2:38 ` 王擎
2021-04-04 17:27 ` [PATCH " Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.