All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: Tejun Heo <tj@kernel.org>, Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH v4] workqueue: Wrap flush_workqueue() using an inline function
Date: Fri, 27 May 2022 15:21:40 +0900	[thread overview]
Message-ID: <f9743e46-d467-82aa-effc-597b38e7377d@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <43845fc4-eb41-e3c1-4e47-1cc80530ea09@I-love.SAKURA.ne.jp>

Since flush operation synchronously waits for completion, flushing
system-wide WQs (e.g. system_wq) might introduce possibility of deadlock
due to unexpected locking dependency. Tejun Heo commented at [1] that it
makes no sense at all to call flush_workqueue() on the shared WQs as the
caller has no idea what it's gonna end up waiting for.

Although there is flush_scheduled_work() which flushes system_wq WQ with
"Think twice before calling this function! It's very easy to get into
trouble if you don't take great care." warning message, syzbot found a
circular locking dependency caused by flushing system_wq WQ [2].

Therefore, let's change the direction to that developers had better use
their local WQs if flush_scheduled_work()/flush_workqueue(system_*_wq) is
inevitable.

Steps for converting system-wide WQs into local WQs are explained at [3],
and a conversion to stop flushing system-wide WQs is in progress. Now we
want some mechanism for preventing developers who are not aware of this
conversion from again start flushing system-wide WQs.

Since I found that WARN_ON() is complete but awkward approach for teaching
developers about this problem, let's use __compiletime_warning() for
incomplete but handy approach. For completeness, we will also insert
WARN_ON() into __flush_workqueue() after all users stopped calling
flush_scheduled_work().

Link: https://lore.kernel.org/all/YgnQGZWT%2Fn3VAITX@slm.duckdns.org/ [1]
Link: https://syzkaller.appspot.com/bug?extid=bde0f89deacca7c765b8 [2]
Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp [3]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
Since all flush_workqueue(system_*_wq) users are removed by now, and I removed
flush_scheduled_work() part from this patch, this patch is ready to go to linux.git.

Changes in v4:
  It turned out that attempt to emit warning message to flush_scheduled_work() users
  based on "!defined(CONFIG_WERROR)" does not work, for Talla, RavitejaX Goud
  <ravitejax.goud.talla@intel.com> found that one of modules which call
  flush_scheduled_work() locally applies -Werror option.

  Therefore, convert BUILD_BUG_ON() to __compiletime_warning() and
  rename the backend function to __flush_workqueue().

Changes in v3:
  Revert suggested change in v2, for kernel test robot <lkp@intel.com> found

    warning: Function parameter or member 'flush_workqueue' not described in 'void'
    warning: expecting prototype for flush_workqueue(). Prototype was for void() instead

  when built with W=1 option.

Changes in v2:
  Use "void (flush_workqueue)(struct workqueue_struct *wq)" and remove
  "#undef flush_workqueue", suggested by Joe Perches <joe@perches.com>.

 include/linux/workqueue.h | 51 ++++++++++++++++++++++++++++++++++-----
 kernel/workqueue.c        | 16 +++++++++---
 2 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..3d63104a41b7 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -445,7 +445,7 @@ extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *dwork, unsigned long delay);
 extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork);
 
-extern void flush_workqueue(struct workqueue_struct *wq);
+extern void __flush_workqueue(struct workqueue_struct *wq);
 extern void drain_workqueue(struct workqueue_struct *wq);
 
 extern int schedule_on_each_cpu(work_func_t func);
@@ -563,15 +563,23 @@ static inline bool schedule_work(struct work_struct *work)
 	return queue_work(system_wq, work);
 }
 
+/*
+ * Detect attempt to flush system-wide workqueues at compile time when possible.
+ *
+ * See https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp
+ * for reasons and steps for converting system-wide workqueues into local workqueues.
+ */
+extern void __warn_flushing_systemwide_wq(void)
+	__compiletime_warning("Please avoid flushing system-wide workqueues.");
+
 /**
  * flush_scheduled_work - ensure that any scheduled work has run to completion.
  *
  * Forces execution of the kernel-global workqueue and blocks until its
  * completion.
  *
- * Think twice before calling this function!  It's very easy to get into
- * trouble if you don't take great care.  Either of the following situations
- * will lead to deadlock:
+ * It's very easy to get into trouble if you don't take great care.
+ * Either of the following situations will lead to deadlock:
  *
  *	One of the work items currently on the workqueue needs to acquire
  *	a lock held by your code or its caller.
@@ -586,10 +594,41 @@ static inline bool schedule_work(struct work_struct *work)
  * need to know that a particular work item isn't queued and isn't running.
  * In such cases you should use cancel_delayed_work_sync() or
  * cancel_work_sync() instead.
+ *
+ * Please stop calling this function! A conversion to stop flushing system-wide
+ * workqueues is in progress. This function will be removed after all in-tree
+ * users stopped calling this function.
+ */
+static inline void __deprecated flush_scheduled_work(void)
+{
+	__flush_workqueue(system_wq);
+}
+
+/**
+ * flush_workqueue - ensure that any scheduled work has run to completion.
+ * @wq: workqueue to flush
+ *
+ * This function sleeps until all work items which were queued on entry
+ * have finished execution, but it is not livelocked by new incoming ones.
  */
-static inline void flush_scheduled_work(void)
+static __always_inline void flush_workqueue(struct workqueue_struct *wq)
 {
-	flush_workqueue(system_wq);
+	if ((__builtin_constant_p(wq == system_wq) &&
+	     wq == system_wq) ||
+	    (__builtin_constant_p(wq == system_highpri_wq) &&
+	     wq == system_highpri_wq) ||
+	    (__builtin_constant_p(wq == system_long_wq) &&
+	     wq == system_long_wq) ||
+	    (__builtin_constant_p(wq == system_unbound_wq) &&
+	     wq == system_unbound_wq) ||
+	    (__builtin_constant_p(wq == system_freezable_wq) &&
+	     wq == system_freezable_wq) ||
+	    (__builtin_constant_p(wq == system_power_efficient_wq) &&
+	     wq == system_power_efficient_wq) ||
+	    (__builtin_constant_p(wq == system_freezable_power_efficient_wq) &&
+	     wq == system_freezable_power_efficient_wq))
+		__warn_flushing_systemwide_wq();
+	__flush_workqueue(wq);
 }
 
 /**
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4056f2a3f9d5..1ea50f6be843 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2788,13 +2788,13 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
 }
 
 /**
- * flush_workqueue - ensure that any scheduled work has run to completion.
+ * __flush_workqueue - ensure that any scheduled work has run to completion.
  * @wq: workqueue to flush
  *
  * This function sleeps until all work items which were queued on entry
  * have finished execution, but it is not livelocked by new incoming ones.
  */
-void flush_workqueue(struct workqueue_struct *wq)
+void __flush_workqueue(struct workqueue_struct *wq)
 {
 	struct wq_flusher this_flusher = {
 		.list = LIST_HEAD_INIT(this_flusher.list),
@@ -2943,7 +2943,7 @@ void flush_workqueue(struct workqueue_struct *wq)
 out_unlock:
 	mutex_unlock(&wq->mutex);
 }
-EXPORT_SYMBOL(flush_workqueue);
+EXPORT_SYMBOL(__flush_workqueue);
 
 /**
  * drain_workqueue - drain a workqueue
@@ -2971,7 +2971,7 @@ void drain_workqueue(struct workqueue_struct *wq)
 		wq->flags |= __WQ_DRAINING;
 	mutex_unlock(&wq->mutex);
 reflush:
-	flush_workqueue(wq);
+	__flush_workqueue(wq);
 
 	mutex_lock(&wq->mutex);
 
@@ -6111,3 +6111,11 @@ void __init workqueue_init(void)
 	wq_online = true;
 	wq_watchdog_init();
 }
+
+/*
+ * Despite the naming, this is a no-op function which is here only for avoiding
+ * link error. Since compile-time warning may fail to catch, we will need to
+ * emit run-time warning from __flush_workqueue().
+ */
+void __warn_flushing_systemwide_wq(void) { }
+EXPORT_SYMBOL(__warn_flushing_systemwide_wq);
-- 
2.18.4



      reply	other threads:[~2022-05-27  6:22 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-24 23:31 [PATCH] checkpatch: warn about flushing system-wide workqueues Tetsuo Handa
2022-04-24 23:45 ` Joe Perches
2022-04-25  0:33   ` Tetsuo Handa
2022-05-05 13:42     ` Tetsuo Handa
2022-05-05 15:48       ` Joe Perches
2022-05-05 17:32       ` Tejun Heo
2022-05-05 23:29         ` Tetsuo Handa
2022-05-12 16:46           ` Tejun Heo
2022-05-16  1:32             ` [PATCH v2] workqueue: Wrap flush_workqueue() using a macro Tetsuo Handa
2022-05-16  5:00               ` [PATCH v3] " Tetsuo Handa
2022-05-16  7:18                 ` Joe Perches
2022-05-16  8:34                   ` Rasmus Villemoes
2022-05-20  8:01                 ` Tejun Heo
2022-05-20  9:51                   ` Tetsuo Handa
2022-05-20 11:11                     ` Tejun Heo
2022-05-20 11:43                       ` Tetsuo Handa
2022-05-20 17:10                         ` Tejun Heo
2022-05-21  1:14                           ` Tetsuo Handa
2022-05-21  4:57                             ` Tejun Heo
2022-05-21 11:37                               ` Tetsuo Handa
2022-05-23 19:04                                 ` Tejun Heo
2022-05-24 10:51                                   ` Tetsuo Handa
2022-05-27  6:21                                     ` Tetsuo Handa [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f9743e46-d467-82aa-effc-597b38e7377d@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.