linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
@ 2022-03-19  4:42 Tetsuo Handa
  2022-03-19 17:15 ` Linus Torvalds
  2022-03-21 16:52 ` Tejun Heo
  0 siblings, 2 replies; 11+ messages in thread
From: Tetsuo Handa @ 2022-03-19  4:42 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan; +Cc: LKML, Linus Torvalds

Since a flush operation synchronously waits for completion, flushing
kernel-global WQs (e.g. system_wq) might introduce the possibility of deadlock
due to unexpected locking dependencies. Tejun Heo commented that it makes no
sense at all to call flush_workqueue() on the shared WQs as the caller has
no idea what it's gonna end up waiting for.

Although there is flush_scheduled_work(), which flushes the system_wq WQ with
a "Think twice before calling this function! It's very easy to get into
trouble if you don't take great care." warning message, syzbot found a
circular locking dependency caused by flushing the system_long_wq WQ [1].
Therefore, let's change direction: developers should use their own local
WQs if flush_workqueue() is inevitable.

To give developers time to update their modules, for now just emit a
ratelimited warning message when flush_workqueue() is called on
kernel-global WQs. We will eventually convert this warning message into
WARN_ON() and kill flush_scheduled_work().

This patch introduces the __WQ_NO_FLUSH flag for emitting the warning. Don't
set this flag when creating your own local WQs while updating your module,
because destroy_workqueue() involves a flush operation.

Theoretically, flushing a specific work item queued on kernel-global WQs
(which are !WQ_MEM_RECLAIM) using flush_work() also has a possibility of
deadlock. But this patch does not emit a warning when flush_work() is called
on work items queued on kernel-global WQs, based on the assumption that we
can create kworker threads as needed and we won't hit the max_active limit.

Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
This patch has been in linux-next.git as commit b9c20da356db1b39 ("workqueue:
Warn flushing of kernel-global workqueues") since next-20220301, and I got no
failure reports. I think that this patch is safe to go to linux.git; then
developers will find this patch and update their modules to use their local WQ.

Changes in v3:
  Don't check flush_work() attempt.
  Use a private WQ_ flag.
Changes in v2:
  Removed #ifdef CONFIG_PROVE_LOCKING=y check.
  Also check flush_work() attempt.
  Shorten warning message.
  Introduced a public WQ_ flag, which is initially meant for use by
  only system-wide WQs, but allows private WQs used by built-in modules
  to use this flag for detecting unexpected flush attempts if they want.

 include/linux/workqueue.h | 15 +++------------
 kernel/workqueue.c        | 36 +++++++++++++++++++++++++++++-------
 2 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..7b13fae377e2 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -339,6 +339,7 @@ enum {
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
 	__WQ_ORDERED_EXPLICIT	= 1 << 19, /* internal: alloc_ordered_workqueue() */
+	__WQ_NO_FLUSH           = 1 << 20, /* internal: warn flush_workqueue() */
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
@@ -569,18 +570,8 @@ static inline bool schedule_work(struct work_struct *work)
  * Forces execution of the kernel-global workqueue and blocks until its
  * completion.
  *
- * Think twice before calling this function!  It's very easy to get into
- * trouble if you don't take great care.  Either of the following situations
- * will lead to deadlock:
- *
- *	One of the work items currently on the workqueue needs to acquire
- *	a lock held by your code or its caller.
- *
- *	Your code is running in the context of a work routine.
- *
- * They will be detected by lockdep when they occur, but the first might not
- * occur very often.  It depends on what work items are on the workqueue and
- * what locks they need, which you have no control over.
+ * Please stop calling this function. If you need to flush a kernel-global
+ * workqueue, please use your own local workqueue instead.
  *
  * In most situations flushing the entire workqueue is overkill; you merely
  * need to know that a particular work item isn't queued and isn't running.
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..bc271579704f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2805,6 +2805,25 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
 	return wait;
 }
 
+static void warn_flush_attempt(struct workqueue_struct *wq)
+{
+	/*
+	 * Since there are known in-tree modules which will emit this warning,
+	 * for now don't use WARN_ON() in order not to break kernel testing.
+	 *
+	 * Print the whole trace with ratelimit, in order to make sure that
+	 * this warning is not overlooked while also not flooding the
+	 * console and kernel log buffer.
+	 */
+	static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
+
+	ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
+	if (!__ratelimit(&flush_warn_rs))
+		return;
+	pr_warn("Please do not flush %s WQ.\n", wq->name);
+	dump_stack();
+}
+
 /**
  * flush_workqueue - ensure that any scheduled work has run to completion.
  * @wq: workqueue to flush
@@ -2824,6 +2843,9 @@ void flush_workqueue(struct workqueue_struct *wq)
 	if (WARN_ON(!wq_online))
 		return;
 
+	if (unlikely(wq->flags & __WQ_NO_FLUSH))
+		warn_flush_attempt(wq);
+
 	lock_map_acquire(&wq->lockdep_map);
 	lock_map_release(&wq->lockdep_map);
 
@@ -6054,17 +6076,17 @@ void __init workqueue_init_early(void)
 		ordered_wq_attrs[i] = attrs;
 	}
 
-	system_wq = alloc_workqueue("events", 0, 0);
-	system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
-	system_long_wq = alloc_workqueue("events_long", 0, 0);
-	system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
+	system_wq = alloc_workqueue("events", __WQ_NO_FLUSH, 0);
+	system_highpri_wq = alloc_workqueue("events_highpri", __WQ_NO_FLUSH | WQ_HIGHPRI, 0);
+	system_long_wq = alloc_workqueue("events_long", __WQ_NO_FLUSH, 0);
+	system_unbound_wq = alloc_workqueue("events_unbound", __WQ_NO_FLUSH | WQ_UNBOUND,
 					    WQ_UNBOUND_MAX_ACTIVE);
 	system_freezable_wq = alloc_workqueue("events_freezable",
-					      WQ_FREEZABLE, 0);
+					      __WQ_NO_FLUSH | WQ_FREEZABLE, 0);
 	system_power_efficient_wq = alloc_workqueue("events_power_efficient",
-					      WQ_POWER_EFFICIENT, 0);
+					      __WQ_NO_FLUSH | WQ_POWER_EFFICIENT, 0);
 	system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient",
-					      WQ_FREEZABLE | WQ_POWER_EFFICIENT,
+					      __WQ_NO_FLUSH | WQ_FREEZABLE | WQ_POWER_EFFICIENT,
 					      0);
 	BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
 	       !system_unbound_wq || !system_freezable_wq ||
-- 
2.32.0




* Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
  2022-03-19  4:42 [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues Tetsuo Handa
@ 2022-03-19 17:15 ` Linus Torvalds
  2022-03-20  6:06   ` Tetsuo Handa
  2022-03-21 16:52 ` Tejun Heo
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2022-03-19 17:15 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Tejun Heo, Lai Jiangshan, LKML

On Fri, Mar 18, 2022 at 9:43 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Since flush operation synchronously waits for completion, flushing
> kernel-global WQs (e.g. system_wq) might introduce possibility of deadlock
> due to unexpected locking dependency. Tejun Heo commented that it makes no
> sense at all to call flush_workqueue() on the shared WQs as the caller has
> no idea what it's gonna end up waiting for.

NAK on this patch for a very simple reason:

  static inline void flush_scheduled_work(void)
  {
        flush_workqueue(system_wq);
  }

and now grep for flush_scheduled_work().

The *other* system workqueue flushes may be rare and "that subsystem
should just be converted to do its own workqueue".

But flush_scheduled_work() is literally exported as an explicit and
separate interface.

The fact that the function has a big warning in the comment above it
doesn't change that fact. At the very least, this patch has to be
preceded by a couple of other patches that fix a couple of subsystems
and document "this is what you should do".

Because suddenly saying "hey, we gave you this interface, now we're
warning about it because it's going to go away" without actually
showing how to do it instead is not acceptable.

And honestly, I don't personally see a good conversion. We literally
have all those "schedule_{delayed_}work{_on}()" etc helper functions
that are *designed* to use this system_wq. People *need* the ability
to flush those things, even if it's only for module unload.
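
For reference, the helpers in question are thin wrappers that hardcode
system_wq (a simplified sketch of their shape in include/linux/workqueue.h
around this time):

```c
/* Simplified: the schedule_*() helpers just forward to system_wq,
 * which is why callers end up wanting to flush the shared queue. */
static inline bool schedule_work(struct work_struct *work)
{
	return queue_work(system_wq, work);
}

static inline bool schedule_delayed_work(struct delayed_work *dwork,
					 unsigned long delay)
{
	return queue_delayed_work(system_wq, dwork, delay);
}
```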

So I really think this patch on its own is completely broken. It's not
pointing to a way forward, it's just saying "don't do this" with no
really acceptable way to not do it.

Removing flush_scheduled_work() needs to be paired with removing the
"schedule_{delayed_}work{_on}()" helpers too.

And I don't see you having a good alternative.

So until that clear way forward exists, NAK.

             Linus


* Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
  2022-03-19 17:15 ` Linus Torvalds
@ 2022-03-20  6:06   ` Tetsuo Handa
  2022-03-20 16:59     ` Linus Torvalds
  2022-03-21 17:02     ` Tejun Heo
  0 siblings, 2 replies; 11+ messages in thread
From: Tetsuo Handa @ 2022-03-20  6:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tejun Heo, Lai Jiangshan, LKML

On 2022/03/20 2:15, Linus Torvalds wrote:
> On Fri, Mar 18, 2022 at 9:43 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> Since flush operation synchronously waits for completion, flushing
>> kernel-global WQs (e.g. system_wq) might introduce possibility of deadlock
>> due to unexpected locking dependency. Tejun Heo commented that it makes no
>> sense at all to call flush_workqueue() on the shared WQs as the caller has
>> no idea what it's gonna end up waiting for.
> 
> NAK on this patch for a very simple reason:
> 
>   static inline void flush_scheduled_work(void)
>   {
>         flush_workqueue(system_wq);
>   }
> 
> and now grep for flush_scheduled_work().

I know there are in-tree flush_scheduled_work() users.

> 
> The *other* system workqueue flushes may be rare and "that subsystem
> should just be converted to do its own workqueue".
> 
> But flush_scheduled_work() is literally exported as an explicit and
> separate interface,
> 
> The fact that the function has a big warning in the comment above it
> doesn't change that fact. At the very least, this patch has to be
> preceded by a couple of other patches that fix a couple of subsystems
> and document "this is what you should do".

This change was in Tejun's todo list
( https://lore.kernel.org/all/YgnQGZWT%2Fn3VAITX@slm.duckdns.org ).
But scrubbing the existing users will need some time.

This patch is intended for

  (a) preventing developers from adding more flush_workqueue() calls on
      kernel-global WQs.

  (b) serving as an anchor to be referenced from individual patches.

> 
> Because suddenly saying "hey, we gave you this interface, now we're
> warning about it because it's going to go away" without actually
> showing how to do it instead is not acceptable.

This patch avoids emitting "WARNING:" in order not to disturb kernel testing.
This change is not as urgent as a security fix. We can wait for several release
cycles until all in-tree users update their modules. I don't want to send
blind conversion patches, because the proper fix depends on what each
module is doing. For example, commit 081bdc9fe05bb232 ("RDMA/ib_srp: Fix a
deadlock") chose to simply remove the flush_workqueue(system_long_wq) call. This
change is expected to be carefully handled by each module's maintainers.

> 
> And honestly, I don't personally see a good conversion. We literally
> have all those "schedule_{delayed_}work{_on}()" etc helper functions
> that are *designed* to use this system_wq. People *need* the ability
> to flush those things, even if it's only for module unload.

Users of all those "schedule_{delayed_}work{_on}()" etc. helper functions
need to be updated only if flush_workqueue() is needed. And even if
flush_workqueue() happens only upon module unload, there is still a
possibility of deadlock.

> 
> So I really think this patch on its own is completely broken. It'd not
> pointing to a way forward, it's just saying "don't do this" with no
> really acceptable way to not do it.

If you want to know how to convert, I can include it in the patch description.

> 
> Removing flush_scheduled_work() needs to be paired with removing the
> "schedule_{delayed_}work{_on}()" helpers too.

No need to remove the "schedule_{delayed_}work{_on}()" helpers.
Those who don't need flush_workqueue() can continue using these helpers.

Those who need flush_workqueue() can convert

  schedule_work(some_work);

into

  queue_work(some_workqueue_for_that_module, some_work);

> 
> And I don't see you having a good alternative.

What alternative are you expecting? We already have alternatives.
This change (replacing system_wq with a module-local workqueue, as
an example) is a matter of doing

  (1) add

        some_workqueue_for_that_module = alloc_workqueue("some_name", 0, 0);

      into module's __init function.

  (2) add

        destroy_workqueue(some_workqueue_for_that_module);

      into module's __exit function.

  (3) replace

        schedule_work(some_work);

      with

        queue_work(some_workqueue_for_that_module, some_work);

      throughout that module.

  (4) replace

        flush_scheduled_work();

      with

        flush_workqueue(some_workqueue_for_that_module);

      throughout that module.

if flush_scheduled_work() cannot be avoided.
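
Put together, the four steps above look roughly like this minimal module
sketch (names such as example_wq are hypothetical; error handling trimmed):

```c
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *example_wq;

static void example_work_fn(struct work_struct *work)
{
	/* ... infrequent deferred processing ... */
}
static DECLARE_WORK(example_work, example_work_fn);

static int __init example_init(void)
{
	example_wq = alloc_workqueue("example_wq", 0, 0);	/* step (1) */
	if (!example_wq)
		return -ENOMEM;
	return 0;
}

static void example_trigger(void)
{
	queue_work(example_wq, &example_work);	/* step (3): was schedule_work() */
}

static void __exit example_exit(void)
{
	/*
	 * Steps (2)/(4): destroy_workqueue() flushes the queue first, so a
	 * separate flush_workqueue(example_wq) call is only needed if work
	 * must complete before some earlier teardown step.
	 */
	destroy_workqueue(example_wq);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
```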

> 
> So until that clear way forward exists, NAK.

We know that allocating a !WQ_MEM_RECLAIM workqueue consumes very few
resources ( https://lore.kernel.org/all/YhUq70Y%2FtKGGNbR0@slm.duckdns.org ).
This is a step-by-step conversion for cases where flush_workqueue() is unavoidable.

Again, we already have alternatives. Why NAK?

> 
>              Linus



* Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
  2022-03-20  6:06   ` Tetsuo Handa
@ 2022-03-20 16:59     ` Linus Torvalds
  2022-03-21 17:02     ` Tejun Heo
  1 sibling, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2022-03-20 16:59 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Tejun Heo, Lai Jiangshan, LKML

On Sat, Mar 19, 2022 at 11:06 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> I know there are in-tree flush_scheduled_work() users.

It's not just that there are in-tree users - there are in-tree users
in CORE CODE that pretty much everybody runs.

This is not some kind of "it's some broken rare driver that we can't
test because almost nobody has the hardware".

> This patch is intended for
>
>   (a) preventing developers from adding more flush_workqueue() calls on
>       kernel-global WQs.

No. We're not taking stuff like this when core code has it and will
trigger the same warning.

A warning that nobody will look at anyway, because it's just a single
line in dmesg, and because I bet there are perfectly regular machines
that will trigger the warning with existing users.

For example, I'm looking at that acpi_release_memory() case as an
example - used by ucsi_acpi_probe. I have no idea whether UCSI is
actually all that common or not, but the Kconfig docs say

   "UCSI is available on most of the new Intel based systems
   that are equipped with Embedded Controller and USB Type-C ports"

which makes me think it's not exactly rare.

We're not adding warnings for alleged new uses - not when that warning
would trigger on normal systems.

>   (b) serving as an anchor to be referenced from individual patches

No. You can already reference the existing comment. There's no need to
create a bad commit just to make excuses to fix things up.

> This patch avoids emitting "WARNING:" in order not to disturb kernel testing.

See above: the patch is _pointless_.

And until normal systems don't have the warning, I don't want this
kind of pointless and wrong patch.

> If you want how to convert, I can include it into patch description.

I just want the existing users converted.

Once there are no existing users, we remove the interface.

And once the regular "flush_scheduled_work()" and friends are gone,
*THEN* we can add real verification that nobody tries to flush those
system-wide workqueues.

                 Linus


* Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
  2022-03-19  4:42 [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues Tetsuo Handa
  2022-03-19 17:15 ` Linus Torvalds
@ 2022-03-21 16:52 ` Tejun Heo
  1 sibling, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2022-03-21 16:52 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Lai Jiangshan, LKML, Linus Torvalds

Hello, Tetsuo.

On Sat, Mar 19, 2022 at 01:42:59PM +0900, Tetsuo Handa wrote:
> Since flush operation synchronously waits for completion, flushing
> kernel-global WQs (e.g. system_wq) might introduce possibility of deadlock
> due to unexpected locking dependency. Tejun Heo commented that it makes no
> sense at all to call flush_workqueue() on the shared WQs as the caller has
> no idea what it's gonna end up waiting for.
> 
> Although there is flush_scheduled_work() which flushes system_wq WQ with
> "Think twice before calling this function! It's very easy to get into
> trouble if you don't take great care." warning message, syzbot found a
> circular locking dependency caused by flushing system_long_wq WQ [1].
> Therefore, let's change the direction to that developers had better use
> their local WQs if flush_workqueue() is inevitable.
> 
> To give developers time to update their modules, for now just emit
> a warning message with ratelimit when flush_workqueue() is called on
> kernel-global WQs. We will eventually convert this warning message into
> WARN_ON() and kill flush_scheduled_work().
> 
> This patch introduces __WQ_NO_FLUSH flag for emitting warning. Don't set
> this flag when creating your local WQs while updating your module, for
> destroy_workqueue() will involve flush operation.
> 
> Theoretically, flushing specific work item using flush_work() queued on
> kernel-global WQs (which are !WQ_MEM_RECLAIM) has possibility of deadlock.
> But this patch does not emit warning when flush_work() is called on work
> items queued on kernel-global WQs, based on assumption that we can create
> kworker threads as needed and we won't hit max_active limit.

As noted in the other thread, we can float this sort of patch in -next to
suss out the ones which aren't immediately obvious - e.g. something saves a
pointer to one of the system workqueues and flushes that. But if this is to
happen, somebody has to do most of the legwork to convert most if not
all of the existing users, which isn't necessarily difficult but is tedious,
which is why it hasn't been done yet. So, if you wanna take on this, you
gotta take on the conversions too. Just declaring it so doesn't really work.

Thanks.

-- 
tejun


* Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
  2022-03-20  6:06   ` Tetsuo Handa
  2022-03-20 16:59     ` Linus Torvalds
@ 2022-03-21 17:02     ` Tejun Heo
  2022-05-12 10:38       ` Dmitry Torokhov
  1 sibling, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2022-03-21 17:02 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Linus Torvalds, Lai Jiangshan, LKML

Hello,

On Sun, Mar 20, 2022 at 03:06:30PM +0900, Tetsuo Handa wrote:
...
> What alternative are you expecting? We already have alternatives.
> This change (replacing system_wq with module's local workqueue as
> an example) is a matter of doing
> 
>   (1) add
> 
>         some_workqueue_for_that_module = alloc_workqueue("some_name", 0, 0);
> 
>       into module's __init function.
> 
>   (2) add
> 
>         destroy_workqueue(some_workqueue_for_that_module);
> 
>       into module's __exit function.
> 
>   (3) replace
> 
>         schedule_work(some_work);
> 
>       with
> 
>         queue_work(some_workqueue_for_that_module, some_work);
> 
>       throughout that module.
> 
>   (4) replace
> 
>         flush_scheduled_work();
> 
>       with
> 
>         flush_workqueue(some_workqueue_for_that_module);
> 
>       throughout that module.
> 
> if flush_scheduled_work() cannot be avoided.

I'm willing to bet that the majority of the use cases can be converted to
use flush_work() and that'd be the preference. We need a separate workqueue
iff the flush requirement is complex (e.g. there are multiple dynamic work
items in flight which need to be flushed together) or the work items need
some special attributes (such as MEM_RECLAIM or HIGHPRI) which don't apply
to the system_wq users in the first place.
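
A sketch of that preference (hypothetical names): with a small number of
known, static work items, a driver can keep using system_wq and wait for
exactly its own items:

```c
static struct work_struct my_work;	/* queued with schedule_work() */

static void my_driver_teardown(void)
{
	/*
	 * Waits only for my_work rather than for every item on the
	 * shared queue; use cancel_work_sync() instead if a pending,
	 * not-yet-started item should be dropped rather than run.
	 */
	flush_work(&my_work);
}
```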

Thanks.

-- 
tejun


* Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
  2022-03-21 17:02     ` Tejun Heo
@ 2022-05-12 10:38       ` Dmitry Torokhov
  2022-05-12 11:32         ` Tetsuo Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Dmitry Torokhov @ 2022-05-12 10:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Tetsuo Handa, Linus Torvalds, Lai Jiangshan, LKML

Hi Tejun,

On Mon, Mar 21, 2022 at 07:02:45AM -1000, Tejun Heo wrote:
> I'm willing to bet that the majority of the use cases can be converted to
> use flush_work() and that'd be the preference. We need a separate workqueue
> iff the flush requrement is complex (e.g. there are multiple dynamic work
> items in flight which need to be flushed together) or the work items needs
> some special attributes (such as MEM_RECLAIM or HIGHPRI) which don't apply
> to the system_wq users in the first place.

This means that now the code has to keep track of all work items that it
allocated, instead of being able to "fire and forget" work items (when dealing
with extremely infrequent events) and rely on flush_workqueue() to
clean up. That flush typically happens in the module unload path, and I
wonder if the restriction on flush_workqueue() could be relaxed to allow
calling it on unload.

Thanks.

-- 
Dmitry


* Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
  2022-05-12 10:38       ` Dmitry Torokhov
@ 2022-05-12 11:32         ` Tetsuo Handa
  2022-05-12 13:13           ` Dmitry Torokhov
  0 siblings, 1 reply; 11+ messages in thread
From: Tetsuo Handa @ 2022-05-12 11:32 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Linus Torvalds, Lai Jiangshan, LKML, Tejun Heo

On 2022/05/12 19:38, Dmitry Torokhov wrote:
> Hi Tejun,
> 
> On Mon, Mar 21, 2022 at 07:02:45AM -1000, Tejun Heo wrote:
>> I'm willing to bet that the majority of the use cases can be converted to
>> use flush_work() and that'd be the preference. We need a separate workqueue
>> iff the flush requrement is complex (e.g. there are multiple dynamic work
>> items in flight which need to be flushed together) or the work items needs
>> some special attributes (such as MEM_RECLAIM or HIGHPRI) which don't apply
>> to the system_wq users in the first place.
> 
> This means that now the code has to keep track of all work items that it
> allocated, instead of being able "fire and forget" works (when dealing
> with extremely infrequent events) and rely on flush_workqueue() to
> cleanup.

Yes. Moreover, a patch to catch and refuse such usage at compile time was proposed at
https://lkml.kernel.org/r/738afe71-2983-05d5-f0fc-d94efbdf7634@I-love.SAKURA.ne.jp .

>          That flush typically happens in module unload path, and I
> wonder if the restriction on flush_workqueue() could be relaxed to allow
> calling it on unload.

A patch for drivers/input/mouse/psmouse-smbus.c is waiting for your response at
https://lkml.kernel.org/r/25e2b787-cb2c-fb0d-d62c-6577ad1cd9df@I-love.SAKURA.ne.jp .
As in many modules, flush_workqueue() happens only on module unload in your case.

We currently don't have a flag to tell whether the caller is inside the module
unload path. And even inside the module unload path, flushing a system-wide
workqueue is problematic under e.g. GFP_NOFS/GFP_NOIO context. Therefore,
I don't think that being inside the module unload path makes a good exception.

Removing flush_scheduled_work() is for proactively avoiding new problems like
https://lkml.kernel.org/r/385ce718-f965-4005-56b6-34922c4533b8@I-love.SAKURA.ne.jp
and https://lkml.kernel.org/r/20220225112405.355599-10-Jerome.Pouiller@silabs.com .

Using a local WQ also helps for documentation purposes:
this change makes it clear where the work's dependencies are.
Please grep the linux-next.git tree. Some users have already been converted.

Any chance you have too many out-of-tree modules to convert?



* Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
  2022-05-12 11:32         ` Tetsuo Handa
@ 2022-05-12 13:13           ` Dmitry Torokhov
  2022-05-12 16:56             ` Tejun Heo
  2022-05-16  1:56             ` Tetsuo Handa
  0 siblings, 2 replies; 11+ messages in thread
From: Dmitry Torokhov @ 2022-05-12 13:13 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Linus Torvalds, Lai Jiangshan, LKML, Tejun Heo

On Thu, May 12, 2022 at 08:32:10PM +0900, Tetsuo Handa wrote:
> On 2022/05/12 19:38, Dmitry Torokhov wrote:
> > Hi Tejun,
> > 
> > On Mon, Mar 21, 2022 at 07:02:45AM -1000, Tejun Heo wrote:
> >> I'm willing to bet that the majority of the use cases can be converted to
> >> use flush_work() and that'd be the preference. We need a separate workqueue
> >> iff the flush requrement is complex (e.g. there are multiple dynamic work
> >> items in flight which need to be flushed together) or the work items needs
> >> some special attributes (such as MEM_RECLAIM or HIGHPRI) which don't apply
> >> to the system_wq users in the first place.
> > 
> > This means that now the code has to keep track of all work items that it
> > allocated, instead of being able "fire and forget" works (when dealing
> > with extremely infrequent events) and rely on flush_workqueue() to
> > cleanup.
> 
> Yes. Moreover, a patch to catch and refuse at compile time was proposed at
> https://lkml.kernel.org/r/738afe71-2983-05d5-f0fc-d94efbdf7634@I-love.SAKURA.ne.jp .

My comment was not a wholesale endorsement of Tejun's statement, but
rather a note of the fact that it again adds complexity (at least as far
as driver writers are concerned) to the kernel code.

Also, as far as I can see, the patch was rejected.

> 
> >          That flush typically happens in module unload path, and I
> > wonder if the restriction on flush_workqueue() could be relaxed to allow
> > calling it on unload.
> 
> A patch for drivers/input/mouse/psmouse-smbus.c is waiting for your response at
> https://lkml.kernel.org/r/25e2b787-cb2c-fb0d-d62c-6577ad1cd9df@I-love.SAKURA.ne.jp .
> Like many modules, flush_workqueue() happens on only module unload in your case.

Yes, I saw that patch, and that is what prompted my response. I find that it
adds complexity and I was wondering if it could be avoided. It is also
unclear to me whether there is an additional cost coming from allocating a
dedicated workqueue.

> 
> We currently don't have a flag to tell whether the caller is inside module unload
> path. And even inside module unload path, flushing the system-wide workqueue is
> problematic under e.g. GFP_NOFS/GFP_NOIO context.

Sorry, I do not follow here. Are there module unloading code that runs
as GFP_NOFS/GFP_NOIO?

> Therefore, I don't think that
> the caller is inside module unload path as a good exception.
> 
> Removing flush_scheduled_work() is for proactively avoiding new problems like
> https://lkml.kernel.org/r/385ce718-f965-4005-56b6-34922c4533b8@I-love.SAKURA.ne.jp
> and https://lkml.kernel.org/r/20220225112405.355599-10-Jerome.Pouiller@silabs.com .
> 
> Using local WQ also helps for documentation purpose.
> This change makes clear where the work's dependency is.
> Please grep the linux-next.git tree. Some have been already converted.

I understand that for some of them the change makes sense, but it would
be nice to continue using simple API under limited circumstances.

> 
> Any chance you have too many out-of-tree modules to convert?
> 

No, we are trying to get everything upstream.

Thanks.

-- 
Dmitry


* Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
  2022-05-12 13:13           ` Dmitry Torokhov
@ 2022-05-12 16:56             ` Tejun Heo
  2022-05-16  1:56             ` Tetsuo Handa
  1 sibling, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2022-05-12 16:56 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Tetsuo Handa, Linus Torvalds, Lai Jiangshan, LKML

Hello, Dmitry.

On Thu, May 12, 2022 at 06:13:35AM -0700, Dmitry Torokhov wrote:
> > > This means that now the code has to keep track of all work items that it
> > > allocated, instead of being able "fire and forget" works (when dealing
> > > with extremely infrequent events) and rely on flush_workqueue() to
> > > cleanup.
> > 
> > Yes. Moreover, a patch to catch and refuse at compile time was proposed at
> > https://lkml.kernel.org/r/738afe71-2983-05d5-f0fc-d94efbdf7634@I-love.SAKURA.ne.jp .
> 
> My comment was not a wholesale endorsement of Tejun's statement, but
> rather a note of the fact that it again adds complexity (at least as far
> as driver writers are concerned) to the kernel code.

I was more thinking about cases where there are a small number of static
work items. If there are multiple dynamic work items, creating a workqueue
as a flush domain is the way to go. It does add a bit of complexity but
shouldn't be too bad - e.g. it just adds the alloc_workqueue() during init
and the exit path can simply use destroy_workqueue() instead of flush.
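
A sketch of the dynamic-items case (hypothetical names; error handling
trimmed): work items are allocated, queued, and forgotten, and
destroy_workqueue() at unload flushes whatever is still in flight:

```c
#include <linux/slab.h>
#include <linux/workqueue.h>

struct my_event {
	struct work_struct work;
	int data;
};

static struct workqueue_struct *my_wq;	/* alloc_workqueue() in init */

static void my_event_fn(struct work_struct *work)
{
	struct my_event *ev = container_of(work, struct my_event, work);

	/* ... handle the infrequent event ... */
	kfree(ev);
}

static void fire_and_forget(int data)
{
	struct my_event *ev = kmalloc(sizeof(*ev), GFP_ATOMIC);

	if (!ev)
		return;
	ev->data = data;
	INIT_WORK(&ev->work, my_event_fn);
	queue_work(my_wq, &ev->work);	/* no per-item tracking needed */
}

static void __exit my_exit(void)
{
	destroy_workqueue(my_wq);	/* flushes all outstanding events */
}
```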

> > >          That flush typically happens in module unload path, and I
> > > wonder if the restriction on flush_workqueue() could be relaxed to allow
> > > calling it on unload.
> > 
> > A patch for drivers/input/mouse/psmouse-smbus.c is waiting for your response at
> > https://lkml.kernel.org/r/25e2b787-cb2c-fb0d-d62c-6577ad1cd9df@I-love.SAKURA.ne.jp .
> > Like many modules, flush_workqueue() happens on only module unload in your case.
> 
> Yes, I saw that patch, and that is what prompted my response. I find it
> adding complexity and I was wondering if it could be avoided. It also
> unclear to me if there is an additional cost coming from allocating a
> dedicated workqueue.

A workqueue without WQ_MEM_RECLAIM is really cheap. All it does is track
what's in flight for that particular frontend while interfacing with the
shared worker pool.

> I understand that for some of them the change makes sense, but it would
> be nice to continue using simple API under limited circumstances.

Hmmm... unfortunately, I can't think of a way to guarantee that a module
unloading path can't get involved in a deadlock scenario through system_wq.
Given that the added complexity should be something like half a dozen lines
of code, switching to separate workqueues feels like the right direction to
me.

Thanks.

-- 
tejun


* Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues
  2022-05-12 13:13           ` Dmitry Torokhov
  2022-05-12 16:56             ` Tejun Heo
@ 2022-05-16  1:56             ` Tetsuo Handa
  1 sibling, 0 replies; 11+ messages in thread
From: Tetsuo Handa @ 2022-05-16  1:56 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Linus Torvalds, Lai Jiangshan, LKML, Tejun Heo

On 2022/05/12 22:13, Dmitry Torokhov wrote:
> Also as far as I can see the patch was rejected.

Not rejected. I sent
https://lkml.kernel.org/r/7b2fecdb-59ae-2c54-5a5b-774ef7054d1b@I-love.SAKURA.ne.jp
for 5.19.

>> We currently don't have a flag to tell whether the caller is inside module unload
>> path. And even inside module unload path, flushing the system-wide workqueue is
>> problematic under e.g. GFP_NOFS/GFP_NOIO context.
> 
> Sorry, I do not follow here. Are there module unloading code that runs
> as GFP_NOFS/GFP_NOIO?

It is GFP_KERNEL context when the module's __exit function is called. But
whether flush_workqueue() is called from a restricted context depends on what
locks the module's __exit function holds.

If request_module() is called from some work function using one of the
system-wide WQs, and flush_workqueue() is called on that WQ from a module's
__exit function, the kernel might deadlock on the module_mutex lock. Making
sure that flush_workqueue() is not called on system-wide WQs is the safer
choice.
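
Spelled out as a sketch (hypothetical code), the scenario is:

```c
/* A work item on a system-wide WQ that can wait on module loading: */
static void some_work_fn(struct work_struct *work)
{
	request_module("some-helper");	/* may block on module machinery */
}

/*
 * Module unload path: delete_module() holds module-related locks, and
 * flushing a system-wide WQ here waits for some_work_fn(), which in
 * turn waits on that same module machinery -- a circular dependency.
 * Flushing a private WQ would wait only for this module's own items.
 */
static void __exit foo_exit(void)
{
	flush_scheduled_work();		/* problematic */
}
```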



