linux-xfs.vger.kernel.org archive mirror
* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
       [not found]     ` <YjS+Jr6QudSKMSGy@slm.duckdns.org>
@ 2022-03-19  2:02       ` Tetsuo Handa
  2022-03-21 16:55         ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2022-03-19  2:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton, Jan Kara,
	Christoph Hellwig, linux-block, linux-xfs

On 2022/03/19 2:15, Tejun Heo wrote:
> Hello,
> 
> On Fri, Mar 18, 2022 at 09:05:42PM +0900, Tetsuo Handa wrote:
>> But since include/linux/workqueue.h only says
>>
>> 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
>>
>> , I can't tell when not to specify __WQ_LEGACY and WQ_MEM_RECLAIM together...
>>
>> Tejun, what is the intent of this warning? Can the description of __WQ_LEGACY flag
>> be updated? I think that the loop module had better reserve one "struct task_struct"
>> for each loop device.
>>
>> I guess that, in general, waiting for a work in !WQ_MEM_RECLAIM WQ from a
>> WQ_MEM_RECLAIM WQ is dangerous because that work may not be able to find
>> "struct task_struct" for processing that work. Then, what we should do is to
>> create mp->m_sync_workqueue with WQ_MEM_RECLAIM flag added instead of creating
>> lo->workqueue with __WQ_LEGACY + WQ_MEM_RECLAIM flags added...
>>
>> Is __WQ_LEGACY + WQ_MEM_RECLAIM combination a hack for silencing this warning
>> without fixing various WQs used by xfs and other filesystems?
> 
> So, create_workqueue() is the deprecated interface and always implies
> MEM_RECLAIM because back when the interface was added each wq had a
> dedicated worker and there's no way to tell one way or the other. The
> warning is telling you to convert the workqueue to the alloc_workqueue()
> interface and explicitly use WQ_MEM_RECLAIM flag if the workqueue is gonna
> participate in MEM_RECLAIM chain.

Is the intent of the __WQ_LEGACY flag to indicate that "this WQ was created
using the deprecated interface"? If so, that intent no longer holds true.

Although the __WQ_LEGACY flag is described as "internal: create*_workqueue()",
tegra194_cpufreq_probe()/scsi_add_host_with_dma()/iscsi_host_alloc()/
iscsi_transport_init() pass the __WQ_LEGACY flag through the alloc_workqueue()
interface. Therefore, the __WQ_LEGACY flag is no longer a meaningful indicator
of "internal: create*_workqueue()". The description of the __WQ_LEGACY flag
needs an update.
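
For example, scsi_add_host_with_dma() in drivers/scsi/hosts.c currently contains
(the same lines appear as context in the diff quoted later in this thread):

	shost->work_q = alloc_workqueue("%s",
		WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
		1, shost->work_q_name);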

Here is an example program which reproduces

	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
		  "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps",
		  worker->current_pwq->wq->name, worker->current_func,
		  target_wq->name, target_func);

reported in https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .

---------- test.c ----------
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

static struct workqueue_struct *wq1;
static struct workqueue_struct *wq2;
static struct work_struct w1;
static struct work_struct w2;

static void wq2_workfn(struct work_struct *work)
{
}

/* Runs on wq1 (WQ_MEM_RECLAIM) and flushes a work item queued on wq2
 * (!WQ_MEM_RECLAIM), which is what check_flush_dependency() warns about. */
static void wq1_workfn(struct work_struct *work)
{
	INIT_WORK(&w2, wq2_workfn);
	queue_work(wq2, &w2);
	flush_work(&w2);
}

static int __init wq_test_init(void)
{
	/* wq1 gets a rescuer thread (WQ_MEM_RECLAIM); wq2 does not. */
	wq1 = alloc_workqueue("wq1", WQ_MEM_RECLAIM, 0);
	wq2 = alloc_workqueue("wq2", 0, 0);
	INIT_WORK(&w1, wq1_workfn);
	queue_work(wq1, &w1);
	flush_work(&w1);
	destroy_workqueue(wq2);
	destroy_workqueue(wq1);
	return -EINVAL; /* fail the load on purpose; the module need not stay loaded */
}

module_init(wq_test_init);
MODULE_LICENSE("GPL");
---------- test.c ----------

----------
[  152.666153] test: loading out-of-tree module taints kernel.
[  152.673510] ------------[ cut here ]------------
[  152.675765] workqueue: WQ_MEM_RECLAIM wq1:wq1_workfn [test] is flushing !WQ_MEM_RECLAIM wq2:wq2_workfn [test]
[  152.675790] WARNING: CPU: 0 PID: 259 at kernel/workqueue.c:2650 check_flush_dependency+0x169/0x170
[  152.682636] Modules linked in: test(O+) loop dm_mod dax serio_raw sg vmw_vmci fuse drm sd_mod t10_pi ata_generic pata_acpi ahci libahci ata_piix mptspi mptscsih i2c_piix4 mptbase i2c_core libata scsi_transport_spi e1000
[  152.690020] CPU: 0 PID: 259 Comm: kworker/0:2 Kdump: loaded Tainted: G           O      5.17.0-rc8+ #72
[  152.693869] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[  152.697764] Workqueue: wq1 wq1_workfn [test]
[  152.699762] RIP: 0010:check_flush_dependency+0x169/0x170
[  152.701817] Code: 8d 7f 18 e8 89 84 3a 00 49 8b 57 18 49 81 c5 60 01 00 00 48 c7 c7 60 f4 86 82 4c 89 e6 4c 89 e9 4d 89 f0 31 c0 e8 c7 80 fc ff <0f> 0b e9 61 ff ff ff 55 41 57 41 56 41 55 41 54 53 48 83 ec 18 48
[  152.709031] RSP: 0018:ffff8881110a7b00 EFLAGS: 00010046
[  152.711434] RAX: 19db87cad24ebb00 RBX: ffffe8ffffa4cc00 RCX: 0000000000000002
[  152.714781] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: ffffffff84a84f00
[  152.717334] RBP: 0000000000000000 R08: dffffc0000000000 R09: ffffed1023546ef8
[  152.723543] R10: ffffed1023546ef8 R11: 1ffff11023546ef7 R12: ffff888113e75160
[  152.729068] R13: ffff888113e70f60 R14: ffffffffa0848090 R15: ffff8881014c3528
[  152.731834] FS:  0000000000000000(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000
[  152.734735] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  152.736840] CR2: 0000557cfd128768 CR3: 000000007216f006 CR4: 00000000001706f0
[  152.739790] Call Trace:
[  152.740715]  <TASK>
[  152.742072]  start_flush_work+0xf9/0x440
[  152.746516]  __flush_work+0xed/0x170
[  152.747845]  ? flush_work+0x10/0x10
[  152.749240]  ? __queue_work+0x558/0x5b0
[  152.750648]  ? queue_work_on+0xe0/0x160
[  152.752036]  ? lockdep_hardirqs_on+0xe6/0x170
[  152.753757]  ? queue_work_on+0xed/0x160
[  152.755546]  ? wq_worker_last_func+0x20/0x20
[  152.757177]  ? rcu_read_lock_sched_held+0x87/0x100
[  152.758960]  ? perf_trace_rcu_stall_warning+0x210/0x210
[  152.760929]  process_one_work+0x45e/0x6b0
[  152.762587]  ? rescuer_thread+0x9f0/0x9f0
[  152.764332]  ? _raw_spin_lock_irqsave+0xf0/0xf0
[  152.766110]  worker_thread+0x4d7/0x960
[  152.767523]  ? _raw_spin_unlock+0x40/0x40
[  152.769146]  ? preempt_count_sub+0xf/0xc0
[  152.770595]  ? _raw_spin_unlock_irqrestore+0xb2/0x110
[  152.773091]  ? rcu_lock_release+0x20/0x20
[  152.774637]  kthread+0x178/0x1a0
[  152.775819]  ? kthread_blkcg+0x50/0x50
[  152.777228]  ret_from_fork+0x1f/0x30
[  152.778603]  </TASK>
[  152.779427] irq event stamp: 12002
[  152.782443] hardirqs last  enabled at (12001): [<ffffffff81102620>] queue_work_on+0xe0/0x160
[  152.786186] hardirqs last disabled at (12002): [<ffffffff8237473b>] _raw_spin_lock_irq+0x7b/0xe0
[  152.789707] softirqs last  enabled at (11996): [<ffffffff810d9105>] irq_exit_rcu+0xb5/0x100
[  152.792767] softirqs last disabled at (11971): [<ffffffff810d9105>] irq_exit_rcu+0xb5/0x100
[  152.795802] ---[ end trace 0000000000000000 ]---
----------

But if I do

-	wq1 = alloc_workqueue("wq1", WQ_MEM_RECLAIM, 0);
+	wq1 = alloc_workqueue("wq1", __WQ_LEGACY | WQ_MEM_RECLAIM, 0);

, this warning goes away. Therefore, it seems to me that the __WQ_LEGACY flag is used
in combination with the WQ_MEM_RECLAIM flag in order to ask check_flush_dependency()
not to emit this warning when we cannot afford doing

-	wq2 = alloc_workqueue("wq2", 0, 0);
+	wq2 = alloc_workqueue("wq2", WQ_MEM_RECLAIM, 0);

because wq1 and wq2 have different owners.

Given that the legacy create_workqueue() interface always implied the WQ_MEM_RECLAIM flag,
maybe it is better to make the alloc_workqueue() interface imply WQ_MEM_RECLAIM by default.
That is, obsolete the WQ_MEM_RECLAIM and __WQ_LEGACY flags, and introduce a new flag
(e.g. WQ_MAY_SHARE_WORKER) which is passed to the alloc_workqueue() interface only when
the caller is absolutely confident that this WQ never participates in the memory reclaim
path and is never the target of a flush_workqueue()/flush_work() operation.
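
To illustrate, the opt-out under such a scheme would look roughly like this
(WQ_MAY_SHARE_WORKER is purely hypothetical and does not exist today):

	/* default: a rescuer "struct task_struct" is reserved, as create_workqueue() implied */
	wq = alloc_workqueue("example", WQ_UNBOUND, 0);
	/* hypothetical opt-out for WQs that never sit in a reclaim or flush dependency chain */
	wq = alloc_workqueue("example", WQ_MAY_SHARE_WORKER | WQ_UNBOUND, 0);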


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-19  2:02       ` [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue Tetsuo Handa
@ 2022-03-21 16:55         ` Tejun Heo
  2022-03-21 22:53           ` Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2022-03-21 16:55 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton, Jan Kara,
	Christoph Hellwig, linux-block, linux-xfs

Hello,

On Sat, Mar 19, 2022 at 11:02:51AM +0900, Tetsuo Handa wrote:
> Is the intent of the __WQ_LEGACY flag to indicate that "this WQ was created
> using the deprecated interface"? If so, that intent no longer holds true.
> 
> Although the __WQ_LEGACY flag is described as "internal: create*_workqueue()",
> tegra194_cpufreq_probe()/scsi_add_host_with_dma()/iscsi_host_alloc()/
> iscsi_transport_init() pass the __WQ_LEGACY flag through the alloc_workqueue()
> interface. Therefore, the __WQ_LEGACY flag is no longer a meaningful indicator
> of "internal: create*_workqueue()". The description of the __WQ_LEGACY flag
> needs an update.
...
> Given that the legacy create_workqueue() interface always implied the WQ_MEM_RECLAIM flag,
> maybe it is better to make the alloc_workqueue() interface imply WQ_MEM_RECLAIM by default.

That actually is pretty expensive when added up, which is why we went for
the shared worker pool model in the first place.

> That is, obsolete the WQ_MEM_RECLAIM and __WQ_LEGACY flags, and introduce a new flag
> (e.g. WQ_MAY_SHARE_WORKER) which is passed to the alloc_workqueue() interface only when
> the caller is absolutely confident that this WQ never participates in the memory reclaim
> path and is never the target of a flush_workqueue()/flush_work() operation.

No, just fix the abusers. There are four abusers in the kernel and they
aren't difficult to fix.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-21 16:55         ` Tejun Heo
@ 2022-03-21 22:53           ` Tetsuo Handa
  2022-03-21 23:04             ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2022-03-21 22:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton, Jan Kara,
	Christoph Hellwig, linux-block, linux-xfs

On 2022/03/22 1:55, Tejun Heo wrote:
> No, just fix the abusers. There are four abusers in the kernel and they
> aren't difficult to fix.

So, are you expecting a change like the one shown below, which adds the WQ_MEM_RECLAIM
flag to all WQs that may hit the "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing
!WQ_MEM_RECLAIM %s:%ps" warning? Otherwise, the __WQ_LEGACY flag will continue
to serve as a hack for suppressing this warning.

---
 drivers/cpufreq/tegra194-cpufreq.c  | 2 +-
 drivers/scsi/hosts.c                | 2 +-
 drivers/scsi/libiscsi.c             | 2 +-
 drivers/scsi/scsi_transport_iscsi.c | 2 +-
 include/linux/workqueue.h           | 7 +++----
 kernel/workqueue.c                  | 3 +--
 6 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
index ac381db25dbe..5d72ef07f9ed 100644
--- a/drivers/cpufreq/tegra194-cpufreq.c
+++ b/drivers/cpufreq/tegra194-cpufreq.c
@@ -379,7 +379,7 @@ static int tegra194_cpufreq_probe(struct platform_device *pdev)
 	if (IS_ERR(bpmp))
 		return PTR_ERR(bpmp);
 
-	read_counters_wq = alloc_workqueue("read_counters_wq", __WQ_LEGACY, 1);
+	read_counters_wq = alloc_workqueue("read_counters_wq", 0, 1);
 	if (!read_counters_wq) {
 		dev_err(&pdev->dev, "fail to create_workqueue\n");
 		err = -EINVAL;
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index f69b77cbf538..4485f65d5e92 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -277,7 +277,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 		snprintf(shost->work_q_name, sizeof(shost->work_q_name),
 			 "scsi_wq_%d", shost->host_no);
 		shost->work_q = alloc_workqueue("%s",
-			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
+			WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND,
 			1, shost->work_q_name);
 
 		if (!shost->work_q) {
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 059dae8909ee..c923d2a1086e 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -2801,7 +2801,7 @@ struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
 		snprintf(ihost->workq_name, sizeof(ihost->workq_name),
 			"iscsi_q_%d", shost->host_no);
 		ihost->workq = alloc_workqueue("%s",
-			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
+			WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND,
 			1, ihost->workq_name);
 		if (!ihost->workq)
 			goto free_host;
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 554b6f784223..b4f0b7584112 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -4876,7 +4876,7 @@ static __init int iscsi_transport_init(void)
 	}
 
 	iscsi_eh_timer_workq = alloc_workqueue("%s",
-			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
+			WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND,
 			1, "iscsi_eh");
 	if (!iscsi_eh_timer_workq) {
 		err = -ENOMEM;
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..666c4cc73f6b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -337,7 +337,6 @@ enum {
 
 	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
-	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
 	__WQ_ORDERED_EXPLICIT	= 1 << 19, /* internal: alloc_ordered_workqueue() */
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
@@ -420,12 +419,12 @@ alloc_workqueue(const char *fmt, unsigned int flags, int max_active, ...);
 			__WQ_ORDERED_EXPLICIT | (flags), 1, ##args)
 
 #define create_workqueue(name)						\
-	alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
+	alloc_workqueue("%s", WQ_MEM_RECLAIM, 1, (name))
 #define create_freezable_workqueue(name)				\
-	alloc_workqueue("%s", __WQ_LEGACY | WQ_FREEZABLE | WQ_UNBOUND |	\
+	alloc_workqueue("%s", WQ_FREEZABLE | WQ_UNBOUND |	\
 			WQ_MEM_RECLAIM, 1, (name))
 #define create_singlethread_workqueue(name)				\
-	alloc_ordered_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, name)
+	alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name)
 
 extern void destroy_workqueue(struct workqueue_struct *wq);
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..aba3b1505292 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2643,8 +2643,7 @@ static void check_flush_dependency(struct workqueue_struct *target_wq,
 	WARN_ONCE(current->flags & PF_MEMALLOC,
 		  "workqueue: PF_MEMALLOC task %d(%s) is flushing !WQ_MEM_RECLAIM %s:%ps",
 		  current->pid, current->comm, target_wq->name, target_func);
-	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
-			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
+	WARN_ONCE(worker && (worker->current_pwq->wq->flags & WQ_MEM_RECLAIM),
 		  "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps",
 		  worker->current_pwq->wq->name, worker->current_func,
 		  target_wq->name, target_func);
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-21 22:53           ` Tetsuo Handa
@ 2022-03-21 23:04             ` Tejun Heo
  2022-03-21 23:17               ` Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2022-03-21 23:04 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton, Jan Kara,
	Christoph Hellwig, linux-block, linux-xfs

Hello,

On Tue, Mar 22, 2022 at 07:53:49AM +0900, Tetsuo Handa wrote:
> On 2022/03/22 1:55, Tejun Heo wrote:
> > No, just fix the abusers. There are four abusers in the kernel and they
> > aren't difficult to fix.
> 
> So, are you expecting a change like the one shown below, which adds the WQ_MEM_RECLAIM
> flag to all WQs that may hit the "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing
> !WQ_MEM_RECLAIM %s:%ps" warning? Otherwise, the __WQ_LEGACY flag will continue
> to serve as a hack for suppressing this warning.

If you convert all of them in the flush chain to use
alloc_workqueue() w/ MEM_RECLAIM, the warning will go away.

>  #define create_workqueue(name)						\
> -	alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
> +	alloc_workqueue("%s", WQ_MEM_RECLAIM, 1, (name))
>  #define create_freezable_workqueue(name)				\
> -	alloc_workqueue("%s", __WQ_LEGACY | WQ_FREEZABLE | WQ_UNBOUND |	\
> +	alloc_workqueue("%s", WQ_FREEZABLE | WQ_UNBOUND |	\
>  			WQ_MEM_RECLAIM, 1, (name))
>  #define create_singlethread_workqueue(name)				\
> -	alloc_ordered_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, name)
> +	alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name)

But why are you dropping the flag from their intended users?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-21 23:04             ` Tejun Heo
@ 2022-03-21 23:17               ` Tetsuo Handa
  2022-03-21 23:27                 ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2022-03-21 23:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton, Jan Kara,
	Christoph Hellwig, linux-block, linux-xfs

On 2022/03/22 8:04, Tejun Heo wrote:
> But why are you dropping the flag from their intended users?

As far as I can see, the only difference __WQ_LEGACY makes is whether
"workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps"
warning is printed or not. What are the intended users?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-21 23:17               ` Tetsuo Handa
@ 2022-03-21 23:27                 ` Tejun Heo
  2022-03-22  0:09                   ` Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2022-03-21 23:27 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton, Jan Kara,
	Christoph Hellwig, linux-block, linux-xfs

On Tue, Mar 22, 2022 at 08:17:43AM +0900, Tetsuo Handa wrote:
> On 2022/03/22 8:04, Tejun Heo wrote:
> > But why are you dropping the flag from their intended users?
> 
> As far as I can see, the only difference __WQ_LEGACY makes is whether
> "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps"
> warning is printed or not. What are the intended users?

The old create_workqueue() and friends always imply WQ_MEM_RECLAIM because
they all used to be served by dedicated kthreads. The growing number of
kthreads used this way became a headache. There were too many of these
kthreads sitting around doing nothing. In some niche configurations, they
ate up enough PIDs to cause boot failures.

To address the issue, the new implementation made the workqueues share pools
of workers. However, this means that forward progress can't be guaranteed
under memory pressure, so workqueues which are depended upon during memory
reclaim now need to set WQ_MEM_RECLAIM explicitly to request a dedicated
rescuer thread.

The legacy flushing warning is telling us that those workqueues can be
converted to alloc_workqueue + WQ_MEM_RECLAIM as we know them to be
participating in memory reclaim (as they're flushing one of the explicitly
marked workqueues). So, if you spot them, the right thing to do is
converting all the involved workqueues to use alloc_workqueue() +
WQ_MEM_RECLAIM.
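
Concretely, the conversion being asked for has this shape (a sketch with a
made-up workqueue, not code from any particular driver):

	/* before: legacy interface; always implies a reserved rescuer */
	example_wq = create_workqueue("example");

	/* after: explicit interface; request a rescuer only because this WQ
	 * is known to be depended upon during memory reclaim */
	example_wq = alloc_workqueue("example", WQ_MEM_RECLAIM, 0);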

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-21 23:27                 ` Tejun Heo
@ 2022-03-22  0:09                   ` Tetsuo Handa
  2022-03-22 16:52                     ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2022-03-22  0:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton, Jan Kara,
	Christoph Hellwig, linux-block, linux-xfs

On 2022/03/22 8:27, Tejun Heo wrote:
> On Tue, Mar 22, 2022 at 08:17:43AM +0900, Tetsuo Handa wrote:
>> On 2022/03/22 8:04, Tejun Heo wrote:
>>> But why are you dropping the flag from their intended users?
>>
>> As far as I can see, the only difference __WQ_LEGACY makes is whether
>> "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps"
>> warning is printed or not. What are the intended users?
> 
> The old create_workqueue() and friends always imply WQ_MEM_RECLAIM because
> they all used to be served by dedicated kthreads. The growing number of
> kthreads used this way became a headache. There were too many of these
> kthreads sitting around doing nothing. In some niche configurations, they
> ate up enough PIDs to cause boot failures.

OK.

> 
> To address the issue, the new implementation made the workqueues share pools
> of workers. However, this means that forward progress can't be guaranteed
> under memory pressure, so workqueues which are depended upon during memory
> reclaim now need to set WQ_MEM_RECLAIM explicitly to request a dedicated
> rescuer thread.

OK.

> 
> The legacy flushing warning is telling us that those workqueues can be

s/can be/must be/ ?

> converted to alloc_workqueue + WQ_MEM_RECLAIM as we know them to be
> participating in memory reclaim (as they're flushing one of the explicitly
> marked workqueues). So, if you spot them, the right thing to do is
> converting all the involved workqueues to use alloc_workqueue() +
> WQ_MEM_RECLAIM.

Then, can the description of

	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */

be improved to something like

	/*
	 * This flag disables detection of the possible deadlock that can happen when
	 * a work item on a !WQ_MEM_RECLAIM workqueue is flushed from a WQ_MEM_RECLAIM
	 * workqueue. Try to avoid using this flag; instead, add WQ_MEM_RECLAIM to all
	 * WQs which can be involved where a guarantee of forward progress under memory
	 * pressure is required.
	 */

? Current /* internal: create*_workqueue() */ tells me nothing.



My question is: I want to add WQ_MEM_RECLAIM flag to the WQ used by loop module
because this WQ is involved in writeback operations. But unless I add both
__WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used by loop module, we will hit

	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),

warning because e.g. xfs-sync WQ used by xfs module is not using WQ_MEM_RECLAIM flag.

	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
				XFS_WQFLAGS(WQ_FREEZABLE), 0, mp->m_super->s_id);

You are suggesting that the correct approach is to add WQ_MEM_RECLAIM flag to WQs
used by filesystems when adding WQ_MEM_RECLAIM flag to the WQ used by loop module
introduces the possibility of hitting

	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),

warning (instead of either adding __WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used
by loop module or giving up WQ_MEM_RECLAIM flag for the WQ used by loop module),
correct?
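
(For clarity, the xfs side of such a change would look roughly like the sketch
below; whether that is actually the right thing to do is exactly what I am asking.)

	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
				XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM), 0,
				mp->m_super->s_id);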


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-22  0:09                   ` Tetsuo Handa
@ 2022-03-22 16:52                     ` Tejun Heo
  2022-03-22 22:00                       ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2022-03-22 16:52 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton, Jan Kara,
	Christoph Hellwig, linux-block, linux-xfs

Hello,

On Tue, Mar 22, 2022 at 09:09:53AM +0900, Tetsuo Handa wrote:
> > The legacy flushing warning is telling us that those workqueues can be
> 
> s/can be/must be/ ?

Well, one thing that we can but don't want to do is converting all
create_workqueue() users to alloc_workqueue() with MEM_RECLAIM, which is
wasteful but won't break anything. We know for sure that the workqueues
which trigger the legacy warning are participating in memory reclaim and
thus need MEM_RECLAIM. So, yeah, the warning tells us that they need
MEM_RECLAIM and should be converted.

> ? Current /* internal: create*_workqueue() */ tells me nothing.

It's trying to say that it shouldn't be used outside workqueue proper and
the warning message is supposed to trigger the conversion. But, yeah, a
stronger comment can help.

> My question is: I want to add WQ_MEM_RECLAIM flag to the WQ used by loop module
> because this WQ is involved in writeback operations. But unless I add both
> __WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used by loop module, we will hit
> 
> 	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> 			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
> 
> warning because e.g. xfs-sync WQ used by xfs module is not using WQ_MEM_RECLAIM flag.
> 
> 	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
> 				XFS_WQFLAGS(WQ_FREEZABLE), 0, mp->m_super->s_id);
> 
> You are suggesting that the correct approach is to add WQ_MEM_RECLAIM flag to WQs
> used by filesystems when adding WQ_MEM_RECLAIM flag to the WQ used by loop module
> introduces the possibility of hitting
> 
> 	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> 			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
> 
> warning (instead of either adding __WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used
> by loop module or giving up WQ_MEM_RECLAIM flag for the WQ used by loop module),
> correct?

Yeah, you detected multiple issues at the same time. xfs sync is
participating in memory reclaim but doesn't have MEM_RECLAIM and loop is
marked with LEGACY but flushing other workqueues which are MEM_RECLAIM. So,
both xfs and loop workqueues need to be explicitly marked with MEM_RECLAIM.
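
On the loop side, the change under discussion is roughly the following sketch
(the exact flags currently used in loop_configure() may differ):

	lo->workqueue = alloc_workqueue("loop%d",
					WQ_UNBOUND | WQ_FREEZABLE | WQ_MEM_RECLAIM,
					0, lo->lo_number);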

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-22 16:52                     ` Tejun Heo
@ 2022-03-22 22:00                       ` Dave Chinner
  2022-03-22 22:02                         ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2022-03-22 22:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Tetsuo Handa, Dan Schatzberg, Jens Axboe, Ming Lei,
	Andrew Morton, Jan Kara, Christoph Hellwig, linux-block,
	linux-xfs

On Tue, Mar 22, 2022 at 06:52:14AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Tue, Mar 22, 2022 at 09:09:53AM +0900, Tetsuo Handa wrote:
> > > The legacy flushing warning is telling us that those workqueues can be
> > 
> > s/can be/must be/ ?
> 
> Well, one thing that we can but don't want to do is converting all
> create_workqueue() users to alloc_workqueue() with MEM_RECLAIM, which is
> wasteful but won't break anything. We know for sure that the workqueues
> which trigger the legacy warning are participating in memory reclaim and
> thus need MEM_RECLAIM. So, yeah, the warning tells us that they need
> MEM_RECLAIM and should be converted.
> 
> > ? Current /* internal: create*_workqueue() */ tells me nothing.
> 
> It's trying to say that it shouldn't be used outside workqueue proper and
> the warning message is supposed to trigger the conversion. But, yeah, a
> stronger comment can help.
> 
> > My question is: I want to add WQ_MEM_RECLAIM flag to the WQ used by loop module
> > because this WQ is involved in writeback operations. But unless I add both
> > __WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used by loop module, we will hit
> > 
> > 	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> > 			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
> > 
> > warning because e.g. xfs-sync WQ used by xfs module is not using WQ_MEM_RECLAIM flag.
> > 
> > 	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
> > 				XFS_WQFLAGS(WQ_FREEZABLE), 0, mp->m_super->s_id);
> > 
> > You are suggesting that the correct approach is to add WQ_MEM_RECLAIM flag to WQs
> > used by filesystems when adding WQ_MEM_RECLAIM flag to the WQ used by loop module
> > introduces the possibility of hitting
> > 
> > 	WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> > 			      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
> > 
> > warning (instead of either adding __WQ_LEGACY | WQ_MEM_RECLAIM flags to the WQ used
> > by loop module or giving up WQ_MEM_RECLAIM flag for the WQ used by loop module),
> > correct?
> 
> Yeah, you detected multiple issues at the same time. xfs sync is
> participating in memory reclaim

No it isn't. What makes you think it is part of memory reclaim?

The xfs-sync workqueue exists solely to perform async flushes of
dirty data at ENOSPC via sync_inodes_sb() because we can't call
sync_inodes_sb directly in the context that hit ENOSPC due to locks
and transaction contexts held. The paths that need this are
buffered writes and file create (on disk inode allocation), neither
of which are in the memory reclaim path, either.

So this work has nothing to do with memory reclaim, and as such it's
not tagged with WQ_MEM_RECLAIM.
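
(For reference, the shape of that path is roughly the sketch below; the real
code lives in fs/xfs/xfs_super.c and the field names here are from memory.)

	/* ENOSPC path: sync_inodes_sb() cannot be called directly here, so punt it */
	queue_work(mp->m_sync_workqueue, &mp->m_flush_inodes_work);

	/* ...and the queued work item does the actual writeback */
	sync_inodes_sb(mp->m_super);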

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-22 22:00                       ` Dave Chinner
@ 2022-03-22 22:02                         ` Tejun Heo
  2022-03-22 22:05                           ` Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2022-03-22 22:02 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Tetsuo Handa, Dan Schatzberg, Jens Axboe, Ming Lei,
	Andrew Morton, Jan Kara, Christoph Hellwig, linux-block,
	linux-xfs

Hello,

On Wed, Mar 23, 2022 at 09:00:07AM +1100, Dave Chinner wrote:
> > Yeah, you detected multiple issues at the same time. xfs sync is
> > participating in memory reclaim
> 
> No it isn't. What makes you think it is part of memory reclaim?
> 
> The xfs-sync workqueue exists solely to perform async flushes of
> dirty data at ENOSPC via sync_inodes_sb() because we can't call
> sync_inodes_sb directly in the context that hit ENOSPC due to locks
> and transaction contexts held. The paths that need this are
> buffered writes and file create (on disk inode allocation), neither
> of which are in the memory reclaim path, either.
> 
> So this work has nothing to do with memory reclaim, and as such it's
> not tagged with WQ_MEM_RECLAIM.

Hmmm... yeah, I actually don't know the exact dependency here and the
dependency may not be real - e.g. the conclusion might be that loop is
conflating different uses and needs to split its use of workqueues into two
separate ones. Tetsuo, can you post more details on the warning that you're
seeing?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-22 22:02                         ` Tejun Heo
@ 2022-03-22 22:05                           ` Tetsuo Handa
  2022-03-22 22:19                             ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2022-03-22 22:05 UTC (permalink / raw)
  To: Tejun Heo, Dave Chinner
  Cc: Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton, Jan Kara,
	Christoph Hellwig, linux-block, linux-xfs

On 2022/03/23 7:02, Tejun Heo wrote:
> Hello,
> 
> On Wed, Mar 23, 2022 at 09:00:07AM +1100, Dave Chinner wrote:
>>> Yeah, you detected multiple issues at the same time. xfs sync is
>>> participating in memory reclaim
>>
>> No it isn't. What makes you think it is part of memory reclaim?
>>
>> The xfs-sync workqueue exists solely to perform async flushes of
>> dirty data at ENOSPC via sync_inodes_sb() because we can't call
>> sync_inodes_sb directly in the context that hit ENOSPC due to locks
>> and transaction contexts held. The paths that need this are
>> buffered writes and file create (on disk inode allocation), neither
>> of which are in the memory reclaim path, either.
>>
>> So this work has nothing to do with memory reclaim, and as such it's
>> not tagged with WQ_MEM_RECLAIM.
> 
> Hmmm... yeah, I actually don't know the exact dependency here and the
> dependency may not be real - e.g. the conclusion might be that loop is
> conflating different uses and needs to split its use of workqueues into two
> separate ones. Tetsuo, can you post more details on the warning that you're
> seeing?
> 

It was reported at https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-22 22:05                           ` Tetsuo Handa
@ 2022-03-22 22:19                             ` Tejun Heo
  2022-03-22 22:59                               ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2022-03-22 22:19 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Dave Chinner, Dan Schatzberg, Jens Axboe, Ming Lei,
	Andrew Morton, Jan Kara, Christoph Hellwig, linux-block,
	linux-xfs

On Wed, Mar 23, 2022 at 07:05:56AM +0900, Tetsuo Handa wrote:
> > Hmmm... yeah, I actually don't know the exact dependency here and the
> > dependency may not be real - e.g. the conclusion might be that loop is
> > conflating different uses and needs to split its use of workqueues into two
> > separate ones. Tetsuo, can you post more details on the warning that you're
> > seeing?
> > 
> 
> It was reported at https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .

Looks like a correct dependency to me. The work item is being flushed from
good old write path. Dave?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-22 22:19                             ` Tejun Heo
@ 2022-03-22 22:59                               ` Dave Chinner
  2022-03-22 23:32                                 ` Tetsuo Handa
  2022-03-23  0:09                                 ` Tejun Heo
  0 siblings, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2022-03-22 22:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Tetsuo Handa, Dan Schatzberg, Jens Axboe, Ming Lei,
	Andrew Morton, Jan Kara, Christoph Hellwig, linux-block,
	linux-xfs

On Tue, Mar 22, 2022 at 12:19:40PM -1000, Tejun Heo wrote:
> On Wed, Mar 23, 2022 at 07:05:56AM +0900, Tetsuo Handa wrote:
> > > Hmmm... yeah, I actually don't know the exact dependency here and the
> > > dependency may not be real - e.g. the conclusion might be that loop is
> > > conflating different uses and needs to split its use of workqueues into two
> > > separate ones. Tetsuo, can you post more details on the warning that you're
> > > seeing?
> > > 
> > 
> > It was reported at https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .
> 
> Looks like a correct dependency to me. The work item is being flushed from
> good old write path. Dave?

The filesystem buffered write IO path isn't part of memory reclaim -
it's a user IO path and I think most filesystems will treat it that
way.

We've had similar layering problems with the loop IO path implying
GFP_NOFS must be used by filesystems allocating memory in the IO
path - we solved that by requiring the loop IO submission context
(loop_process_work()) to set PF_MEMALLOC_NOIO so that it didn't
deadlock anywhere in the underlying filesystems that have no idea
that the loop device has added memory reclaim constraints to the IO
path.
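
(In code terms, that constraint is expressed with the usual scoped helpers,
roughly as below; this is a sketch, not the exact loop code.)

	unsigned int noio_flags = memalloc_noio_save();	/* sets PF_MEMALLOC_NOIO */
	/* ... submit I/O to the backing file ... */
	memalloc_noio_restore(noio_flags);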

This seems like it's the same layering problem - syscall facing IO
paths are designed for incoming IO from user context, not outgoing
writeback IO from memory reclaim contexts. Memory reclaim contexts
are supposed to use back end filesystem operations like
->writepages() to flush dirty data when necessary.

If the loop device IO mechanism means that every ->write_iter path
needs to be considered as directly in the memory reclaim path, then
that means a huge amount of the kernel needs to be considered as "in
memory reclaim". i.e. it's not just this one XFS workqueue that is
going have this problem - it's any workqueue that can be waited on
by the incoming IO path.

For example, network filesystem might put the network stack directly
in the IO path. Which means if we then put loop on top of that
filesystems, various workqueues in the network stack may now need to
be considered as running under the memory reclaim path because of
the loop block device.

I don't know what the solution is, but if the fix is "xfs needs to
mark a workqueue that has nothing to do with memory reclaim as
WQ_MEM_RECLAIM because of the loop device" then we're talking about
playing workqueue whack-a-mole across the entire kernel forever
more....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-22 22:59                               ` Dave Chinner
@ 2022-03-22 23:32                                 ` Tetsuo Handa
  2022-03-22 23:50                                   ` Dave Chinner
  2022-03-23  0:09                                 ` Tejun Heo
  1 sibling, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2022-03-22 23:32 UTC (permalink / raw)
  To: Dave Chinner, Tejun Heo
  Cc: Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton, Jan Kara,
	Christoph Hellwig, linux-block, linux-xfs

On 2022/03/23 7:59, Dave Chinner wrote:
> I don't know what the solution is, but if the fix is "xfs needs to
> mark a workqueue that has nothing to do with memory reclaim as
> WQ_MEM_RECLAIM because of the loop device" then we're talking about
> playing workqueue whack-a-mole across the entire kernel forever
> more....

During an attempt to fix a lockdep warning caused by calling destroy_workqueue()
when a loop device's autoclear operation is triggered with disk->open_mutex held,
Christoph proposed a patch which avoids calling destroy_workqueue() by using
a global WQ. But as demonstrated at
https://lkml.kernel.org/r/22b51922-30c6-e4ed-ace9-5620f877682c@i-love.sakura.ne.jp ,
I confirmed that the loop module needs to reserve one "struct task_struct" for each
loop device's WQ. And creating a per loop device WQ with the WQ_MEM_RECLAIM flag is
the only available method for reserving a "struct task_struct" for each loop device's
WQ. That is, the WQ_MEM_RECLAIM flag is required not only for surviving memory
pressure situations but also for surviving the max active limitation.
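
(The reservation comes from the rescuer thread that WQ_MEM_RECLAIM makes
alloc_workqueue() create; simplified from init_rescuer() in kernel/workqueue.c,
error handling omitted:)

	if (wq->flags & WQ_MEM_RECLAIM) {
		rescuer = alloc_worker(NUMA_NO_NODE);
		rescuer->task = kthread_create(rescuer_thread, rescuer, "%s", wq->name);
		wq->rescuer = rescuer;
		wake_up_process(rescuer->task);
	}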

But as demonstrated at
https://lkml.kernel.org/r/61f41e56-3650-f0fc-9ef5-7e19fe84e6b7@I-love.SAKURA.ne.jp ,
creating a per loop device WQ with the WQ_MEM_RECLAIM flag but without the __WQ_LEGACY flag
will hit this "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps" warning.

And if WQs used by filesystem side do not want to use WQ_MEM_RECLAIM flag, the loop
module would have to abuse __WQ_LEGACY flag in order to suppress this warning.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-22 23:32                                 ` Tetsuo Handa
@ 2022-03-22 23:50                                   ` Dave Chinner
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2022-03-22 23:50 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Tejun Heo, Dan Schatzberg, Jens Axboe, Ming Lei, Andrew Morton,
	Jan Kara, Christoph Hellwig, linux-block, linux-xfs

On Wed, Mar 23, 2022 at 08:32:49AM +0900, Tetsuo Handa wrote:
> On 2022/03/23 7:59, Dave Chinner wrote:
> > I don't know what the solution is, but if the fix is "xfs needs to
> > mark a workqueue that has nothing to do with memory reclaim as
> > WQ_MEM_RECLAIM because of the loop device" then we're talking about
> > playing workqueue whack-a-mole across the entire kernel forever
> > more....
....
> And if WQs used by filesystem side do not want to use WQ_MEM_RECLAIM flag, the loop
> module would have to abuse __WQ_LEGACY flag in order to suppress this warning.

I'm not talking about whether filesystems want to use WQ_MEM_RECLAIM
or not, I'm commenting on the implicit depedency that the loop
device creates that forces the use of WQ_MEM_RECLAIM in all
downstream workqueues. That's what I'm asking about here - how far
does this implicit, undocumented dependency actually reach and how
do we communicate to all developers so that they know about this in
the future when creating new workqueues that might end up under the
loop device?

That's the problem here - unless the developer explicitly considers
and/or remembers this loopback dependency when adding a new
workqueue to a filesystem (or even as far down as network stacks)
then this problem is going to keep happening and we'll just have to
keep driving WQ_MEM_RECLAIM deeper into the stack.

Tejun stated that just marking all workqueues as WQ_MEM_RECLAIM is
problematic in some situations, and I'm concerned that
resolving implicit dependencies are going to end up in that
situation anyway.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue
  2022-03-22 22:59                               ` Dave Chinner
  2022-03-22 23:32                                 ` Tetsuo Handa
@ 2022-03-23  0:09                                 ` Tejun Heo
  1 sibling, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2022-03-23  0:09 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Tetsuo Handa, Dan Schatzberg, Jens Axboe, Ming Lei,
	Andrew Morton, Jan Kara, Christoph Hellwig, linux-block,
	linux-xfs

Hello,

On Wed, Mar 23, 2022 at 09:59:14AM +1100, Dave Chinner wrote:
> The filesystem buffered write IO path isn't part of memory reclaim -
> it's a user IO path and I think most filesystems will treat it that
> way.

We can argue the semantics but anything in the fs / io path which sits on the
write path should be marked MEM_RECLAIM because it can be depended upon while
cleaning dirty pages. This isn't a layering problem or anything. It's just
what that flag is for.

> If the loop device IO mechanism means that every ->write_iter path
> needs to be considered as directly in the memory reclaim path, then
> that means a huge amount of the kernel needs to be considered as "in
> memory reclaim". i.e. it's not just this one XFS workqueue that is
> going have this problem - it's any workqueue that can be waited on
> by the incoming IO path.
> 
> For example, network filesystem might put the network stack directly
> in the IO path. Which means if we then put loop on top of that
> filesystems, various workqueues in the network stack may now need to
> be considered as running under the memory reclaim path because of
> the loop block device.
> 
> I don't know what the solution is, but if the fix is "xfs needs to
> mark a workqueue that has nothing to do with memory reclaim as
> WQ_MEM_RECLAIM because of the loop device" then we're talking about
> playing workqueue whack-a-mole across the entire kernel forever
> more....

Yeah, all those workqueues must be and most of them are already tagged with
MEM_RECLAIM. The network drivers are kinda painful and we *can* make them
conditional (on it sitting in the io path) if that ever becomes necessary
but the number hasn't been problematic till now.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-03-23  0:09 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <e0a0bc94-e6de-b0e5-ee46-a76cd1570ea6@I-love.SAKURA.ne.jp>
     [not found] ` <YjNHzyTFHjh9v6k4@dschatzberg-fedora-PC0Y6AEN.dhcp.thefacebook.com>
     [not found]   ` <5542ef88-dcc9-0db5-7f01-ad5779d9bc07@I-love.SAKURA.ne.jp>
     [not found]     ` <YjS+Jr6QudSKMSGy@slm.duckdns.org>
2022-03-19  2:02       ` [PATCH] loop: add WQ_MEM_RECLAIM flag to per device workqueue Tetsuo Handa
2022-03-21 16:55         ` Tejun Heo
2022-03-21 22:53           ` Tetsuo Handa
2022-03-21 23:04             ` Tejun Heo
2022-03-21 23:17               ` Tetsuo Handa
2022-03-21 23:27                 ` Tejun Heo
2022-03-22  0:09                   ` Tetsuo Handa
2022-03-22 16:52                     ` Tejun Heo
2022-03-22 22:00                       ` Dave Chinner
2022-03-22 22:02                         ` Tejun Heo
2022-03-22 22:05                           ` Tetsuo Handa
2022-03-22 22:19                             ` Tejun Heo
2022-03-22 22:59                               ` Dave Chinner
2022-03-22 23:32                                 ` Tetsuo Handa
2022-03-22 23:50                                   ` Dave Chinner
2022-03-23  0:09                                 ` Tejun Heo
