linux-kernel.vger.kernel.org archive mirror
* [PATCH] nvme: Drop WQ_MEM_RECLAIM flag from core workqueues
@ 2021-04-12 12:23 Daniel Wagner
  2021-04-12 12:31 ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Wagner @ 2021-04-12 12:23 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Steve Wise, Jason Gunthorpe, Leon Romanovsky,
	Potnuri Bharat Teja, Daniel Wagner

Drop the WQ_MEM_RECLAIM flag as it is not needed and introduces
warnings.

The documentation says "all wq which might be used in the memory
reclaim paths MUST have this flag set. The wq is guaranteed to have at
least one execution context regardless of memory pressure."

Setting WQ_MEM_RECLAIM makes the workqueue threads ready to run during
early init. The claim that it guarantees at least one execution context
regardless of memory pressure is not supported by the implementation.

As the nvme core does not depend on early init we can remove the
WQ_MEM_RECLAIM flag. This resolves a warning in the rdma path:

  WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work [nvme_rdma]
  is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]

There were several attempts to address this kind of warning, but it
still persists:

  39baf10310e6 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM")
  cb93e597779e ("cm: Don't allocate ib_cm workqueue with WQ_MEM_RECLAIM")
  c669ccdc50c2 ("nvme: queue ns scanning and async request from nvme_wq")

Also, a review of the nvme jobs shows that nvme_wq and nvme_reset_wq get
jobs posted which allocate memory:

 - nvme_wq

   nvme_scan_work()
     nvme_scan_ns_list()
       ns_list = kzalloc(..., GFP_KERNEL);
   [...]

 - nvme_reset_wq

   nvme_reset_work()
     nvme_pci_configure_admin_queue()
       nvme_alloc_queue()
         dma_alloc_coherent(..., GFP_KERNEL)

   nvme_rdma_reset_ctrl_work()
     nvme_rdma_setup_ctrl()
       (see above)

   nvme_reset_ctrl_work()
     nvme_tcp_setup_ctrl()
       nvme_tcp_configure_admin_queue()
         nvme_tcp_alloc_queue()
           sock_create()
   [...]

nvme_delete_wq doesn't run any job which allocates memory, but the system
still depends on nvme_wq/nvme_reset_wq making progress.

 - nvme_delete_wq

   nvme_fc_ctrl_connectivity_loss()
     nvme_delete_ctrl()
   nvme_fc_unregister_remoteport()
     nvme_delete_ctrl()
   nvme_fc_reconnect_or_delete()
     nvme_delete_ctrl()
   nvme_rdma_reconnect_or_remove()
     nvme_delete_ctrl()
   nvme_tcp_reconnect_or_remove()
     nvme_delete_ctrl()

   nvme_delete_ctrl_work()
     flush_work(&ctrl->reset_work)
     flush_work(&ctrl->async_event_work)
     cancel_work_sync(&ctrl->fw_act_work)
     ...

That means we either have WQ_MEM_RECLAIM set on all workqueues or on none.

Link: https://patchwork.kernel.org/project/linux-rdma/patch/5f5a1e4e90f3625cea57ffa79fc0e5bcb7efe09d.1548963371.git.swise@opengridcomputing.com/

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/host/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 11fca6459812..ab0d00ddf03f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4810,17 +4810,17 @@ static int __init nvme_core_init(void)
 	_nvme_check_size();
 
 	nvme_wq = alloc_workqueue("nvme-wq",
-			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
+			WQ_UNBOUND | WQ_SYSFS, 0);
 	if (!nvme_wq)
 		goto out;
 
 	nvme_reset_wq = alloc_workqueue("nvme-reset-wq",
-			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
+			WQ_UNBOUND | WQ_SYSFS, 0);
 	if (!nvme_reset_wq)
 		goto destroy_wq;
 
 	nvme_delete_wq = alloc_workqueue("nvme-delete-wq",
-			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
+			WQ_UNBOUND | WQ_SYSFS, 0);
 	if (!nvme_delete_wq)
 		goto destroy_reset_wq;
 
-- 
2.29.2



* Re: [PATCH] nvme: Drop WQ_MEM_RECLAIM flag from core workqueues
  2021-04-12 12:23 [PATCH] nvme: Drop WQ_MEM_RECLAIM flag from core workqueues Daniel Wagner
@ 2021-04-12 12:31 ` Jason Gunthorpe
  2021-04-12 12:49   ` Daniel Wagner
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2021-04-12 12:31 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Steve Wise, Leon Romanovsky,
	Potnuri Bharat Teja

On Mon, Apr 12, 2021 at 02:23:30PM +0200, Daniel Wagner wrote:
> Drop the WQ_MEM_RECLAIM flag as it is not needed and introduces
> warnings.
> 
> The documentation says "all wq which might be used in the memory
> reclaim paths MUST have this flag set. The wq is guaranteed to have at
> least one execution context regardless of memory pressure."
> 
> Setting WQ_MEM_RECLAIM makes the workqueue threads ready to run during
> early init. The claim that it guarantees at least one execution context
> regardless of memory pressure is not supported by the implementation.
> 
> As the nvme core does not depend on early init we can remove the
> WQ_MEM_RECLAIM flag. This resolves a warning in the rdma path:

What does early init have to do with WQ_MEM_RECLAIM?

WQ_MEM_RECLAIM is required when any thread in a reclaim context goes
to sleep waiting for a WQ to complete. For instance by calling
flush_workqueue() or many other things.

The sleeping reclaim context must be guaranteed that the work can be
completed without the work, work queue machinery, or anything the work
has become interconnected with, recursing back into a reclaim.
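
The rule described above is what produces the warning quoted in the patch
description: a work item running on a WQ_MEM_RECLAIM queue waits on a queue
that lacks the flag. Below is a minimal userspace sketch of that dependency
check. The names model_wq and model_flush_allowed and the flag value are
hypothetical stand-ins, not kernel API, and the real check in
kernel/workqueue.c has further conditions that this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative bit value only; the kernel defines the real flag elsewhere. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Hypothetical stand-in for struct workqueue_struct. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true when a work item running on @flusher may sleep waiting for
 * @target. Pass NULL for @flusher when the caller is not a workqueue worker.
 */
static bool model_flush_allowed(const struct model_wq *flusher,
				const struct model_wq *target)
{
	/* Waiting on a queue that has a rescuer is always acceptable. */
	if (target->flags & MODEL_WQ_MEM_RECLAIM)
		return true;

	/* A reclaim-safe queue must not depend on one without that guarantee. */
	if (flusher != NULL && (flusher->flags & MODEL_WQ_MEM_RECLAIM)) {
		fprintf(stderr,
			"WQ_MEM_RECLAIM %s is flushing !WQ_MEM_RECLAIM %s\n",
			flusher->name, target->name);
		return false;
	}

	return true;
}
```

With flusher set to a queue named "nvme-wq" that carries the flag and target
set to "ib_addr" without it, the function returns false. Clearing the flag on
the flusher makes it return true, which is the effect the patch relies on to
silence the warning.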

IIRC the issue here was some destroy or flush work in some error
condition that happened to be under a reclaim context?

I don't see the kind of analysis I'd expect in this commit message to
justify this change.

Jason


* Re: [PATCH] nvme: Drop WQ_MEM_RECLAIM flag from core workqueues
  2021-04-12 12:31 ` Jason Gunthorpe
@ 2021-04-12 12:49   ` Daniel Wagner
  2021-04-12 13:04     ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Wagner @ 2021-04-12 12:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-nvme, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Steve Wise, Leon Romanovsky,
	Potnuri Bharat Teja

Hi Jason,

On Mon, Apr 12, 2021 at 09:31:49AM -0300, Jason Gunthorpe wrote:
> What does early init have to do with WQ_MEM_RECLAIM?

40c17f75dfa9 ("workqueue: allow WQ_MEM_RECLAIM on early init workqueues")

    Workqueues can be created early during boot before workqueue subsystem
    is fully online - work items are queued waiting for later full
    initialization.  However, early init wasn't supported for
    WQ_MEM_RECLAIM workqueues causing unnecessary annoyances for a subset
    of users.  Expand early init support to include WQ_MEM_RECLAIM
    workqueues.

That's the connection between WQ_MEM_RECLAIM and early init.

> WQ_MEM_RECLAIM is required when any thread in a reclaim context goes
> to sleep waiting for a WQ to complete. For instance by calling
> flush_workqueue() or many other things.
> 
> The sleeping reclaim context must be guaranteed that the work can be
> completed without the work, work queue machinery, or anything the work
> has become interconnected with, recursing back into a reclaim.
> 
> IIRC the issue here was some destroy or flush work in some error
> condition that happened to be under a reclaim context?

I understand what you are saying and I would totally agree with you but
where is the code for this?

I've grepped through the code and didn't find anything which supports
the guarantee claim. Neither mm, the scheduler, nor workqueue.c (except
the early init bits) seems to care about this flag. Or I must be missing
something.

Thanks,
Daniel




* Re: [PATCH] nvme: Drop WQ_MEM_RECLAIM flag from core workqueues
  2021-04-12 12:49   ` Daniel Wagner
@ 2021-04-12 13:04     ` Jason Gunthorpe
  2021-04-13  8:54       ` Daniel Wagner
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2021-04-12 13:04 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Steve Wise, Leon Romanovsky,
	Potnuri Bharat Teja

On Mon, Apr 12, 2021 at 02:49:09PM +0200, Daniel Wagner wrote:

> I've grepped through the code and didn't find anything which supports
> the guarantee claim. Neither mm, the scheduler, nor workqueue.c (except
> the early init bits) seems to care about this flag. Or I must be missing
> something.

It is pretty complicated, but WQ_MEM_RECLAIM preallocates a thread:

static int init_rescuer(struct workqueue_struct *wq)
{
	struct worker *rescuer;

	if (!(wq->flags & WQ_MEM_RECLAIM))
		return 0;

	rescuer = alloc_worker(NUMA_NO_NODE);
	[...]

This comment explains it:

 * Workqueue rescuer thread function.  There's one rescuer for each
 * workqueue which has WQ_MEM_RECLAIM set.
 *
 * Regular work processing on a pool may block trying to create a new
 * worker which uses GFP_KERNEL allocation which has slight chance of
 * developing into deadlock if some works currently on the same queue
 * need to be processed to satisfy the GFP_KERNEL allocation.  This is
 * the problem rescuer solves.
 *
 * When such condition is possible, the pool summons rescuers of all
 * workqueues which have works queued on the pool and let them process
 * those works so that forward progress can be guaranteed.
 *
 * This should happen rarely.

Basically the allocation of importance in the workqueue is assigning a
worker, so pre-allocating a worker ensures the work can continue to
progress without becoming dependent on allocations.

This is why work under the WQ_MEM_RECLAIM cannot recurse back into the
allocator as it would get a rescuer thread stuck at a point when all
other threads are already stuck.
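
The pre-allocation argument can be illustrated with a small userspace model.
This is only a sketch under stated assumptions: model_wq, model_wq_init and
model_run_item are hypothetical names, and the can_allocate parameter stands
in for "creating a new worker with GFP_KERNEL would succeed". It does not
reproduce the kernel's pool and rescuer machinery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit value only; not the kernel's definition. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Hypothetical stand-in for a worker thread. */
struct model_worker {
	int items_run;
};

/* Hypothetical stand-in for a workqueue with an optional rescuer. */
struct model_wq {
	unsigned int flags;
	struct model_worker rescuer_storage;
	struct model_worker *rescuer;	/* NULL when the flag is not set */
};

/* Mirrors the idea of init_rescuer(): reserve a worker at creation time. */
static void model_wq_init(struct model_wq *wq, unsigned int flags)
{
	wq->flags = flags;
	wq->rescuer_storage.items_run = 0;
	wq->rescuer = (flags & MODEL_WQ_MEM_RECLAIM) ?
		      &wq->rescuer_storage : NULL;
}

/*
 * Try to run one queued item. Returns true if the item made progress,
 * false if it stalled because no execution context was available.
 */
static bool model_run_item(struct model_wq *wq, bool can_allocate)
{
	/* Normal case: a fresh worker can be created to run the item. */
	if (can_allocate)
		return true;

	/* Under memory pressure only the pre-allocated rescuer can help. */
	if (wq->rescuer != NULL) {
		wq->rescuer->items_run++;
		return true;
	}

	return false;
}
```

In this model a queue initialised with MODEL_WQ_MEM_RECLAIM still makes
progress when can_allocate is false, while a queue initialised with flags 0
stalls. The model has a single rescuer per queue, so a work item that blocks
inside the allocator would occupy the only context left, which is the hazard
described above.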

To remove WQ_MEM_RECLAIM you have to make assertions about the calling
contexts and blocking contexts of the workqueue, not what the work
itself is doing.

Jason


* Re: [PATCH] nvme: Drop WQ_MEM_RECLAIM flag from core workqueues
  2021-04-12 13:04     ` Jason Gunthorpe
@ 2021-04-13  8:54       ` Daniel Wagner
  2021-04-13 13:35         ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Wagner @ 2021-04-13  8:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-nvme, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Steve Wise, Leon Romanovsky,
	Potnuri Bharat Teja

On Mon, Apr 12, 2021 at 10:04:02AM -0300, Jason Gunthorpe wrote:
> Basically the allocation of importance in the workqueue is assigning a
> worker, so pre-allocating a worker ensures the work can continue to
> progress without becoming dependent on allocations.

Ah okay, got it. I didn't really understand this part. So
WQ_MEM_RECLAIM is 'just' avoiding a new worker creation.

> This is why work under the WQ_MEM_RECLAIM cannot recurse back into the
> allocator as it would get a rescuer thread stuck at a point when all
> other threads are already stuck.
>
> To remove WQ_MEM_RECLAIM you have to make assertions about the calling
> contexts and blocking contexts of the workqueue, not what the work
> itself is doing.

Hmm, I am struggling with your last statement. If a worker does an
allocation it might block. I understand this is something which a worker
in a WQ_MEM_RECLAIM context is not allowed to do.

My aim is still to get rid of the warning triggered by the rdma
code.

Anyway, thanks for explaining.


* Re: [PATCH] nvme: Drop WQ_MEM_RECLAIM flag from core workqueues
  2021-04-13  8:54       ` Daniel Wagner
@ 2021-04-13 13:35         ` Jason Gunthorpe
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2021-04-13 13:35 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Steve Wise, Leon Romanovsky,
	Potnuri Bharat Teja

On Tue, Apr 13, 2021 at 10:54:04AM +0200, Daniel Wagner wrote:

> Hmm, I am struggling with your last statement. If a worker does an
> allocation it might block. I understand this is something which a worker
> in a WQ_MEM_RECLAIM context is not allowed to do.
> 
> My aim is still to get rid of the warning triggered by the rdma
> code.

The WQ_MEM_RECLAIM is placed on a workqueue not because of what is
*inside* the work, but because of what is *waiting* on the work.
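
Read as a decision rule, this says the flag follows from the contexts that
wait on the queue. A minimal sketch of that rule, assuming a hand-written list
of waiters: the types model_ctx and model_waiter and the function
model_wq_needs_mem_reclaim are hypothetical and exist only for illustration.
In practice the list of waiters comes from auditing every caller that flushes
or otherwise sleeps on the queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a context that waits on a workqueue. */
enum model_ctx {
	MODEL_CTX_NORMAL,	/* caller may allocate and wait freely */
	MODEL_CTX_RECLAIM,	/* caller sits in a memory reclaim path */
};

/* Hypothetical record of one waiter found during an audit. */
struct model_waiter {
	const char *who;
	enum model_ctx ctx;
};

/*
 * The queue needs the flag as soon as one waiter is in a reclaim context.
 * What the queued work items do internally is not an input here.
 */
static bool model_wq_needs_mem_reclaim(const struct model_waiter *waiters,
				       size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (waiters[i].ctx == MODEL_CTX_RECLAIM)
			return true;
	}

	return false;
}
```

The function takes no description of the work items themselves, which is the
point being made: an audit of allocations inside the work does not answer
whether the flag can be dropped.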

Jason

