linux-nvme.lists.infradead.org archive mirror
* [PATCH] Revert "IB/core: Fix use workqueue without WQ_MEM_RECLAIM"
@ 2023-05-23 15:54 Maurizio Lombardi
  2023-05-23 18:28 ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: Maurizio Lombardi @ 2023-05-23 15:54 UTC (permalink / raw)
  To: jgg; +Cc: leon, linux-rdma, linux-nvme, parav

When nvme_rdma_reconnect_ctrl_work() fails, it flushes
the ib_addr:process_one_req() work, but the latter is enqueued
on addr_wq, which is no longer marked WQ_MEM_RECLAIM.

workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
[nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]

Call Trace:
__flush_work.isra.0+0xbf/0x230
__cancel_work_timer+0x103/0x190
? rdma_resolve_ip+0x257/0x2f0 [ib_core]
? __dev_printk+0x2d/0x69
rdma_addr_cancel+0x8e/0xc0 [ib_core]
_destroy_id+0x1b/0x270 [rdma_cm]
nvme_rdma_alloc_queue.cold+0x4b/0x5c [nvme_rdma]
nvme_rdma_configure_admin_queue+0x1e/0x2f0 [nvme_rdma]
nvme_rdma_setup_ctrl+0x1e/0x220 [nvme_rdma]
nvme_rdma_reconnect_ctrl_work+0x22/0x30 [nvme_rdma]

This was introduced by
commit 39baf10310e6 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM").

Since nvme_rdma_reconnect_ctrl_work() needs to flush process_one_req(), we
have to restore the WQ_MEM_RECLAIM flag on the addr_wq workqueue.

This reverts commit 39baf10310e6669564a485b55267fae70a4e44ae.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/infiniband/core/addr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f253295795f0..5c36d01ebf0b 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -872,7 +872,7 @@ static struct notifier_block nb = {
 
 int addr_init(void)
 {
-	addr_wq = alloc_ordered_workqueue("ib_addr", 0);
+	addr_wq = alloc_ordered_workqueue("ib_addr", WQ_MEM_RECLAIM);
 	if (!addr_wq)
 		return -ENOMEM;
 
-- 
2.39.1




* Re: [PATCH] Revert "IB/core: Fix use workqueue without WQ_MEM_RECLAIM"
  2023-05-23 15:54 [PATCH] Revert "IB/core: Fix use workqueue without WQ_MEM_RECLAIM" Maurizio Lombardi
@ 2023-05-23 18:28 ` Leon Romanovsky
  2023-05-29 15:12   ` Maurizio Lombardi
  0 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2023-05-23 18:28 UTC (permalink / raw)
  To: Maurizio Lombardi; +Cc: jgg, linux-rdma, linux-nvme, parav

On Tue, May 23, 2023 at 05:54:08PM +0200, Maurizio Lombardi wrote:
> when nvme_rdma_reconnect_ctrl_work() fails, it flushes
> the ib_addr:process_one_req() work but the latter is enqueued
> on addr_wq which has been marked as "!WQ_MEM_RECLAIM".
> 
> workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
> [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]

And why does nvme-wq need WQ_MEM_RECLAIM flag? I wonder if it is really
needed.

Thanks



* Re: [PATCH] Revert "IB/core: Fix use workqueue without WQ_MEM_RECLAIM"
  2023-05-23 18:28 ` Leon Romanovsky
@ 2023-05-29 15:12   ` Maurizio Lombardi
       [not found]     ` <ZHebeWlpn68Xa1Hd@ziepe.ca>
  0 siblings, 1 reply; 6+ messages in thread
From: Maurizio Lombardi @ 2023-05-29 15:12 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: jgg, linux-rdma, linux-nvme, parav, Sagi Grimberg

On Tue, May 23, 2023 at 8:28 PM Leon Romanovsky <leon@kernel.org> wrote:
> > workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
> > [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
>
> And why does nvme-wq need WQ_MEM_RECLAIM flag? I wonder if it is really
> needed.

Adding Sagi Grimberg to cc, he probably knows and can explain it better than me.

Maurizio




* Re: [PATCH] Revert "IB/core: Fix use workqueue without WQ_MEM_RECLAIM"
       [not found]     ` <ZHebeWlpn68Xa1Hd@ziepe.ca>
@ 2023-06-05 23:01       ` Sagi Grimberg
  2023-06-06 17:45         ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2023-06-05 23:01 UTC (permalink / raw)
  To: Jason Gunthorpe, Maurizio Lombardi
  Cc: Leon Romanovsky, linux-rdma, linux-nvme, parav


>>>> workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
>>>> [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
>>>
>>> And why does nvme-wq need WQ_MEM_RECLAIM flag? I wonder if it is really
>>> needed.
>>
>> Adding Sagi Grimberg to cc, he probably knows and can explain it better than me.
> 
> We already allocate so much memory on these paths it is pretty
> nonsense to claim they are a reclaim context. One allocation on the WQ
> is not going to be the problem.
> 
> Probably this nvme stuff should not be re-using a reclaim-marked WQ
> for memory-allocating work like this; it is kind of nonsensical.

A controller reset runs on this workqueue, and it must succeed to allow
pages to be flushed to the nvme disk. So I'd say it's kind of
essential that this sequence has a rescuer thread.



* Re: [PATCH] Revert "IB/core: Fix use workqueue without WQ_MEM_RECLAIM"
  2023-06-05 23:01       ` Sagi Grimberg
@ 2023-06-06 17:45         ` Jason Gunthorpe
  2023-06-07 14:59           ` Sagi Grimberg
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2023-06-06 17:45 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Maurizio Lombardi, Leon Romanovsky, linux-rdma, linux-nvme, parav

On Tue, Jun 06, 2023 at 02:01:04AM +0300, Sagi Grimberg wrote:
> 
> > > > > workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
> > > > > [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
> > > > 
> > > > And why does nvme-wq need WQ_MEM_RECLAIM flag? I wonder if it is really
> > > > needed.
> > > 
> > > Adding Sagi Grimberg to cc, he probably knows and can explain it better than me.
> > 
> > We already allocate so much memory on these paths it is pretty
> > nonsense to claim they are a reclaim context. One allocation on the WQ
> > is not going to be the problem.
> > 
> > > Probably this nvme stuff should not be re-using a reclaim-marked WQ
> > > for memory-allocating work like this; it is kind of nonsensical.
> 
> A controller reset runs on this workqueue, and it must succeed to allow
> pages to be flushed to the nvme disk. So I'd say it's kind of
> essential that this sequence has a rescuer thread.

So don't run the CM stuff on the same WQ, go to another one without
the reclaim mark?

Jason



* Re: [PATCH] Revert "IB/core: Fix use workqueue without WQ_MEM_RECLAIM"
  2023-06-06 17:45         ` Jason Gunthorpe
@ 2023-06-07 14:59           ` Sagi Grimberg
  0 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2023-06-07 14:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Maurizio Lombardi, Leon Romanovsky, linux-rdma, linux-nvme, parav


>>>>>> workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
>>>>>> [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
>>>>>
>>>>> And why does nvme-wq need WQ_MEM_RECLAIM flag? I wonder if it is really
>>>>> needed.
>>>>
>>>> Adding Sagi Grimberg to cc, he probably knows and can explain it better than me.
>>>
>>> We already allocate so much memory on these paths it is pretty
>>> nonsense to claim they are a reclaim context. One allocation on the WQ
>>> is not going to be the problem.
>>>
>>> Probably this nvme stuff should not be re-using a reclaim-marked WQ
>>> for memory-allocating work like this; it is kind of nonsensical.
>>
>> A controller reset runs on this workqueue, and it must succeed to allow
>> pages to be flushed to the nvme disk. So I'd say it's kind of
>> essential that this sequence has a rescuer thread.
> 
> So don't run the CM stuff on the same WQ, go to another one without
> the reclaim mark?

That is not trivial. Teardown works won't need a rescuer, but
they will need to fence async work elements that are part
of the bringup and do need a rescuer (for example, a scan work).

So it requires more thought on how to untangle
any dependency between work elements that make use of a rescuer
and others that don't, and vice versa.



end of thread, other threads:[~2023-06-07 14:59 UTC | newest]

Thread overview: 6+ messages
2023-05-23 15:54 [PATCH] Revert "IB/core: Fix use workqueue without WQ_MEM_RECLAIM" Maurizio Lombardi
2023-05-23 18:28 ` Leon Romanovsky
2023-05-29 15:12   ` Maurizio Lombardi
     [not found]     ` <ZHebeWlpn68Xa1Hd@ziepe.ca>
2023-06-05 23:01       ` Sagi Grimberg
2023-06-06 17:45         ` Jason Gunthorpe
2023-06-07 14:59           ` Sagi Grimberg
