* [PATCH 0/2] nvme: prevent warning triggered by nvme_stop_keep_alive
@ 2020-02-06 19:13 James Smart
2020-02-06 19:13 ` [PATCH 1/2] nvme: move keep alive processing to nvme-wq James Smart
2020-02-06 19:13 ` [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq James Smart
0 siblings, 2 replies; 11+ messages in thread
From: James Smart @ 2020-02-06 19:13 UTC (permalink / raw)
To: linux-nvme; +Cc: maxg, James Smart, mark.wunderlich, hch, sagi
Delayed keep alive work is queued on the system workqueue and may be cancelled
via nvme_stop_keep_alive() from nvme_reset_wq, nvme_fc_wq or nvme_wq.
check_flush_dependency() detects mismatched attributes between the workqueue
context used to cancel the keep alive work and system_wq. Specifically,
system_wq does not have the WQ_MEM_RECLAIM flag, whereas the contexts used
to cancel the keep alive work do have the WQ_MEM_RECLAIM flag.
Example warning:
workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc]
is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]
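The dependency check behind this warning can be sketched with a hypothetical kernel-style fragment (not code from this series; the names are illustrative):

```c
#include <linux/workqueue.h>

static void ka_work_fn(struct work_struct *work)
{
	/* pretend keep-alive handling here */
}
static DECLARE_DELAYED_WORK(ka_work, ka_work_fn);

/* A reclaim-safe workqueue, analogous to nvme_reset_wq. */
static struct workqueue_struct *reset_wq;

static void reset_work_fn(struct work_struct *work)
{
	/*
	 * This runs on a WQ_MEM_RECLAIM workqueue. Cancelling the
	 * keep-alive work synchronously may have to flush it, and
	 * check_flush_dependency() warns because system_wq lacks
	 * WQ_MEM_RECLAIM: under memory pressure system_wq cannot
	 * guarantee forward progress, so a reclaim-path worker must
	 * not wait on work running there.
	 */
	cancel_delayed_work_sync(&ka_work);
}
static DECLARE_WORK(reset_work, reset_work_fn);

static int __init example_init(void)
{
	reset_wq = alloc_workqueue("example-reset-wq",
				   WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
	if (!reset_wq)
		return -ENOMEM;

	/* schedule_delayed_work() queues on system_wq (!WQ_MEM_RECLAIM) */
	schedule_delayed_work(&ka_work, 5 * HZ);
	queue_work(reset_wq, &reset_work);
	return 0;
}
```

With this layout, reset_work_fn() reproduces the "WQ_MEM_RECLAIM ... is flushing !WQ_MEM_RECLAIM" warning; queueing ka_work on a reclaim-safe workqueue instead, as patch 1 does with nvme_wq, removes the mismatch.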
To avoid the flags mismatch, delayed keep alive work is queued on nvme_wq.
However, this creates a secondary concern: a work item and the request to
cancel that work may now be on the same workqueue. Specifically, err_work
in the rdma and tcp transports needs to flush/cancel the keep alive work,
which will now be on nvme_wq.
After reviewing the transports, it looks like err_work can be moved to
nvme_reset_wq. In fact, that aligns them better with the transition into
RESETTING and with performing the related reset work on nvme_reset_wq.
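The resulting arrangement can be sketched with another hypothetical fragment (illustrative only; `example_ctrl` and the handlers are not from the series):

```c
#include <linux/workqueue.h>

struct example_ctrl {
	struct delayed_work ka_work;	/* queued on nvme_wq */
	struct work_struct err_work;	/* queued on nvme_reset_wq */
};

static void err_work_fn(struct work_struct *work)
{
	struct example_ctrl *ctrl =
		container_of(work, struct example_ctrl, err_work);

	/*
	 * Safe: this flushes work on a *different* reclaim-safe
	 * workqueue. Had err_work stayed on nvme_wq, it would wait
	 * here for a sibling item on its own queue, which can stall
	 * when only the rescuer thread is servicing nvme_wq.
	 */
	cancel_delayed_work_sync(&ctrl->ka_work);

	/* ... teardown and reconnect handling ... */
}
```

Moving err_work to nvme_reset_wq thus avoids the self-flush on nvme_wq while keeping every queue involved reclaim-safe.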
James Smart (2):
nvme: move keep alive processing to nvme-wq
nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
drivers/nvme/host/core.c | 10 +++++-----
drivers/nvme/host/rdma.c | 2 +-
drivers/nvme/host/tcp.c | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
--
2.13.7
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] nvme: move keep alive processing to nvme-wq
2020-02-06 19:13 [PATCH 0/2] nvme: prevent warning triggered by nvme_stop_keep_alive James Smart
@ 2020-02-06 19:13 ` James Smart
2020-02-07 0:05 ` Sagi Grimberg
` (2 more replies)
2020-02-06 19:13 ` [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq James Smart
1 sibling, 3 replies; 11+ messages in thread
From: James Smart @ 2020-02-06 19:13 UTC (permalink / raw)
To: linux-nvme; +Cc: James Smart, sagi, Nigel Kirkland, maxg, mark.wunderlich, hch
Delayed keep alive work is queued on the system workqueue and may be cancelled
via nvme_stop_keep_alive() from nvme-reset-wq, nvme-fc-wq or nvme-wq.
check_flush_dependency() detects mismatched attributes between the workqueue
context used to cancel the keep alive work and system-wq. Specifically,
system-wq does not have the WQ_MEM_RECLAIM flag, whereas the contexts used
to cancel the keep alive work do have the WQ_MEM_RECLAIM flag.
Example warning:
workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc]
is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]
To avoid the flags mismatch, move keep alive work to nvme-wq.
Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
drivers/nvme/host/core.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6ec03507da68..303b308fde52 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -66,8 +66,8 @@ MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
* nvme_reset_wq - hosts nvme reset works
* nvme_delete_wq - hosts nvme delete works
*
- * nvme_wq will host works such are scan, aen handling, fw activation,
- * keep-alive error recovery, periodic reconnects etc. nvme_reset_wq
+ * nvme_wq will host works such as scan, aen handling, fw activation,
+ * keep-alive, periodic reconnects etc. nvme_reset_wq
* runs reset works which also flush works hosted on nvme_wq for
* serialization purposes. nvme_delete_wq host controller deletion
* works which flush reset works for serialization.
@@ -974,7 +974,7 @@ static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status)
startka = true;
spin_unlock_irqrestore(&ctrl->lock, flags);
if (startka)
- schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
+ queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
}
static int nvme_keep_alive(struct nvme_ctrl *ctrl)
@@ -1004,7 +1004,7 @@ static void nvme_keep_alive_work(struct work_struct *work)
dev_dbg(ctrl->device,
"reschedule traffic based keep-alive timer\n");
ctrl->comp_seen = false;
- schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
+ queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
return;
}
@@ -1021,7 +1021,7 @@ static void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
if (unlikely(ctrl->kato == 0))
return;
- schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
+ queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
}
void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
--
2.13.7
* [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
2020-02-06 19:13 [PATCH 0/2] nvme: prevent warning triggered by nvme_stop_keep_alive James Smart
2020-02-06 19:13 ` [PATCH 1/2] nvme: move keep alive processing to nvme-wq James Smart
@ 2020-02-06 19:13 ` James Smart
2020-02-07 0:06 ` Sagi Grimberg
2020-02-10 17:04 ` Christoph Hellwig
1 sibling, 2 replies; 11+ messages in thread
From: James Smart @ 2020-02-06 19:13 UTC (permalink / raw)
To: linux-nvme; +Cc: James Smart, sagi, Nigel Kirkland, maxg, mark.wunderlich, hch
With keep alive processing moved to nvme_wq, a potential conflict is created:
err_work is also processed on nvme_wq and needs to flush/stop the keep
alive work.
To avoid issues, schedule err_work on nvme_reset_wq. This is not only a
good thing for keep alives, but also brings the transports in line with
the RESETTING state, with work related to RESETTING processed on
nvme_reset_wq.
This change is made to both nvme-rdma and nvme-tcp, which have similar code.
Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
drivers/nvme/host/rdma.c | 2 +-
drivers/nvme/host/tcp.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 2a47c6c5007e..3e85c5cacefd 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1088,7 +1088,7 @@ static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
return;
- queue_work(nvme_wq, &ctrl->err_work);
+ queue_work(nvme_reset_wq, &ctrl->err_work);
}
static void nvme_rdma_wr_error(struct ib_cq *cq, struct ib_wc *wc,
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 6d43b23a0fc8..a28f9144954c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -422,7 +422,7 @@ static void nvme_tcp_error_recovery(struct nvme_ctrl *ctrl)
if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
return;
- queue_work(nvme_wq, &to_tcp_ctrl(ctrl)->err_work);
+ queue_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work);
}
static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
--
2.13.7
* Re: [PATCH 1/2] nvme: move keep alive processing to nvme-wq
2020-02-06 19:13 ` [PATCH 1/2] nvme: move keep alive processing to nvme-wq James Smart
@ 2020-02-07 0:05 ` Sagi Grimberg
2020-02-10 17:04 ` Christoph Hellwig
2020-02-11 3:00 ` Keith Busch
2 siblings, 0 replies; 11+ messages in thread
From: Sagi Grimberg @ 2020-02-07 0:05 UTC (permalink / raw)
To: James Smart, linux-nvme; +Cc: Nigel Kirkland, maxg, mark.wunderlich, hch
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* Re: [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
2020-02-06 19:13 ` [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq James Smart
@ 2020-02-07 0:06 ` Sagi Grimberg
2020-02-10 17:04 ` Christoph Hellwig
1 sibling, 0 replies; 11+ messages in thread
From: Sagi Grimberg @ 2020-02-07 0:06 UTC (permalink / raw)
To: James Smart, linux-nvme; +Cc: Nigel Kirkland, maxg, mark.wunderlich, hch
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* Re: [PATCH 1/2] nvme: move keep alive processing to nvme-wq
2020-02-06 19:13 ` [PATCH 1/2] nvme: move keep alive processing to nvme-wq James Smart
2020-02-07 0:05 ` Sagi Grimberg
@ 2020-02-10 17:04 ` Christoph Hellwig
2020-02-11 3:00 ` Keith Busch
2 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2020-02-10 17:04 UTC (permalink / raw)
To: James Smart; +Cc: sagi, linux-nvme, Nigel Kirkland, maxg, mark.wunderlich, hch
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
2020-02-06 19:13 ` [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq James Smart
2020-02-07 0:06 ` Sagi Grimberg
@ 2020-02-10 17:04 ` Christoph Hellwig
2020-02-11 10:12 ` Max Gurtovoy
1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2020-02-10 17:04 UTC (permalink / raw)
To: James Smart; +Cc: sagi, linux-nvme, Nigel Kirkland, maxg, mark.wunderlich, hch
On Thu, Feb 06, 2020 at 11:13:42AM -0800, James Smart wrote:
> With keep alive processing being moved to nvme_wq, it potentially creates
> a conflicting position with err_work also processing on nvme_wq and needing
> to flush/stop keep alives.
>
> To avoid issues, schedule err_work on nvme_reset_wq. It looks like this is
> not only a good thing for keep alives, but also brings the transports in
> line with the RESETTING state and processing work relative to RESETTING
> on nvme_reset_wq.
>
> This change is made to both nvme-rdma and nvme-tcp which have like code.
Shouldn't we move both in one patch, to avoid creating a regression with one
patch and then fixing it with the other?
Otherwise this looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 1/2] nvme: move keep alive processing to nvme-wq
2020-02-06 19:13 ` [PATCH 1/2] nvme: move keep alive processing to nvme-wq James Smart
2020-02-07 0:05 ` Sagi Grimberg
2020-02-10 17:04 ` Christoph Hellwig
@ 2020-02-11 3:00 ` Keith Busch
2020-02-11 17:50 ` James Smart
2 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2020-02-11 3:00 UTC (permalink / raw)
To: James Smart; +Cc: sagi, linux-nvme, Nigel Kirkland, maxg, mark.wunderlich, hch
On Thu, Feb 06, 2020 at 11:13:41AM -0800, James Smart wrote:
> Delayed keep alive work is queued on system workqueue and may be cancelled
> via nvme_stop_keep_alive from nvme-reset-wq, nvme-fc-wq or nvme-wq.
>
> Check_flush_dependency detects mismatched attributes between the work-queue
> context used to cancel the keep alive work and system-wq. Specifically
> system-wq does not have the WQ_MEM_RECLAIM flag, whereas the contexts used
> to cancel keep alive work have WQ_MEM_RECLAIM flag.
>
> Example warning:
>
> workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc]
> is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]
>
> To avoid the flags mismatch, move keep alive work to nvme-wq.
>
> Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
> Signed-off-by: James Smart <jsmart2021@gmail.com>
Hi James,
I'd like to include this in the next 5.6-rc pull, but just want to
clarify attribution: the author should be the first sign-off, so
should I amend the author to Nigel or move his sign-off below yours?
Thanks,
Keith
* Re: [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
2020-02-10 17:04 ` Christoph Hellwig
@ 2020-02-11 10:12 ` Max Gurtovoy
2020-02-11 17:56 ` James Smart
0 siblings, 1 reply; 11+ messages in thread
From: Max Gurtovoy @ 2020-02-11 10:12 UTC (permalink / raw)
To: Christoph Hellwig, James Smart
Cc: Nigel Kirkland, mark.wunderlich, sagi, linux-nvme
On 2/10/2020 7:04 PM, Christoph Hellwig wrote:
> On Thu, Feb 06, 2020 at 11:13:42AM -0800, James Smart wrote:
>> With keep alive processing being moved to nvme_wq, it potentially creates
>> a conflicting position with err_work also processing on nvme_wq and needing
>> to flush/stop keep alives.
>>
>> To avoid issues, schedule err_work on nvme_reset_wq. It looks like this is
>> not only a good thing for keep alives, but also brings the transports in
>> line with the RESETTING state and processing work relative to RESETTING
>> on nvme_reset_wq.
>>
>> This change is made to both nvme-rdma and nvme-tcp which have like code.
> Shouldn't we move both in one patch to create a regression in one
> cycle while fixing the other one?
Yup, seems like these should be squashed.
James,
Did you run some testing for RDMA/TCP that will trigger this code?
Fio + port toggling, for example? Or brutally "killing" the target during
traffic, causing I/O timeouts?
>
> Otherwise this looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 1/2] nvme: move keep alive processing to nvme-wq
2020-02-11 3:00 ` Keith Busch
@ 2020-02-11 17:50 ` James Smart
0 siblings, 0 replies; 11+ messages in thread
From: James Smart @ 2020-02-11 17:50 UTC (permalink / raw)
To: Keith Busch; +Cc: sagi, linux-nvme, Nigel Kirkland, maxg, mark.wunderlich, hch
On 2/10/2020 7:00 PM, Keith Busch wrote:
> On Thu, Feb 06, 2020 at 11:13:41AM -0800, James Smart wrote:
>> Delayed keep alive work is queued on system workqueue and may be cancelled
>> via nvme_stop_keep_alive from nvme-reset-wq, nvme-fc-wq or nvme-wq.
>>
>> Check_flush_dependency detects mismatched attributes between the work-queue
>> context used to cancel the keep alive work and system-wq. Specifically
>> system-wq does not have the WQ_MEM_RECLAIM flag, whereas the contexts used
>> to cancel keep alive work have WQ_MEM_RECLAIM flag.
>>
>> Example warning:
>>
>> workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc]
>> is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]
>>
>> To avoid the flags mismatch, move keep alive work to nvme-wq.
>>
>> Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
>> Signed-off-by: James Smart <jsmart2021@gmail.com>
>
> Hi James,
>
> I'd like to include this in the next 5.6-rc pull, but just want to
> clarify attribution: the author should be the first sign-off, so
> should I amend the author to Nigel or move his sign-off below yours?
>
> Thanks,
> Keith
>
Either of us as author is fine, so just pull it and have Nigel be the author.
-- james
* Re: [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
2020-02-11 10:12 ` Max Gurtovoy
@ 2020-02-11 17:56 ` James Smart
0 siblings, 0 replies; 11+ messages in thread
From: James Smart @ 2020-02-11 17:56 UTC (permalink / raw)
To: Max Gurtovoy, Christoph Hellwig
Cc: Nigel Kirkland, mark.wunderlich, sagi, linux-nvme
On 2/11/2020 2:12 AM, Max Gurtovoy wrote:
>
> On 2/10/2020 7:04 PM, Christoph Hellwig wrote:
>> On Thu, Feb 06, 2020 at 11:13:42AM -0800, James Smart wrote:
>>> With keep alive processing being moved to nvme_wq, it potentially
>>> creates
>>> a conflicting position with err_work also processing on nvme_wq and
>>> needing
>>> to flush/stop keep alives.
>>>
>>> To avoid issues, schedule err_work on nvme_reset_wq. It looks like
>>> this is
>>> not only a good thing for keep alives, but also brings the transports in
>>> line with the RESETTING state and processing work relative to RESETTING
>>> on nvme_reset_wq.
>>>
>>> This change is made to both nvme-rdma and nvme-tcp which have like code.
>> Shouldn't we move both in one patch to create a regression in one
>> cycle while fixing the other one?
>
> Yup, seems like these should be squashed.
note: see the v2 posted yesterday.
>
> James,
>
> did you run some testing for RDMA/tcp that will trigger this code ?
The testing of the keepalive movement was all in FC. FC already has its
calls coming from either nvme_reset_wq or nvme_delete_wq.
For RDMA/TCP it was not tested; the change was made by code inspection,
looking at who calls the keepalive cancellation routines.
-- james
end of thread, other threads:[~2020-02-11 18:20 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)