* [PATCH 0/2] nvme: prevent warning triggered by nvme_stop_keep_alive
@ 2020-02-06 19:13 James Smart
  2020-02-06 19:13 ` [PATCH 1/2] nvme: move keep alive processing to nvme-wq James Smart
  2020-02-06 19:13 ` [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq James Smart
  0 siblings, 2 replies; 11+ messages in thread

From: James Smart @ 2020-02-06 19:13 UTC (permalink / raw)
To: linux-nvme; +Cc: maxg, James Smart, mark.wunderlich, hch, sagi

Delayed keep alive work is queued on the system workqueue and may be
cancelled via nvme_stop_keep_alive from nvme_reset_wq, nvme_fc_wq or
nvme_wq.

check_flush_dependency() detects mismatched attributes between the
workqueue context used to cancel the keep alive work and system-wq.
Specifically, system-wq does not have the WQ_MEM_RECLAIM flag, whereas
the contexts used to cancel keep alive work do have it.

Example warning:

  workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc]
    is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]

To avoid the flags mismatch, queue delayed keep alive work on nvme_wq.

However, this creates a secondary concern: work and a request to cancel
that work may then live on the same workqueue, namely err_work in the
rdma and tcp transports, which will want to flush/cancel the keep alive
work that is now on nvme_wq. After reviewing the transports, it looks
like err_work can be moved to nvme_reset_wq. In fact, that aligns them
better with the transition into RESETTING and the related reset work
performed on nvme_reset_wq.
James Smart (2):
  nvme: move keep alive processing to nvme-wq
  nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq

 drivers/nvme/host/core.c | 10 +++++-----
 drivers/nvme/host/rdma.c |  2 +-
 drivers/nvme/host/tcp.c  |  2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

-- 
2.13.7

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [PATCH 1/2] nvme: move keep alive processing to nvme-wq
  2020-02-06 19:13 [PATCH 0/2] nvme: prevent warning triggered by nvme_stop_keep_alive James Smart
@ 2020-02-06 19:13 ` James Smart
  2020-02-07  0:05   ` Sagi Grimberg
                      ` (2 more replies)
  2020-02-06 19:13 ` [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq James Smart
  1 sibling, 3 replies; 11+ messages in thread

From: James Smart @ 2020-02-06 19:13 UTC (permalink / raw)
To: linux-nvme; +Cc: James Smart, sagi, Nigel Kirkland, maxg, mark.wunderlich, hch

Delayed keep alive work is queued on the system workqueue and may be
cancelled via nvme_stop_keep_alive from nvme-reset-wq, nvme-fc-wq or
nvme-wq.

check_flush_dependency() detects mismatched attributes between the
workqueue context used to cancel the keep alive work and system-wq.
Specifically, system-wq does not have the WQ_MEM_RECLAIM flag, whereas
the contexts used to cancel keep alive work do have it.

Example warning:

  workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc]
    is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]

To avoid the flags mismatch, move keep alive work to nvme-wq.

Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/nvme/host/core.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6ec03507da68..303b308fde52 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -66,8 +66,8 @@ MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
  * nvme_reset_wq - hosts nvme reset works
  * nvme_delete_wq - hosts nvme delete works
  *
- * nvme_wq will host works such are scan, aen handling, fw activation,
- * keep-alive error recovery, periodic reconnects etc. nvme_reset_wq
+ * nvme_wq will host works such as scan, aen handling, fw activation,
+ * keep-alive, periodic reconnects etc. nvme_reset_wq
  * runs reset works which also flush works hosted on nvme_wq for
  * serialization purposes. nvme_delete_wq host controller deletion
  * works which flush reset works for serialization.
@@ -974,7 +974,7 @@ static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status)
 	startka = true;
 	spin_unlock_irqrestore(&ctrl->lock, flags);
 	if (startka)
-		schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
+		queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
 }
 
 static int nvme_keep_alive(struct nvme_ctrl *ctrl)
@@ -1004,7 +1004,7 @@ static void nvme_keep_alive_work(struct work_struct *work)
 		dev_dbg(ctrl->device,
 			"reschedule traffic based keep-alive timer\n");
 		ctrl->comp_seen = false;
-		schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
+		queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
 		return;
 	}
@@ -1021,7 +1021,7 @@ static void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
 	if (unlikely(ctrl->kato == 0))
 		return;
 
-	schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
+	queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
 }
 
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
-- 
2.13.7
* Re: [PATCH 1/2] nvme: move keep alive processing to nvme-wq
  2020-02-06 19:13 ` [PATCH 1/2] nvme: move keep alive processing to nvme-wq James Smart
@ 2020-02-07  0:05   ` Sagi Grimberg
  2020-02-10 17:04   ` Christoph Hellwig
  2020-02-11  3:00   ` Keith Busch
  2 siblings, 0 replies; 11+ messages in thread

From: Sagi Grimberg @ 2020-02-07 0:05 UTC (permalink / raw)
To: James Smart, linux-nvme; +Cc: Nigel Kirkland, maxg, mark.wunderlich, hch

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* Re: [PATCH 1/2] nvme: move keep alive processing to nvme-wq
  2020-02-06 19:13 ` [PATCH 1/2] nvme: move keep alive processing to nvme-wq James Smart
  2020-02-07  0:05   ` Sagi Grimberg
@ 2020-02-10 17:04   ` Christoph Hellwig
  2020-02-11  3:00   ` Keith Busch
  2 siblings, 0 replies; 11+ messages in thread

From: Christoph Hellwig @ 2020-02-10 17:04 UTC (permalink / raw)
To: James Smart; +Cc: sagi, linux-nvme, Nigel Kirkland, maxg, mark.wunderlich, hch

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 1/2] nvme: move keep alive processing to nvme-wq
  2020-02-06 19:13 ` [PATCH 1/2] nvme: move keep alive processing to nvme-wq James Smart
  2020-02-07  0:05   ` Sagi Grimberg
  2020-02-10 17:04   ` Christoph Hellwig
@ 2020-02-11  3:00   ` Keith Busch
  2020-02-11 17:50     ` James Smart
  2 siblings, 1 reply; 11+ messages in thread

From: Keith Busch @ 2020-02-11 3:00 UTC (permalink / raw)
To: James Smart; +Cc: sagi, linux-nvme, Nigel Kirkland, maxg, mark.wunderlich, hch

On Thu, Feb 06, 2020 at 11:13:41AM -0800, James Smart wrote:
> Delayed keep alive work is queued on system workqueue and may be cancelled
> via nvme_stop_keep_alive from nvme-reset-wq, nvme-fc-wq or nvme-wq.
>
> Check_flush_dependency detects mismatched attributes between the work-queue
> context used to cancel the keep alive work and system-wq. Specifically
> system-wq does not have the WQ_MEM_RECLAIM flag, whereas the contexts used
> to cancel keep alive work have WQ_MEM_RECLAIM flag.
>
> Example warning:
>
> workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc]
> is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]
>
> To avoid the flags mismatch, move keep alive work to nvme-wq.
>
> Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
> Signed-off-by: James Smart <jsmart2021@gmail.com>

Hi James,

I'd like to include this in the next 5.6-rc pull, but just want to
clarify attribution: the author should be the first sign-off, so should
I amend the author to Nigel or move his sign-off below yours?

Thanks,
Keith
* Re: [PATCH 1/2] nvme: move keep alive processing to nvme-wq
  2020-02-11  3:00 ` Keith Busch
@ 2020-02-11 17:50   ` James Smart
  0 siblings, 0 replies; 11+ messages in thread

From: James Smart @ 2020-02-11 17:50 UTC (permalink / raw)
To: Keith Busch; +Cc: sagi, linux-nvme, Nigel Kirkland, maxg, mark.wunderlich, hch

On 2/10/2020 7:00 PM, Keith Busch wrote:
> On Thu, Feb 06, 2020 at 11:13:41AM -0800, James Smart wrote:
>> Delayed keep alive work is queued on system workqueue and may be cancelled
>> via nvme_stop_keep_alive from nvme-reset-wq, nvme-fc-wq or nvme-wq.
>>
>> Check_flush_dependency detects mismatched attributes between the work-queue
>> context used to cancel the keep alive work and system-wq. Specifically
>> system-wq does not have the WQ_MEM_RECLAIM flag, whereas the contexts used
>> to cancel keep alive work have WQ_MEM_RECLAIM flag.
>>
>> Example warning:
>>
>> workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc]
>> is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]
>>
>> To avoid the flags mismatch, move keep alive work to nvme-wq.
>>
>> Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
>> Signed-off-by: James Smart <jsmart2021@gmail.com>
>
> Hi James,
>
> I'd like to include this in the next 5.6-rc pull, but just want to
> clarify attribution: the author should be the first sign-off, so should
> I amend the author to Nigel or move his sign-off below yours?
>
> Thanks,
> Keith

Either of us as author is fine, so just pull it and have Nigel be the
author.

-- james
* [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
  2020-02-06 19:13 [PATCH 0/2] nvme: prevent warning triggered by nvme_stop_keep_alive James Smart
  2020-02-06 19:13 ` [PATCH 1/2] nvme: move keep alive processing to nvme-wq James Smart
@ 2020-02-06 19:13 ` James Smart
  2020-02-07  0:06   ` Sagi Grimberg
  2020-02-10 17:04   ` Christoph Hellwig
  1 sibling, 2 replies; 11+ messages in thread

From: James Smart @ 2020-02-06 19:13 UTC (permalink / raw)
To: linux-nvme; +Cc: James Smart, sagi, Nigel Kirkland, maxg, mark.wunderlich, hch

With keep alive processing being moved to nvme_wq, it potentially
creates a conflicting position with err_work also processing on nvme_wq
and needing to flush/stop keep alives.

To avoid issues, schedule err_work on nvme_reset_wq. It looks like this
is not only a good thing for keep alives, but also brings the transports
in line with the RESETTING state and processing work relative to
RESETTING on nvme_reset_wq.

This change is made to both nvme-rdma and nvme-tcp, which have like code.

Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/nvme/host/rdma.c | 2 +-
 drivers/nvme/host/tcp.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 2a47c6c5007e..3e85c5cacefd 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1088,7 +1088,7 @@ static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
 		return;
 
-	queue_work(nvme_wq, &ctrl->err_work);
+	queue_work(nvme_reset_wq, &ctrl->err_work);
 }
 
 static void nvme_rdma_wr_error(struct ib_cq *cq, struct ib_wc *wc,
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 6d43b23a0fc8..a28f9144954c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -422,7 +422,7 @@ static void nvme_tcp_error_recovery(struct nvme_ctrl *ctrl)
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
 		return;
 
-	queue_work(nvme_wq, &to_tcp_ctrl(ctrl)->err_work);
+	queue_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work);
 }
 
 static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
-- 
2.13.7
* Re: [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
  2020-02-06 19:13 ` [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq James Smart
@ 2020-02-07  0:06   ` Sagi Grimberg
  2020-02-10 17:04   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread

From: Sagi Grimberg @ 2020-02-07 0:06 UTC (permalink / raw)
To: James Smart, linux-nvme; +Cc: Nigel Kirkland, maxg, mark.wunderlich, hch

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* Re: [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
  2020-02-06 19:13 ` [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq James Smart
  2020-02-07  0:06   ` Sagi Grimberg
@ 2020-02-10 17:04   ` Christoph Hellwig
  2020-02-11 10:12     ` Max Gurtovoy
  1 sibling, 1 reply; 11+ messages in thread

From: Christoph Hellwig @ 2020-02-10 17:04 UTC (permalink / raw)
To: James Smart; +Cc: sagi, linux-nvme, Nigel Kirkland, maxg, mark.wunderlich, hch

On Thu, Feb 06, 2020 at 11:13:42AM -0800, James Smart wrote:
> With keep alive processing being moved to nvme_wq, it potentially creates
> a conflicting position with err_work also processing on nvme_wq and needing
> to flush/stop keep alives.
>
> To avoid issues, schedule err_work on nvme_reset_wq. It looks like this is
> not only a good thing for keep alives, but also brings the transports in
> line with the RESETTING state and processing work relative to RESETTING
> on nvme_reset_wq.
>
> This change is made to both nvme-rdma and nvme-tcp which have like code.

Shouldn't we move both in one patch to not create a regression in one
cycle while fixing the other one?

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
  2020-02-10 17:04 ` Christoph Hellwig
@ 2020-02-11 10:12   ` Max Gurtovoy
  2020-02-11 17:56     ` James Smart
  0 siblings, 1 reply; 11+ messages in thread

From: Max Gurtovoy @ 2020-02-11 10:12 UTC (permalink / raw)
To: Christoph Hellwig, James Smart
Cc: Nigel Kirkland, mark.wunderlich, sagi, linux-nvme

On 2/10/2020 7:04 PM, Christoph Hellwig wrote:
> On Thu, Feb 06, 2020 at 11:13:42AM -0800, James Smart wrote:
>> With keep alive processing being moved to nvme_wq, it potentially creates
>> a conflicting position with err_work also processing on nvme_wq and needing
>> to flush/stop keep alives.
>>
>> To avoid issues, schedule err_work on nvme_reset_wq. It looks like this is
>> not only a good thing for keep alives, but also brings the transports in
>> line with the RESETTING state and processing work relative to RESETTING
>> on nvme_reset_wq.
>>
>> This change is made to both nvme-rdma and nvme-tcp which have like code.
> Shouldn't we move both in one patch to not create a regression in one
> cycle while fixing the other one?

Yup, seems like these should be squashed.

James,

did you run some testing for RDMA/TCP that will trigger this code?
Fio + port toggling, for example? Or brutally "killing" the target
during traffic so that IOs time out?

> Otherwise this looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 2/2] nvme-rdma/nvme-tcp: Move err_work to nvme_reset_wq
  2020-02-11 10:12 ` Max Gurtovoy
@ 2020-02-11 17:56   ` James Smart
  0 siblings, 0 replies; 11+ messages in thread

From: James Smart @ 2020-02-11 17:56 UTC (permalink / raw)
To: Max Gurtovoy, Christoph Hellwig
Cc: Nigel Kirkland, mark.wunderlich, sagi, linux-nvme

On 2/11/2020 2:12 AM, Max Gurtovoy wrote:
> On 2/10/2020 7:04 PM, Christoph Hellwig wrote:
>> On Thu, Feb 06, 2020 at 11:13:42AM -0800, James Smart wrote:
>>> With keep alive processing being moved to nvme_wq, it potentially
>>> creates a conflicting position with err_work also processing on
>>> nvme_wq and needing to flush/stop keep alives.
>>>
>>> To avoid issues, schedule err_work on nvme_reset_wq. It looks like
>>> this is not only a good thing for keep alives, but also brings the
>>> transports in line with the RESETTING state and processing work
>>> relative to RESETTING on nvme_reset_wq.
>>>
>>> This change is made to both nvme-rdma and nvme-tcp which have like code.
>> Shouldn't we move both in one patch to not create a regression in one
>> cycle while fixing the other one?
>
> Yup, seems like these should be squashed.

note: see the v2 posted yesterday.

> James,
>
> did you run some testing for RDMA/TCP that will trigger this code?

The testing of the keepalive movement was all in FC. FC already has its
calls from either nvme_reset_wq or nvme_delete_wq.

For RDMA/TCP, it was not tested. It was by code inspection and looking
at who calls the keepalive cancellation calls.

-- james