[PATCH 0/5] blk-mq: wait until completed req's complete fn is run

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/5] blk-mq: wait until completed req's complete fn is run
@ 2019-07-22  5:39 ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-nvme, Ming Lei, Max Gurtovoy, Sagi Grimberg,
	Keith Busch, Christoph Hellwig

Hi,

blk-mq may schedule to call queue's complete function on remote CPU via
IPI, but never provide any way to synchronize the request's complete
fn.

In some driver's EH(such as NVMe), hardware queue's resource may be freed &
re-allocated. If the completed request's complete fn is run finally after the
hardware queue's resource is released, kernel crash will be triggered.

Fixes this issue by waitting until completed req's complete fn is run.

Thanks,
Ming

Ming Lei (5):
  blk-mq: introduce blk_mq_request_completed()
  blk-mq: introduce blk_mq_tagset_wait_completed_request()
  nvme: don't abort completed request in nvme_cancel_request
  nvme: wait until all completed request's complete fn is called
  blk-mq: remove blk_mq_complete_request_sync

 block/blk-mq-tag.c         | 32 ++++++++++++++++++++++++++++++++
 block/blk-mq.c             | 13 ++++++-------
 drivers/nvme/host/core.c   |  6 +++++-
 drivers/nvme/host/pci.c    |  2 ++
 drivers/nvme/host/rdma.c   |  8 ++++++--
 drivers/nvme/host/tcp.c    |  8 ++++++--
 drivers/nvme/target/loop.c |  2 ++
 include/linux/blk-mq.h     |  3 ++-
 8 files changed, 61 insertions(+), 13 deletions(-)

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>

-- 
2.20.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 0/5] blk-mq: wait until completed req's complete fn is run
@ 2019-07-22  5:39 ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)


Hi,

blk-mq may schedule to call queue's complete function on remote CPU via
IPI, but never provide any way to synchronize the request's complete
fn.

In some driver's EH(such as NVMe), hardware queue's resource may be freed &
re-allocated. If the completed request's complete fn is run finally after the
hardware queue's resource is released, kernel crash will be triggered.

Fixes this issue by waitting until completed req's complete fn is run.

Thanks,
Ming

Ming Lei (5):
  blk-mq: introduce blk_mq_request_completed()
  blk-mq: introduce blk_mq_tagset_wait_completed_request()
  nvme: don't abort completed request in nvme_cancel_request
  nvme: wait until all completed request's complete fn is called
  blk-mq: remove blk_mq_complete_request_sync

 block/blk-mq-tag.c         | 32 ++++++++++++++++++++++++++++++++
 block/blk-mq.c             | 13 ++++++-------
 drivers/nvme/host/core.c   |  6 +++++-
 drivers/nvme/host/pci.c    |  2 ++
 drivers/nvme/host/rdma.c   |  8 ++++++--
 drivers/nvme/host/tcp.c    |  8 ++++++--
 drivers/nvme/target/loop.c |  2 ++
 include/linux/blk-mq.h     |  3 ++-
 8 files changed, 61 insertions(+), 13 deletions(-)

Cc: Max Gurtovoy <maxg at mellanox.com>
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Keith Busch <keith.busch at intel.com>
Cc: Christoph Hellwig <hch at lst.de>

-- 
2.20.1

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/5] blk-mq: introduce blk_mq_request_completed()
  2019-07-22  5:39 ` Ming Lei
@ 2019-07-22  5:39   ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-nvme, Ming Lei, Max Gurtovoy, Sagi Grimberg,
	Keith Busch, Christoph Hellwig

NVMe needs this function to decide if one request to be aborted has
been completed in normal IO path already.

So introduce it.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 6 ++++++
 include/linux/blk-mq.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b038ec680e84..e1d0b4567388 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -665,6 +665,12 @@ int blk_mq_request_started(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_mq_request_started);
 
+int blk_mq_request_completed(struct request *rq)
+{
+	return blk_mq_rq_state(rq) == MQ_RQ_COMPLETE;
+}
+EXPORT_SYMBOL_GPL(blk_mq_request_completed);
+
 void blk_mq_start_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3fa1fa59f9b2..baac2926e54a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -296,6 +296,7 @@ static inline u16 blk_mq_unique_tag_to_tag(u32 unique_tag)
 
 
 int blk_mq_request_started(struct request *rq);
+int blk_mq_request_completed(struct request *rq);
 void blk_mq_start_request(struct request *rq);
 void blk_mq_end_request(struct request *rq, blk_status_t error);
 void __blk_mq_end_request(struct request *rq, blk_status_t error);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 1/5] blk-mq: introduce blk_mq_request_completed()
@ 2019-07-22  5:39   ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)


NVMe needs this function to decide if one request to be aborted has
been completed in normal IO path already.

So introduce it.

Cc: Max Gurtovoy <maxg at mellanox.com>
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Keith Busch <keith.busch at intel.com>
Cc: Christoph Hellwig <hch at lst.de>
Signed-off-by: Ming Lei <ming.lei at redhat.com>
---
 block/blk-mq.c         | 6 ++++++
 include/linux/blk-mq.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b038ec680e84..e1d0b4567388 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -665,6 +665,12 @@ int blk_mq_request_started(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_mq_request_started);
 
+int blk_mq_request_completed(struct request *rq)
+{
+	return blk_mq_rq_state(rq) == MQ_RQ_COMPLETE;
+}
+EXPORT_SYMBOL_GPL(blk_mq_request_completed);
+
 void blk_mq_start_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3fa1fa59f9b2..baac2926e54a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -296,6 +296,7 @@ static inline u16 blk_mq_unique_tag_to_tag(u32 unique_tag)
 
 
 int blk_mq_request_started(struct request *rq);
+int blk_mq_request_completed(struct request *rq);
 void blk_mq_start_request(struct request *rq);
 void blk_mq_end_request(struct request *rq, blk_status_t error);
 void __blk_mq_end_request(struct request *rq, blk_status_t error);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
  2019-07-22  5:39 ` Ming Lei
@ 2019-07-22  5:39   ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-nvme, Ming Lei, Max Gurtovoy, Sagi Grimberg,
	Keith Busch, Christoph Hellwig

blk-mq may schedule to call queue's complete function on remote CPU via
IPI, but doesn't provide any way to synchronize the request's complete
fn.

In some driver's EH(such as NVMe), hardware queue's resource may be freed &
re-allocated. If the completed request's complete fn is run finally after the
hardware queue's resource is released, kernel crash will be triggered.

Prepare for fixing this kind of issue by introducing
blk_mq_tagset_wait_completed_request().

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c     | 32 ++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index da19f0bc8876..008388e82b5c 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 
 #include <linux/blk-mq.h>
+#include <linux/delay.h>
 #include "blk.h"
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
@@ -354,6 +355,37 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 
+static bool blk_mq_tagset_count_completed_rqs(struct request *rq,
+		void *data, bool reserved)
+{
+	unsigned *count = data;
+
+	if (blk_mq_request_completed(rq))
+		(*count)++;
+	return true;
+}
+
+/**
+ * blk_mq_tagset_wait_completed_request - wait until all completed req's
+ * complete funtion is run
+ * @tagset:	Tag set to drain completed request
+ *
+ * Note: This function has to be run after all IO queues are shutdown
+ */
+void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset)
+{
+	while (true) {
+		unsigned count = 0;
+
+		blk_mq_tagset_busy_iter(tagset,
+				blk_mq_tagset_count_completed_rqs, &count);
+		if (!count)
+			break;
+		msleep(5);
+	}
+}
+EXPORT_SYMBOL(blk_mq_tagset_wait_completed_request);
+
 /**
  * blk_mq_queue_tag_busy_iter - iterate over all requests with a driver tag
  * @q:		Request queue to examine.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index baac2926e54a..ee0719b649b6 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -322,6 +322,7 @@ bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv);
+void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset);
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
@ 2019-07-22  5:39   ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)


blk-mq may schedule to call queue's complete function on remote CPU via
IPI, but doesn't provide any way to synchronize the request's complete
fn.

In some driver's EH(such as NVMe), hardware queue's resource may be freed &
re-allocated. If the completed request's complete fn is run finally after the
hardware queue's resource is released, kernel crash will be triggered.

Prepare for fixing this kind of issue by introducing
blk_mq_tagset_wait_completed_request().

Cc: Max Gurtovoy <maxg at mellanox.com>
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Keith Busch <keith.busch at intel.com>
Cc: Christoph Hellwig <hch at lst.de>
Signed-off-by: Ming Lei <ming.lei at redhat.com>
---
 block/blk-mq-tag.c     | 32 ++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index da19f0bc8876..008388e82b5c 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 
 #include <linux/blk-mq.h>
+#include <linux/delay.h>
 #include "blk.h"
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
@@ -354,6 +355,37 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 
+static bool blk_mq_tagset_count_completed_rqs(struct request *rq,
+		void *data, bool reserved)
+{
+	unsigned *count = data;
+
+	if (blk_mq_request_completed(rq))
+		(*count)++;
+	return true;
+}
+
+/**
+ * blk_mq_tagset_wait_completed_request - wait until all completed req's
+ * complete funtion is run
+ * @tagset:	Tag set to drain completed request
+ *
+ * Note: This function has to be run after all IO queues are shutdown
+ */
+void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset)
+{
+	while (true) {
+		unsigned count = 0;
+
+		blk_mq_tagset_busy_iter(tagset,
+				blk_mq_tagset_count_completed_rqs, &count);
+		if (!count)
+			break;
+		msleep(5);
+	}
+}
+EXPORT_SYMBOL(blk_mq_tagset_wait_completed_request);
+
 /**
  * blk_mq_queue_tag_busy_iter - iterate over all requests with a driver tag
  * @q:		Request queue to examine.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index baac2926e54a..ee0719b649b6 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -322,6 +322,7 @@ bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 		busy_tag_iter_fn *fn, void *priv);
+void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset);
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
  2019-07-22  5:39 ` Ming Lei
@ 2019-07-22  5:39   ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-nvme, Ming Lei, Max Gurtovoy, Sagi Grimberg,
	Keith Busch, Christoph Hellwig

Before aborting in-flight requests, all IO queues have been shutdown.
However, request's completion fn may not be done yet because it may
be scheduled to run via IPI.

So don't abort one request if it is marked as completed, otherwise
we may abort one normal completed request.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cc09b81fc7f4..cb8007cce4d1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
 
 bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 {
+	/* don't abort one completed request */
+	if (blk_mq_request_completed(req))
+		return;
+
 	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
 				"Cancelling I/O %d", req->tag);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
@ 2019-07-22  5:39   ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)


Before aborting in-flight requests, all IO queues have been shutdown.
However, request's completion fn may not be done yet because it may
be scheduled to run via IPI.

So don't abort one request if it is marked as completed, otherwise
we may abort one normal completed request.

Cc: Max Gurtovoy <maxg at mellanox.com>
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Keith Busch <keith.busch at intel.com>
Cc: Christoph Hellwig <hch at lst.de>
Signed-off-by: Ming Lei <ming.lei at redhat.com>
---
 drivers/nvme/host/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cc09b81fc7f4..cb8007cce4d1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
 
 bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 {
+	/* don't abort one completed request */
+	if (blk_mq_request_completed(req))
+		return;
+
 	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
 				"Cancelling I/O %d", req->tag);
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 4/5] nvme: wait until all completed request's complete fn is called
  2019-07-22  5:39 ` Ming Lei
@ 2019-07-22  5:39   ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-nvme, Ming Lei, Max Gurtovoy, Sagi Grimberg,
	Keith Busch, Christoph Hellwig

When aborting in-flight request for recoverying controller, we have
maken sure that queue's complete function is called on completed
request before moving one. For example, the warning of
WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be triggered.

Fix this issue by using blk_mq_tagset_drain_completed_request.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c    | 2 ++
 drivers/nvme/host/rdma.c   | 8 ++++++--
 drivers/nvme/host/tcp.c    | 8 ++++++--
 drivers/nvme/target/loop.c | 2 ++
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index bb970ca82517..47f37d9d9aa7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2403,6 +2403,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 
 	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
 	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
+	blk_mq_tagset_wait_completed_request(&dev->tagset);
+	blk_mq_tagset_wait_completed_request(&dev->admin_tagset);
 
 	/*
 	 * The driver will not be starting up queues again if shutting down so
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index a249db528d54..b313a60be1ca 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -901,9 +901,11 @@ static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl,
 {
 	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
-	if (ctrl->ctrl.admin_tagset)
+	if (ctrl->ctrl.admin_tagset) {
 		blk_mq_tagset_busy_iter(ctrl->ctrl.admin_tagset,
 			nvme_cancel_request, &ctrl->ctrl);
+		blk_mq_tagset_wait_completed_request(ctrl->ctrl.admin_tagset);
+	}
 	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_destroy_admin_queue(ctrl, remove);
 }
@@ -914,9 +916,11 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
 	if (ctrl->ctrl.queue_count > 1) {
 		nvme_stop_queues(&ctrl->ctrl);
 		nvme_rdma_stop_io_queues(ctrl);
-		if (ctrl->ctrl.tagset)
+		if (ctrl->ctrl.tagset) {
 			blk_mq_tagset_busy_iter(ctrl->ctrl.tagset,
 				nvme_cancel_request, &ctrl->ctrl);
+			blk_mq_tagset_wait_completed_request(ctrl->ctrl.tagset);
+		}
 		if (remove)
 			nvme_start_queues(&ctrl->ctrl);
 		nvme_rdma_destroy_io_queues(ctrl, remove);
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 606b13d35d16..cf2eaf834b36 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1748,9 +1748,11 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 {
 	blk_mq_quiesce_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
-	if (ctrl->admin_tagset)
+	if (ctrl->admin_tagset) {
 		blk_mq_tagset_busy_iter(ctrl->admin_tagset,
 			nvme_cancel_request, ctrl);
+		blk_mq_tagset_wait_completed_request(ctrl->admin_tagset);
+	}
 	blk_mq_unquiesce_queue(ctrl->admin_q);
 	nvme_tcp_destroy_admin_queue(ctrl, remove);
 }
@@ -1762,9 +1764,11 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 		return;
 	nvme_stop_queues(ctrl);
 	nvme_tcp_stop_io_queues(ctrl);
-	if (ctrl->tagset)
+	if (ctrl->tagset) {
 		blk_mq_tagset_busy_iter(ctrl->tagset,
 			nvme_cancel_request, ctrl);
+		blk_mq_tagset_wait_completed_request(ctrl->tagset);
+	}
 	if (remove)
 		nvme_start_queues(ctrl);
 	nvme_tcp_destroy_io_queues(ctrl, remove);
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index b16dc3981c69..95c8f1732215 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -407,6 +407,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 		nvme_stop_queues(&ctrl->ctrl);
 		blk_mq_tagset_busy_iter(&ctrl->tag_set,
 					nvme_cancel_request, &ctrl->ctrl);
+		blk_mq_tagset_wait_completed_request(&ctrl->tag_set);
 		nvme_loop_destroy_io_queues(ctrl);
 	}
 
@@ -416,6 +417,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
 	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
 				nvme_cancel_request, &ctrl->ctrl);
+	blk_mq_tagset_wait_completed_request(&ctrl->admin_tag_set);
 	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
 	nvme_loop_destroy_admin_queue(ctrl);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 4/5] nvme: wait until all completed request's complete fn is called
@ 2019-07-22  5:39   ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)


When aborting in-flight request for recoverying controller, we have
maken sure that queue's complete function is called on completed
request before moving one. For example, the warning of
WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be triggered.

Fix this issue by using blk_mq_tagset_drain_completed_request.

Cc: Max Gurtovoy <maxg at mellanox.com>
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Keith Busch <keith.busch at intel.com>
Cc: Christoph Hellwig <hch at lst.de>
Signed-off-by: Ming Lei <ming.lei at redhat.com>
---
 drivers/nvme/host/pci.c    | 2 ++
 drivers/nvme/host/rdma.c   | 8 ++++++--
 drivers/nvme/host/tcp.c    | 8 ++++++--
 drivers/nvme/target/loop.c | 2 ++
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index bb970ca82517..47f37d9d9aa7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2403,6 +2403,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 
 	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
 	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
+	blk_mq_tagset_wait_completed_request(&dev->tagset);
+	blk_mq_tagset_wait_completed_request(&dev->admin_tagset);
 
 	/*
 	 * The driver will not be starting up queues again if shutting down so
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index a249db528d54..b313a60be1ca 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -901,9 +901,11 @@ static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl,
 {
 	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
-	if (ctrl->ctrl.admin_tagset)
+	if (ctrl->ctrl.admin_tagset) {
 		blk_mq_tagset_busy_iter(ctrl->ctrl.admin_tagset,
 			nvme_cancel_request, &ctrl->ctrl);
+		blk_mq_tagset_wait_completed_request(ctrl->ctrl.admin_tagset);
+	}
 	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_destroy_admin_queue(ctrl, remove);
 }
@@ -914,9 +916,11 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
 	if (ctrl->ctrl.queue_count > 1) {
 		nvme_stop_queues(&ctrl->ctrl);
 		nvme_rdma_stop_io_queues(ctrl);
-		if (ctrl->ctrl.tagset)
+		if (ctrl->ctrl.tagset) {
 			blk_mq_tagset_busy_iter(ctrl->ctrl.tagset,
 				nvme_cancel_request, &ctrl->ctrl);
+			blk_mq_tagset_wait_completed_request(ctrl->ctrl.tagset);
+		}
 		if (remove)
 			nvme_start_queues(&ctrl->ctrl);
 		nvme_rdma_destroy_io_queues(ctrl, remove);
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 606b13d35d16..cf2eaf834b36 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1748,9 +1748,11 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 {
 	blk_mq_quiesce_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
-	if (ctrl->admin_tagset)
+	if (ctrl->admin_tagset) {
 		blk_mq_tagset_busy_iter(ctrl->admin_tagset,
 			nvme_cancel_request, ctrl);
+		blk_mq_tagset_wait_completed_request(ctrl->admin_tagset);
+	}
 	blk_mq_unquiesce_queue(ctrl->admin_q);
 	nvme_tcp_destroy_admin_queue(ctrl, remove);
 }
@@ -1762,9 +1764,11 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 		return;
 	nvme_stop_queues(ctrl);
 	nvme_tcp_stop_io_queues(ctrl);
-	if (ctrl->tagset)
+	if (ctrl->tagset) {
 		blk_mq_tagset_busy_iter(ctrl->tagset,
 			nvme_cancel_request, ctrl);
+		blk_mq_tagset_wait_completed_request(ctrl->tagset);
+	}
 	if (remove)
 		nvme_start_queues(ctrl);
 	nvme_tcp_destroy_io_queues(ctrl, remove);
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index b16dc3981c69..95c8f1732215 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -407,6 +407,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 		nvme_stop_queues(&ctrl->ctrl);
 		blk_mq_tagset_busy_iter(&ctrl->tag_set,
 					nvme_cancel_request, &ctrl->ctrl);
+		blk_mq_tagset_wait_completed_request(&ctrl->tag_set);
 		nvme_loop_destroy_io_queues(ctrl);
 	}
 
@@ -416,6 +417,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
 	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
 				nvme_cancel_request, &ctrl->ctrl);
+	blk_mq_tagset_wait_completed_request(&ctrl->admin_tag_set);
 	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
 	nvme_loop_destroy_admin_queue(ctrl);
 }
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 5/5] blk-mq: remove blk_mq_complete_request_sync
  2019-07-22  5:39 ` Ming Lei
@ 2019-07-22  5:39   ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-nvme, Ming Lei, Max Gurtovoy, Sagi Grimberg,
	Keith Busch, Christoph Hellwig

blk_mq_tagset_wait_completed_request() has been applied for waiting
for completed request's fn, so not necessary to use
blk_mq_complete_request_sync() any more.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c           | 7 -------
 drivers/nvme/host/core.c | 2 +-
 include/linux/blk-mq.h   | 1 -
 3 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e1d0b4567388..9836a00d8c07 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -652,13 +652,6 @@ bool blk_mq_complete_request(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
-void blk_mq_complete_request_sync(struct request *rq)
-{
-	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
-	rq->q->mq_ops->complete(rq);
-}
-EXPORT_SYMBOL_GPL(blk_mq_complete_request_sync);
-
 int blk_mq_request_started(struct request *rq)
 {
 	return blk_mq_rq_state(rq) != MQ_RQ_IDLE;
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cb8007cce4d1..4beaed3b5022 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -293,7 +293,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 				"Cancelling I/O %d", req->tag);
 
 	nvme_req(req)->status = NVME_SC_ABORT_REQ;
-	blk_mq_complete_request_sync(req);
+	blk_mq_complete_request(req);
 	return true;
 }
 EXPORT_SYMBOL_GPL(nvme_cancel_request);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ee0719b649b6..1cdd2788cfa6 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -305,7 +305,6 @@ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
 bool blk_mq_complete_request(struct request *rq);
-void blk_mq_complete_request_sync(struct request *rq);
 bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
 			   struct bio *bio, unsigned int nr_segs);
 bool blk_mq_queue_stopped(struct request_queue *q);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 5/5] blk-mq: remove blk_mq_complete_request_sync
@ 2019-07-22  5:39   ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-22  5:39 UTC (permalink / raw)


blk_mq_tagset_wait_completed_request() has been applied for waiting
for completed request's fn, so not necessary to use
blk_mq_complete_request_sync() any more.

Cc: Max Gurtovoy <maxg at mellanox.com>
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Keith Busch <keith.busch at intel.com>
Cc: Christoph Hellwig <hch at lst.de>
Signed-off-by: Ming Lei <ming.lei at redhat.com>
---
 block/blk-mq.c           | 7 -------
 drivers/nvme/host/core.c | 2 +-
 include/linux/blk-mq.h   | 1 -
 3 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e1d0b4567388..9836a00d8c07 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -652,13 +652,6 @@ bool blk_mq_complete_request(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
-void blk_mq_complete_request_sync(struct request *rq)
-{
-	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
-	rq->q->mq_ops->complete(rq);
-}
-EXPORT_SYMBOL_GPL(blk_mq_complete_request_sync);
-
 int blk_mq_request_started(struct request *rq)
 {
 	return blk_mq_rq_state(rq) != MQ_RQ_IDLE;
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cb8007cce4d1..4beaed3b5022 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -293,7 +293,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 				"Cancelling I/O %d", req->tag);
 
 	nvme_req(req)->status = NVME_SC_ABORT_REQ;
-	blk_mq_complete_request_sync(req);
+	blk_mq_complete_request(req);
 	return true;
 }
 EXPORT_SYMBOL_GPL(nvme_cancel_request);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ee0719b649b6..1cdd2788cfa6 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -305,7 +305,6 @@ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
 bool blk_mq_complete_request(struct request *rq);
-void blk_mq_complete_request_sync(struct request *rq);
 bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
 			   struct bio *bio, unsigned int nr_segs);
 bool blk_mq_queue_stopped(struct request_queue *q);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
  2019-07-22  5:39   ` Ming Lei
@ 2019-07-22 15:25     ` Bart Van Assche
  -1 siblings, 0 replies; 50+ messages in thread
From: Bart Van Assche @ 2019-07-22 15:25 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block,
	Max Gurtovoy, Christoph Hellwig

On 7/21/19 10:39 PM, Ming Lei wrote:
> blk-mq may schedule to call queue's complete function on remote CPU via
> IPI, but doesn't provide any way to synchronize the request's complete
> fn.
> 
> In some driver's EH(such as NVMe), hardware queue's resource may be freed &
> re-allocated. If the completed request's complete fn is run finally after the
> hardware queue's resource is released, kernel crash will be triggered.
> 
> Prepare for fixing this kind of issue by introducing
> blk_mq_tagset_wait_completed_request().

An explanation is missing of why the block layer is modified to fix this 
instead of the NVMe driver.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
@ 2019-07-22 15:25     ` Bart Van Assche
  0 siblings, 0 replies; 50+ messages in thread
From: Bart Van Assche @ 2019-07-22 15:25 UTC (permalink / raw)


On 7/21/19 10:39 PM, Ming Lei wrote:
> blk-mq may schedule to call queue's complete function on remote CPU via
> IPI, but doesn't provide any way to synchronize the request's complete
> fn.
> 
> In some driver's EH(such as NVMe), hardware queue's resource may be freed &
> re-allocated. If the completed request's complete fn is run finally after the
> hardware queue's resource is released, kernel crash will be triggered.
> 
> Prepare for fixing this kind of issue by introducing
> blk_mq_tagset_wait_completed_request().

An explanation is missing of why the block layer is modified to fix this 
instead of the NVMe driver.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
  2019-07-22  5:39   ` Ming Lei
@ 2019-07-22 15:27     ` Bart Van Assche
  -1 siblings, 0 replies; 50+ messages in thread
From: Bart Van Assche @ 2019-07-22 15:27 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block,
	Max Gurtovoy, Christoph Hellwig

On 7/21/19 10:39 PM, Ming Lei wrote:
> Before aborting in-flight requests, all IO queues have been shutdown.
> However, request's completion fn may not be done yet because it may
> be scheduled to run via IPI.
> 
> So don't abort one request if it is marked as completed, otherwise
> we may abort one normal completed request.
> 
> Cc: Max Gurtovoy <maxg@mellanox.com>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   drivers/nvme/host/core.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index cc09b81fc7f4..cb8007cce4d1 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
>   
>   bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>   {
> +	/* don't abort one completed request */
> +	if (blk_mq_request_completed(req))
> +		return;
> +
>   	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
>   				"Cancelling I/O %d", req->tag);

Something I probably already asked before: what prevents that 
nvme_cancel_request() is executed concurrently with the completion 
handler of the same request?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
@ 2019-07-22 15:27     ` Bart Van Assche
  0 siblings, 0 replies; 50+ messages in thread
From: Bart Van Assche @ 2019-07-22 15:27 UTC (permalink / raw)


On 7/21/19 10:39 PM, Ming Lei wrote:
> Before aborting in-flight requests, all IO queues have been shutdown.
> However, request's completion fn may not be done yet because it may
> be scheduled to run via IPI.
> 
> So don't abort one request if it is marked as completed, otherwise
> we may abort one normal completed request.
> 
> Cc: Max Gurtovoy <maxg at mellanox.com>
> Cc: Sagi Grimberg <sagi at grimberg.me>
> Cc: Keith Busch <keith.busch at intel.com>
> Cc: Christoph Hellwig <hch at lst.de>
> Signed-off-by: Ming Lei <ming.lei at redhat.com>
> ---
>   drivers/nvme/host/core.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index cc09b81fc7f4..cb8007cce4d1 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
>   
>   bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>   {
> +	/* don't abort one completed request */
> +	if (blk_mq_request_completed(req))
> +		return;
> +
>   	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
>   				"Cancelling I/O %d", req->tag);

Something I probably already asked before: what prevents that 
nvme_cancel_request() is executed concurrently with the completion 
handler of the same request?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
  2019-07-22 15:27     ` Bart Van Assche
@ 2019-07-22 23:22       ` Keith Busch
  -1 siblings, 0 replies; 50+ messages in thread
From: Keith Busch @ 2019-07-22 23:22 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Ming Lei, Jens Axboe, linux-block, Sagi Grimberg, linux-nvme,
	Keith Busch, Max Gurtovoy, Christoph Hellwig

On Mon, Jul 22, 2019 at 9:27 AM Bart Van Assche <bvanassche@acm.org> wrote:
> On 7/21/19 10:39 PM, Ming Lei wrote:
> > Before aborting in-flight requests, all IO queues have been shutdown.
> > However, request's completion fn may not be done yet because it may
> > be scheduled to run via IPI.
> >
> > So don't abort one request if it is marked as completed, otherwise
> > we may abort one normal completed request.
> >
> > Cc: Max Gurtovoy <maxg@mellanox.com>
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   drivers/nvme/host/core.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index cc09b81fc7f4..cb8007cce4d1 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
> >
> >   bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> >   {
> > +     /* don't abort one completed request */
> > +     if (blk_mq_request_completed(req))
> > +             return;
> > +
> >       dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
> >                               "Cancelling I/O %d", req->tag);
>
> Something I probably already asked before: what prevents that
> nvme_cancel_request() is executed concurrently with the completion
> handler of the same request?

At least for pci, we've shutdown the queues and their interrupts prior
to tagset iteration, so we can't concurrently execute a natural
completion for in-flight requests while cancelling them.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
@ 2019-07-22 23:22       ` Keith Busch
  0 siblings, 0 replies; 50+ messages in thread
From: Keith Busch @ 2019-07-22 23:22 UTC (permalink / raw)


On Mon, Jul 22, 2019@9:27 AM Bart Van Assche <bvanassche@acm.org> wrote:
> On 7/21/19 10:39 PM, Ming Lei wrote:
> > Before aborting in-flight requests, all IO queues have been shutdown.
> > However, request's completion fn may not be done yet because it may
> > be scheduled to run via IPI.
> >
> > So don't abort one request if it is marked as completed, otherwise
> > we may abort one normal completed request.
> >
> > Cc: Max Gurtovoy <maxg at mellanox.com>
> > Cc: Sagi Grimberg <sagi at grimberg.me>
> > Cc: Keith Busch <keith.busch at intel.com>
> > Cc: Christoph Hellwig <hch at lst.de>
> > Signed-off-by: Ming Lei <ming.lei at redhat.com>
> > ---
> >   drivers/nvme/host/core.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index cc09b81fc7f4..cb8007cce4d1 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
> >
> >   bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> >   {
> > +     /* don't abort one completed request */
> > +     if (blk_mq_request_completed(req))
> > +             return;
> > +
> >       dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
> >                               "Cancelling I/O %d", req->tag);
>
> Something I probably already asked before: what prevents that
> nvme_cancel_request() is executed concurrently with the completion
> handler of the same request?

At least for pci, we've shutdown the queues and their interrupts prior
to tagset iteration, so we can't concurrently execute a natural
completion for in-flight requests while cancelling them.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/5] blk-mq: wait until completed req's complete fn is run
  2019-07-22  5:39 ` Ming Lei
@ 2019-07-22 23:27   ` Bob Liu
  -1 siblings, 0 replies; 50+ messages in thread
From: Bob Liu @ 2019-07-22 23:27 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, linux-nvme, Max Gurtovoy, Sagi Grimberg,
	Keith Busch, Christoph Hellwig

On 7/22/19 1:39 PM, Ming Lei wrote:
> Hi,
> 
> blk-mq may schedule to call queue's complete function on remote CPU via
> IPI, but never provide any way to synchronize the request's complete
> fn.
> 
> In some driver's EH(such as NVMe), hardware queue's resource may be freed &
> re-allocated. If the completed request's complete fn is run finally after the
> hardware queue's resource is released, kernel crash will be triggered.
> 

Have you seen the crash? Anyway to emulate/verify this bug..

> Fixes this issue by waitting until completed req's complete fn is run.
> 
> Thanks,
> Ming
> 
> Ming Lei (5):
>   blk-mq: introduce blk_mq_request_completed()
>   blk-mq: introduce blk_mq_tagset_wait_completed_request()
>   nvme: don't abort completed request in nvme_cancel_request
>   nvme: wait until all completed request's complete fn is called
>   blk-mq: remove blk_mq_complete_request_sync
> 
>  block/blk-mq-tag.c         | 32 ++++++++++++++++++++++++++++++++
>  block/blk-mq.c             | 13 ++++++-------
>  drivers/nvme/host/core.c   |  6 +++++-
>  drivers/nvme/host/pci.c    |  2 ++
>  drivers/nvme/host/rdma.c   |  8 ++++++--
>  drivers/nvme/host/tcp.c    |  8 ++++++--
>  drivers/nvme/target/loop.c |  2 ++
>  include/linux/blk-mq.h     |  3 ++-
>  8 files changed, 61 insertions(+), 13 deletions(-)
> 
> Cc: Max Gurtovoy <maxg@mellanox.com>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 0/5] blk-mq: wait until completed req's complete fn is run
@ 2019-07-22 23:27   ` Bob Liu
  0 siblings, 0 replies; 50+ messages in thread
From: Bob Liu @ 2019-07-22 23:27 UTC (permalink / raw)


On 7/22/19 1:39 PM, Ming Lei wrote:
> Hi,
> 
> blk-mq may schedule to call queue's complete function on remote CPU via
> IPI, but never provide any way to synchronize the request's complete
> fn.
> 
> In some driver's EH(such as NVMe), hardware queue's resource may be freed &
> re-allocated. If the completed request's complete fn is run finally after the
> hardware queue's resource is released, kernel crash will be triggered.
> 

Have you seen the crash? Anyway to emulate/verify this bug..

> Fixes this issue by waitting until completed req's complete fn is run.
> 
> Thanks,
> Ming
> 
> Ming Lei (5):
>   blk-mq: introduce blk_mq_request_completed()
>   blk-mq: introduce blk_mq_tagset_wait_completed_request()
>   nvme: don't abort completed request in nvme_cancel_request
>   nvme: wait until all completed request's complete fn is called
>   blk-mq: remove blk_mq_complete_request_sync
> 
>  block/blk-mq-tag.c         | 32 ++++++++++++++++++++++++++++++++
>  block/blk-mq.c             | 13 ++++++-------
>  drivers/nvme/host/core.c   |  6 +++++-
>  drivers/nvme/host/pci.c    |  2 ++
>  drivers/nvme/host/rdma.c   |  8 ++++++--
>  drivers/nvme/host/tcp.c    |  8 ++++++--
>  drivers/nvme/target/loop.c |  2 ++
>  include/linux/blk-mq.h     |  3 ++-
>  8 files changed, 61 insertions(+), 13 deletions(-)
> 
> Cc: Max Gurtovoy <maxg at mellanox.com>
> Cc: Sagi Grimberg <sagi at grimberg.me>
> Cc: Keith Busch <keith.busch at intel.com>
> Cc: Christoph Hellwig <hch at lst.de>
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
  2019-07-22 23:22       ` Keith Busch
@ 2019-07-23  0:07         ` Sagi Grimberg
  -1 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23  0:07 UTC (permalink / raw)
  To: Keith Busch, Bart Van Assche
  Cc: Ming Lei, Jens Axboe, linux-block, linux-nvme, Keith Busch,
	Max Gurtovoy, Christoph Hellwig


>> On 7/21/19 10:39 PM, Ming Lei wrote:
>>> Before aborting in-flight requests, all IO queues have been shutdown.
>>> However, request's completion fn may not be done yet because it may
>>> be scheduled to run via IPI.
>>>
>>> So don't abort one request if it is marked as completed, otherwise
>>> we may abort one normal completed request.
>>>
>>> Cc: Max Gurtovoy <maxg@mellanox.com>
>>> Cc: Sagi Grimberg <sagi@grimberg.me>
>>> Cc: Keith Busch <keith.busch@intel.com>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>    drivers/nvme/host/core.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index cc09b81fc7f4..cb8007cce4d1 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
>>>
>>>    bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>>>    {
>>> +     /* don't abort one completed request */
>>> +     if (blk_mq_request_completed(req))
>>> +             return;
>>> +
>>>        dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
>>>                                "Cancelling I/O %d", req->tag);
>>
>> Something I probably already asked before: what prevents that
>> nvme_cancel_request() is executed concurrently with the completion
>> handler of the same request?
> 
> At least for pci, we've shutdown the queues and their interrupts prior
> to tagset iteration, so we can't concurrently execute a natural
> completion for in-flight requests while cancelling them.

Same for tcp and rdma.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
@ 2019-07-23  0:07         ` Sagi Grimberg
  0 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23  0:07 UTC (permalink / raw)



>> On 7/21/19 10:39 PM, Ming Lei wrote:
>>> Before aborting in-flight requests, all IO queues have been shutdown.
>>> However, request's completion fn may not be done yet because it may
>>> be scheduled to run via IPI.
>>>
>>> So don't abort one request if it is marked as completed, otherwise
>>> we may abort one normal completed request.
>>>
>>> Cc: Max Gurtovoy <maxg at mellanox.com>
>>> Cc: Sagi Grimberg <sagi at grimberg.me>
>>> Cc: Keith Busch <keith.busch at intel.com>
>>> Cc: Christoph Hellwig <hch at lst.de>
>>> Signed-off-by: Ming Lei <ming.lei at redhat.com>
>>> ---
>>>    drivers/nvme/host/core.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index cc09b81fc7f4..cb8007cce4d1 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
>>>
>>>    bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>>>    {
>>> +     /* don't abort one completed request */
>>> +     if (blk_mq_request_completed(req))
>>> +             return;
>>> +
>>>        dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
>>>                                "Cancelling I/O %d", req->tag);
>>
>> Something I probably already asked before: what prevents that
>> nvme_cancel_request() is executed concurrently with the completion
>> handler of the same request?
> 
> At least for pci, we've shutdown the queues and their interrupts prior
> to tagset iteration, so we can't concurrently execute a natural
> completion for in-flight requests while cancelling them.

Same for tcp and rdma.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
  2019-07-22 15:25     ` Bart Van Assche
@ 2019-07-23  1:06       ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-23  1:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block,
	Max Gurtovoy, Christoph Hellwig

On Mon, Jul 22, 2019 at 08:25:07AM -0700, Bart Van Assche wrote:
> On 7/21/19 10:39 PM, Ming Lei wrote:
> > blk-mq may schedule to call queue's complete function on remote CPU via
> > IPI, but doesn't provide any way to synchronize the request's complete
> > fn.
> > 
> > In some driver's EH(such as NVMe), hardware queue's resource may be freed &
> > re-allocated. If the completed request's complete fn is run finally after the
> > hardware queue's resource is released, kernel crash will be triggered.
> > 
> > Prepare for fixing this kind of issue by introducing
> > blk_mq_tagset_wait_completed_request().
> 
> An explanation is missing of why the block layer is modified to fix this
> instead of the NVMe driver.

The above commit log has explained that there isn't sync mechanism in
blk-mq wrt. request completion, and there might be similar issue in other
future drivers.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
@ 2019-07-23  1:06       ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-23  1:06 UTC (permalink / raw)


On Mon, Jul 22, 2019@08:25:07AM -0700, Bart Van Assche wrote:
> On 7/21/19 10:39 PM, Ming Lei wrote:
> > blk-mq may schedule to call queue's complete function on remote CPU via
> > IPI, but doesn't provide any way to synchronize the request's complete
> > fn.
> > 
> > In some driver's EH(such as NVMe), hardware queue's resource may be freed &
> > re-allocated. If the completed request's complete fn is run finally after the
> > hardware queue's resource is released, kernel crash will be triggered.
> > 
> > Prepare for fixing this kind of issue by introducing
> > blk_mq_tagset_wait_completed_request().
> 
> An explanation is missing of why the block layer is modified to fix this
> instead of the NVMe driver.

The above commit log has explained that there isn't sync mechanism in
blk-mq wrt. request completion, and there might be similar issue in other
future drivers.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
  2019-07-22 15:27     ` Bart Van Assche
@ 2019-07-23  1:08       ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-23  1:08 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block,
	Max Gurtovoy, Christoph Hellwig

On Mon, Jul 22, 2019 at 08:27:32AM -0700, Bart Van Assche wrote:
> On 7/21/19 10:39 PM, Ming Lei wrote:
> > Before aborting in-flight requests, all IO queues have been shutdown.
> > However, request's completion fn may not be done yet because it may
> > be scheduled to run via IPI.
> > 
> > So don't abort one request if it is marked as completed, otherwise
> > we may abort one normal completed request.
> > 
> > Cc: Max Gurtovoy <maxg@mellanox.com>
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   drivers/nvme/host/core.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index cc09b81fc7f4..cb8007cce4d1 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
> >   bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> >   {
> > +	/* don't abort one completed request */
> > +	if (blk_mq_request_completed(req))
> > +		return;
> > +
> >   	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
> >   				"Cancelling I/O %d", req->tag);
> 
> Something I probably already asked before: what prevents that
> nvme_cancel_request() is executed concurrently with the completion handler
> of the same request?

The commit log did mention the point:

	Before aborting in-flight requests, all IO queues have been shutdown.

which implies that no concurrent normal completion.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
@ 2019-07-23  1:08       ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-23  1:08 UTC (permalink / raw)


On Mon, Jul 22, 2019@08:27:32AM -0700, Bart Van Assche wrote:
> On 7/21/19 10:39 PM, Ming Lei wrote:
> > Before aborting in-flight requests, all IO queues have been shutdown.
> > However, request's completion fn may not be done yet because it may
> > be scheduled to run via IPI.
> > 
> > So don't abort one request if it is marked as completed, otherwise
> > we may abort one normal completed request.
> > 
> > Cc: Max Gurtovoy <maxg at mellanox.com>
> > Cc: Sagi Grimberg <sagi at grimberg.me>
> > Cc: Keith Busch <keith.busch at intel.com>
> > Cc: Christoph Hellwig <hch at lst.de>
> > Signed-off-by: Ming Lei <ming.lei at redhat.com>
> > ---
> >   drivers/nvme/host/core.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index cc09b81fc7f4..cb8007cce4d1 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
> >   bool nvme_cancel_request(struct request *req, void *data, bool reserved)
> >   {
> > +	/* don't abort one completed request */
> > +	if (blk_mq_request_completed(req))
> > +		return;
> > +
> >   	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
> >   				"Cancelling I/O %d", req->tag);
> 
> Something I probably already asked before: what prevents that
> nvme_cancel_request() is executed concurrently with the completion handler
> of the same request?

The commit log did mention the point:

	Before aborting in-flight requests, all IO queues have been shutdown.

which implies that no concurrent normal completion.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/5] blk-mq: wait until completed req's complete fn is run
  2019-07-22 23:27   ` Bob Liu
@ 2019-07-23  1:10     ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-23  1:10 UTC (permalink / raw)
  To: Bob Liu
  Cc: Jens Axboe, linux-block, linux-nvme, Max Gurtovoy, Sagi Grimberg,
	Keith Busch, Christoph Hellwig

On Tue, Jul 23, 2019 at 07:27:00AM +0800, Bob Liu wrote:
> On 7/22/19 1:39 PM, Ming Lei wrote:
> > Hi,
> > 
> > blk-mq may schedule to call queue's complete function on remote CPU via
> > IPI, but never provide any way to synchronize the request's complete
> > fn.
> > 
> > In some driver's EH(such as NVMe), hardware queue's resource may be freed &
> > re-allocated. If the completed request's complete fn is run finally after the
> > hardware queue's resource is released, kernel crash will be triggered.
> > 
> 
> Have you seen the crash? Anyway to emulate/verify this bug..

The crash is reported on nvme-rdma by one RH partner, and the approach
of this patchset has been verified already.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 0/5] blk-mq: wait until completed req's complete fn is run
@ 2019-07-23  1:10     ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-23  1:10 UTC (permalink / raw)


On Tue, Jul 23, 2019@07:27:00AM +0800, Bob Liu wrote:
> On 7/22/19 1:39 PM, Ming Lei wrote:
> > Hi,
> > 
> > blk-mq may schedule to call queue's complete function on remote CPU via
> > IPI, but never provide any way to synchronize the request's complete
> > fn.
> > 
> > In some driver's EH(such as NVMe), hardware queue's resource may be freed &
> > re-allocated. If the completed request's complete fn is run finally after the
> > hardware queue's resource is released, kernel crash will be triggered.
> > 
> 
> Have you seen the crash? Anyway to emulate/verify this bug..

The crash is reported on nvme-rdma by one RH partner, and the approach
of this patchset has been verified already.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/5] nvme: wait until all completed request's complete fn is called
  2019-07-22  5:39   ` Ming Lei
@ 2019-07-23 16:14     ` Dongli Zhang
  -1 siblings, 0 replies; 50+ messages in thread
From: Dongli Zhang @ 2019-07-23 16:14 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-nvme, Max Gurtovoy, Sagi Grimberg,
	Keith Busch, Christoph Hellwig

Hi Ming,

On 07/22/2019 01:39 PM, Ming Lei wrote:
> When aborting in-flight request for recoverying controller, we have

recovering

> maken sure that queue's complete function is called on completed
> request before moving one. For example, the warning of
> WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be triggered.
> 
> Fix this issue by using blk_mq_tagset_drain_completed_request.
> 

Should be blk_mq_tagset_wait_completed_request but not
blk_mq_tagset_drain_completed_request?

Dongli Zhang

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 4/5] nvme: wait until all completed request's complete fn is called
@ 2019-07-23 16:14     ` Dongli Zhang
  0 siblings, 0 replies; 50+ messages in thread
From: Dongli Zhang @ 2019-07-23 16:14 UTC (permalink / raw)


Hi Ming,

On 07/22/2019 01:39 PM, Ming Lei wrote:
> When aborting in-flight request for recoverying controller, we have

recovering

> maken sure that queue's complete function is called on completed
> request before moving one. For example, the warning of
> WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be triggered.
> 
> Fix this issue by using blk_mq_tagset_drain_completed_request.
> 

Should be blk_mq_tagset_wait_completed_request but not
blk_mq_tagset_drain_completed_request?

Dongli Zhang

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
  2019-07-23  1:08       ` Ming Lei
@ 2019-07-23 19:22         ` Bart Van Assche
  -1 siblings, 0 replies; 50+ messages in thread
From: Bart Van Assche @ 2019-07-23 19:22 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block,
	Max Gurtovoy, Christoph Hellwig

On 7/22/19 6:08 PM, Ming Lei wrote:
> On Mon, Jul 22, 2019 at 08:27:32AM -0700, Bart Van Assche wrote:
>> On 7/21/19 10:39 PM, Ming Lei wrote:
>>> Before aborting in-flight requests, all IO queues have been shutdown.
>>> However, request's completion fn may not be done yet because it may
>>> be scheduled to run via IPI.
>>>
>>> So don't abort one request if it is marked as completed, otherwise
>>> we may abort one normal completed request.
>>>
>>> Cc: Max Gurtovoy <maxg@mellanox.com>
>>> Cc: Sagi Grimberg <sagi@grimberg.me>
>>> Cc: Keith Busch <keith.busch@intel.com>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>    drivers/nvme/host/core.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index cc09b81fc7f4..cb8007cce4d1 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
>>>    bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>>>    {
>>> +	/* don't abort one completed request */
>>> +	if (blk_mq_request_completed(req))
>>> +		return;
>>> +
>>>    	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
>>>    				"Cancelling I/O %d", req->tag);
>>
>> Something I probably already asked before: what prevents that
>> nvme_cancel_request() is executed concurrently with the completion handler
>> of the same request?
> 
> The commit log did mention the point:
> 
> 	Before aborting in-flight requests, all IO queues have been shutdown.
> 
> which implies that no concurrent normal completion.

How about adding that explanation as a comment above
nvme_cancel_request()? That would make that explanation much easier to 
find compared to having to search through commit logs.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
@ 2019-07-23 19:22         ` Bart Van Assche
  0 siblings, 0 replies; 50+ messages in thread
From: Bart Van Assche @ 2019-07-23 19:22 UTC (permalink / raw)


On 7/22/19 6:08 PM, Ming Lei wrote:
> On Mon, Jul 22, 2019@08:27:32AM -0700, Bart Van Assche wrote:
>> On 7/21/19 10:39 PM, Ming Lei wrote:
>>> Before aborting in-flight requests, all IO queues have been shutdown.
>>> However, request's completion fn may not be done yet because it may
>>> be scheduled to run via IPI.
>>>
>>> So don't abort one request if it is marked as completed, otherwise
>>> we may abort one normal completed request.
>>>
>>> Cc: Max Gurtovoy <maxg at mellanox.com>
>>> Cc: Sagi Grimberg <sagi at grimberg.me>
>>> Cc: Keith Busch <keith.busch at intel.com>
>>> Cc: Christoph Hellwig <hch at lst.de>
>>> Signed-off-by: Ming Lei <ming.lei at redhat.com>
>>> ---
>>>    drivers/nvme/host/core.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index cc09b81fc7f4..cb8007cce4d1 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -285,6 +285,10 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
>>>    bool nvme_cancel_request(struct request *req, void *data, bool reserved)
>>>    {
>>> +	/* don't abort one completed request */
>>> +	if (blk_mq_request_completed(req))
>>> +		return;
>>> +
>>>    	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
>>>    				"Cancelling I/O %d", req->tag);
>>
>> Something I probably already asked before: what prevents that
>> nvme_cancel_request() is executed concurrently with the completion handler
>> of the same request?
> 
> The commit log did mention the point:
> 
> 	Before aborting in-flight requests, all IO queues have been shutdown.
> 
> which implies that no concurrent normal completion.

How about adding that explanation as a comment above
nvme_cancel_request()? That would make that explanation much easier to 
find compared to having to search through commit logs.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/5] blk-mq: introduce blk_mq_request_completed()
  2019-07-22  5:39   ` Ming Lei
@ 2019-07-23 20:26     ` Sagi Grimberg
  -1 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23 20:26 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, linux-nvme, Max Gurtovoy, Keith Busch, Christoph Hellwig

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/5] blk-mq: introduce blk_mq_request_completed()
@ 2019-07-23 20:26     ` Sagi Grimberg
  0 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23 20:26 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
  2019-07-22  5:39   ` Ming Lei
@ 2019-07-23 20:27     ` Sagi Grimberg
  -1 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23 20:27 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, linux-nvme, Max Gurtovoy, Keith Busch, Christoph Hellwig

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
@ 2019-07-23 20:27     ` Sagi Grimberg
  0 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23 20:27 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
  2019-07-22  5:39   ` Ming Lei
@ 2019-07-23 20:27     ` Sagi Grimberg
  -1 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23 20:27 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, linux-nvme, Max Gurtovoy, Keith Busch, Christoph Hellwig

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request
@ 2019-07-23 20:27     ` Sagi Grimberg
  0 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23 20:27 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/5] nvme: wait until all completed request's complete fn is called
  2019-07-22  5:39   ` Ming Lei
@ 2019-07-23 20:29     ` Sagi Grimberg
  -1 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23 20:29 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, linux-nvme, Max Gurtovoy, Keith Busch, Christoph Hellwig

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

What about fc? I know that they are not canceling a request, but
they still complete it, shouldn't it also wait before removing?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 4/5] nvme: wait until all completed request's complete fn is called
@ 2019-07-23 20:29     ` Sagi Grimberg
  0 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23 20:29 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

What about fc? I know that they are not canceling a request, but
they still complete it, shouldn't it also wait before removing?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 5/5] blk-mq: remove blk_mq_complete_request_sync
  2019-07-22  5:39   ` Ming Lei
@ 2019-07-23 20:30     ` Sagi Grimberg
  -1 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23 20:30 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, linux-nvme, Max Gurtovoy, Keith Busch, Christoph Hellwig

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 5/5] blk-mq: remove blk_mq_complete_request_sync
@ 2019-07-23 20:30     ` Sagi Grimberg
  0 siblings, 0 replies; 50+ messages in thread
From: Sagi Grimberg @ 2019-07-23 20:30 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
  2019-07-23  1:06       ` Ming Lei
@ 2019-07-23 20:54         ` Bart Van Assche
  -1 siblings, 0 replies; 50+ messages in thread
From: Bart Van Assche @ 2019-07-23 20:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block,
	Max Gurtovoy, Christoph Hellwig

On 7/22/19 6:06 PM, Ming Lei wrote:
> On Mon, Jul 22, 2019 at 08:25:07AM -0700, Bart Van Assche wrote:
>> On 7/21/19 10:39 PM, Ming Lei wrote:
>>> blk-mq may schedule to call queue's complete function on remote CPU via
>>> IPI, but doesn't provide any way to synchronize the request's complete
>>> fn.
>>>
>>> In some driver's EH(such as NVMe), hardware queue's resource may be freed &
>>> re-allocated. If the completed request's complete fn is run finally after the
>>> hardware queue's resource is released, kernel crash will be triggered.
>>>
>>> Prepare for fixing this kind of issue by introducing
>>> blk_mq_tagset_wait_completed_request().
>>
>> An explanation is missing of why the block layer is modified to fix this
>> instead of the NVMe driver.
> 
> The above commit log has explained that there isn't sync mechanism in
> blk-mq wrt. request completion, and there might be similar issue in other
> future drivers.

That is not sufficient as a motivation to modify the block layer because 
there is already a way to wait until request completions have finished, 
namely the request queue freeze mechanism. Have you considered to use 
that mechanism instead of introducing 
blk_mq_tagset_wait_completed_request()?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
@ 2019-07-23 20:54         ` Bart Van Assche
  0 siblings, 0 replies; 50+ messages in thread
From: Bart Van Assche @ 2019-07-23 20:54 UTC (permalink / raw)


On 7/22/19 6:06 PM, Ming Lei wrote:
> On Mon, Jul 22, 2019@08:25:07AM -0700, Bart Van Assche wrote:
>> On 7/21/19 10:39 PM, Ming Lei wrote:
>>> blk-mq may schedule to call queue's complete function on remote CPU via
>>> IPI, but doesn't provide any way to synchronize the request's complete
>>> fn.
>>>
>>> In some driver's EH(such as NVMe), hardware queue's resource may be freed &
>>> re-allocated. If the completed request's complete fn is run finally after the
>>> hardware queue's resource is released, kernel crash will be triggered.
>>>
>>> Prepare for fixing this kind of issue by introducing
>>> blk_mq_tagset_wait_completed_request().
>>
>> An explanation is missing of why the block layer is modified to fix this
>> instead of the NVMe driver.
> 
> The above commit log has explained that there isn't sync mechanism in
> blk-mq wrt. request completion, and there might be similar issue in other
> future drivers.

That is not sufficient as a motivation to modify the block layer because 
there is already a way to wait until request completions have finished, 
namely the request queue freeze mechanism. Have you considered to use 
that mechanism instead of introducing 
blk_mq_tagset_wait_completed_request()?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
  2019-07-23 20:54         ` Bart Van Assche
@ 2019-07-24  1:34           ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-24  1:34 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block,
	Max Gurtovoy, Christoph Hellwig

On Tue, Jul 23, 2019 at 01:54:52PM -0700, Bart Van Assche wrote:
> On 7/22/19 6:06 PM, Ming Lei wrote:
> > On Mon, Jul 22, 2019 at 08:25:07AM -0700, Bart Van Assche wrote:
> > > On 7/21/19 10:39 PM, Ming Lei wrote:
> > > > blk-mq may schedule to call queue's complete function on remote CPU via
> > > > IPI, but doesn't provide any way to synchronize the request's complete
> > > > fn.
> > > > 
> > > > In some driver's EH(such as NVMe), hardware queue's resource may be freed &
> > > > re-allocated. If the completed request's complete fn is run finally after the
> > > > hardware queue's resource is released, kernel crash will be triggered.
> > > > 
> > > > Prepare for fixing this kind of issue by introducing
> > > > blk_mq_tagset_wait_completed_request().
> > > 
> > > An explanation is missing of why the block layer is modified to fix this
> > > instead of the NVMe driver.
> > 
> > The above commit log has explained that there isn't sync mechanism in
> > blk-mq wrt. request completion, and there might be similar issue in other
> > future drivers.
> 
> That is not sufficient as a motivation to modify the block layer because
> there is already a way to wait until request completions have finished,
> namely the request queue freeze mechanism. Have you considered to use that
> mechanism instead of introducing blk_mq_tagset_wait_completed_request()?

The introduced interface is used in EH, during which the aborted
requests will stay at blk-mq sw/scheduler queue, so queue freeze will
cause deadlock. We simply can't use it.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request()
@ 2019-07-24  1:34           ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-24  1:34 UTC (permalink / raw)


On Tue, Jul 23, 2019@01:54:52PM -0700, Bart Van Assche wrote:
> On 7/22/19 6:06 PM, Ming Lei wrote:
> > On Mon, Jul 22, 2019@08:25:07AM -0700, Bart Van Assche wrote:
> > > On 7/21/19 10:39 PM, Ming Lei wrote:
> > > > blk-mq may schedule to call queue's complete function on remote CPU via
> > > > IPI, but doesn't provide any way to synchronize the request's complete
> > > > fn.
> > > > 
> > > > In some driver's EH(such as NVMe), hardware queue's resource may be freed &
> > > > re-allocated. If the completed request's complete fn is run finally after the
> > > > hardware queue's resource is released, kernel crash will be triggered.
> > > > 
> > > > Prepare for fixing this kind of issue by introducing
> > > > blk_mq_tagset_wait_completed_request().
> > > 
> > > An explanation is missing of why the block layer is modified to fix this
> > > instead of the NVMe driver.
> > 
> > The above commit log has explained that there isn't sync mechanism in
> > blk-mq wrt. request completion, and there might be similar issue in other
> > future drivers.
> 
> That is not sufficient as a motivation to modify the block layer because
> there is already a way to wait until request completions have finished,
> namely the request queue freeze mechanism. Have you considered to use that
> mechanism instead of introducing blk_mq_tagset_wait_completed_request()?

The introduced interface is used in EH, during which the aborted
requests will stay at blk-mq sw/scheduler queue, so queue freeze will
cause deadlock. We simply can't use it.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/5] nvme: wait until all completed request's complete fn is called
  2019-07-23 20:29     ` Sagi Grimberg
@ 2019-07-24  1:43       ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-24  1:43 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jens Axboe, linux-block, linux-nvme, Max Gurtovoy, Keith Busch,
	Christoph Hellwig

On Tue, Jul 23, 2019 at 01:29:54PM -0700, Sagi Grimberg wrote:
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> 
> What about fc? I know that they are not canceling a request, but
> they still complete it, shouldn't it also wait before removing?

fc doesn't use blk-mq request abort mechanism, so its own approach
may sync with abort well.

However, for normal completion, I think the wait is still needed,
will do it in V2.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 4/5] nvme: wait until all completed request's complete fn is called
@ 2019-07-24  1:43       ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-24  1:43 UTC (permalink / raw)


On Tue, Jul 23, 2019@01:29:54PM -0700, Sagi Grimberg wrote:
> Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
> 
> What about fc? I know that they are not canceling a request, but
> they still complete it, shouldn't it also wait before removing?

fc doesn't use blk-mq request abort mechanism, so its own approach
may sync with abort well.

However, for normal completion, I think the wait is still needed,
will do it in V2.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/5] nvme: wait until all completed request's complete fn is called
  2019-07-23 16:14     ` Dongli Zhang
@ 2019-07-24  2:05       ` Ming Lei
  -1 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-24  2:05 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block,
	Max Gurtovoy, Christoph Hellwig

On Wed, Jul 24, 2019 at 12:14:24AM +0800, Dongli Zhang wrote:
> Hi Ming,
> 
> On 07/22/2019 01:39 PM, Ming Lei wrote:
> > When aborting in-flight request for recoverying controller, we have
> 
> recovering
> 
> > maken sure that queue's complete function is called on completed
> > request before moving one. For example, the warning of
> > WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be triggered.
> > 
> > Fix this issue by using blk_mq_tagset_drain_completed_request.
> > 
> 
> Should be blk_mq_tagset_wait_completed_request but not
> blk_mq_tagset_drain_completed_request?

Will fix the two in V2.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 4/5] nvme: wait until all completed request's complete fn is called
@ 2019-07-24  2:05       ` Ming Lei
  0 siblings, 0 replies; 50+ messages in thread
From: Ming Lei @ 2019-07-24  2:05 UTC (permalink / raw)


On Wed, Jul 24, 2019@12:14:24AM +0800, Dongli Zhang wrote:
> Hi Ming,
> 
> On 07/22/2019 01:39 PM, Ming Lei wrote:
> > When aborting in-flight request for recoverying controller, we have
> 
> recovering
> 
> > maken sure that queue's complete function is called on completed
> > request before moving one. For example, the warning of
> > WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be triggered.
> > 
> > Fix this issue by using blk_mq_tagset_drain_completed_request.
> > 
> 
> Should be blk_mq_tagset_wait_completed_request but not
> blk_mq_tagset_drain_completed_request?

Will fix the two in V2.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2019-07-24  2:05 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-22  5:39 [PATCH 0/5] blk-mq: wait until completed req's complete fn is run Ming Lei
2019-07-22  5:39 ` Ming Lei
2019-07-22  5:39 ` [PATCH 1/5] blk-mq: introduce blk_mq_request_completed() Ming Lei
2019-07-22  5:39   ` Ming Lei
2019-07-23 20:26   ` Sagi Grimberg
2019-07-23 20:26     ` Sagi Grimberg
2019-07-22  5:39 ` [PATCH 2/5] blk-mq: introduce blk_mq_tagset_wait_completed_request() Ming Lei
2019-07-22  5:39   ` Ming Lei
2019-07-22 15:25   ` Bart Van Assche
2019-07-22 15:25     ` Bart Van Assche
2019-07-23  1:06     ` Ming Lei
2019-07-23  1:06       ` Ming Lei
2019-07-23 20:54       ` Bart Van Assche
2019-07-23 20:54         ` Bart Van Assche
2019-07-24  1:34         ` Ming Lei
2019-07-24  1:34           ` Ming Lei
2019-07-23 20:27   ` Sagi Grimberg
2019-07-23 20:27     ` Sagi Grimberg
2019-07-22  5:39 ` [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request Ming Lei
2019-07-22  5:39   ` Ming Lei
2019-07-22 15:27   ` Bart Van Assche
2019-07-22 15:27     ` Bart Van Assche
2019-07-22 23:22     ` Keith Busch
2019-07-22 23:22       ` Keith Busch
2019-07-23  0:07       ` Sagi Grimberg
2019-07-23  0:07         ` Sagi Grimberg
2019-07-23  1:08     ` Ming Lei
2019-07-23  1:08       ` Ming Lei
2019-07-23 19:22       ` Bart Van Assche
2019-07-23 19:22         ` Bart Van Assche
2019-07-23 20:27   ` Sagi Grimberg
2019-07-23 20:27     ` Sagi Grimberg
2019-07-22  5:39 ` [PATCH 4/5] nvme: wait until all completed request's complete fn is called Ming Lei
2019-07-22  5:39   ` Ming Lei
2019-07-23 16:14   ` Dongli Zhang
2019-07-23 16:14     ` Dongli Zhang
2019-07-24  2:05     ` Ming Lei
2019-07-24  2:05       ` Ming Lei
2019-07-23 20:29   ` Sagi Grimberg
2019-07-23 20:29     ` Sagi Grimberg
2019-07-24  1:43     ` Ming Lei
2019-07-24  1:43       ` Ming Lei
2019-07-22  5:39 ` [PATCH 5/5] blk-mq: remove blk_mq_complete_request_sync Ming Lei
2019-07-22  5:39   ` Ming Lei
2019-07-23 20:30   ` Sagi Grimberg
2019-07-23 20:30     ` Sagi Grimberg
2019-07-22 23:27 ` [PATCH 0/5] blk-mq: wait until completed req's complete fn is run Bob Liu
2019-07-22 23:27   ` Bob Liu
2019-07-23  1:10   ` Ming Lei
2019-07-23  1:10     ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.