linux-block.vger.kernel.org archive mirror
* [PATCH v2 0/6] restore nvme-rdma polling
@ 2018-12-13  6:38 Sagi Grimberg
  2018-12-13  6:38 ` [PATCH v2 1/6] block: introduce blk_execute_rq_polled Sagi Grimberg
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Sagi Grimberg @ 2018-12-13  6:38 UTC
  To: linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe

Add an additional queue mapping for polling queues that will
host polled, latency-critical I/O.

Allocate the poll queues with IB_POLL_DIRECT context. For nvmf connect
we introduce a new blk_execute_rq_polled to poll for the completion and
have nvmf_connect_io_queue use it when connecting polling queues.

Finally, we turn off polling support for nvme-multipath as it won't invoke
polling and our completion queues no longer generate any interrupts for
it. I haven't come up with a good way to get around that so far...
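
For a sense of the consumer side (illustrative only, not part of the
series): userspace opts into polling per-I/O with RWF_HIPRI on an
O_DIRECT fd, which the block layer maps to REQ_HIPRI and steers to a
poll queue when one exists. A minimal sketch:

#define _GNU_SOURCE
#include <sys/uio.h>	/* preadv2() and RWF_HIPRI, glibc >= 2.26 */

/* hypothetical helper; buf must satisfy O_DIRECT alignment rules */
static ssize_t polled_pread(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	return preadv2(fd, &iov, 1, off, RWF_HIPRI);
}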

Changes from v1:
- get rid of ib_change_cq_ctx
- poll for nvmf connect over poll queues

Sagi Grimberg (6):
  block: introduce blk_execute_rq_polled
  nvme-core: allow __nvme_submit_sync_cmd to poll
  nvme-fabrics: allow nvmf_connect_io_queue to poll
  nvme-fabrics: allow user to pass in nr_poll_queues
  nvme-rdma: implement polling queue map
  nvme-multipath: disable polling for underlying namespace request queue

 block/blk-exec.c            | 29 +++++++++++++++++++
 block/blk-mq.c              |  8 -----
 drivers/nvme/host/core.c    | 15 ++++++----
 drivers/nvme/host/fabrics.c | 25 ++++++++++++----
 drivers/nvme/host/fabrics.h |  5 +++-
 drivers/nvme/host/fc.c      |  2 +-
 drivers/nvme/host/nvme.h    |  2 +-
 drivers/nvme/host/rdma.c    | 58 +++++++++++++++++++++++++++++++++----
 drivers/nvme/host/tcp.c     |  2 +-
 drivers/nvme/target/loop.c  |  2 +-
 include/linux/blk-mq.h      |  8 +++++
 include/linux/blkdev.h      |  2 ++
 12 files changed, 128 insertions(+), 30 deletions(-)

-- 
2.17.1


* [PATCH v2 1/6] block: introduce blk_execute_rq_polled
  2018-12-13  6:38 [PATCH v2 0/6] restore nvme-rdma polling Sagi Grimberg
@ 2018-12-13  6:38 ` Sagi Grimberg
  2018-12-13  8:24   ` Christoph Hellwig
  2018-12-13 15:23   ` Steve Wise
  2018-12-13  6:38 ` [PATCH v2 2/6] nvme-core: allow __nvme_submit_sync_cmd to poll Sagi Grimberg
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 23+ messages in thread
From: Sagi Grimberg @ 2018-12-13  6:38 UTC
  To: linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe

Used forsynchronous requests that needs polling. If we are knowingly
sending a request down to a poll queue, we need a synchronous interface
to poll for its completion.
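
A hypothetical caller sketch (names illustrative, not part of this
patch):

	struct request *rq;

	rq = blk_mq_alloc_request(q, REQ_OP_DRV_OUT, 0);
	if (IS_ERR(rq))
		return PTR_ERR(rq);
	/* submits with REQ_HIPRI and spins in blk_poll() until done */
	blk_execute_rq_polled(q, NULL, rq, 0);
	blk_mq_free_request(rq);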

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 block/blk-exec.c       | 29 +++++++++++++++++++++++++++++
 block/blk-mq.c         |  8 --------
 include/linux/blk-mq.h |  8 ++++++++
 include/linux/blkdev.h |  2 ++
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/block/blk-exec.c b/block/blk-exec.c
index a34b7d918742..572032d60001 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -90,3 +90,32 @@ void blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
 		wait_for_completion_io(&wait);
 }
 EXPORT_SYMBOL(blk_execute_rq);
+
+/**
+ * blk_execute_rq_polled - execute a request and poll for its completion
+ * @q:		queue to insert the request in
+ * @bd_disk:	matching gendisk
+ * @rq:		request to insert
+ * @at_head:    insert request at head or tail of queue
+ *
+ * Description:
+ *    Insert a fully prepared request at the back of the I/O scheduler queue
+ *    for execution and wait for completion.
+ */
+void blk_execute_rq_polled(struct request_queue *q, struct gendisk *bd_disk,
+		   struct request *rq, int at_head)
+{
+	DECLARE_COMPLETION_ONSTACK(wait);
+
+	WARN_ON_ONCE(!test_bit(QUEUE_FLAG_POLL, &q->queue_flags));
+
+	rq->cmd_flags |= REQ_HIPRI;
+	rq->end_io_data = &wait;
+	blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);
+
+	while (!completion_done(&wait)) {
+		blk_poll(q, request_to_qc_t(rq->mq_hctx, rq), true);
+		cond_resched();
+	}
+}
+EXPORT_SYMBOL(blk_execute_rq_polled);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 65770da99159..65d3f3a69c0d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1747,14 +1747,6 @@ static void blk_mq_bio_to_request(struct request *rq, struct bio *bio)
 	blk_account_io_start(rq, true);
 }
 
-static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
-{
-	if (rq->tag != -1)
-		return blk_tag_to_qc_t(rq->tag, hctx->queue_num, false);
-
-	return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
-}
-
 static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 					    struct request *rq,
 					    blk_qc_t *cookie, bool last)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 57eda7b20243..c77cba1ec0f5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -357,4 +357,12 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
 	for ((i) = 0; (i) < (hctx)->nr_ctx &&				\
 	     ({ ctx = (hctx)->ctxs[(i)]; 1; }); (i)++)
 
+static inline blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
+{
+	if (rq->tag != -1)
+		return blk_tag_to_qc_t(rq->tag, hctx->queue_num, false);
+
+	return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
+}
+
 #endif
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 81f1b105946b..9f48d8855916 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -860,6 +860,8 @@ extern void blk_execute_rq(struct request_queue *, struct gendisk *,
 			  struct request *, int);
 extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
 				  struct request *, int, rq_end_io_fn *);
+void blk_execute_rq_polled(struct request_queue *q, struct gendisk *bd_disk,
+		   struct request *rq, int at_head);
 
 int blk_status_to_errno(blk_status_t status);
 blk_status_t errno_to_blk_status(int errno);
-- 
2.17.1


* [PATCH v2 2/6] nvme-core: allow __nvme_submit_sync_cmd to poll
  2018-12-13  6:38 [PATCH v2 0/6] restore nvme-rdma polling Sagi Grimberg
  2018-12-13  6:38 ` [PATCH v2 1/6] block: introduce blk_execute_rq_polled Sagi Grimberg
@ 2018-12-13  6:38 ` Sagi Grimberg
  2018-12-13  8:27   ` Christoph Hellwig
  2018-12-13 15:19   ` Steve Wise
  2018-12-13  6:38 ` [PATCH v2 3/6] nvme-fabrics: allow nvmf_connect_io_queue " Sagi Grimberg
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 23+ messages in thread
From: Sagi Grimberg @ 2018-12-13  6:38 UTC
  To: linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe

Pass a poll bool to indicate that we need the command to be polled
for. This prepares us for polling support in nvmf, since connect is an
I/O that will be queued and has to be polled in order to complete.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c    | 13 ++++++++-----
 drivers/nvme/host/fabrics.c | 10 +++++-----
 drivers/nvme/host/nvme.h    |  2 +-
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f90576862736..eb1c10b0eaf0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -713,7 +713,7 @@ EXPORT_SYMBOL_GPL(nvme_setup_cmd);
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		union nvme_result *result, void *buffer, unsigned bufflen,
 		unsigned timeout, int qid, int at_head,
-		blk_mq_req_flags_t flags)
+		blk_mq_req_flags_t flags, bool poll)
 {
 	struct request *req;
 	int ret;
@@ -730,7 +730,10 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 			goto out;
 	}
 
-	blk_execute_rq(req->q, NULL, req, at_head);
+	if (poll)
+		blk_execute_rq_polled(req->q, NULL, req, at_head);
+	else
+		blk_execute_rq(req->q, NULL, req, at_head);
 	if (result)
 		*result = nvme_req(req)->result;
 	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
@@ -747,7 +750,7 @@ int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buffer, unsigned bufflen)
 {
 	return __nvme_submit_sync_cmd(q, cmd, NULL, buffer, bufflen, 0,
-			NVME_QID_ANY, 0, 0);
+			NVME_QID_ANY, 0, 0, false);
 }
 EXPORT_SYMBOL_GPL(nvme_submit_sync_cmd);
 
@@ -1058,7 +1061,7 @@ static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword
 	c.features.dword11 = cpu_to_le32(dword11);
 
 	ret = __nvme_submit_sync_cmd(dev->admin_q, &c, &res,
-			buffer, buflen, 0, NVME_QID_ANY, 0, 0);
+			buffer, buflen, 0, NVME_QID_ANY, 0, 0, false);
 	if (ret >= 0 && result)
 		*result = le32_to_cpu(res.u32);
 	return ret;
@@ -1703,7 +1706,7 @@ int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
 	cmd.common.cdw10[1] = cpu_to_le32(len);
 
 	return __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len,
-				      ADMIN_TIMEOUT, NVME_QID_ANY, 1, 0);
+				      ADMIN_TIMEOUT, NVME_QID_ANY, 1, 0, false);
 }
 EXPORT_SYMBOL_GPL(nvme_sec_submit);
 #endif /* CONFIG_BLK_SED_OPAL */
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 19ff0eae4582..d93a169f6403 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -159,7 +159,7 @@ int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
 	cmd.prop_get.offset = cpu_to_le32(off);
 
 	ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, &res, NULL, 0, 0,
-			NVME_QID_ANY, 0, 0);
+			NVME_QID_ANY, 0, 0, false);
 
 	if (ret >= 0)
 		*val = le64_to_cpu(res.u64);
@@ -206,7 +206,7 @@ int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
 	cmd.prop_get.offset = cpu_to_le32(off);
 
 	ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, &res, NULL, 0, 0,
-			NVME_QID_ANY, 0, 0);
+			NVME_QID_ANY, 0, 0, false);
 
 	if (ret >= 0)
 		*val = le64_to_cpu(res.u64);
@@ -252,7 +252,7 @@ int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
 	cmd.prop_set.value = cpu_to_le64(val);
 
 	ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, NULL, 0, 0,
-			NVME_QID_ANY, 0, 0);
+			NVME_QID_ANY, 0, 0, false);
 	if (unlikely(ret))
 		dev_err(ctrl->device,
 			"Property Set error: %d, offset %#x\n",
@@ -406,7 +406,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
 
 	ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, &res,
 			data, sizeof(*data), 0, NVME_QID_ANY, 1,
-			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
+			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT, false);
 	if (ret) {
 		nvmf_log_connect_error(ctrl, ret, le32_to_cpu(res.u32),
 				       &cmd, data);
@@ -468,7 +468,7 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
 
 	ret = __nvme_submit_sync_cmd(ctrl->connect_q, &cmd, &res,
 			data, sizeof(*data), 0, qid, 1,
-			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
+			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT, false);
 	if (ret) {
 		nvmf_log_connect_error(ctrl, ret, le32_to_cpu(res.u32),
 				       &cmd, data);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 8e0ec365ce8d..75ed640a5ef0 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -444,7 +444,7 @@ int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		union nvme_result *result, void *buffer, unsigned bufflen,
 		unsigned timeout, int qid, int at_head,
-		blk_mq_req_flags_t flags);
+		blk_mq_req_flags_t flags, bool flag);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
-- 
2.17.1


* [PATCH v2 3/6] nvme-fabrics: allow nvmf_connect_io_queue to poll
  2018-12-13  6:38 [PATCH v2 0/6] restore nvme-rdma polling Sagi Grimberg
  2018-12-13  6:38 ` [PATCH v2 1/6] block: introduce blk_execute_rq_polled Sagi Grimberg
  2018-12-13  6:38 ` [PATCH v2 2/6] nvme-core: allow __nvme_submit_sync_cmd to poll Sagi Grimberg
@ 2018-12-13  6:38 ` Sagi Grimberg
  2018-12-13  8:27   ` Christoph Hellwig
  2018-12-13 15:25   ` Steve Wise
  2018-12-13  6:38 ` [PATCH v2 4/6] nvme-fabrics: allow user to pass in nr_poll_queues Sagi Grimberg
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 23+ messages in thread
From: Sagi Grimberg @ 2018-12-13  6:38 UTC
  To: linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe

Preparation for polling support for fabrics. Polling support means
that our completion queues are not generating any interrupts, so we
need to poll for the nvmf I/O queue connect as well.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/fabrics.c | 4 ++--
 drivers/nvme/host/fabrics.h | 2 +-
 drivers/nvme/host/fc.c      | 2 +-
 drivers/nvme/host/rdma.c    | 2 +-
 drivers/nvme/host/tcp.c     | 2 +-
 drivers/nvme/target/loop.c  | 2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index d93a169f6403..5ff14f46a8d9 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -441,7 +441,7 @@ EXPORT_SYMBOL_GPL(nvmf_connect_admin_queue);
  *	> 0: NVMe error status code
  *	< 0: Linux errno error code
  */
-int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
+int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid, bool poll)
 {
 	struct nvme_command cmd;
 	struct nvmf_connect_data *data;
@@ -468,7 +468,7 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
 
 	ret = __nvme_submit_sync_cmd(ctrl->connect_q, &cmd, &res,
 			data, sizeof(*data), 0, qid, 1,
-			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT, false);
+			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT, poll);
 	if (ret) {
 		nvmf_log_connect_error(ctrl, ret, le32_to_cpu(res.u32),
 				       &cmd, data);
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 81b8fd1c0c5d..cad70a9f976a 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -168,7 +168,7 @@ int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val);
 int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val);
 int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val);
 int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl);
-int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid);
+int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid, bool poll);
 int nvmf_register_transport(struct nvmf_transport_ops *ops);
 void nvmf_unregister_transport(struct nvmf_transport_ops *ops);
 void nvmf_free_options(struct nvmf_ctrl_options *opts);
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index b79e41938513..89accc76d71c 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1975,7 +1975,7 @@ nvme_fc_connect_io_queues(struct nvme_fc_ctrl *ctrl, u16 qsize)
 					(qsize / 5));
 		if (ret)
 			break;
-		ret = nvmf_connect_io_queue(&ctrl->ctrl, i);
+		ret = nvmf_connect_io_queue(&ctrl->ctrl, i, false);
 		if (ret)
 			break;
 
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index ed726da77b5b..b907ed43814f 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -598,7 +598,7 @@ static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
 	int ret;
 
 	if (idx)
-		ret = nvmf_connect_io_queue(&ctrl->ctrl, idx);
+		ret = nvmf_connect_io_queue(&ctrl->ctrl, idx, false);
 	else
 		ret = nvmf_connect_admin_queue(&ctrl->ctrl);
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 51826479a41e..762a676403ef 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1393,7 +1393,7 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
 	int ret;
 
 	if (idx)
-		ret = nvmf_connect_io_queue(nctrl, idx);
+		ret = nvmf_connect_io_queue(nctrl, idx, false);
 	else
 		ret = nvmf_connect_admin_queue(nctrl);
 
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 9908082b32c4..4aac1b4a8112 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -345,7 +345,7 @@ static int nvme_loop_connect_io_queues(struct nvme_loop_ctrl *ctrl)
 	int i, ret;
 
 	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
-		ret = nvmf_connect_io_queue(&ctrl->ctrl, i);
+		ret = nvmf_connect_io_queue(&ctrl->ctrl, i, false);
 		if (ret)
 			return ret;
 		set_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[i].flags);
-- 
2.17.1


* [PATCH v2 4/6] nvme-fabrics: allow user to pass in nr_poll_queues
  2018-12-13  6:38 [PATCH v2 0/6] restore nvme-rdma polling Sagi Grimberg
                   ` (2 preceding siblings ...)
  2018-12-13  6:38 ` [PATCH v2 3/6] nvme-fabrics: allow nvmf_connect_io_queue " Sagi Grimberg
@ 2018-12-13  6:38 ` Sagi Grimberg
  2018-12-13 15:26   ` Steve Wise
  2018-12-13  6:38 ` [PATCH v2 5/6] nvme-rdma: implement polling queue map Sagi Grimberg
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-12-13  6:38 UTC
  To: linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe

This argument will specify how many polling I/O queues
to connect when creating the controller. These I/O queues
will host I/O that is marked with REQ_HIPRI.
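
For illustration (addresses made up), the new token rides in the same
comma-separated options string that userspace writes to
/dev/nvme-fabrics:

  transport=rdma,traddr=192.168.1.1,trsvcid=4420,nqn=testnqn,nr_poll_queues=2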

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/fabrics.c | 13 +++++++++++++
 drivers/nvme/host/fabrics.h |  3 +++
 2 files changed, 16 insertions(+)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 5ff14f46a8d9..b2ab213f43de 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -617,6 +617,7 @@ static const match_table_t opt_tokens = {
 	{ NVMF_OPT_HDR_DIGEST,		"hdr_digest"		},
 	{ NVMF_OPT_DATA_DIGEST,		"data_digest"		},
 	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d"	},
+	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d"	},
 	{ NVMF_OPT_ERR,			NULL			}
 };
 
@@ -850,6 +851,18 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 			}
 			opts->nr_write_queues = token;
 			break;
+		case NVMF_OPT_NR_POLL_QUEUES:
+			if (match_int(args, &token)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			if (token <= 0) {
+				pr_err("Invalid nr_poll_queues %d\n", token);
+				ret = -EINVAL;
+				goto out;
+			}
+			opts->nr_poll_queues = token;
+			break;
 		default:
 			pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
 				p);
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index cad70a9f976a..478343b73e38 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -62,6 +62,7 @@ enum {
 	NVMF_OPT_HDR_DIGEST	= 1 << 15,
 	NVMF_OPT_DATA_DIGEST	= 1 << 16,
 	NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
+	NVMF_OPT_NR_POLL_QUEUES = 1 << 18,
 };
 
 /**
@@ -93,6 +94,7 @@ enum {
  * @hdr_digest: generate/verify header digest (TCP)
  * @data_digest: generate/verify data digest (TCP)
  * @nr_write_queues: number of queues for write I/O
+ * @nr_poll_queues: number of queues for polling I/O
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
@@ -113,6 +115,7 @@ struct nvmf_ctrl_options {
 	bool			hdr_digest;
 	bool			data_digest;
 	unsigned int		nr_write_queues;
+	unsigned int		nr_poll_queues;
 };
 
 /*
-- 
2.17.1


* [PATCH v2 5/6] nvme-rdma: implement polling queue map
  2018-12-13  6:38 [PATCH v2 0/6] restore nvme-rdma polling Sagi Grimberg
                   ` (3 preceding siblings ...)
  2018-12-13  6:38 ` [PATCH v2 4/6] nvme-fabrics: allow user to pass in nr_poll_queues Sagi Grimberg
@ 2018-12-13  6:38 ` Sagi Grimberg
  2018-12-13 15:28   ` Steve Wise
  2018-12-13  6:38 ` [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue Sagi Grimberg
  2018-12-13  6:38 ` [PATCH v2 nvme-cli 7/6] fabrics: pass in number of polling queues Sagi Grimberg
  6 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-12-13  6:38 UTC
  To: linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe

When nr_poll_queues is passed, set up additional queues with the cq
polling context IB_POLL_DIRECT (no interrupts) and make sure to set
QUEUE_FLAG_POLL on the connect_q. In addition, add a third queue
mapping for polling queues.

The nvmf connect on these queues is polled for like all other requests,
so make nvmf_connect_io_queue poll when connecting polling queues.
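
To illustrate the resulting layout with made-up numbers: for
nr_io_queues=4, nr_write_queues=2 and nr_poll_queues=2 we get
queue_count = 4 + 2 + 2 + 1 = 9 (admin queue included), and the
HCTX_TYPE_POLL map gets nr_queues = 2 at queue_offset = 4 + 2 = 6,
i.e. hctxs 6-7 are the IB_POLL_DIRECT queues that nvme_rdma_poll()
reaps.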

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/rdma.c | 58 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index b907ed43814f..80b3113b45fb 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -162,6 +162,13 @@ static inline int nvme_rdma_queue_idx(struct nvme_rdma_queue *queue)
 	return queue - queue->ctrl->queues;
 }
 
+static bool nvme_rdma_poller_queue(struct nvme_rdma_queue *queue)
+{
+	return nvme_rdma_queue_idx(queue) >
+		queue->ctrl->ctrl.opts->nr_io_queues +
+		queue->ctrl->ctrl.opts->nr_write_queues;
+}
+
 static inline size_t nvme_rdma_inline_data_size(struct nvme_rdma_queue *queue)
 {
 	return queue->cmnd_capsule_len - sizeof(struct nvme_command);
@@ -440,6 +447,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 	const int send_wr_factor = 3;			/* MR, SEND, INV */
 	const int cq_factor = send_wr_factor + 1;	/* + RECV */
 	int comp_vector, idx = nvme_rdma_queue_idx(queue);
+	enum ib_poll_context poll_ctx;
 	int ret;
 
 	queue->device = nvme_rdma_find_get_device(queue->cm_id);
@@ -456,10 +464,16 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 	 */
 	comp_vector = idx == 0 ? idx : idx - 1;
 
+	/* Polling queues need direct cq polling context */
+	if (nvme_rdma_poller_queue(queue))
+		poll_ctx = IB_POLL_DIRECT;
+	else
+		poll_ctx = IB_POLL_SOFTIRQ;
+
 	/* +1 for ib_stop_cq */
 	queue->ib_cq = ib_alloc_cq(ibdev, queue,
 				cq_factor * queue->queue_size + 1,
-				comp_vector, IB_POLL_SOFTIRQ);
+				comp_vector, poll_ctx);
 	if (IS_ERR(queue->ib_cq)) {
 		ret = PTR_ERR(queue->ib_cq);
 		goto out_put_dev;
@@ -595,15 +609,17 @@ static void nvme_rdma_stop_io_queues(struct nvme_rdma_ctrl *ctrl)
 
 static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
 {
+	struct nvme_rdma_queue *queue = &ctrl->queues[idx];
+	bool poll = nvme_rdma_poller_queue(queue);
 	int ret;
 
 	if (idx)
-		ret = nvmf_connect_io_queue(&ctrl->ctrl, idx, false);
+		ret = nvmf_connect_io_queue(&ctrl->ctrl, idx, poll);
 	else
 		ret = nvmf_connect_admin_queue(&ctrl->ctrl);
 
 	if (!ret)
-		set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[idx].flags);
+		set_bit(NVME_RDMA_Q_LIVE, &queue->flags);
 	else
 		dev_info(ctrl->ctrl.device,
 			"failed to connect queue: %d ret=%d\n", idx, ret);
@@ -646,6 +662,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 				ibdev->num_comp_vectors);
 
 	nr_io_queues += min(opts->nr_write_queues, num_online_cpus());
+	nr_io_queues += min(opts->nr_poll_queues, num_online_cpus());
 
 	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
 	if (ret)
@@ -716,7 +733,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->driver_data = ctrl;
 		set->nr_hw_queues = nctrl->queue_count - 1;
 		set->timeout = NVME_IO_TIMEOUT;
-		set->nr_maps = 2 /* default + read */;
+		set->nr_maps = HCTX_MAX_TYPES;
 	}
 
 	ret = blk_mq_alloc_tag_set(set);
@@ -864,6 +881,10 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 			ret = PTR_ERR(ctrl->ctrl.connect_q);
 			goto out_free_tag_set;
 		}
+
+		if (ctrl->ctrl.opts->nr_poll_queues)
+			blk_queue_flag_set(QUEUE_FLAG_POLL,
+				ctrl->ctrl.connect_q);
 	} else {
 		blk_mq_update_nr_hw_queues(&ctrl->tag_set,
 			ctrl->ctrl.queue_count - 1);
@@ -1742,6 +1763,14 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return BLK_STS_IOERR;
 }
 
+static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx)
+{
+	struct nvme_rdma_queue *queue = hctx->driver_data;
+	struct ib_cq *cq = queue->ib_cq;
+
+	return ib_process_cq_direct(cq, -1);
+}
+
 static void nvme_rdma_complete_rq(struct request *rq)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
@@ -1772,6 +1801,21 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
 			ctrl->device->dev, 0);
 	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_READ],
 			ctrl->device->dev, 0);
+
+	if (ctrl->ctrl.opts->nr_poll_queues) {
+		set->map[HCTX_TYPE_POLL].nr_queues =
+				ctrl->ctrl.opts->nr_poll_queues;
+		set->map[HCTX_TYPE_POLL].queue_offset =
+				ctrl->ctrl.opts->nr_io_queues;
+		if (ctrl->ctrl.opts->nr_write_queues)
+			set->map[HCTX_TYPE_POLL].queue_offset +=
+				ctrl->ctrl.opts->nr_write_queues;
+	} else {
+		set->map[HCTX_TYPE_POLL].nr_queues =
+				ctrl->ctrl.opts->nr_io_queues;
+		set->map[HCTX_TYPE_POLL].queue_offset = 0;
+	}
+	blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
 	return 0;
 }
 
@@ -1783,6 +1827,7 @@ static const struct blk_mq_ops nvme_rdma_mq_ops = {
 	.init_hctx	= nvme_rdma_init_hctx,
 	.timeout	= nvme_rdma_timeout,
 	.map_queues	= nvme_rdma_map_queues,
+	.poll		= nvme_rdma_poll,
 };
 
 static const struct blk_mq_ops nvme_rdma_admin_mq_ops = {
@@ -1927,7 +1972,8 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
 	INIT_WORK(&ctrl->err_work, nvme_rdma_error_recovery_work);
 	INIT_WORK(&ctrl->ctrl.reset_work, nvme_rdma_reset_ctrl_work);
 
-	ctrl->ctrl.queue_count = opts->nr_io_queues + opts->nr_write_queues + 1;
+	ctrl->ctrl.queue_count = opts->nr_io_queues + opts->nr_write_queues +
+				opts->nr_poll_queues + 1;
 	ctrl->ctrl.sqsize = opts->queue_size - 1;
 	ctrl->ctrl.kato = opts->kato;
 
@@ -1979,7 +2025,7 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
 	.required_opts	= NVMF_OPT_TRADDR,
 	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY |
 			  NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
-			  NVMF_OPT_NR_WRITE_QUEUES,
+			  NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_NR_POLL_QUEUES,
 	.create_ctrl	= nvme_rdma_create_ctrl,
 };
 
-- 
2.17.1


* [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-13  6:38 [PATCH v2 0/6] restore nvme-rdma polling Sagi Grimberg
                   ` (4 preceding siblings ...)
  2018-12-13  6:38 ` [PATCH v2 5/6] nvme-rdma: implement polling queue map Sagi Grimberg
@ 2018-12-13  6:38 ` Sagi Grimberg
  2018-12-13  8:31   ` Christoph Hellwig
  2018-12-13 15:28   ` Steve Wise
  2018-12-13  6:38 ` [PATCH v2 nvme-cli 7/6] fabrics: pass in number of polling queues Sagi Grimberg
  6 siblings, 2 replies; 23+ messages in thread
From: Sagi Grimberg @ 2018-12-13  6:38 UTC
  To: linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe

Since the multipath device does not support polling (yet), we cannot
pass requests to the polling queue map, as those will not generate an
interrupt and we cannot reap the completion.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index eb1c10b0eaf0..5a6c29ee669c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1550,6 +1550,8 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	if (ns->head->disk) {
 		nvme_update_disk_info(ns->head->disk, ns, id);
 		blk_queue_stack_limits(ns->head->disk->queue, ns->queue);
+		/* multipath device does not support polling */
+		blk_queue_flag_clear(QUEUE_FLAG_POLL, ns->queue);
 	}
 #endif
 }
-- 
2.17.1


* [PATCH v2 nvme-cli 7/6] fabrics: pass in number of polling queues
  2018-12-13  6:38 [PATCH v2 0/6] restore nvme-rdma polling Sagi Grimberg
                   ` (5 preceding siblings ...)
  2018-12-13  6:38 ` [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue Sagi Grimberg
@ 2018-12-13  6:38 ` Sagi Grimberg
  2018-12-13 15:29   ` Steve Wise
  6 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-12-13  6:38 UTC
  To: linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe

nr_poll_queues specifies the number of additional queues that will
be connected for hosting latency-critical polled I/O.
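
A hypothetical invocation (addresses made up) asking for 4 polling
queues:

  nvme connect -t rdma -a 192.168.1.1 -s 4420 -n testnqn -P 4

which ends up as nr_poll_queues=4 in the options string handed to the
kernel.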

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 Documentation/nvme-connect.txt |  5 +++++
 fabrics.c                      | 10 ++++++++++
 2 files changed, 15 insertions(+)

diff --git a/Documentation/nvme-connect.txt b/Documentation/nvme-connect.txt
index d4a0e6678475..5412472dbd35 100644
--- a/Documentation/nvme-connect.txt
+++ b/Documentation/nvme-connect.txt
@@ -17,6 +17,7 @@ SYNOPSIS
 		[--hostnqn=<hostnqn>      | -q <hostnqn>]
 		[--nr-io-queues=<#>       | -i <#>]
 		[--nr-write-queues=<#>    | -W <#>]
+		[--nr-poll-queues=<#>     | -P <#>]
 		[--queue-size=<#>         | -Q <#>]
 		[--keep-alive-tmo=<#>     | -k <#>]
 		[--reconnect-delay=<#>    | -c <#>]
@@ -80,6 +81,10 @@ OPTIONS
 --nr-write-queues=<#>::
 	Adds additional queues that will be used for write I/O.
 
+-P <#>::
+--nr-poll-queues=<#>::
+	Adds additional queues that will be used for polling latency sensitive I/O.
+
 -Q <#>::
 --queue-size=<#>::
 	Overrides the default number of elements in the I/O queues created
diff --git a/fabrics.c b/fabrics.c
index 5cc1a795345a..efe2ce89cdca 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -54,6 +54,7 @@ static struct config {
 	char *hostid;
 	int  nr_io_queues;
 	int  nr_write_queues;
+	int  nr_poll_queues;
 	int  queue_size;
 	int  keep_alive_tmo;
 	int  reconnect_delay;
@@ -624,6 +625,8 @@ static int build_options(char *argstr, int max_len)
 				cfg.nr_io_queues) ||
 	    add_int_argument(&argstr, &max_len, "nr_write_queues",
 				cfg.nr_write_queues) ||
+	    add_int_argument(&argstr, &max_len, "nr_poll_queues",
+				cfg.nr_poll_queues) ||
 	    add_int_argument(&argstr, &max_len, "queue_size", cfg.queue_size) ||
 	    add_int_argument(&argstr, &max_len, "keep_alive_tmo",
 				cfg.keep_alive_tmo) ||
@@ -704,6 +707,13 @@ retry:
 		p += len;
 	}
 
+	if (cfg.nr_poll_queues) {
+		len = sprintf(p, ",nr_poll_queues=%d", cfg.nr_poll_queues);
+		if (len < 0)
+			return -EINVAL;
+		p += len;
+	}
+
 	if (cfg.host_traddr) {
 		len = sprintf(p, ",host_traddr=%s", cfg.host_traddr);
 		if (len < 0)
-- 
2.17.1


* Re: [PATCH v2 1/6] block: introduce blk_execute_rq_polled
  2018-12-13  6:38 ` [PATCH v2 1/6] block: introduce blk_execute_rq_polled Sagi Grimberg
@ 2018-12-13  8:24   ` Christoph Hellwig
  2018-12-13 15:23   ` Steve Wise
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2018-12-13  8:24 UTC
  To: Sagi Grimberg
  Cc: linux-nvme, linux-block, linux-rdma, Christoph Hellwig,
	Keith Busch, Jens Axboe

On Wed, Dec 12, 2018 at 10:38:13PM -0800, Sagi Grimberg wrote:
> Used forsynchronous requests that needs polling. If we are knowingly
> sending a request down to a poll queue, we need a synchronous interface
> to poll for its completion.
> 
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  block/blk-exec.c       | 29 +++++++++++++++++++++++++++++
>  block/blk-mq.c         |  8 --------
>  include/linux/blk-mq.h |  8 ++++++++
>  include/linux/blkdev.h |  2 ++
>  4 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/block/blk-exec.c b/block/blk-exec.c
> index a34b7d918742..572032d60001 100644
> --- a/block/blk-exec.c
> +++ b/block/blk-exec.c
> @@ -90,3 +90,32 @@ void blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
>  		wait_for_completion_io(&wait);
>  }
>  EXPORT_SYMBOL(blk_execute_rq);
> +
> +/**
> + * blk_execute_rq_polled - execute a request and poll for its completion
> + * @q:		queue to insert the request in
> + * @bd_disk:	matching gendisk
> + * @rq:		request to insert
> + * @at_head:    insert request at head or tail of queue
> + *
> + * Description:
> + *    Insert a fully prepared request at the back of the I/O scheduler queue
> + *    for execution and wait for completion.
> + */
> +void blk_execute_rq_polled(struct request_queue *q, struct gendisk *bd_disk,
> +		   struct request *rq, int at_head)
> +{
> +	DECLARE_COMPLETION_ONSTACK(wait);
> +
> +	WARN_ON_ONCE(!test_bit(QUEUE_FLAG_POLL, &q->queue_flags));
> +
> +	rq->cmd_flags |= REQ_HIPRI;
> +	rq->end_io_data = &wait;
> +	blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);
> +
> +	while (!completion_done(&wait)) {
> +		blk_poll(q, request_to_qc_t(rq->mq_hctx, rq), true);
> +		cond_resched();
> +	}

Can we just open code this in nvme for now?

> +static inline blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)

Too long line.

> +{
> +	if (rq->tag != -1)
> +		return blk_tag_to_qc_t(rq->tag, hctx->queue_num, false);
> +
> +	return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
> +}

Also these are the only two users of blk_tag_to_qc_t, it might be worth
folding it into request_to_qc_t:

static inline blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx,
		struct request *rq)
{
	if (rq->tag != -1)
		return rq->tag | (hctx->queue_num << BLK_QC_T_SHIFT);
	return rq->internal_tag | (hctx->queue_num << BLK_QC_T_SHIFT) |
			BLK_QC_T_INTERNAL;
}

* Re: [PATCH v2 2/6] nvme-core: allow __nvme_submit_sync_cmd to poll
  2018-12-13  6:38 ` [PATCH v2 2/6] nvme-core: allow __nvme_submit_sync_cmd to poll Sagi Grimberg
@ 2018-12-13  8:27   ` Christoph Hellwig
  2018-12-13 15:19   ` Steve Wise
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2018-12-13  8:27 UTC
  To: Sagi Grimberg
  Cc: linux-nvme, linux-block, linux-rdma, Christoph Hellwig,
	Keith Busch, Jens Axboe

On Wed, Dec 12, 2018 at 10:38:14PM -0800, Sagi Grimberg wrote:
> Pass a poll bool to indicate that we need the command to be polled
> for. This prepares us for polling support in nvmf, since connect is an
> I/O that will be queued and has to be polled in order to complete.

I hate these bool parameters.  Maybe we just need a blk_execute_rq_flags
that instead of the at_head parameter takes a flags parameter.  I guess
for now this is ok, but I don't really like it:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v2 3/6] nvme-fabrics: allow nvmf_connect_io_queue to poll
  2018-12-13  6:38 ` [PATCH v2 3/6] nvme-fabrics: allow nvmf_connect_io_queue " Sagi Grimberg
@ 2018-12-13  8:27   ` Christoph Hellwig
  2018-12-13 15:25   ` Steve Wise
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2018-12-13  8:27 UTC
  To: Sagi Grimberg
  Cc: linux-nvme, linux-block, linux-rdma, Christoph Hellwig,
	Keith Busch, Jens Axboe

On Wed, Dec 12, 2018 at 10:38:15PM -0800, Sagi Grimberg wrote:
> Preparation for polling support for fabrics. Polling support means
> that our completion queues are not generating any interrupts, so we
> need to poll for the nvmf I/O queue connect as well.
> 
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-13  6:38 ` [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue Sagi Grimberg
@ 2018-12-13  8:31   ` Christoph Hellwig
  2018-12-13 15:49     ` Sagi Grimberg
  2018-12-13 15:28   ` Steve Wise
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2018-12-13  8:31 UTC
  To: Sagi Grimberg
  Cc: linux-nvme, linux-block, linux-rdma, Christoph Hellwig,
	Keith Busch, Jens Axboe

On Wed, Dec 12, 2018 at 10:38:18PM -0800, Sagi Grimberg wrote:
> Since the multipath device does not support polling (yet), we cannot
> pass requests to the polling queue map, as those will not generate an
> interrupt and we cannot reap the completion.
> 
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

As said before I don't think this is the right place.  We need
to handle the stacking case in the block layer.  Something like this
untested patch below should do the trick:

--
From b810e53e8a9ec6f62202c2ad65e6e56277bcace7 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Thu, 13 Dec 2018 09:30:13 +0100
Subject: block: clear REQ_HIPRI if polling is not supported

This prevents a HIPRI bio from being submitted through a stacking
driver that does not support polling and thus won't poll for I/O
completion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 268d2b8e9843..efa10789ddc0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -921,6 +921,9 @@ generic_make_request_checks(struct bio *bio)
 		}
 	}
 
+	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+		bio->bi_opf &= ~REQ_HIPRI;
+
 	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
 		if (!blk_queue_discard(q))
-- 
2.19.2


* Re: [PATCH v2 2/6] nvme-core: allow __nvme_submit_sync_cmd to poll
  2018-12-13  6:38 ` [PATCH v2 2/6] nvme-core: allow __nvme_submit_sync_cmd to poll Sagi Grimberg
  2018-12-13  8:27   ` Christoph Hellwig
@ 2018-12-13 15:19   ` Steve Wise
  1 sibling, 0 replies; 23+ messages in thread
From: Steve Wise @ 2018-12-13 15:19 UTC
  To: Sagi Grimberg, linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe


On 12/13/2018 12:38 AM, Sagi Grimberg wrote:
> Pass a poll bool to indicate that we need the command to be polled
> for. This prepares us for polling support in nvmf, since connect is an
> I/O that will be queued and has to be polled in order to complete.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/nvme/host/core.c    | 13 ++++++++-----
>  drivers/nvme/host/fabrics.c | 10 +++++-----
>  drivers/nvme/host/nvme.h    |  2 +-
>  3 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index f90576862736..eb1c10b0eaf0 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -713,7 +713,7 @@ EXPORT_SYMBOL_GPL(nvme_setup_cmd);
>  int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
>  		union nvme_result *result, void *buffer, unsigned bufflen,
>  		unsigned timeout, int qid, int at_head,
> -		blk_mq_req_flags_t flags)
> +		blk_mq_req_flags_t flags, bool poll)
>  {
>  	struct request *req;
>  	int ret;
> @@ -730,7 +730,10 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
>  			goto out;
>  	}
>  
> -	blk_execute_rq(req->q, NULL, req, at_head);
> +	if (poll)
> +		blk_execute_rq_polled(req->q, NULL, req, at_head);
> +	else
> +		blk_execute_rq(req->q, NULL, req, at_head);
>  	if (result)
>  		*result = nvme_req(req)->result;
>  	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
> @@ -747,7 +750,7 @@ int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
>  		void *buffer, unsigned bufflen)
>  {
>  	return __nvme_submit_sync_cmd(q, cmd, NULL, buffer, bufflen, 0,
> -			NVME_QID_ANY, 0, 0);
> +			NVME_QID_ANY, 0, 0, false);
>  }
>  EXPORT_SYMBOL_GPL(nvme_submit_sync_cmd);
>  
> @@ -1058,7 +1061,7 @@ static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword
>  	c.features.dword11 = cpu_to_le32(dword11);
>  
>  	ret = __nvme_submit_sync_cmd(dev->admin_q, &c, &res,
> -			buffer, buflen, 0, NVME_QID_ANY, 0, 0);
> +			buffer, buflen, 0, NVME_QID_ANY, 0, 0, false);
>  	if (ret >= 0 && result)
>  		*result = le32_to_cpu(res.u32);
>  	return ret;
> @@ -1703,7 +1706,7 @@ int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
>  	cmd.common.cdw10[1] = cpu_to_le32(len);
>  
>  	return __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len,
> -				      ADMIN_TIMEOUT, NVME_QID_ANY, 1, 0);
> +				      ADMIN_TIMEOUT, NVME_QID_ANY, 1, 0, false);
>  }
>  EXPORT_SYMBOL_GPL(nvme_sec_submit);
>  #endif /* CONFIG_BLK_SED_OPAL */
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 19ff0eae4582..d93a169f6403 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -159,7 +159,7 @@ int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
>  	cmd.prop_get.offset = cpu_to_le32(off);
>  
>  	ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, &res, NULL, 0, 0,
> -			NVME_QID_ANY, 0, 0);
> +			NVME_QID_ANY, 0, 0, false);
>  
>  	if (ret >= 0)
>  		*val = le64_to_cpu(res.u64);
> @@ -206,7 +206,7 @@ int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
>  	cmd.prop_get.offset = cpu_to_le32(off);
>  
>  	ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, &res, NULL, 0, 0,
> -			NVME_QID_ANY, 0, 0);
> +			NVME_QID_ANY, 0, 0, false);
>  
>  	if (ret >= 0)
>  		*val = le64_to_cpu(res.u64);
> @@ -252,7 +252,7 @@ int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
>  	cmd.prop_set.value = cpu_to_le64(val);
>  
>  	ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, NULL, 0, 0,
> -			NVME_QID_ANY, 0, 0);
> +			NVME_QID_ANY, 0, 0, false);
>  	if (unlikely(ret))
>  		dev_err(ctrl->device,
>  			"Property Set error: %d, offset %#x\n",
> @@ -406,7 +406,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
>  
>  	ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, &res,
>  			data, sizeof(*data), 0, NVME_QID_ANY, 1,
> -			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
> +			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT, false);
>  	if (ret) {
>  		nvmf_log_connect_error(ctrl, ret, le32_to_cpu(res.u32),
>  				       &cmd, data);
> @@ -468,7 +468,7 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
>  
>  	ret = __nvme_submit_sync_cmd(ctrl->connect_q, &cmd, &res,
>  			data, sizeof(*data), 0, qid, 1,
> -			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
> +			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT, false);
>  	if (ret) {
>  		nvmf_log_connect_error(ctrl, ret, le32_to_cpu(res.u32),
>  				       &cmd, data);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 8e0ec365ce8d..75ed640a5ef0 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -444,7 +444,7 @@ int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
>  int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
>  		union nvme_result *result, void *buffer, unsigned bufflen,
>  		unsigned timeout, int qid, int at_head,
> -		blk_mq_req_flags_t flags);
> +		blk_mq_req_flags_t flags, bool flag);


Shouldn't the above be 'bool poll'?


Reviewed-by: Steve Wise <swise@opengridcomputing.com>


* Re: [PATCH v2 1/6] block: introduce blk_execute_rq_polled
  2018-12-13  6:38 ` [PATCH v2 1/6] block: introduce blk_execute_rq_polled Sagi Grimberg
  2018-12-13  8:24   ` Christoph Hellwig
@ 2018-12-13 15:23   ` Steve Wise
  1 sibling, 0 replies; 23+ messages in thread
From: Steve Wise @ 2018-12-13 15:23 UTC
  To: Sagi Grimberg, linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe


On 12/13/2018 12:38 AM, Sagi Grimberg wrote:
> Used forsynchronous requests that needs polling. If we are knowingly

nit:  "Used for synchronous requests that need polling."


> sending a request down to a poll queue, we need a synchronous interface
> to poll for its completion.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


Reviewed-by: Steve Wise <swise@opengridcomputing.com>



* Re: [PATCH v2 3/6] nvme-fabrics: allow nvmf_connect_io_queue to poll
  2018-12-13  6:38 ` [PATCH v2 3/6] nvme-fabrics: allow nvmf_connect_io_queue " Sagi Grimberg
  2018-12-13  8:27   ` Christoph Hellwig
@ 2018-12-13 15:25   ` Steve Wise
  1 sibling, 0 replies; 23+ messages in thread
From: Steve Wise @ 2018-12-13 15:25 UTC
  To: Sagi Grimberg, linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe


On 12/13/2018 12:38 AM, Sagi Grimberg wrote:
> Preparation for polling support for fabrics. Polling support means
> that our completion queues are not generating any interrupts, so we
> need to poll for the nvmf I/O queue connect as well.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Looks good.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>



* Re: [PATCH v2 4/6] nvme-fabrics: allow user to pass in nr_poll_queues
  2018-12-13  6:38 ` [PATCH v2 4/6] nvme-fabrics: allow user to pass in nr_poll_queues Sagi Grimberg
@ 2018-12-13 15:26   ` Steve Wise
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Wise @ 2018-12-13 15:26 UTC
  To: Sagi Grimberg, linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe


On 12/13/2018 12:38 AM, Sagi Grimberg wrote:
> This argument will specify how many polling I/O queues
> to connect when creating the controller. These I/O queues
> will host I/O that is marked with REQ_HIPRI.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Is there a default if none is specified on the 'nvme connect' command?

Reviewed-by: Steve Wise <swise@opengridcomputing.com>


* Re: [PATCH v2 5/6] nvme-rdma: implement polling queue map
  2018-12-13  6:38 ` [PATCH v2 5/6] nvme-rdma: implement polling queue map Sagi Grimberg
@ 2018-12-13 15:28   ` Steve Wise
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Wise @ 2018-12-13 15:28 UTC
  To: Sagi Grimberg, linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe


On 12/13/2018 12:38 AM, Sagi Grimberg wrote:
> When nr_poll_queues is passed, set up additional queues with the cq
> polling context IB_POLL_DIRECT (no interrupts) and make sure to set
> QUEUE_FLAG_POLL on the connect_q. In addition, add a third queue
> mapping for polling queues.
>
> The nvmf connect on these queues is polled for like all other requests,
> so make nvmf_connect_io_queue poll when connecting polling queues.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


Looks good.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>


* Re: [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-13  6:38 ` [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue Sagi Grimberg
  2018-12-13  8:31   ` Christoph Hellwig
@ 2018-12-13 15:28   ` Steve Wise
  1 sibling, 0 replies; 23+ messages in thread
From: Steve Wise @ 2018-12-13 15:28 UTC
  To: Sagi Grimberg, linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe


On 12/13/2018 12:38 AM, Sagi Grimberg wrote:
> Since the multipath device does not support polling (yet), we cannot
> pass requests to the polling queue map, as those will not generate an
> interrupt and we cannot reap the completion.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Looks good.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>



* Re: [PATCH v2 nvme-cli 7/6] fabrics: pass in number of polling queues
  2018-12-13  6:38 ` [PATCH v2 nvme-cli 7/6] fabrics: pass in number of polling queues Sagi Grimberg
@ 2018-12-13 15:29   ` Steve Wise
  0 siblings, 0 replies; 23+ messages in thread
From: Steve Wise @ 2018-12-13 15:29 UTC
  To: Sagi Grimberg, linux-nvme
  Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch, Jens Axboe


On 12/13/2018 12:38 AM, Sagi Grimberg wrote:
> nr_poll_queues specifies the number of additional queues that will
> be connected for hosting latency-critical polled I/O.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


Looks good.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>



* Re: [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-13  8:31   ` Christoph Hellwig
@ 2018-12-13 15:49     ` Sagi Grimberg
  2018-12-13 15:52       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-12-13 15:49 UTC
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, linux-rdma, linux-nvme, linux-block


> +	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
> +		bio->bi_opf &= ~REQ_HIPRI;
> +

Maybe we can simply check (q->queue_flags & (1 << QUEUE_FLAG_POLL)) and
avoid the extra atomic operation in the host path?

Would it make sense?

* Re: [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-13 15:49     ` Sagi Grimberg
@ 2018-12-13 15:52       ` Christoph Hellwig
  2018-12-13 16:14         ` Sagi Grimberg
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2018-12-13 15:52 UTC
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, linux-rdma,
	linux-nvme, linux-block

On Thu, Dec 13, 2018 at 07:49:42AM -0800, Sagi Grimberg wrote:
>
>> +	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
>> +		bio->bi_opf &= ~REQ_HIPRI;
>> +
>
> Maybe we can simply check (q->queue_flags & (1 << QUEUE_FLAG_POLL)) and
> avoid the extra atomic operation in the host path?
>
> Would it make sense?

test_bit is not usually implemented as an atomic operation.

Take a look at e.g.

arch/x86/include/asm/bitops.h:constant_test_bit()
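
which is (roughly, quoting from memory, so double-check the tree) just:

static __always_inline bool constant_test_bit(long nr,
		const volatile unsigned long *addr)
{
	return ((1UL << (nr & (BITS_PER_LONG-1))) &
		(addr[nr >> _BITOPS_LONG_SHIFT])) != 0;
}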

* Re: [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-13 15:52       ` Christoph Hellwig
@ 2018-12-13 16:14         ` Sagi Grimberg
  2018-12-13 20:13           ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2018-12-13 16:14 UTC
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, linux-rdma, linux-nvme, linux-block

>>> +	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
>>> +		bio->bi_opf &= ~REQ_HIPRI;
>>> +
>>
>> Maybe we can simply check (q->queue_flags & (1 << QUEUE_FLAG_POLL)) and
>> avoid the extra atomic operation in the host path?
>>
>> Would it make sense?
> 
> test_bit is not usually implemented as an atomic operation.
> 
> Take a look at e.g.
> 
> arch/x86/include/asm/bitops.h:constant_test_bit()

Ah.. But it's still read from a volatile argument so still more expensive?

* Re: [PATCH v2 6/6] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-13 16:14         ` Sagi Grimberg
@ 2018-12-13 20:13           ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2018-12-13 20:13 UTC
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, linux-rdma,
	linux-nvme, linux-block

On Thu, Dec 13, 2018 at 08:14:57AM -0800, Sagi Grimberg wrote:
>>>> +	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
>>>> +		bio->bi_opf &= ~REQ_HIPRI;
>>>> +
>>>
>>> Maybe we can simply check (q->queue_flags & (1 << QUEUE_FLAG_POLL)) and
>>> avoid the extra atomic operation in the host path?
>>>
>>> Would it make sense?
>>
>> test_bit is not usually implemented as an atomic operation.
>>
>> Take a look at e.g.
>>
>> arch/x86/include/asm/bitops.h:constant_test_bit()
>
> Ah.. But it's still read from a volatile argument so still more expensive?

I don't think the volatile should make a difference.  I actually
compiled both versions and the test_bit version generates a movq + testl
instead of testb:

-	movq	120(%rbx), %rdx	# MEM[(const long unsigned int *)q_38 + 120B], _135
-	testl	$524288, %edx	#, _135
+	testb	$8, 122(%rbx)	#, q_40->queue_flags

But it actually generates a larger object:

 36966	   9470	     88	  46524	   b5bc	blk-core.o-opencode
 36956	   9470	     88	  46514	   b5b2	blk-core.o-test-bit

No idea what is going on there.
