linux-block.vger.kernel.org archive mirror
* [PATCH RFC 0/4] restore polling to nvme-rdma
@ 2018-12-11 23:36 Sagi Grimberg
  2018-12-11 23:36 ` [PATCH RFC 1/4] nvme-fabrics: allow user to pass in nr_poll_queues Sagi Grimberg
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-11 23:36 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch

Add an additional queue mapping for polling queues that will
host polling for latency critical I/O.
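
For context (not part of the series), the latency critical I/O in question
is I/O submitted with REQ_HIPRI, which an application typically triggers by
passing RWF_HIPRI to preadv2/pwritev2 on an O_DIRECT file descriptor. A
minimal illustration (device path and sizes are just placeholders):

--
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	struct iovec iov;
	void *buf;
	int fd;

	/* O_DIRECT so the read is issued straight to the nvme queue */
	fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
	if (fd < 0)
		return 1;

	/* O_DIRECT wants an aligned buffer */
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	iov.iov_base = buf;
	iov.iov_len = 4096;

	/* RWF_HIPRI asks the block layer to complete this I/O by polling */
	if (preadv2(fd, &iov, 1, 0, RWF_HIPRI) < 0)
		return 1;

	free(buf);
	close(fd);
	return 0;
}
--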

One caveat is that we don't want these queues to be pure polling
as we don't want to bother with polling for the initial nvmf connect
I/O. Hence, introduce ib_change_cq_ctx that will modify the cq polling
context from SOFTIRQ to DIRECT. Note that this function is not safe
with inflight I/O so the caller must make sure not to call it without
having all I/O quiesced (we also relax the ib_cq_completion_direct warning
as we have a scenario that this can happen).

With that, we simply defer the blk_poll callout to ib_process_cq_direct and
we're done. One thing that might be worth adding is some way to ignore
certain completions, because we don't want to give up polling just because
we consumed memory registration completions. As it stands, we might break
out of polling early due to that.

Finally, we turn off polling support for nvme-multipath, as it won't invoke
polling and our completion queues no longer generate any interrupts for
it. I haven't come up with a good way to get around that so far...

Sagi Grimberg (4):
  nvme-fabrics: allow user to pass in nr_poll_queues
  rdma: introduce ib_change_cq_ctx
  nvme-rdma: implement polling queue map
  nvme-multipath: disable polling for underlying namespace request queue

 drivers/infiniband/core/cq.c | 102 ++++++++++++++++++++++++-----------
 drivers/nvme/host/core.c     |   2 +
 drivers/nvme/host/fabrics.c  |  16 +++++-
 drivers/nvme/host/fabrics.h  |   3 ++
 drivers/nvme/host/rdma.c     |  35 +++++++++++-
 include/rdma/ib_verbs.h      |   1 +
 6 files changed, 124 insertions(+), 35 deletions(-)

-- 
2.17.1



* [PATCH RFC 1/4] nvme-fabrics: allow user to pass in nr_poll_queues
  2018-12-11 23:36 [PATCH RFC 0/4] restore polling to nvme-rdma Sagi Grimberg
@ 2018-12-11 23:36 ` Sagi Grimberg
  2018-12-11 23:36 ` [PATCH RFC 2/4] rdma: introduce ib_change_cq_ctx Sagi Grimberg
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-11 23:36 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch

This argument specifies how many polling I/O queues
to connect when creating the controller. These I/O queues
will host I/O that is submitted with REQ_HIPRI.
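
For illustration only (not part of this patch; address and NQN values are
placeholders), the new option travels to the kernel in the comma-separated
option string written to /dev/nvme-fabrics:

--
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* placeholder transport address and subsystem NQN */
	const char *opts =
		"transport=rdma,traddr=192.168.1.10,trsvcid=4420,"
		"nqn=testnqn,nr_io_queues=8,nr_poll_queues=4";
	int fd = open("/dev/nvme-fabrics", O_RDWR);

	if (fd < 0)
		return 1;
	if (write(fd, opts, strlen(opts)) < 0) {
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}
--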

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/fabrics.c | 16 +++++++++++++++-
 drivers/nvme/host/fabrics.h |  3 +++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 066c3a02e08b..f43bd1eada1c 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -617,6 +617,7 @@ static const match_table_t opt_tokens = {
 	{ NVMF_OPT_HDR_DIGEST,		"hdr_digest"		},
 	{ NVMF_OPT_DATA_DIGEST,		"data_digest"		},
 	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d"	},
+	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d"	},
 	{ NVMF_OPT_ERR,			NULL			}
 };
 
@@ -850,6 +851,18 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 			}
 			opts->nr_write_queues = token;
 			break;
+		case NVMF_OPT_NR_POLL_QUEUES:
+			if (match_int(args, &token)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			if (token <= 0) {
+				pr_err("Invalid nr_poll_queues %d\n", token);
+				ret = -EINVAL;
+				goto out;
+			}
+			opts->nr_poll_queues = token;
+			break;
 		default:
 			pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
 				p);
@@ -967,7 +980,8 @@ EXPORT_SYMBOL_GPL(nvmf_free_options);
 #define NVMF_ALLOWED_OPTS	(NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
 				 NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
 				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
-				 NVMF_OPT_DISABLE_SQFLOW | NVMF_OPT_NR_WRITE_QUEUES)
+				 NVMF_OPT_DISABLE_SQFLOW | NVMF_OPT_NR_WRITE_QUEUES | \
+				 NVMF_OPT_NR_POLL_QUEUES)
 
 static struct nvme_ctrl *
 nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 81b8fd1c0c5d..36f5daf05f08 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -62,6 +62,7 @@ enum {
 	NVMF_OPT_HDR_DIGEST	= 1 << 15,
 	NVMF_OPT_DATA_DIGEST	= 1 << 16,
 	NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
+	NVMF_OPT_NR_POLL_QUEUES = 1 << 18,
 };
 
 /**
@@ -93,6 +94,7 @@ enum {
  * @hdr_digest: generate/verify header digest (TCP)
  * @data_digest: generate/verify data digest (TCP)
  * @nr_write_queues: number of queues for write I/O
+ * @nr_poll_queues: number of queues for polling I/O
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
@@ -113,6 +115,7 @@ struct nvmf_ctrl_options {
 	bool			hdr_digest;
 	bool			data_digest;
 	unsigned int		nr_write_queues;
+	unsigned int		nr_poll_queues;
 };
 
 /*
-- 
2.17.1



* [PATCH RFC 2/4] rdma: introduce ib_change_cq_ctx
  2018-12-11 23:36 [PATCH RFC 0/4] restore polling to nvme-rdma Sagi Grimberg
  2018-12-11 23:36 ` [PATCH RFC 1/4] nvme-fabrics: allow user to pass in nr_poll_queues Sagi Grimberg
@ 2018-12-11 23:36 ` Sagi Grimberg
  2018-12-11 23:36 ` [PATCH RFC 3/4] nvme-rdma: implement polling queue map Sagi Grimberg
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-11 23:36 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch

Allow cq consumers to modify the cq polling context online. The
consumer might want to allocate the cq with softirq/workqueue polling
context for async (setup time) I/O, and when completed, switch the
polling context to direct polling and get all the interrupts out
of the way.

One example is the nvme-rdma driver, which hooks into the block layer
polling queue map infrastructure for latency sensitive I/O. Every nvmf
queue starts with a connect message, which is the slow path at setup
time, and there is no need for polling there (it is actually harmful).
Instead, allocate the polling queue cq with IB_POLL_SOFTIRQ and switch
it to IB_POLL_DIRECT once the connect has completed.
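
A rough sketch of the intended consumer-side usage (illustrative only;
the cq size, completion vector and driver context below are placeholders,
and the connect handling itself is elided):

--
	struct ib_cq *cq;
	int ret;

	/* start interrupt driven so the slow-path connect doesn't need polling */
	cq = ib_alloc_cq(dev, queue, cqe_count, comp_vector, IB_POLL_SOFTIRQ);
	if (IS_ERR(cq))
		return PTR_ERR(cq);

	/* ... post the fabrics connect on this queue and wait for it ... */

	/* setup I/O is done and quiesced; poll directly from now on */
	ret = ib_change_cq_ctx(cq, IB_POLL_DIRECT);
	if (ret) {
		/* the old polling context was restored, keep using interrupts */
		return ret;
	}
--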

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/infiniband/core/cq.c | 102 ++++++++++++++++++++++++-----------
 include/rdma/ib_verbs.h      |   1 +
 2 files changed, 71 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index b1e5365ddafa..c820eb954edc 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -80,7 +80,7 @@ EXPORT_SYMBOL(ib_process_cq_direct);
 
 static void ib_cq_completion_direct(struct ib_cq *cq, void *private)
 {
-	WARN_ONCE(1, "got unsolicited completion for CQ 0x%p\n", cq);
+	pr_debug("got unsolicited completion for CQ 0x%p\n", cq);
 }
 
 static int ib_poll_handler(struct irq_poll *iop, int budget)
@@ -120,6 +120,33 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
 	queue_work(cq->comp_wq, &cq->work);
 }
 
+static int __ib_cq_set_ctx(struct ib_cq *cq)
+{
+	switch (cq->poll_ctx) {
+	case IB_POLL_DIRECT:
+		cq->comp_handler = ib_cq_completion_direct;
+		break;
+	case IB_POLL_SOFTIRQ:
+		cq->comp_handler = ib_cq_completion_softirq;
+
+		irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ, ib_poll_handler);
+		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
+		break;
+	case IB_POLL_WORKQUEUE:
+	case IB_POLL_UNBOUND_WORKQUEUE:
+		cq->comp_handler = ib_cq_completion_workqueue;
+		INIT_WORK(&cq->work, ib_cq_poll_work);
+		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
+		cq->comp_wq = (cq->poll_ctx == IB_POLL_WORKQUEUE) ?
+				ib_comp_wq : ib_comp_unbound_wq;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /**
  * __ib_alloc_cq - allocate a completion queue
  * @dev:		device to allocate the CQ for
@@ -164,28 +191,9 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private,
 	rdma_restrack_set_task(&cq->res, caller);
 	rdma_restrack_add(&cq->res);
 
-	switch (cq->poll_ctx) {
-	case IB_POLL_DIRECT:
-		cq->comp_handler = ib_cq_completion_direct;
-		break;
-	case IB_POLL_SOFTIRQ:
-		cq->comp_handler = ib_cq_completion_softirq;
-
-		irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ, ib_poll_handler);
-		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
-		break;
-	case IB_POLL_WORKQUEUE:
-	case IB_POLL_UNBOUND_WORKQUEUE:
-		cq->comp_handler = ib_cq_completion_workqueue;
-		INIT_WORK(&cq->work, ib_cq_poll_work);
-		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
-		cq->comp_wq = (cq->poll_ctx == IB_POLL_WORKQUEUE) ?
-				ib_comp_wq : ib_comp_unbound_wq;
-		break;
-	default:
-		ret = -EINVAL;
+	ret = __ib_cq_set_ctx(cq);
+	if (ret)
 		goto out_free_wc;
-	}
 
 	return cq;
 
@@ -198,17 +206,8 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private,
 }
 EXPORT_SYMBOL(__ib_alloc_cq);
 
-/**
- * ib_free_cq - free a completion queue
- * @cq:		completion queue to free.
- */
-void ib_free_cq(struct ib_cq *cq)
+static void __ib_cq_clear_ctx(struct ib_cq *cq)
 {
-	int ret;
-
-	if (WARN_ON_ONCE(atomic_read(&cq->usecnt)))
-		return;
-
 	switch (cq->poll_ctx) {
 	case IB_POLL_DIRECT:
 		break;
@@ -222,6 +221,20 @@ void ib_free_cq(struct ib_cq *cq)
 	default:
 		WARN_ON_ONCE(1);
 	}
+}
+
+/**
+ * ib_free_cq - free a completion queue
+ * @cq:		completion queue to free.
+ */
+void ib_free_cq(struct ib_cq *cq)
+{
+	int ret;
+
+	if (WARN_ON_ONCE(atomic_read(&cq->usecnt)))
+		return;
+
+	__ib_cq_clear_ctx(cq);
 
 	kfree(cq->wc);
 	rdma_restrack_del(&cq->res);
@@ -229,3 +242,28 @@ void ib_free_cq(struct ib_cq *cq)
 	WARN_ON_ONCE(ret);
 }
 EXPORT_SYMBOL(ib_free_cq);
+
+/**
+ * ib_change_cq_ctx - change completion queue polling context dynamically
+ * @cq:			the completion queue
+ * @poll_ctx:		new context to poll the CQ from
+ *
+ * The caller must make sure that there is no inflight I/O when calling
+ * this (otherwise it's just asking for trouble). If the cq polling context
+ * change fails, the old polling context is restored.
+ */
+int ib_change_cq_ctx(struct ib_cq *cq, enum ib_poll_context poll_ctx)
+{
+	enum ib_poll_context old_ctx = cq->poll_ctx;
+	int ret;
+
+	__ib_cq_clear_ctx(cq);
+	cq->poll_ctx = poll_ctx;
+	ret = __ib_cq_set_ctx(cq);
+	if (ret) {
+		cq->poll_ctx = old_ctx;
+		__ib_cq_set_ctx(cq);
+	}
+	return ret;
+}
+EXPORT_SYMBOL(ib_change_cq_ctx);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9c0c2132a2d6..c9d03d3a3cd4 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3464,6 +3464,7 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private,
 
 void ib_free_cq(struct ib_cq *cq);
 int ib_process_cq_direct(struct ib_cq *cq, int budget);
+int ib_change_cq_ctx(struct ib_cq *cq, enum ib_poll_context poll_ctx);
 
 /**
  * ib_create_cq - Creates a CQ on the specified device.
-- 
2.17.1



* [PATCH RFC 3/4] nvme-rdma: implement polling queue map
  2018-12-11 23:36 [PATCH RFC 0/4] restore polling to nvme-rdma Sagi Grimberg
  2018-12-11 23:36 ` [PATCH RFC 1/4] nvme-fabrics: allow user to pass in nr_poll_queues Sagi Grimberg
  2018-12-11 23:36 ` [PATCH RFC 2/4] rdma: introduce ib_change_cq_ctx Sagi Grimberg
@ 2018-12-11 23:36 ` Sagi Grimberg
  2018-12-11 23:36 ` [PATCH RFC 4/4] nvme-multipath: disable polling for underlying namespace request queue Sagi Grimberg
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-11 23:36 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch

Every nvmf queue starts with a connect message, which is the slow path
at setup time, and there is no need for polling there (it is actually
harmful). Instead, allocate the polling queue cq with IB_POLL_SOFTIRQ
and switch it to IB_POLL_DIRECT once the connect has completed.
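
To make the resulting layout concrete, a worked example with hypothetical
option values, following the map_queues logic in this patch:

--
/*
 * Hypothetical example: nr_io_queues=8, nr_write_queues=4, nr_poll_queues=2
 *
 *   HCTX_TYPE_POLL.nr_queues    = nr_poll_queues                  =  2
 *   HCTX_TYPE_POLL.queue_offset = nr_io_queues + nr_write_queues  = 12
 *
 * so REQ_HIPRI I/O is steered to hw queues 12-13, whose CQs are switched
 * to IB_POLL_DIRECT after their connect completes, while hw queues 0-11
 * stay interrupt driven.
 */
--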

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/rdma.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 6a7c546b4e74..590d006d0187 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -607,6 +607,11 @@ static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
 	else
 		dev_info(ctrl->ctrl.device,
 			"failed to connect queue: %d ret=%d\n", idx, ret);
+
+	if (idx > ctrl->ctrl.opts->nr_io_queues +
+	    ctrl->ctrl.opts->nr_write_queues)
+		ib_change_cq_ctx(ctrl->queues[idx].ib_cq, IB_POLL_DIRECT);
+
 	return ret;
 }
 
@@ -646,6 +651,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 				ibdev->num_comp_vectors);
 
 	nr_io_queues += min(opts->nr_write_queues, num_online_cpus());
+	nr_io_queues += min(opts->nr_poll_queues, num_online_cpus());
 
 	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
 	if (ret)
@@ -716,7 +722,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->driver_data = ctrl;
 		set->nr_hw_queues = nctrl->queue_count - 1;
 		set->timeout = NVME_IO_TIMEOUT;
-		set->nr_maps = 2 /* default + read */;
+		set->nr_maps = HCTX_MAX_TYPES;
 	}
 
 	ret = blk_mq_alloc_tag_set(set);
@@ -1742,6 +1748,14 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return BLK_STS_IOERR;
 }
 
+static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx)
+{
+	struct nvme_rdma_queue *queue = hctx->driver_data;
+	struct ib_cq *cq = queue->ib_cq;
+
+	return ib_process_cq_direct(cq, 16);
+}
+
 static void nvme_rdma_complete_rq(struct request *rq)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
@@ -1772,6 +1786,21 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
 			ctrl->device->dev, 0);
 	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_READ],
 			ctrl->device->dev, 0);
+
+	if (ctrl->ctrl.opts->nr_poll_queues) {
+		set->map[HCTX_TYPE_POLL].nr_queues =
+				ctrl->ctrl.opts->nr_poll_queues;
+		set->map[HCTX_TYPE_POLL].queue_offset =
+				ctrl->ctrl.opts->nr_io_queues;
+		if (ctrl->ctrl.opts->nr_write_queues)
+			set->map[HCTX_TYPE_POLL].queue_offset +=
+				ctrl->ctrl.opts->nr_write_queues;
+	} else {
+		set->map[HCTX_TYPE_POLL].nr_queues =
+				ctrl->ctrl.opts->nr_io_queues;
+		set->map[HCTX_TYPE_POLL].queue_offset = 0;
+	}
+	blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
 	return 0;
 }
 
@@ -1783,6 +1812,7 @@ static const struct blk_mq_ops nvme_rdma_mq_ops = {
 	.init_hctx	= nvme_rdma_init_hctx,
 	.timeout	= nvme_rdma_timeout,
 	.map_queues	= nvme_rdma_map_queues,
+	.poll		= nvme_rdma_poll,
 };
 
 static const struct blk_mq_ops nvme_rdma_admin_mq_ops = {
@@ -1927,7 +1957,8 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
 	INIT_WORK(&ctrl->err_work, nvme_rdma_error_recovery_work);
 	INIT_WORK(&ctrl->ctrl.reset_work, nvme_rdma_reset_ctrl_work);
 
-	ctrl->ctrl.queue_count = opts->nr_io_queues + opts->nr_write_queues + 1;
+	ctrl->ctrl.queue_count = opts->nr_io_queues + opts->nr_write_queues +
+				opts->nr_poll_queues + 1;
 	ctrl->ctrl.sqsize = opts->queue_size - 1;
 	ctrl->ctrl.kato = opts->kato;
 
-- 
2.17.1



* [PATCH RFC 4/4] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-11 23:36 [PATCH RFC 0/4] restore polling to nvme-rdma Sagi Grimberg
                   ` (2 preceding siblings ...)
  2018-12-11 23:36 ` [PATCH RFC 3/4] nvme-rdma: implement polling queue map Sagi Grimberg
@ 2018-12-11 23:36 ` Sagi Grimberg
  2018-12-12  7:11   ` Christoph Hellwig
  2018-12-11 23:36 ` [PATCH RFC nvme-cli 5/4] fabrics: pass in number of polling queues Sagi Grimberg
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-11 23:36 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch

Since the multipath device does not support polling (yet), we cannot
pass requests to the polling queue map, as those queues will not
generate an interrupt and we would never reap the completion.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f90576862736..511d399a6002 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1547,6 +1547,8 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	if (ns->head->disk) {
 		nvme_update_disk_info(ns->head->disk, ns, id);
 		blk_queue_stack_limits(ns->head->disk->queue, ns->queue);
+		/* XXX: multipath device does not support polling for now... */
+		blk_queue_flag_clear(QUEUE_FLAG_POLL, ns->queue);
 	}
 #endif
 }
-- 
2.17.1



* [PATCH RFC nvme-cli 5/4] fabrics: pass in number of polling queues
  2018-12-11 23:36 [PATCH RFC 0/4] restore polling to nvme-rdma Sagi Grimberg
                   ` (3 preceding siblings ...)
  2018-12-11 23:36 ` [PATCH RFC 4/4] nvme-multipath: disable polling for underlying namespace request queue Sagi Grimberg
@ 2018-12-11 23:36 ` Sagi Grimberg
  2018-12-12  0:27   ` Sagi Grimberg
  2018-12-12  7:07 ` [PATCH RFC 0/4] restore polling to nvme-rdma Christoph Hellwig
  2018-12-12 16:37 ` Steve Wise
  6 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-11 23:36 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch

nr_poll_queues specifies the number of additional queues that will
be connected to host polled, latency critical I/O.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 Documentation/nvme-connect.txt |  5 +++++
 fabrics.c                      | 11 +++++++++++
 2 files changed, 16 insertions(+)

diff --git a/Documentation/nvme-connect.txt b/Documentation/nvme-connect.txt
index d4a0e6678475..5412472dbd35 100644
--- a/Documentation/nvme-connect.txt
+++ b/Documentation/nvme-connect.txt
@@ -17,6 +17,7 @@ SYNOPSIS
 		[--hostnqn=<hostnqn>      | -q <hostnqn>]
 		[--nr-io-queues=<#>       | -i <#>]
 		[--nr-write-queues=<#>    | -W <#>]
+		[--nr-poll-queues=<#>     | -P <#>]
 		[--queue-size=<#>         | -Q <#>]
 		[--keep-alive-tmo=<#>     | -k <#>]
 		[--reconnect-delay=<#>    | -c <#>]
@@ -80,6 +81,10 @@ OPTIONS
 --nr-write-queues=<#>::
 	Adds additional queues that will be used for write I/O.
 
+-P <#>::
+--nr-poll-queues=<#>::
+	Adds additional queues that will be used for polling latency sensitive I/O.
+
 -Q <#>::
 --queue-size=<#>::
 	Overrides the default number of elements in the I/O queues created
diff --git a/fabrics.c b/fabrics.c
index bc6a0b7e4e21..214fce4c1b51 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -54,6 +54,7 @@ static struct config {
 	char *hostid;
 	int  nr_io_queues;
 	int  nr_write_queues;
+	int  nr_poll_queues;
 	int  queue_size;
 	int  keep_alive_tmo;
 	int  reconnect_delay;
@@ -624,6 +625,8 @@ static int build_options(char *argstr, int max_len)
 				cfg.nr_io_queues) ||
 	    add_int_argument(&argstr, &max_len, "nr_write_queues",
 				cfg.nr_write_queues) ||
+	    add_int_argument(&argstr, &max_len, "nr_poll_queues",
+				cfg.nr_poll_queues) ||
 	    add_int_argument(&argstr, &max_len, "queue_size", cfg.queue_size) ||
 	    add_int_argument(&argstr, &max_len, "keep_alive_tmo",
 				cfg.keep_alive_tmo) ||
@@ -704,6 +707,13 @@ retry:
 		p += len;
 	}
 
+	if (cfg.nr_poll_queues) {
+		len = sprintf(p, ",nr_poll_queues=%d", cfg.nr_poll_queues);
+		if (len < 0)
+			return -EINVAL;
+		p += len;
+	}
+
 	if (cfg.host_traddr) {
 		len = sprintf(p, ",host_traddr=%s", cfg.host_traddr);
 		if (len < 0)
@@ -1020,6 +1030,7 @@ int connect(const char *desc, int argc, char **argv)
 		{"hostid",          'I', "LIST", CFG_STRING, &cfg.hostid,      required_argument, "user-defined hostid (if default not used)"},
 		{"nr-io-queues",    'i', "LIST", CFG_INT, &cfg.nr_io_queues,    required_argument, "number of io queues to use (default is core count)" },
 		{"nr-write-queues",  'W', "LIST", CFG_INT, &cfg.nr_write_queues,    required_argument, "number of write queues to use (default 0)" },
+		{"nr-poll-queues",  'W', "LIST", CFG_INT, &cfg.nr_poll_queues,    required_argument, "number of poll queues to use (default 0)" },
 		{"queue-size",      'Q', "LIST", CFG_INT, &cfg.queue_size,      required_argument, "number of io queue elements to use (default 128)" },
 		{"keep-alive-tmo",  'k', "LIST", CFG_INT, &cfg.keep_alive_tmo,  required_argument, "keep alive timeout period in seconds" },
 		{"reconnect-delay", 'c', "LIST", CFG_INT, &cfg.reconnect_delay, required_argument, "reconnect timeout period in seconds" },
-- 
2.17.1



* Re: [PATCH RFC nvme-cli 5/4] fabrics: pass in number of polling queues
  2018-12-11 23:36 ` [PATCH RFC nvme-cli 5/4] fabrics: pass in number of polling queues Sagi Grimberg
@ 2018-12-12  0:27   ` Sagi Grimberg
  0 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-12  0:27 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, linux-rdma, Christoph Hellwig, Keith Busch


> +		{"nr-poll-queues",  'W', "LIST", CFG_INT, &cfg.nr_poll_queues,    required_argument, "number of poll queues to use (default 0)" },

Oops, this should be 'P'


* Re: [PATCH RFC 0/4] restore polling to nvme-rdma
  2018-12-11 23:36 [PATCH RFC 0/4] restore polling to nvme-rdma Sagi Grimberg
                   ` (4 preceding siblings ...)
  2018-12-11 23:36 ` [PATCH RFC nvme-cli 5/4] fabrics: pass in number of polling queues Sagi Grimberg
@ 2018-12-12  7:07 ` Christoph Hellwig
  2018-12-12  7:16   ` Sagi Grimberg
  2018-12-12 16:37 ` Steve Wise
  6 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2018-12-12  7:07 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, linux-block, linux-rdma, Christoph Hellwig, Keith Busch

On Tue, Dec 11, 2018 at 03:36:47PM -0800, Sagi Grimberg wrote:
> Add an additional queue mapping for polling queues that will
> host polling for latency critical I/O.
> 
> One caveat is that we don't want these queues to be pure polling
> as we don't want to bother with polling for the initial nvmf connect
> I/O. Hence, introduce ib_change_cq_ctx that will modify the cq polling
> context from SOFTIRQ to DIRECT.

So do we really care?  Yes, polling for the initial connect is not
exactly efficient, but then again it doesn't happen all that often.

Except for efficiency is there any problem with just starting out
in polling mode?


* Re: [PATCH RFC 4/4] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-11 23:36 ` [PATCH RFC 4/4] nvme-multipath: disable polling for underlying namespace request queue Sagi Grimberg
@ 2018-12-12  7:11   ` Christoph Hellwig
  2018-12-12  7:19     ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2018-12-12  7:11 UTC (permalink / raw)
  To: Sagi Grimberg, Jens Axboe
  Cc: linux-nvme, linux-block, linux-rdma, Christoph Hellwig, Keith Busch

[adding Jens]

> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1547,6 +1547,8 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>  	if (ns->head->disk) {
>  		nvme_update_disk_info(ns->head->disk, ns, id);
>  		blk_queue_stack_limits(ns->head->disk->queue, ns->queue);
> +		/* XXX: multipath device does not support polling for now... */
> +		blk_queue_flag_clear(QUEUE_FLAG_POLL, ns->queue);

I'd drop the XXX.  But I think we actually have a block layer problem
here.  Currently stacking devices will just pass through REQ_HIPRI,
despite none of them supporting any polling for it.

So we need to make sure in the block layer or I/O submitter that
REQ_HIPRI is only set if QUEUE_FLAG_POLL is supported.  I think it would
also help if we rename it to REQ_POLL to make this more obvious.



* Re: [PATCH RFC 0/4] restore polling to nvme-rdma
  2018-12-12  7:07 ` [PATCH RFC 0/4] restore polling to nvme-rdma Christoph Hellwig
@ 2018-12-12  7:16   ` Sagi Grimberg
  2018-12-12  8:09     ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-12  7:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, linux-block, linux-rdma, Keith Busch


>> Add an additional queue mapping for polling queues that will
>> host polling for latency critical I/O.
>>
>> One caveat is that we don't want these queues to be pure polling
>> as we don't want to bother with polling for the initial nvmf connect
>> I/O. Hence, introduce ib_change_cq_ctx that will modify the cq polling
>> context from SOFTIRQ to DIRECT.
> 
> So do we really care?  Yes, polling for the initial connect is not
> exactly efficient, but then again it doesn't happen all that often.
> 
> Except for efficiency is there any problem with just starting out
> in polling mode?

I found it cumbersome so I didn't really consider it...
Isn't it a bit awkward? we will need to implement polled connect
locally in nvme-rdma (because fabrics doesn't know anything about
queues, hctx or polling).

I'm open to looking at it if you think that this is better. Note that if
we had the CQ in our hands, we would do exactly what we did here
effectively: use interrupt for the connect and then simply not
re-arm it again and poll... Should we poll the connect just because
we are behind the CQ API?


* Re: [PATCH RFC 4/4] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-12  7:11   ` Christoph Hellwig
@ 2018-12-12  7:19     ` Sagi Grimberg
  2018-12-12  7:21       ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-12  7:19 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: linux-nvme, linux-block, linux-rdma, Keith Busch


> [adding Jens]
> 
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1547,6 +1547,8 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>>   	if (ns->head->disk) {
>>   		nvme_update_disk_info(ns->head->disk, ns, id);
>>   		blk_queue_stack_limits(ns->head->disk->queue, ns->queue);
>> +		/* XXX: multipath device does not support polling for now... */
>> +		blk_queue_flag_clear(QUEUE_FLAG_POLL, ns->queue);
> 
> I'd drop the XXX.  But I think we actually have a block layer problem
> here.  Currently stacking devices will just pass through REQ_HIPRI,
> despite none of them supporting any polling for it.

Yea... forgot there are other stack devices...

> So we need to make sure in the block layer or I/O submitter that
> REQ_HIPRI is only set if QUEUE_FLAG_POLL is supported.  I think it would
> also help if we rename it to REQ_POLL to make this more obvious.

It used to check for it, but was changed to look at nr_maps instead...
So I think this is a regression...


* Re: [PATCH RFC 4/4] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-12  7:19     ` Sagi Grimberg
@ 2018-12-12  7:21       ` Christoph Hellwig
  2018-12-12  7:29         ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2018-12-12  7:21 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Jens Axboe, linux-nvme, linux-block,
	linux-rdma, Keith Busch

On Tue, Dec 11, 2018 at 11:19:22PM -0800, Sagi Grimberg wrote:
>> So we need to make sure in the block layer or I/O submitter that
>> REQ_HIPRI is only set if QUEUE_FLAG_POLL is supported.  I think it would
>> also help if we rename it to REQ_POLL to make this more obvious.
>
> It used to check for it, but was changed to look at nr_maps instead...
> So I think this is a regression...

blk_mq_map_queue looks at both QUEUE_FLAG_POLL and nr_maps for the
queue selection, but that doesn't help with the problem that we should
never pass poll requests through stacking devices.


* Re: [PATCH RFC 4/4] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-12  7:21       ` Christoph Hellwig
@ 2018-12-12  7:29         ` Sagi Grimberg
  2018-12-12  7:37           ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-12  7:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-nvme, linux-block, linux-rdma, Keith Busch


>>> So we need to make sure in the block layer or I/O submitter that
>>> REQ_HIPRI is only set if QUEUE_FLAG_POLL is supported.  I think it would
>>> also help if we rename it to REQ_POLL to make this more obvious.
>>
>> It used to check for it, but was changed to look at nr_maps instead...
>> So I think this is a regression...
> 
> blk_mq_map_queue looks at both QUEUE_FLAG_POLL and nr_maps for the
> queue selection, but that doesn't help with the problem that we should
> never pass poll requests through stacking devices.

Yea I agree.. Would probably make better sense to forbid it in the core,
but I'm not entirely sure how this is possible for make_request drivers...


* Re: [PATCH RFC 4/4] nvme-multipath: disable polling for underlying namespace request queue
  2018-12-12  7:29         ` Sagi Grimberg
@ 2018-12-12  7:37           ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2018-12-12  7:37 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Jens Axboe, linux-nvme, linux-block,
	linux-rdma, Keith Busch

On Tue, Dec 11, 2018 at 11:29:33PM -0800, Sagi Grimberg wrote:
> Yea I agree.. Would probably make better sense to forbid it in the core,
> but I'm not entirely sure how this is possible for make_request drivers...

Either we can clear the flag in generic_make_request when QUEUE_FLAG_POLL
is not set, or just change the few users to never even set it in that
case.
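
The former could be as small as something like this (untested sketch,
placed where generic_make_request sanity checks the bio):

--
	if ((bio->bi_opf & REQ_HIPRI) &&
	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
		bio->bi_opf &= ~REQ_HIPRI;
--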


* Re: [PATCH RFC 0/4] restore polling to nvme-rdma
  2018-12-12  7:16   ` Sagi Grimberg
@ 2018-12-12  8:09     ` Christoph Hellwig
  2018-12-12  8:53       ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2018-12-12  8:09 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, linux-nvme, linux-block, linux-rdma, Keith Busch

On Tue, Dec 11, 2018 at 11:16:31PM -0800, Sagi Grimberg wrote:
>
>>> Add an additional queue mapping for polling queues that will
>>> host polling for latency critical I/O.
>>>
>>> One caveat is that we don't want these queues to be pure polling
>>> as we don't want to bother with polling for the initial nvmf connect
>>> I/O. Hence, introduce ib_change_cq_ctx that will modify the cq polling
>>> context from SOFTIRQ to DIRECT.
>>
>> So do we really care?  Yes, polling for the initial connect is not
>> exactly efficient, but then again it doesn't happen all that often.
>>
>> Except for efficiency is there any problem with just starting out
>> in polling mode?
>
> I found it cumbersome so I didn't really consider it...
> Isn't it a bit awkward? we will need to implement polled connect
> locally in nvme-rdma (because fabrics doesn't know anything about
> queues, hctx or polling).

Well, it should just be a little blk_poll loop, right?

> I'm open to looking at it if you think that this is better. Note that if
> we had the CQ in our hands, we would do exactly what we did here
> effectively: use interrupt for the connect and then simply not
> re-arm it again and poll... Should we poll the connect just because
> we are behind the CQ API?

I'm just worried that the switch between the different context looks
like a way too easy way to shoot yourself in the foot, so if we can
avoid exposing that it would make for a harder to abuse API.


* Re: [PATCH RFC 0/4] restore polling to nvme-rdma
  2018-12-12  8:09     ` Christoph Hellwig
@ 2018-12-12  8:53       ` Sagi Grimberg
  2018-12-12 14:05         ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-12  8:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, linux-block, linux-rdma, Keith Busch


>> I found it cumbersome so I didn't really consider it...
>> Isn't it a bit awkward? we will need to implement polled connect
>> locally in nvme-rdma (because fabrics doesn't know anything about
>> queues, hctx or polling).
> 
> Well, it should just be a little blk_poll loop, right?

Not so much about the poll loop, but the fact that we will need
to check if we need to poll for this special case every time in
.queue_rq and its somewhat annoying...

>> I'm open to looking at it if you think that this is better. Note that if
>> we had the CQ in our hands, we would do exactly what we did here
>> effectively: use interrupt for the connect and then simply not
>> re-arm it again and poll... Should we poll the connect just because
>> we are behind the CQ API?
> 
> I'm just worried that the switch between the different context looks
>> like a way too easy way to shoot yourself in the foot, so if we can
> avoid exposing that it would make for a harder to abuse API.

Well, it would have been 100% safe if we could undo a cq re-arm that we
did in the past...

The code is documented that the caller must make sure that there is no
inflight I/O during the invocation of the routine..

We could be creative if we really want to make it 100% safe for inflight
I/O (although no one should ever need to use that). We can flush the
current cq context (work/irq), switch to polling context, then create a
single entry QP attached to this CQ and drain it :)
That would make it safe, but it's brain-dead...

Anyway, if people think its really a bad idea we'll go ahead and
poll the nvmf connect...


* Re: [PATCH RFC 0/4] restore polling to nvme-rdma
  2018-12-12  8:53       ` Sagi Grimberg
@ 2018-12-12 14:05         ` Christoph Hellwig
  2018-12-12 18:23           ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2018-12-12 14:05 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, linux-nvme, linux-block, linux-rdma, Keith Busch

On Wed, Dec 12, 2018 at 12:53:47AM -0800, Sagi Grimberg wrote:
>> Well, it should just be a little blk_poll loop, right?
>
> Not so much about the poll loop, but the fact that we will need
> to check if we need to poll for this special case every time in
> .queue_rq and its somewhat annoying...

I don't think we need to add any special check in queue_rq.  Any
polled command should be marked as HIPRI from the submitting code,
and that includes the queue creation.

> Anyway, if people think its really a bad idea we'll go ahead and
> poll the nvmf connect...

I'm really just worried about the completion context switching.  If the
RDMA folks are happy with that API and you strongly favour this version
I'll let you decide.


* RE: [PATCH RFC 0/4] restore polling to nvme-rdma
  2018-12-11 23:36 [PATCH RFC 0/4] restore polling to nvme-rdma Sagi Grimberg
                   ` (5 preceding siblings ...)
  2018-12-12  7:07 ` [PATCH RFC 0/4] restore polling to nvme-rdma Christoph Hellwig
@ 2018-12-12 16:37 ` Steve Wise
  2018-12-12 18:05   ` Sagi Grimberg
  6 siblings, 1 reply; 21+ messages in thread
From: Steve Wise @ 2018-12-12 16:37 UTC (permalink / raw)
  To: 'Sagi Grimberg', linux-nvme
  Cc: linux-block, linux-rdma, 'Christoph Hellwig',
	'Keith Busch'

Hey Sagi,

> Subject: [PATCH RFC 0/4] restore polling to nvme-rdma
> 
> Add an additional queue mapping for polling queues that will
> host polling for latency critical I/O.
> 
> One caveat is that we don't want these queues to be pure polling
> as we don't want to bother with polling for the initial nvmf connect
> I/O. Hence, introduce ib_change_cq_ctx that will modify the cq polling
> context from SOFTIRQ to DIRECT. Note that this function is not safe
> with inflight I/O so the caller must make sure not to call it without
> having all I/O quiesced (we also relax the ib_cq_completion_direct warning
> as we have a scenario that this can happen).

Is there no way to handle this in the core?  Maybe have the polling context
transition to DIRECT when the queue becomes empty and before re-arming the
CQ?  So ib_change_cq_ctx() would be called to indicate the change should
happen when it is safe to do so.  

Just a thought..




* Re: [PATCH RFC 0/4] restore polling to nvme-rdma
  2018-12-12 16:37 ` Steve Wise
@ 2018-12-12 18:05   ` Sagi Grimberg
  2018-12-12 18:10     ` Steve Wise
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-12 18:05 UTC (permalink / raw)
  To: Steve Wise, linux-nvme
  Cc: linux-block, linux-rdma, 'Christoph Hellwig',
	'Keith Busch'


> Hey Sagi,

Hi Steve,

> Is there no way to handle this in the core?  Maybe have the polling context
> transition to DIRECT when the queue becomes empty and before re-arming the
> CQ?

That is what I suggested, but that would mean that we need to drain
the cq before making the switch, which means we need to allocate a
dedicated qp for that cq, and even that doesn't guarantee that the
ULP is not posting other wrs on its own qp(s)...

So making this safe for inflight I/O would be a challenge... If we end
up agreeing that we are ok with this functionality, I'd much rather not
deal with it and simply document "use with care".

> So ib_change_cq_ctx() would be called to indicate the change should
> happen when it is safe to do so.

You lost me here... ib_change_cq_ctx would get called by who and when?


* RE: [PATCH RFC 0/4] restore polling to nvme-rdma
  2018-12-12 18:05   ` Sagi Grimberg
@ 2018-12-12 18:10     ` Steve Wise
  0 siblings, 0 replies; 21+ messages in thread
From: Steve Wise @ 2018-12-12 18:10 UTC (permalink / raw)
  To: 'Sagi Grimberg', linux-nvme
  Cc: linux-block, linux-rdma, 'Christoph Hellwig',
	'Keith Busch'

> > Hey Sagi,
> 
> Hi Steve,
> 
> > Is there no way to handle this in the core?  Maybe have the polling context
> > transition to DIRECT when the queue becomes empty and before re-arming
> the
> > CQ?
> 
> That is what I suggested, but that would mean that we need to drain
> the cq before making the switch, which means we need to allocate a
> dedicated qp for that cq, and even that doesn't guarantee that the
> ULP is not posting other wrs on its own qp(s)...
> 
> So making this safe for inflight I/O would be a challenge... If we end
> up agreeing that we are ok with this functionality, I'd much rather not
> deal with it and simply document "use with care".
> 
> > So ib_change_cq_ctx() would be called to indicate the change should
> > happen when it is safe to do so.
> 
> You lost me here... ib_change_cq_ctx would get called by who and when?

I didn't look in detail at your changes, but ib_change_cq_ctx() is called
by the application, right?  I was just asking what if the semantics of the
call were "change the context when it is safe to do so" vs "do it
immediately and hope there are no outstanding WRs".  But I don't think
this semantic change simplifies the problem.





* Re: [PATCH RFC 0/4] restore polling to nvme-rdma
  2018-12-12 14:05         ` Christoph Hellwig
@ 2018-12-12 18:23           ` Sagi Grimberg
  0 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2018-12-12 18:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-rdma, Keith Busch, linux-nvme


>>> Well, it should just be a little blk_poll loop, right?
>>
>> Not so much about the poll loop, but the fact that we will need
>> to check if we need to poll for this special case every time in
>> .queue_rq and its somewhat annoying...
> 
> I don't think we need to add any special check in queue_rq.  Any
> polled command should be marked as HIPRI from the submitting code,
> and that includes the queue creation.

Hmm, good idea, perhaps adding something like the following:
--
/**
 * blk_execute_rq_polled - execute a request and poll for its completion
 * @q:		queue to insert the request in
 * @bd_disk:	matching gendisk
 * @rq:		request to insert
 * @at_head:	insert request at head or tail of queue
 *
 * Description:
 *    Insert a fully prepared request at the back of the I/O scheduler
 *    queue for execution and poll for completion.
 */
void blk_execute_rq_polled(struct request_queue *q, struct gendisk *bd_disk,
		struct request *rq, int at_head)
{
	DECLARE_COMPLETION_ONSTACK(wait);

	rq->cmd_flags |= REQ_HIPRI;
	rq->end_io_data = &wait;
	blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);

	while (!completion_done(&wait))
		blk_poll(q, 0, true);
}
EXPORT_SYMBOL(blk_execute_rq_polled);
--

and __nvme_submit_sync_cmd() can receive a poll argument as well to call
blk_execute_rq_polled()?
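
e.g. the final call site would then look roughly like this (sketch,
assuming the flag gets plumbed through from the fabrics connect path):

--
	if (poll)
		blk_execute_rq_polled(req->q, NULL, req, at_head);
	else
		blk_execute_rq(req->q, NULL, req, at_head);
--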


end of thread

Thread overview: 21+ messages
2018-12-11 23:36 [PATCH RFC 0/4] restore polling to nvme-rdma Sagi Grimberg
2018-12-11 23:36 ` [PATCH RFC 1/4] nvme-fabrics: allow user to pass in nr_poll_queues Sagi Grimberg
2018-12-11 23:36 ` [PATCH RFC 2/4] rdma: introduce ib_change_cq_ctx Sagi Grimberg
2018-12-11 23:36 ` [PATCH RFC 3/4] nvme-rdma: implement polling queue map Sagi Grimberg
2018-12-11 23:36 ` [PATCH RFC 4/4] nvme-multipath: disable polling for underlying namespace request queue Sagi Grimberg
2018-12-12  7:11   ` Christoph Hellwig
2018-12-12  7:19     ` Sagi Grimberg
2018-12-12  7:21       ` Christoph Hellwig
2018-12-12  7:29         ` Sagi Grimberg
2018-12-12  7:37           ` Christoph Hellwig
2018-12-11 23:36 ` [PATCH RFC nvme-cli 5/4] fabrics: pass in number of polling queues Sagi Grimberg
2018-12-12  0:27   ` Sagi Grimberg
2018-12-12  7:07 ` [PATCH RFC 0/4] restore polling to nvme-rdma Christoph Hellwig
2018-12-12  7:16   ` Sagi Grimberg
2018-12-12  8:09     ` Christoph Hellwig
2018-12-12  8:53       ` Sagi Grimberg
2018-12-12 14:05         ` Christoph Hellwig
2018-12-12 18:23           ` Sagi Grimberg
2018-12-12 16:37 ` Steve Wise
2018-12-12 18:05   ` Sagi Grimberg
2018-12-12 18:10     ` Steve Wise
