* block and nvme polling improvements V2
From: Christoph Hellwig @ 2018-11-29 19:12 UTC
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

Hi all,

this series optimizes a few bits in the block layer and nvme code
related to polling.

It starts by moving the recently introduced queue types entirely into
the block layer instead of requiring an indirect call for them.

It then switches nvme and the block layer to only allow polling
with separate poll queues, which allows us to realize the following
benefits (see the rough sketch after the list):

 - poll queues can safely avoid disabling irqs on any locks
   (we already do that in NVMe, but it isn't 100% kosher as-is)
 - regular interrupt driven queues can drop the CQ lock entirely,
   as we won't race for completing CQs
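
To make that a bit more concrete, here is a rough sketch (not actual
patch content; the names follow the nvme-pci patches later in this
series) of what the completion side looks like once only dedicated
poll queues can be polled:

/*
 * Rough sketch only -- the real code is in the nvme-pci patches below.
 * Poll queues are never completed from hard IRQ context, so a plain
 * spin_lock (no irqsave) is enough to serialize concurrent pollers:
 */
static int foo_poll(struct blk_mq_hw_ctx *hctx)
{
	struct nvme_queue *nvmeq = hctx->driver_data;
	u16 start, end;
	int found;

	spin_lock(&nvmeq->cq_poll_lock);
	found = nvme_process_cq(nvmeq, &start, &end, -1);
	spin_unlock(&nvmeq->cq_poll_lock);

	nvme_complete_cqes(nvmeq, start, end);
	return found;
}

/*
 * Interrupt driven queues in turn are only ever completed from their
 * own interrupt handler, so they need no CQ lock at all.
 */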

Then we drop the NVMe RDMA polling code, as it doesn't follow the new
model, and remove the nvme multipath polling code including the block
hooks for it, which didn't make much sense to start with given that we
started bypassing the multipath code for single controller subsystems
early on.  Last but not least we enable polling in the block layer
by default if the underlying driver has poll queues, as polling already
requires explicit user action anyway.

Note that it would be really nice to have polling back for RDMA with
dedicated poll queues, but that might take a while.  Also, based on
Jens' polling aio patches, we could now implement a model in nvmet
where we have a thread polling both the backend nvme device and
the RDMA CQs, which might give us some pretty nice performance
(I know Sagi looked into something similar a while ago).

A git tree is also available at:

    git://git.infradead.org/users/hch/block.git nvme-polling

Gitweb:

    http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/nvme-polling

Changes since v1:
 - rebased to the latest block for-4.21 tree


* [PATCH 01/13] block: move queues types to the block layer
From: Christoph Hellwig @ 2018-11-29 19:12 UTC
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

Having another indirect call in the fast path doesn't really help
in our post-spectre world.  Also having too many queue types is just
going to create confusion, so I'd rather manage them centrally.

Note that the queue type naming and ordering changes a bit - the
first index now is the default queue for everything not explicitly
marked, the optional ones are read and poll queues.
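
For illustration only (not part of this patch): with the types managed
centrally, a driver just sizes its tag set for the new map indices and
fills one blk_mq_queue_map per hctx_type in its ->map_queues callback.
The foo_* names and the nr_queues[] bookkeeping below are invented for
this sketch; only the blk-mq interfaces are real.

#include <linux/blk-mq.h>

/* hypothetical driver state, made up for this example */
struct foo_dev {
	struct blk_mq_tag_set	tag_set;
	unsigned int		nr_queues[HCTX_MAX_TYPES];
};

static int foo_map_queues(struct blk_mq_tag_set *set)
{
	struct foo_dev *foo = set->driver_data;
	unsigned int qoff = 0;
	int i;

	/*
	 * One blk_mq_queue_map per hctx_type.  For simplicity this sketch
	 * assumes at least one queue of every type; see the nvme hunks
	 * below for how a missing optional map falls back to the default
	 * queues.
	 */
	for (i = 0; i < set->nr_maps; i++) {
		struct blk_mq_queue_map *map = &set->map[i];

		map->nr_queues = foo->nr_queues[i];
		map->queue_offset = qoff;
		blk_mq_map_queues(map);	/* generic CPU-to-queue spreading */
		qoff += map->nr_queues;
	}
	return 0;
}

/*
 * The driver then points .map_queues in its blk_mq_ops at foo_map_queues
 * and sets tag_set.nr_maps = HCTX_MAX_TYPES before allocating the tag set.
 */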

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.h          | 21 +++++++------
 drivers/nvme/host/pci.c | 68 +++++++++++++++--------------------------
 include/linux/blk-mq.h  | 15 ++++-----
 3 files changed, 43 insertions(+), 61 deletions(-)

diff --git a/block/blk-mq.h b/block/blk-mq.h
index 7291e5379358..a664ea44ffd4 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -81,16 +81,14 @@ extern int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int);
 /*
  * blk_mq_map_queue_type() - map (hctx_type,cpu) to hardware queue
  * @q: request queue
- * @hctx_type: the hctx type index
+ * @type: the hctx type index
  * @cpu: CPU
  */
 static inline struct blk_mq_hw_ctx *blk_mq_map_queue_type(struct request_queue *q,
-							  unsigned int hctx_type,
+							  enum hctx_type type,
 							  unsigned int cpu)
 {
-	struct blk_mq_tag_set *set = q->tag_set;
-
-	return q->queue_hw_ctx[set->map[hctx_type].mq_map[cpu]];
+	return q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]];
 }
 
 /*
@@ -103,12 +101,17 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
 						     unsigned int flags,
 						     unsigned int cpu)
 {
-	int hctx_type = 0;
+	enum hctx_type type = HCTX_TYPE_DEFAULT;
+
+	if (q->tag_set->nr_maps > HCTX_TYPE_POLL &&
+	    ((flags & REQ_HIPRI) && test_bit(QUEUE_FLAG_POLL, &q->queue_flags)))
+		type = HCTX_TYPE_POLL;
 
-	if (q->mq_ops->rq_flags_to_type)
-		hctx_type = q->mq_ops->rq_flags_to_type(q, flags);
+	else if (q->tag_set->nr_maps > HCTX_TYPE_READ &&
+		 ((flags & REQ_OP_MASK) == REQ_OP_READ))
+		type = HCTX_TYPE_READ;
 
-	return blk_mq_map_queue_type(q, hctx_type, cpu);
+	return blk_mq_map_queue_type(q, type, cpu);
 }
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 527907aa6903..a1bb4bb92e7f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -95,13 +95,6 @@ struct nvme_queue;
 
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 
-enum {
-	NVMEQ_TYPE_READ,
-	NVMEQ_TYPE_WRITE,
-	NVMEQ_TYPE_POLL,
-	NVMEQ_TYPE_NR,
-};
-
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
  */
@@ -115,7 +108,7 @@ struct nvme_dev {
 	struct dma_pool *prp_small_pool;
 	unsigned online_queues;
 	unsigned max_qid;
-	unsigned io_queues[NVMEQ_TYPE_NR];
+	unsigned io_queues[HCTX_MAX_TYPES];
 	unsigned int num_vecs;
 	int q_depth;
 	u32 db_stride;
@@ -499,10 +492,10 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 
 		map->nr_queues = dev->io_queues[i];
 		if (!map->nr_queues) {
-			BUG_ON(i == NVMEQ_TYPE_READ);
+			BUG_ON(i == HCTX_TYPE_DEFAULT);
 
 			/* shared set, resuse read set parameters */
-			map->nr_queues = dev->io_queues[NVMEQ_TYPE_READ];
+			map->nr_queues = dev->io_queues[HCTX_TYPE_DEFAULT];
 			qoff = 0;
 			offset = queue_irq_offset(dev);
 		}
@@ -512,7 +505,7 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 		 * affinity), so use the regular blk-mq cpu mapping
 		 */
 		map->queue_offset = qoff;
-		if (i != NVMEQ_TYPE_POLL)
+		if (i != HCTX_TYPE_POLL)
 			blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
 		else
 			blk_mq_map_queues(map);
@@ -961,16 +954,6 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
-static int nvme_rq_flags_to_type(struct request_queue *q, unsigned int flags)
-{
-	if ((flags & REQ_HIPRI) && test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
-		return NVMEQ_TYPE_POLL;
-	if ((flags & REQ_OP_MASK) == REQ_OP_READ)
-		return NVMEQ_TYPE_READ;
-
-	return NVMEQ_TYPE_WRITE;
-}
-
 static void nvme_pci_complete_rq(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -1634,7 +1617,6 @@ static const struct blk_mq_ops nvme_mq_admin_ops = {
 #define NVME_SHARED_MQ_OPS					\
 	.queue_rq		= nvme_queue_rq,		\
 	.commit_rqs		= nvme_commit_rqs,		\
-	.rq_flags_to_type	= nvme_rq_flags_to_type,	\
 	.complete		= nvme_pci_complete_rq,		\
 	.init_hctx		= nvme_init_hctx,		\
 	.init_request		= nvme_init_request,		\
@@ -1785,9 +1767,9 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
 	}
 
 	max = min(dev->max_qid, dev->ctrl.queue_count - 1);
-	if (max != 1 && dev->io_queues[NVMEQ_TYPE_POLL]) {
-		rw_queues = dev->io_queues[NVMEQ_TYPE_READ] +
-				dev->io_queues[NVMEQ_TYPE_WRITE];
+	if (max != 1 && dev->io_queues[HCTX_TYPE_POLL]) {
+		rw_queues = dev->io_queues[HCTX_TYPE_DEFAULT] +
+				dev->io_queues[HCTX_TYPE_READ];
 	} else {
 		rw_queues = max;
 	}
@@ -2076,9 +2058,9 @@ static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int nr_io_queues)
 	 * Setup read/write queue split
 	 */
 	if (nr_io_queues == 1) {
-		dev->io_queues[NVMEQ_TYPE_READ] = 1;
-		dev->io_queues[NVMEQ_TYPE_WRITE] = 0;
-		dev->io_queues[NVMEQ_TYPE_POLL] = 0;
+		dev->io_queues[HCTX_TYPE_DEFAULT] = 1;
+		dev->io_queues[HCTX_TYPE_READ] = 0;
+		dev->io_queues[HCTX_TYPE_POLL] = 0;
 		return;
 	}
 
@@ -2095,10 +2077,10 @@ static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int nr_io_queues)
 			this_p_queues = nr_io_queues - 1;
 		}
 
-		dev->io_queues[NVMEQ_TYPE_POLL] = this_p_queues;
+		dev->io_queues[HCTX_TYPE_POLL] = this_p_queues;
 		nr_io_queues -= this_p_queues;
 	} else
-		dev->io_queues[NVMEQ_TYPE_POLL] = 0;
+		dev->io_queues[HCTX_TYPE_POLL] = 0;
 
 	/*
 	 * If 'write_queues' is set, ensure it leaves room for at least
@@ -2112,11 +2094,11 @@ static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int nr_io_queues)
 	 * a queue set.
 	 */
 	if (!this_w_queues) {
-		dev->io_queues[NVMEQ_TYPE_WRITE] = 0;
-		dev->io_queues[NVMEQ_TYPE_READ] = nr_io_queues;
+		dev->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
+		dev->io_queues[HCTX_TYPE_READ] = 0;
 	} else {
-		dev->io_queues[NVMEQ_TYPE_WRITE] = this_w_queues;
-		dev->io_queues[NVMEQ_TYPE_READ] = nr_io_queues - this_w_queues;
+		dev->io_queues[HCTX_TYPE_DEFAULT] = this_w_queues;
+		dev->io_queues[HCTX_TYPE_READ] = nr_io_queues - this_w_queues;
 	}
 }
 
@@ -2138,8 +2120,8 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 	 */
 	do {
 		nvme_calc_io_queues(dev, nr_io_queues);
-		irq_sets[0] = dev->io_queues[NVMEQ_TYPE_READ];
-		irq_sets[1] = dev->io_queues[NVMEQ_TYPE_WRITE];
+		irq_sets[0] = dev->io_queues[HCTX_TYPE_DEFAULT];
+		irq_sets[1] = dev->io_queues[HCTX_TYPE_READ];
 		if (!irq_sets[1])
 			affd.nr_sets = 1;
 
@@ -2226,12 +2208,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 
 	dev->num_vecs = result;
 	result = max(result - 1, 1);
-	dev->max_qid = result + dev->io_queues[NVMEQ_TYPE_POLL];
+	dev->max_qid = result + dev->io_queues[HCTX_TYPE_POLL];
 
-	dev_info(dev->ctrl.device, "%d/%d/%d read/write/poll queues\n",
-					dev->io_queues[NVMEQ_TYPE_READ],
-					dev->io_queues[NVMEQ_TYPE_WRITE],
-					dev->io_queues[NVMEQ_TYPE_POLL]);
+	dev_info(dev->ctrl.device, "%d/%d/%d default/read/poll queues\n",
+					dev->io_queues[HCTX_TYPE_DEFAULT],
+					dev->io_queues[HCTX_TYPE_READ],
+					dev->io_queues[HCTX_TYPE_POLL]);
 
 	/*
 	 * Should investigate if there's a performance win from allocating
@@ -2332,13 +2314,13 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	int ret;
 
 	if (!dev->ctrl.tagset) {
-		if (!dev->io_queues[NVMEQ_TYPE_POLL])
+		if (!dev->io_queues[HCTX_TYPE_POLL])
 			dev->tagset.ops = &nvme_mq_ops;
 		else
 			dev->tagset.ops = &nvme_mq_poll_noirq_ops;
 
 		dev->tagset.nr_hw_queues = dev->online_queues - 1;
-		dev->tagset.nr_maps = NVMEQ_TYPE_NR;
+		dev->tagset.nr_maps = HCTX_MAX_TYPES;
 		dev->tagset.timeout = NVME_IO_TIMEOUT;
 		dev->tagset.numa_node = dev_to_node(dev->dev);
 		dev->tagset.queue_depth =
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 467f1dd21ccf..57eda7b20243 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -81,8 +81,12 @@ struct blk_mq_queue_map {
 	unsigned int queue_offset;
 };
 
-enum {
-	HCTX_MAX_TYPES = 3,
+enum hctx_type {
+	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
+	HCTX_TYPE_READ,		/* just for READ I/O */
+	HCTX_TYPE_POLL,		/* polled I/O of any kind */
+
+	HCTX_MAX_TYPES,
 };
 
 struct blk_mq_tag_set {
@@ -118,8 +122,6 @@ struct blk_mq_queue_data {
 typedef blk_status_t (queue_rq_fn)(struct blk_mq_hw_ctx *,
 		const struct blk_mq_queue_data *);
 typedef void (commit_rqs_fn)(struct blk_mq_hw_ctx *);
-/* takes rq->cmd_flags as input, returns a hardware type index */
-typedef int (rq_flags_to_type_fn)(struct request_queue *, unsigned int);
 typedef bool (get_budget_fn)(struct blk_mq_hw_ctx *);
 typedef void (put_budget_fn)(struct blk_mq_hw_ctx *);
 typedef enum blk_eh_timer_return (timeout_fn)(struct request *, bool);
@@ -154,11 +156,6 @@ struct blk_mq_ops {
 	 */
 	commit_rqs_fn		*commit_rqs;
 
-	/*
-	 * Return a queue map type for the given request/bio flags
-	 */
-	rq_flags_to_type_fn	*rq_flags_to_type;
-
 	/*
 	 * Reserve budget before queue request, once .queue_rq is
 	 * run, it is driver's responsibility to release the
-- 
2.19.1



* [PATCH 02/13] nvme-pci: use atomic bitops to mark a queue enabled
From: Christoph Hellwig @ 2018-11-29 19:12 UTC
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

This gets rid of all the messing with cq_vector and the ->polled field
by using an atomic bitop to mark the queue enabled or not.
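
As a condensed illustration of the pattern (the foo_* wrappers are
invented for this sketch; the real changes are in the diff below):
queue liveness becomes a single atomic bit in nvmeq->flags, so the
submission fast path and the teardown path agree on it without a lock.

/* assumes the "unsigned long flags" field this patch adds to nvme_queue */

static bool foo_queue_enabled(struct nvme_queue *nvmeq)
{
	/* lockless check in the submission fast path */
	return test_bit(NVMEQ_ENABLED, &nvmeq->flags);
}

static int foo_disable_queue(struct nvme_queue *nvmeq)
{
	/* atomically claim the disable; only one caller wins */
	if (!test_and_clear_bit(NVMEQ_ENABLED, &nvmeq->flags))
		return 1;
	mb();	/* make the cleared bit visible to nvme_queue_rq() */
	return 0;
}

static void foo_enable_queue(struct nvme_queue *nvmeq)
{
	/* set last, once the queue is fully set up */
	set_bit(NVMEQ_ENABLED, &nvmeq->flags);
}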

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 43 ++++++++++++++---------------------------
 1 file changed, 15 insertions(+), 28 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a1bb4bb92e7f..022395a319f4 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -201,7 +201,8 @@ struct nvme_queue {
 	u16 last_cq_head;
 	u16 qid;
 	u8 cq_phase;
-	u8 polled;
+	unsigned long flags;
+#define NVMEQ_ENABLED		0
 	u32 *dbbuf_sq_db;
 	u32 *dbbuf_cq_db;
 	u32 *dbbuf_sq_ei;
@@ -927,7 +928,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	 * We should not need to do this, but we're still using this to
 	 * ensure we can drain requests on a dying queue.
 	 */
-	if (unlikely(nvmeq->cq_vector < 0 && !nvmeq->polled))
+	if (unlikely(!test_bit(NVMEQ_ENABLED, &nvmeq->flags)))
 		return BLK_STS_IOERR;
 
 	ret = nvme_setup_cmd(ns, req, &cmnd);
@@ -1395,31 +1396,19 @@ static void nvme_free_queues(struct nvme_dev *dev, int lowest)
  */
 static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 {
-	int vector;
-
-	spin_lock_irq(&nvmeq->cq_lock);
-	if (nvmeq->cq_vector == -1 && !nvmeq->polled) {
-		spin_unlock_irq(&nvmeq->cq_lock);
+	if (!test_and_clear_bit(NVMEQ_ENABLED, &nvmeq->flags))
 		return 1;
-	}
-	vector = nvmeq->cq_vector;
-	nvmeq->dev->online_queues--;
-	nvmeq->cq_vector = -1;
-	nvmeq->polled = false;
-	spin_unlock_irq(&nvmeq->cq_lock);
 
-	/*
-	 * Ensure that nvme_queue_rq() sees it ->cq_vector == -1 without
-	 * having to grab the lock.
-	 */
+	/* ensure that nvme_queue_rq() sees NVMEQ_ENABLED cleared */
 	mb();
 
+	nvmeq->dev->online_queues--;
 	if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
 		blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
-
-	if (vector != -1)
-		pci_free_irq(to_pci_dev(nvmeq->dev->dev), vector, nvmeq);
-
+	if (nvmeq->cq_vector == -1)
+		return 0;
+	pci_free_irq(to_pci_dev(nvmeq->dev->dev), nvmeq->cq_vector, nvmeq);
+	nvmeq->cq_vector = -1;
 	return 0;
 }
 
@@ -1578,13 +1567,7 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
 	else if (result)
 		goto release_cq;
 
-	/*
-	 * Set cq_vector after alloc cq/sq, otherwise nvme_suspend_queue will
-	 * invoke free_irq for it and cause a 'Trying to free already-free IRQ
-	 * xxx' warning if the create CQ/SQ command times out.
-	 */
 	nvmeq->cq_vector = vector;
-	nvmeq->polled = polled;
 	nvme_init_queue(nvmeq, qid);
 
 	if (vector != -1) {
@@ -1593,11 +1576,11 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
 			goto release_sq;
 	}
 
+	set_bit(NVMEQ_ENABLED, &nvmeq->flags);
 	return result;
 
 release_sq:
 	nvmeq->cq_vector = -1;
-	nvmeq->polled = false;
 	dev->online_queues--;
 	adapter_delete_sq(dev, qid);
 release_cq:
@@ -1751,6 +1734,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
 		return result;
 	}
 
+	set_bit(NVMEQ_ENABLED, &nvmeq->flags);
 	return result;
 }
 
@@ -2173,6 +2157,8 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 
 	if (nr_io_queues == 0)
 		return 0;
+	
+	clear_bit(NVMEQ_ENABLED, &adminq->flags);
 
 	if (dev->cmb_use_sqes) {
 		result = nvme_cmb_qdepth(dev, nr_io_queues,
@@ -2227,6 +2213,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 		adminq->cq_vector = -1;
 		return result;
 	}
+	set_bit(NVMEQ_ENABLED, &adminq->flags);
 	return nvme_create_io_queues(dev);
 }
 
-- 
2.19.1



* [PATCH 03/13] nvme-pci: cleanup SQ allocation a bit
From: Christoph Hellwig @ 2018-11-29 19:13 UTC
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

Use a bit flag to mark if the SQ was allocated from the CMB, and clean
up the surrounding code a bit.
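
Condensed, the resulting policy looks like the sketch below: try to
place the SQ in the controller memory buffer via p2pmem, fall back to
coherent host memory, and record the choice in a queue flag so the
free path can release it the same way.  foo_alloc_sq_cmds() and the
foo_wants_cmb_sq() predicate are stand-ins invented for this sketch;
the real hunks follow.

static int foo_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
			     int qid, int depth)
{
	struct pci_dev *pdev = to_pci_dev(dev->dev);

	if (foo_wants_cmb_sq(dev, qid)) {	/* driver's CMB eligibility check */
		nvmeq->sq_cmds = pci_alloc_p2pmem(pdev, SQ_SIZE(depth));
		nvmeq->sq_dma_addr = pci_p2pmem_virt_to_bus(pdev,
							    nvmeq->sq_cmds);
		if (nvmeq->sq_dma_addr) {
			/* remember this so the free path undoes it properly */
			set_bit(NVMEQ_SQ_CMB, &nvmeq->flags);
			return 0;
		}
	}

	/* fall back to regular coherent DMA memory */
	nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
					    &nvmeq->sq_dma_addr, GFP_KERNEL);
	if (!nvmeq->sq_cmds)
		return -ENOMEM;
	return 0;
}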

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 022395a319f4..b820dd0351cb 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -186,7 +186,6 @@ struct nvme_queue {
 	struct nvme_dev *dev;
 	spinlock_t sq_lock;
 	struct nvme_command *sq_cmds;
-	bool sq_cmds_is_io;
 	spinlock_t cq_lock ____cacheline_aligned_in_smp;
 	volatile struct nvme_completion *cqes;
 	struct blk_mq_tags **tags;
@@ -203,6 +202,7 @@ struct nvme_queue {
 	u8 cq_phase;
 	unsigned long flags;
 #define NVMEQ_ENABLED		0
+#define NVMEQ_SQ_CMB		1
 	u32 *dbbuf_sq_db;
 	u32 *dbbuf_cq_db;
 	u32 *dbbuf_sq_ei;
@@ -1366,17 +1366,15 @@ static void nvme_free_queue(struct nvme_queue *nvmeq)
 {
 	dma_free_coherent(nvmeq->q_dmadev, CQ_SIZE(nvmeq->q_depth),
 				(void *)nvmeq->cqes, nvmeq->cq_dma_addr);
+	if (!nvmeq->sq_cmds)
+		return;
 
-	if (nvmeq->sq_cmds) {
-		if (nvmeq->sq_cmds_is_io)
-			pci_free_p2pmem(to_pci_dev(nvmeq->q_dmadev),
-					nvmeq->sq_cmds,
-					SQ_SIZE(nvmeq->q_depth));
-		else
-			dma_free_coherent(nvmeq->q_dmadev,
-					  SQ_SIZE(nvmeq->q_depth),
-					  nvmeq->sq_cmds,
-					  nvmeq->sq_dma_addr);
+	if (test_and_clear_bit(NVMEQ_SQ_CMB, &nvmeq->flags)) {
+		pci_free_p2pmem(to_pci_dev(nvmeq->q_dmadev),
+				nvmeq->sq_cmds, SQ_SIZE(nvmeq->q_depth));
+	} else {
+		dma_free_coherent(nvmeq->q_dmadev, SQ_SIZE(nvmeq->q_depth),
+				nvmeq->sq_cmds, nvmeq->sq_dma_addr);
 	}
 }
 
@@ -1462,15 +1460,14 @@ static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
 		nvmeq->sq_cmds = pci_alloc_p2pmem(pdev, SQ_SIZE(depth));
 		nvmeq->sq_dma_addr = pci_p2pmem_virt_to_bus(pdev,
 						nvmeq->sq_cmds);
-		nvmeq->sq_cmds_is_io = true;
-	}
-
-	if (!nvmeq->sq_cmds) {
-		nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
-					&nvmeq->sq_dma_addr, GFP_KERNEL);
-		nvmeq->sq_cmds_is_io = false;
+		if (nvmeq->sq_dma_addr) {
+			set_bit(NVMEQ_SQ_CMB, &nvmeq->flags);
+			return 0; 
+		}
 	}
 
+	nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
+				&nvmeq->sq_dma_addr, GFP_KERNEL);
 	if (!nvmeq->sq_cmds)
 		return -ENOMEM;
 	return 0;
-- 
2.19.1



* [PATCH 04/13] nvme-pci: only allow polling with separate poll queues
From: Christoph Hellwig @ 2018-11-29 19:13 UTC
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

This will allow us to simplify both the regular NVMe interrupt handler
and the upcoming aio poll code.  In addition to that the separate
queues are generally a good idea for performance reasons.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b820dd0351cb..d42bb76e5e78 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1089,13 +1089,6 @@ static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 }
 
 static int nvme_poll(struct blk_mq_hw_ctx *hctx)
-{
-	struct nvme_queue *nvmeq = hctx->driver_data;
-
-	return __nvme_poll(nvmeq, -1);
-}
-
-static int nvme_poll_noirq(struct blk_mq_hw_ctx *hctx)
 {
 	struct nvme_queue *nvmeq = hctx->driver_data;
 	u16 start, end;
@@ -1605,12 +1598,11 @@ static const struct blk_mq_ops nvme_mq_admin_ops = {
 
 static const struct blk_mq_ops nvme_mq_ops = {
 	NVME_SHARED_MQ_OPS,
-	.poll			= nvme_poll,
 };
 
-static const struct blk_mq_ops nvme_mq_poll_noirq_ops = {
+static const struct blk_mq_ops nvme_mq_poll_ops = {
 	NVME_SHARED_MQ_OPS,
-	.poll			= nvme_poll_noirq,
+	.poll			= nvme_poll,
 };
 
 static void nvme_dev_remove_admin(struct nvme_dev *dev)
@@ -2298,10 +2290,10 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	int ret;
 
 	if (!dev->ctrl.tagset) {
-		if (!dev->io_queues[HCTX_TYPE_POLL])
-			dev->tagset.ops = &nvme_mq_ops;
+		if (dev->io_queues[HCTX_TYPE_POLL])
+			dev->tagset.ops = &nvme_mq_poll_ops;
 		else
-			dev->tagset.ops = &nvme_mq_poll_noirq_ops;
+			dev->tagset.ops = &nvme_mq_ops;
 
 		dev->tagset.nr_hw_queues = dev->online_queues - 1;
 		dev->tagset.nr_maps = HCTX_MAX_TYPES;
-- 
2.19.1



* [PATCH 05/13] nvme-pci: consolidate code for polling non-dedicated queues
From: Christoph Hellwig @ 2018-11-29 19:13 UTC
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

We have three places that can poll for I/O completions on a normal
interrupt-enabled queue.  All of them are in slow path code, so
consolidate them to a single helper that uses spin_lock_irqsave and
removes the fast path cqe_pending check.
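
Condensed, the new helper is the sketch below (comments added here for
this sketch; the actual hunks follow).  Because it uses
spin_lock_irqsave, it is safe to call from the timeout handler, from
admin queue shutdown and from the delete-CQ completion path alike,
passing either a specific tag or -1 to reap everything pending.

static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
{
	unsigned long flags;
	u16 start, end;
	int found;

	/* irqsave: callers may already run with interrupts disabled */
	spin_lock_irqsave(&nvmeq->cq_lock, flags);
	found = nvme_process_cq(nvmeq, &start, &end, tag);
	spin_unlock_irqrestore(&nvmeq->cq_lock, flags);

	/* completions still run outside the lock */
	nvme_complete_cqes(nvmeq, start, end);
	return found;
}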

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d42bb76e5e78..10c26a2e355a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1072,17 +1072,19 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 	return IRQ_NONE;
 }
 
-static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
+/*
+ * Poll for completions any queue, including those not dedicated to polling.
+ * Can be called from any context.
+ */
+static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
 {
+	unsigned long flags;
 	u16 start, end;
 	int found;
 
-	if (!nvme_cqe_pending(nvmeq))
-		return 0;
-
-	spin_lock_irq(&nvmeq->cq_lock);
+	spin_lock_irqsave(&nvmeq->cq_lock, flags);
 	found = nvme_process_cq(nvmeq, &start, &end, tag);
-	spin_unlock_irq(&nvmeq->cq_lock);
+	spin_unlock_irqrestore(&nvmeq->cq_lock, flags);
 
 	nvme_complete_cqes(nvmeq, start, end);
 	return found;
@@ -1279,7 +1281,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	/*
 	 * Did we miss an interrupt?
 	 */
-	if (__nvme_poll(nvmeq, req->tag)) {
+	if (nvme_poll_irqdisable(nvmeq, req->tag)) {
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, completion polled\n",
 			 req->tag, nvmeq->qid);
@@ -1406,18 +1408,13 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 static void nvme_disable_admin_queue(struct nvme_dev *dev, bool shutdown)
 {
 	struct nvme_queue *nvmeq = &dev->queues[0];
-	u16 start, end;
 
 	if (shutdown)
 		nvme_shutdown_ctrl(&dev->ctrl);
 	else
 		nvme_disable_ctrl(&dev->ctrl, dev->ctrl.cap);
 
-	spin_lock_irq(&nvmeq->cq_lock);
-	nvme_process_cq(nvmeq, &start, &end, -1);
-	spin_unlock_irq(&nvmeq->cq_lock);
-
-	nvme_complete_cqes(nvmeq, start, end);
+	nvme_poll_irqdisable(nvmeq, -1);
 }
 
 static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues,
@@ -2217,17 +2214,9 @@ static void nvme_del_queue_end(struct request *req, blk_status_t error)
 static void nvme_del_cq_end(struct request *req, blk_status_t error)
 {
 	struct nvme_queue *nvmeq = req->end_io_data;
-	u16 start, end;
 
-	if (!error) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&nvmeq->cq_lock, flags);
-		nvme_process_cq(nvmeq, &start, &end, -1);
-		spin_unlock_irqrestore(&nvmeq->cq_lock, flags);
-
-		nvme_complete_cqes(nvmeq, start, end);
-	}
+	if (!error)
+		nvme_poll_irqdisable(nvmeq, -1);
 
 	nvme_del_queue_end(req, error);
 }
-- 
2.19.1



* [PATCH 06/13] nvme-pci: refactor nvme_disable_io_queues
From: Christoph Hellwig @ 2018-11-29 19:13 UTC
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

Pass the opcode for the delete SQ/CQ command as an argument instead of
the somewhat confusing pass loop.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 41 ++++++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 10c26a2e355a..9ceba9900ca3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2244,31 +2244,29 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
 	return 0;
 }
 
-static void nvme_disable_io_queues(struct nvme_dev *dev)
+static bool nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode)
 {
-	int pass, queues = dev->online_queues - 1;
+	int nr_queues = dev->online_queues - 1, sent = 0;
 	unsigned long timeout;
-	u8 opcode = nvme_admin_delete_sq;
 
-	for (pass = 0; pass < 2; pass++) {
-		int sent = 0, i = queues;
-
-		reinit_completion(&dev->ioq_wait);
+	reinit_completion(&dev->ioq_wait);
  retry:
-		timeout = ADMIN_TIMEOUT;
-		for (; i > 0; i--, sent++)
-			if (nvme_delete_queue(&dev->queues[i], opcode))
-				break;
-
-		while (sent--) {
-			timeout = wait_for_completion_io_timeout(&dev->ioq_wait, timeout);
-			if (timeout == 0)
-				return;
-			if (i)
-				goto retry;
-		}
-		opcode = nvme_admin_delete_cq;
+	timeout = ADMIN_TIMEOUT;
+	while (nr_queues > 0) {
+		if (nvme_delete_queue(&dev->queues[nr_queues], opcode))
+			break;
+		nr_queues--;
+		sent++;
 	}
+	while (sent--) {
+		timeout = wait_for_completion_io_timeout(&dev->ioq_wait,
+				timeout);
+		if (timeout == 0)
+			return false;
+		if (nr_queues)
+			goto retry;
+	}
+	return true;
 }
 
 /*
@@ -2428,7 +2426,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	nvme_stop_queues(&dev->ctrl);
 
 	if (!dead && dev->ctrl.queue_count > 0) {
-		nvme_disable_io_queues(dev);
+		if (nvme_disable_io_queues(dev, nvme_admin_delete_sq))
+			nvme_disable_io_queues(dev, nvme_admin_delete_cq);
 		nvme_disable_admin_queue(dev, shutdown);
 	}
 	for (i = dev->ctrl.queue_count - 1; i >= 0; i--)
-- 
2.19.1



* [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues
From: Christoph Hellwig @ 2018-11-29 19:13 UTC
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

This is the last place outside of nvme_irq that handles CQEs from
interrupt context, and thus stands in the way of removing the cq_lock
for normal queues and of avoiding lockdep warnings on the poll queues,
for which we already take the lock without disabling IRQs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9ceba9900ca3..fb8db7d8170a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -203,6 +203,7 @@ struct nvme_queue {
 	unsigned long flags;
 #define NVMEQ_ENABLED		0
 #define NVMEQ_SQ_CMB		1
+#define NVMEQ_DELETE_ERROR	2
 	u32 *dbbuf_sq_db;
 	u32 *dbbuf_cq_db;
 	u32 *dbbuf_sq_ei;
@@ -2216,7 +2217,7 @@ static void nvme_del_cq_end(struct request *req, blk_status_t error)
 	struct nvme_queue *nvmeq = req->end_io_data;
 
 	if (!error)
-		nvme_poll_irqdisable(nvmeq, -1);
+		set_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags);
 
 	nvme_del_queue_end(req, error);
 }
@@ -2258,11 +2259,20 @@ static bool nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode)
 		nr_queues--;
 		sent++;
 	}
-	while (sent--) {
+	while (sent) {
+		struct nvme_queue *nvmeq = &dev->queues[nr_queues + sent];
+
 		timeout = wait_for_completion_io_timeout(&dev->ioq_wait,
 				timeout);
 		if (timeout == 0)
 			return false;
+
+		/* handle any remaining CQEs */
+		if (opcode == nvme_admin_delete_cq &&
+		    !test_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags))
+			nvme_poll_irqdisable(nvmeq, -1);
+
+		sent--;
 		if (nr_queues)
 			goto retry;
 	}
-- 
2.19.1



* [PATCH 08/13] nvme-pci: remove the CQ lock for interrupt driven queues
From: Christoph Hellwig @ 2018-11-29 19:13 UTC
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

Now that we can't poll regular, interrupt-driven I/O queues, there
is almost nothing that can race with an interrupt.  The only
other possible contexts that poll a CQ are the error handler and
queue shutdown, and both are so far off in the slow path that
we can simply use the big hammer of disabling interrupts.

With that we can stop taking the cq_lock for normal queues.
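
Condensed from the diff below (comments added for this sketch), the
interrupt handler then becomes lock-free; the rmb/wmb pair only has to
make cq_head/cq_phase updates from a previous invocation visible in
case the interrupt migrated to another CPU:

static irqreturn_t nvme_irq(int irq, void *data)
{
	struct nvme_queue *nvmeq = data;
	irqreturn_t ret = IRQ_NONE;
	u16 start, end;

	rmb();		/* see updates from a prior run on another CPU */
	if (nvmeq->cq_head != nvmeq->last_cq_head)
		ret = IRQ_HANDLED;
	nvme_process_cq(nvmeq, &start, &end, -1);
	nvmeq->last_cq_head = nvmeq->cq_head;
	wmb();		/* publish our updates for the next run */

	if (start != end) {
		nvme_complete_cqes(nvmeq, start, end);
		return IRQ_HANDLED;
	}
	return ret;
}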

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fb8db7d8170a..d43925fba560 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -186,7 +186,8 @@ struct nvme_queue {
 	struct nvme_dev *dev;
 	spinlock_t sq_lock;
 	struct nvme_command *sq_cmds;
-	spinlock_t cq_lock ____cacheline_aligned_in_smp;
+	 /* only used for poll queues: */
+	spinlock_t cq_poll_lock ____cacheline_aligned_in_smp;
 	volatile struct nvme_completion *cqes;
 	struct blk_mq_tags **tags;
 	dma_addr_t sq_dma_addr;
@@ -1050,12 +1051,16 @@ static irqreturn_t nvme_irq(int irq, void *data)
 	irqreturn_t ret = IRQ_NONE;
 	u16 start, end;
 
-	spin_lock(&nvmeq->cq_lock);
+	/*
+	 * The rmb/wmb pair ensures we see all updates from a previous run of
+	 * the irq handler, even if that was on another CPU.
+	 */
+	rmb();
 	if (nvmeq->cq_head != nvmeq->last_cq_head)
 		ret = IRQ_HANDLED;
 	nvme_process_cq(nvmeq, &start, &end, -1);
 	nvmeq->last_cq_head = nvmeq->cq_head;
-	spin_unlock(&nvmeq->cq_lock);
+	wmb();
 
 	if (start != end) {
 		nvme_complete_cqes(nvmeq, start, end);
@@ -1079,13 +1084,24 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
  */
 static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
 {
-	unsigned long flags;
+	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
 	u16 start, end;
 	int found;
 
-	spin_lock_irqsave(&nvmeq->cq_lock, flags);
+	/*
+	 * For a poll queue we need to protect against the polling thread
+	 * using the CQ lock.  For normal interrupt driven threads we have
+	 * to disable the interrupt to avoid racing with it.
+	 */
+	if (nvmeq->cq_vector == -1)
+		spin_lock(&nvmeq->cq_poll_lock);
+	else
+		disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
 	found = nvme_process_cq(nvmeq, &start, &end, tag);
-	spin_unlock_irqrestore(&nvmeq->cq_lock, flags);
+	if (nvmeq->cq_vector == -1)
+		spin_unlock(&nvmeq->cq_poll_lock);
+	else
+		enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
 
 	nvme_complete_cqes(nvmeq, start, end);
 	return found;
@@ -1100,9 +1116,9 @@ static int nvme_poll(struct blk_mq_hw_ctx *hctx)
 	if (!nvme_cqe_pending(nvmeq))
 		return 0;
 
-	spin_lock(&nvmeq->cq_lock);
+	spin_lock(&nvmeq->cq_poll_lock);
 	found = nvme_process_cq(nvmeq, &start, &end, -1);
-	spin_unlock(&nvmeq->cq_lock);
+	spin_unlock(&nvmeq->cq_poll_lock);
 
 	nvme_complete_cqes(nvmeq, start, end);
 	return found;
@@ -1482,7 +1498,7 @@ static int nvme_alloc_queue(struct nvme_dev *dev, int qid, int depth)
 	nvmeq->q_dmadev = dev->dev;
 	nvmeq->dev = dev;
 	spin_lock_init(&nvmeq->sq_lock);
-	spin_lock_init(&nvmeq->cq_lock);
+	spin_lock_init(&nvmeq->cq_poll_lock);
 	nvmeq->cq_head = 0;
 	nvmeq->cq_phase = 1;
 	nvmeq->q_db = &dev->dbs[qid * 2 * dev->db_stride];
@@ -1518,7 +1534,6 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 qid)
 {
 	struct nvme_dev *dev = nvmeq->dev;
 
-	spin_lock_irq(&nvmeq->cq_lock);
 	nvmeq->sq_tail = 0;
 	nvmeq->last_sq_tail = 0;
 	nvmeq->cq_head = 0;
@@ -1527,7 +1542,7 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 qid)
 	memset((void *)nvmeq->cqes, 0, CQ_SIZE(nvmeq->q_depth));
 	nvme_dbbuf_init(dev, nvmeq, qid);
 	dev->online_queues++;
-	spin_unlock_irq(&nvmeq->cq_lock);
+	wmb(); /* ensure the first interrupt sees the initialization */
 }
 
 static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 09/13] nvme-rdma: remove I/O polling support
  2018-11-29 19:12 ` Christoph Hellwig
@ 2018-11-29 19:13   ` Christoph Hellwig
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-29 19:13 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

The code was always a bit of a hack that digs far too much into
RDMA core internals.  Let's kick it out and reimplement proper
dedicated poll queues as needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/rdma.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index ccfde6c7c0a5..a62e9f177c06 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1736,29 +1736,6 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return BLK_STS_IOERR;
 }
 
-static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx)
-{
-	struct nvme_rdma_queue *queue = hctx->driver_data;
-	struct ib_cq *cq = queue->ib_cq;
-	struct ib_wc wc;
-	int found = 0;
-
-	while (ib_poll_cq(cq, 1, &wc) > 0) {
-		struct ib_cqe *cqe = wc.wr_cqe;
-
-		if (cqe) {
-			if (cqe->done == nvme_rdma_recv_done) {
-				nvme_rdma_recv_done(cq, &wc);
-				found++;
-			} else {
-				cqe->done(cq, &wc);
-			}
-		}
-	}
-
-	return found;
-}
-
 static void nvme_rdma_complete_rq(struct request *rq)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
@@ -1780,7 +1757,6 @@ static const struct blk_mq_ops nvme_rdma_mq_ops = {
 	.init_request	= nvme_rdma_init_request,
 	.exit_request	= nvme_rdma_exit_request,
 	.init_hctx	= nvme_rdma_init_hctx,
-	.poll		= nvme_rdma_poll,
 	.timeout	= nvme_rdma_timeout,
 	.map_queues	= nvme_rdma_map_queues,
 };
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 10/13] nvme-mpath: remove I/O polling support
  2018-11-29 19:12 ` Christoph Hellwig
@ 2018-11-29 19:13   ` Christoph Hellwig
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-29 19:13 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

The ->poll_fn has been stale for a while, as a lot of places check for mq
ops.  But there is no real point in it anyway, as we don't even use
the multipath code for subsystems without multiple ports, which is usually
what we do high performance I/O to.  If it really becomes an issue we
should rework the nvme code to also skip the multipath code for any
private namespace, even if that could mean some trouble when rescanning.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/multipath.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index ffebdd0ae34b..ec310b1b9267 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -220,21 +220,6 @@ static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
 	return ret;
 }
 
-static int nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc, bool spin)
-{
-	struct nvme_ns_head *head = q->queuedata;
-	struct nvme_ns *ns;
-	int found = 0;
-	int srcu_idx;
-
-	srcu_idx = srcu_read_lock(&head->srcu);
-	ns = srcu_dereference(head->current_path[numa_node_id()], &head->srcu);
-	if (likely(ns && nvme_path_is_optimized(ns)))
-		found = ns->queue->poll_fn(q, qc, spin);
-	srcu_read_unlock(&head->srcu, srcu_idx);
-	return found;
-}
-
 static void nvme_requeue_work(struct work_struct *work)
 {
 	struct nvme_ns_head *head =
@@ -281,7 +266,6 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 		goto out;
 	q->queuedata = head;
 	blk_queue_make_request(q, nvme_ns_head_make_request);
-	q->poll_fn = nvme_ns_head_poll;
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
 	/* set to a default value for 512 until disk is validated */
 	blk_queue_logical_block_size(q, 512);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 11/13] block: remove ->poll_fn
  2018-11-29 19:12 ` Christoph Hellwig
@ 2018-11-29 19:13   ` Christoph Hellwig
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-29 19:13 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

This was intended to support users like nvme multipath, but is just
getting in the way and adding another indirect call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c       | 23 -----------------------
 block/blk-mq.c         | 24 +++++++++++++++++++-----
 include/linux/blkdev.h |  2 --
 3 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3f6f5e6c2fe4..942276399085 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1249,29 +1249,6 @@ blk_qc_t submit_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(submit_bio);
 
-/**
- * blk_poll - poll for IO completions
- * @q:  the queue
- * @cookie: cookie passed back at IO submission time
- * @spin: whether to spin for completions
- *
- * Description:
- *    Poll for completions on the passed in queue. Returns number of
- *    completed entries found. If @spin is true, then blk_poll will continue
- *    looping until at least one completion is found, unless the task is
- *    otherwise marked running (or we need to reschedule).
- */
-int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
-{
-	if (!q->poll_fn || !blk_qc_t_valid(cookie))
-		return 0;
-
-	if (current->plug)
-		blk_flush_plug_list(current->plug, false);
-	return q->poll_fn(q, cookie, spin);
-}
-EXPORT_SYMBOL_GPL(blk_poll);
-
 /**
  * blk_cloned_rq_check_limits - Helper function to check a cloned request
  *                              for new the queue limits
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7dcef565dc0f..9c90c5038d07 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -38,7 +38,6 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
-static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 
@@ -2823,8 +2822,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	spin_lock_init(&q->requeue_lock);
 
 	blk_queue_make_request(q, blk_mq_make_request);
-	if (q->mq_ops->poll)
-		q->poll_fn = blk_mq_poll;
 
 	/*
 	 * Do this after blk_queue_make_request() overrides it...
@@ -3385,14 +3382,30 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
 }
 
-static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+/**
+ * blk_poll - poll for IO completions
+ * @q:  the queue
+ * @cookie: cookie passed back at IO submission time
+ * @spin: whether to spin for completions
+ *
+ * Description:
+ *    Poll for completions on the passed in queue. Returns number of
+ *    completed entries found. If @spin is true, then blk_poll will continue
+ *    looping until at least one completion is found, unless the task is
+ *    otherwise marked running (or we need to reschedule).
+ */
+int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	struct blk_mq_hw_ctx *hctx;
 	long state;
 
-	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+	if (!blk_qc_t_valid(cookie) ||
+	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
 		return 0;
 
+	if (current->plug)
+		blk_flush_plug_list(current->plug, false);
+
 	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
 
 	/*
@@ -3433,6 +3446,7 @@ static int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 	__set_current_state(TASK_RUNNING);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(blk_poll);
 
 unsigned int blk_mq_rq_cpu(struct request *rq)
 {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 08d940f85fa0..0b3874bdbc6a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -283,7 +283,6 @@ static inline unsigned short req_get_ioprio(struct request *req)
 struct blk_queue_ctx;
 
 typedef blk_qc_t (make_request_fn) (struct request_queue *q, struct bio *bio);
-typedef int (poll_q_fn) (struct request_queue *q, blk_qc_t, bool spin);
 
 struct bio_vec;
 typedef int (dma_drain_needed_fn)(struct request *);
@@ -401,7 +400,6 @@ struct request_queue {
 	struct rq_qos		*rq_qos;
 
 	make_request_fn		*make_request_fn;
-	poll_q_fn		*poll_fn;
 	dma_drain_needed_fn	*dma_drain_needed;
 
 	const struct blk_mq_ops	*mq_ops;
-- 
2.19.1
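
For reference, a rough sketch of how a synchronous caller is expected to
drive blk_poll() after this change; the bio/bdev setup and the bi_private
convention below are illustrative assumptions, not part of this patch:

/*
 * Sketch only: submit one bio and poll for its completion.  Assumes the
 * bio's end_io handler clears bio->bi_private when it runs.
 */
static int submit_bio_and_poll(struct block_device *bdev, struct bio *bio)
{
	blk_qc_t qc = submit_bio(bio);

	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (!READ_ONCE(bio->bi_private))
			break;
		/* spin in blk_poll(); sleep only if nothing was found */
		if (!blk_poll(bdev_get_queue(bdev), qc, true))
			io_schedule();
	}
	__set_current_state(TASK_RUNNING);
	return blk_status_to_errno(bio->bi_status);
}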


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 12/13] block: only allow polling if a poll queue_map exists
  2018-11-29 19:12 ` Christoph Hellwig
@ 2018-11-29 19:13   ` Christoph Hellwig
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-29 19:13 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

This avoids having to have different mq_ops for different setups
with or without poll queues.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-sysfs.c       |  2 +-
 drivers/nvme/host/pci.c | 29 +++++++++--------------------
 2 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9f0cb370b39b..bb7c642ffefa 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -402,7 +402,7 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page,
 	unsigned long poll_on;
 	ssize_t ret;
 
-	if (!q->mq_ops || !q->mq_ops->poll)
+	if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL)
 		return -EINVAL;
 
 	ret = queue_var_store(&poll_on, page, count);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d43925fba560..3ecc0bf75a62 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1600,22 +1600,15 @@ static const struct blk_mq_ops nvme_mq_admin_ops = {
 	.timeout	= nvme_timeout,
 };
 
-#define NVME_SHARED_MQ_OPS					\
-	.queue_rq		= nvme_queue_rq,		\
-	.commit_rqs		= nvme_commit_rqs,		\
-	.complete		= nvme_pci_complete_rq,		\
-	.init_hctx		= nvme_init_hctx,		\
-	.init_request		= nvme_init_request,		\
-	.map_queues		= nvme_pci_map_queues,		\
-	.timeout		= nvme_timeout			\
-
 static const struct blk_mq_ops nvme_mq_ops = {
-	NVME_SHARED_MQ_OPS,
-};
-
-static const struct blk_mq_ops nvme_mq_poll_ops = {
-	NVME_SHARED_MQ_OPS,
-	.poll			= nvme_poll,
+	.queue_rq	= nvme_queue_rq,
+	.complete	= nvme_pci_complete_rq,
+	.commit_rqs	= nvme_commit_rqs,
+	.init_hctx	= nvme_init_hctx,
+	.init_request	= nvme_init_request,
+	.map_queues	= nvme_pci_map_queues,
+	.timeout	= nvme_timeout,
+	.poll		= nvme_poll,
 };
 
 static void nvme_dev_remove_admin(struct nvme_dev *dev)
@@ -2302,11 +2295,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	int ret;
 
 	if (!dev->ctrl.tagset) {
-		if (dev->io_queues[HCTX_TYPE_POLL])
-			dev->tagset.ops = &nvme_mq_poll_ops;
-		else
-			dev->tagset.ops = &nvme_mq_ops;
-
+		dev->tagset.ops = &nvme_mq_ops;
 		dev->tagset.nr_hw_queues = dev->online_queues - 1;
 		dev->tagset.nr_maps = HCTX_MAX_TYPES;
 		dev->tagset.timeout = NVME_IO_TIMEOUT;
-- 
2.19.1
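
For context, "a poll queue_map exists" boils down to the driver setting
nr_maps past HCTX_TYPE_POLL and filling in that map from its ->map_queues
callback.  A rough sketch of the driver side (the foo_ prefix and the
poll_queues count are illustrative, not taken from this patch):

static int foo_map_queues(struct blk_mq_tag_set *set)
{
	struct blk_mq_queue_map *def = &set->map[HCTX_TYPE_DEFAULT];
	struct blk_mq_queue_map *poll = &set->map[HCTX_TYPE_POLL];

	/* everything not polled goes to the default map */
	def->nr_queues = set->nr_hw_queues - poll_queues;
	def->queue_offset = 0;
	blk_mq_map_queues(def);

	/* reserve the last poll_queues hardware queues for polled I/O */
	poll->nr_queues = poll_queues;
	poll->queue_offset = def->nr_queues;
	blk_mq_map_queues(poll);
	return 0;
}

together with set->nr_maps = HCTX_MAX_TYPES, as nvme-pci already does above.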


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 13/13] block: enable polling by default if a poll map is initialized
  2018-11-29 19:12 ` Christoph Hellwig
@ 2018-11-29 19:13   ` Christoph Hellwig
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-29 19:13 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

If the user did set up polling in the driver we should not require
another knob in the block layer to enable it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c90c5038d07..a550a00ac00c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2811,6 +2811,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	q->tag_set = set;
 
 	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
+	if (set->nr_maps > HCTX_TYPE_POLL)
+		blk_queue_flag_set(QUEUE_FLAG_POLL, q);
 
 	if (!(set->flags & BLK_MQ_F_SG_MERGE))
 		blk_queue_flag_set(QUEUE_FLAG_NO_SG_MERGE, q);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/13] block: move queues types to the block layer
  2018-11-29 19:12   ` Christoph Hellwig
@ 2018-11-29 19:50     ` Jens Axboe
  -1 siblings, 0 replies; 62+ messages in thread
From: Jens Axboe @ 2018-11-29 19:50 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch, Sagi Grimberg
  Cc: Max Gurtovoy, linux-nvme, linux-block

On 11/29/18 12:12 PM, Christoph Hellwig wrote:
> Having another indirect all in the fast path doesn't really help
> in our post-spectre world.  Also having too many queue type is just
> going to create confusion, so I'd rather manage them centrally.
> 
> Note that the queue type naming and ordering changes a bit - the
> first index now is the defauly queue for everything not explicitly
                         ^^^^^^^

default

> marked, the optional ones are read and poll queues.

Looks fine to me, was hoping NOT to bring this into the core, but
I guess it might be more manageable there in the long run. And it's
hard to argue with getting rid of the flags_to_type indirect.

Side note, should probably make the sysfs 'type' attribute a string
at this point.
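
Something like the sketch below would do it; only the enum names come from
patch 01, the attribute helper and the hctx->type field name are assumed
from the existing blk-mq-sysfs conventions:

static const char *const hctx_type_name[] = {
	[HCTX_TYPE_DEFAULT]	= "default",
	[HCTX_TYPE_READ]	= "read",
	[HCTX_TYPE_POLL]	= "poll",
};

static ssize_t blk_mq_hw_sysfs_type_show(struct blk_mq_hw_ctx *hctx, char *page)
{
	return sprintf(page, "%s\n", hctx_type_name[hctx->type]);
}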

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/13] block: move queues types to the block layer
  2018-11-29 19:12   ` Christoph Hellwig
@ 2018-11-29 20:19     ` Keith Busch
  -1 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2018-11-29 20:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote:
> +enum hctx_type {
> +	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
> +	HCTX_TYPE_READ,		/* just for READ I/O */
> +	HCTX_TYPE_POLL,		/* polled I/O of any kind */
> +
> +	HCTX_MAX_TYPES,
>  };

Well, there goes my plan to use this with Weighted-Round-Robin NVMe IO
queues!

I'm not that sad about it, though.

Reviewed-by: Keith Busch <keith.busch@intel.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/13] nvme-pci: use atomic bitops to mark a queue enabled
  2018-11-29 19:12   ` Christoph Hellwig
@ 2018-11-29 20:19     ` Keith Busch
  -1 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2018-11-29 20:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On Thu, Nov 29, 2018 at 08:12:59PM +0100, Christoph Hellwig wrote:
> This gets rid of all the messing with cq_vector and the ->polled field
> by using an atomic bitop to mark the queue enabled or not.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Keith Busch <keith.busch@intel.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/13] nvme-pci: cleanup SQ allocation a bit
  2018-11-29 19:13   ` Christoph Hellwig
@ 2018-11-29 20:22     ` Keith Busch
  -1 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2018-11-29 20:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On Thu, Nov 29, 2018 at 08:13:00PM +0100, Christoph Hellwig wrote:
> Use a bit flag to mark if the SQ was allocated from the CMB, and clean
> up the surrounding code a bit.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Keith Busch <keith.busch@intel.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/13] block: move queues types to the block layer
  2018-11-29 20:19     ` Keith Busch
@ 2018-11-29 20:25       ` Jens Axboe
  -1 siblings, 0 replies; 62+ messages in thread
From: Jens Axboe @ 2018-11-29 20:25 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig
  Cc: Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On 11/29/18 1:19 PM, Keith Busch wrote:
> On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote:
>> +enum hctx_type {
>> +	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
>> +	HCTX_TYPE_READ,		/* just for READ I/O */
>> +	HCTX_TYPE_POLL,		/* polled I/O of any kind */
>> +
>> +	HCTX_MAX_TYPES,
>>  };
> 
> Well, there goes my plan to use this with Weighted-Round-Robin NVMe IO
> queues!
> 
> I'm not that sad about it, though.

That's why I wanted these to be driver private, so you could just expand
at will. But it's not like we can't do that now, we'll just add some
extra types in here. The downside is that if we have a bunch of drivers
having disparate queues, then we end up with a bigger MAX number than
we would have with driver private queues.

IOW, I don't think this is ruining anything for you.
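
Concretely, "add some extra types" would just mean growing the shared enum
and teaching the driver's ->map_queues about the new maps; the WRR entries
below are hypothetical, only the first three exist in patch 01:

enum hctx_type {
	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
	HCTX_TYPE_READ,		/* just for READ I/O */
	HCTX_TYPE_POLL,		/* polled I/O of any kind */

	/* hypothetical additions for weighted round robin queues */
	HCTX_TYPE_WRR_HIGH,
	HCTX_TYPE_WRR_MEDIUM,
	HCTX_TYPE_WRR_LOW,

	HCTX_MAX_TYPES,
};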

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues
  2018-11-29 19:13   ` Christoph Hellwig
@ 2018-11-29 20:36     ` Keith Busch
  -1 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2018-11-29 20:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote:
> This is the last place outside of nvme_irq that handles CQEs from
> interrupt context, and thus is in the way of removing the cq_lock for
> normal queues, and avoiding lockdep warnings on the poll queues, for
> which we already take it without IRQ disabling.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/nvme/host/pci.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 9ceba9900ca3..fb8db7d8170a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -203,6 +203,7 @@ struct nvme_queue {
>  	unsigned long flags;
>  #define NVMEQ_ENABLED		0
>  #define NVMEQ_SQ_CMB		1
> +#define NVMEQ_DELETE_ERROR	2
>  	u32 *dbbuf_sq_db;
>  	u32 *dbbuf_cq_db;
>  	u32 *dbbuf_sq_ei;
> @@ -2216,7 +2217,7 @@ static void nvme_del_cq_end(struct request *req, blk_status_t error)
>  	struct nvme_queue *nvmeq = req->end_io_data;
>  
>  	if (!error)
> -		nvme_poll_irqdisable(nvmeq, -1);
> +		set_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags);
>  
>  	nvme_del_queue_end(req, error);
>  }
> @@ -2258,11 +2259,20 @@ static bool nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode)
>  		nr_queues--;
>  		sent++;
>  	}
> -	while (sent--) {
> +	while (sent) {
> +		struct nvme_queue *nvmeq = &dev->queues[nr_queues + sent];
> +
>  		timeout = wait_for_completion_io_timeout(&dev->ioq_wait,
>  				timeout);
>  		if (timeout == 0)
>  			return false;
> +
> +		/* handle any remaining CQEs */
> +		if (opcode == nvme_admin_delete_cq &&
> +		    !test_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags))
> +			nvme_poll_irqdisable(nvmeq, -1);

We're dispatching lots of queue deletions in parallel, and they may
complete in any order. I don't see how you can guarantee that the
wait_for_completion() will return for the nvmeq that you're polling.

You also need to clear NVMEQ_DELETE_ERROR somewhere later, maybe in
nvme_init_queue().

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 06/13] nvme-pci: refactor nvme_disable_io_queues
  2018-11-29 19:13   ` Christoph Hellwig
@ 2018-11-29 20:37     ` Keith Busch
  -1 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2018-11-29 20:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On Thu, Nov 29, 2018 at 08:13:03PM +0100, Christoph Hellwig wrote:
> Pass the opcode for the delete SQ/CQ command as an argument instead of
> the somewhat confusing pass loop.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Keith Busch <keith.busch@intel.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 08/13] nvme-pci: remove the CQ lock for interrupt driven queues
  2018-11-29 19:13   ` Christoph Hellwig
@ 2018-11-29 21:08     ` Keith Busch
  -1 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2018-11-29 21:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On Thu, Nov 29, 2018 at 08:13:05PM +0100, Christoph Hellwig wrote:
> @@ -1050,12 +1051,16 @@ static irqreturn_t nvme_irq(int irq, void *data)
>  	irqreturn_t ret = IRQ_NONE;
>  	u16 start, end;
>  
> -	spin_lock(&nvmeq->cq_lock);
> +	/*
> +	 * The rmb/wmb pair ensures we see all updates from a previous run of
> +	 * the irq handler, even if that was on another CPU.
> +	 */
> +	rmb();
>  	if (nvmeq->cq_head != nvmeq->last_cq_head)
>  		ret = IRQ_HANDLED;
>  	nvme_process_cq(nvmeq, &start, &end, -1);
>  	nvmeq->last_cq_head = nvmeq->cq_head;
> -	spin_unlock(&nvmeq->cq_lock);
> +	wmb();
>  
>  	if (start != end) {
>  		nvme_complete_cqes(nvmeq, start, end);

We saved the "start, end" only so we could do the real completion
without holding a queue lock. Since you're not using a lock anymore,
a further optimization can complete the CQE inline with moving the cq
head so that we don't go through queue twice.
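
A rough sketch of that inline variant, with helper names assumed from the
existing driver (nvme_handle_cqe/nvme_update_cq_head/nvme_ring_cq_doorbell):

static int nvme_process_cq_inline(struct nvme_queue *nvmeq, unsigned int tag)
{
	int found = 0;

	while (nvme_cqe_pending(nvmeq)) {
		if (tag == -1U || nvmeq->cqes[nvmeq->cq_head].command_id == tag)
			found++;
		/* complete the request right here instead of recording start/end */
		nvme_handle_cqe(nvmeq, nvmeq->cq_head);
		nvme_update_cq_head(nvmeq);
	}
	if (found)
		nvme_ring_cq_doorbell(nvmeq);
	return found;
}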

That can be a follow on, though, this patch looks fine.

Reviewed-by: Keith Busch <keith.busch@intel.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/13] block: move queues types to the block layer
  2018-11-29 19:50     ` Jens Axboe
@ 2018-11-30  7:56       ` Christoph Hellwig
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-30  7:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, Max Gurtovoy,
	linux-nvme, linux-block

On Thu, Nov 29, 2018 at 07:50:09PM +0000, Jens Axboe wrote:
> > in our post-spectre world.  Also having too many queue type is just
> > going to create confusion, so I'd rather manage them centrally.
> > 
> > Note that the queue type naming and ordering changes a bit - the
> > first index now is the defauly queue for everything not explicitly
>                          ^^^^^^^
> 
> default

Fixed.

> Side note, should probably make the sysfs 'type' attribute a string
> at this point.

Fixed as well.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/13] block: move queues types to the block layer
  2018-11-29 20:19     ` Keith Busch
@ 2018-11-30  8:00       ` Christoph Hellwig
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-30  8:00 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, Max Gurtovoy,
	linux-nvme, linux-block

On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote:
> On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote:
> > +enum hctx_type {
> > +	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
> > +	HCTX_TYPE_READ,		/* just for READ I/O */
> > +	HCTX_TYPE_POLL,		/* polled I/O of any kind */
> > +
> > +	HCTX_MAX_TYPES,
> >  };
> 
> Well, there goes my plan to use this with Weighted-Round-Robin NVMe IO
> queues!

So between what do you even want to round robin?  If it is between
reads and writes that's easy.  If we want priority reads or writes
(separate from polling) that's also still fairly easy.

Btw, one thing I wanted to try once I get hold of the right hardware
is to mark the poll queues as priority queues and see if that makes
any difference in poll IOPS/latency.
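
A minimal sketch of that experiment (the helper is hypothetical; the
NVME_SQ_PRIO_* flags are the existing Create SQ priority bits that
adapter_alloc_sq would pass in the command, and they only matter once the
controller is enabled with weighted round robin arbitration):

static int nvme_sq_flags(struct nvme_queue *nvmeq)
{
	int flags = NVME_QUEUE_PHYS_CONTIG;

	if (nvmeq->cq_vector == -1)		/* poll queue */
		flags |= NVME_SQ_PRIO_HIGH;
	else
		flags |= NVME_SQ_PRIO_MEDIUM;
	return flags;
}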

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues
  2018-11-29 20:36     ` Keith Busch
@ 2018-11-30  8:08       ` Christoph Hellwig
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-30  8:08 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, Max Gurtovoy,
	linux-nvme, linux-block

On Thu, Nov 29, 2018 at 01:36:32PM -0700, Keith Busch wrote:
> On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote:
> > This is the last place outside of nvme_irq that handles CQEs from
> > interrupt context, and thus is in the way of removing the cq_lock for
> > normal queues, and avoiding lockdep warnings on the poll queues, for
> > which we already take it without IRQ disabling.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  drivers/nvme/host/pci.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 9ceba9900ca3..fb8db7d8170a 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -203,6 +203,7 @@ struct nvme_queue {
> >  	unsigned long flags;
> >  #define NVMEQ_ENABLED		0
> >  #define NVMEQ_SQ_CMB		1
> > +#define NVMEQ_DELETE_ERROR	2
> >  	u32 *dbbuf_sq_db;
> >  	u32 *dbbuf_cq_db;
> >  	u32 *dbbuf_sq_ei;
> > @@ -2216,7 +2217,7 @@ static void nvme_del_cq_end(struct request *req, blk_status_t error)
> >  	struct nvme_queue *nvmeq = req->end_io_data;
> >  
> >  	if (!error)
> > -		nvme_poll_irqdisable(nvmeq, -1);
> > +		set_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags);
> >  
> >  	nvme_del_queue_end(req, error);
> >  }
> > @@ -2258,11 +2259,20 @@ static bool nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode)
> >  		nr_queues--;
> >  		sent++;
> >  	}
> > -	while (sent--) {
> > +	while (sent) {
> > +		struct nvme_queue *nvmeq = &dev->queues[nr_queues + sent];
> > +
> >  		timeout = wait_for_completion_io_timeout(&dev->ioq_wait,
> >  				timeout);
> >  		if (timeout == 0)
> >  			return false;
> > +
> > +		/* handle any remaining CQEs */
> > +		if (opcode == nvme_admin_delete_cq &&
> > +		    !test_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags))
> > +			nvme_poll_irqdisable(nvmeq, -1);
> 
> We're dispatching lots of queue deletions in parallel, and they may
> complete in any order. I don't see how you can guarantee that the
> wait_for_completion() will return for the nvmeq that you're polling.

True.  I thought about moving the completion to the queue so that
we have one completion per queue, and I should have done that after
all.  Not sure how I got the idea that not doing it is fine.

> You also need to clear NVMEQ_DELETE_ERROR somewhere later, maybe in
> nvme_init_queue().

Indeed.
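
For the record, a rough sketch of that per-queue completion variant
(delete_done is an assumed new field in struct nvme_queue, completed from
nvme_del_queue_end instead of the shared dev->ioq_wait, and
NVMEQ_DELETE_ERROR would be cleared again in nvme_init_queue):

	while (sent) {
		struct nvme_queue *nvmeq = &dev->queues[nr_queues + sent];

		timeout = wait_for_completion_io_timeout(&nvmeq->delete_done,
				timeout);
		if (timeout == 0)
			return false;

		/* we waited on this exact queue, so poll exactly this one */
		if (opcode == nvme_admin_delete_cq &&
		    !test_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags))
			nvme_poll_irqdisable(nvmeq, -1);

		sent--;
		if (nr_queues)
			goto retry;
	}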

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues
@ 2018-11-30  8:08       ` Christoph Hellwig
  0 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-30  8:08 UTC (permalink / raw)


On Thu, Nov 29, 2018@01:36:32PM -0700, Keith Busch wrote:
> On Thu, Nov 29, 2018@08:13:04PM +0100, Christoph Hellwig wrote:
> > This is the last place outside of nvme_irq that handles CQEs from
> > interrupt context, and thus is in the way of removing the cq_lock for
> > normal queues, and avoiding lockdep warnings on the poll queues, for
> > which we already take it without IRQ disabling.
> > 
> > Signed-off-by: Christoph Hellwig <hch at lst.de>
> > ---
> >  drivers/nvme/host/pci.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 9ceba9900ca3..fb8db7d8170a 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -203,6 +203,7 @@ struct nvme_queue {
> >  	unsigned long flags;
> >  #define NVMEQ_ENABLED		0
> >  #define NVMEQ_SQ_CMB		1
> > +#define NVMEQ_DELETE_ERROR	2
> >  	u32 *dbbuf_sq_db;
> >  	u32 *dbbuf_cq_db;
> >  	u32 *dbbuf_sq_ei;
> > @@ -2216,7 +2217,7 @@ static void nvme_del_cq_end(struct request *req, blk_status_t error)
> >  	struct nvme_queue *nvmeq = req->end_io_data;
> >  
> >  	if (!error)
> > -		nvme_poll_irqdisable(nvmeq, -1);
> > +		set_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags);
> >  
> >  	nvme_del_queue_end(req, error);
> >  }
> > @@ -2258,11 +2259,20 @@ static bool nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode)
> >  		nr_queues--;
> >  		sent++;
> >  	}
> > -	while (sent--) {
> > +	while (sent) {
> > +		struct nvme_queue *nvmeq = &dev->queues[nr_queues + sent];
> > +
> >  		timeout = wait_for_completion_io_timeout(&dev->ioq_wait,
> >  				timeout);
> >  		if (timeout == 0)
> >  			return false;
> > +
> > +		/* handle any remaining CQEs */
> > +		if (opcode == nvme_admin_delete_cq &&
> > +		    !test_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags))
> > +			nvme_poll_irqdisable(nvmeq, -1);
> 
> We're dispatching lots of queue deletions in parallel, and they may
> complete in any order. I don't see how you can guarantee that the
> wait_for_completion() will return for the nvmeq that you're polling.

True.  I thought about moving the completion to the queue so that
we have one completion per queue, and I should have done that after
all.  Not sure how I got the idea that not doing it is fine.

> You also need to clear NVMEQ_DELETE_ERROR somewhere later, maybe in
> nvme_init_queue().

Indeed.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 08/13] nvme-pci: remove the CQ lock for interrupt driven queues
  2018-11-29 21:08     ` Keith Busch
@ 2018-11-30  8:16       ` Christoph Hellwig
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-30  8:16 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, Max Gurtovoy,
	linux-nvme, linux-block

On Thu, Nov 29, 2018 at 02:08:40PM -0700, Keith Busch wrote:
> On Thu, Nov 29, 2018 at 08:13:05PM +0100, Christoph Hellwig wrote:
> > @@ -1050,12 +1051,16 @@ static irqreturn_t nvme_irq(int irq, void *data)
> >  	irqreturn_t ret = IRQ_NONE;
> >  	u16 start, end;
> >  
> > -	spin_lock(&nvmeq->cq_lock);
> > +	/*
> > +	 * The rmb/wmb pair ensures we see all updates from a previous run of
> > +	 * the irq handler, even if that was on another CPU.
> > +	 */
> > +	rmb();
> >  	if (nvmeq->cq_head != nvmeq->last_cq_head)
> >  		ret = IRQ_HANDLED;
> >  	nvme_process_cq(nvmeq, &start, &end, -1);
> >  	nvmeq->last_cq_head = nvmeq->cq_head;
> > -	spin_unlock(&nvmeq->cq_lock);
> > +	wmb();
> >  
> >  	if (start != end) {
> >  		nvme_complete_cqes(nvmeq, start, end);
> 
> We saved the "start, end" only so we could do the real completion
> without holding a queue lock. Since you're not using a lock anymore,
> a further optimization can complete the CQE inline with moving the cq
> head so that we don't go through the queue twice.
> 
> That can be a follow-on, though; this patch looks fine.

We still hold the lock for the polling case.
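
For the interrupt path, what nvme_irq might look like with the inline
completion Keith suggests is sketched below (assumed helper names for the
CQE handling; an illustration of the idea, not a proposed patch):

	/*
	 * Sketch: handle each CQE while advancing the head instead of
	 * recording a start/end window and walking the CQ a second time.
	 */
	static irqreturn_t nvme_irq(int irq, void *data)
	{
		struct nvme_queue *nvmeq = data;
		irqreturn_t ret = IRQ_NONE;

		rmb();	/* see updates from a previous run on another CPU */
		while (nvme_cqe_pending(nvmeq)) {
			nvme_handle_cqe(nvmeq, nvmeq->cq_head);
			nvme_update_cq_head(nvmeq);
			ret = IRQ_HANDLED;
		}
		if (ret == IRQ_HANDLED)
			nvme_ring_cq_doorbell(nvmeq);
		wmb();	/* publish head/phase updates for the next run */
		return ret;
	}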

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 08/13] nvme-pci: remove the CQ lock for interrupt driven queues
@ 2018-11-30  8:16       ` Christoph Hellwig
  0 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-30  8:16 UTC (permalink / raw)


On Thu, Nov 29, 2018@02:08:40PM -0700, Keith Busch wrote:
> On Thu, Nov 29, 2018@08:13:05PM +0100, Christoph Hellwig wrote:
> > @@ -1050,12 +1051,16 @@ static irqreturn_t nvme_irq(int irq, void *data)
> >  	irqreturn_t ret = IRQ_NONE;
> >  	u16 start, end;
> >  
> > -	spin_lock(&nvmeq->cq_lock);
> > +	/*
> > +	 * The rmb/wmb pair ensures we see all updates from a previous run of
> > +	 * the irq handler, even if that was on another CPU.
> > +	 */
> > +	rmb();
> >  	if (nvmeq->cq_head != nvmeq->last_cq_head)
> >  		ret = IRQ_HANDLED;
> >  	nvme_process_cq(nvmeq, &start, &end, -1);
> >  	nvmeq->last_cq_head = nvmeq->cq_head;
> > -	spin_unlock(&nvmeq->cq_lock);
> > +	wmb();
> >  
> >  	if (start != end) {
> >  		nvme_complete_cqes(nvmeq, start, end);
> 
> We saved the "start, end" only so we could do the real completion
> without holding a queue lock. Since you're not using a lock anymore,
> a further optimization can complete the CQE inline with moving the cq
> head so that we don't go through the queue twice.
> 
> That can be a follow-on, though; this patch looks fine.

We still hold the lock for the polling case.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/13] block: move queues types to the block layer
  2018-11-30  8:00       ` Christoph Hellwig
@ 2018-11-30 14:40         ` Keith Busch
  -1 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2018-11-30 14:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On Fri, Nov 30, 2018 at 12:00:13AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote:
> > On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote:
> > > +enum hctx_type {
> > > +	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
> > > +	HCTX_TYPE_READ,		/* just for READ I/O */
> > > +	HCTX_TYPE_POLL,		/* polled I/O of any kind */
> > > +
> > > +	HCTX_MAX_TYPES,
> > >  };
> > 
> > Well, there goes my plan to use this with Weighted-Round-Robin NVMe IO
> > queues!
> 
> So between what do you even want to round robin?  If it is between
> reads and writes that's easy.  If we want priority reads or writes
> (separate from polling) that's also still fairly easy.

I was considering the IOPRIO_PRIO_CLASS. There are four of them, which
may roughly correspond to the four NVMe IO queue weights. Maybe even
throw HIPRI flagged IOs in with the RT class.

 
> Btw, one thing I wanted to try once I get hold of the right hardware
> is to mark the poll queues as priority queues and see if that makes
> any difference in poll IOPS/latency.

I doubt it will make much difference in IOPS, but should improve latency
on hipri IOs at the expense of normal IO since hipri will be fetched
ahead during command arbitration.
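
Purely as an illustration of that mapping (the WRR queue types below are
made up for the sketch; only DEFAULT/READ/POLL exist in this series):

	/* hypothetical: route I/O priority classes to WRR-weighted queue sets */
	static enum hctx_type ioprio_class_to_hctx_type(struct bio *bio)
	{
		switch (IOPRIO_PRIO_CLASS(bio_prio(bio))) {
		case IOPRIO_CLASS_RT:
			return HCTX_TYPE_WRR_URGENT;	/* assumed */
		case IOPRIO_CLASS_BE:
			return HCTX_TYPE_WRR_HIGH;	/* assumed */
		case IOPRIO_CLASS_IDLE:
			return HCTX_TYPE_WRR_LOW;	/* assumed */
		case IOPRIO_CLASS_NONE:
		default:
			return HCTX_TYPE_DEFAULT;
		}
	}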

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 01/13] block: move queues types to the block layer
@ 2018-11-30 14:40         ` Keith Busch
  0 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2018-11-30 14:40 UTC (permalink / raw)


On Fri, Nov 30, 2018@12:00:13AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018@01:19:14PM -0700, Keith Busch wrote:
> > On Thu, Nov 29, 2018@08:12:58PM +0100, Christoph Hellwig wrote:
> > > +enum hctx_type {
> > > +	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
> > > +	HCTX_TYPE_READ,		/* just for READ I/O */
> > > +	HCTX_TYPE_POLL,		/* polled I/O of any kind */
> > > +
> > > +	HCTX_MAX_TYPES,
> > >  };
> > 
> > Well, there goes my plan to use this with Weighted-Round-Robin NVMe IO
> > queues!
> 
> So between what do you even want to round robin?  If it is between
> reads and writes that's easy.  If we want priority reads or writes
> (separate from polling) that's also still fairly easy.

I was considering the IOPRIO_PRIO_CLASS. There are four of them, which
may roughly correspond to the four NVMe IO queue weights. Maybe even
throw HIPRI flagged IOs in with the RT class.

 
> Btw, one thing I wanted to try once I get hold of the right hardware
> is to mark the poll queues as priority queues and see if that makes
> any difference in poll IOPS/latency.

I doubt it will make much difference in IOPS, but should improve latency
on hipri IOs at the expense of normal IO since hipri will be fetched
ahead during command arbitration.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues
  2018-11-30  8:08       ` Christoph Hellwig
@ 2018-11-30 14:45         ` Keith Busch
  -1 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2018-11-30 14:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On Fri, Nov 30, 2018 at 12:08:09AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018 at 01:36:32PM -0700, Keith Busch wrote:
> > On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote:
> > > +
> > > +		/* handle any remaining CQEs */
> > > +		if (opcode == nvme_admin_delete_cq &&
> > > +		    !test_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags))
> > > +			nvme_poll_irqdisable(nvmeq, -1);
> > 
> > We're dispatching lots of queue deletions in parallel, and they may
> > complete in any order. I don't see how you can guarantee that the
> > wait_for_completion() will return for the nvmeq that you're polling.
> 
> True.  I thought about moving the completion to the queue so that
> we have one completion per queue, and I should have done that after
> all.  Not sure how I got the idea that not doing it is fine.

You may also move the completion polling in its own loop outside the
deletion loop.
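
Something along these lines, roughly (a sketch reusing the shared
dev->ioq_wait completion and the NVMEQ_DELETE_ERROR bit; i and nr_sent
are assumed locals, with nr_sent holding how many deletions were issued):

	/* wait for all deletions first, in whatever order they complete */
	while (sent) {
		timeout = wait_for_completion_io_timeout(&dev->ioq_wait,
				timeout);
		if (timeout == 0)
			return false;
		sent--;
	}

	/* then reap leftover CQEs queue by queue, outside the wait loop */
	if (opcode == nvme_admin_delete_cq) {
		for (i = 1; i <= nr_sent; i++) {
			struct nvme_queue *nvmeq = &dev->queues[nr_queues + i];

			if (!test_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags))
				nvme_poll_irqdisable(nvmeq, -1);
		}
	}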

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues
@ 2018-11-30 14:45         ` Keith Busch
  0 siblings, 0 replies; 62+ messages in thread
From: Keith Busch @ 2018-11-30 14:45 UTC (permalink / raw)


On Fri, Nov 30, 2018@12:08:09AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018@01:36:32PM -0700, Keith Busch wrote:
> > On Thu, Nov 29, 2018@08:13:04PM +0100, Christoph Hellwig wrote:
> > > +
> > > +		/* handle any remaining CQEs */
> > > +		if (opcode == nvme_admin_delete_cq &&
> > > +		    !test_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags))
> > > +			nvme_poll_irqdisable(nvmeq, -1);
> > 
> > We're dispatching lots of queue deletions in parallel, and they may
> > complete in any order. I don't see how you can guarantee that the
> > wait_for_completion() will return for the nvmeq that you're polling.
> 
> True.  I thought about moving the completion to the queue so that
> we have one completion per queue, and I should have done that after
> all.  Not sure how I got the idea that not doing it is fine.

You may also move the completion polling in its own loop outside the
deletion loop.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/13] block: move queues types to the block layer
  2018-11-30  8:00       ` Christoph Hellwig
@ 2018-11-30 15:20         ` Jens Axboe
  -1 siblings, 0 replies; 62+ messages in thread
From: Jens Axboe @ 2018-11-30 15:20 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On 11/30/18 1:00 AM, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote:
>> On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote:
>>> +enum hctx_type {
>>> +	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
>>> +	HCTX_TYPE_READ,		/* just for READ I/O */
>>> +	HCTX_TYPE_POLL,		/* polled I/O of any kind */
>>> +
>>> +	HCTX_MAX_TYPES,
>>>  };
>>
>> Well, there goes my plan to use this with Weighted-Round-Robin NVMe IO
>> queues!
> 
> So between what do you even want to round robin?  If it is between
> reads and writes that's easy.  If we want priority reads or writes
> (separate from polling) that's also still fairly easy.
> 
> Btw, one thing I wanted to try once I get hold of the right hardware
> is to mark the poll queues as priority queues and see if that makes
> any difference in poll IOPS/latency.

Probably not a lot, if anything. Only for heavily mixed cases I'd
suspect it to make a difference. I can run some tests with it.

And beware that I've seen weird queue priority issues, ala the one
fixed by:

commit 9abd68ef454c824bfd18629033367b4382b5f390 (tag: for-linus-20180511)
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue May 8 10:25:15 2018 -0600

    nvme: add quirk to force medium priority for SQ creation

So we need to be careful with enabling priorities, I suspect. Hopefully
that's a standalone case.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 01/13] block: move queues types to the block layer
@ 2018-11-30 15:20         ` Jens Axboe
  0 siblings, 0 replies; 62+ messages in thread
From: Jens Axboe @ 2018-11-30 15:20 UTC (permalink / raw)


On 11/30/18 1:00 AM, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018@01:19:14PM -0700, Keith Busch wrote:
>> On Thu, Nov 29, 2018@08:12:58PM +0100, Christoph Hellwig wrote:
>>> +enum hctx_type {
>>> +	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
>>> +	HCTX_TYPE_READ,		/* just for READ I/O */
>>> +	HCTX_TYPE_POLL,		/* polled I/O of any kind */
>>> +
>>> +	HCTX_MAX_TYPES,
>>>  };
>>
>> Well, there goes my plan to use this with Weighted-Round-Robin NVMe IO
>> queues!
> 
> So between what do you even want to round robin?  If it is between
> reads and writes that's easy.  If we want priority reads or writes
> (separate from polling) that's also still fairly easy.
> 
> Btw, one thing I wanted to try once I get hold of the right hardware
> is to mark the poll queues as priority queues and see if that makes
> any difference in poll IOPS/latency.

Probably not a lot, if anything. Only for heavily mixed cases I'd
suspect it to make a difference. I can run some tests with it.

And beware that I've seen weird queue priority issues, ala the one
fixed by:

commit 9abd68ef454c824bfd18629033367b4382b5f390 (tag: for-linus-20180511)
Author: Jens Axboe <axboe at kernel.dk>
Date:   Tue May 8 10:25:15 2018 -0600

    nvme: add quirk to force medium priority for SQ creation

So we need to be careful with enabling priorities, I suspect. Hopefully
that's a standalone case.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/13] block: move queues types to the block layer
  2018-11-30  7:56       ` Christoph Hellwig
@ 2018-11-30 15:20         ` Jens Axboe
  -1 siblings, 0 replies; 62+ messages in thread
From: Jens Axboe @ 2018-11-30 15:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Sagi Grimberg, Max Gurtovoy, linux-nvme, linux-block

On 11/30/18 12:56 AM, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018 at 07:50:09PM +0000, Jens Axboe wrote:
>>> in our post-spectre world.  Also having too many queue types is just
>>> going to create confusion, so I'd rather manage them centrally.
>>>
>>> Note that the queue type naming and ordering changes a bit - the
>>> first index now is the defauly queue for everything not explicitly
>>                          ^^^^^^^
>>
>> default
> 
> Fixed.
> 
>> Side note, should probably make the sysfs 'type' attribute a string
>> at this point.
> 
> Fixed as well.

Thanks - are you going to post a v3? Would like to get this staged.
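
For reference, the string 'type' attribute mentioned above could look
roughly like this (a sketch with assumed names, not the actual v3
change):

	static const char *const hctx_type_name[] = {
		[HCTX_TYPE_DEFAULT]	= "default",
		[HCTX_TYPE_READ]	= "read",
		[HCTX_TYPE_POLL]	= "poll",
	};

	static ssize_t blk_mq_hw_sysfs_type_show(struct blk_mq_hw_ctx *hctx,
			char *page)
	{
		return sprintf(page, "%s\n", hctx_type_name[hctx->type]);
	}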


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 01/13] block: move queues types to the block layer
@ 2018-11-30 15:20         ` Jens Axboe
  0 siblings, 0 replies; 62+ messages in thread
From: Jens Axboe @ 2018-11-30 15:20 UTC (permalink / raw)


On 11/30/18 12:56 AM, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018@07:50:09PM +0000, Jens Axboe wrote:
>>> in our post-spectre world.  Also having too many queue types is just
>>> going to create confusion, so I'd rather manage them centrally.
>>>
>>> Note that the queue type naming and ordering changes a bit - the
>>> first index now is the defauly queue for everything not explicitly
>>                          ^^^^^^^
>>
>> default
> 
> Fixed.
> 
>> Side note, should probably make the sysfs 'type' attribute a string
>> at this point.
> 
> Fixed as well.

Thanks - are you going to post a v3? Would like to get this staged.


-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/13] block: move queues types to the block layer
  2018-11-30 15:20         ` Jens Axboe
@ 2018-11-30 15:21           ` Christoph Hellwig
  -1 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-30 15:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, Max Gurtovoy,
	linux-nvme, linux-block

On Fri, Nov 30, 2018 at 03:20:51PM +0000, Jens Axboe wrote:
> Thanks - are you going to post a v3? Would like to get this staged.

Yes, will do.  Either late tonight or over the weekend.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 01/13] block: move queues types to the block layer
@ 2018-11-30 15:21           ` Christoph Hellwig
  0 siblings, 0 replies; 62+ messages in thread
From: Christoph Hellwig @ 2018-11-30 15:21 UTC (permalink / raw)


On Fri, Nov 30, 2018@03:20:51PM +0000, Jens Axboe wrote:
> Thanks - are you going to post a v3? Would like to get this staged.

Yes, will do.  Either late tonight or over the weekend.

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2018-11-30 15:55 UTC | newest]

Thread overview: 62+ messages
-- links below jump to the message on this page --
2018-11-29 19:12 block and nvme polling improvements V2 Christoph Hellwig
2018-11-29 19:12 ` Christoph Hellwig
2018-11-29 19:12 ` [PATCH 01/13] block: move queues types to the block layer Christoph Hellwig
2018-11-29 19:12   ` Christoph Hellwig
2018-11-29 19:50   ` Jens Axboe
2018-11-29 19:50     ` Jens Axboe
2018-11-30  7:56     ` Christoph Hellwig
2018-11-30  7:56       ` Christoph Hellwig
2018-11-30 15:20       ` Jens Axboe
2018-11-30 15:20         ` Jens Axboe
2018-11-30 15:21         ` Christoph Hellwig
2018-11-30 15:21           ` Christoph Hellwig
2018-11-29 20:19   ` Keith Busch
2018-11-29 20:19     ` Keith Busch
2018-11-29 20:25     ` Jens Axboe
2018-11-29 20:25       ` Jens Axboe
2018-11-30  8:00     ` Christoph Hellwig
2018-11-30  8:00       ` Christoph Hellwig
2018-11-30 14:40       ` Keith Busch
2018-11-30 14:40         ` Keith Busch
2018-11-30 15:20       ` Jens Axboe
2018-11-30 15:20         ` Jens Axboe
2018-11-29 19:12 ` [PATCH 02/13] nvme-pci: use atomic bitops to mark a queue enabled Christoph Hellwig
2018-11-29 19:12   ` Christoph Hellwig
2018-11-29 20:19   ` Keith Busch
2018-11-29 20:19     ` Keith Busch
2018-11-29 19:13 ` [PATCH 03/13] nvme-pci: cleanup SQ allocation a bit Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
2018-11-29 20:22   ` Keith Busch
2018-11-29 20:22     ` Keith Busch
2018-11-29 19:13 ` [PATCH 04/13] nvme-pci: only allow polling with separate poll queues Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
2018-11-29 19:13 ` [PATCH 05/13] nvme-pci: consolidate code for polling non-dedicated queues Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
2018-11-29 19:13 ` [PATCH 06/13] nvme-pci: refactor nvme_disable_io_queues Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
2018-11-29 20:37   ` Keith Busch
2018-11-29 20:37     ` Keith Busch
2018-11-29 19:13 ` [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
2018-11-29 20:36   ` Keith Busch
2018-11-29 20:36     ` Keith Busch
2018-11-30  8:08     ` Christoph Hellwig
2018-11-30  8:08       ` Christoph Hellwig
2018-11-30 14:45       ` Keith Busch
2018-11-30 14:45         ` Keith Busch
2018-11-29 19:13 ` [PATCH 08/13] nvme-pci: remove the CQ lock for interrupt driven queues Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
2018-11-29 21:08   ` Keith Busch
2018-11-29 21:08     ` Keith Busch
2018-11-30  8:16     ` Christoph Hellwig
2018-11-30  8:16       ` Christoph Hellwig
2018-11-29 19:13 ` [PATCH 09/13] nvme-rdma: remove I/O polling support Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
2018-11-29 19:13 ` [PATCH 10/13] nvme-mpath: " Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
2018-11-29 19:13 ` [PATCH 11/13] block: remove ->poll_fn Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
2018-11-29 19:13 ` [PATCH 12/13] block: only allow polling if a poll queue_map exists Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
2018-11-29 19:13 ` [PATCH 13/13] block: enable polling by default if a poll map is initalized Christoph Hellwig
2018-11-29 19:13   ` Christoph Hellwig
