* [PATCHSET v2 0/7] Improve nvme completion handling
@ 2018-05-18 14:52 Jens Axboe
  2018-05-18 14:52 ` [PATCH 1/7] nvme: mark the result argument to nvme_complete_async_event volatile Jens Axboe
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 14:52 UTC (permalink / raw)


This has been simmering here for a day, and works fine for me.
I think this is good to go into the nvme 4.18 branch.

Thanks, Jens

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/7] nvme: mark the result argument to nvme_complete_async_event volatile
  2018-05-18 14:52 [PATCHSET v2 0/7] Improve nvme completion handling Jens Axboe
@ 2018-05-18 14:52 ` Jens Axboe
  2018-05-18 14:52 ` [PATCH 2/7] nvme-pci: simplify nvme_cqe_valid Jens Axboe
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 14:52 UTC (permalink / raw)


From: Christoph Hellwig <hch@lst.de>

We'll need that in the PCIe driver soon, as we'll read it straight off
the CQ.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 2 +-
 drivers/nvme/host/nvme.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 99b857e5a7a9..c7c1298c6dc5 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3323,7 +3323,7 @@ static void nvme_fw_act_work(struct work_struct *work)
 }
 
 void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
-		union nvme_result *res)
+		volatile union nvme_result *res)
 {
 	u32 result = le32_to_cpu(res->u32);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 17d2f7cf3fed..de78eae52673 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -405,7 +405,7 @@ int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
 		bool send);
 
 void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
-		union nvme_result *res);
+		volatile union nvme_result *res);
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
-- 
2.7.4


* [PATCH 2/7] nvme-pci: simplify nvme_cqe_valid
  2018-05-18 14:52 [PATCHSET v2 0/7] Improve nvme completion handling Jens Axboe
  2018-05-18 14:52 ` [PATCH 1/7] nvme: mark the result argument to nvme_complete_async_event volatile Jens Axboe
@ 2018-05-18 14:52 ` Jens Axboe
  2018-05-18 20:49   ` Keith Busch
  2018-05-18 14:52 ` [PATCH 3/7] nvme-pci: remove cq check after submission Jens Axboe
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 14:52 UTC (permalink / raw)


From: Christoph Hellwig <hch@lst.de>

We always look at the current CQ head and phase, so don't pass these
as separate arguments, and rename the function to nvme_cqe_pending.

Signed-off-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Jens Axboe <axboe at kernel.dk>
---
 drivers/nvme/host/pci.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 17a0190bd88f..9e2510249fa2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -914,10 +914,10 @@ static void nvme_pci_complete_rq(struct request *req)
 }
 
 /* We read the CQE phase first to check if the rest of the entry is valid */
-static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head,
-		u16 phase)
+static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
 {
-	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
+	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
+			nvmeq->cq_phase;
 }
 
 static inline void nvme_ring_cq_doorbell(struct nvme_queue *nvmeq)
@@ -964,7 +964,7 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq,
 static inline bool nvme_read_cqe(struct nvme_queue *nvmeq,
 		struct nvme_completion *cqe)
 {
-	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase)) {
+	if (nvme_cqe_pending(nvmeq)) {
 		*cqe = nvmeq->cqes[nvmeq->cq_head];
 
 		if (++nvmeq->cq_head == nvmeq->q_depth) {
@@ -1005,7 +1005,7 @@ static irqreturn_t nvme_irq(int irq, void *data)
 static irqreturn_t nvme_irq_check(int irq, void *data)
 {
 	struct nvme_queue *nvmeq = data;
-	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase))
+	if (nvme_cqe_pending(nvmeq))
 		return IRQ_WAKE_THREAD;
 	return IRQ_NONE;
 }
@@ -1015,7 +1015,7 @@ static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 	struct nvme_completion cqe;
 	int found = 0, consumed = 0;
 
-	if (!nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase))
+	if (!nvme_cqe_pending(nvmeq))
 		return 0;
 
 	spin_lock_irq(&nvmeq->q_lock);
-- 
2.7.4


* [PATCH 3/7] nvme-pci: remove cq check after submission
  2018-05-18 14:52 [PATCHSET v2 0/7] Improve nvme completion handling Jens Axboe
  2018-05-18 14:52 ` [PATCH 1/7] nvme: mark the result argument to nvme_complete_async_event volatile Jens Axboe
  2018-05-18 14:52 ` [PATCH 2/7] nvme-pci: simplify nvme_cqe_valid Jens Axboe
@ 2018-05-18 14:52 ` Jens Axboe
  2018-05-18 14:52 ` [PATCH 4/7] nvme-pci: move ->cq_vector == -1 check outside of ->q_lock Jens Axboe
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 14:52 UTC (permalink / raw)


We always check the completion queue after submitting, but in my testing
this isn't a win even on DRAM/xpoint devices. In some cases it's
actually worse. Kill it.

Signed-off-by: Jens Axboe <axboe at kernel.dk>
Signed-off-by: Keith Busch <keith.busch at intel.com>
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9e2510249fa2..9b32a67982e8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -68,7 +68,6 @@ MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
 struct nvme_dev;
 struct nvme_queue;
 
-static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 
 /*
@@ -895,7 +894,6 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 		goto out_cleanup_iod;
 	}
 	__nvme_submit_cmd(nvmeq, &cmnd);
-	nvme_process_cq(nvmeq);
 	spin_unlock_irq(&nvmeq->q_lock);
 	return BLK_STS_OK;
 out_cleanup_iod:
-- 
2.7.4


* [PATCH 4/7] nvme-pci: move ->cq_vector == -1 check outside of ->q_lock
  2018-05-18 14:52 [PATCHSET v2 0/7] Improve nvme completion handling Jens Axboe
                   ` (2 preceding siblings ...)
  2018-05-18 14:52 ` [PATCH 3/7] nvme-pci: remove cq check after submission Jens Axboe
@ 2018-05-18 14:52 ` Jens Axboe
  2018-05-18 14:52 ` [PATCH 5/7] nvme-pci: handle completions outside of the queue lock Jens Axboe
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 14:52 UTC (permalink / raw)


We only clear it dynamically in nvme_suspend_queue(). When we do, ensure
we do a full flush so that any subsequent nvme_queue_rq() invocation will
see it.

Ideally we'd kill this check completely, but we're using it to flush
requests on a dying queue.

Signed-off-by: Jens Axboe <axboe at kernel.dk>
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9b32a67982e8..bfd7023877a7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -871,6 +871,13 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct nvme_command cmnd;
 	blk_status_t ret;
 
+	/*
+	 * We should not need to do this, but we're still using this to
+	 * ensure we can drain requests on a dying queue.
+	 */
+	if (unlikely(nvmeq->cq_vector < 0))
+		return BLK_STS_IOERR;
+
 	ret = nvme_setup_cmd(ns, req, &cmnd);
 	if (ret)
 		return ret;
@@ -888,11 +895,6 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	blk_mq_start_request(req);
 
 	spin_lock_irq(&nvmeq->q_lock);
-	if (unlikely(nvmeq->cq_vector < 0)) {
-		ret = BLK_STS_IOERR;
-		spin_unlock_irq(&nvmeq->q_lock);
-		goto out_cleanup_iod;
-	}
 	__nvme_submit_cmd(nvmeq, &cmnd);
 	spin_unlock_irq(&nvmeq->q_lock);
 	return BLK_STS_OK;
@@ -1329,6 +1331,12 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 	nvmeq->cq_vector = -1;
 	spin_unlock_irq(&nvmeq->q_lock);
 
+	/*
+	 * Ensure that nvme_queue_rq() sees that ->cq_vector == -1 without
+	 * having to grab the lock.
+	 */
+	mb();
+
 	if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
 		blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
 
-- 
2.7.4


* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-18 14:52 [PATCHSET v2 0/7] Improve nvme completion handling Jens Axboe
                   ` (3 preceding siblings ...)
  2018-05-18 14:52 ` [PATCH 4/7] nvme-pci: move ->cq_vector == -1 check outside of ->q_lock Jens Axboe
@ 2018-05-18 14:52 ` Jens Axboe
  2018-05-18 21:06   ` Keith Busch
  2018-05-18 14:52 ` [PATCH 6/7] nvme-pci: split the nvme queue lock into submission and completion locks Jens Axboe
  2018-05-18 14:52 ` [PATCH 7/7] nvme-pci: drop IRQ disabling on submission queue lock Jens Axboe
  6 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 14:52 UTC (permalink / raw)


Split the completion of events into a two part process:

1) Reap the events inside the queue lock
2) Complete the events outside the queue lock

Since we never wrap the queue, we can access it locklessly after we've
updated the completion queue head. This patch started off with batching
events on the stack, but with this trick we don't have to. Keith Busch
<keith.busch at intel.com> came up with that idea.

Note that this kills the ->cqe_seen as well. I haven't been able to
trigger any ill effects of this. If we do race with polling every so
often, it should be rare enough NOT to trigger any issues.

Signed-off-by: Jens Axboe <axboe at kernel.dk>
Signed-off-by: Keith Busch <keith.busch at intel.com>
[hch: refactored, restored poll early exit optimization]
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 87 +++++++++++++++++++++++++------------------------
 1 file changed, 45 insertions(+), 42 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index bfd7023877a7..fd4f348c8f54 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -160,7 +160,6 @@ struct nvme_queue {
 	u16 cq_head;
 	u16 qid;
 	u8 cq_phase;
-	u8 cqe_seen;
 	u32 *dbbuf_sq_db;
 	u32 *dbbuf_cq_db;
 	u32 *dbbuf_sq_ei;
@@ -931,9 +930,9 @@ static inline void nvme_ring_cq_doorbell(struct nvme_queue *nvmeq)
 	}
 }
 
-static inline void nvme_handle_cqe(struct nvme_queue *nvmeq,
-		struct nvme_completion *cqe)
+static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
 {
+	volatile struct nvme_completion *cqe = &nvmeq->cqes[idx];
 	struct request *req;
 
 	if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
@@ -956,50 +955,58 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq,
 		return;
 	}
 
-	nvmeq->cqe_seen = 1;
 	req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id);
 	nvme_end_request(req, cqe->status, cqe->result);
 }
 
-static inline bool nvme_read_cqe(struct nvme_queue *nvmeq,
-		struct nvme_completion *cqe)
+static void nvme_complete_cqes(struct nvme_queue *nvmeq, u16 start, u16 end)
 {
-	if (nvme_cqe_pending(nvmeq)) {
-		*cqe = nvmeq->cqes[nvmeq->cq_head];
+	while (start != end) {
+		nvme_handle_cqe(nvmeq, start);
+		if (++start == nvmeq->q_depth)
+			start = 0;
+	}
+}
 
-		if (++nvmeq->cq_head == nvmeq->q_depth) {
-			nvmeq->cq_head = 0;
-			nvmeq->cq_phase = !nvmeq->cq_phase;
-		}
-		return true;
+static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
+{
+	if (++nvmeq->cq_head == nvmeq->q_depth) {
+		nvmeq->cq_head = 0;
+		nvmeq->cq_phase = !nvmeq->cq_phase;
 	}
-	return false;
 }
 
-static void nvme_process_cq(struct nvme_queue *nvmeq)
+static inline bool nvme_process_cq(struct nvme_queue *nvmeq, u16 *start,
+		u16 *end, int tag)
 {
-	struct nvme_completion cqe;
-	int consumed = 0;
+	bool found = false;
 
-	while (nvme_read_cqe(nvmeq, &cqe)) {
-		nvme_handle_cqe(nvmeq, &cqe);
-		consumed++;
+	*start = nvmeq->cq_head;
+	while (!found && nvme_cqe_pending(nvmeq)) {
+		if (nvmeq->cqes[nvmeq->cq_head].command_id == tag)
+			found = true;
+		nvme_update_cq_head(nvmeq);
 	}
+	*end = nvmeq->cq_head;
 
-	if (consumed)
+	if (*start != *end)
 		nvme_ring_cq_doorbell(nvmeq);
+	return found;
 }
 
 static irqreturn_t nvme_irq(int irq, void *data)
 {
-	irqreturn_t result;
 	struct nvme_queue *nvmeq = data;
+	u16 start, end;
+
 	spin_lock(&nvmeq->q_lock);
-	nvme_process_cq(nvmeq);
-	result = nvmeq->cqe_seen ? IRQ_HANDLED : IRQ_NONE;
-	nvmeq->cqe_seen = 0;
+	nvme_process_cq(nvmeq, &start, &end, -1);
 	spin_unlock(&nvmeq->q_lock);
-	return result;
+
+	if (start == end)
+		return IRQ_NONE;
+	nvme_complete_cqes(nvmeq, start, end);
+	return IRQ_HANDLED;
 }
 
 static irqreturn_t nvme_irq_check(int irq, void *data)
@@ -1012,27 +1019,17 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 
 static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 {
-	struct nvme_completion cqe;
-	int found = 0, consumed = 0;
+	u16 start, end;
+	bool found;
 
 	if (!nvme_cqe_pending(nvmeq))
 		return 0;
 
 	spin_lock_irq(&nvmeq->q_lock);
-	while (nvme_read_cqe(nvmeq, &cqe)) {
-		nvme_handle_cqe(nvmeq, &cqe);
-		consumed++;
-
-		if (tag == cqe.command_id) {
-			found = 1;
-			break;
-		}
-       }
-
-	if (consumed)
-		nvme_ring_cq_doorbell(nvmeq);
+	found = nvme_process_cq(nvmeq, &start, &end, tag);
 	spin_unlock_irq(&nvmeq->q_lock);
 
+	nvme_complete_cqes(nvmeq, start, end);
 	return found;
 }
 
@@ -1348,6 +1345,7 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 static void nvme_disable_admin_queue(struct nvme_dev *dev, bool shutdown)
 {
 	struct nvme_queue *nvmeq = &dev->queues[0];
+	u16 start, end;
 
 	if (shutdown)
 		nvme_shutdown_ctrl(&dev->ctrl);
@@ -1355,8 +1353,10 @@ static void nvme_disable_admin_queue(struct nvme_dev *dev, bool shutdown)
 		nvme_disable_ctrl(&dev->ctrl, dev->ctrl.cap);
 
 	spin_lock_irq(&nvmeq->q_lock);
-	nvme_process_cq(nvmeq);
+	nvme_process_cq(nvmeq, &start, &end, -1);
 	spin_unlock_irq(&nvmeq->q_lock);
+
+	nvme_complete_cqes(nvmeq, start, end);
 }
 
 static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues,
@@ -2003,6 +2003,7 @@ static void nvme_del_queue_end(struct request *req, blk_status_t error)
 static void nvme_del_cq_end(struct request *req, blk_status_t error)
 {
 	struct nvme_queue *nvmeq = req->end_io_data;
+	u16 start, end;
 
 	if (!error) {
 		unsigned long flags;
@@ -2014,8 +2015,10 @@ static void nvme_del_cq_end(struct request *req, blk_status_t error)
 		 */
 		spin_lock_irqsave_nested(&nvmeq->q_lock, flags,
 					SINGLE_DEPTH_NESTING);
-		nvme_process_cq(nvmeq);
+		nvme_process_cq(nvmeq, &start, &end, -1);
 		spin_unlock_irqrestore(&nvmeq->q_lock, flags);
+
+		nvme_complete_cqes(nvmeq, start, end);
 	}
 
 	nvme_del_queue_end(req, error);
-- 
2.7.4


* [PATCH 6/7] nvme-pci: split the nvme queue lock into submission and completion locks
  2018-05-18 14:52 [PATCHSET v2 0/7] Improve nvme completion handling Jens Axboe
                   ` (4 preceding siblings ...)
  2018-05-18 14:52 ` [PATCH 5/7] nvme-pci: handle completions outside of the queue lock Jens Axboe
@ 2018-05-18 14:52 ` Jens Axboe
  2018-05-18 14:52 ` [PATCH 7/7] nvme-pci: drop IRQ disabling on submission queue lock Jens Axboe
  6 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 14:52 UTC (permalink / raw)


This is now feasible. We protect the submission queue ring with
->sq_lock, and the completion side with ->cq_lock.

Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Jens Axboe <axboe at kernel.dk>
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fd4f348c8f54..1d64761e66f8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -146,9 +146,10 @@ static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl *ctrl)
 struct nvme_queue {
 	struct device *q_dmadev;
 	struct nvme_dev *dev;
-	spinlock_t q_lock;
+	spinlock_t sq_lock;
 	struct nvme_command *sq_cmds;
 	struct nvme_command __iomem *sq_cmds_io;
+	spinlock_t cq_lock ____cacheline_aligned_in_smp;
 	volatile struct nvme_completion *cqes;
 	struct blk_mq_tags **tags;
 	dma_addr_t sq_dma_addr;
@@ -893,9 +894,9 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(req);
 
-	spin_lock_irq(&nvmeq->q_lock);
+	spin_lock_irq(&nvmeq->sq_lock);
 	__nvme_submit_cmd(nvmeq, &cmnd);
-	spin_unlock_irq(&nvmeq->q_lock);
+	spin_unlock_irq(&nvmeq->sq_lock);
 	return BLK_STS_OK;
 out_cleanup_iod:
 	nvme_free_iod(dev, req);
@@ -999,9 +1000,9 @@ static irqreturn_t nvme_irq(int irq, void *data)
 	struct nvme_queue *nvmeq = data;
 	u16 start, end;
 
-	spin_lock(&nvmeq->q_lock);
+	spin_lock(&nvmeq->cq_lock);
 	nvme_process_cq(nvmeq, &start, &end, -1);
-	spin_unlock(&nvmeq->q_lock);
+	spin_unlock(&nvmeq->cq_lock);
 
 	if (start == end)
 		return IRQ_NONE;
@@ -1025,9 +1026,9 @@ static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 	if (!nvme_cqe_pending(nvmeq))
 		return 0;
 
-	spin_lock_irq(&nvmeq->q_lock);
+	spin_lock_irq(&nvmeq->cq_lock);
 	found = nvme_process_cq(nvmeq, &start, &end, tag);
-	spin_unlock_irq(&nvmeq->q_lock);
+	spin_unlock_irq(&nvmeq->cq_lock);
 
 	nvme_complete_cqes(nvmeq, start, end);
 	return found;
@@ -1050,9 +1051,9 @@ static void nvme_pci_submit_async_event(struct nvme_ctrl *ctrl)
 	c.common.opcode = nvme_admin_async_event;
 	c.common.command_id = NVME_AQ_BLK_MQ_DEPTH;
 
-	spin_lock_irq(&nvmeq->q_lock);
+	spin_lock_irq(&nvmeq->sq_lock);
 	__nvme_submit_cmd(nvmeq, &c);
-	spin_unlock_irq(&nvmeq->q_lock);
+	spin_unlock_irq(&nvmeq->sq_lock);
 }
 
 static int adapter_delete_queue(struct nvme_dev *dev, u8 opcode, u16 id)
@@ -1318,15 +1319,15 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 {
 	int vector;
 
-	spin_lock_irq(&nvmeq->q_lock);
+	spin_lock_irq(&nvmeq->cq_lock);
 	if (nvmeq->cq_vector == -1) {
-		spin_unlock_irq(&nvmeq->q_lock);
+		spin_unlock_irq(&nvmeq->cq_lock);
 		return 1;
 	}
 	vector = nvmeq->cq_vector;
 	nvmeq->dev->online_queues--;
 	nvmeq->cq_vector = -1;
-	spin_unlock_irq(&nvmeq->q_lock);
+	spin_unlock_irq(&nvmeq->cq_lock);
 
 	/*
 	 * Ensure that nvme_queue_rq() sees that ->cq_vector == -1 without
@@ -1352,9 +1353,9 @@ static void nvme_disable_admin_queue(struct nvme_dev *dev, bool shutdown)
 	else
 		nvme_disable_ctrl(&dev->ctrl, dev->ctrl.cap);
 
-	spin_lock_irq(&nvmeq->q_lock);
+	spin_lock_irq(&nvmeq->cq_lock);
 	nvme_process_cq(nvmeq, &start, &end, -1);
-	spin_unlock_irq(&nvmeq->q_lock);
+	spin_unlock_irq(&nvmeq->cq_lock);
 
 	nvme_complete_cqes(nvmeq, start, end);
 }
@@ -1414,7 +1415,8 @@ static int nvme_alloc_queue(struct nvme_dev *dev, int qid, int depth)
 
 	nvmeq->q_dmadev = dev->dev;
 	nvmeq->dev = dev;
-	spin_lock_init(&nvmeq->q_lock);
+	spin_lock_init(&nvmeq->sq_lock);
+	spin_lock_init(&nvmeq->cq_lock);
 	nvmeq->cq_head = 0;
 	nvmeq->cq_phase = 1;
 	nvmeq->q_db = &dev->dbs[qid * 2 * dev->db_stride];
@@ -1450,7 +1452,7 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 qid)
 {
 	struct nvme_dev *dev = nvmeq->dev;
 
-	spin_lock_irq(&nvmeq->q_lock);
+	spin_lock_irq(&nvmeq->cq_lock);
 	nvmeq->sq_tail = 0;
 	nvmeq->cq_head = 0;
 	nvmeq->cq_phase = 1;
@@ -1458,7 +1460,7 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 qid)
 	memset((void *)nvmeq->cqes, 0, CQ_SIZE(nvmeq->q_depth));
 	nvme_dbbuf_init(dev, nvmeq, qid);
 	dev->online_queues++;
-	spin_unlock_irq(&nvmeq->q_lock);
+	spin_unlock_irq(&nvmeq->cq_lock);
 }
 
 static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
@@ -2009,14 +2011,14 @@ static void nvme_del_cq_end(struct request *req, blk_status_t error)
 		unsigned long flags;
 
 		/*
-		 * We might be called with the AQ q_lock held
-		 * and the I/O queue q_lock should always
+		 * We might be called with the AQ cq_lock held
+		 * and the I/O queue cq_lock should always
 		 * nest inside the AQ one.
 		 */
-		spin_lock_irqsave_nested(&nvmeq->q_lock, flags,
+		spin_lock_irqsave_nested(&nvmeq->cq_lock, flags,
 					SINGLE_DEPTH_NESTING);
 		nvme_process_cq(nvmeq, &start, &end, -1);
-		spin_unlock_irqrestore(&nvmeq->q_lock, flags);
+		spin_unlock_irqrestore(&nvmeq->cq_lock, flags);
 
 		nvme_complete_cqes(nvmeq, start, end);
 	}
-- 
2.7.4


* [PATCH 7/7] nvme-pci: drop IRQ disabling on submission queue lock
  2018-05-18 14:52 [PATCHSET v2 0/7] Improve nvme completion handling Jens Axboe
                   ` (5 preceding siblings ...)
  2018-05-18 14:52 ` [PATCH 6/7] nvme-pci: split the nvme queue lock into submission and completion locks Jens Axboe
@ 2018-05-18 14:52 ` Jens Axboe
  6 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 14:52 UTC (permalink / raw)


Since we aren't sharing the lock for completions now, we don't
have to make it IRQ safe.

Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Jens Axboe <axboe at kernel.dk>
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1d64761e66f8..06d1a5cd619e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -894,9 +894,9 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(req);
 
-	spin_lock_irq(&nvmeq->sq_lock);
+	spin_lock(&nvmeq->sq_lock);
 	__nvme_submit_cmd(nvmeq, &cmnd);
-	spin_unlock_irq(&nvmeq->sq_lock);
+	spin_unlock(&nvmeq->sq_lock);
 	return BLK_STS_OK;
 out_cleanup_iod:
 	nvme_free_iod(dev, req);
@@ -1051,9 +1051,9 @@ static void nvme_pci_submit_async_event(struct nvme_ctrl *ctrl)
 	c.common.opcode = nvme_admin_async_event;
 	c.common.command_id = NVME_AQ_BLK_MQ_DEPTH;
 
-	spin_lock_irq(&nvmeq->sq_lock);
+	spin_lock(&nvmeq->sq_lock);
 	__nvme_submit_cmd(nvmeq, &c);
-	spin_unlock_irq(&nvmeq->sq_lock);
+	spin_unlock(&nvmeq->sq_lock);
 }
 
 static int adapter_delete_queue(struct nvme_dev *dev, u8 opcode, u16 id)
-- 
2.7.4


* [PATCH 2/7] nvme-pci: simplify nvme_cqe_valid
  2018-05-18 20:49   ` Keith Busch
@ 2018-05-18 20:48     ` Jens Axboe
  0 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 20:48 UTC (permalink / raw)


On 5/18/18 2:49 PM, Keith Busch wrote:
> On Fri, May 18, 2018@08:52:30AM -0600, Jens Axboe wrote:
>> +static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
>>  {
>> -	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
>> +	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
>> +			nvmeq->cq_phase;
>>  }
> 
> What happened to the byte-swap micro-optimization?

sparse complained (see older thread), so I folded that optimization away
with an incremental from Christoph. Honestly, we should probably have
called that a nano optimization anyway, and one that does nothing on
little endian (nothing else matters, so... :))

-- 
Jens Axboe


* [PATCH 2/7] nvme-pci: simplify nvme_cqe_valid
  2018-05-18 14:52 ` [PATCH 2/7] nvme-pci: simplify nvme_cqe_valid Jens Axboe
@ 2018-05-18 20:49   ` Keith Busch
  2018-05-18 20:48     ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2018-05-18 20:49 UTC (permalink / raw)


On Fri, May 18, 2018@08:52:30AM -0600, Jens Axboe wrote:
> +static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
>  {
> -	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
> +	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
> +			nvmeq->cq_phase;
>  }

What happened to the byte-swap micro-optimization?


* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-18 14:52 ` [PATCH 5/7] nvme-pci: handle completions outside of the queue lock Jens Axboe
@ 2018-05-18 21:06   ` Keith Busch
  2018-05-18 21:11     ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2018-05-18 21:06 UTC (permalink / raw)


On Fri, May 18, 2018@08:52:33AM -0600, Jens Axboe wrote:
> Note that this kills the ->cqe_seen as well. I haven't been able to
> trigger any ill effects of this. If we do race with polling every so
> often, it should be rare enough NOT to trigger any issues.

Ah hell, running HIPRI tests on low latency devices is hitting spurious
interrupt detection:

[  560.169153] irq 630: nobody cared (try booting with the "irqpoll" option)
[  560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65
[  560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018
[  560.175991] Call Trace:
[  560.175994]  <IRQ>
[  560.176005]  dump_stack+0x5c/0x7b
[  560.176010]  __report_bad_irq+0x30/0xc0
[  560.176013]  note_interrupt+0x235/0x280
[  560.176020]  handle_irq_event_percpu+0x51/0x70
[  560.176023]  handle_irq_event+0x27/0x50
[  560.176026]  handle_edge_irq+0x6d/0x180
[  560.176031]  handle_irq+0xa5/0x110
[  560.176036]  do_IRQ+0x41/0xc0
[  560.176042]  common_interrupt+0xf/0xf
[  560.176043]  </IRQ>
[  560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0
[  560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[  560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f
[  560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000
[  560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008
[  560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358
[  560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593
[  560.176065]  ? cpuidle_enter_state+0x8b/0x2b0
[  560.176071]  do_idle+0x1f4/0x260
[  560.176075]  cpu_startup_entry+0x6f/0x80
[  560.176080]  start_secondary+0x184/0x1d0
[  560.176085]  secondary_startup_64+0xa5/0xb0
[  560.176088] handlers:
[  560.178387] [<00000000efb612be>] nvme_irq [nvme]
[  560.183019] Disabling IRQ #630


* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-18 21:06   ` Keith Busch
@ 2018-05-18 21:11     ` Jens Axboe
  2018-05-18 21:22       ` Jens Axboe
  2018-05-18 21:25       ` Keith Busch
  0 siblings, 2 replies; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 21:11 UTC (permalink / raw)


On 5/18/18 3:06 PM, Keith Busch wrote:
> On Fri, May 18, 2018@08:52:33AM -0600, Jens Axboe wrote:
>> Note that this kills the ->cqe_seen as well. I haven't been able to
>> trigger any ill effects of this. If we do race with polling every so
>> often, it should be rare enough NOT to trigger any issues.
> 
> Ah hell, running HIPRI tests on low latency devices is hitting spurious
> interrupt detection:
> 
> [  560.169153] irq 630: nobody cared (try booting with the "irqpoll" option)
> [  560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65
> [  560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018
> [  560.175991] Call Trace:
> [  560.175994]  <IRQ>
> [  560.176005]  dump_stack+0x5c/0x7b
> [  560.176010]  __report_bad_irq+0x30/0xc0
> [  560.176013]  note_interrupt+0x235/0x280
> [  560.176020]  handle_irq_event_percpu+0x51/0x70
> [  560.176023]  handle_irq_event+0x27/0x50
> [  560.176026]  handle_edge_irq+0x6d/0x180
> [  560.176031]  handle_irq+0xa5/0x110
> [  560.176036]  do_IRQ+0x41/0xc0
> [  560.176042]  common_interrupt+0xf/0xf
> [  560.176043]  </IRQ>
> [  560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0
> [  560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
> [  560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f
> [  560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000
> [  560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008
> [  560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358
> [  560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593
> [  560.176065]  ? cpuidle_enter_state+0x8b/0x2b0
> [  560.176071]  do_idle+0x1f4/0x260
> [  560.176075]  cpu_startup_entry+0x6f/0x80
> [  560.176080]  start_secondary+0x184/0x1d0
> [  560.176085]  secondary_startup_64+0xa5/0xb0
> [  560.176088] handlers:
> [  560.178387] [<00000000efb612be>] nvme_irq [nvme]
> [  560.183019] Disabling IRQ #630

Gah, I didn't manage to trigger any of that. What was your test case?
I'll see if I can come up with a nice cqe_seen replacement.

-- 
Jens Axboe


* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-18 21:11     ` Jens Axboe
@ 2018-05-18 21:22       ` Jens Axboe
  2018-05-18 21:28         ` Keith Busch
  2018-05-18 21:25       ` Keith Busch
  1 sibling, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 21:22 UTC (permalink / raw)


On 5/18/18 3:11 PM, Jens Axboe wrote:
> On 5/18/18 3:06 PM, Keith Busch wrote:
>> On Fri, May 18, 2018@08:52:33AM -0600, Jens Axboe wrote:
>>> Note that this kills the ->cqe_seen as well. I haven't been able to
>>> trigger any ill effects of this. If we do race with polling every so
>>> often, it should be rare enough NOT to trigger any issues.
>>
>> Ah hell, running HIPRI tests on low latency devices is hitting spurious
>> interrupt detection:
>>
>> [  560.169153] irq 630: nobody cared (try booting with the "irqpoll" option)
>> [  560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65
>> [  560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018
>> [  560.175991] Call Trace:
>> [  560.175994]  <IRQ>
>> [  560.176005]  dump_stack+0x5c/0x7b
>> [  560.176010]  __report_bad_irq+0x30/0xc0
>> [  560.176013]  note_interrupt+0x235/0x280
>> [  560.176020]  handle_irq_event_percpu+0x51/0x70
>> [  560.176023]  handle_irq_event+0x27/0x50
>> [  560.176026]  handle_edge_irq+0x6d/0x180
>> [  560.176031]  handle_irq+0xa5/0x110
>> [  560.176036]  do_IRQ+0x41/0xc0
>> [  560.176042]  common_interrupt+0xf/0xf
>> [  560.176043]  </IRQ>
>> [  560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0
>> [  560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
>> [  560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f
>> [  560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000
>> [  560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008
>> [  560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358
>> [  560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593
>> [  560.176065]  ? cpuidle_enter_state+0x8b/0x2b0
>> [  560.176071]  do_idle+0x1f4/0x260
>> [  560.176075]  cpu_startup_entry+0x6f/0x80
>> [  560.176080]  start_secondary+0x184/0x1d0
>> [  560.176085]  secondary_startup_64+0xa5/0xb0
>> [  560.176088] handlers:
>> [  560.178387] [<00000000efb612be>] nvme_irq [nvme]
>> [  560.183019] Disabling IRQ #630
> 
> Gah, I didn't manage to trigger any of that. What was your test case?
> I'll see if I can come up with a nice cqe_seen replacement.

Totally untested, does this work?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 06d1a5cd619e..d1efe6b0f107 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -159,6 +159,7 @@ struct nvme_queue {
 	s16 cq_vector;
 	u16 sq_tail;
 	u16 cq_head;
+	u16 last_cq_head;
 	u16 qid;
 	u8 cq_phase;
 	u32 *dbbuf_sq_db;
@@ -998,16 +999,22 @@ static inline bool nvme_process_cq(struct nvme_queue *nvmeq, u16 *start,
 static irqreturn_t nvme_irq(int irq, void *data)
 {
 	struct nvme_queue *nvmeq = data;
+	irqreturn_t ret = IRQ_NONE;
 	u16 start, end;
 
 	spin_lock(&nvmeq->cq_lock);
+	if (nvmeq->cq_head != nvmeq->last_cq_head)
+		ret = IRQ_HANDLED;
 	nvme_process_cq(nvmeq, &start, &end, -1);
+	nvmeq->last_cq_head = nvmeq->cq_head;
 	spin_unlock(&nvmeq->cq_lock);
 
-	if (start == end)
-		return IRQ_NONE;
-	nvme_complete_cqes(nvmeq, start, end);
-	return IRQ_HANDLED;
+	if (start != end) {
+		nvme_complete_cqes(nvmeq, start, end);
+		return IRQ_HANDLED;
+	}
+
+	return ret;
 }
 
 static irqreturn_t nvme_irq_check(int irq, void *data)

-- 
Jens Axboe

* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-18 21:11     ` Jens Axboe
  2018-05-18 21:22       ` Jens Axboe
@ 2018-05-18 21:25       ` Keith Busch
  1 sibling, 0 replies; 24+ messages in thread
From: Keith Busch @ 2018-05-18 21:25 UTC (permalink / raw)


On Fri, May 18, 2018 at 03:11:21PM -0600, Jens Axboe wrote:
> Gah, I didn't manage to trigger any of that. What was your test case?
> I'll see if I can come up with a nice cqe_seen replacement.

Running the following fio profile on the lowest-latency drive I have in my
"big" server (112 CPUs, Xeon Platinum 8180), in pure polling mode
(io_poll_delay set to -1), sustaining 190k IOPS:

[global]
ioengine=pvsync2
cpus_allowed=29
direct=1
rw=randread
norandommap
bs=4k
hipri
ramp_time=1
runtime=10
gtod_reduce=1
clocksource=cpu

[test]
filename=/dev/nvme1n1

* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-18 21:22       ` Jens Axboe
@ 2018-05-18 21:28         ` Keith Busch
  2018-05-18 21:31           ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2018-05-18 21:28 UTC (permalink / raw)


On Fri, May 18, 2018 at 03:22:20PM -0600, Jens Axboe wrote:
> Totally untested, does this work?

Looks like that did the trick. Previously it triggered within 1 or 2
iterations; haven't seen it in about a dozen so far.

* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-18 21:28         ` Keith Busch
@ 2018-05-18 21:31           ` Jens Axboe
  2018-05-18 21:48             ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 21:31 UTC (permalink / raw)


On 5/18/18 3:28 PM, Keith Busch wrote:
> On Fri, May 18, 2018 at 03:22:20PM -0600, Jens Axboe wrote:
>> Totally untested, does this work?
> 
> Looks like that did the trick. Previously it triggered within 1 or 2
> iterations; haven't seen it in about a dozen so far.

Alright, folded into that patch. This impacts the two patches after
this one too, updated versions all attached. Or find them here:

http://git.kernel.dk/cgit/linux-block/log/?h=nvme-4.18

-- 
Jens Axboe

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-nvme-pci-drop-IRQ-disabling-on-submission-queue-lock.patch
Type: text/x-patch
Size: 1480 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20180518/0404be47/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-nvme-pci-split-the-nvme-queue-lock-into-submission-a.patch
Type: text/x-patch
Size: 5380 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20180518/0404be47/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-nvme-pci-handle-completions-outside-of-the-queue-loc.patch
Type: text/x-patch
Size: 5976 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20180518/0404be47/attachment-0002.bin>

* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-18 21:31           ` Jens Axboe
@ 2018-05-18 21:48             ` Keith Busch
  2018-05-18 22:46               ` Jens Axboe
  2018-05-21 14:18               ` Jens Axboe
  0 siblings, 2 replies; 24+ messages in thread
From: Keith Busch @ 2018-05-18 21:48 UTC (permalink / raw)


On Fri, May 18, 2018 at 03:31:47PM -0600, Jens Axboe wrote:
> Alright, folded into that patch. This impacts the two patches after
> this one too, updated versions all attached. Or find them here:
> 
> http://git.kernel.dk/cgit/linux-block/log/?h=nvme-4.18

Cool, everything looks in order here. I've staged it in my local repo and
am ready to push, but I'm playing it safe this time over the weekend to see
if Christoph or anyone else has concerns.

* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-18 21:48             ` Keith Busch
@ 2018-05-18 22:46               ` Jens Axboe
  2018-05-21 14:18               ` Jens Axboe
  1 sibling, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2018-05-18 22:46 UTC (permalink / raw)


On 5/18/18 3:48 PM, Keith Busch wrote:
> On Fri, May 18, 2018 at 03:31:47PM -0600, Jens Axboe wrote:
>> Alright, folded into that patch. This impacts the two patches after
>> this one too, updated versions all attached. Or find them here:
>>
>> http://git.kernel.dk/cgit/linux-block/log/?h=nvme-4.18
> 
> Cool, everything looks in order here. I've staged it in my local repo and
> am ready to push, but I'm playing it safe this time over the weekend to see
> if Christoph or anyone else has concerns.

Sounds good, thanks Keith.

-- 
Jens Axboe

* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-18 21:48             ` Keith Busch
  2018-05-18 22:46               ` Jens Axboe
@ 2018-05-21 14:18               ` Jens Axboe
  2018-05-21 14:23                 ` Keith Busch
  1 sibling, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-05-21 14:18 UTC (permalink / raw)


On 5/18/18 3:48 PM, Keith Busch wrote:
> On Fri, May 18, 2018 at 03:31:47PM -0600, Jens Axboe wrote:
>> Alright, folded into that patch. This impacts the two patches after
>> this one too, updated versions all attached. Or find them here:
>>
>> http://git.kernel.dk/cgit/linux-block/log/?h=nvme-4.18
> 
> Cool, everything looks in order here. I've staged it in my local repo and
> am ready to push, but I'm playing it safe this time over the weekend to see
> if Christoph or anyone else has concerns.

Shall we get it pushed? Might also not be a bad idea to do an early
pull for linux-block as well, with two weeks until the merge window
opens, for greater exposure.

-- 
Jens Axboe

* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-21 14:18               ` Jens Axboe
@ 2018-05-21 14:23                 ` Keith Busch
  2018-05-21 14:33                   ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2018-05-21 14:23 UTC (permalink / raw)


On Mon, May 21, 2018 at 08:18:05AM -0600, Jens Axboe wrote:
> Shall we get it pushed? Might also not be a bad idea to do an early
> pull for linux-block as well, with two weeks until the merge window
> opens, for greater exposure.

Sounds good, series pushed to nvme-4.18.

* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-21 14:23                 ` Keith Busch
@ 2018-05-21 14:33                   ` Jens Axboe
  2018-05-21 14:40                     ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2018-05-21 14:33 UTC (permalink / raw)


On 5/21/18 8:23 AM, Keith Busch wrote:
> On Mon, May 21, 2018 at 08:18:05AM -0600, Jens Axboe wrote:
>> Shall we get it pushed? Might also not be a bad idea to do an early
>> pull for linux-block as well, with two weeks until the merge window
>> opens, for greater exposure.
> 
> Sounds good, series pushed to nvme-4.18.

Just saw the pull, was writing the below. If you can ack/review it,
then I'll queue it on top.

---

You forgot to fold the poll fix... Here it is as a separate patch, or
fold it with "nvme-pci: handle completions outside of the queue lock"
and kill the last section in the commit message on cqe_seen.

From: Jens Axboe <axboe@kernel.dk>
Subject: [PATCH] nvme-pci: fix race between poll and IRQ completions

If polling completions are racing with the IRQ triggered by a
completion, the IRQ handler will find no work and return IRQ_NONE.
This can trigger complaints about spurious interrupts:

[  560.169153] irq 630: nobody cared (try booting with the "irqpoll" option)
[  560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65
[  560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018
[  560.175991] Call Trace:
[  560.175994]  <IRQ>
[  560.176005]  dump_stack+0x5c/0x7b
[  560.176010]  __report_bad_irq+0x30/0xc0
[  560.176013]  note_interrupt+0x235/0x280
[  560.176020]  handle_irq_event_percpu+0x51/0x70
[  560.176023]  handle_irq_event+0x27/0x50
[  560.176026]  handle_edge_irq+0x6d/0x180
[  560.176031]  handle_irq+0xa5/0x110
[  560.176036]  do_IRQ+0x41/0xc0
[  560.176042]  common_interrupt+0xf/0xf
[  560.176043]  </IRQ>
[  560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0
[  560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[  560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f
[  560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000
[  560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008
[  560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358
[  560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593
[  560.176065]  ? cpuidle_enter_state+0x8b/0x2b0
[  560.176071]  do_idle+0x1f4/0x260
[  560.176075]  cpu_startup_entry+0x6f/0x80
[  560.176080]  start_secondary+0x184/0x1d0
[  560.176085]  secondary_startup_64+0xa5/0xb0
[  560.176088] handlers:
[  560.178387] [<00000000efb612be>] nvme_irq [nvme]
[  560.183019] Disabling IRQ #630

A previous commit removed ->cqe_seen, which handled this case,
but we need to handle it a bit differently now that completions
run outside the queue lock. Return IRQ_HANDLED from the IRQ
handler if the completion queue head has moved since we last
saw it.

Fixes: 5cb525c8315f ("nvme-pci: handle completions outside of the queue lock")
Reported-by: Keith Busch <keith.busch at intel.com>
Signed-off-by: Jens Axboe <axboe at kernel.dk>

-- 
Jens Axboe

* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-21 14:33                   ` Jens Axboe
@ 2018-05-21 14:40                     ` Keith Busch
  2018-05-21 14:43                       ` Keith Busch
  0 siblings, 1 reply; 24+ messages in thread
From: Keith Busch @ 2018-05-21 14:40 UTC (permalink / raw)


On Mon, May 21, 2018 at 08:33:21AM -0600, Jens Axboe wrote:
> Just saw the pull, was writing the below. If you can ack/review it,
> then I'll queue it on top.

Oops, sorry about that.
 
> You forgot to fold the poll fix... Here it is as a separate patch, or
> fold it with "nvme-pci: handle completions outside of the queue lock"
> and kill the last section in the commit message on cqe_seen.
> 
> From: Jens Axboe <axboe at kernel.dk>
> Subject: [PATCH] nvme-pci: fix race between poll and IRQ completions
> 
> If polling completions are racing with the IRQ triggered by a
> completion, the IRQ handler will find no work and return IRQ_NONE.
> This can trigger complaints about spurious interrupts:
> 
> [  560.169153] irq 630: nobody cared (try booting with the "irqpoll" option)
> [  560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65
> [  560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018
> [  560.175991] Call Trace:
> [  560.175994]  <IRQ>
> [  560.176005]  dump_stack+0x5c/0x7b
> [  560.176010]  __report_bad_irq+0x30/0xc0
> [  560.176013]  note_interrupt+0x235/0x280
> [  560.176020]  handle_irq_event_percpu+0x51/0x70
> [  560.176023]  handle_irq_event+0x27/0x50
> [  560.176026]  handle_edge_irq+0x6d/0x180
> [  560.176031]  handle_irq+0xa5/0x110
> [  560.176036]  do_IRQ+0x41/0xc0
> [  560.176042]  common_interrupt+0xf/0xf
> [  560.176043]  </IRQ>
> [  560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0
> [  560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
> [  560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f
> [  560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000
> [  560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008
> [  560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358
> [  560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593
> [  560.176065]  ? cpuidle_enter_state+0x8b/0x2b0
> [  560.176071]  do_idle+0x1f4/0x260
> [  560.176075]  cpu_startup_entry+0x6f/0x80
> [  560.176080]  start_secondary+0x184/0x1d0
> [  560.176085]  secondary_startup_64+0xa5/0xb0
> [  560.176088] handlers:
> [  560.178387] [<00000000efb612be>] nvme_irq [nvme]
> [  560.183019] Disabling IRQ #630
> 
> A previous commit removed ->cqe_seen, which handled this case,
> but we need to handle it a bit differently now that completions
> run outside the queue lock. Return IRQ_HANDLED from the IRQ
> handler if the completion queue head has moved since we last
> saw it.
> 
> Fixes: 5cb525c8315f ("nvme-pci: handle completions outside of the queue lock")
> Reported-by: Keith Busch <keith.busch at intel.com>
> Signed-off-by: Jens Axboe <axboe at kernel.dk>

Reviewed-by: Keith Busch <keith.busch at intel.com>
Tested-by: Keith Busch <keith.busch at intel.com>

* [PATCH 5/7] nvme-pci: handle completions outside of the queue lock
  2018-05-21 14:40                     ` Keith Busch
@ 2018-05-21 14:43                       ` Keith Busch
  0 siblings, 0 replies; 24+ messages in thread
From: Keith Busch @ 2018-05-21 14:43 UTC (permalink / raw)


(resending; experiencing some growing pains after switching to a new
email address)

On Mon, May 21, 2018 at 08:40:00AM -0600, Keith Busch wrote:
> > Fixes: 5cb525c8315f ("nvme-pci: handle completions outside of the queue lock")
> > Reported-by: Keith Busch <keith.busch at intel.com>
> > Signed-off-by: Jens Axboe <axboe at kernel.dk>
 
Reviewed-by: Keith Busch <keith.busch at intel.com>
Tested-by: Keith Busch <keith.busch at intel.com>

* [PATCH 2/7] nvme-pci: simplify nvme_cqe_valid
  2018-05-17 16:31 RFC: handle completions outside the queue lock and split the " Christoph Hellwig
@ 2018-05-17 16:31 ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2018-05-17 16:31 UTC (permalink / raw)


We always look at the current CQ head and phase, so don't pass these
as separate arguments, and rename the function to nvme_cqe_pending.

Also micro-optimize by byte-swapping the constant 1 instead of the status.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b58aebb347a0..4b8c3420ace6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -915,10 +915,10 @@ static void nvme_pci_complete_rq(struct request *req)
 }
 
 /* We read the CQE phase first to check if the rest of the entry is valid */
-static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head,
-		u16 phase)
+static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
 {
-	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
+	return ((nvmeq->cqes[nvmeq->cq_head].status) & cpu_to_le16(1)) ==
+			nvmeq->cq_phase;
 }
 
 static inline void nvme_ring_cq_doorbell(struct nvme_queue *nvmeq)
@@ -965,7 +965,7 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq,
 static inline bool nvme_read_cqe(struct nvme_queue *nvmeq,
 		struct nvme_completion *cqe)
 {
-	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase)) {
+	if (nvme_cqe_pending(nvmeq)) {
 		*cqe = nvmeq->cqes[nvmeq->cq_head];
 
 		if (++nvmeq->cq_head == nvmeq->q_depth) {
@@ -1006,7 +1006,7 @@ static irqreturn_t nvme_irq(int irq, void *data)
 static irqreturn_t nvme_irq_check(int irq, void *data)
 {
 	struct nvme_queue *nvmeq = data;
-	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase))
+	if (nvme_cqe_pending(nvmeq))
 		return IRQ_WAKE_THREAD;
 	return IRQ_NONE;
 }
@@ -1016,7 +1016,7 @@ static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 	struct nvme_completion cqe;
 	int found = 0, consumed = 0;
 
-	if (!nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase))
+	if (!nvme_cqe_pending(nvmeq))
 		return 0;
 
 	spin_lock_irq(&nvmeq->q_lock);
-- 
2.17.0

end of thread, other threads:[~2018-05-21 14:43 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-18 14:52 [PATCHSET v2 0/7] Improve nvme completion handling Jens Axboe
2018-05-18 14:52 ` [PATCH 1/7] nvme: mark the result argument to nvme_complete_async_event volatile Jens Axboe
2018-05-18 14:52 ` [PATCH 2/7] nvme-pci: simplify nvme_cqe_valid Jens Axboe
2018-05-18 20:49   ` Keith Busch
2018-05-18 20:48     ` Jens Axboe
2018-05-18 14:52 ` [PATCH 3/7] nvme-pci: remove cq check after submission Jens Axboe
2018-05-18 14:52 ` [PATCH 4/7] nvme-pci: move ->cq_vector == -1 check outside of ->q_lock Jens Axboe
2018-05-18 14:52 ` [PATCH 5/7] nvme-pci: handle completions outside of the queue lock Jens Axboe
2018-05-18 21:06   ` Keith Busch
2018-05-18 21:11     ` Jens Axboe
2018-05-18 21:22       ` Jens Axboe
2018-05-18 21:28         ` Keith Busch
2018-05-18 21:31           ` Jens Axboe
2018-05-18 21:48             ` Keith Busch
2018-05-18 22:46               ` Jens Axboe
2018-05-21 14:18               ` Jens Axboe
2018-05-21 14:23                 ` Keith Busch
2018-05-21 14:33                   ` Jens Axboe
2018-05-21 14:40                     ` Keith Busch
2018-05-21 14:43                       ` Keith Busch
2018-05-18 21:25       ` Keith Busch
2018-05-18 14:52 ` [PATCH 6/7] nvme-pci: split the nvme queue lock into submission and completion locks Jens Axboe
2018-05-18 14:52 ` [PATCH 7/7] nvme-pci: drop IRQ disabling on submission queue lock Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2018-05-17 16:31 RFC: handle completions outside the queue lock and split the " Christoph Hellwig
2018-05-17 16:31 ` [PATCH 2/7] nvme-pci: simplify nvme_cqe_valid Christoph Hellwig
