* nvme completion path optimizations and fixes V3
@ 2015-11-07  8:44 Christoph Hellwig
  2015-11-07  8:44 ` [PATCH 01/12] block: fix blk_abort_request for blk-mq drivers Christoph Hellwig
                   ` (11 more replies)
  0 siblings, 12 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:44 UTC (permalink / raw)


This series optimizes the NVMe completion path by taking full advantage
of the block layer's protection against multiple completions.  For that
we'll need to switch the remaining internal NVMe admin commands over to
block layer queuing.

But to do that we first have to special-case the magic AEN requests.  This
only ends up being a single unlikely branch in the completion path, so I
think we're way better off than before.
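
To give an idea of the shape of that branch, the check added to
nvme_process_cq() in patch 7 boils down to roughly the following
(sketch; see the full diff in that patch):

    /*
     * AEN completions on the admin queue never get a struct request;
     * they are special-cased before the normal tag lookup.
     */
    if (unlikely(nvmeq->qid == 0 &&
                 cqe.command_id < NVME_NR_AEN_COMMANDS)) {
            nvme_finish_aen_cmd(nvmeq->dev, &cqe);
            continue;
    }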

The other important change is that block layer timeouts now call the driver
from workqueue context, which will make life much easier not just for NVMe,
but also for SCSI.  The way we use the workqueue API for this isn't quite
optimal, but I've Cc'ed Tejun in case he can help out there.
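
In short, the per-queue timer no longer runs the timeout handling itself
but only kicks a work item, roughly as patch 2 does it (sketch of the
function that patch adds):

    static void blk_rq_timed_out_timer(unsigned long data)
    {
            struct request_queue *q = (struct request_queue *)data;

            /* timer context just schedules the work; the timeout scan
             * and the driver ->timeout callback then run from kblockd
             * workqueue context */
            kblockd_schedule_work(&q->timeout_work);
    }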


Changes since V2:
 - not too much left from the original version except for the high level
   concept

Changes since V1:
 - various CPU hotplug fixes, including a patch from Akinobu Mita


* [PATCH 01/12] block: fix blk_abort_request for blk-mq drivers
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
@ 2015-11-07  8:44 ` Christoph Hellwig
  2015-11-07  8:44 ` [PATCH 02/12] block: defer timeouts to a workqueue Christoph Hellwig
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:44 UTC (permalink / raw)


We only added the request to the request list for the !blk-mq case,
so we should only delete it in that case as well.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 block/blk-timeout.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 246dfb1..aa40aa9 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -158,11 +158,13 @@ void blk_abort_request(struct request *req)
 {
 	if (blk_mark_rq_complete(req))
 		return;
-	blk_delete_timer(req);
-	if (req->q->mq_ops)
+
+	if (req->q->mq_ops) {
 		blk_mq_rq_timed_out(req, false);
-	else
+	} else {
+		blk_delete_timer(req);
 		blk_rq_timed_out(req);
+	}
 }
 EXPORT_SYMBOL_GPL(blk_abort_request);
 
-- 
1.9.1


* [PATCH 02/12] block: defer timeouts to a workqueue
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
  2015-11-07  8:44 ` [PATCH 01/12] block: fix blk_abort_request for blk-mq drivers Christoph Hellwig
@ 2015-11-07  8:44 ` Christoph Hellwig
  2015-11-07  8:44 ` [PATCH 03/12] nvme: factor out a nvme_unmap_data helper Christoph Hellwig
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:44 UTC (permalink / raw)


Timer context is not very useful for drivers to perform any meaningful abort
action from.  So instead of calling the driver from this useless context,
defer the call to a workqueue as soon as possible.

Note that while a delayed_work item would seem the right thing here, I didn't
dare to use it due to the magic in blk_add_timer that pokes deep into timer
internals.  But maybe this encourages Tejun to add a sensible interface for
that to the workqueue API, and we'll all be fine in the end :)
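
Purely for illustration (this is not what the patch does, and the
timeout_dwork field below is hypothetical): without that magic, the
timer/work pair could in principle collapse into a single delayed_work:

    /* hypothetical: a 'timeout_dwork' field does not exist in this series */
    kblockd_schedule_delayed_work(&q->timeout_dwork,
                                  req->deadline - jiffies);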

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 block/blk-core.c       | 8 ++++++++
 block/blk-mq.c         | 8 +++++---
 block/blk-timeout.c    | 5 +++--
 block/blk.h            | 2 +-
 include/linux/blkdev.h | 1 +
 5 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index aecb544..2df634a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -664,6 +664,13 @@ static void blk_queue_usage_counter_release(struct percpu_ref *ref)
 	wake_up_all(&q->mq_freeze_wq);
 }
 
+static void blk_rq_timed_out_timer(unsigned long data)
+{
+	struct request_queue *q = (struct request_queue *)data;
+
+	kblockd_schedule_work(&q->timeout_work);
+}
+
 struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 {
 	struct request_queue *q;
@@ -825,6 +832,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
 	if (blk_init_rl(&q->root_rl, q, GFP_KERNEL))
 		goto fail;
 
+	INIT_WORK(&q->timeout_work, blk_timeout_work);
 	q->request_fn		= rfn;
 	q->prep_rq_fn		= NULL;
 	q->unprep_rq_fn		= NULL;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c85c448..3a919ff 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -85,6 +85,7 @@ void blk_mq_freeze_queue_start(struct request_queue *q)
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
+		cancel_work_sync(&q->timeout_work);
 		blk_mq_run_hw_queues(q, false);
 	}
 }
@@ -617,9 +618,10 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	}
 }
 
-static void blk_mq_rq_timer(unsigned long priv)
+static void blk_mq_timeout_work(struct work_struct *work)
 {
-	struct request_queue *q = (struct request_queue *)priv;
+	struct request_queue *q =
+		container_of(work, struct request_queue, timeout_work);
 	struct blk_mq_timeout_data data = {
 		.next		= 0,
 		.next_set	= 0,
@@ -1978,7 +1980,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		hctxs[i]->queue_num = i;
 	}
 
-	setup_timer(&q->timeout, blk_mq_rq_timer, (unsigned long) q);
+	INIT_WORK(&q->timeout_work, blk_mq_timeout_work);
 	blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ);
 
 	q->nr_queues = nr_cpu_ids;
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index aa40aa9..aedd128 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -127,9 +127,10 @@ static void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout
 	}
 }
 
-void blk_rq_timed_out_timer(unsigned long data)
+void blk_timeout_work(struct work_struct *work)
 {
-	struct request_queue *q = (struct request_queue *) data;
+	struct request_queue *q =
+		container_of(work, struct request_queue, timeout_work);
 	unsigned long flags, next = 0;
 	struct request *rq, *tmp;
 	int next_set = 0;
diff --git a/block/blk.h b/block/blk.h
index 157c93d..f8d1784 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -95,7 +95,7 @@ static inline void blk_flush_integrity(void)
 }
 #endif
 
-void blk_rq_timed_out_timer(unsigned long data);
+void blk_timeout_work(struct work_struct *work);
 unsigned long blk_rq_timeout(unsigned long timeout);
 void blk_add_timer(struct request *req);
 void blk_delete_timer(struct request *);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d045ca8..ed11b7a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -407,6 +407,7 @@ struct request_queue {
 
 	unsigned int		rq_timeout;
 	struct timer_list	timeout;
+	struct work_struct	timeout_work;
 	struct list_head	timeout_list;
 
 	struct list_head	icq_list;
-- 
1.9.1


* [PATCH 03/12] nvme: factor out a nvme_unmap_data helper
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
  2015-11-07  8:44 ` [PATCH 01/12] block: fix blk_abort_request for blk-mq drivers Christoph Hellwig
  2015-11-07  8:44 ` [PATCH 02/12] block: defer timeouts to a workqueue Christoph Hellwig
@ 2015-11-07  8:44 ` Christoph Hellwig
  2015-11-09 18:47   ` Keith Busch
  2015-11-07  8:44 ` [PATCH 04/12] nvme: factor out a few helpers from req_completion Christoph Hellwig
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:44 UTC (permalink / raw)


This is the counterpart to nvme_map_data.  The forward declaration will
go away later in the series.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 43 +++++++++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 7911c43..a68ff1c6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -75,10 +75,12 @@ static wait_queue_head_t nvme_kthread_wait;
 
 struct nvme_dev;
 struct nvme_queue;
+struct nvme_iod;
 
 static int nvme_reset(struct nvme_dev *dev);
 static int nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev);
+static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_iod *iod);
 
 struct async_cmd_info {
 	struct kthread_work work;
@@ -601,7 +603,6 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
 	struct request *req = iod_get_private(iod);
 	struct nvme_cmd_info *cmd_rq = blk_mq_rq_to_pdu(req);
 	u16 status = le16_to_cpup(&cqe->status) >> 1;
-	bool requeue = false;
 	int error = 0;
 
 	if (unlikely(status)) {
@@ -609,13 +610,14 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
 		    && (jiffies - req->start_time) < req->timeout) {
 			unsigned long flags;
 
-			requeue = true;
+			nvme_unmap_data(nvmeq->dev, iod);
+
 			blk_mq_requeue_request(req);
 			spin_lock_irqsave(req->q->queue_lock, flags);
 			if (!blk_queue_stopped(req->q))
 				blk_mq_kick_requeue_list(req->q);
 			spin_unlock_irqrestore(req->q->queue_lock, flags);
-			goto release_iod;
+			return;
 		}
 
 		if (req->cmd_type == REQ_TYPE_DRV_PRIV) {
@@ -638,21 +640,8 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
 			"completing aborted command with status:%04x\n",
 			error);
 
-release_iod:
-	if (iod->nents) {
-		dma_unmap_sg(nvmeq->dev->dev, iod->sg, iod->nents,
-			rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
-		if (blk_integrity_rq(req)) {
-			if (!rq_data_dir(req))
-				nvme_dif_remap(req, nvme_dif_complete);
-			dma_unmap_sg(nvmeq->dev->dev, iod->meta_sg, 1,
-				rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
-		}
-	}
-	nvme_free_iod(nvmeq->dev, iod);
-
-	if (likely(!requeue))
-		blk_mq_complete_request(req, error);
+	nvme_unmap_data(nvmeq->dev, iod);
+	blk_mq_complete_request(req, error);
 }
 
 static bool nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
@@ -783,6 +772,24 @@ out:
 	return ret;
 }
 
+static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_iod *iod)
+{
+	struct request *req = iod_get_private(iod);
+	enum dma_data_direction dma_dir = rq_data_dir(req) ?
+			DMA_TO_DEVICE : DMA_FROM_DEVICE;
+
+	if (iod->nents) {
+		dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
+		if (blk_integrity_rq(req)) {
+			if (!rq_data_dir(req))
+				nvme_dif_remap(req, nvme_dif_complete);
+			dma_unmap_sg(dev->dev, iod->meta_sg, 1, dma_dir);
+		}
+	}
+
+	nvme_free_iod(dev, iod);
+}
+
 /*
  * We reuse the small pool to allocate the 16-byte range here as it is not
  * worth having a special pool for these or additional cases to handle freeing
-- 
1.9.1


* [PATCH 04/12] nvme: factor out a few helpers from req_completion
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2015-11-07  8:44 ` [PATCH 03/12] nvme: factor out a nvme_unmap_data helper Christoph Hellwig
@ 2015-11-07  8:44 ` Christoph Hellwig
  2015-11-07  8:44 ` [PATCH 05/12] nvme: switch delete SQ/CQ to blk_execute_rq_nowait Christoph Hellwig
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:44 UTC (permalink / raw)


We'll need them in other places later.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 11 +++++++++++
 drivers/nvme/host/nvme.h |  7 +++++++
 drivers/nvme/host/pci.c  | 12 ++----------
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 66c7f52..a246c3d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -74,6 +74,17 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
 	return ns;
 }
 
+void nvme_requeue_req(struct request *req)
+{
+	unsigned long flags;
+
+	blk_mq_requeue_request(req);
+	spin_lock_irqsave(req->q->queue_lock, flags);
+	if (!blk_queue_stopped(req->q))
+		blk_mq_kick_requeue_list(req->q);
+	spin_unlock_irqrestore(req->q->queue_lock, flags);
+}
+
 static struct request *nvme_alloc_request(struct request_queue *q,
 		struct nvme_command *cmd)
 {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a879f9e..239f8c3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -189,6 +189,12 @@ static inline int nvme_error_status(u16 status)
 	}
 }
 
+static inline bool nvme_req_needs_retry(struct request *req, u16 status)
+{
+	return !(status & NVME_SC_DNR || blk_noretry_request(req)) &&
+		(jiffies - req->start_time) < req->timeout;
+}
+
 void nvme_uninit_ctrl(struct nvme_ctrl *ctrl);
 int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 		const struct nvme_ctrl_ops *ops, u16 vendor,
@@ -199,6 +205,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl);
 void nvme_scan_namespaces(struct nvme_ctrl *ctrl);
 void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
 
+void nvme_requeue_req(struct request *req);
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buf, unsigned bufflen);
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a68ff1c6..e0c8af4 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -606,17 +606,9 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
 	int error = 0;
 
 	if (unlikely(status)) {
-		if (!(status & NVME_SC_DNR || blk_noretry_request(req))
-		    && (jiffies - req->start_time) < req->timeout) {
-			unsigned long flags;
-
+		if (nvme_req_needs_retry(req, status)) {
 			nvme_unmap_data(nvmeq->dev, iod);
-
-			blk_mq_requeue_request(req);
-			spin_lock_irqsave(req->q->queue_lock, flags);
-			if (!blk_queue_stopped(req->q))
-				blk_mq_kick_requeue_list(req->q);
-			spin_unlock_irqrestore(req->q->queue_lock, flags);
+			nvme_requeue_req(req);
 			return;
 		}
 
-- 
1.9.1


* [PATCH 05/12] nvme: switch delete SQ/CQ to blk_execute_rq_nowait
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2015-11-07  8:44 ` [PATCH 04/12] nvme: factor out a few helpers from req_completion Christoph Hellwig
@ 2015-11-07  8:44 ` Christoph Hellwig
  2015-11-07  8:45 ` [PATCH 06/12] nvme: switch abort " Christoph Hellwig
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:44 UTC (permalink / raw)


Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c |  2 +-
 drivers/nvme/host/nvme.h |  2 ++
 drivers/nvme/host/pci.c  | 49 +++++++++++++++---------------------------------
 3 files changed, 18 insertions(+), 35 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a246c3d..37f7d69 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -85,7 +85,7 @@ void nvme_requeue_req(struct request *req)
 	spin_unlock_irqrestore(req->q->queue_lock, flags);
 }
 
-static struct request *nvme_alloc_request(struct request_queue *q,
+struct request *nvme_alloc_request(struct request_queue *q,
 		struct nvme_command *cmd)
 {
 	bool write = cmd->common.opcode & 1;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 239f8c3..58d36e7 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -206,6 +206,8 @@ void nvme_scan_namespaces(struct nvme_ctrl *ctrl);
 void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
 
 void nvme_requeue_req(struct request *req);
+struct request *nvme_alloc_request(struct request_queue *q,
+		struct nvme_command *cmd);
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buf, unsigned bufflen);
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e0c8af4..06fe22d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -85,8 +85,6 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_iod *iod);
 struct async_cmd_info {
 	struct kthread_work work;
 	struct kthread_worker *worker;
-	struct request *req;
-	u32 result;
 	int status;
 	void *ctx;
 };
@@ -390,16 +388,6 @@ static void abort_completion(struct nvme_queue *nvmeq, void *ctx,
 	atomic_inc(&nvmeq->dev->ctrl.abort_limit);
 }
 
-static void async_completion(struct nvme_queue *nvmeq, void *ctx,
-						struct nvme_completion *cqe)
-{
-	struct async_cmd_info *cmdinfo = ctx;
-	cmdinfo->result = le32_to_cpup(&cqe->result);
-	cmdinfo->status = le16_to_cpup(&cqe->status) >> 1;
-	queue_kthread_work(cmdinfo->worker, &cmdinfo->work);
-	blk_mq_free_request(cmdinfo->req);
-}
-
 static inline struct nvme_cmd_info *get_cmd_from_tag(struct nvme_queue *nvmeq,
 				  unsigned int tag)
 {
@@ -959,28 +947,13 @@ static int nvme_submit_async_admin_req(struct nvme_dev *dev)
 	return 0;
 }
 
-static int nvme_submit_admin_async_cmd(struct nvme_dev *dev,
-			struct nvme_command *cmd,
-			struct async_cmd_info *cmdinfo, unsigned timeout)
+static void async_cmd_info_endio(struct request *req, int error)
 {
-	struct nvme_queue *nvmeq = dev->queues[0];
-	struct request *req;
-	struct nvme_cmd_info *cmd_rq;
-
-	req = blk_mq_alloc_request(dev->ctrl.admin_q, WRITE, GFP_KERNEL, false);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
+	struct async_cmd_info *cmdinfo = req->end_io_data;
 
-	req->timeout = timeout;
-	cmd_rq = blk_mq_rq_to_pdu(req);
-	cmdinfo->req = req;
-	nvme_set_info(cmd_rq, cmdinfo, async_completion);
-	cmdinfo->status = -EINTR;
-
-	cmd->common.command_id = req->tag;
-
-	nvme_submit_cmd(nvmeq, cmd);
-	return 0;
+	cmdinfo->status = req->errors;
+	queue_kthread_work(cmdinfo->worker, &cmdinfo->work);
+	blk_mq_free_request(req);
 }
 
 static int adapter_delete_queue(struct nvme_dev *dev, u8 opcode, u16 id)
@@ -1966,6 +1939,7 @@ static void nvme_del_queue_end(struct nvme_queue *nvmeq)
 static int adapter_async_del_queue(struct nvme_queue *nvmeq, u8 opcode,
 						kthread_work_func_t fn)
 {
+	struct request *req;
 	struct nvme_command c;
 
 	memset(&c, 0, sizeof(c));
@@ -1973,8 +1947,15 @@ static int adapter_async_del_queue(struct nvme_queue *nvmeq, u8 opcode,
 	c.delete_queue.qid = cpu_to_le16(nvmeq->qid);
 
 	init_kthread_work(&nvmeq->cmdinfo.work, fn);
-	return nvme_submit_admin_async_cmd(nvmeq->dev, &c, &nvmeq->cmdinfo,
-								ADMIN_TIMEOUT);
+
+	req = nvme_alloc_request(nvmeq->dev->ctrl.admin_q, &c);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	req->timeout = ADMIN_TIMEOUT;
+	req->end_io_data = &nvmeq->cmdinfo;
+	blk_execute_rq_nowait(req->q, NULL, req, 0, async_cmd_info_endio);
+	return 0;
 }
 
 static void nvme_del_cq_work_handler(struct kthread_work *work)
-- 
1.9.1


* [PATCH 06/12] nvme: switch abort to blk_execute_rq_nowait
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2015-11-07  8:44 ` [PATCH 05/12] nvme: switch delete SQ/CQ to blk_execute_rq_nowait Christoph Hellwig
@ 2015-11-07  8:45 ` Christoph Hellwig
  2015-11-09 21:33   ` Keith Busch
  2015-11-07  8:45 ` [PATCH 07/12] nvme: special case AEN requests Christoph Hellwig
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:45 UTC (permalink / raw)


And remove the now unused nvme_submit_cmd helper.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c |  8 +++---
 drivers/nvme/host/nvme.h |  2 +-
 drivers/nvme/host/pci.c  | 66 ++++++++++++++++++++----------------------------
 3 files changed, 32 insertions(+), 44 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37f7d69..3600a0c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -86,12 +86,12 @@ void nvme_requeue_req(struct request *req)
 }
 
 struct request *nvme_alloc_request(struct request_queue *q,
-		struct nvme_command *cmd)
+		struct nvme_command *cmd, bool nowait)
 {
 	bool write = cmd->common.opcode & 1;
 	struct request *req;
 
-	req = blk_mq_alloc_request(q, write, GFP_KERNEL, false);
+	req = blk_mq_alloc_request(q, write, GFP_KERNEL, nowait);
 	if (IS_ERR(req))
 		return req;
 
@@ -118,7 +118,7 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 	struct request *req;
 	int ret;
 
-	req = nvme_alloc_request(q, cmd);
+	req = nvme_alloc_request(q, cmd, false);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
@@ -158,7 +158,7 @@ int __nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
 	void *meta = NULL;
 	int ret;
 
-	req = nvme_alloc_request(q, cmd);
+	req = nvme_alloc_request(q, cmd, false);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 58d36e7..d72bbe0 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -207,7 +207,7 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
 
 void nvme_requeue_req(struct request *req);
 struct request *nvme_alloc_request(struct request_queue *q,
-		struct nvme_command *cmd);
+		struct nvme_command *cmd, bool nowait);
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buf, unsigned bufflen);
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 06fe22d..cced2e6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -374,20 +374,6 @@ static void async_req_completion(struct nvme_queue *nvmeq, void *ctx,
 	}
 }
 
-static void abort_completion(struct nvme_queue *nvmeq, void *ctx,
-						struct nvme_completion *cqe)
-{
-	struct request *req = ctx;
-
-	u16 status = le16_to_cpup(&cqe->status) >> 1;
-	u32 result = le32_to_cpup(&cqe->result);
-
-	blk_mq_free_request(req);
-
-	dev_warn(nvmeq->q_dmadev, "Abort status:%x result:%x", status, result);
-	atomic_inc(&nvmeq->dev->ctrl.abort_limit);
-}
-
 static inline struct nvme_cmd_info *get_cmd_from_tag(struct nvme_queue *nvmeq,
 				  unsigned int tag)
 {
@@ -417,7 +403,7 @@ static void *nvme_finish_cmd(struct nvme_queue *nvmeq, int tag,
 }
 
 /**
- * nvme_submit_cmd() - Copy a command into a queue and ring the doorbell
+ * __nvme_submit_cmd() - Copy a command into a queue and ring the doorbell
  * @nvmeq: The queue to use
  * @cmd: The command to send
  *
@@ -439,14 +425,6 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
 	nvmeq->sq_tail = tail;
 }
 
-static void nvme_submit_cmd(struct nvme_queue *nvmeq, struct nvme_command *cmd)
-{
-	unsigned long flags;
-	spin_lock_irqsave(&nvmeq->q_lock, flags);
-	__nvme_submit_cmd(nvmeq, cmd);
-	spin_unlock_irqrestore(&nvmeq->q_lock, flags);
-}
-
 static __le64 **iod_list(struct nvme_iod *iod)
 {
 	return ((void *)iod) + iod->offset;
@@ -1019,13 +997,25 @@ static int adapter_delete_sq(struct nvme_dev *dev, u16 sqid)
 	return adapter_delete_queue(dev, nvme_admin_delete_sq, sqid);
 }
 
+static void abort_endio(struct request *req, int error)
+{
+	struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(req);
+	struct nvme_queue *nvmeq = cmd->nvmeq;
+	u32 result = (u32)(uintptr_t)req->special;
+	u16 status = req->errors;
+
+	dev_warn(nvmeq->q_dmadev, "Abort status:%x result:%x", status, result);
+	atomic_inc(&nvmeq->dev->ctrl.abort_limit);
+
+	blk_mq_free_request(req);
+}
+
 static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 {
 	struct nvme_cmd_info *cmd_rq = blk_mq_rq_to_pdu(req);
 	struct nvme_queue *nvmeq = cmd_rq->nvmeq;
 	struct nvme_dev *dev = nvmeq->dev;
 	struct request *abort_req;
-	struct nvme_cmd_info *abort_cmd;
 	struct nvme_command cmd;
 
 	/*
@@ -1047,27 +1037,25 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	if (atomic_dec_and_test(&dev->ctrl.abort_limit))
 		return BLK_EH_RESET_TIMER;
 
-	abort_req = blk_mq_alloc_request(dev->ctrl.admin_q, WRITE, GFP_ATOMIC,
-									false);
-	if (IS_ERR(abort_req)) {
-		atomic_inc(&dev->ctrl.abort_limit);
-		return BLK_EH_RESET_TIMER;
-	}
-
-	abort_cmd = blk_mq_rq_to_pdu(abort_req);
-	nvme_set_info(abort_cmd, abort_req, abort_completion);
-
 	memset(&cmd, 0, sizeof(cmd));
 	cmd.abort.opcode = nvme_admin_abort_cmd;
 	cmd.abort.cid = req->tag;
 	cmd.abort.sqid = cpu_to_le16(nvmeq->qid);
-	cmd.abort.command_id = abort_req->tag;
-
-	cmd_rq->aborted = 1;
 
 	dev_warn(nvmeq->q_dmadev, "I/O %d QID %d timeout, aborting\n",
 				 req->tag, nvmeq->qid);
-	nvme_submit_cmd(dev->queues[0], &cmd);
+
+	abort_req = nvme_alloc_request(dev->ctrl.admin_q, &cmd, true);
+	if (IS_ERR(abort_req)) {
+		atomic_inc(&dev->ctrl.abort_limit);
+		return BLK_EH_RESET_TIMER;
+	}
+
+	cmd_rq->aborted = 1;
+
+	abort_req->timeout = ADMIN_TIMEOUT;
+	abort_req->end_io_data = NULL;
+	blk_execute_rq_nowait(abort_req->q, NULL, abort_req, 0, abort_endio);
 
 	/*
 	 * The aborted req will be completed on receiving the abort req.
@@ -1948,7 +1936,7 @@ static int adapter_async_del_queue(struct nvme_queue *nvmeq, u8 opcode,
 
 	init_kthread_work(&nvmeq->cmdinfo.work, fn);
 
-	req = nvme_alloc_request(nvmeq->dev->ctrl.admin_q, &c);
+	req = nvme_alloc_request(nvmeq->dev->ctrl.admin_q, &c, false);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-- 
1.9.1


* [PATCH 07/12] nvme: special case AEN requests
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2015-11-07  8:45 ` [PATCH 06/12] nvme: switch abort " Christoph Hellwig
@ 2015-11-07  8:45 ` Christoph Hellwig
  2015-11-09 21:43   ` Keith Busch
  2015-11-07  8:45 ` [PATCH 08/12] nvme: simplify completion handling Christoph Hellwig
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:45 UTC (permalink / raw)


AEN requests are different from other requests in that they don't time out
and can't easily be cancelled.  Because of that we should not use the blk-mq
infrastructure for them, but just special-case them in the completion path.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 58 ++++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cced2e6..344aa6e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -45,6 +45,7 @@
 
 #define NVME_Q_DEPTH		1024
 #define NVME_AQ_DEPTH		256
+#define NVME_NR_AEN_COMMANDS	1
 #define SQ_SIZE(depth)		(depth * sizeof(struct nvme_command))
 #define CQ_SIZE(depth)		(depth * sizeof(struct nvme_completion))
 #define SHUTDOWN_TIMEOUT	(shutdown_timeout * HZ)
@@ -354,23 +355,22 @@ static void *cancel_cmd_info(struct nvme_cmd_info *cmd, nvme_completion_fn *fn)
 	return ctx;
 }
 
-static void async_req_completion(struct nvme_queue *nvmeq, void *ctx,
-						struct nvme_completion *cqe)
+static void nvme_finish_aen_cmd(struct nvme_dev *dev, struct nvme_completion *cqe)
 {
-	u32 result = le32_to_cpup(&cqe->result);
-	u16 status = le16_to_cpup(&cqe->status) >> 1;
+	u16 status = le16_to_cpu(cqe->status) >> 1;
+	u32 result = le32_to_cpu(cqe->result);
 
 	if (status == NVME_SC_SUCCESS || status == NVME_SC_ABORT_REQ)
-		++nvmeq->dev->ctrl.event_limit;
+		++dev->ctrl.event_limit;
 	if (status != NVME_SC_SUCCESS)
 		return;
 
 	switch (result & 0xff07) {
 	case NVME_AER_NOTICE_NS_CHANGED:
-		dev_info(nvmeq->q_dmadev, "rescanning\n");
-		queue_work(nvme_workq, &nvmeq->dev->scan_work);
+		dev_info(dev->dev, "rescanning\n");
+		queue_work(nvme_workq, &dev->scan_work);
 	default:
-		dev_warn(nvmeq->q_dmadev, "async event result %08x\n", result);
+		dev_warn(dev->dev, "async event result %08x\n", result);
 	}
 }
 
@@ -852,13 +852,28 @@ static int nvme_process_cq(struct nvme_queue *nvmeq)
 		void *ctx;
 		nvme_completion_fn fn;
 		struct nvme_completion cqe = nvmeq->cqes[head];
-		if ((le16_to_cpu(cqe.status) & 1) != phase)
+		u16 status = le16_to_cpu(cqe.status);
+
+		if ((status & 1) != phase)
 			break;
 		nvmeq->sq_head = le16_to_cpu(cqe.sq_head);
 		if (++head == nvmeq->q_depth) {
 			head = 0;
 			phase = !phase;
 		}
+
+		/*
+		 * AEN requests are special as they don't time out and can
+		 * survive any kind of queue freeze and often don't respond to
+		 * aborts.  We don't even bother to allocate a struct request
+		 * for them but rather special case them here.
+		 */
+		if (unlikely(nvmeq->qid == 0 &&
+				cqe.command_id < NVME_NR_AEN_COMMANDS)) {
+			nvme_finish_aen_cmd(nvmeq->dev, &cqe);
+			continue;
+		}
+
 		ctx = nvme_finish_cmd(nvmeq, cqe.command_id, &fn);
 		fn(nvmeq, ctx, &cqe);
 	}
@@ -901,28 +916,14 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 	return IRQ_WAKE_THREAD;
 }
 
-static int nvme_submit_async_admin_req(struct nvme_dev *dev)
+static void nvme_submit_async_admin_req(struct nvme_dev *dev)
 {
-	struct nvme_queue *nvmeq = dev->queues[0];
 	struct nvme_command c;
-	struct nvme_cmd_info *cmd_info;
-	struct request *req;
-
-	req = blk_mq_alloc_request(dev->ctrl.admin_q, WRITE, GFP_ATOMIC, true);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	req->cmd_flags |= REQ_NO_TIMEOUT;
-	cmd_info = blk_mq_rq_to_pdu(req);
-	nvme_set_info(cmd_info, NULL, async_req_completion);
 
 	memset(&c, 0, sizeof(c));
 	c.common.opcode = nvme_admin_async_event;
-	c.common.command_id = req->tag;
 
-	blk_mq_free_request(req);
-	__nvme_submit_cmd(nvmeq, &c);
-	return 0;
+	__nvme_submit_cmd(dev->queues[0], &c);
 }
 
 static void async_cmd_info_endio(struct request *req, int error)
@@ -1409,7 +1410,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 		dev->admin_tagset.ops = &nvme_mq_admin_ops;
 		dev->admin_tagset.nr_hw_queues = 1;
 		dev->admin_tagset.queue_depth = NVME_AQ_DEPTH - 1;
-		dev->admin_tagset.reserved_tags = 1;
+		dev->admin_tagset.reserved_tags = NVME_NR_AEN_COMMANDS;
 		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
 		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
 		dev->admin_tagset.cmd_size = nvme_cmd_size(dev);
@@ -1543,8 +1544,7 @@ static int nvme_kthread(void *data)
 				nvme_process_cq(nvmeq);
 
 				while (i == 0 && dev->ctrl.event_limit > 0) {
-					if (nvme_submit_async_admin_req(dev))
-						break;
+					nvme_submit_async_admin_req(dev);
 					dev->ctrl.event_limit--;
 				}
 				spin_unlock_irq(&nvmeq->q_lock);
@@ -2198,7 +2198,7 @@ static void nvme_reset_work(struct work_struct *work)
 	if (result)
 		goto free_tags;
 
-	dev->ctrl.event_limit = 1;
+	dev->ctrl.event_limit = NVME_NR_AEN_COMMANDS;
 
 	result = nvme_dev_list_add(dev);
 	if (result)
-- 
1.9.1


* [PATCH 08/12] nvme: simplify completion handling
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2015-11-07  8:45 ` [PATCH 07/12] nvme: special case AEN requests Christoph Hellwig
@ 2015-11-07  8:45 ` Christoph Hellwig
  2015-11-07  8:45 ` [PATCH 09/12] nvme: properly free resources for cancelled command Christoph Hellwig
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:45 UTC (permalink / raw)


Now that all commands are executed as block layer requests we can remove the
internal completion handling in the NVMe driver.  Note that we can simply call
blk_mq_complete_request to abort commands, as the block layer will protect
against double completions internally.
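
For reference, the protection referred to here is the atomic "complete"
flag in the block layer; blk_mq_complete_request() of this era looks
roughly like the following (paraphrased from block/blk-mq.c, not part of
this patch):

    void blk_mq_complete_request(struct request *rq, int error)
    {
            struct request_queue *q = rq->q;

            if (unlikely(blk_should_fake_timeout(q)))
                    return;
            /* blk_mark_rq_complete() is an atomic test-and-set, so only
             * the first completion attempt for a request wins */
            if (!blk_mark_rq_complete(rq)) {
                    rq->errors = error;
                    __blk_mq_complete_request(rq);
            }
    }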

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 140 ++++++++++--------------------------------------
 1 file changed, 28 insertions(+), 112 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 344aa6e..03b8a3c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -193,15 +193,11 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_smart_log) != 512);
 }
 
-typedef void (*nvme_completion_fn)(struct nvme_queue *, void *,
-						struct nvme_completion *);
-
 struct nvme_cmd_info {
-	nvme_completion_fn fn;
-	void *ctx;
 	int aborted;
 	struct nvme_queue *nvmeq;
-	struct nvme_iod iod[0];
+	struct nvme_iod *iod;
+	struct nvme_iod __iod;
 };
 
 /*
@@ -295,15 +291,6 @@ static int nvme_init_request(void *data, struct request *req,
 	return 0;
 }
 
-static void nvme_set_info(struct nvme_cmd_info *cmd, void *ctx,
-				nvme_completion_fn handler)
-{
-	cmd->fn = handler;
-	cmd->ctx = ctx;
-	cmd->aborted = 0;
-	blk_mq_start_request(blk_mq_rq_from_pdu(cmd));
-}
-
 static void *iod_get_private(struct nvme_iod *iod)
 {
 	return (void *) (iod->private & ~0x1UL);
@@ -317,44 +304,6 @@ static bool iod_should_kfree(struct nvme_iod *iod)
 	return (iod->private & NVME_INT_MASK) == 0;
 }
 
-/* Special values must be less than 0x1000 */
-#define CMD_CTX_BASE		((void *)POISON_POINTER_DELTA)
-#define CMD_CTX_CANCELLED	(0x30C + CMD_CTX_BASE)
-#define CMD_CTX_COMPLETED	(0x310 + CMD_CTX_BASE)
-#define CMD_CTX_INVALID		(0x314 + CMD_CTX_BASE)
-
-static void special_completion(struct nvme_queue *nvmeq, void *ctx,
-						struct nvme_completion *cqe)
-{
-	if (ctx == CMD_CTX_CANCELLED)
-		return;
-	if (ctx == CMD_CTX_COMPLETED) {
-		dev_warn(nvmeq->q_dmadev,
-				"completed id %d twice on queue %d\n",
-				cqe->command_id, le16_to_cpup(&cqe->sq_id));
-		return;
-	}
-	if (ctx == CMD_CTX_INVALID) {
-		dev_warn(nvmeq->q_dmadev,
-				"invalid id %d completed on queue %d\n",
-				cqe->command_id, le16_to_cpup(&cqe->sq_id));
-		return;
-	}
-	dev_warn(nvmeq->q_dmadev, "Unknown special completion %p\n", ctx);
-}
-
-static void *cancel_cmd_info(struct nvme_cmd_info *cmd, nvme_completion_fn *fn)
-{
-	void *ctx;
-
-	if (fn)
-		*fn = cmd->fn;
-	ctx = cmd->ctx;
-	cmd->fn = special_completion;
-	cmd->ctx = CMD_CTX_CANCELLED;
-	return ctx;
-}
-
 static void nvme_finish_aen_cmd(struct nvme_dev *dev, struct nvme_completion *cqe)
 {
 	u16 status = le16_to_cpu(cqe->status) >> 1;
@@ -374,34 +323,6 @@ static void nvme_finish_aen_cmd(struct nvme_dev *dev, struct nvme_completion *cq
 	}
 }
 
-static inline struct nvme_cmd_info *get_cmd_from_tag(struct nvme_queue *nvmeq,
-				  unsigned int tag)
-{
-	struct request *req = blk_mq_tag_to_rq(*nvmeq->tags, tag);
-
-	return blk_mq_rq_to_pdu(req);
-}
-
-/*
- * Called with local interrupts disabled and the q_lock held.  May not sleep.
- */
-static void *nvme_finish_cmd(struct nvme_queue *nvmeq, int tag,
-						nvme_completion_fn *fn)
-{
-	struct nvme_cmd_info *cmd = get_cmd_from_tag(nvmeq, tag);
-	void *ctx;
-	if (tag >= nvmeq->q_depth) {
-		*fn = special_completion;
-		return CMD_CTX_INVALID;
-	}
-	if (fn)
-		*fn = cmd->fn;
-	ctx = cmd->ctx;
-	cmd->fn = special_completion;
-	cmd->ctx = CMD_CTX_COMPLETED;
-	return ctx;
-}
-
 /**
  * __nvme_submit_cmd() - Copy a command into a queue and ring the doorbell
  * @nvmeq: The queue to use
@@ -465,7 +386,7 @@ static struct nvme_iod *nvme_alloc_iod(struct request *rq, struct nvme_dev *dev,
 	    size <= NVME_INT_BYTES(dev)) {
 		struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(rq);
 
-		iod = cmd->iod;
+		iod = &cmd->__iod;
 		iod_init(iod, size, rq->nr_phys_segments,
 				(unsigned long) rq | NVME_INT_MASK);
 		return iod;
@@ -562,12 +483,11 @@ static void nvme_dif_complete(u32 p, u32 v, struct t10_pi_tuple *pi)
 }
 #endif
 
-static void req_completion(struct nvme_queue *nvmeq, void *ctx,
-						struct nvme_completion *cqe)
+static void req_completion(struct nvme_queue *nvmeq, struct nvme_completion *cqe)
 {
-	struct nvme_iod *iod = ctx;
-	struct request *req = iod_get_private(iod);
+	struct request *req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id);
 	struct nvme_cmd_info *cmd_rq = blk_mq_rq_to_pdu(req);
+	struct nvme_iod *iod = cmd_rq->iod;
 	u16 status = le16_to_cpup(&cqe->status) >> 1;
 	int error = 0;
 
@@ -579,10 +499,7 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
 		}
 
 		if (req->cmd_type == REQ_TYPE_DRV_PRIV) {
-			if (cmd_rq->ctx == CMD_CTX_CANCELLED)
-				error = -EINTR;
-			else
-				error = status;
+			error = status;
 		} else {
 			error = nvme_error_status(status);
 		}
@@ -828,8 +745,10 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (ret)
 		goto out;
 
+	cmd->iod = iod;
+	cmd->aborted = 0;
 	cmnd.common.command_id = req->tag;
-	nvme_set_info(cmd, iod, req_completion);
+	blk_mq_start_request(req);
 
 	spin_lock_irq(&nvmeq->q_lock);
 	__nvme_submit_cmd(nvmeq, &cmnd);
@@ -849,8 +768,6 @@ static int nvme_process_cq(struct nvme_queue *nvmeq)
 	phase = nvmeq->cq_phase;
 
 	for (;;) {
-		void *ctx;
-		nvme_completion_fn fn;
 		struct nvme_completion cqe = nvmeq->cqes[head];
 		u16 status = le16_to_cpu(cqe.status);
 
@@ -862,6 +779,13 @@ static int nvme_process_cq(struct nvme_queue *nvmeq)
 			phase = !phase;
 		}
 
+		if (unlikely(cqe.command_id >= nvmeq->q_depth)) {
+			dev_warn(nvmeq->q_dmadev,
+				"invalid id %d completed on queue %d\n",
+				cqe.command_id, le16_to_cpu(cqe.sq_id));
+			continue;
+		}
+
 		/*
 		 * AEN requests are special as they don't time out and can
 		 * survive any kind of queue freeze and often don't respond to
@@ -874,8 +798,7 @@ static int nvme_process_cq(struct nvme_queue *nvmeq)
 			continue;
 		}
 
-		ctx = nvme_finish_cmd(nvmeq, cqe.command_id, &fn);
-		fn(nvmeq, ctx, &cqe);
+		req_completion(nvmeq, &cqe);
 	}
 
 	/* If the controller ignores the cq head doorbell and continuously
@@ -1069,29 +992,22 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 static void nvme_cancel_queue_ios(struct request *req, void *data, bool reserved)
 {
 	struct nvme_queue *nvmeq = data;
-	void *ctx;
-	nvme_completion_fn fn;
-	struct nvme_cmd_info *cmd;
-	struct nvme_completion cqe;
 
 	if (!blk_mq_request_started(req))
 		return;
 
-	cmd = blk_mq_rq_to_pdu(req);
-
-	if (cmd->ctx == CMD_CTX_CANCELLED)
-		return;
-
-	if (blk_queue_dying(req->q))
-		cqe.status = cpu_to_le16((NVME_SC_ABORT_REQ | NVME_SC_DNR) << 1);
-	else
-		cqe.status = cpu_to_le16(NVME_SC_ABORT_REQ << 1);
-
-
 	dev_warn(nvmeq->q_dmadev, "Cancelling I/O %d QID %d\n",
 						req->tag, nvmeq->qid);
-	ctx = cancel_cmd_info(cmd, &fn);
-	fn(nvmeq, ctx, &cqe);
+
+	if (blk_queue_dying(req->q)) {
+		/*
+		 * Use a negative Linux errno here so that the submitter can
+		 * distinguish between hardware and driver errors.
+		 */
+		blk_mq_complete_request(req, -EINTR);
+	} else {
+		blk_mq_complete_request(req, NVME_SC_ABORT_REQ);
+	}
 }
 
 static void nvme_free_queue(struct nvme_queue *nvmeq)
-- 
1.9.1


* [PATCH 09/12] nvme: properly free resources for cancelled command
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2015-11-07  8:45 ` [PATCH 08/12] nvme: simplify completion handling Christoph Hellwig
@ 2015-11-07  8:45 ` Christoph Hellwig
  2015-11-09 18:57   ` Keith Busch
  2015-11-07  8:45 ` [PATCH 10/12] nvme: meta_sg doesn't have to be an array Christoph Hellwig
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:45 UTC (permalink / raw)


We need to move freeing of resources to the ->complete handler to ensure
they are also freed when we cancel the command.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 86 +++++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 39 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 03b8a3c..a5ee159 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -76,12 +76,10 @@ static wait_queue_head_t nvme_kthread_wait;
 
 struct nvme_dev;
 struct nvme_queue;
-struct nvme_iod;
 
 static int nvme_reset(struct nvme_dev *dev);
 static int nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev);
-static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_iod *iod);
 
 struct async_cmd_info {
 	struct kthread_work work;
@@ -483,42 +481,6 @@ static void nvme_dif_complete(u32 p, u32 v, struct t10_pi_tuple *pi)
 }
 #endif
 
-static void req_completion(struct nvme_queue *nvmeq, struct nvme_completion *cqe)
-{
-	struct request *req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id);
-	struct nvme_cmd_info *cmd_rq = blk_mq_rq_to_pdu(req);
-	struct nvme_iod *iod = cmd_rq->iod;
-	u16 status = le16_to_cpup(&cqe->status) >> 1;
-	int error = 0;
-
-	if (unlikely(status)) {
-		if (nvme_req_needs_retry(req, status)) {
-			nvme_unmap_data(nvmeq->dev, iod);
-			nvme_requeue_req(req);
-			return;
-		}
-
-		if (req->cmd_type == REQ_TYPE_DRV_PRIV) {
-			error = status;
-		} else {
-			error = nvme_error_status(status);
-		}
-	}
-
-	if (req->cmd_type == REQ_TYPE_DRV_PRIV) {
-		u32 result = le32_to_cpup(&cqe->result);
-		req->special = (void *)(uintptr_t)result;
-	}
-
-	if (cmd_rq->aborted)
-		dev_warn(nvmeq->dev->dev,
-			"completing aborted command with status:%04x\n",
-			error);
-
-	nvme_unmap_data(nvmeq->dev, iod);
-	blk_mq_complete_request(req, error);
-}
-
 static bool nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
 		int total_len)
 {
@@ -760,6 +722,43 @@ out:
 	return ret;
 }
 
+static void nvme_complete_rq(struct request *req)
+{
+	struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(req);
+	struct nvme_dev *dev = cmd->nvmeq->dev;
+	int error = 0;
+
+	nvme_unmap_data(dev, cmd->iod);
+
+	if (unlikely(req->errors)) {
+		/*
+		 * Some silly Intel userspace code breaks if it doesn't get a
+		 * negative errno back for driver returns values.
+		 */
+		if (req->errors < 0) {
+			error = req->errors;
+		} else {
+			if (nvme_req_needs_retry(req, req->errors)) {
+				nvme_requeue_req(req);
+				return;
+			}
+
+			if (req->cmd_type == REQ_TYPE_DRV_PRIV)
+				error = req->errors;
+			else
+				error = nvme_error_status(req->errors);
+		}
+	}
+
+	if (cmd->aborted) {
+		dev_warn(dev->dev,
+			"completing aborted command with status:%04x\n",
+			req->errors);
+	}
+
+	blk_mq_end_request(req, error);
+}
+
 static int nvme_process_cq(struct nvme_queue *nvmeq)
 {
 	u16 head, phase;
@@ -770,6 +769,7 @@ static int nvme_process_cq(struct nvme_queue *nvmeq)
 	for (;;) {
 		struct nvme_completion cqe = nvmeq->cqes[head];
 		u16 status = le16_to_cpu(cqe.status);
+		struct request *req;
 
 		if ((status & 1) != phase)
 			break;
@@ -798,7 +798,13 @@ static int nvme_process_cq(struct nvme_queue *nvmeq)
 			continue;
 		}
 
-		req_completion(nvmeq, &cqe);
+		req = blk_mq_tag_to_rq(*nvmeq->tags, cqe.command_id);
+		if (req->cmd_type == REQ_TYPE_DRV_PRIV) {
+			u32 result = le32_to_cpu(cqe.result);
+			req->special = (void *)(uintptr_t)result;
+		}
+		blk_mq_complete_request(req, status >> 1);
+
 	}
 
 	/* If the controller ignores the cq head doorbell and continuously
@@ -1297,6 +1303,7 @@ static int nvme_shutdown_ctrl(struct nvme_dev *dev)
 
 static struct blk_mq_ops nvme_mq_admin_ops = {
 	.queue_rq	= nvme_queue_rq,
+	.complete	= nvme_complete_rq,
 	.map_queue	= blk_mq_map_queue,
 	.init_hctx	= nvme_admin_init_hctx,
 	.exit_hctx      = nvme_admin_exit_hctx,
@@ -1306,6 +1313,7 @@ static struct blk_mq_ops nvme_mq_admin_ops = {
 
 static struct blk_mq_ops nvme_mq_ops = {
 	.queue_rq	= nvme_queue_rq,
+	.complete	= nvme_complete_rq,
 	.map_queue	= blk_mq_map_queue,
 	.init_hctx	= nvme_init_hctx,
 	.init_request	= nvme_init_request,
-- 
1.9.1


* [PATCH 10/12] nvme: meta_sg doesn't have to be an array
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2015-11-07  8:45 ` [PATCH 09/12] nvme: properly free resources for cancelled command Christoph Hellwig
@ 2015-11-07  8:45 ` Christoph Hellwig
  2015-11-07  8:45 ` [PATCH 11/12] nvme: merge iod and cmd_info Christoph Hellwig
  2015-11-07  8:45 ` [PATCH 12/12] block: remove REQ_NO_TIMEOUT flag Christoph Hellwig
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:45 UTC (permalink / raw)


Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a5ee159..a25157b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -168,7 +168,7 @@ struct nvme_iod {
 	int nents;		/* Used in scatterlist */
 	int length;		/* Of data, in bytes */
 	dma_addr_t first_dma;
-	struct scatterlist meta_sg[1]; /* metadata requires single contiguous buffer */
+	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
 	struct scatterlist sg[0];
 };
 
@@ -586,21 +586,21 @@ static int nvme_map_data(struct nvme_dev *dev, struct nvme_iod *iod,
 		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
 			goto out_unmap;
 
-		sg_init_table(iod->meta_sg, 1);
-		if (blk_rq_map_integrity_sg(q, req->bio, iod->meta_sg) != 1)
+		sg_init_table(&iod->meta_sg, 1);
+		if (blk_rq_map_integrity_sg(q, req->bio, &iod->meta_sg) != 1)
 			goto out_unmap;
 
 		if (rq_data_dir(req))
 			nvme_dif_remap(req, nvme_dif_prep);
 
-		if (!dma_map_sg(dev->dev, iod->meta_sg, 1, dma_dir))
+		if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
 			goto out_unmap;
 	}
 
 	cmnd->rw.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
 	cmnd->rw.prp2 = cpu_to_le64(iod->first_dma);
 	if (blk_integrity_rq(req))
-		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(iod->meta_sg));
+		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));
 	return BLK_MQ_RQ_QUEUE_OK;
 
 out_unmap:
@@ -620,7 +620,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_iod *iod)
 		if (blk_integrity_rq(req)) {
 			if (!rq_data_dir(req))
 				nvme_dif_remap(req, nvme_dif_complete);
-			dma_unmap_sg(dev->dev, iod->meta_sg, 1, dma_dir);
+			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
 		}
 	}
 
-- 
1.9.1


* [PATCH 11/12] nvme: merge iod and cmd_info
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2015-11-07  8:45 ` [PATCH 10/12] nvme: meta_sg doesn't have to be an array Christoph Hellwig
@ 2015-11-07  8:45 ` Christoph Hellwig
  2015-11-07  8:45 ` [PATCH 12/12] block: remove REQ_NO_TIMEOUT flag Christoph Hellwig
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:45 UTC (permalink / raw)


Merge the two per-request structures in the nvme driver into a single
one.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 184 +++++++++++++++++++-----------------------------
 1 file changed, 73 insertions(+), 111 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a25157b..2ef408f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -158,18 +158,19 @@ struct nvme_queue {
 /*
  * The nvme_iod describes the data in an I/O, including the list of PRP
  * entries.  You can't see it in this data structure because C doesn't let
- * me express that.  Use nvme_alloc_iod to ensure there's enough space
+ * me express that.  Use nvme_init_iod to ensure there's enough space
  * allocated to store the PRP list.
  */
 struct nvme_iod {
-	unsigned long private;	/* For the use of the submitter of the I/O */
+	struct nvme_queue *nvmeq;
+	int aborted;
 	int npages;		/* In the PRP list. 0 means small pool in use */
-	int offset;		/* Of PRP list */
 	int nents;		/* Used in scatterlist */
 	int length;		/* Of data, in bytes */
 	dma_addr_t first_dma;
 	struct scatterlist meta_sg; /* metadata requires single contiguous buffer */
-	struct scatterlist sg[0];
+	struct scatterlist *sg;
+	struct scatterlist inline_sg[0];
 };
 
 /*
@@ -191,19 +192,11 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_smart_log) != 512);
 }
 
-struct nvme_cmd_info {
-	int aborted;
-	struct nvme_queue *nvmeq;
-	struct nvme_iod *iod;
-	struct nvme_iod __iod;
-};
-
 /*
  * Max size of iod being embedded in the request payload
  */
 #define NVME_INT_PAGES		2
 #define NVME_INT_BYTES(dev)	(NVME_INT_PAGES * (dev)->page_size)
-#define NVME_INT_MASK		0x01
 
 /*
  * Will slightly overestimate the number of pages needed.  This is OK
@@ -216,15 +209,17 @@ static int nvme_npages(unsigned size, struct nvme_dev *dev)
 	return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8);
 }
 
-static unsigned int nvme_cmd_size(struct nvme_dev *dev)
+static unsigned int nvme_iod_alloc_size(struct nvme_dev *dev,
+		unsigned int size, unsigned int nseg)
 {
-	unsigned int ret = sizeof(struct nvme_cmd_info);
-
-	ret += sizeof(struct nvme_iod);
-	ret += sizeof(__le64 *) * nvme_npages(NVME_INT_BYTES(dev), dev);
-	ret += sizeof(struct scatterlist) * NVME_INT_PAGES;
+	return sizeof(__le64 *) * nvme_npages(size, dev) +
+			sizeof(struct scatterlist) * nseg;
+}
 
-	return ret;
+static unsigned int nvme_cmd_size(struct nvme_dev *dev)
+{
+	return sizeof(struct nvme_iod) +
+		nvme_iod_alloc_size(dev, NVME_INT_BYTES(dev), NVME_INT_PAGES);
 }
 
 static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
@@ -254,11 +249,11 @@ static int nvme_admin_init_request(void *data, struct request *req,
 				unsigned int numa_node)
 {
 	struct nvme_dev *dev = data;
-	struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(req);
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_queue *nvmeq = dev->queues[0];
 
 	BUG_ON(!nvmeq);
-	cmd->nvmeq = nvmeq;
+	iod->nvmeq = nvmeq;
 	return 0;
 }
 
@@ -281,27 +276,14 @@ static int nvme_init_request(void *data, struct request *req,
 				unsigned int numa_node)
 {
 	struct nvme_dev *dev = data;
-	struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(req);
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_queue *nvmeq = dev->queues[hctx_idx + 1];
 
 	BUG_ON(!nvmeq);
-	cmd->nvmeq = nvmeq;
+	iod->nvmeq = nvmeq;
 	return 0;
 }
 
-static void *iod_get_private(struct nvme_iod *iod)
-{
-	return (void *) (iod->private & ~0x1UL);
-}
-
-/*
- * If bit 0 is set, the iod is embedded in the request payload.
- */
-static bool iod_should_kfree(struct nvme_iod *iod)
-{
-	return (iod->private & NVME_INT_MASK) == 0;
-}
-
 static void nvme_finish_aen_cmd(struct nvme_dev *dev, struct nvme_completion *cqe)
 {
 	u16 status = le16_to_cpu(cqe->status) >> 1;
@@ -344,61 +326,44 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
 	nvmeq->sq_tail = tail;
 }
 
-static __le64 **iod_list(struct nvme_iod *iod)
-{
-	return ((void *)iod) + iod->offset;
-}
-
-static inline void iod_init(struct nvme_iod *iod, unsigned nbytes,
-			    unsigned nseg, unsigned long private)
-{
-	iod->private = private;
-	iod->offset = offsetof(struct nvme_iod, sg[nseg]);
-	iod->npages = -1;
-	iod->length = nbytes;
-	iod->nents = 0;
-}
-
-static struct nvme_iod *
-__nvme_alloc_iod(unsigned nseg, unsigned bytes, struct nvme_dev *dev,
-		 unsigned long priv, gfp_t gfp)
+static __le64 **iod_list(struct request *req)
 {
-	struct nvme_iod *iod = kmalloc(sizeof(struct nvme_iod) +
-				sizeof(__le64 *) * nvme_npages(bytes, dev) +
-				sizeof(struct scatterlist) * nseg, gfp);
-
-	if (iod)
-		iod_init(iod, bytes, nseg, priv);
-
-	return iod;
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	return (__le64 **)(iod->sg + req->nr_phys_segments);
 }
 
-static struct nvme_iod *nvme_alloc_iod(struct request *rq, struct nvme_dev *dev,
-			               gfp_t gfp)
+static int nvme_init_iod(struct request *rq, struct nvme_dev *dev)
 {
-	unsigned size = !(rq->cmd_flags & REQ_DISCARD) ? blk_rq_bytes(rq) :
-                                                sizeof(struct nvme_dsm_range);
-	struct nvme_iod *iod;
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(rq);
+	int nseg = rq->nr_phys_segments;
+	unsigned size;
 
-	if (rq->nr_phys_segments <= NVME_INT_PAGES &&
-	    size <= NVME_INT_BYTES(dev)) {
-		struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(rq);
+	if (rq->cmd_flags & REQ_DISCARD)
+		size = sizeof(struct nvme_dsm_range);
+	else
+		size = blk_rq_bytes(rq);
 
-		iod = &cmd->__iod;
-		iod_init(iod, size, rq->nr_phys_segments,
-				(unsigned long) rq | NVME_INT_MASK);
-		return iod;
+	if (nseg > NVME_INT_PAGES || size > NVME_INT_BYTES(dev)) {
+		iod->sg = kmalloc(nvme_iod_alloc_size(dev, size, nseg), GFP_ATOMIC);
+		if (!iod->sg)
+			return BLK_MQ_RQ_QUEUE_BUSY;
+	} else {
+		iod->sg = iod->inline_sg;
 	}
 
-	return __nvme_alloc_iod(rq->nr_phys_segments, size, dev,
-				(unsigned long) rq, gfp);
+	iod->aborted = 0;
+	iod->npages = -1;
+	iod->nents = 0;
+	iod->length = size;
+	return 0;
 }
 
-static void nvme_free_iod(struct nvme_dev *dev, struct nvme_iod *iod)
+static void nvme_free_iod(struct nvme_dev *dev, struct request *req)
 {
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	const int last_prp = dev->page_size / 8 - 1;
 	int i;
-	__le64 **list = iod_list(iod);
+	__le64 **list = iod_list(req);
 	dma_addr_t prp_dma = iod->first_dma;
 
 	if (iod->npages == 0)
@@ -410,8 +375,8 @@ static void nvme_free_iod(struct nvme_dev *dev, struct nvme_iod *iod)
 		prp_dma = next_prp_dma;
 	}
 
-	if (iod_should_kfree(iod))
-		kfree(iod);
+	if (iod->sg != iod->inline_sg)
+		kfree(iod->sg);
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
@@ -481,9 +446,10 @@ static void nvme_dif_complete(u32 p, u32 v, struct t10_pi_tuple *pi)
 }
 #endif
 
-static bool nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
+static bool nvme_setup_prps(struct nvme_dev *dev, struct request *req,
 		int total_len)
 {
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct dma_pool *pool;
 	int length = total_len;
 	struct scatterlist *sg = iod->sg;
@@ -492,7 +458,7 @@ static bool nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
 	u32 page_size = dev->page_size;
 	int offset = dma_addr & (page_size - 1);
 	__le64 *prp_list;
-	__le64 **list = iod_list(iod);
+	__le64 **list = iod_list(req);
 	dma_addr_t prp_dma;
 	int nprps, i;
 
@@ -560,10 +526,10 @@ static bool nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
 	return true;
 }
 
-static int nvme_map_data(struct nvme_dev *dev, struct nvme_iod *iod,
+static int nvme_map_data(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
-	struct request *req = iod_get_private(iod);
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct request_queue *q = req->q;
 	enum dma_data_direction dma_dir = rq_data_dir(req) ?
 			DMA_TO_DEVICE : DMA_FROM_DEVICE;
@@ -578,7 +544,7 @@ static int nvme_map_data(struct nvme_dev *dev, struct nvme_iod *iod,
 	if (!dma_map_sg(dev->dev, iod->sg, iod->nents, dma_dir))
 		goto out;
 
-	if (!nvme_setup_prps(dev, iod, blk_rq_bytes(req)))
+	if (!nvme_setup_prps(dev, req, blk_rq_bytes(req)))
 		goto out_unmap;
 
 	ret = BLK_MQ_RQ_QUEUE_ERROR;
@@ -609,9 +575,9 @@ out:
 	return ret;
 }
 
-static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_iod *iod)
+static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 {
-	struct request *req = iod_get_private(iod);
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	enum dma_data_direction dma_dir = rq_data_dir(req) ?
 			DMA_TO_DEVICE : DMA_FROM_DEVICE;
 
@@ -624,7 +590,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_iod *iod)
 		}
 	}
 
-	nvme_free_iod(dev, iod);
+	nvme_free_iod(dev, req);
 }
 
 /*
@@ -633,16 +599,16 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_iod *iod)
  * the iod.
  */
 static int nvme_setup_discard(struct nvme_queue *nvmeq, struct nvme_ns *ns,
-		struct nvme_iod *iod, struct nvme_command *cmnd)
+		struct request *req, struct nvme_command *cmnd)
 {
-	struct request *req = iod_get_private(iod);
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_dsm_range *range;
 
 	range = dma_pool_alloc(nvmeq->dev->prp_small_pool, GFP_ATOMIC,
 						&iod->first_dma);
 	if (!range)
 		return BLK_MQ_RQ_QUEUE_BUSY;
-	iod_list(iod)[0] = (__le64 *)range;
+	iod_list(req)[0] = (__le64 *)range;
 	iod->npages = 0;
 
 	range->cattr = cpu_to_le32(0);
@@ -668,8 +634,6 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct nvme_queue *nvmeq = hctx->driver_data;
 	struct nvme_dev *dev = nvmeq->dev;
 	struct request *req = bd->rq;
-	struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(req);
-	struct nvme_iod *iod;
 	struct nvme_command cmnd;
 	int ret = BLK_MQ_RQ_QUEUE_OK;
 
@@ -686,12 +650,12 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 		}
 	}
 
-	iod = nvme_alloc_iod(req, dev, GFP_ATOMIC);
-	if (!iod)
-		return BLK_MQ_RQ_QUEUE_BUSY;
+	ret = nvme_init_iod(req, dev);
+	if (ret)
+		return ret;
 
 	if (req->cmd_flags & REQ_DISCARD) {
-		ret = nvme_setup_discard(nvmeq, ns, iod, &cmnd);
+		ret = nvme_setup_discard(nvmeq, ns, req, &cmnd);
 	} else {
 		if (req->cmd_type == REQ_TYPE_DRV_PRIV)
 			memcpy(&cmnd, req->cmd, sizeof(cmnd));
@@ -701,14 +665,12 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 			nvme_setup_rw(ns, req, &cmnd);
 
 		if (req->nr_phys_segments)
-			ret = nvme_map_data(dev, iod, &cmnd);
+			ret = nvme_map_data(dev, req, &cmnd);
 	}
 
 	if (ret)
 		goto out;
 
-	cmd->iod = iod;
-	cmd->aborted = 0;
 	cmnd.common.command_id = req->tag;
 	blk_mq_start_request(req);
 
@@ -718,17 +680,17 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	spin_unlock_irq(&nvmeq->q_lock);
 	return BLK_MQ_RQ_QUEUE_OK;
 out:
-	nvme_free_iod(dev, iod);
+	nvme_free_iod(dev, req);
 	return ret;
 }
 
 static void nvme_complete_rq(struct request *req)
 {
-	struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(req);
-	struct nvme_dev *dev = cmd->nvmeq->dev;
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_dev *dev = iod->nvmeq->dev;
 	int error = 0;
 
-	nvme_unmap_data(dev, cmd->iod);
+	nvme_unmap_data(dev, req);
 
 	if (unlikely(req->errors)) {
 		/*
@@ -750,7 +712,7 @@ static void nvme_complete_rq(struct request *req)
 		}
 	}
 
-	if (cmd->aborted) {
+	if (iod->aborted) {
 		dev_warn(dev->dev,
 			"completing aborted command with status:%04x\n",
 			req->errors);
@@ -929,8 +891,8 @@ static int adapter_delete_sq(struct nvme_dev *dev, u16 sqid)
 
 static void abort_endio(struct request *req, int error)
 {
-	struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(req);
-	struct nvme_queue *nvmeq = cmd->nvmeq;
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_queue *nvmeq = iod->nvmeq;
 	u32 result = (u32)(uintptr_t)req->special;
 	u16 status = req->errors;
 
@@ -942,8 +904,8 @@ static void abort_endio(struct request *req, int error)
 
 static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 {
-	struct nvme_cmd_info *cmd_rq = blk_mq_rq_to_pdu(req);
-	struct nvme_queue *nvmeq = cmd_rq->nvmeq;
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_queue *nvmeq = iod->nvmeq;
 	struct nvme_dev *dev = nvmeq->dev;
 	struct request *abort_req;
 	struct nvme_command cmd;
@@ -953,7 +915,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	 * before and still hasn't been returned to the driver, or if this is
 	 * the admin queue.
 	 */
-	if (!nvmeq->qid || cmd_rq->aborted) {
+	if (!nvmeq->qid || iod->aborted) {
 		if (queue_work(nvme_workq, &dev->reset_work)) {
 			dev_warn(dev->dev,
 				 "I/O %d QID %d timeout, reset controller\n",
@@ -981,7 +943,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		return BLK_EH_RESET_TIMER;
 	}
 
-	cmd_rq->aborted = 1;
+	iod->aborted = 1;
 
 	abort_req->timeout = ADMIN_TIMEOUT;
 	abort_req->end_io_data = NULL;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 12/12] block: remove REQ_NO_TIMEOUT flag
  2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2015-11-07  8:45 ` [PATCH 11/12] nvme: merge iod and cmd_info Christoph Hellwig
@ 2015-11-07  8:45 ` Christoph Hellwig
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-07  8:45 UTC (permalink / raw)


This was added for the 'magic' AEN requests in the NVMe driver that never
return.  We now handle them purely inside the driver and don't need this
core hack any more.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 block/blk-mq.c            | 2 --
 block/blk-timeout.c       | 3 ---
 include/linux/blk_types.h | 2 --
 3 files changed, 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3a919ff..09216835 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -606,8 +606,6 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 			blk_mq_complete_request(rq, -EIO);
 		return;
 	}
-	if (rq->cmd_flags & REQ_NO_TIMEOUT)
-		return;
 
 	if (time_after_eq(jiffies, rq->deadline)) {
 		if (!blk_mark_rq_complete(rq))
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index aedd128..f3e3c61 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -193,9 +193,6 @@ void blk_add_timer(struct request *req)
 	struct request_queue *q = req->q;
 	unsigned long expiry;
 
-	if (req->cmd_flags & REQ_NO_TIMEOUT)
-		return;
-
 	/* blk-mq has its own handler, so we don't need ->rq_timed_out_fn */
 	if (!q->mq_ops && !q->rq_timed_out_fn)
 		return;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index e813013..47c9d88 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -188,7 +188,6 @@ enum rq_flag_bits {
 	__REQ_PM,		/* runtime pm request */
 	__REQ_HASHED,		/* on IO scheduler merge hash */
 	__REQ_MQ_INFLIGHT,	/* track inflight for MQ */
-	__REQ_NO_TIMEOUT,	/* requests may never expire */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -242,6 +241,5 @@ enum rq_flag_bits {
 #define REQ_PM			(1ULL << __REQ_PM)
 #define REQ_HASHED		(1ULL << __REQ_HASHED)
 #define REQ_MQ_INFLIGHT		(1ULL << __REQ_MQ_INFLIGHT)
-#define REQ_NO_TIMEOUT		(1ULL << __REQ_NO_TIMEOUT)
 
 #endif /* __LINUX_BLK_TYPES_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 03/12] nvme: factor out a nvme_unmap_data helper
  2015-11-07  8:44 ` [PATCH 03/12] nvme: factor out a nvme_unmap_data helper Christoph Hellwig
@ 2015-11-09 18:47   ` Keith Busch
  2015-11-09 18:56     ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Keith Busch @ 2015-11-09 18:47 UTC (permalink / raw)


On Sat, Nov 07, 2015@09:44:57AM +0100, Christoph Hellwig wrote:
> This is the counter part to nvme_map_data.  The forward declaration will
> go away later in the series.

Generally, the whole series looks great, but we must be missing one or
more patches here. I don't see the "nvme_map_data" part, which I suspect
is why this patch and others fail to apply using linux-block/for-next.
Once I clear up which patches to apply to which tree, I'd like to run
this through a few test systems to make sure there won't be any surprises.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 03/12] nvme: factor out a nvme_unmap_data helper
  2015-11-09 18:47   ` Keith Busch
@ 2015-11-09 18:56     ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-09 18:56 UTC (permalink / raw)


On Mon, Nov 09, 2015@06:47:23PM +0000, Keith Busch wrote:
> On Sat, Nov 07, 2015@09:44:57AM +0100, Christoph Hellwig wrote:
> > This is the counter part to nvme_map_data.  The forward declaration will
> > go away later in the series.
> 
> Generally, the whole series looks great, but we must be missing one or
> more patches here. I don't see the "nvme_map_data" part, which I suspect
> is why this patch and others fail to apply using linux-block/for-next.
> Once I clear up which patches to apply to which tree, I'd like to run
> this through a few test systems to make sure there won't be any surprises.

I used your linux-nvme.git tree as base. I had to revert the Identify
nslist changes for now as they break qemu testing, but that shouldn't have
affected these patches.  The full git tree is here:

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/nvme-req.7

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 09/12] nvme: properly free resources for cancelled command
  2015-11-07  8:45 ` [PATCH 09/12] nvme: properly free resources for cancelled command Christoph Hellwig
@ 2015-11-09 18:57   ` Keith Busch
  2015-11-09 19:25     ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Keith Busch @ 2015-11-09 18:57 UTC (permalink / raw)


On Sat, Nov 07, 2015@09:45:03AM +0100, Christoph Hellwig wrote:
> +	if (unlikely(req->errors)) {
> +		/*
> +		 * Some silly Intel userspace code breaks if it doesn't get a
> +		 * negative errno back for driver returns values.
> +		 */

Whoa now, it's neither Intel nor userspace that needs this. It's to know
if the controller is unresponsive or returned an error. The difference
matters to the driver for initialization.

> +		if (req->errors < 0) {
> +			error = req->errors;
> +		} else {
> +			if (nvme_req_needs_retry(req, req->errors)) {
> +				nvme_requeue_req(req);
> +				return;
> +			}
> +
> +			if (req->cmd_type == REQ_TYPE_DRV_PRIV)
> +				error = req->errors;
> +			else
> +				error = nvme_error_status(req->errors);
> +		}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 09/12] nvme: properly free resources for cancelled command
  2015-11-09 18:57   ` Keith Busch
@ 2015-11-09 19:25     ` Christoph Hellwig
  2015-11-09 20:12       ` Keith Busch
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-09 19:25 UTC (permalink / raw)


On Mon, Nov 09, 2015@06:57:31PM +0000, Keith Busch wrote:
> On Sat, Nov 07, 2015@09:45:03AM +0100, Christoph Hellwig wrote:
> > +	if (unlikely(req->errors)) {
> > +		/*
> > +		 * Some silly Intel userspace code breaks if it doesn't get a
> > +		 * negative errno back for driver returns values.
> > +		 */
> 
> Whoa now, it's neither Intel nor userspace that needs this. It's to know
> if the controller is unresponsive or returned an error. The difference
> matters to the driver for initialization.

Haha, so we at least can root cause this now.  Can you point me
to the caller that cares?  I'd really like to get rid of the special
case of passing a negative errno here, so I'd like to figure out how
we could pass this information on instead.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 09/12] nvme: properly free resources for cancelled command
  2015-11-09 19:25     ` Christoph Hellwig
@ 2015-11-09 20:12       ` Keith Busch
  2015-11-10  8:13         ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Keith Busch @ 2015-11-09 20:12 UTC (permalink / raw)


On Mon, Nov 09, 2015@08:25:19PM +0100, Christoph Hellwig wrote:
> On Mon, Nov 09, 2015@06:57:31PM +0000, Keith Busch wrote:
> > Whoa now, it's neither Intel nor userspace that needs this. It's to know
> > if the controller is unresponsive or returned an error. The difference
> > matters to the driver for initialization.
> 
> Haha, so we at least can root cause this now.  Can you point me
> to the caller that cares?  I'd really like to get rid of the special
> case of passing a negative errno here, so I'd like to figure out how
> we could pass this information on instead.

set_queue_count() was the function that cared, but it looks like
patch 8/12 makes it see a timeout as an aborted status.
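
For reference, that check looks roughly like the below; this is
paraphrased from memory rather than the exact code, so the
nvme_set_features() details may be slightly off.  A negative value
means the command never completed at all, a positive value is a status
the controller actually returned:

static int set_queue_count(struct nvme_dev *dev, int count)
{
	u32 result;
	u32 q_count = (count - 1) | ((count - 1) << 16);
	int status;

	status = nvme_set_features(&dev->ctrl, NVME_FEAT_NUM_QUEUES, q_count,
			0, &result);
	if (status < 0)		/* no answer from the controller */
		return status;
	if (status > 0) {	/* the controller returned an NVMe error */
		dev_err(dev->dev, "Could not set queue count (%d)\n", status);
		return 0;
	}
	return min(result & 0xffff, result >> 16) + 1;
}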

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 06/12] nvme: switch abort to blk_execute_rq_nowait
  2015-11-07  8:45 ` [PATCH 06/12] nvme: switch abort " Christoph Hellwig
@ 2015-11-09 21:33   ` Keith Busch
  2015-11-09 21:46     ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Keith Busch @ 2015-11-09 21:33 UTC (permalink / raw)


On Sat, Nov 07, 2015@09:45:00AM +0100, Christoph Hellwig wrote:
> And remove the new unused nvme_submit_cmd helper.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/core.c |  8 +++---
>  drivers/nvme/host/nvme.h |  2 +-
>  drivers/nvme/host/pci.c  | 66 ++++++++++++++++++++----------------------------
>  3 files changed, 32 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 37f7d69..3600a0c 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -86,12 +86,12 @@ void nvme_requeue_req(struct request *req)
>  }
>  
>  struct request *nvme_alloc_request(struct request_queue *q,
> -		struct nvme_command *cmd)
> +		struct nvme_command *cmd, bool nowait)
>  {
>  	bool write = cmd->common.opcode & 1;
>  	struct request *req;
>  
> -	req = blk_mq_alloc_request(q, write, GFP_KERNEL, false);
> +	req = blk_mq_alloc_request(q, write, GFP_KERNEL, nowait);

The "nowait" flag tells blk_mq_alloc_request to allocate out of the
reserved tags, but we've reserved only one of these for AEN requests.
Your next patch changes the AEN notification a bit, but this will still
create a command id conflict for the controller.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 07/12] nvme: special case AEN requests
  2015-11-07  8:45 ` [PATCH 07/12] nvme: special case AEN requests Christoph Hellwig
@ 2015-11-09 21:43   ` Keith Busch
  2015-11-09 21:48     ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Keith Busch @ 2015-11-09 21:43 UTC (permalink / raw)


On Sat, Nov 07, 2015@09:45:01AM +0100, Christoph Hellwig wrote:
> +		if (unlikely(nvmeq->qid == 0 &&
> +				cqe.command_id < NVME_NR_AEN_COMMANDS)) {
> +			nvme_finish_aen_cmd(nvmeq->dev, &cqe);
> +			continue;

As long as we're guaranteed the "reserved" tags are the lowest possible
tag values, this'll work. But that seems more like coincidence than design.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 06/12] nvme: switch abort to blk_execute_rq_nowait
  2015-11-09 21:33   ` Keith Busch
@ 2015-11-09 21:46     ` Jens Axboe
  2015-11-10  5:56       ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2015-11-09 21:46 UTC (permalink / raw)


On 11/09/2015 02:33 PM, Keith Busch wrote:
> On Sat, Nov 07, 2015@09:45:00AM +0100, Christoph Hellwig wrote:
>> And remove the new unused nvme_submit_cmd helper.
>>
>> Signed-off-by: Christoph Hellwig <hch at lst.de>
>> ---
>>   drivers/nvme/host/core.c |  8 +++---
>>   drivers/nvme/host/nvme.h |  2 +-
>>   drivers/nvme/host/pci.c  | 66 ++++++++++++++++++++----------------------------
>>   3 files changed, 32 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 37f7d69..3600a0c 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -86,12 +86,12 @@ void nvme_requeue_req(struct request *req)
>>   }
>>
>>   struct request *nvme_alloc_request(struct request_queue *q,
>> -		struct nvme_command *cmd)
>> +		struct nvme_command *cmd, bool nowait)
>>   {
>>   	bool write = cmd->common.opcode & 1;
>>   	struct request *req;
>>
>> -	req = blk_mq_alloc_request(q, write, GFP_KERNEL, false);
>> +	req = blk_mq_alloc_request(q, write, GFP_KERNEL, nowait);
>
> The "nowait" flag tells blk_mq_alloc_request to allocate out of the
> reserved tags, but we've reserved only one of these for AEN requests.
> Your next patch changes the AEN notification a bit, but this will still
> create a command id conflict for the controller.

Looks like a mixup of __GFP_WAIT and 'reserved'. We need to pass in
both 'nowait' and 'reserved' to nvme_alloc_request() so it can do the
right thing.

Though both call sites should tolerate sleeping, since we changed the 
timeout handler to be in process context.
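
Something like the below as a completely untested sketch; spelling
'nowait' via the gfp mask is only for illustration, and the rest of the
request setup is elided:

struct request *nvme_alloc_request(struct request_queue *q,
		struct nvme_command *cmd, bool nowait, bool reserved)
{
	bool write = cmd->common.opcode & 1;
	struct request *req;

	/*
	 * 'reserved' only selects the reserved tag pool; whether we may
	 * sleep waiting for a tag is a separate decision instead of
	 * being overloaded onto the same blk_mq_alloc_request() argument.
	 */
	req = blk_mq_alloc_request(q, write,
			nowait ? GFP_ATOMIC : GFP_KERNEL, reserved);
	if (IS_ERR(req))
		return req;

	/* ... rest of the request setup as before ... */
	return req;
}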

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 07/12] nvme: special case AEN requests
  2015-11-09 21:43   ` Keith Busch
@ 2015-11-09 21:48     ` Jens Axboe
  2015-11-10  5:57       ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2015-11-09 21:48 UTC (permalink / raw)


On 11/09/2015 02:43 PM, Keith Busch wrote:
> On Sat, Nov 07, 2015@09:45:01AM +0100, Christoph Hellwig wrote:
>> +		if (unlikely(nvmeq->qid == 0 &&
>> +				cqe.command_id < NVME_NR_AEN_COMMANDS)) {
>> +			nvme_finish_aen_cmd(nvmeq->dev, &cqe);
>> +			continue;
>
> As long as we're guaranteed the "reserved" tags are the lowest possible
> tag values, this'll work. But that seems more like coincidence than design.

That fact is just an implementation detail; I don't think we should
make assumptions about the range of reserved vs normal tags. That's
just asking for trouble if it changes at some point.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 06/12] nvme: switch abort to blk_execute_rq_nowait
  2015-11-09 21:46     ` Jens Axboe
@ 2015-11-10  5:56       ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-10  5:56 UTC (permalink / raw)


On Mon, Nov 09, 2015@02:46:58PM -0700, Jens Axboe wrote:
> Looks like a mixup of __GFP_WAIT and 'reserved'. Looks like we need to pass 
> in both 'nowait' and 'reserved' to nvme_alloc_request(), so it can do the 
> right thing.
>
> Though both call sites should tolerate sleeping, since we changed the 
> timeout handler to be in process context.

My earlier series actually had a patch to pass a flags argument to
blk_mq_alloc_request instead of the gfp_mask and the nowait bool;
I'll resurrect it.
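
Roughly this shape; completely untested, the flag names are just
placeholders and the abort call site is only sketched:

enum {
	BLK_MQ_REQ_NOWAIT	= (1 << 0),	/* don't sleep waiting for a tag */
	BLK_MQ_REQ_RESERVED	= (1 << 1),	/* allocate from the reserved pool */
};

struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
		unsigned int flags);

/* e.g. abort asks for nowait only, not a reserved tag: */
abort_req = nvme_alloc_request(dev->ctrl.admin_q, &cmd, BLK_MQ_REQ_NOWAIT);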

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 07/12] nvme: special case AEN requests
  2015-11-09 21:48     ` Jens Axboe
@ 2015-11-10  5:57       ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-10  5:57 UTC (permalink / raw)


On Mon, Nov 09, 2015@02:48:31PM -0700, Jens Axboe wrote:
> That fact is just an implementation detail; I don't think we should
> make assumptions about the range of reserved vs normal tags. That's
> just asking for trouble if it changes at some point.

I'm pretty sure I did see another driver (hpsa patchset?) do this as well.
I can add an is_reserved_tag helper which still uses that knowledge, but
keeps it inside the block layer.
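
A minimal sketch of what I have in mind; name, signature and placement
are not final:

/*
 * blk-mq internal detail: reserved tags occupy the lowest tag values.
 * Keeping the comparison here means drivers never hardcode that.
 */
static inline bool blk_mq_is_reserved_tag(struct blk_mq_tags *tags,
		unsigned int tag)
{
	return tag < tags->nr_reserved_tags;
}

The qid == 0 && command_id < NVME_NR_AEN_COMMANDS check in the
completion path would then go through the helper instead of assuming
the tag layout.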

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 09/12] nvme: properly free resources for cancelled command
  2015-11-09 20:12       ` Keith Busch
@ 2015-11-10  8:13         ` Christoph Hellwig
  2015-11-10 16:03           ` Keith Busch
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-10  8:13 UTC (permalink / raw)


On Mon, Nov 09, 2015@08:12:33PM +0000, Keith Busch wrote:
> > Haha, so we at least can root cause this now.  Can you point me
> > to the caller that cares?  I'd really like to get rid of the special
> > case of passing a negative errno here, so I'd like to figure out how
> > we could pass this information on instead.
> 
> set_queue_count() was the function that cared, but it looks like
> patch 8/12 makes it see a timeout as an aborted status.

I'm still a bit confused about the semantics you want/need.  If the
Set Features for set_queue_count times out we'll call the reset handler,
which because we are inside the probe handler will remove the device.
How do we care about the return value in that case?

Can you write down a few sentences on why/how we care?  I'll volunteer
to put them into the driver in comment form once we have all this sorted
out so that anyone touching the driver in the future won't be as confused.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 09/12] nvme: properly free resources for cancelled command
  2015-11-10  8:13         ` Christoph Hellwig
@ 2015-11-10 16:03           ` Keith Busch
  2015-11-10 20:28             ` Keith Busch
  0 siblings, 1 reply; 28+ messages in thread
From: Keith Busch @ 2015-11-10 16:03 UTC (permalink / raw)


On Tue, Nov 10, 2015@09:13:57AM +0100, Christoph Hellwig wrote:
> Set Features for set_queue_count times out we'll call the reset handler,
> which because we are inside the probe handler will remove the device.
> How do we care about the return value in that case?
> 
> Can you write down a few sentences on why/how we care?  I'll volunteer
> to put them into the driver in comment form once we have all this sorted
> out so that anyone touching the driver in the future won't be as confused.

Perhaps I am thinking of how probing serially worked before, and don't
understand how this works anymore. :)

You're right, we don't really care anymore if the reset handler unwinds
it. This path is then safe to see a fake error code.

But the reset handler is the same "work" as probe now, so it won't get
scheduled. Now I completely understand why we changed nvme_timeout()
to end the request with -EIO instead of waiting for the reset work to
cancel it. That's still unsafe since it frees the command for reuse
while the ID is still technically owned by the controller.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 09/12] nvme: properly free resources for cancelled command
  2015-11-10 16:03           ` Keith Busch
@ 2015-11-10 20:28             ` Keith Busch
  2015-11-16 10:05               ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Keith Busch @ 2015-11-10 20:28 UTC (permalink / raw)


On Tue, Nov 10, 2015@04:03:45PM +0000, Keith Busch wrote:
> On Tue, Nov 10, 2015@09:13:57AM +0100, Christoph Hellwig wrote:
> > Set Features for set_queue_count times out we'll call the reset handler,
> > which because we are inside the probe handler will remove the device.
> > How do we care about the return value in that case?
> > 
> > Can you write down a few sentences on why/how we care?  I'll volunteer
> > to put them into the driver in comment form once we have all this sorted
> > out so that anyone touching the driver in the future won't be as confused.
> 
> Perhaps I am thinking of how probing serially worked before, and don't
> understand how this works anymore. :)
> 
> You're right, we don't really care anymore if the reset handler unwinds
> it. This path is then safe to see a fake error code.

Actually this still needs to be a negative error so nvme_reset_work
doesn't clear the "NVME_CTRL_RESETTING" bit. Without that, the driver
gets into an infinite reset loop.

 
> But the reset handler is the same "work" as probe now, so it won't get
> scheduled. Now I completely understand why we changed nvme_timeout()
> to end the request with -EIO instead of waiting for the reset work to
> cancel it. That's still unsafe since it frees the command for reuse
> while the ID is still technically owned by the controller.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 09/12] nvme: properly free resources for cancelled command
  2015-11-10 20:28             ` Keith Busch
@ 2015-11-16 10:05               ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2015-11-16 10:05 UTC (permalink / raw)


On Tue, Nov 10, 2015@08:28:11PM +0000, Keith Busch wrote:
> > Perhaps I am thinking of how probing serially worked before, and don't
> > understand how this works anymore. :)
> > 
> > You're right, we don't really care anymore if the reset handler unwinds
> > it. This path is then safe to see a fake error code.
> 
> Actually this still needs to be a negative error so nvme_reset_work
> doesn't clear the "NVME_CTRL_RESETTING" bit. Without that, the driver
> gets into an infinite reset loop.

I don't think this is going to help as we'll never enter the reset loop
now that we have a single reset_work item, oops.

I guess we'll just need to set the aborted status from nvme_timeout
if we are in the reset handler already so that it can handle the
errors directly instead of waiting for another reset_work to be queued
up. I'll try to come up with a version of that.
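
Something along these lines in nvme_timeout(), as a very rough and
untested sketch; it assumes the NVME_CTRL_RESETTING bit you mention can
be tested from there, and uses NVME_SC_ABORT_REQ as the fake status:

	/*
	 * A reset is already in progress: don't wait for another
	 * reset_work, just fail the request with an aborted status so
	 * the normal completion path cleans it up.
	 */
	if (test_bit(NVME_CTRL_RESETTING, &dev->flags)) {
		dev_warn(dev->dev,
			 "I/O %d QID %d timeout, aborting during reset\n",
			 req->tag, nvmeq->qid);
		req->errors = NVME_SC_ABORT_REQ;
		return BLK_EH_HANDLED;
	}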

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-11-16 10:05 UTC | newest]

Thread overview: 28+ messages
2015-11-07  8:44 nvme completion path optimizations and fixes V3 Christoph Hellwig
2015-11-07  8:44 ` [PATCH 01/12] block: fix blk_abort_request for blk-mq drivers Christoph Hellwig
2015-11-07  8:44 ` [PATCH 02/12] block: defer timeouts to a workqueue Christoph Hellwig
2015-11-07  8:44 ` [PATCH 03/12] nvme: factor out a nvme_unmap_data helper Christoph Hellwig
2015-11-09 18:47   ` Keith Busch
2015-11-09 18:56     ` Christoph Hellwig
2015-11-07  8:44 ` [PATCH 04/12] nvme: factor out a few helpers from req_completion Christoph Hellwig
2015-11-07  8:44 ` [PATCH 05/12] nvme: switch delete SQ/CQ to blk_execute_rq_nowait Christoph Hellwig
2015-11-07  8:45 ` [PATCH 06/12] nvme: switch abort " Christoph Hellwig
2015-11-09 21:33   ` Keith Busch
2015-11-09 21:46     ` Jens Axboe
2015-11-10  5:56       ` Christoph Hellwig
2015-11-07  8:45 ` [PATCH 07/12] nvme: special case AEN requests Christoph Hellwig
2015-11-09 21:43   ` Keith Busch
2015-11-09 21:48     ` Jens Axboe
2015-11-10  5:57       ` Christoph Hellwig
2015-11-07  8:45 ` [PATCH 08/12] nvme: simplify completion handling Christoph Hellwig
2015-11-07  8:45 ` [PATCH 09/12] nvme: properly free resources for cancelled command Christoph Hellwig
2015-11-09 18:57   ` Keith Busch
2015-11-09 19:25     ` Christoph Hellwig
2015-11-09 20:12       ` Keith Busch
2015-11-10  8:13         ` Christoph Hellwig
2015-11-10 16:03           ` Keith Busch
2015-11-10 20:28             ` Keith Busch
2015-11-16 10:05               ` Christoph Hellwig
2015-11-07  8:45 ` [PATCH 10/12] nvme: meta_sg doesn't have to be an array Christoph Hellwig
2015-11-07  8:45 ` [PATCH 11/12] nvme: merge iod and cmd_info Christoph Hellwig
2015-11-07  8:45 ` [PATCH 12/12] block: remove REQ_NO_TIMEOUT flag Christoph Hellwig
