* [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
@ 2018-05-05 13:58 ` Ming Lei
  0 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-05 13:58 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-block, Ming Lei, Jianchao Wang,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, Laurence Oberman

Hi,

The 1st patch introduces blk_quiesce_timeout() and blk_unquiesce_timeout()
for NVMe, and meanwhile fixes blk_sync_queue().

The 2nd patch covers the timeout of admin commands used for recovering the
controller, so that a possible deadlock can be avoided.

The 3rd and 4th patches avoid waiting for freeze on queues which aren't frozen.

The last 4 patches fix several races wrt. the NVMe timeout handler, and
finally make blktests block/011 pass. Meanwhile the NVMe PCI timeout
mechanism becomes much more robust than before.

gitweb:
	https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4

V4:
	- fix nvme_init_set_host_mem_cmd()
	- use nested EH model, and run both nvme_dev_disable() and
	resetting in the same context

V3:
	- fix one new race related to freezing in patch 4; nvme_reset_work()
	may hang forever without this patch
	- rewrite the last 3 patches, and avoid breaking nvme_reset_ctrl*()

V2:
	- fix draining of the timeout work, so there is no need to change the
	return value from .timeout()
	- fix race between nvme_start_freeze() and nvme_unfreeze()
	- cover timeout for admin commands running in EH

Ming Lei (7):
  block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
  nvme: pci: cover timeout for admin commands running in EH
  nvme: pci: only wait freezing if queue is frozen
  nvme: pci: freeze queue in nvme_dev_disable() in case of error
    recovery
  nvme: core: introduce 'reset_lock' for sync reset state and reset
    activities
  nvme: pci: prepare for supporting error recovery from resetting
    context
  nvme: pci: support nested EH

 block/blk-core.c         |  21 +++-
 block/blk-mq.c           |   9 ++
 block/blk-timeout.c      |   5 +-
 drivers/nvme/host/core.c |  46 ++++++-
 drivers/nvme/host/nvme.h |   5 +
 drivers/nvme/host/pci.c  | 304 ++++++++++++++++++++++++++++++++++++++++-------
 include/linux/blkdev.h   |  13 ++
 7 files changed, 356 insertions(+), 47 deletions(-)

Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: Laurence Oberman <loberman@redhat.com>
-- 
2.9.5

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH V4 1/7] block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
  2018-05-05 13:58 ` Ming Lei
@ 2018-05-05 13:58   ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-05 13:58 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-block, Ming Lei, Bart Van Assche,
	Jianchao Wang, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Laurence Oberman

It turns out that the current way can't drain timeout handling completely,
because mod_timer() can be triggered from the timeout work func, which may be
running right inside the synced timeout work:

        del_timer_sync(&q->timeout);
        cancel_work_sync(&q->timeout_work);

This patch introduces a 'timeout_off' flag for fixing this issue; it turns
out this simple way does work.

Also blk_quiesce_timeout() and blk_unquiesce_timeout() are introduced for
draining timeout handling, which is needed by NVMe.
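
For illustration, here is a minimal sketch of how a driver might pair the two
new helpers around an error-recovery shutdown; the my_driver_*() names are
hypothetical, only blk_quiesce_timeout()/blk_unquiesce_timeout() come from
this patch:

        static void my_driver_recover(struct request_queue *q)
        {
                /* stop new timeouts and drain any running timeout work */
                blk_quiesce_timeout(q);

                my_driver_shutdown_hw();        /* hypothetical HW shutdown */

                /* re-enable timeout handling and re-arm the timer */
                blk_unquiesce_timeout(q);
        }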

Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       | 21 +++++++++++++++++++--
 block/blk-mq.c         |  9 +++++++++
 block/blk-timeout.c    |  5 ++++-
 include/linux/blkdev.h | 13 +++++++++++++
 4 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 85909b431eb0..c277f1023703 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -392,6 +392,22 @@ void blk_stop_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_stop_queue);
 
+void blk_unquiesce_timeout(struct request_queue *q)
+{
+	blk_mark_timeout_quiesce(q, false);
+	mod_timer(&q->timeout, jiffies + q->rq_timeout);
+}
+EXPORT_SYMBOL(blk_unquiesce_timeout);
+
+void blk_quiesce_timeout(struct request_queue *q)
+{
+	blk_mark_timeout_quiesce(q, true);
+
+	del_timer_sync(&q->timeout);
+	cancel_work_sync(&q->timeout_work);
+}
+EXPORT_SYMBOL(blk_quiesce_timeout);
+
 /**
  * blk_sync_queue - cancel any pending callbacks on a queue
  * @q: the queue
@@ -412,8 +428,7 @@ EXPORT_SYMBOL(blk_stop_queue);
  */
 void blk_sync_queue(struct request_queue *q)
 {
-	del_timer_sync(&q->timeout);
-	cancel_work_sync(&q->timeout_work);
+	blk_quiesce_timeout(q);
 
 	if (q->mq_ops) {
 		struct blk_mq_hw_ctx *hctx;
@@ -425,6 +440,8 @@ void blk_sync_queue(struct request_queue *q)
 	} else {
 		cancel_delayed_work_sync(&q->delay_work);
 	}
+
+	blk_mark_timeout_quiesce(q, false);
 }
 EXPORT_SYMBOL(blk_sync_queue);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c3621453ad87..d0a5dc29c8ef 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -901,6 +901,15 @@ static void blk_mq_timeout_work(struct work_struct *work)
 	};
 	struct blk_mq_hw_ctx *hctx;
 	int i;
+	bool timeout_off;
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	timeout_off = q->timeout_off;
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	if (timeout_off)
+		return;
 
 	/* A deadlock might occur if a request is stuck requiring a
 	 * timeout at the same time a queue freeze is waiting
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 652d4d4d3e97..ffd0b609091e 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -136,12 +136,15 @@ void blk_timeout_work(struct work_struct *work)
 
 	spin_lock_irqsave(q->queue_lock, flags);
 
+	if (q->timeout_off)
+		goto exit;
+
 	list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
 		blk_rq_check_expired(rq, &next, &next_set);
 
 	if (next_set)
 		mod_timer(&q->timeout, round_jiffies_up(next));
-
+exit:
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5c4eee043191..a2cc4aaecf50 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -584,6 +584,7 @@ struct request_queue {
 	struct timer_list	timeout;
 	struct work_struct	timeout_work;
 	struct list_head	timeout_list;
+	bool			timeout_off;
 
 	struct list_head	icq_list;
 #ifdef CONFIG_BLK_CGROUP
@@ -1017,6 +1018,18 @@ extern void blk_execute_rq(struct request_queue *, struct gendisk *,
 extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
 				  struct request *, int, rq_end_io_fn *);
 
+static inline void blk_mark_timeout_quiesce(struct request_queue *q, bool quiesce)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	q->timeout_off = quiesce;
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
+extern void blk_quiesce_timeout(struct request_queue *q);
+extern void blk_unquiesce_timeout(struct request_queue *q);
+
 int blk_status_to_errno(blk_status_t status);
 blk_status_t errno_to_blk_status(int errno);
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V4 2/7] nvme: pci: cover timeout for admin commands running in EH
  2018-05-05 13:58 ` Ming Lei
@ 2018-05-05 13:59   ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-05 13:59 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-block, Ming Lei, Jianchao Wang,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, Laurence Oberman

When admin commands are used in EH for recovering the controller, we have to
cover their timeout ourselves and can't depend on the block layer's timeout
handling, since a deadlock may be caused when these commands are timed out by
the block layer again.
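
A condensed sketch of the pattern used below (error handling trimmed; see
nvme_set_host_mem_timeout() in the diff for the full version): issue the
command asynchronously and bound the wait in the driver itself instead of
relying on the block layer's timeout handling.

        DECLARE_COMPLETION_ONSTACK(wait);

        req->timeout = ADMIN_TIMEOUT;
        req->end_io_data = &wait;
        blk_execute_rq_nowait(q, NULL, req, false, nvme_set_host_mem_end_io);

        /* cover the timeout here; on timeout the request is completed
         * later, after the controller has been shut down */
        if (wait_for_completion_io_timeout(&wait, ADMIN_TIMEOUT) <= 0)
                ret = -EINTR;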

Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c | 81 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 70 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fbc71fac6f1e..ff09b1c760ea 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1733,21 +1733,28 @@ static inline void nvme_release_cmb(struct nvme_dev *dev)
 	}
 }
 
-static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
+static void nvme_init_set_host_mem_cmd(struct nvme_dev *dev,
+		struct nvme_command *c, u32 bits)
 {
 	u64 dma_addr = dev->host_mem_descs_dma;
+
+	memset(c, 0, sizeof(*c));
+	c->features.opcode	= nvme_admin_set_features;
+	c->features.fid		= cpu_to_le32(NVME_FEAT_HOST_MEM_BUF);
+	c->features.dword11	= cpu_to_le32(bits);
+	c->features.dword12	= cpu_to_le32(dev->host_mem_size >>
+					      ilog2(dev->ctrl.page_size));
+	c->features.dword13	= cpu_to_le32(lower_32_bits(dma_addr));
+	c->features.dword14	= cpu_to_le32(upper_32_bits(dma_addr));
+	c->features.dword15	= cpu_to_le32(dev->nr_host_mem_descs);
+}
+
+static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
+{
 	struct nvme_command c;
 	int ret;
 
-	memset(&c, 0, sizeof(c));
-	c.features.opcode	= nvme_admin_set_features;
-	c.features.fid		= cpu_to_le32(NVME_FEAT_HOST_MEM_BUF);
-	c.features.dword11	= cpu_to_le32(bits);
-	c.features.dword12	= cpu_to_le32(dev->host_mem_size >>
-					      ilog2(dev->ctrl.page_size));
-	c.features.dword13	= cpu_to_le32(lower_32_bits(dma_addr));
-	c.features.dword14	= cpu_to_le32(upper_32_bits(dma_addr));
-	c.features.dword15	= cpu_to_le32(dev->nr_host_mem_descs);
+	nvme_init_set_host_mem_cmd(dev, &c, bits);
 
 	ret = nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
 	if (ret) {
@@ -1758,6 +1765,58 @@ static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
 	return ret;
 }
 
+static void nvme_set_host_mem_end_io(struct request *rq, blk_status_t sts)
+{
+	struct completion *waiting = rq->end_io_data;
+
+	rq->end_io_data = NULL;
+
+	/*
+	 * complete last, if this is a stack request the process (and thus
+	 * the rq pointer) could be invalid right after this complete()
+	 */
+	complete(waiting);
+}
+
+/*
+ * This function can only be used inside nvme_dev_disable() when timeout
+ * may not work, then this function has to cover the timeout by itself.
+ *
+ * When wait_for_completion_io_timeout() returns 0 and timeout happens,
+ * this request will be completed after controller is shutdown.
+ */
+static int nvme_set_host_mem_timeout(struct nvme_dev *dev, u32 bits)
+{
+	DECLARE_COMPLETION_ONSTACK(wait);
+	struct nvme_command c;
+	struct request_queue *q = dev->ctrl.admin_q;
+	struct request *req;
+	int ret;
+
+	nvme_init_set_host_mem_cmd(dev, &c, bits);
+
+	req = nvme_alloc_request(q, &c, 0, NVME_QID_ANY);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	req->timeout = ADMIN_TIMEOUT;
+	req->end_io_data = &wait;
+
+	blk_execute_rq_nowait(q, NULL, req, false,
+			nvme_set_host_mem_end_io);
+	ret = wait_for_completion_io_timeout(&wait, ADMIN_TIMEOUT);
+	if (ret > 0) {
+		if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
+			ret = -EINTR;
+		else
+			ret = nvme_req(req)->status;
+		blk_mq_free_request(req);
+	} else
+		ret = -EINTR;
+
+	return ret;
+}
+
 static void nvme_free_host_mem(struct nvme_dev *dev)
 {
 	int i;
@@ -2216,7 +2275,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 		 * but I'd rather be safe than sorry..
 		 */
 		if (dev->host_mem_descs)
-			nvme_set_host_mem(dev, 0);
+			nvme_set_host_mem_timeout(dev, 0);
 		nvme_disable_io_queues(dev);
 		nvme_disable_admin_queue(dev, shutdown);
 	}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V4 3/7] nvme: pci: only wait freezing if queue is frozen
  2018-05-05 13:58 ` Ming Lei
@ 2018-05-05 13:59   ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-05 13:59 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-block, Ming Lei, Jianchao Wang,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, Laurence Oberman

In nvme_dev_disable() called during controller shutdown,
nvme_wait_freeze_timeout() may be run on a controller which isn't frozen
yet, so add a check to avoid that case.

Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ff09b1c760ea..57bd7bebd1e5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2244,14 +2244,17 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	int i;
 	bool dead = true;
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
+	bool frozen = false;
 
 	mutex_lock(&dev->shutdown_lock);
 	if (pci_is_enabled(pdev)) {
 		u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
 		if (dev->ctrl.state == NVME_CTRL_LIVE ||
-		    dev->ctrl.state == NVME_CTRL_RESETTING)
+		    dev->ctrl.state == NVME_CTRL_RESETTING) {
 			nvme_start_freeze(&dev->ctrl);
+			frozen = true;
+		}
 		dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
 			pdev->error_state  != pci_channel_io_normal);
 	}
@@ -2261,7 +2264,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	 * doing a safe shutdown.
 	 */
 	if (!dead) {
-		if (shutdown)
+		if (shutdown && frozen)
 			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
 	}
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V4 4/7] nvme: pci: freeze queue in nvme_dev_disable() in case of error recovery
  2018-05-05 13:58 ` Ming Lei
@ 2018-05-05 13:59   ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-05 13:59 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-block, Ming Lei, Jianchao Wang,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, Laurence Oberman

When nvme_dev_disable() is used for error recovery, we should always
freeze queues before shutting down the controller:

- the reset handler supposes queues are frozen, and will wait_freeze &
unfreeze them explicitly; if queues aren't frozen during nvme_dev_disable(),
the reset handler may wait forever even though there aren't any requests
allocated

- this way may avoid cancelling lots of requests during error recovery

This patch introduces a 'freeze_queue' parameter for fixing this issue.
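
For reference, a rough sketch of the freeze life-cycle the reset handler
expects (simplified; not a literal call sequence from one function):

        /* error recovery: nvme_dev_disable(dev, false, true) */
        nvme_start_freeze(&dev->ctrl);          /* queues enter freeze */

        /* later, in the reset handler */
        nvme_start_queues(&dev->ctrl);
        nvme_wait_freeze(&dev->ctrl);           /* waits forever if no freeze was started */
        nvme_unfreeze(&dev->ctrl);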

Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c | 47 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 57bd7bebd1e5..1fafe5d01355 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -69,7 +69,8 @@ struct nvme_dev;
 struct nvme_queue;
 
 static void nvme_process_cq(struct nvme_queue *nvmeq);
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown, bool
+		freeze_queue);
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -1197,7 +1198,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	 */
 	if (nvme_should_reset(dev, csts)) {
 		nvme_warn_reset(dev, csts);
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, true);
 		nvme_reset_ctrl(&dev->ctrl);
 		return BLK_EH_HANDLED;
 	}
@@ -1224,7 +1225,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, false);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_HANDLED;
 	default:
@@ -1240,7 +1241,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, true);
 		nvme_reset_ctrl(&dev->ctrl);
 
 		/*
@@ -2239,19 +2240,35 @@ static void nvme_pci_disable(struct nvme_dev *dev)
 	}
 }
 
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
+/*
+ * Resetting often follows nvme_dev_disable(), so queues need to be frozen
+ * before resetting.
+ */
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown, bool
+		freeze_queue)
 {
 	int i;
 	bool dead = true;
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	bool frozen = false;
 
+	/*
+	 * 'freeze_queue' is only valid for non-shutdown, and we do
+	 * inline freeze & wait_freeze_timeout for shutdown just for
+	 * completing as many as possible requests before shutdown
+	 */
+	if (shutdown)
+		freeze_queue = false;
+
+	if (freeze_queue)
+		nvme_start_freeze(&dev->ctrl);
+
 	mutex_lock(&dev->shutdown_lock);
 	if (pci_is_enabled(pdev)) {
 		u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
-		if (dev->ctrl.state == NVME_CTRL_LIVE ||
-		    dev->ctrl.state == NVME_CTRL_RESETTING) {
+		if (shutdown && (dev->ctrl.state == NVME_CTRL_LIVE ||
+		    dev->ctrl.state == NVME_CTRL_RESETTING)) {
 			nvme_start_freeze(&dev->ctrl);
 			frozen = true;
 		}
@@ -2343,7 +2360,7 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
 	dev_warn(dev->ctrl.device, "Removing after probe failure status: %d\n", status);
 
 	nvme_get_ctrl(&dev->ctrl);
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, false, false);
 	if (!queue_work(nvme_wq, &dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
 }
@@ -2364,7 +2381,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * moving on.
 	 */
 	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, false);
 
 	/*
 	 * Introduce CONNECTING state from nvme-fc/rdma transports to mark the
@@ -2613,7 +2630,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static void nvme_reset_prepare(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, false, true);
 }
 
 static void nvme_reset_done(struct pci_dev *pdev)
@@ -2625,7 +2642,7 @@ static void nvme_reset_done(struct pci_dev *pdev)
 static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_disable(dev, true);
+	nvme_dev_disable(dev, true, false);
 }
 
 /*
@@ -2644,13 +2661,13 @@ static void nvme_remove(struct pci_dev *pdev)
 
 	if (!pci_device_is_present(pdev)) {
 		nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, false);
 	}
 
 	flush_work(&dev->ctrl.reset_work);
 	nvme_stop_ctrl(&dev->ctrl);
 	nvme_remove_namespaces(&dev->ctrl);
-	nvme_dev_disable(dev, true);
+	nvme_dev_disable(dev, true, false);
 	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
 	nvme_free_queues(dev, 0);
@@ -2684,7 +2701,7 @@ static int nvme_suspend(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
-	nvme_dev_disable(ndev, true);
+	nvme_dev_disable(ndev, true, false);
 	return 0;
 }
 
@@ -2716,7 +2733,7 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 	case pci_channel_io_frozen:
 		dev_warn(dev->ctrl.device,
 			"frozen state error detected, reset controller\n");
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, false, true);
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
 		dev_warn(dev->ctrl.device,
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V4 5/7] nvme: core: introduce 'reset_lock' for sync reset state and reset activities
  2018-05-05 13:58 ` Ming Lei
@ 2018-05-05 13:59   ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-05 13:59 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-block, Ming Lei, Jianchao Wang,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, Laurence Oberman

NVMe PCI may start a new reset context to run a nested reset for recovering
from the current reset context, and we may not change the rule of the state
machine until other kinds of NVMe controllers support that, so use the
'reset_lock' to sync the state change here.

Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 20 +++++++++++++++++---
 drivers/nvme/host/nvme.h |  3 +++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9df4f71e58ca..3aaee4dbf58e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -100,13 +100,25 @@ static struct class *nvme_subsys_class;
 static void nvme_ns_remove(struct nvme_ns *ns);
 static int nvme_revalidate_disk(struct gendisk *disk);
 
+/*
+ * NVMe PCI may support nested reset for recovering from reset context,
+ * and we may not change the rule of state machine until other kinds of
+ * NVMe controller support that, so use the 'reset_lock' to sync the
+ * state change here.
+ */
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
 {
+	int ret = -EBUSY;
+
+	mutex_lock(&ctrl->reset_lock);
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
-		return -EBUSY;
+		goto fail;
 	if (!queue_work(nvme_reset_wq, &ctrl->reset_work))
-		return -EBUSY;
-	return 0;
+		goto fail;
+	ret = 0;
+ fail:
+	mutex_unlock(&ctrl->reset_lock);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(nvme_reset_ctrl);
 
@@ -3447,6 +3459,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	INIT_WORK(&ctrl->fw_act_work, nvme_fw_act_work);
 	INIT_WORK(&ctrl->delete_work, nvme_delete_ctrl_work);
 
+	mutex_init(&ctrl->reset_lock);
+
 	ret = ida_simple_get(&nvme_instance_ida, 0, 0, GFP_KERNEL);
 	if (ret < 0)
 		goto out;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 061fecfd44f5..99f55c6f69f8 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -146,6 +146,9 @@ struct nvme_ctrl {
 	struct device ctrl_device;
 	struct device *device;	/* char device */
 	struct cdev cdev;
+
+	/* sync reset state update and related reset activities */
+	struct mutex reset_lock;
 	struct work_struct reset_work;
 	struct work_struct delete_work;
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V4 6/7] nvme: pci: prepare for supporting error recovery from resetting context
  2018-05-05 13:58 ` Ming Lei
@ 2018-05-05 13:59   ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-05 13:59 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-block, Ming Lei, Jianchao Wang,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, Laurence Oberman

Either admin or normal IO in the reset context may be timed out because a
controller error happens. When this timeout happens, we may have to
start controller recovery again.

This patch holds the introduced reset lock when running reset, so that
we may support nested reset in the following patches.

Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1fafe5d01355..2fbe24274ad0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2365,14 +2365,14 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
 		nvme_put_ctrl(&dev->ctrl);
 }
 
-static void nvme_reset_work(struct work_struct *work)
+static void nvme_reset_dev(struct nvme_dev *dev)
 {
-	struct nvme_dev *dev =
-		container_of(work, struct nvme_dev, ctrl.reset_work);
 	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
 	int result = -ENODEV;
 	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
 
+	mutex_lock(&dev->ctrl.reset_lock);
+
 	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
 		goto out;
 
@@ -2448,7 +2448,11 @@ static void nvme_reset_work(struct work_struct *work)
 		new_state = NVME_CTRL_ADMIN_ONLY;
 	} else {
 		nvme_start_queues(&dev->ctrl);
+		mutex_unlock(&dev->ctrl.reset_lock);
+
 		nvme_wait_freeze(&dev->ctrl);
+
+		mutex_lock(&dev->ctrl.reset_lock);
 		/* hit this only when allocate tagset fails */
 		if (nvme_dev_add(dev))
 			new_state = NVME_CTRL_ADMIN_ONLY;
@@ -2466,10 +2470,20 @@ static void nvme_reset_work(struct work_struct *work)
 	}
 
 	nvme_start_ctrl(&dev->ctrl);
+	mutex_unlock(&dev->ctrl.reset_lock);
 	return;
 
  out:
 	nvme_remove_dead_ctrl(dev, result);
+	mutex_unlock(&dev->ctrl.reset_lock);
+}
+
+static void nvme_reset_work(struct work_struct *work)
+{
+	struct nvme_dev *dev =
+		container_of(work, struct nvme_dev, ctrl.reset_work);
+
+	nvme_reset_dev(dev);
 }
 
 static void nvme_remove_dead_ctrl_work(struct work_struct *work)
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V4 7/7] nvme: pci: support nested EH
  2018-05-05 13:58 ` Ming Lei
@ 2018-05-05 13:59   ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-05 13:59 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-block, Ming Lei, Jianchao Wang,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, Laurence Oberman

When one request is timed out, nvme_timeout() currently handles it in the
following way:

	nvme_dev_disable(dev, false);
	nvme_reset_ctrl(&dev->ctrl);
	return BLK_EH_HANDLED.

There are several issues with the above approach:

1) IO may fail during resetting

Admin IO timeout may be triggered in nvme_reset_dev() when an error happens.
Normal IO timeout may be triggered too during nvme_wait_freeze() in the
reset path. When these two kinds of timeout happen, the current reset
mechanism can't work any more.

2) race between nvme_start_freeze() and nvme_wait_freeze() & nvme_unfreeze()

nvme_dev_disable() and resetting the controller are both required for
recovering the controller, but the two are run from different contexts.
nvme_start_freeze() is called from nvme_dev_disable(), which is run from the
timeout work context, and nvme_unfreeze() is run from the reset work context.
Unfortunately a timeout may be triggered while resetting the controller, so
nvme_start_freeze() may be run several times. Also two reset works may run
one after another, which may cause nvme_wait_freeze() to hang forever.

3) every namespace's EH requires shutting down & resetting the controller

The block layer's timeout handler is per request queue, which means each
namespace's error handling may shut down & reset the whole controller; then
the shutdown from one namespace may quiesce queues while resetting from
another namespace is in progress.

This patch fixes the above issues by using nested EH:

1) run controller shutdown (nvme_dev_disable()) and resetting (nvme_reset_dev())
from one and the same EH context

2) always start a new context for handling EH, and cancel all in-flight
requests (including the timed-out ones) in nvme_dev_disable() by quiescing
the timeout event before shutting down the controller

3) limit the max number of nested EH; when the limit is reached, fail the
controller by marking its state as DELETING and fail all in-flight
requests. This approach for failing the controller is from Keith's previous
patch.

With this approach, blktests block/011 can be passed.
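
A condensed sketch of the resulting flow (simplified from nvme_timeout(),
nvme_eh_work() and nvme_eh_schedule() in the diff below):

        /* nvme_timeout(): defer real recovery to a fresh EH context */
        nvme_eh_schedule(dev);
        return BLK_EH_RESET_TIMER;

        /* nvme_eh_work(): shutdown and reset in one and the same context */
        nvme_dev_disable(dev, false, true);     /* cancels in-flight requests */
        nvme_reset_dev(dev, true);

        /* nvme_eh_schedule(): once NVME_MAX_NESTED_EH is reached, fail the
         * controller instead (mark it DELETING, fail in-flight requests) */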

Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c |  26 ++++++++
 drivers/nvme/host/nvme.h |   2 +
 drivers/nvme/host/pci.c  | 161 ++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 173 insertions(+), 16 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3aaee4dbf58e..d9a62e2cc33e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -254,6 +254,8 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
 
 void nvme_cancel_request(struct request *req, void *data, bool reserved)
 {
+	struct nvme_ctrl *ctrl = data;
+
 	if (!blk_mq_request_started(req))
 		return;
 
@@ -261,6 +263,8 @@ void nvme_cancel_request(struct request *req, void *data, bool reserved)
 				"Cancelling I/O %d", req->tag);
 
 	nvme_req(req)->status = NVME_SC_ABORT_REQ;
+	if (ctrl->state == NVME_CTRL_DELETING)
+		nvme_req(req)->status |= NVME_SC_DNR;
 	blk_mq_complete_request(req);
 
 }
@@ -3583,6 +3587,28 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_start_freeze);
 
+void nvme_unquiesce_timeout(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	down_read(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_unquiesce_timeout(ns->queue);
+	up_read(&ctrl->namespaces_rwsem);
+}
+EXPORT_SYMBOL_GPL(nvme_unquiesce_timeout);
+
+void nvme_quiesce_timeout(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	down_read(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_quiesce_timeout(ns->queue);
+	up_read(&ctrl->namespaces_rwsem);
+}
+EXPORT_SYMBOL_GPL(nvme_quiesce_timeout);
+
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 99f55c6f69f8..32f76cc8bb65 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -405,6 +405,8 @@ int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
 void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 		union nvme_result *res);
 
+void nvme_unquiesce_timeout(struct nvme_ctrl *ctrl);
+void nvme_quiesce_timeout(struct nvme_ctrl *ctrl);
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2fbe24274ad0..105d02fcac2d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -71,6 +71,7 @@ struct nvme_queue;
 static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown, bool
 		freeze_queue);
+static void nvme_reset_dev(struct nvme_dev *dev, bool update_state);
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -113,6 +114,20 @@ struct nvme_dev {
 	dma_addr_t host_mem_descs_dma;
 	struct nvme_host_mem_buf_desc *host_mem_descs;
 	void **host_mem_desc_bufs;
+
+	/* EH handler */
+	spinlock_t	eh_lock;
+	bool		ctrl_shutdown_started;
+	bool		ctrl_failed;
+	unsigned int	nested_eh;
+	struct work_struct fail_ctrl_work;
+};
+
+#define  NVME_MAX_NESTED_EH	32
+struct nvme_eh_work {
+	struct work_struct	work;
+	struct nvme_dev		*dev;
+	int			seq;
 };
 
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
@@ -1177,6 +1192,93 @@ static void nvme_warn_reset(struct nvme_dev *dev, u32 csts)
 			 csts, result);
 }
 
+static void nvme_eh_fail_ctrl_work(struct work_struct *work)
+{
+	struct nvme_dev *dev =
+		container_of(work, struct nvme_dev, fail_ctrl_work);
+
+	dev_info(dev->ctrl.device, "EH: fail controller\n");
+	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
+	nvme_dev_disable(dev, false, true);
+}
+
+static void nvme_eh_mark_ctrl_shutdown(struct nvme_dev *dev)
+{
+	spin_lock(&dev->eh_lock);
+	dev->ctrl_shutdown_started = false;
+	spin_unlock(&dev->eh_lock);
+}
+
+static void nvme_eh_done(struct nvme_dev *dev)
+{
+	spin_lock(&dev->eh_lock);
+	dev->nested_eh--;
+	spin_unlock(&dev->eh_lock);
+}
+
+static void nvme_eh_work(struct work_struct *work)
+{
+	struct nvme_eh_work *eh_work =
+		container_of(work, struct nvme_eh_work, work);
+	struct nvme_dev *dev = eh_work->dev;
+
+	dev_info(dev->ctrl.device, "EH %d: before shutdown\n",
+			eh_work->seq);
+	nvme_dev_disable(dev, false, true);
+	nvme_eh_mark_ctrl_shutdown(dev);
+
+	dev_info(dev->ctrl.device, "EH %d: after shutdown\n",
+			eh_work->seq);
+
+	nvme_reset_dev(dev, true);
+	nvme_eh_done(dev);
+	dev_info(dev->ctrl.device, "EH %d: after recovery\n",
+			eh_work->seq);
+
+	kfree(eh_work);
+}
+
+static void nvme_eh_schedule(struct nvme_dev *dev)
+{
+	bool need_sched = false;
+	bool fail_ctrl = false;
+	struct nvme_eh_work *eh_work;
+	int seq;
+
+	spin_lock(&dev->eh_lock);
+	if (!dev->ctrl_shutdown_started) {
+		need_sched = true;
+		seq = dev->nested_eh;
+		if (++dev->nested_eh >= NVME_MAX_NESTED_EH) {
+			if (!dev->ctrl_failed)
+				dev->ctrl_failed = fail_ctrl = true;
+			else
+				need_sched = false;
+		} else
+			dev->ctrl_shutdown_started = true;
+	}
+	spin_unlock(&dev->eh_lock);
+
+	if (!need_sched)
+		return;
+
+	if (fail_ctrl) {
+ fail_ctrl:
+		INIT_WORK(&dev->fail_ctrl_work, nvme_eh_fail_ctrl_work);
+		queue_work(nvme_reset_wq, &dev->fail_ctrl_work);
+		return;
+	}
+
+	eh_work = kzalloc(sizeof(*eh_work), GFP_NOIO);
+	if (!eh_work)
+		goto fail_ctrl;
+
+	eh_work->dev = dev;
+	eh_work->seq = seq;
+	INIT_WORK(&eh_work->work, nvme_eh_work);
+	queue_work(nvme_reset_wq, &eh_work->work);
+}
+
 static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -1198,9 +1300,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	 */
 	if (nvme_should_reset(dev, csts)) {
 		nvme_warn_reset(dev, csts);
-		nvme_dev_disable(dev, false, true);
-		nvme_reset_ctrl(&dev->ctrl);
-		return BLK_EH_HANDLED;
+		nvme_eh_schedule(dev);
+		return BLK_EH_RESET_TIMER;
 	}
 
 	/*
@@ -1225,9 +1326,9 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, false, false);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
-		return BLK_EH_HANDLED;
+		nvme_eh_schedule(dev);
+		return BLK_EH_RESET_TIMER;
 	default:
 		break;
 	}
@@ -1241,15 +1342,13 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, false, true);
-		nvme_reset_ctrl(&dev->ctrl);
-
 		/*
 		 * Mark the request as handled, since the inline shutdown
 		 * forces all outstanding requests to complete.
 		 */
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
-		return BLK_EH_HANDLED;
+		nvme_eh_schedule(dev);
+		return BLK_EH_RESET_TIMER;
 	}
 
 	if (atomic_dec_return(&dev->ctrl.abort_limit) < 0) {
@@ -2301,12 +2400,26 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown, bool
 	}
 	for (i = dev->ctrl.queue_count - 1; i >= 0; i--)
 		nvme_suspend_queue(&dev->queues[i]);
+	/*
+	 * safe to sync timeout after queues are quiesced, then all
+	 * requests(include the time-out ones) will be canceled.
+	 */
+	nvme_quiesce_timeout(&dev->ctrl);
+	blk_quiesce_timeout(dev->ctrl.admin_q);
 
 	nvme_pci_disable(dev);
 
+	/*
+	 * Both timeout and interrupt handler have been drained, and all
+	 * in-flight requests will be canceled now.
+	 */
 	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
 	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
 
+	/* all requests have been canceled now, so enable timeout now */
+	nvme_unquiesce_timeout(&dev->ctrl);
+	blk_unquiesce_timeout(dev->ctrl.admin_q);
+
 	/*
 	 * The driver will not be starting up queues again if shutting down so
 	 * must flush all entered requests to their failed completion to avoid
@@ -2365,7 +2478,7 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
 		nvme_put_ctrl(&dev->ctrl);
 }
 
-static void nvme_reset_dev(struct nvme_dev *dev)
+static void nvme_reset_dev(struct nvme_dev *dev, bool update_state)
 {
 	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
 	int result = -ENODEV;
@@ -2373,7 +2486,19 @@ static void nvme_reset_dev(struct nvme_dev *dev)
 
 	mutex_lock(&dev->ctrl.reset_lock);
 
-	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
+	if (update_state) {
+		if (dev->ctrl.state != NVME_CTRL_RESETTING &&
+		    dev->ctrl.state != NVME_CTRL_CONNECTING) {
+		    if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
+			dev_warn(dev->ctrl.device, "failed to change state to %d\n",
+					NVME_CTRL_RESETTING);
+			goto out;
+		    }
+		}
+	}
+
+	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING &&
+				dev->ctrl.state != NVME_CTRL_CONNECTING))
 		goto out;
 
 	/*
@@ -2387,10 +2512,12 @@ static void nvme_reset_dev(struct nvme_dev *dev)
 	 * Introduce CONNECTING state from nvme-fc/rdma transports to mark the
 	 * initializing procedure here.
 	 */
-	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING)) {
-		dev_warn(dev->ctrl.device,
-			"failed to mark controller CONNECTING\n");
-		goto out;
+	if (dev->ctrl.state != NVME_CTRL_CONNECTING) {
+		if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING)) {
+			dev_warn(dev->ctrl.device,
+				 "failed to mark controller CONNECTING\n");
+			goto out;
+		}
 	}
 
 	result = nvme_pci_enable(dev);
@@ -2483,7 +2610,7 @@ static void nvme_reset_work(struct work_struct *work)
 	struct nvme_dev *dev =
 		container_of(work, struct nvme_dev, ctrl.reset_work);
 
-	nvme_reset_dev(dev);
+	nvme_reset_dev(dev, false);
 }
 
 static void nvme_remove_dead_ctrl_work(struct work_struct *work)
@@ -2625,6 +2752,8 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
 
+	spin_lock_init(&dev->eh_lock);
+
 	nvme_reset_ctrl(&dev->ctrl);
 
 	return 0;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread
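
The heart of the hunk above is the scheduling decision in nvme_eh_schedule():
a timeout only starts a new error-handling pass when no controller shutdown is
already marked in flight, every pass bumps a nesting counter, and once that
counter reaches NVME_MAX_NESTED_EH the controller is failed instead of being
reset yet again.  The following standalone userspace sketch models just that
decision logic so it can be compiled and stepped through outside the kernel;
it is illustrative only (not code from the patch): the locking and workqueues
are dropped, and the two work items are reduced to strings.

#include <stdbool.h>
#include <stdio.h>

#define NVME_MAX_NESTED_EH	32

struct eh_state {
	bool		ctrl_shutdown_started;	/* an EH shutdown is in flight */
	bool		ctrl_failed;		/* fail-controller path taken */
	unsigned int	nested_eh;		/* how many EH passes started */
};

/* Mirrors the decision made in nvme_eh_schedule(); returns what the real
 * function would queue on nvme_reset_wq. */
static const char *eh_schedule(struct eh_state *s)
{
	bool need_sched = false;
	bool fail_ctrl = false;

	if (!s->ctrl_shutdown_started) {
		need_sched = true;
		if (++s->nested_eh >= NVME_MAX_NESTED_EH) {
			if (!s->ctrl_failed)
				s->ctrl_failed = fail_ctrl = true;
			else
				need_sched = false;
		} else {
			s->ctrl_shutdown_started = true;
		}
	}

	if (!need_sched)
		return "ignored (EH in progress or controller already failed)";
	if (fail_ctrl)
		return "queue fail_ctrl_work (nesting limit reached)";
	return "queue nvme_eh_work";
}

int main(void)
{
	struct eh_state s = { 0 };
	unsigned int i;

	/* The first timeout starts EH; timeouts that fire while the shutdown
	 * is still marked in flight are ignored. */
	printf("timeout 1: %s\n", eh_schedule(&s));
	printf("timeout 2: %s\n", eh_schedule(&s));

	/* Pretend each nvme_eh_work() finished its shutdown phase (which
	 * clears ctrl_shutdown_started) and then timed out again, so a new,
	 * nested EH pass is attempted each round. */
	for (i = 0; i < NVME_MAX_NESTED_EH; i++) {
		s.ctrl_shutdown_started = false;
		printf("nested EH attempt %u: %s\n", i + 1, eh_schedule(&s));
	}
	return 0;
}

Built with any C compiler, this prints "queue nvme_eh_work" for each attempt
until the nesting counter reaches 32, then takes the fail-controller path once
and ignores every later timeout, which is how the patch bounds the nesting.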

* Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
  2018-05-05 13:58 ` Ming Lei
@ 2018-05-05 23:11   ` Laurence Oberman
  -1 siblings, 0 replies; 42+ messages in thread
From: Laurence Oberman @ 2018-05-05 23:11 UTC (permalink / raw)
  To: Ming Lei, Keith Busch
  Cc: Jens Axboe, linux-block, Jianchao Wang, Christoph Hellwig,
	Sagi Grimberg, linux-nvme

On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> Hi,
> 
> The 1st patch introduces blk_quiesce_timeout() and
> blk_unquiesce_timeout()
> for NVMe, meantime fixes blk_sync_queue().
> 
> The 2nd patch covers timeout for admin commands for recovering
> controller
> for avoiding possible deadlock.
> 
> The 3rd and 4th patches avoid to wait_freeze on queues which aren't
> frozen.
> 
> The last 4 patches fixes several races wrt. NVMe timeout handler, and
> finally can make blktests block/011 passed. Meantime the NVMe PCI
> timeout
> mecanism become much more rebost than before.
> 
> gitweb:
> 	https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> 
> V4:
> 	- fixe nvme_init_set_host_mem_cmd()
> 	- use nested EH model, and run both nvme_dev_disable() and
> 	resetting in one same context
> 
> V3:
> 	- fix one new race related freezing in patch 4,
> nvme_reset_work()
> 	may hang forever without this patch
> 	- rewrite the last 3 patches, and avoid to break
> nvme_reset_ctrl*()
> 
> V2:
> 	- fix draining timeout work, so no need to change return value
> from
> 	.timeout()
> 	- fix race between nvme_start_freeze() and nvme_unfreeze()
> 	- cover timeout for admin commands running in EH
> 
> Ming Lei (7):
>   block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
>   nvme: pci: cover timeout for admin commands running in EH
>   nvme: pci: only wait freezing if queue is frozen
>   nvme: pci: freeze queue in nvme_dev_disable() in case of error
>     recovery
>   nvme: core: introduce 'reset_lock' for sync reset state and reset
>     activities
>   nvme: pci: prepare for supporting error recovery from resetting
>     context
>   nvme: pci: support nested EH
> 
>  block/blk-core.c         |  21 +++-
>  block/blk-mq.c           |   9 ++
>  block/blk-timeout.c      |   5 +-
>  drivers/nvme/host/core.c |  46 ++++++-
>  drivers/nvme/host/nvme.h |   5 +
>  drivers/nvme/host/pci.c  | 304
> ++++++++++++++++++++++++++++++++++++++++-------
>  include/linux/blkdev.h   |  13 ++
>  7 files changed, 356 insertions(+), 47 deletions(-)
> 
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: linux-nvme@lists.infradead.org
> Cc: Laurence Oberman <loberman@redhat.com>

Hello Ming

I have a two node NUMA system here running your kernel tree
4.17.0-rc3.ming.nvme+

[root@segstorage1 ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 3 5 6 8 11 13 14
node 0 size: 63922 MB
node 0 free: 61310 MB
node 1 cpus: 1 2 4 7 9 10 12 15
node 1 size: 64422 MB
node 1 free: 62372 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

I ran block/011

[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
    runtime    ...  106.936s
    --- tests/block/011.out	2018-05-05 18:01:14.268414752 -0400
    +++ results/nvme0n1/block/011.out.bad	2018-05-05
19:07:21.028634858 -0400
    @@ -1,2 +1,36 @@
     Running block/011
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    ...
    (Run 'diff -u tests/block/011.out
results/nvme0n1/block/011.out.bad' to see the entire diff)

[ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
[ 1452.676351] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.718221] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.718239] nvme nvme0: EH 0: before shutdown
[ 1452.760890] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760894] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760897] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760900] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760903] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760906] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760909] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760912] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760915] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760918] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760921] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760923] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1452.760926] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1453.330251] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[ 1453.391713] nvme nvme0: EH 0: after shutdown
[ 1456.804695] device-mapper: multipath: Failing path 259:0.
[ 1526.721196] nvme nvme0: I/O 15 QID 0 timeout, disable controller
[ 1526.754335] nvme nvme0: EH 1: before shutdown
[ 1526.793257] nvme nvme0: EH 1: after shutdown
[ 1526.793327] nvme nvme0: Identify Controller failed (-4)
[ 1526.847869] nvme nvme0: Removing after probe failure status: -5
[ 1526.888206] nvme nvme0: EH 0: after recovery
[ 1526.888212] nvme0n1: detected capacity change from 400088457216 to 0
[ 1526.947520] print_req_error: 1 callbacks suppressed
[ 1526.947522] print_req_error: I/O error, dev nvme0n1, sector 794920
[ 1526.947534] print_req_error: I/O error, dev nvme0n1, sector 569328
[ 1526.947540] print_req_error: I/O error, dev nvme0n1, sector 1234608
[ 1526.947556] print_req_error: I/O error, dev nvme0n1, sector 389296
[ 1526.947564] print_req_error: I/O error, dev nvme0n1, sector 712432
[ 1526.947566] print_req_error: I/O error, dev nvme0n1, sector 889304
[ 1526.947572] print_req_error: I/O error, dev nvme0n1, sector 205776
[ 1526.947574] print_req_error: I/O error, dev nvme0n1, sector 126480
[ 1526.947575] print_req_error: I/O error, dev nvme0n1, sector 1601232
[ 1526.947580] print_req_error: I/O error, dev nvme0n1, sector 1234360
[ 1526.947745] Pid 683(fio) over core_pipe_limit
[ 1526.947746] Skipping core dump
[ 1526.947747] Pid 675(fio) over core_pipe_limit
[ 1526.947748] Skipping core dump
[ 1526.947863] Pid 672(fio) over core_pipe_limit
[ 1526.947863] Skipping core dump
[ 1526.947865] Pid 674(fio) over core_pipe_limit
[ 1526.947866] Skipping core dump
[ 1526.947870] Pid 676(fio) over core_pipe_limit
[ 1526.947871] Pid 679(fio) over core_pipe_limit
[ 1526.947872] Skipping core dump
[ 1526.947872] Skipping core dump
[ 1526.948197] Pid 677(fio) over core_pipe_limit
[ 1526.948197] Skipping core dump
[ 1526.948245] Pid 686(fio) over core_pipe_limit
[ 1526.948245] Skipping core dump
[ 1526.974610] Pid 680(fio) over core_pipe_limit
[ 1526.974611] Pid 684(fio) over core_pipe_limit
[ 1526.974611] Skipping core dump
[ 1526.980370] nvme nvme0: failed to mark controller CONNECTING
[ 1526.980373] nvme nvme0: Removing after probe failure status: -19
[ 1526.980385] nvme nvme0: EH 1: after recovery
[ 1526.980477] Pid 687(fio) over core_pipe_limit
[ 1526.980478] Skipping core dump
[ 1527.858207] Skipping core dump

And leaves me looping here

[ 1721.272276] INFO: task kworker/u66:0:24214 blocked for more than 120
seconds.
[ 1721.311263]       Tainted: G          I       4.17.0-rc3.ming.nvme+
#1
[ 1721.348027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1721.392957] kworker/u66:0   D    0 24214      2 0x80000080
[ 1721.424425] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
[ 1721.458568] Call Trace:
[ 1721.472499]  ? __schedule+0x290/0x870
[ 1721.493515]  schedule+0x32/0x80
[ 1721.511656]  blk_mq_freeze_queue_wait+0x46/0xb0
[ 1721.537609]  ? remove_wait_queue+0x60/0x60
[ 1721.561081]  blk_cleanup_queue+0x7e/0x180
[ 1721.584637]  nvme_ns_remove+0x106/0x140 [nvme_core]
[ 1721.612589]  nvme_remove_namespaces+0x8e/0xd0 [nvme_core]
[ 1721.643163]  nvme_remove+0x80/0x120 [nvme]
[ 1721.666188]  pci_device_remove+0x3b/0xc0
[ 1721.688553]  device_release_driver_internal+0x148/0x220
[ 1721.719332]  nvme_remove_dead_ctrl_work+0x29/0x40 [nvme]
[ 1721.750474]  process_one_work+0x158/0x360
[ 1721.772632]  worker_thread+0x47/0x3e0
[ 1721.792471]  kthread+0xf8/0x130
[ 1721.810354]  ? max_active_store+0x80/0x80
[ 1721.832459]  ? kthread_bind+0x10/0x10
[ 1721.852845]  ret_from_fork+0x35/0x40

Did I do something wrong?

I never set anything else; the nvme0n1 was not mounted, etc.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
  2018-05-05 23:11   ` Laurence Oberman
@ 2018-05-05 23:31     ` Laurence Oberman
  -1 siblings, 0 replies; 42+ messages in thread
From: Laurence Oberman @ 2018-05-05 23:31 UTC (permalink / raw)
  To: Ming Lei, Keith Busch
  Cc: Jens Axboe, linux-block, Jianchao Wang, Christoph Hellwig,
	Sagi Grimberg, linux-nvme

On Sat, 2018-05-05 at 19:11 -0400, Laurence Oberman wrote:
> On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > Hi,
> > 
> > The 1st patch introduces blk_quiesce_timeout() and
> > blk_unquiesce_timeout()
> > for NVMe, meantime fixes blk_sync_queue().
> > 
> > The 2nd patch covers timeout for admin commands for recovering
> > controller
> > for avoiding possible deadlock.
> > 
> > The 3rd and 4th patches avoid to wait_freeze on queues which aren't
> > frozen.
> > 
> > The last 4 patches fixes several races wrt. NVMe timeout handler,
> > and
> > finally can make blktests block/011 passed. Meantime the NVMe PCI
> > timeout
> > mecanism become much more rebost than before.
> > 
> > gitweb:
> > 	https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> > 
> > V4:
> > 	- fixe nvme_init_set_host_mem_cmd()
> > 	- use nested EH model, and run both nvme_dev_disable() and
> > 	resetting in one same context
> > 
> > V3:
> > 	- fix one new race related freezing in patch 4,
> > nvme_reset_work()
> > 	may hang forever without this patch
> > 	- rewrite the last 3 patches, and avoid to break
> > nvme_reset_ctrl*()
> > 
> > V2:
> > 	- fix draining timeout work, so no need to change return value
> > from
> > 	.timeout()
> > 	- fix race between nvme_start_freeze() and nvme_unfreeze()
> > 	- cover timeout for admin commands running in EH
> > 
> > Ming Lei (7):
> >   block: introduce blk_quiesce_timeout() and
> > blk_unquiesce_timeout()
> >   nvme: pci: cover timeout for admin commands running in EH
> >   nvme: pci: only wait freezing if queue is frozen
> >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> >     recovery
> >   nvme: core: introduce 'reset_lock' for sync reset state and reset
> >     activities
> >   nvme: pci: prepare for supporting error recovery from resetting
> >     context
> >   nvme: pci: support nested EH
> > 
> >  block/blk-core.c         |  21 +++-
> >  block/blk-mq.c           |   9 ++
> >  block/blk-timeout.c      |   5 +-
> >  drivers/nvme/host/core.c |  46 ++++++-
> >  drivers/nvme/host/nvme.h |   5 +
> >  drivers/nvme/host/pci.c  | 304
> > ++++++++++++++++++++++++++++++++++++++++-------
> >  include/linux/blkdev.h   |  13 ++
> >  7 files changed, 356 insertions(+), 47 deletions(-)
> > 
> > Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: linux-nvme@lists.infradead.org
> > Cc: Laurence Oberman <loberman@redhat.com>
> 
> Hello Ming
> 
> I have a two node NUMA system here running your kernel tree
> 4.17.0-rc3.ming.nvme+
> 
> [root@segstorage1 ~]# numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 3 5 6 8 11 13 14
> node 0 size: 63922 MB
> node 0 free: 61310 MB
> node 1 cpus: 1 2 4 7 9 10 12 15
> node 1 size: 64422 MB
> node 1 free: 62372 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> 
> I ran block/011
> 
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
>     runtime    ...  106.936s
>     --- tests/block/011.out	2018-05-05 18:01:14.268414752
> -0400
>     +++ results/nvme0n1/block/011.out.bad	2018-05-05
> 19:07:21.028634858 -0400
>     @@ -1,2 +1,36 @@
>      Running block/011
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     ...
>     (Run 'diff -u tests/block/011.out
> results/nvme0n1/block/011.out.bad' to see the entire diff)
> 
> [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
> [ 1452.676351] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.718221] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.718239] nvme nvme0: EH 0: before shutdown
> [ 1452.760890] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760894] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760897] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760900] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760903] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760906] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760909] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760912] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760915] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760918] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760921] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760923] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760926] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1453.330251] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1453.391713] nvme nvme0: EH 0: after shutdown
> [ 1456.804695] device-mapper: multipath: Failing path 259:0.
> [ 1526.721196] nvme nvme0: I/O 15 QID 0 timeout, disable controller
> [ 1526.754335] nvme nvme0: EH 1: before shutdown
> [ 1526.793257] nvme nvme0: EH 1: after shutdown
> [ 1526.793327] nvme nvme0: Identify Controller failed (-4)
> [ 1526.847869] nvme nvme0: Removing after probe failure status: -5
> [ 1526.888206] nvme nvme0: EH 0: after recovery
> [ 1526.888212] nvme0n1: detected capacity change from 400088457216 to
> 0
> [ 1526.947520] print_req_error: 1 callbacks suppressed
> [ 1526.947522] print_req_error: I/O error, dev nvme0n1, sector 794920
> [ 1526.947534] print_req_error: I/O error, dev nvme0n1, sector 569328
> [ 1526.947540] print_req_error: I/O error, dev nvme0n1, sector
> 1234608
> [ 1526.947556] print_req_error: I/O error, dev nvme0n1, sector 389296
> [ 1526.947564] print_req_error: I/O error, dev nvme0n1, sector 712432
> [ 1526.947566] print_req_error: I/O error, dev nvme0n1, sector 889304
> [ 1526.947572] print_req_error: I/O error, dev nvme0n1, sector 205776
> [ 1526.947574] print_req_error: I/O error, dev nvme0n1, sector 126480
> [ 1526.947575] print_req_error: I/O error, dev nvme0n1, sector
> 1601232
> [ 1526.947580] print_req_error: I/O error, dev nvme0n1, sector
> 1234360
> [ 1526.947745] Pid 683(fio) over core_pipe_limit
> [ 1526.947746] Skipping core dump
> [ 1526.947747] Pid 675(fio) over core_pipe_limit
> [ 1526.947748] Skipping core dump
> [ 1526.947863] Pid 672(fio) over core_pipe_limit
> [ 1526.947863] Skipping core dump
> [ 1526.947865] Pid 674(fio) over core_pipe_limit
> [ 1526.947866] Skipping core dump
> [ 1526.947870] Pid 676(fio) over core_pipe_limit
> [ 1526.947871] Pid 679(fio) over core_pipe_limit
> [ 1526.947872] Skipping core dump
> [ 1526.947872] Skipping core dump
> [ 1526.948197] Pid 677(fio) over core_pipe_limit
> [ 1526.948197] Skipping core dump
> [ 1526.948245] Pid 686(fio) over core_pipe_limit
> [ 1526.948245] Skipping core dump
> [ 1526.974610] Pid 680(fio) over core_pipe_limit
> [ 1526.974611] Pid 684(fio) over core_pipe_limit
> [ 1526.974611] Skipping core dump
> [ 1526.980370] nvme nvme0: failed to mark controller CONNECTING
> [ 1526.980373] nvme nvme0: Removing after probe failure status: -19
> [ 1526.980385] nvme nvme0: EH 1: after recovery
> [ 1526.980477] Pid 687(fio) over core_pipe_limit
> [ 1526.980478] Skipping core dump
> [ 1527.858207] Skipping core dump
> 
> And leaves me looping here
> 
> [ 1721.272276] INFO: task kworker/u66:0:24214 blocked for more than
> 120
> seconds.
> [ 1721.311263]       Tainted: G          I       4.17.0-
> rc3.ming.nvme+
> #1
> [ 1721.348027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1721.392957] kworker/u66:0   D    0 24214      2 0x80000080
> [ 1721.424425] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> [ 1721.458568] Call Trace:
> [ 1721.472499]  ? __schedule+0x290/0x870
> [ 1721.493515]  schedule+0x32/0x80
> [ 1721.511656]  blk_mq_freeze_queue_wait+0x46/0xb0
> [ 1721.537609]  ? remove_wait_queue+0x60/0x60
> [ 1721.561081]  blk_cleanup_queue+0x7e/0x180
> [ 1721.584637]  nvme_ns_remove+0x106/0x140 [nvme_core]
> [ 1721.612589]  nvme_remove_namespaces+0x8e/0xd0 [nvme_core]
> [ 1721.643163]  nvme_remove+0x80/0x120 [nvme]
> [ 1721.666188]  pci_device_remove+0x3b/0xc0
> [ 1721.688553]  device_release_driver_internal+0x148/0x220
> [ 1721.719332]  nvme_remove_dead_ctrl_work+0x29/0x40 [nvme]
> [ 1721.750474]  process_one_work+0x158/0x360
> [ 1721.772632]  worker_thread+0x47/0x3e0
> [ 1721.792471]  kthread+0xf8/0x130
> [ 1721.810354]  ? max_active_store+0x80/0x80
> [ 1721.832459]  ? kthread_bind+0x10/0x10
> [ 1721.852845]  ret_from_fork+0x35/0x40
> 
> Did I do something wrong?
> 
> I never set anything else; the nvme0n1 was not mounted, etc.
> 
> Thanks
> Laurence

Second attempt, same issue, and this time we panicked.

[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
    runtime    ...  106.936s
    --- tests/block/011.out	2018-05-05 18:01:14.268414752 -0400
    +++ results/nvme0n1/block/011.out.bad	2018-05-05
19:07:21.028634858 -0400
    @@ -1,2 +1,36 @@
     Running block/011
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
    ...
    (Run 'diff -u tests/block/011.out
results/nvme0n1/block/011.out.bad' to see the entire diff)

[  387.483279] run blktests block/011 at 2018-05-05 19:27:33
[  418.076690] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.117901] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.117929] nvme nvme0: EH 0: before shutdown
[  418.158827] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158830] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158833] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158836] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158838] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158841] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158844] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158847] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158849] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158852] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158855] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158858] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158861] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.158863] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  418.785063] nvme nvme0: EH 0: after shutdown
[  420.708723] device-mapper: multipath: Failing path 259:0.
[  486.106834] nvme nvme0: I/O 6 QID 0 timeout, disable controller
[  486.140306] nvme nvme0: EH 1: before shutdown
[  486.179884] nvme nvme0: EH 1: after shutdown
[  486.179961] nvme nvme0: Identify Controller failed (-4)
[  486.232868] nvme nvme0: Removing after probe failure status: -5
[  486.273935] nvme nvme0: EH 0: after recovery
[  486.274230] nvme0n1: detected capacity change from 400088457216 to 0
[  486.334575] print_req_error: I/O error, dev nvme0n1, sector 1234608
[  486.334582] print_req_error: I/O error, dev nvme0n1, sector 1755840
[  486.334598] print_req_error: I/O error, dev nvme0n1, sector 569328
[  486.334600] print_req_error: I/O error, dev nvme0n1, sector 183296
[  486.334614] print_req_error: I/O error, dev nvme0n1, sector 174576
[  486.334616] print_req_error: I/O error, dev nvme0n1, sector 1234360
[  486.334621] print_req_error: I/O error, dev nvme0n1, sector 786336
[  486.334622] print_req_error: I/O error, dev nvme0n1, sector 205776
[  486.334624] print_req_error: I/O error, dev nvme0n1, sector 534320
[  486.334628] print_req_error: I/O error, dev nvme0n1, sector 712432
[  486.334856] Pid 7792(fio) over core_pipe_limit
[  486.334857] Pid 7799(fio) over core_pipe_limit
[  486.334857] Skipping core dump
[  486.334857] Skipping core dump
[  486.334918] Pid 7784(fio) over core_pipe_limit
[  486.334919] Pid 7797(fio) over core_pipe_limit
[  486.334920] Pid 7798(fio) over core_pipe_limit
[  486.334921] Pid 7791(fio) over core_pipe_limit
[  486.334922] Skipping core dump
[  486.334922] Skipping core dump
[  486.334922] Skipping core dump
[  486.334923] Skipping core dump
[  486.335060] Pid 7789(fio) over core_pipe_limit
[  486.335061] Skipping core dump
[  486.335290] Pid 7785(fio) over core_pipe_limit
[  486.335291] Skipping core dump
[  486.335292] Pid 7796(fio) over core_pipe_limit
[  486.335293] Skipping core dump
[  486.335316] Pid 7786(fio) over core_pipe_limit
[  486.335317] Skipping core dump
[  487.110906] nvme nvme0: failed to mark controller CONNECTING
[  487.141743] nvme nvme0: Removing after probe failure status: -19
[  487.176341] nvme nvme0: EH 1: after recovery
[  487.232034] BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
[  487.276604] PGD 0 P4D 0 
[  487.290548] Oops: 0000 [#1] SMP PTI
[  487.310135] Modules linked in: macsec tcp_diag udp_diag inet_diag
unix_diag af_packet_diag netlink_diag binfmt_misc ebtable_filter
ebtables ip6table_filter ip6_tables devlink xt_physdev br_netfilter
bridge stp llc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4
nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel pcbc aesni_intel dm_round_robin
crypto_simd iTCO_wdt gpio_ich iTCO_vendor_support cryptd ipmi_si
glue_helper pcspkr joydev ipmi_devintf hpilo acpi_power_meter sg
i7core_edac lpc_ich hpwdt dm_service_time ipmi_msghandler shpchp
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath
ip_tables xfs libcrc32c radeon i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt
[  487.719632]  fb_sys_fops ttm sd_mod qla2xxx drm nvme_fc nvme_fabrics
i2c_core crc32c_intel nvme serio_raw hpsa bnx2 nvme_core
scsi_transport_fc scsi_transport_sas dm_mirror dm_region_hash dm_log
dm_mod
[  487.817595] CPU: 4 PID: 763 Comm: kworker/u66:8 Kdump: loaded
Tainted: G          I       4.17.0-rc3.ming.nvme+ #1
[  487.876571] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[  487.913158] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
[  487.946586] RIP: 0010:sbitmap_any_bit_set+0xb/0x30
[  487.973172] RSP: 0018:ffffb19e47fdfe00 EFLAGS: 00010202
[  488.003255] RAX: ffff8f0457931408 RBX: ffff8f0457931400 RCX:
0000000000000004
[  488.044199] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff8f04579314d0
[  488.085253] RBP: ffff8f04570b8000 R08: 00000000000271a0 R09:
ffffffffacda1b44
[  488.126295] R10: ffffd6ee3f4ed000 R11: 0000000000000000 R12:
0000000000000001
[  488.166746] R13: 0000000000000001 R14: 0000000000000000 R15:
ffff8f0457821138
[  488.207076] FS:  0000000000000000(0000) GS:ffff8f145b280000(0000)
knlGS:0000000000000000
[  488.252727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  488.284395] CR2: 0000000000000000 CR3: 0000001e4aa0a006 CR4:
00000000000206e0
[  488.324505] Call Trace:
[  488.337945]  blk_mq_run_hw_queue+0xad/0xf0
[  488.361057]  blk_mq_run_hw_queues+0x4b/0x60
[  488.384507]  nvme_kill_queues+0x26/0x80 [nvme_core]
[  488.411528]  nvme_remove_dead_ctrl_work+0x17/0x40 [nvme]
[  488.441602]  process_one_work+0x158/0x360
[  488.464568]  worker_thread+0x1fa/0x3e0
[  488.486044]  kthread+0xf8/0x130
[  488.504022]  ? max_active_store+0x80/0x80
[  488.527034]  ? kthread_bind+0x10/0x10
[  488.548026]  ret_from_fork+0x35/0x40
[  488.569062] Code: c6 44 0f 46 ce 83 c2 01 45 89 ca 4c 89 54 01 08 48
8b 4f 10 2b 74 01 08 39 57 08 77 d8 f3 c3 90 8b 4f 08 85 c9 74 1f 48 8b
57 10 <48> 83 3a 00 75 18 31 c0 eb 0a 48 83 c2 40 48 83 3a 00 75 0a 83 
[  488.676148] RIP: sbitmap_any_bit_set+0xb/0x30 RSP: ffffb19e47fdfe00
[  488.711006] CR2: 0000000000000000

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
@ 2018-05-05 23:31     ` Laurence Oberman
  0 siblings, 0 replies; 42+ messages in thread
From: Laurence Oberman @ 2018-05-05 23:31 UTC (permalink / raw)


On Sat, 2018-05-05 at 19:11 -0400, Laurence Oberman wrote:
> On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > Hi,
> > 
> > The 1st patch introduces blk_quiesce_timeout() and
> > blk_unquiesce_timeout()
> > for NVMe, meantime fixes blk_sync_queue().
> > 
> > The 2nd patch covers timeout for admin commands for recovering
> > controller
> > for avoiding possible deadlock.
> > 
> > The 3rd and 4th patches avoid to wait_freeze on queues which aren't
> > frozen.
> > 
> > The last 4 patches fixes several races wrt. NVMe timeout handler,
> > and
> > finally can make blktests block/011 passed. Meantime the NVMe PCI
> > timeout
> > mecanism become much more rebost than before.
> > 
> > gitweb:
> > 	https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> > 
> > V4:
> > 	- fixe nvme_init_set_host_mem_cmd()
> > 	- use nested EH model, and run both nvme_dev_disable() and
> > 	resetting in one same context
> > 
> > V3:
> > 	- fix one new race related freezing in patch 4,
> > nvme_reset_work()
> > 	may hang forever without this patch
> > 	- rewrite the last 3 patches, and avoid to break
> > nvme_reset_ctrl*()
> > 
> > V2:
> > 	- fix draining timeout work, so no need to change return value
> > from
> > 	.timeout()
> > 	- fix race between nvme_start_freeze() and nvme_unfreeze()
> > 	- cover timeout for admin commands running in EH
> > 
> > Ming Lei (7):
> >   block: introduce blk_quiesce_timeout() and
> > blk_unquiesce_timeout()
> >   nvme: pci: cover timeout for admin commands running in EH
> >   nvme: pci: only wait freezing if queue is frozen
> >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> >     recovery
> >   nvme: core: introduce 'reset_lock' for sync reset state and reset
> >     activities
> >   nvme: pci: prepare for supporting error recovery from resetting
> >     context
> >   nvme: pci: support nested EH
> > 
> >  block/blk-core.c         |  21 +++-
> >  block/blk-mq.c           |   9 ++
> >  block/blk-timeout.c      |   5 +-
> >  drivers/nvme/host/core.c |  46 ++++++-
> >  drivers/nvme/host/nvme.h |   5 +
> >  drivers/nvme/host/pci.c  | 304
> > ++++++++++++++++++++++++++++++++++++++++-------
> >  include/linux/blkdev.h   |  13 ++
> >  7 files changed, 356 insertions(+), 47 deletions(-)
> > 
> > Cc: Jianchao Wang <jianchao.w.wang at oracle.com>
> > Cc: Christoph Hellwig <hch at lst.de>
> > Cc: Sagi Grimberg <sagi at grimberg.me>
> > Cc: linux-nvme at lists.infradead.org
> > Cc: Laurence Oberman <loberman at redhat.com>
> 
> Hello Ming
> 
> I have a two node NUMA system here running your kernel tree
> 4.17.0-rc3.ming.nvme+
> 
> [root at segstorage1 ~]# numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 3 5 6 8 11 13 14
> node 0 size: 63922 MB
> node 0 free: 61310 MB
> node 1 cpus: 1 2 4 7 9 10 12 15
> node 1 size: 64422 MB
> node 1 free: 62372 MB
> node distances:
> node???0???1?
> ? 0:??10??20?
> ? 1:??20??10?
> 
> I ran block/011
> 
> [root at segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)????[failed]
> ????runtime????...??106.936s
> ????--- tests/block/011.out	2018-05-05 18:01:14.268414752
> -0400
> ????+++ results/nvme0n1/block/011.out.bad	2018-05-05
> 19:07:21.028634858 -0400
> ????@@ -1,2 +1,36 @@
> ?????Running block/011
> ????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> ????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> ????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> ????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> ????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> ????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
> ????...
> ????(Run 'diff -u tests/block/011.out
> results/nvme0n1/block/011.out.bad' to see the entire diff)
> 
> [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
> [ 1452.676351] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.718221] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.718239] nvme nvme0: EH 0: before shutdown
> [ 1452.760890] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760894] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760897] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760900] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760903] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760906] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760909] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760912] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760915] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760918] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760921] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760923] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760926] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1453.330251] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1453.391713] nvme nvme0: EH 0: after shutdown
> [ 1456.804695] device-mapper: multipath: Failing path 259:0.
> [ 1526.721196] nvme nvme0: I/O 15 QID 0 timeout, disable controller
> [ 1526.754335] nvme nvme0: EH 1: before shutdown
> [ 1526.793257] nvme nvme0: EH 1: after shutdown
> [ 1526.793327] nvme nvme0: Identify Controller failed (-4)
> [ 1526.847869] nvme nvme0: Removing after probe failure status: -5
> [ 1526.888206] nvme nvme0: EH 0: after recovery
> [ 1526.888212] nvme0n1: detected capacity change from 400088457216 to
> 0
> [ 1526.947520] print_req_error: 1 callbacks suppressed
> [ 1526.947522] print_req_error: I/O error, dev nvme0n1, sector 794920
> [ 1526.947534] print_req_error: I/O error, dev nvme0n1, sector 569328
> [ 1526.947540] print_req_error: I/O error, dev nvme0n1, sector
> 1234608
> [ 1526.947556] print_req_error: I/O error, dev nvme0n1, sector 389296
> [ 1526.947564] print_req_error: I/O error, dev nvme0n1, sector 712432
> [ 1526.947566] print_req_error: I/O error, dev nvme0n1, sector 889304
> [ 1526.947572] print_req_error: I/O error, dev nvme0n1, sector 205776
> [ 1526.947574] print_req_error: I/O error, dev nvme0n1, sector 126480
> [ 1526.947575] print_req_error: I/O error, dev nvme0n1, sector
> 1601232
> [ 1526.947580] print_req_error: I/O error, dev nvme0n1, sector
> 1234360
> [ 1526.947745] Pid 683(fio) over core_pipe_limit
> [ 1526.947746] Skipping core dump
> [ 1526.947747] Pid 675(fio) over core_pipe_limit
> [ 1526.947748] Skipping core dump
> [ 1526.947863] Pid 672(fio) over core_pipe_limit
> [ 1526.947863] Skipping core dump
> [ 1526.947865] Pid 674(fio) over core_pipe_limit
> [ 1526.947866] Skipping core dump
> [ 1526.947870] Pid 676(fio) over core_pipe_limit
> [ 1526.947871] Pid 679(fio) over core_pipe_limit
> [ 1526.947872] Skipping core dump
> [ 1526.947872] Skipping core dump
> [ 1526.948197] Pid 677(fio) over core_pipe_limit
> [ 1526.948197] Skipping core dump
> [ 1526.948245] Pid 686(fio) over core_pipe_limit
> [ 1526.948245] Skipping core dump
> [ 1526.974610] Pid 680(fio) over core_pipe_limit
> [ 1526.974611] Pid 684(fio) over core_pipe_limit
> [ 1526.974611] Skipping core dump
> [ 1526.980370] nvme nvme0: failed to mark controller CONNECTING
> [ 1526.980373] nvme nvme0: Removing after probe failure status: -19
> [ 1526.980385] nvme nvme0: EH 1: after recovery
> [ 1526.980477] Pid 687(fio) over core_pipe_limit
> [ 1526.980478] Skipping core dump
> [ 1527.858207] Skipping core dump
> 
> And leaves me looping here
> 
> [ 1721.272276] INFO: task kworker/u66:0:24214 blocked for more than
> 120
> seconds.
> [ 1721.311263]???????Tainted: G??????????I???????4.17.0-
> rc3.ming.nvme+
> #1
> [ 1721.348027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1721.392957] kworker/u66:0???D????0 24214??????2 0x80000080
> [ 1721.424425] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> [ 1721.458568] Call Trace:
> [ 1721.472499]??? __schedule+0x290/0x870
> [ 1721.493515]??schedule+0x32/0x80
> [ 1721.511656]??blk_mq_freeze_queue_wait+0x46/0xb0
> [ 1721.537609]??? remove_wait_queue+0x60/0x60
> [ 1721.561081]??blk_cleanup_queue+0x7e/0x180
> [ 1721.584637]??nvme_ns_remove+0x106/0x140 [nvme_core]
> [ 1721.612589]??nvme_remove_namespaces+0x8e/0xd0 [nvme_core]
> [ 1721.643163]??nvme_remove+0x80/0x120 [nvme]
> [ 1721.666188]??pci_device_remove+0x3b/0xc0
> [ 1721.688553]??device_release_driver_internal+0x148/0x220
> [ 1721.719332]??nvme_remove_dead_ctrl_work+0x29/0x40 [nvme]
> [ 1721.750474]??process_one_work+0x158/0x360
> [ 1721.772632]??worker_thread+0x47/0x3e0
> [ 1721.792471]??kthread+0xf8/0x130
> [ 1721.810354]??? max_active_store+0x80/0x80
> [ 1721.832459]??? kthread_bind+0x10/0x10
> [ 1721.852845]??ret_from_fork+0x35/0x40
> 
> Did I di something wrong
> 
> I never set anything else, the nvme0n1 was not mounted etc.
> 
> Thanks
> Laurence

Second attempt same issue and this time we panicked

[root at segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)????[failed]
????runtime????...??106.936s
????--- tests/block/011.out	2018-05-05 18:01:14.268414752 -0400
????+++ results/nvme0n1/block/011.out.bad	2018-05-05
19:07:21.028634858 -0400
????@@ -1,2 +1,36 @@
?????Running block/011
????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
????+fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
IO_U_F_FLIGHT) == 0' failed.
????...
????(Run 'diff -u tests/block/011.out
results/nvme0n1/block/011.out.bad' to see the entire diff)

[??387.483279] run blktests block/011 at 2018-05-05 19:27:33
[??418.076690] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.117901] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.117929] nvme nvme0: EH 0: before shutdown
[??418.158827] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158830] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158833] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158836] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158838] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158841] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158844] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158847] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158849] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158852] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158855] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158858] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158861] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.158863] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[??418.785063] nvme nvme0: EH 0: after shutdown
[??420.708723] device-mapper: multipath: Failing path 259:0.
[??486.106834] nvme nvme0: I/O 6 QID 0 timeout, disable controller
[??486.140306] nvme nvme0: EH 1: before shutdown
[??486.179884] nvme nvme0: EH 1: after shutdown
[??486.179961] nvme nvme0: Identify Controller failed (-4)
[??486.232868] nvme nvme0: Removing after probe failure status: -5
[??486.273935] nvme nvme0: EH 0: after recovery
[??486.274230] nvme0n1: detected capacity change from 400088457216 to 0
[??486.334575] print_req_error: I/O error, dev nvme0n1, sector 1234608
[??486.334582] print_req_error: I/O error, dev nvme0n1, sector 1755840
[??486.334598] print_req_error: I/O error, dev nvme0n1, sector 569328
[??486.334600] print_req_error: I/O error, dev nvme0n1, sector 183296
[??486.334614] print_req_error: I/O error, dev nvme0n1, sector 174576
[??486.334616] print_req_error: I/O error, dev nvme0n1, sector 1234360
[??486.334621] print_req_error: I/O error, dev nvme0n1, sector 786336
[??486.334622] print_req_error: I/O error, dev nvme0n1, sector 205776
[??486.334624] print_req_error: I/O error, dev nvme0n1, sector 534320
[??486.334628] print_req_error: I/O error, dev nvme0n1, sector 712432
[??486.334856] Pid 7792(fio) over core_pipe_limit
[??486.334857] Pid 7799(fio) over core_pipe_limit
[??486.334857] Skipping core dump
[??486.334857] Skipping core dump
[??486.334918] Pid 7784(fio) over core_pipe_limit
[??486.334919] Pid 7797(fio) over core_pipe_limit
[??486.334920] Pid 7798(fio) over core_pipe_limit
[??486.334921] Pid 7791(fio) over core_pipe_limit
[??486.334922] Skipping core dump
[??486.334922] Skipping core dump
[??486.334922] Skipping core dump
[??486.334923] Skipping core dump
[??486.335060] Pid 7789(fio) over core_pipe_limit
[??486.335061] Skipping core dump
[??486.335290] Pid 7785(fio) over core_pipe_limit
[??486.335291] Skipping core dump
[??486.335292] Pid 7796(fio) over core_pipe_limit
[??486.335293] Skipping core dump
[??486.335316] Pid 7786(fio) over core_pipe_limit
[??486.335317] Skipping core dump
[??487.110906] nvme nvme0: failed to mark controller CONNECTING
[??487.141743] nvme nvme0: Removing after probe failure status: -19
[??487.176341] nvme nvme0: EH 1: after recovery
[??487.232034] BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
[??487.276604] PGD 0 P4D 0?
[??487.290548] Oops: 0000 [#1] SMP PTI
[??487.310135] Modules linked in: macsec tcp_diag udp_diag inet_diag
unix_diag af_packet_diag netlink_diag binfmt_misc ebtable_filter
ebtables ip6table_filter ip6_tables devlink xt_physdev br_netfilter
bridge stp llc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4
nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel pcbc aesni_intel dm_round_robin
crypto_simd iTCO_wdt gpio_ich iTCO_vendor_support cryptd ipmi_si
glue_helper pcspkr joydev ipmi_devintf hpilo acpi_power_meter sg
i7core_edac lpc_ich hpwdt dm_service_time ipmi_msghandler shpchp
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath
ip_tables xfs libcrc32c radeon i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt
[??487.719632]??fb_sys_fops ttm sd_mod qla2xxx drm nvme_fc nvme_fabrics
i2c_core crc32c_intel nvme serio_raw hpsa bnx2 nvme_core
scsi_transport_fc scsi_transport_sas dm_mirror dm_region_hash dm_log
dm_mod
[??487.817595] CPU: 4 PID: 763 Comm: kworker/u66:8 Kdump: loaded
Tainted: G??????????I???????4.17.0-rc3.ming.nvme+ #1
[??487.876571] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[??487.913158] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
[??487.946586] RIP: 0010:sbitmap_any_bit_set+0xb/0x30
[??487.973172] RSP: 0018:ffffb19e47fdfe00 EFLAGS: 00010202
[??488.003255] RAX: ffff8f0457931408 RBX: ffff8f0457931400 RCX:
0000000000000004
[??488.044199] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff8f04579314d0
[??488.085253] RBP: ffff8f04570b8000 R08: 00000000000271a0 R09:
ffffffffacda1b44
[??488.126295] R10: ffffd6ee3f4ed000 R11: 0000000000000000 R12:
0000000000000001
[??488.166746] R13: 0000000000000001 R14: 0000000000000000 R15:
ffff8f0457821138
[??488.207076] FS:??0000000000000000(0000) GS:ffff8f145b280000(0000)
knlGS:0000000000000000
[??488.252727] CS:??0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[??488.284395] CR2: 0000000000000000 CR3: 0000001e4aa0a006 CR4:
00000000000206e0
[??488.324505] Call Trace:
[??488.337945]??blk_mq_run_hw_queue+0xad/0xf0
[??488.361057]??blk_mq_run_hw_queues+0x4b/0x60
[??488.384507]??nvme_kill_queues+0x26/0x80 [nvme_core]
[??488.411528]??nvme_remove_dead_ctrl_work+0x17/0x40 [nvme]
[??488.441602]??process_one_work+0x158/0x360
[??488.464568]??worker_thread+0x1fa/0x3e0
[??488.486044]??kthread+0xf8/0x130
[??488.504022]??? max_active_store+0x80/0x80
[??488.527034]??? kthread_bind+0x10/0x10
[??488.548026]??ret_from_fork+0x35/0x40
[??488.569062] Code: c6 44 0f 46 ce 83 c2 01 45 89 ca 4c 89 54 01 08 48
8b 4f 10 2b 74 01 08 39 57 08 77 d8 f3 c3 90 8b 4f 08 85 c9 74 1f 48 8b
57 10 <48> 83 3a 00 75 18 31 c0 eb 0a 48 83 c2 40 48 83 3a 00 75 0a 83?
[??488.676148] RIP: sbitmap_any_bit_set+0xb/0x30 RSP: ffffb19e47fdfe00
[??488.711006] CR2: 0000000000000000

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
  2018-05-05 23:31     ` Laurence Oberman
@ 2018-05-05 23:51       ` Laurence Oberman
  -1 siblings, 0 replies; 42+ messages in thread
From: Laurence Oberman @ 2018-05-05 23:51 UTC (permalink / raw)
  To: Ming Lei, Keith Busch
  Cc: Jens Axboe, linux-block, Jianchao Wang, Christoph Hellwig,
	Sagi Grimberg, linux-nvme

On Sat, 2018-05-05 at 19:31 -0400, Laurence Oberman wrote:
> On Sat, 2018-05-05 at 19:11 -0400, Laurence Oberman wrote:
> > On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > > Hi,
> > > 
> > > The 1st patch introduces blk_quiesce_timeout() and
> > > blk_unquiesce_timeout()
> > > for NVMe, meantime fixes blk_sync_queue().
> > > 
> > > The 2nd patch covers timeout for admin commands for recovering
> > > controller
> > > for avoiding possible deadlock.
> > > 
> > > The 3rd and 4th patches avoid to wait_freeze on queues which
> > > aren't
> > > frozen.
> > > 
> > > The last 4 patches fixes several races wrt. NVMe timeout handler,
> > > and
> > > finally can make blktests block/011 passed. Meantime the NVMe PCI
> > > timeout
> > > mecanism become much more rebost than before.
> > > 
> > > gitweb:
> > > 	https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> > > 
> > > V4:
> > > 	- fixe nvme_init_set_host_mem_cmd()
> > > 	- use nested EH model, and run both nvme_dev_disable() and
> > > 	resetting in one same context
> > > 
> > > V3:
> > > 	- fix one new race related freezing in patch 4,
> > > nvme_reset_work()
> > > 	may hang forever without this patch
> > > 	- rewrite the last 3 patches, and avoid to break
> > > nvme_reset_ctrl*()
> > > 
> > > V2:
> > > 	- fix draining timeout work, so no need to change return value
> > > from
> > > 	.timeout()
> > > 	- fix race between nvme_start_freeze() and nvme_unfreeze()
> > > 	- cover timeout for admin commands running in EH
> > > 
> > > Ming Lei (7):
> > >   block: introduce blk_quiesce_timeout() and
> > > blk_unquiesce_timeout()
> > >   nvme: pci: cover timeout for admin commands running in EH
> > >   nvme: pci: only wait freezing if queue is frozen
> > >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> > >     recovery
> > >   nvme: core: introduce 'reset_lock' for sync reset state and
> > > reset
> > >     activities
> > >   nvme: pci: prepare for supporting error recovery from resetting
> > >     context
> > >   nvme: pci: support nested EH
> > > 
> > >  block/blk-core.c         |  21 +++-
> > >  block/blk-mq.c           |   9 ++
> > >  block/blk-timeout.c      |   5 +-
> > >  drivers/nvme/host/core.c |  46 ++++++-
> > >  drivers/nvme/host/nvme.h |   5 +
> > >  drivers/nvme/host/pci.c  | 304
> > > ++++++++++++++++++++++++++++++++++++++++-------
> > >  include/linux/blkdev.h   |  13 ++
> > >  7 files changed, 356 insertions(+), 47 deletions(-)
> > > 
> > > Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Sagi Grimberg <sagi@grimberg.me>
> > > Cc: linux-nvme@lists.infradead.org
> > > Cc: Laurence Oberman <loberman@redhat.com>
> > 
> > Hello Ming
> > 
> > I have a two node NUMA system here running your kernel tree
> > 4.17.0-rc3.ming.nvme+
> > 
> > [root@segstorage1 ~]# numactl --hardware
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 3 5 6 8 11 13 14
> > node 0 size: 63922 MB
> > node 0 free: 61310 MB
> > node 1 cpus: 1 2 4 7 9 10 12 15
> > node 1 size: 64422 MB
> > node 1 free: 62372 MB
> > node distances:
> > node   0   1 
> >   0:  10  20 
> >   1:  20  10 
> > 
> > I ran block/011
> > 
> > [root@segstorage1 blktests]# ./check block/011
> > block/011 => nvme0n1 (disable PCI device while doing
> > I/O)    [failed]
> >     runtime    ...  106.936s
> >     --- tests/block/011.out	2018-05-05 18:01:14.268414752
> > -0400
> >     +++ results/nvme0n1/block/011.out.bad	2018-05-05
> > 19:07:21.028634858 -0400
> >     @@ -1,2 +1,36 @@
> >      Running block/011
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     ...
> >     (Run 'diff -u tests/block/011.out
> > results/nvme0n1/block/011.out.bad' to see the entire diff)
> > 
> > [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
> > [ 1452.676351] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.718221] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.718239] nvme nvme0: EH 0: before shutdown
> > [ 1452.760890] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760894] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760897] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760900] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760903] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760906] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760909] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760912] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760915] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760918] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760921] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760923] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760926] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1453.330251] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1453.391713] nvme nvme0: EH 0: after shutdown
> > [ 1456.804695] device-mapper: multipath: Failing path 259:0.
> > [ 1526.721196] nvme nvme0: I/O 15 QID 0 timeout, disable controller
> > [ 1526.754335] nvme nvme0: EH 1: before shutdown
> > [ 1526.793257] nvme nvme0: EH 1: after shutdown
> > [ 1526.793327] nvme nvme0: Identify Controller failed (-4)
> > [ 1526.847869] nvme nvme0: Removing after probe failure status: -5
> > [ 1526.888206] nvme nvme0: EH 0: after recovery
> > [ 1526.888212] nvme0n1: detected capacity change from 400088457216
> > to
> > 0
> > [ 1526.947520] print_req_error: 1 callbacks suppressed
> > [ 1526.947522] print_req_error: I/O error, dev nvme0n1, sector
> > 794920
> > [ 1526.947534] print_req_error: I/O error, dev nvme0n1, sector
> > 569328
> > [ 1526.947540] print_req_error: I/O error, dev nvme0n1, sector
> > 1234608
> > [ 1526.947556] print_req_error: I/O error, dev nvme0n1, sector
> > 389296
> > [ 1526.947564] print_req_error: I/O error, dev nvme0n1, sector
> > 712432
> > [ 1526.947566] print_req_error: I/O error, dev nvme0n1, sector
> > 889304
> > [ 1526.947572] print_req_error: I/O error, dev nvme0n1, sector
> > 205776
> > [ 1526.947574] print_req_error: I/O error, dev nvme0n1, sector
> > 126480
> > [ 1526.947575] print_req_error: I/O error, dev nvme0n1, sector
> > 1601232
> > [ 1526.947580] print_req_error: I/O error, dev nvme0n1, sector
> > 1234360
> > [ 1526.947745] Pid 683(fio) over core_pipe_limit
> > [ 1526.947746] Skipping core dump
> > [ 1526.947747] Pid 675(fio) over core_pipe_limit
> > [ 1526.947748] Skipping core dump
> > [ 1526.947863] Pid 672(fio) over core_pipe_limit
> > [ 1526.947863] Skipping core dump
> > [ 1526.947865] Pid 674(fio) over core_pipe_limit
> > [ 1526.947866] Skipping core dump
> > [ 1526.947870] Pid 676(fio) over core_pipe_limit
> > [ 1526.947871] Pid 679(fio) over core_pipe_limit
> > [ 1526.947872] Skipping core dump
> > [ 1526.947872] Skipping core dump
> > [ 1526.948197] Pid 677(fio) over core_pipe_limit
> > [ 1526.948197] Skipping core dump
> > [ 1526.948245] Pid 686(fio) over core_pipe_limit
> > [ 1526.948245] Skipping core dump
> > [ 1526.974610] Pid 680(fio) over core_pipe_limit
> > [ 1526.974611] Pid 684(fio) over core_pipe_limit
> > [ 1526.974611] Skipping core dump
> > [ 1526.980370] nvme nvme0: failed to mark controller CONNECTING
> > [ 1526.980373] nvme nvme0: Removing after probe failure status: -19
> > [ 1526.980385] nvme nvme0: EH 1: after recovery
> > [ 1526.980477] Pid 687(fio) over core_pipe_limit
> > [ 1526.980478] Skipping core dump
> > [ 1527.858207] Skipping core dump
> > 
> > And leaves me looping here
> > 
> > [ 1721.272276] INFO: task kworker/u66:0:24214 blocked for more than
> > 120
> > seconds.
> > [ 1721.311263]       Tainted: G          I       4.17.0-
> > rc3.ming.nvme+
> > #1
> > [ 1721.348027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [ 1721.392957] kworker/u66:0   D    0 24214      2 0x80000080
> > [ 1721.424425] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> > [ 1721.458568] Call Trace:
> > [ 1721.472499]  ? __schedule+0x290/0x870
> > [ 1721.493515]  schedule+0x32/0x80
> > [ 1721.511656]  blk_mq_freeze_queue_wait+0x46/0xb0
> > [ 1721.537609]  ? remove_wait_queue+0x60/0x60
> > [ 1721.561081]  blk_cleanup_queue+0x7e/0x180
> > [ 1721.584637]  nvme_ns_remove+0x106/0x140 [nvme_core]
> > [ 1721.612589]  nvme_remove_namespaces+0x8e/0xd0 [nvme_core]
> > [ 1721.643163]  nvme_remove+0x80/0x120 [nvme]
> > [ 1721.666188]  pci_device_remove+0x3b/0xc0
> > [ 1721.688553]  device_release_driver_internal+0x148/0x220
> > [ 1721.719332]  nvme_remove_dead_ctrl_work+0x29/0x40 [nvme]
> > [ 1721.750474]  process_one_work+0x158/0x360
> > [ 1721.772632]  worker_thread+0x47/0x3e0
> > [ 1721.792471]  kthread+0xf8/0x130
> > [ 1721.810354]  ? max_active_store+0x80/0x80
> > [ 1721.832459]  ? kthread_bind+0x10/0x10
> > [ 1721.852845]  ret_from_fork+0x35/0x40
> > 
> > Did I di something wrong
> > 
> > I never set anything else, the nvme0n1 was not mounted etc.
> > 
> > Thanks
> > Laurence
> 
> Second attempt same issue and this time we panicked
> 
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
>     runtime    ...  106.936s
>     --- tests/block/011.out	2018-05-05 18:01:14.268414752
> -0400
>     +++ results/nvme0n1/block/011.out.bad	2018-05-05
> 19:07:21.028634858 -0400
>     @@ -1,2 +1,36 @@
>      Running block/011
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     ...
>     (Run 'diff -u tests/block/011.out
> results/nvme0n1/block/011.out.bad' to see the entire diff)
> 
> [  387.483279] run blktests block/011 at 2018-05-05 19:27:33
> [  418.076690] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.117901] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.117929] nvme nvme0: EH 0: before shutdown
> [  418.158827] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158830] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158833] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158836] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158838] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158841] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158844] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158847] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158849] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158852] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158855] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158858] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158861] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.158863] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  418.785063] nvme nvme0: EH 0: after shutdown
> [  420.708723] device-mapper: multipath: Failing path 259:0.
> [  486.106834] nvme nvme0: I/O 6 QID 0 timeout, disable controller
> [  486.140306] nvme nvme0: EH 1: before shutdown
> [  486.179884] nvme nvme0: EH 1: after shutdown
> [  486.179961] nvme nvme0: Identify Controller failed (-4)
> [  486.232868] nvme nvme0: Removing after probe failure status: -5
> [  486.273935] nvme nvme0: EH 0: after recovery
> [  486.274230] nvme0n1: detected capacity change from 400088457216 to
> 0
> [  486.334575] print_req_error: I/O error, dev nvme0n1, sector
> 1234608
> [  486.334582] print_req_error: I/O error, dev nvme0n1, sector
> 1755840
> [  486.334598] print_req_error: I/O error, dev nvme0n1, sector 569328
> [  486.334600] print_req_error: I/O error, dev nvme0n1, sector 183296
> [  486.334614] print_req_error: I/O error, dev nvme0n1, sector 174576
> [  486.334616] print_req_error: I/O error, dev nvme0n1, sector
> 1234360
> [  486.334621] print_req_error: I/O error, dev nvme0n1, sector 786336
> [  486.334622] print_req_error: I/O error, dev nvme0n1, sector 205776
> [  486.334624] print_req_error: I/O error, dev nvme0n1, sector 534320
> [  486.334628] print_req_error: I/O error, dev nvme0n1, sector 712432
> [  486.334856] Pid 7792(fio) over core_pipe_limit
> [  486.334857] Pid 7799(fio) over core_pipe_limit
> [  486.334857] Skipping core dump
> [  486.334857] Skipping core dump
> [  486.334918] Pid 7784(fio) over core_pipe_limit
> [  486.334919] Pid 7797(fio) over core_pipe_limit
> [  486.334920] Pid 7798(fio) over core_pipe_limit
> [  486.334921] Pid 7791(fio) over core_pipe_limit
> [  486.334922] Skipping core dump
> [  486.334922] Skipping core dump
> [  486.334922] Skipping core dump
> [  486.334923] Skipping core dump
> [  486.335060] Pid 7789(fio) over core_pipe_limit
> [  486.335061] Skipping core dump
> [  486.335290] Pid 7785(fio) over core_pipe_limit
> [  486.335291] Skipping core dump
> [  486.335292] Pid 7796(fio) over core_pipe_limit
> [  486.335293] Skipping core dump
> [  486.335316] Pid 7786(fio) over core_pipe_limit
> [  486.335317] Skipping core dump
> [  487.110906] nvme nvme0: failed to mark controller CONNECTING
> [  487.141743] nvme nvme0: Removing after probe failure status: -19
> [  487.176341] nvme nvme0: EH 1: after recovery
> [  487.232034] BUG: unable to handle kernel NULL pointer dereference
> at
> 0000000000000000
> [  487.276604] PGD 0 P4D 0 
> [  487.290548] Oops: 0000 [#1] SMP PTI
> [  487.310135] Modules linked in: macsec tcp_diag udp_diag inet_diag
> unix_diag af_packet_diag netlink_diag binfmt_misc ebtable_filter
> ebtables ip6table_filter ip6_tables devlink xt_physdev br_netfilter
> bridge stp llc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4
> nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
> crc32_pclmul ghash_clmulni_intel pcbc aesni_intel dm_round_robin
> crypto_simd iTCO_wdt gpio_ich iTCO_vendor_support cryptd ipmi_si
> glue_helper pcspkr joydev ipmi_devintf hpilo acpi_power_meter sg
> i7core_edac lpc_ich hpwdt dm_service_time ipmi_msghandler shpchp
> pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath
> ip_tables xfs libcrc32c radeon i2c_algo_bit drm_kms_helper
> syscopyarea
> sysfillrect sysimgblt
> [  487.719632]  fb_sys_fops ttm sd_mod qla2xxx drm nvme_fc
> nvme_fabrics
> i2c_core crc32c_intel nvme serio_raw hpsa bnx2 nvme_core
> scsi_transport_fc scsi_transport_sas dm_mirror dm_region_hash dm_log
> dm_mod
> [  487.817595] CPU: 4 PID: 763 Comm: kworker/u66:8 Kdump: loaded
> Tainted: G          I       4.17.0-rc3.ming.nvme+ #1
> [  487.876571] Hardware name: HP ProLiant DL380 G7, BIOS P67
> 08/16/2015
> [  487.913158] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> [  487.946586] RIP: 0010:sbitmap_any_bit_set+0xb/0x30
> [  487.973172] RSP: 0018:ffffb19e47fdfe00 EFLAGS: 00010202
> [  488.003255] RAX: ffff8f0457931408 RBX: ffff8f0457931400 RCX:
> 0000000000000004
> [  488.044199] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffff8f04579314d0
> [  488.085253] RBP: ffff8f04570b8000 R08: 00000000000271a0 R09:
> ffffffffacda1b44
> [  488.126295] R10: ffffd6ee3f4ed000 R11: 0000000000000000 R12:
> 0000000000000001
> [  488.166746] R13: 0000000000000001 R14: 0000000000000000 R15:
> ffff8f0457821138
> [  488.207076] FS:  0000000000000000(0000) GS:ffff8f145b280000(0000)
> knlGS:0000000000000000
> [  488.252727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  488.284395] CR2: 0000000000000000 CR3: 0000001e4aa0a006 CR4:
> 00000000000206e0
> [  488.324505] Call Trace:
> [  488.337945]  blk_mq_run_hw_queue+0xad/0xf0
> [  488.361057]  blk_mq_run_hw_queues+0x4b/0x60
> [  488.384507]  nvme_kill_queues+0x26/0x80 [nvme_core]
> [  488.411528]  nvme_remove_dead_ctrl_work+0x17/0x40 [nvme]
> [  488.441602]  process_one_work+0x158/0x360
> [  488.464568]  worker_thread+0x1fa/0x3e0
> [  488.486044]  kthread+0xf8/0x130
> [  488.504022]  ? max_active_store+0x80/0x80
> [  488.527034]  ? kthread_bind+0x10/0x10
> [  488.548026]  ret_from_fork+0x35/0x40
> [  488.569062] Code: c6 44 0f 46 ce 83 c2 01 45 89 ca 4c 89 54 01 08
> 48
> 8b 4f 10 2b 74 01 08 39 57 08 77 d8 f3 c3 90 8b 4f 08 85 c9 74 1f 48
> 8b
> 57 10 <48> 83 3a 00 75 18 31 c0 eb 0a 48 83 c2 40 48 83 3a 00 75 0a
> 83 
> [  488.676148] RIP: sbitmap_any_bit_set+0xb/0x30 RSP:
> ffffb19e47fdfe00
> [  488.711006] CR2: 0000000000000000

3rd and 4th attempts slightly better, but clearly not dependable

[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
    runtime    ...  81.188s
    --- tests/block/011.out	2018-05-05 18:01:14.268414752 -0400
    +++ results/nvme0n1/block/011.out.bad	2018-05-05
19:44:48.848568687 -0400
    @@ -1,2 +1,3 @@
     Running block/011
    +tests/block/011: line 47: echo: write error: Input/output error
     Test complete

This one passed 
[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
    runtime  81.188s  ...  43.400s

I will capture a vmcore next time it panics and give some information
after analyzing the core

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 6/7] nvme: pci: prepare for supporting error recovery from resetting context
  2018-05-05 13:59   ` Ming Lei
@ 2018-05-07 15:04     ` James Smart
  -1 siblings, 0 replies; 42+ messages in thread
From: James Smart @ 2018-05-07 15:04 UTC (permalink / raw)
  To: Ming Lei, Keith Busch
  Cc: Jens Axboe, Laurence Oberman, Sagi Grimberg, linux-nvme,
	linux-block, Jianchao Wang, Christoph Hellwig



On 5/5/2018 6:59 AM, Ming Lei wrote:
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2365,14 +2365,14 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
>   		nvme_put_ctrl(&dev->ctrl);
>   }
>   
> -static void nvme_reset_work(struct work_struct *work)
> +static void nvme_reset_dev(struct nvme_dev *dev)
>   {
> -	struct nvme_dev *dev =
> -		container_of(work, struct nvme_dev, ctrl.reset_work);
>   	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
>   	int result = -ENODEV;
>   	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
>   
> +	mutex_lock(&dev->ctrl.reset_lock);
> +
>   	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
>   		goto out;
>   

I believe the reset_lock (patch 5) is unnecessary, as this should already
be covered by the transition of the state to RESETTING, which is done
under a lock.

Thus the fix is:
instead of:
      if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
          goto out;

it should be:
      if (dev->ctrl.state != NVME_CTRL_RESETTING)
          return;
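
For context, a minimal sketch (my paraphrase, not the actual patch) of
how the entry of nvme_reset_dev() would then look, with the plain state
check and without taking reset_lock; nvme_reset_ctrl() has already moved
the state to NVME_CTRL_RESETTING under the ctrl state lock, so a mismatch
here simply means we lost the race to another reset/removal:

static void nvme_reset_dev(struct nvme_dev *dev)
{
	/* Bail out quietly if another context already owns the controller. */
	if (dev->ctrl.state != NVME_CTRL_RESETTING)
		return;

	/* ... continue with the shutdown and re-init steps that used to
	 * live in nvme_reset_work() ... */
}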


-- james

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
  2018-05-05 23:51       ` Laurence Oberman
@ 2018-05-08 15:09         ` Keith Busch
  -1 siblings, 0 replies; 42+ messages in thread
From: Keith Busch @ 2018-05-08 15:09 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Ming Lei, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme,
	linux-block, Jianchao Wang, Christoph Hellwig

On Sat, May 05, 2018 at 07:51:22PM -0400, Laurence Oberman wrote:
> 3rd and 4th attempts slightly better, but clearly not dependable
> 
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
>     runtime    ...  81.188s
>     --- tests/block/011.out	2018-05-05 18:01:14.268414752 -0400
>     +++ results/nvme0n1/block/011.out.bad	2018-05-05
> 19:44:48.848568687 -0400
>     @@ -1,2 +1,3 @@
>      Running block/011
>     +tests/block/011: line 47: echo: write error: Input/output error
>      Test complete
> 
> This one passed 
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
>     runtime  81.188s  ...  43.400s
> 
> I will capture a vmcore next time it panics and give some information
> after analyzing the core

We definitely should never panic, but I am not sure this blktest can be
reliable with respect to IO errors: the test clears the device's memory
space enable and bus master bits without the driver's knowledge, and it
does this repeatedly in a tight loop. If the test happens to disable the
device while the driver is still trying to recover from the previous
iteration, the recovery will surely fail, so I think IO errors are probably
to be expected.

As far as I can tell, the only way you'll actually get it to succeed is
if the test's subsequent "enable" happens to hit in conjunction with the
driver's reset pci_enable_device_mem(), such that the pci_dev's enable_cnt
is > 1, which prevents the disabling for the remainder of the test's
looping.

I still think this is a very good test, but we might be able to make it
more deterministic on what actually happens to the pci device.
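
(As a rough illustration of the enable_cnt point above -- a simplified sketch
of the drivers/pci/pci.c refcounting, so the exact shape is an assumption and
the helper names below are made up for illustration: enable/disable are
reference counted, and a disable only takes effect once the count drops back
to zero, which is why a nested enable from the driver's reset path keeps the
test's later "disable" from biting.)

	/* sketch: pci_enable_device()/pci_disable_device() refcounting */
	static int sketch_pci_enable(struct pci_dev *pdev)
	{
		if (atomic_inc_return(&pdev->enable_cnt) > 1)
			return 0;	/* already enabled, just bump the count */
		/* ... actually set memory space / bus master enables ... */
		return 0;
	}

	static void sketch_pci_disable(struct pci_dev *pdev)
	{
		if (atomic_dec_return(&pdev->enable_cnt) != 0)
			return;		/* still held by another enable */
		/* ... actually clear the command register bits ... */
	}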

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
  2018-05-05 13:58 ` Ming Lei
@ 2018-05-09  5:46   ` jianchao.wang
  -1 siblings, 0 replies; 42+ messages in thread
From: jianchao.wang @ 2018-05-09  5:46 UTC (permalink / raw)
  To: Ming Lei, Keith Busch
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Sagi Grimberg,
	linux-nvme, Laurence Oberman

Hi ming

I did some tests on my local.

[  598.828578] nvme nvme0: I/O 51 QID 4 timeout, disable controller

This should be a timeout on nvme_reset_dev->nvme_wait_freeze.
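
(To make the call chain concrete -- a minimal sketch assuming the 4.17-era
nvme core, so the locking detail is an assumption: nvme_wait_freeze() walks
every namespace queue and blocks in blk_mq_freeze_queue_wait() until all
in-flight requests on that queue have completed, which is presumably where
the reset path was stuck when this I/O timed out.)

	void nvme_wait_freeze(struct nvme_ctrl *ctrl)
	{
		struct nvme_ns *ns;

		down_read(&ctrl->namespaces_rwsem);
		list_for_each_entry(ns, &ctrl->namespaces, list)
			blk_mq_freeze_queue_wait(ns->queue);	/* wait for in-flight I/O */
		up_read(&ctrl->namespaces_rwsem);
	}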

[  598.828743] nvme nvme0: EH 1: before shutdown
[  599.013586] nvme nvme0: EH 1: after shutdown
[  599.137197] nvme nvme0: EH 1: after recovery

EH 1 has marked the state LIVE.

[  599.137241] nvme nvme0: failed to mark controller state 1

So EH 0 failed to mark the state LIVE.
The card was removed.
This should not be expected by nested EH.

[  599.137322] nvme nvme0: Removing after probe failure status: 0
[  599.326539] nvme nvme0: EH 0: after recovery
[  599.326760] nvme0n1: detected capacity change from 128035676160 to 0
[  599.457208] nvme nvme0: failed to set APST feature (-19)

nvme_reset_dev should identify whether it is nested.
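
(A rough sketch of the failure path described above -- this paraphrases the
tail of nvme_reset_dev()/nvme_reset_work() in this series, so the exact code
is an assumption: once EH 1 has already moved the state to LIVE, EH 0's own
transition fails and it falls through to the remove-dead-controller path.)

	/* sketch: end of nvme_reset_dev() as run by the outer context (EH 0) */
	if (!nvme_change_ctrl_state(&dev->ctrl, new_state)) {
		dev_warn(dev->ctrl.device,
			 "failed to mark controller state %d\n", new_state);
		goto out;	/* -> removal, matching the log above */
	}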

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
  2018-05-09  5:46   ` jianchao.wang
@ 2018-05-10  2:09     ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-10  2:09 UTC (permalink / raw)
  To: jianchao.wang
  Cc: Keith Busch, Jens Axboe, Laurence Oberman, Sagi Grimberg,
	linux-nvme, linux-block, Christoph Hellwig

On Wed, May 09, 2018 at 01:46:09PM +0800, jianchao.wang wrote:
> Hi ming
> 
> I did some tests on my local.
> 
> [  598.828578] nvme nvme0: I/O 51 QID 4 timeout, disable controller
> 
> This should be a timeout on nvme_reset_dev->nvme_wait_freeze.
> 
> [  598.828743] nvme nvme0: EH 1: before shutdown
> [  599.013586] nvme nvme0: EH 1: after shutdown
> [  599.137197] nvme nvme0: EH 1: after recovery
> 
> EH 1 has marked the state LIVE.
> 
> [  599.137241] nvme nvme0: failed to mark controller state 1
> 
> So EH 0 failed to mark the state LIVE.
> The card was removed.
> This should not be expected by nested EH.

Right.

> 
> [  599.137322] nvme nvme0: Removing after probe failure status: 0
> [  599.326539] nvme nvme0: EH 0: after recovery
> [  599.326760] nvme0n1: detected capacity change from 128035676160 to 0
> [  599.457208] nvme nvme0: failed to set APST feature (-19)
> 
> nvme_reset_dev should identify whether it is nested.

The above should be caused by a race between the two EH contexts updating
the controller state; I hope I can find some time this week to investigate
it further.

Also, maybe we can change things so the controller is only removed after
nested EH has been retried enough times.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
  2018-05-05 23:11   ` Laurence Oberman
@ 2018-05-10 10:28     ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-10 10:28 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Keith Busch, Jens Axboe, linux-block, Jianchao Wang,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On Sat, May 05, 2018 at 07:11:33PM -0400, Laurence Oberman wrote:
> On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > Hi,
> > 
> > The 1st patch introduces blk_quiesce_timeout() and
> > blk_unquiesce_timeout()
> > for NVMe, meantime fixes blk_sync_queue().
> > 
> > The 2nd patch covers timeout for admin commands for recovering
> > controller
> > for avoiding possible deadlock.
> > 
> > The 3rd and 4th patches avoid to wait_freeze on queues which aren't
> > frozen.
> > 
> > The last 4 patches fixes several races wrt. NVMe timeout handler, and
> > finally can make blktests block/011 passed. Meantime the NVMe PCI
> > timeout
> > mecanism become much more rebost than before.
> > 
> > gitweb:
> > 	https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> > 
> > V4:
> > 	- fixe nvme_init_set_host_mem_cmd()
> > 	- use nested EH model, and run both nvme_dev_disable() and
> > 	resetting in one same context
> > 
> > V3:
> > 	- fix one new race related freezing in patch 4,
> > nvme_reset_work()
> > 	may hang forever without this patch
> > 	- rewrite the last 3 patches, and avoid to break
> > nvme_reset_ctrl*()
> > 
> > V2:
> > 	- fix draining timeout work, so no need to change return value
> > from
> > 	.timeout()
> > 	- fix race between nvme_start_freeze() and nvme_unfreeze()
> > 	- cover timeout for admin commands running in EH
> > 
> > Ming Lei (7):
> >   block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
> >   nvme: pci: cover timeout for admin commands running in EH
> >   nvme: pci: only wait freezing if queue is frozen
> >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> >     recovery
> >   nvme: core: introduce 'reset_lock' for sync reset state and reset
> >     activities
> >   nvme: pci: prepare for supporting error recovery from resetting
> >     context
> >   nvme: pci: support nested EH
> > 
> >  block/blk-core.c         |  21 +++-
> >  block/blk-mq.c           |   9 ++
> >  block/blk-timeout.c      |   5 +-
> >  drivers/nvme/host/core.c |  46 ++++++-
> >  drivers/nvme/host/nvme.h |   5 +
> >  drivers/nvme/host/pci.c  | 304
> > ++++++++++++++++++++++++++++++++++++++++-------
> >  include/linux/blkdev.h   |  13 ++
> >  7 files changed, 356 insertions(+), 47 deletions(-)
> > 
> > Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: linux-nvme@lists.infradead.org
> > Cc: Laurence Oberman <loberman@redhat.com>
> 
> Hello Ming
> 
> I have a two node NUMA system here running your kernel tree
> 4.17.0-rc3.ming.nvme+
> 
> [root@segstorage1 ~]# numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 3 5 6 8 11 13 14
> node 0 size: 63922 MB
> node 0 free: 61310 MB
> node 1 cpus: 1 2 4 7 9 10 12 15
> node 1 size: 64422 MB
> node 1 free: 62372 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> 
> I ran block/011
> 
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
>     runtime    ...  106.936s
>     --- tests/block/011.out	2018-05-05 18:01:14.268414752 -0400
>     +++ results/nvme0n1/block/011.out.bad	2018-05-05
> 19:07:21.028634858 -0400
>     @@ -1,2 +1,36 @@
>      Running block/011
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> IO_U_F_FLIGHT) == 0' failed.
>     ...
>     (Run 'diff -u tests/block/011.out
> results/nvme0n1/block/011.out.bad' to see the entire diff)
> 
> [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
> [ 1452.676351] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.718221] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.718239] nvme nvme0: EH 0: before shutdown
> [ 1452.760890] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760894] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760897] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760900] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760903] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760906] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760909] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760912] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760915] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760918] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760921] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760923] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1452.760926] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1453.330251] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1453.391713] nvme nvme0: EH 0: after shutdown
> [ 1456.804695] device-mapper: multipath: Failing path 259:0.
> [ 1526.721196] nvme nvme0: I/O 15 QID 0 timeout, disable controller
> [ 1526.754335] nvme nvme0: EH 1: before shutdown
> [ 1526.793257] nvme nvme0: EH 1: after shutdown
> [ 1526.793327] nvme nvme0: Identify Controller failed (-4)
> [ 1526.847869] nvme nvme0: Removing after probe failure status: -5
> [ 1526.888206] nvme nvme0: EH 0: after recovery
> [ 1526.888212] nvme0n1: detected capacity change from 400088457216 to 0
> [ 1526.947520] print_req_error: 1 callbacks suppressed
> [ 1526.947522] print_req_error: I/O error, dev nvme0n1, sector 794920
> [ 1526.947534] print_req_error: I/O error, dev nvme0n1, sector 569328
> [ 1526.947540] print_req_error: I/O error, dev nvme0n1, sector 1234608
> [ 1526.947556] print_req_error: I/O error, dev nvme0n1, sector 389296
> [ 1526.947564] print_req_error: I/O error, dev nvme0n1, sector 712432
> [ 1526.947566] print_req_error: I/O error, dev nvme0n1, sector 889304
> [ 1526.947572] print_req_error: I/O error, dev nvme0n1, sector 205776
> [ 1526.947574] print_req_error: I/O error, dev nvme0n1, sector 126480
> [ 1526.947575] print_req_error: I/O error, dev nvme0n1, sector 1601232
> [ 1526.947580] print_req_error: I/O error, dev nvme0n1, sector 1234360
> [ 1526.947745] Pid 683(fio) over core_pipe_limit
> [ 1526.947746] Skipping core dump
> [ 1526.947747] Pid 675(fio) over core_pipe_limit
> [ 1526.947748] Skipping core dump
> [ 1526.947863] Pid 672(fio) over core_pipe_limit
> [ 1526.947863] Skipping core dump
> [ 1526.947865] Pid 674(fio) over core_pipe_limit
> [ 1526.947866] Skipping core dump
> [ 1526.947870] Pid 676(fio) over core_pipe_limit
> [ 1526.947871] Pid 679(fio) over core_pipe_limit
> [ 1526.947872] Skipping core dump
> [ 1526.947872] Skipping core dump
> [ 1526.948197] Pid 677(fio) over core_pipe_limit
> [ 1526.948197] Skipping core dump
> [ 1526.948245] Pid 686(fio) over core_pipe_limit
> [ 1526.948245] Skipping core dump
> [ 1526.974610] Pid 680(fio) over core_pipe_limit
> [ 1526.974611] Pid 684(fio) over core_pipe_limit
> [ 1526.974611] Skipping core dump
> [ 1526.980370] nvme nvme0: failed to mark controller CONNECTING
> [ 1526.980373] nvme nvme0: Removing after probe failure status: -19
> [ 1526.980385] nvme nvme0: EH 1: after recovery
> [ 1526.980477] Pid 687(fio) over core_pipe_limit
> [ 1526.980478] Skipping core dump
> [ 1527.858207] Skipping core dump
> 
> And leaves me looping here
> 
> [ 1721.272276] INFO: task kworker/u66:0:24214 blocked for more than 120
> seconds.
> [ 1721.311263]       Tainted: G          I       4.17.0-rc3.ming.nvme+
> #1
> [ 1721.348027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1721.392957] kworker/u66:0   D    0 24214      2 0x80000080
> [ 1721.424425] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> [ 1721.458568] Call Trace:
> [ 1721.472499]  ? __schedule+0x290/0x870
> [ 1721.493515]  schedule+0x32/0x80
> [ 1721.511656]  blk_mq_freeze_queue_wait+0x46/0xb0
> [ 1721.537609]  ? remove_wait_queue+0x60/0x60
> [ 1721.561081]  blk_cleanup_queue+0x7e/0x180
> [ 1721.584637]  nvme_ns_remove+0x106/0x140 [nvme_core]
> [ 1721.612589]  nvme_remove_namespaces+0x8e/0xd0 [nvme_core]
> [ 1721.643163]  nvme_remove+0x80/0x120 [nvme]
> [ 1721.666188]  pci_device_remove+0x3b/0xc0
> [ 1721.688553]  device_release_driver_internal+0x148/0x220
> [ 1721.719332]  nvme_remove_dead_ctrl_work+0x29/0x40 [nvme]
> [ 1721.750474]  process_one_work+0x158/0x360
> [ 1721.772632]  worker_thread+0x47/0x3e0
> [ 1721.792471]  kthread+0xf8/0x130
> [ 1721.810354]  ? max_active_store+0x80/0x80
> [ 1721.832459]  ? kthread_bind+0x10/0x10
> [ 1721.852845]  ret_from_fork+0x35/0x40
> 
> Did I di something wrong
> 
> I never set anything else, the nvme0n1 was not mounted etc.

Hi Laurence,

Thanks for your test!

Could you run the following V5 (not posted yet) and see if
the issues you triggered are fixed? If not, please provide
me with the dmesg log.

https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V5

BTW, the main change is in handling reset failure: in V5, only
the failure from the top EH is handled.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 1/7] block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
  2018-05-05 13:58   ` Ming Lei
@ 2018-05-10 15:01     ` Bart Van Assche
  -1 siblings, 0 replies; 42+ messages in thread
From: Bart Van Assche @ 2018-05-10 15:01 UTC (permalink / raw)
  To: keith.busch, ming.lei
  Cc: hch, sagi, linux-block, linux-nvme, loberman, axboe, jianchao.w.wang

On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> Turns out the current way can't drain timout completely because mod_timer()
> can be triggered in the work func, which can be just run inside the synced
> timeout work:
> 
>         del_timer_sync(&q->timeout);
>         cancel_work_sync(&q->timeout_work);
> 
> This patch introduces one flag of 'timeout_off' for fixing this issue, turns
> out this simple way does work.
> 
> Also blk_quiesce_timeout() and blk_unquiesce_timeout() are introduced for
> draining timeout, which is needed by NVMe.

Hello Ming,

The description of the above patch does not motivate sufficiently why you think
that this change is necessary. As you know it is already possible to wait until
timeout handling has finished by calling blk_mq_freeze_queue() +
blk_mq_unfreeze_queue(). An explanation is needed of why you think that calling
blk_mq_freeze_queue() + blk_mq_unfreeze_queue() is not sufficient and why you
chose the solution implemented in this patch.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 6/7] nvme: pci: prepare for supporting error recovery from resetting context
  2018-05-07 15:04     ` James Smart
@ 2018-05-10 20:53       ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-10 20:53 UTC (permalink / raw)
  To: James Smart
  Cc: Keith Busch, Jens Axboe, Laurence Oberman, Sagi Grimberg,
	linux-nvme, linux-block, Jianchao Wang, Christoph Hellwig

On Mon, May 07, 2018 at 08:04:18AM -0700, James Smart wrote:
> 
> 
> On 5/5/2018 6:59 AM, Ming Lei wrote:
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2365,14 +2365,14 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
> >   		nvme_put_ctrl(&dev->ctrl);
> >   }
> > -static void nvme_reset_work(struct work_struct *work)
> > +static void nvme_reset_dev(struct nvme_dev *dev)
> >   {
> > -	struct nvme_dev *dev =
> > -		container_of(work, struct nvme_dev, ctrl.reset_work);
> >   	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
> >   	int result = -ENODEV;
> >   	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
> > +	mutex_lock(&dev->ctrl.reset_lock);
> > +
> >   	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
> >   		goto out;
> 
> I believe the reset_lock is unnecessary (patch 5) as it should be covered by
> the transition of the state to RESETTING which is done under lock.
> 
> Thus the error is:
> instead of:
>      if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
>          goto out;
> 
> it should be:
>      if (dev->ctrl.state != NVME_CTRL_RESETTING)
>          return;
> 

Right, I have dropped this patch in V5(not posted yet):

https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V5

Thanks,
Ming

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 1/7] block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
  2018-05-10 15:01     ` Bart Van Assche
@ 2018-05-10 21:00       ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-10 21:00 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: keith.busch, axboe, loberman, sagi, linux-nvme, linux-block,
	jianchao.w.wang, hch

On Thu, May 10, 2018 at 03:01:04PM +0000, Bart Van Assche wrote:
> On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > Turns out the current way can't drain timout completely because mod_timer()
> > can be triggered in the work func, which can be just run inside the synced
> > timeout work:
> > 
> >         del_timer_sync(&q->timeout);
> >         cancel_work_sync(&q->timeout_work);
> > 
> > This patch introduces one flag of 'timeout_off' for fixing this issue, turns
> > out this simple way does work.
> > 
> > Also blk_quiesce_timeout() and blk_unquiesce_timeout() are introduced for
> > draining timeout, which is needed by NVMe.
> 
> Hello Ming,
> 
> The description of the above patch does not motivate sufficiently why you think
> that this change is necessary. As you know it is already possible to wait until
> timeout handling has finished by calling blk_mq_freeze_queue() +
> blk_mq_unfreeze_queue(). An explanation is needed of why you think that calling

blk_mq_freeze_queue() + blk_mq_unfreeze_queue() alone can't work: you have to
call blk_mq_freeze_queue_wait() between the two, but blk_mq_freeze_queue_wait()
is big trouble for NVMe and can't be used inside nvme_dev_disable().

You can find the usage in the last patch of this series.
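
To make that concrete, here is a rough sketch of the blk-mq freeze API as of
4.17 (a paraphrase, so treat the exact shape as an assumption): the wait step
is what blocks until every in-flight request has completed, and that is the
part which can't run inside nvme_dev_disable(), because the EH path itself
still has to cancel/complete those very requests first.

	void blk_mq_freeze_queue(struct request_queue *q)
	{
		blk_freeze_queue_start(q);	/* stop new requests entering the queue */
		blk_mq_freeze_queue_wait(q);	/* block until in-flight requests drain */
	}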

Thanks,
Ming

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
  2018-05-10 10:28     ` Ming Lei
@ 2018-05-10 21:59       ` Laurence Oberman
  -1 siblings, 0 replies; 42+ messages in thread
From: Laurence Oberman @ 2018-05-10 21:59 UTC (permalink / raw)
  To: Ming Lei
  Cc: Keith Busch, Jens Axboe, linux-block, Jianchao Wang,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On Thu, 2018-05-10 at 18:28 +0800, Ming Lei wrote:
> On Sat, May 05, 2018 at 07:11:33PM -0400, Laurence Oberman wrote:
> > On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > > Hi,
> > > 
> > > The 1st patch introduces blk_quiesce_timeout() and
> > > blk_unquiesce_timeout()
> > > for NVMe, meantime fixes blk_sync_queue().
> > > 
> > > The 2nd patch covers timeout for admin commands for recovering
> > > controller
> > > for avoiding possible deadlock.
> > > 
> > > The 3rd and 4th patches avoid to wait_freeze on queues which
> > > aren't
> > > frozen.
> > > 
> > > The last 4 patches fixes several races wrt. NVMe timeout handler,
> > > and
> > > finally can make blktests block/011 passed. Meantime the NVMe PCI
> > > timeout
> > > mecanism become much more rebost than before.
> > > 
> > > gitweb:
> > > 	https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> > > 
> > > V4:
> > > 	- fixe nvme_init_set_host_mem_cmd()
> > > 	- use nested EH model, and run both nvme_dev_disable() and
> > > 	resetting in one same context
> > > 
> > > V3:
> > > 	- fix one new race related freezing in patch 4,
> > > nvme_reset_work()
> > > 	may hang forever without this patch
> > > 	- rewrite the last 3 patches, and avoid to break
> > > nvme_reset_ctrl*()
> > > 
> > > V2:
> > > 	- fix draining timeout work, so no need to change return value
> > > from
> > > 	.timeout()
> > > 	- fix race between nvme_start_freeze() and nvme_unfreeze()
> > > 	- cover timeout for admin commands running in EH
> > > 
> > > Ming Lei (7):
> > >   block: introduce blk_quiesce_timeout() and
> > > blk_unquiesce_timeout()
> > >   nvme: pci: cover timeout for admin commands running in EH
> > >   nvme: pci: only wait freezing if queue is frozen
> > >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> > >     recovery
> > >   nvme: core: introduce 'reset_lock' for sync reset state and
> > > reset
> > >     activities
> > >   nvme: pci: prepare for supporting error recovery from resetting
> > >     context
> > >   nvme: pci: support nested EH
> > > 
> > >  block/blk-core.c         |  21 +++-
> > >  block/blk-mq.c           |   9 ++
> > >  block/blk-timeout.c      |   5 +-
> > >  drivers/nvme/host/core.c |  46 ++++++-
> > >  drivers/nvme/host/nvme.h |   5 +
> > >  drivers/nvme/host/pci.c  | 304
> > > ++++++++++++++++++++++++++++++++++++++++-------
> > >  include/linux/blkdev.h   |  13 ++
> > >  7 files changed, 356 insertions(+), 47 deletions(-)
> > > 
> > > Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Sagi Grimberg <sagi@grimberg.me>
> > > Cc: linux-nvme@lists.infradead.org
> > > Cc: Laurence Oberman <loberman@redhat.com>
> > 
> > Hello Ming
> > 
> > I have a two node NUMA system here running your kernel tree
> > 4.17.0-rc3.ming.nvme+
> > 
> > [root@segstorage1 ~]# numactl --hardware
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 3 5 6 8 11 13 14
> > node 0 size: 63922 MB
> > node 0 free: 61310 MB
> > node 1 cpus: 1 2 4 7 9 10 12 15
> > node 1 size: 64422 MB
> > node 1 free: 62372 MB
> > node distances:
> > node   0   1 
> >   0:  10  20 
> >   1:  20  10 
> > 
> > I ran block/011
> > 
> > [root@segstorage1 blktests]# ./check block/011
> > block/011 => nvme0n1 (disable PCI device while doing
> > I/O)    [failed]
> >     runtime    ...  106.936s
> >     --- tests/block/011.out	2018-05-05 18:01:14.268414752
> > -0400
> >     +++ results/nvme0n1/block/011.out.bad	2018-05-05
> > 19:07:21.028634858 -0400
> >     @@ -1,2 +1,36 @@
> >      Running block/011
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     ...
> >     (Run 'diff -u tests/block/011.out
> > results/nvme0n1/block/011.out.bad' to see the entire diff)
> > 
> > [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
> > [ 1452.676351] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.718221] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.718239] nvme nvme0: EH 0: before shutdown
> > [ 1452.760890] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760894] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760897] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760900] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760903] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760906] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760909] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760912] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760915] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760918] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760921] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760923] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760926] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1453.330251] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1453.391713] nvme nvme0: EH 0: after shutdown
> > [ 1456.804695] device-mapper: multipath: Failing path 259:0.
> > [ 1526.721196] nvme nvme0: I/O 15 QID 0 timeout, disable controller
> > [ 1526.754335] nvme nvme0: EH 1: before shutdown
> > [ 1526.793257] nvme nvme0: EH 1: after shutdown
> > [ 1526.793327] nvme nvme0: Identify Controller failed (-4)
> > [ 1526.847869] nvme nvme0: Removing after probe failure status: -5
> > [ 1526.888206] nvme nvme0: EH 0: after recovery
> > [ 1526.888212] nvme0n1: detected capacity change from 400088457216
> > to 0
> > [ 1526.947520] print_req_error: 1 callbacks suppressed
> > [ 1526.947522] print_req_error: I/O error, dev nvme0n1, sector
> > 794920
> > [ 1526.947534] print_req_error: I/O error, dev nvme0n1, sector
> > 569328
> > [ 1526.947540] print_req_error: I/O error, dev nvme0n1, sector
> > 1234608
> > [ 1526.947556] print_req_error: I/O error, dev nvme0n1, sector
> > 389296
> > [ 1526.947564] print_req_error: I/O error, dev nvme0n1, sector
> > 712432
> > [ 1526.947566] print_req_error: I/O error, dev nvme0n1, sector
> > 889304
> > [ 1526.947572] print_req_error: I/O error, dev nvme0n1, sector
> > 205776
> > [ 1526.947574] print_req_error: I/O error, dev nvme0n1, sector
> > 126480
> > [ 1526.947575] print_req_error: I/O error, dev nvme0n1, sector
> > 1601232
> > [ 1526.947580] print_req_error: I/O error, dev nvme0n1, sector
> > 1234360
> > [ 1526.947745] Pid 683(fio) over core_pipe_limit
> > [ 1526.947746] Skipping core dump
> > [ 1526.947747] Pid 675(fio) over core_pipe_limit
> > [ 1526.947748] Skipping core dump
> > [ 1526.947863] Pid 672(fio) over core_pipe_limit
> > [ 1526.947863] Skipping core dump
> > [ 1526.947865] Pid 674(fio) over core_pipe_limit
> > [ 1526.947866] Skipping core dump
> > [ 1526.947870] Pid 676(fio) over core_pipe_limit
> > [ 1526.947871] Pid 679(fio) over core_pipe_limit
> > [ 1526.947872] Skipping core dump
> > [ 1526.947872] Skipping core dump
> > [ 1526.948197] Pid 677(fio) over core_pipe_limit
> > [ 1526.948197] Skipping core dump
> > [ 1526.948245] Pid 686(fio) over core_pipe_limit
> > [ 1526.948245] Skipping core dump
> > [ 1526.974610] Pid 680(fio) over core_pipe_limit
> > [ 1526.974611] Pid 684(fio) over core_pipe_limit
> > [ 1526.974611] Skipping core dump
> > [ 1526.980370] nvme nvme0: failed to mark controller CONNECTING
> > [ 1526.980373] nvme nvme0: Removing after probe failure status: -19
> > [ 1526.980385] nvme nvme0: EH 1: after recovery
> > [ 1526.980477] Pid 687(fio) over core_pipe_limit
> > [ 1526.980478] Skipping core dump
> > [ 1527.858207] Skipping core dump
> > 
> > And leaves me looping here
> > 
> > [ 1721.272276] INFO: task kworker/u66:0:24214 blocked for more than
> > 120
> > seconds.
> > [ 1721.311263]       Tainted: G          I       4.17.0-
> > rc3.ming.nvme+
> > #1
> > [ 1721.348027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [ 1721.392957] kworker/u66:0   D    0 24214      2 0x80000080
> > [ 1721.424425] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> > [ 1721.458568] Call Trace:
> > [ 1721.472499]  ? __schedule+0x290/0x870
> > [ 1721.493515]  schedule+0x32/0x80
> > [ 1721.511656]  blk_mq_freeze_queue_wait+0x46/0xb0
> > [ 1721.537609]  ? remove_wait_queue+0x60/0x60
> > [ 1721.561081]  blk_cleanup_queue+0x7e/0x180
> > [ 1721.584637]  nvme_ns_remove+0x106/0x140 [nvme_core]
> > [ 1721.612589]  nvme_remove_namespaces+0x8e/0xd0 [nvme_core]
> > [ 1721.643163]  nvme_remove+0x80/0x120 [nvme]
> > [ 1721.666188]  pci_device_remove+0x3b/0xc0
> > [ 1721.688553]  device_release_driver_internal+0x148/0x220
> > [ 1721.719332]  nvme_remove_dead_ctrl_work+0x29/0x40 [nvme]
> > [ 1721.750474]  process_one_work+0x158/0x360
> > [ 1721.772632]  worker_thread+0x47/0x3e0
> > [ 1721.792471]  kthread+0xf8/0x130
> > [ 1721.810354]  ? max_active_store+0x80/0x80
> > [ 1721.832459]  ? kthread_bind+0x10/0x10
> > [ 1721.852845]  ret_from_fork+0x35/0x40
> > 
> > Did I di something wrong
> > 
> > I never set anything else, the nvme0n1 was not mounted etc.
> 
> Hi Laurence,
> 
> Thanks for your test!
> 
> Could you run the following V5 (not posted yet) and see if
> the issues you triggered are fixed? If not, please provide
> me with the dmesg log.
> 
> https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V5
> 
> BTW, the main change is in handling reset failure: in V5, only
> the failure from the top EH is handled.
> 
> Thanks,
> Ming

Hello Ming

Seems better, I had a failure on the first test but no panics.
The following tests have all passed.

[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
    runtime  41.790s  ...  79.184s
    --- tests/block/011.out	2018-05-05 18:01:14.268414752 -0400
    +++ results/nvme0n1/block/011.out.bad	2018-05-10
17:48:34.792080746 -0400
    @@ -1,2 +1,3 @@
     Running block/011
    +tests/block/011: line 47: echo: write error: Input/output error
     Test complete
[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
    runtime  79.184s  ...  42.196s
[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
    runtime  42.196s  ...  41.390s
[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
    runtime  41.390s  ...  42.193s

Kernel 4.17.0-rc3.ming.v5+ on an x86_64

segstorage1 login: [  631.297687] run blktests block/011 at 2018-05-10
17:47:15
[  661.951541] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  661.990218] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  661.990257] nvme nvme0: EH 0: before shutdown
[  662.031388] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031395] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031398] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031402] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031405] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031409] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031412] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031416] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031420] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.436080] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.477826] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.519368] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.560755] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.602456] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.657904] nvme nvme0: EH 0: after shutdown
[  668.730405] nvme nvme0: EH 0: after recovery 0
[  738.859987] run blktests block/011 at 2018-05-10 17:49:03
[  810.586431] run blktests block/011 at 2018-05-10 17:50:14
[ 1065.694108] run blktests block/011 at 2018-05-10 17:54:29

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
@ 2018-05-10 21:59       ` Laurence Oberman
  0 siblings, 0 replies; 42+ messages in thread
From: Laurence Oberman @ 2018-05-10 21:59 UTC (permalink / raw)


On Thu, 2018-05-10@18:28 +0800, Ming Lei wrote:
> On Sat, May 05, 2018@07:11:33PM -0400, Laurence Oberman wrote:
> > On Sat, 2018-05-05@21:58 +0800, Ming Lei wrote:
> > > Hi,
> > > 
> > > The 1st patch introduces blk_quiesce_timeout() and
> > > blk_unquiesce_timeout()
> > > for NVMe, meantime fixes blk_sync_queue().
> > > 
> > > The 2nd patch covers timeout for admin commands for recovering
> > > controller
> > > for avoiding possible deadlock.
> > > 
> > > The 3rd and 4th patches avoid to wait_freeze on queues which
> > > aren't
> > > frozen.
> > > 
> > > The last 4 patches fixes several races wrt. NVMe timeout handler,
> > > and
> > > finally can make blktests block/011 passed. Meantime the NVMe PCI
> > > timeout
> > > mecanism become much more rebost than before.
> > > 
> > > gitweb:
> > > 	https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
> > > 
> > > V4:
> > > 	- fixe nvme_init_set_host_mem_cmd()
> > > 	- use nested EH model, and run both nvme_dev_disable() and
> > > 	resetting in one same context
> > > 
> > > V3:
> > > 	- fix one new race related freezing in patch 4,
> > > nvme_reset_work()
> > > 	may hang forever without this patch
> > > 	- rewrite the last 3 patches, and avoid to break
> > > nvme_reset_ctrl*()
> > > 
> > > V2:
> > > 	- fix draining timeout work, so no need to change return value
> > > from
> > > 	.timeout()
> > > 	- fix race between nvme_start_freeze() and nvme_unfreeze()
> > > 	- cover timeout for admin commands running in EH
> > > 
> > > Ming Lei (7):
> > >   block: introduce blk_quiesce_timeout() and
> > > blk_unquiesce_timeout()
> > >   nvme: pci: cover timeout for admin commands running in EH
> > >   nvme: pci: only wait freezing if queue is frozen
> > >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
> > >     recovery
> > >   nvme: core: introduce 'reset_lock' for sync reset state and
> > > reset
> > >     activities
> > >   nvme: pci: prepare for supporting error recovery from resetting
> > >     context
> > >   nvme: pci: support nested EH
> > > 
> > >  block/blk-core.c         |  21 +++-
> > >  block/blk-mq.c           |   9 ++
> > >  block/blk-timeout.c      |   5 +-
> > >  drivers/nvme/host/core.c |  46 ++++++-
> > >  drivers/nvme/host/nvme.h |   5 +
> > >  drivers/nvme/host/pci.c  | 304
> > > ++++++++++++++++++++++++++++++++++++++++-------
> > >  include/linux/blkdev.h   |  13 ++
> > >  7 files changed, 356 insertions(+), 47 deletions(-)
> > > 
> > > Cc: Jianchao Wang <jianchao.w.wang at oracle.com>
> > > Cc: Christoph Hellwig <hch at lst.de>
> > > Cc: Sagi Grimberg <sagi at grimberg.me>
> > > Cc: linux-nvme at lists.infradead.org
> > > Cc: Laurence Oberman <loberman at redhat.com>
> > 
> > Hello Ming
> > 
> > I have a two node NUMA system here running your kernel tree
> > 4.17.0-rc3.ming.nvme+
> > 
> > [root at segstorage1 ~]# numactl --hardware
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 3 5 6 8 11 13 14
> > node 0 size: 63922 MB
> > node 0 free: 61310 MB
> > node 1 cpus: 1 2 4 7 9 10 12 15
> > node 1 size: 64422 MB
> > node 1 free: 62372 MB
> > node distances:
> > node   0   1 
> >   0:  10  20 
> >   1:  20  10 
> > 
> > I ran block/011
> > 
> > [root at segstorage1 blktests]# ./check block/011
> > block/011 => nvme0n1 (disable PCI device while doing
> > I/O)    [failed]
> >     runtime    ...  106.936s
> >     --- tests/block/011.out	2018-05-05 18:01:14.268414752
> > -0400
> >     +++ results/nvme0n1/block/011.out.bad	2018-05-05
> > 19:07:21.028634858 -0400
> >     @@ -1,2 +1,36 @@
> >      Running block/011
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
> > IO_U_F_FLIGHT) == 0' failed.
> >     ...
> >     (Run 'diff -u tests/block/011.out
> > results/nvme0n1/block/011.out.bad' to see the entire diff)
> > 
> > [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
> > [ 1452.676351] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.718221] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.718239] nvme nvme0: EH 0: before shutdown
> > [ 1452.760890] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760894] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760897] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760900] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760903] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760906] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760909] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760912] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760915] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760918] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760921] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760923] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1452.760926] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1453.330251] nvme nvme0: controller is down; will reset:
> > CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1453.391713] nvme nvme0: EH 0: after shutdown
> > [ 1456.804695] device-mapper: multipath: Failing path 259:0.
> > [ 1526.721196] nvme nvme0: I/O 15 QID 0 timeout, disable controller
> > [ 1526.754335] nvme nvme0: EH 1: before shutdown
> > [ 1526.793257] nvme nvme0: EH 1: after shutdown
> > [ 1526.793327] nvme nvme0: Identify Controller failed (-4)
> > [ 1526.847869] nvme nvme0: Removing after probe failure status: -5
> > [ 1526.888206] nvme nvme0: EH 0: after recovery
> > [ 1526.888212] nvme0n1: detected capacity change from 400088457216
> > to 0
> > [ 1526.947520] print_req_error: 1 callbacks suppressed
> > [ 1526.947522] print_req_error: I/O error, dev nvme0n1, sector
> > 794920
> > [ 1526.947534] print_req_error: I/O error, dev nvme0n1, sector
> > 569328
> > [ 1526.947540] print_req_error: I/O error, dev nvme0n1, sector
> > 1234608
> > [ 1526.947556] print_req_error: I/O error, dev nvme0n1, sector
> > 389296
> > [ 1526.947564] print_req_error: I/O error, dev nvme0n1, sector
> > 712432
> > [ 1526.947566] print_req_error: I/O error, dev nvme0n1, sector
> > 889304
> > [ 1526.947572] print_req_error: I/O error, dev nvme0n1, sector
> > 205776
> > [ 1526.947574] print_req_error: I/O error, dev nvme0n1, sector
> > 126480
> > [ 1526.947575] print_req_error: I/O error, dev nvme0n1, sector
> > 1601232
> > [ 1526.947580] print_req_error: I/O error, dev nvme0n1, sector
> > 1234360
> > [ 1526.947745] Pid 683(fio) over core_pipe_limit
> > [ 1526.947746] Skipping core dump
> > [ 1526.947747] Pid 675(fio) over core_pipe_limit
> > [ 1526.947748] Skipping core dump
> > [ 1526.947863] Pid 672(fio) over core_pipe_limit
> > [ 1526.947863] Skipping core dump
> > [ 1526.947865] Pid 674(fio) over core_pipe_limit
> > [ 1526.947866] Skipping core dump
> > [ 1526.947870] Pid 676(fio) over core_pipe_limit
> > [ 1526.947871] Pid 679(fio) over core_pipe_limit
> > [ 1526.947872] Skipping core dump
> > [ 1526.947872] Skipping core dump
> > [ 1526.948197] Pid 677(fio) over core_pipe_limit
> > [ 1526.948197] Skipping core dump
> > [ 1526.948245] Pid 686(fio) over core_pipe_limit
> > [ 1526.948245] Skipping core dump
> > [ 1526.974610] Pid 680(fio) over core_pipe_limit
> > [ 1526.974611] Pid 684(fio) over core_pipe_limit
> > [ 1526.974611] Skipping core dump
> > [ 1526.980370] nvme nvme0: failed to mark controller CONNECTING
> > [ 1526.980373] nvme nvme0: Removing after probe failure status: -19
> > [ 1526.980385] nvme nvme0: EH 1: after recovery
> > [ 1526.980477] Pid 687(fio) over core_pipe_limit
> > [ 1526.980478] Skipping core dump
> > [ 1527.858207] Skipping core dump
> > 
> > And leaves me looping here
> > 
> > [ 1721.272276] INFO: task kworker/u66:0:24214 blocked for more than
> > 120
> > seconds.
> > [ 1721.311263]       Tainted: G          I       4.17.0-
> > rc3.ming.nvme+
> > #1
> > [ 1721.348027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [ 1721.392957] kworker/u66:0   D    0 24214      2 0x80000080
> > [ 1721.424425] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> > [ 1721.458568] Call Trace:
> > [ 1721.472499]  ? __schedule+0x290/0x870
> > [ 1721.493515]  schedule+0x32/0x80
> > [ 1721.511656]  blk_mq_freeze_queue_wait+0x46/0xb0
> > [ 1721.537609]  ? remove_wait_queue+0x60/0x60
> > [ 1721.561081]  blk_cleanup_queue+0x7e/0x180
> > [ 1721.584637]  nvme_ns_remove+0x106/0x140 [nvme_core]
> > [ 1721.612589]  nvme_remove_namespaces+0x8e/0xd0 [nvme_core]
> > [ 1721.643163]  nvme_remove+0x80/0x120 [nvme]
> > [ 1721.666188]  pci_device_remove+0x3b/0xc0
> > [ 1721.688553]  device_release_driver_internal+0x148/0x220
> > [ 1721.719332]  nvme_remove_dead_ctrl_work+0x29/0x40 [nvme]
> > [ 1721.750474]  process_one_work+0x158/0x360
> > [ 1721.772632]  worker_thread+0x47/0x3e0
> > [ 1721.792471]  kthread+0xf8/0x130
> > [ 1721.810354]  ? max_active_store+0x80/0x80
> > [ 1721.832459]  ? kthread_bind+0x10/0x10
> > [ 1721.852845]  ret_from_fork+0x35/0x40
> > 
> > Did I do something wrong?
> > 
> > I never set anything else, the nvme0n1 was not mounted etc.
> 
> Hi Laurence,
> 
> Thanks for your test!
> 
> Could you run the following V5 (not posted yet) and see if
> the issues you triggered can be fixed? If not, please provide
> me the dmesg log.
> 
> https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V5
> 
> BTW, the main change is on handling reset failure, in V5, only
> the failure from top EH is handled.
> 
> Thanks,
> Ming

Hello Ming

Seems better, had a failure on the first test but no panics.
The following tests all passed.

[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
    runtime  41.790s  ...  79.184s
    --- tests/block/011.out	2018-05-05 18:01:14.268414752 -0400
    +++ results/nvme0n1/block/011.out.bad	2018-05-10
17:48:34.792080746 -0400
    @@ -1,2 +1,3 @@
     Running block/011
    +tests/block/011: line 47: echo: write error: Input/output error
     Test complete
[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
    runtime  79.184s  ...  42.196s
[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
    runtime  42.196s  ...  41.390s
[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
    runtime  41.390s  ...  42.193s

Kernel 4.17.0-rc3.ming.v5+ on an x86_64

segstorage1 login: [  631.297687] run blktests block/011 at 2018-05-10
17:47:15
[  661.951541] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  661.990218] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  661.990257] nvme nvme0: EH 0: before shutdown
[  662.031388] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031395] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031398] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031402] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031405] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031409] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031412] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031416] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.031420] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.436080] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.477826] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.519368] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.560755] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.602456] nvme nvme0: controller is down; will reset: CSTS=0x3,
PCI_STATUS=0x10
[  662.657904] nvme nvme0: EH 0: after shutdown
[  668.730405] nvme nvme0: EH 0: after recovery 0
[  738.859987] run blktests block/011 at 2018-05-10 17:49:03
[  810.586431] run blktests block/011 at 2018-05-10 17:50:14
[ 1065.694108] run blktests block/011 at 2018-05-10 17:54:29

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
  2018-05-10 21:59       ` Laurence Oberman
@ 2018-05-10 22:10         ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2018-05-10 22:10 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Ming Lei, Keith Busch, Jens Axboe, linux-block, Jianchao Wang,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

Hi Laurence,

Many thanks for your quick test!

On Fri, May 11, 2018 at 5:59 AM, Laurence Oberman <loberman@redhat.com> wrote:
> On Thu, 2018-05-10 at 18:28 +0800, Ming Lei wrote:
>> On Sat, May 05, 2018 at 07:11:33PM -0400, Laurence Oberman wrote:
>> > On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
>> > > Hi,
>> > >
>> > > The 1st patch introduces blk_quiesce_timeout() and
>> > > blk_unquiesce_timeout()
>> > > for NVMe, meantime fixes blk_sync_queue().
>> > >
>> > > The 2nd patch covers timeout for admin commands for recovering
>> > > controller
>> > > for avoiding possible deadlock.
>> > >
>> > > The 3rd and 4th patches avoid to wait_freeze on queues which
>> > > aren't
>> > > frozen.
>> > >
>> > > The last 4 patches fixes several races wrt. NVMe timeout handler,
>> > > and
>> > > finally can make blktests block/011 passed. Meantime the NVMe PCI
>> > > timeout
>> > > mecanism become much more rebost than before.
>> > >
>> > > gitweb:
>> > >   https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
>> > >
>> > > V4:
>> > >   - fixe nvme_init_set_host_mem_cmd()
>> > >   - use nested EH model, and run both nvme_dev_disable() and
>> > >   resetting in one same context
>> > >
>> > > V3:
>> > >   - fix one new race related freezing in patch 4,
>> > > nvme_reset_work()
>> > >   may hang forever without this patch
>> > >   - rewrite the last 3 patches, and avoid to break
>> > > nvme_reset_ctrl*()
>> > >
>> > > V2:
>> > >   - fix draining timeout work, so no need to change return value
>> > > from
>> > >   .timeout()
>> > >   - fix race between nvme_start_freeze() and nvme_unfreeze()
>> > >   - cover timeout for admin commands running in EH
>> > >
>> > > Ming Lei (7):
>> > >   block: introduce blk_quiesce_timeout() and
>> > > blk_unquiesce_timeout()
>> > >   nvme: pci: cover timeout for admin commands running in EH
>> > >   nvme: pci: only wait freezing if queue is frozen
>> > >   nvme: pci: freeze queue in nvme_dev_disable() in case of error
>> > >     recovery
>> > >   nvme: core: introduce 'reset_lock' for sync reset state and
>> > > reset
>> > >     activities
>> > >   nvme: pci: prepare for supporting error recovery from resetting
>> > >     context
>> > >   nvme: pci: support nested EH
>> > >
>> > >  block/blk-core.c         |  21 +++-
>> > >  block/blk-mq.c           |   9 ++
>> > >  block/blk-timeout.c      |   5 +-
>> > >  drivers/nvme/host/core.c |  46 ++++++-
>> > >  drivers/nvme/host/nvme.h |   5 +
>> > >  drivers/nvme/host/pci.c  | 304
>> > > ++++++++++++++++++++++++++++++++++++++++-------
>> > >  include/linux/blkdev.h   |  13 ++
>> > >  7 files changed, 356 insertions(+), 47 deletions(-)
>> > >
>> > > Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
>> > > Cc: Christoph Hellwig <hch@lst.de>
>> > > Cc: Sagi Grimberg <sagi@grimberg.me>
>> > > Cc: linux-nvme@lists.infradead.org
>> > > Cc: Laurence Oberman <loberman@redhat.com>
>> >
>> > Hello Ming
>> >
>> > I have a two node NUMA system here running your kernel tree
>> > 4.17.0-rc3.ming.nvme+
>> >
>> > [root@segstorage1 ~]# numactl --hardware
>> > available: 2 nodes (0-1)
>> > node 0 cpus: 0 3 5 6 8 11 13 14
>> > node 0 size: 63922 MB
>> > node 0 free: 61310 MB
>> > node 1 cpus: 1 2 4 7 9 10 12 15
>> > node 1 size: 64422 MB
>> > node 1 free: 62372 MB
>> > node distances:
>> > node   0   1
>> >   0:  10  20
>> >   1:  20  10
>> >
>> > I ran block/011
>> >
>> > [root@segstorage1 blktests]# ./check block/011
>> > block/011 => nvme0n1 (disable PCI device while doing
>> > I/O)    [failed]
>> >     runtime    ...  106.936s
>> >     --- tests/block/011.out 2018-05-05 18:01:14.268414752
>> > -0400
>> >     +++ results/nvme0n1/block/011.out.bad   2018-05-05
>> > 19:07:21.028634858 -0400
>> >     @@ -1,2 +1,36 @@
>> >      Running block/011
>> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> >     +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags &
>> > IO_U_F_FLIGHT) == 0' failed.
>> >     ...
>> >     (Run 'diff -u tests/block/011.out
>> > results/nvme0n1/block/011.out.bad' to see the entire diff)
>> >
>> > [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
>> > [ 1452.676351] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.718221] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.718239] nvme nvme0: EH 0: before shutdown
>> > [ 1452.760890] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760894] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760897] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760900] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760903] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760906] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760909] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760912] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760915] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760918] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760921] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760923] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1452.760926] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1453.330251] nvme nvme0: controller is down; will reset:
>> > CSTS=0x3,
>> > PCI_STATUS=0x10
>> > [ 1453.391713] nvme nvme0: EH 0: after shutdown
>> > [ 1456.804695] device-mapper: multipath: Failing path 259:0.
>> > [ 1526.721196] nvme nvme0: I/O 15 QID 0 timeout, disable controller
>> > [ 1526.754335] nvme nvme0: EH 1: before shutdown
>> > [ 1526.793257] nvme nvme0: EH 1: after shutdown
>> > [ 1526.793327] nvme nvme0: Identify Controller failed (-4)
>> > [ 1526.847869] nvme nvme0: Removing after probe failure status: -5
>> > [ 1526.888206] nvme nvme0: EH 0: after recovery
>> > [ 1526.888212] nvme0n1: detected capacity change from 400088457216
>> > to 0
>> > [ 1526.947520] print_req_error: 1 callbacks suppressed
>> > [ 1526.947522] print_req_error: I/O error, dev nvme0n1, sector
>> > 794920
>> > [ 1526.947534] print_req_error: I/O error, dev nvme0n1, sector
>> > 569328
>> > [ 1526.947540] print_req_error: I/O error, dev nvme0n1, sector
>> > 1234608
>> > [ 1526.947556] print_req_error: I/O error, dev nvme0n1, sector
>> > 389296
>> > [ 1526.947564] print_req_error: I/O error, dev nvme0n1, sector
>> > 712432
>> > [ 1526.947566] print_req_error: I/O error, dev nvme0n1, sector
>> > 889304
>> > [ 1526.947572] print_req_error: I/O error, dev nvme0n1, sector
>> > 205776
>> > [ 1526.947574] print_req_error: I/O error, dev nvme0n1, sector
>> > 126480
>> > [ 1526.947575] print_req_error: I/O error, dev nvme0n1, sector
>> > 1601232
>> > [ 1526.947580] print_req_error: I/O error, dev nvme0n1, sector
>> > 1234360
>> > [ 1526.947745] Pid 683(fio) over core_pipe_limit
>> > [ 1526.947746] Skipping core dump
>> > [ 1526.947747] Pid 675(fio) over core_pipe_limit
>> > [ 1526.947748] Skipping core dump
>> > [ 1526.947863] Pid 672(fio) over core_pipe_limit
>> > [ 1526.947863] Skipping core dump
>> > [ 1526.947865] Pid 674(fio) over core_pipe_limit
>> > [ 1526.947866] Skipping core dump
>> > [ 1526.947870] Pid 676(fio) over core_pipe_limit
>> > [ 1526.947871] Pid 679(fio) over core_pipe_limit
>> > [ 1526.947872] Skipping core dump
>> > [ 1526.947872] Skipping core dump
>> > [ 1526.948197] Pid 677(fio) over core_pipe_limit
>> > [ 1526.948197] Skipping core dump
>> > [ 1526.948245] Pid 686(fio) over core_pipe_limit
>> > [ 1526.948245] Skipping core dump
>> > [ 1526.974610] Pid 680(fio) over core_pipe_limit
>> > [ 1526.974611] Pid 684(fio) over core_pipe_limit
>> > [ 1526.974611] Skipping core dump
>> > [ 1526.980370] nvme nvme0: failed to mark controller CONNECTING
>> > [ 1526.980373] nvme nvme0: Removing after probe failure status: -19
>> > [ 1526.980385] nvme nvme0: EH 1: after recovery
>> > [ 1526.980477] Pid 687(fio) over core_pipe_limit
>> > [ 1526.980478] Skipping core dump
>> > [ 1527.858207] Skipping core dump
>> >
>> > And leaves me looping here
>> >
>> > [ 1721.272276] INFO: task kworker/u66:0:24214 blocked for more than
>> > 120
>> > seconds.
>> > [ 1721.311263]       Tainted: G          I       4.17.0-
>> > rc3.ming.nvme+
>> > #1
>> > [ 1721.348027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> > disables this message.
>> > [ 1721.392957] kworker/u66:0   D    0 24214      2 0x80000080
>> > [ 1721.424425] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
>> > [ 1721.458568] Call Trace:
>> > [ 1721.472499]  ? __schedule+0x290/0x870
>> > [ 1721.493515]  schedule+0x32/0x80
>> > [ 1721.511656]  blk_mq_freeze_queue_wait+0x46/0xb0
>> > [ 1721.537609]  ? remove_wait_queue+0x60/0x60
>> > [ 1721.561081]  blk_cleanup_queue+0x7e/0x180
>> > [ 1721.584637]  nvme_ns_remove+0x106/0x140 [nvme_core]
>> > [ 1721.612589]  nvme_remove_namespaces+0x8e/0xd0 [nvme_core]
>> > [ 1721.643163]  nvme_remove+0x80/0x120 [nvme]
>> > [ 1721.666188]  pci_device_remove+0x3b/0xc0
>> > [ 1721.688553]  device_release_driver_internal+0x148/0x220
>> > [ 1721.719332]  nvme_remove_dead_ctrl_work+0x29/0x40 [nvme]
>> > [ 1721.750474]  process_one_work+0x158/0x360
>> > [ 1721.772632]  worker_thread+0x47/0x3e0
>> > [ 1721.792471]  kthread+0xf8/0x130
>> > [ 1721.810354]  ? max_active_store+0x80/0x80
>> > [ 1721.832459]  ? kthread_bind+0x10/0x10
>> > [ 1721.852845]  ret_from_fork+0x35/0x40
>> >
>> > Did I do something wrong?
>> >
>> > I never set anything else, the nvme0n1 was not mounted etc.
>>
>> Hi Laurence,
>>
>> Thanks for your test!
>>
>> Could you run the following V5 (not posted yet) and see if
>> the issues you triggered can be fixed? If not, please provide
>> me the dmesg log.
>>
>> https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V5
>>
>> BTW, the main change is on handling reset failure, in V5, only
>> the failure from top EH is handled.
>>
>> Thanks,
>> Ming
>
> Hello Ming
>
> Seems better, had a failure on the first test but no panics.
> The following tests all passed.
>
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
>     runtime  41.790s  ...  79.184s
>     --- tests/block/011.out     2018-05-05 18:01:14.268414752 -0400
>     +++ results/nvme0n1/block/011.out.bad       2018-05-10
> 17:48:34.792080746 -0400
>     @@ -1,2 +1,3 @@
>      Running block/011
>     +tests/block/011: line 47: echo: write error: Input/output error
>      Test complete

This is an expected result: a request may finally complete with an
error after the nvme core has retried it 5 times.
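
To show where that limit comes from: the nvme core caps retries with
the 'nvme_max_retries' module parameter, which defaults to 5. A
simplified sketch of the decision (hypothetical helper name, not the
exact in-tree function) is:

/*
 * Retry only while the controller did not set DNR and the
 * per-request retry budget has not been used up yet.
 */
static bool nvme_should_retry(struct request *req)
{
        if (blk_noretry_request(req))
                return false;
        if (nvme_req(req)->status & NVME_SC_DNR)
                return false;
        if (nvme_req(req)->retries >= nvme_max_retries)
                return false;
        return true;
}

Once that budget is used up the request completes with the error, so
an occasional Input/output error in the block/011 output is expected.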

I will post out V5 soon.

> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
>     runtime  79.184s  ...  42.196s
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
>     runtime  42.196s  ...  41.390s
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)    [passed]
>     runtime  41.390s  ...  42.193s
>
> Kernel 4.17.0-rc3.ming.v5+ on an x86_64
>
> segstorage1 login: [  631.297687] run blktests block/011 at 2018-05-10
> 17:47:15
> [  661.951541] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  661.990218] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  661.990257] nvme nvme0: EH 0: before shutdown
> [  662.031388] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.031395] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.031398] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.031402] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.031405] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.031409] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.031412] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.031416] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.031420] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.436080] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.477826] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.519368] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.560755] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.602456] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [  662.657904] nvme nvme0: EH 0: after shutdown
> [  668.730405] nvme nvme0: EH 0: after recovery 0
> [  738.859987] run blktests block/011 at 2018-05-10 17:49:03
> [  810.586431] run blktests block/011 at 2018-05-10 17:50:14
> [ 1065.694108] run blktests block/011 at 2018-05-10 17:54:29



Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2018-05-10 22:10 UTC | newest]

Thread overview: 42+ messages
2018-05-05 13:58 [PATCH V4 0/7] nvme: pci: fix & improve timeout handling Ming Lei
2018-05-05 13:58 ` Ming Lei
2018-05-05 13:58 ` [PATCH V4 1/7] block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout() Ming Lei
2018-05-05 13:58   ` Ming Lei
2018-05-10 15:01   ` Bart Van Assche
2018-05-10 15:01     ` Bart Van Assche
2018-05-10 21:00     ` Ming Lei
2018-05-10 21:00       ` Ming Lei
2018-05-05 13:59 ` [PATCH V4 2/7] nvme: pci: cover timeout for admin commands running in EH Ming Lei
2018-05-05 13:59   ` Ming Lei
2018-05-05 13:59 ` [PATCH V4 3/7] nvme: pci: only wait freezing if queue is frozen Ming Lei
2018-05-05 13:59   ` Ming Lei
2018-05-05 13:59 ` [PATCH V4 4/7] nvme: pci: freeze queue in nvme_dev_disable() in case of error recovery Ming Lei
2018-05-05 13:59   ` Ming Lei
2018-05-05 13:59 ` [PATCH V4 5/7] nvme: core: introduce 'reset_lock' for sync reset state and reset activities Ming Lei
2018-05-05 13:59   ` Ming Lei
2018-05-05 13:59 ` [PATCH V4 6/7] nvme: pci: prepare for supporting error recovery from resetting context Ming Lei
2018-05-05 13:59   ` Ming Lei
2018-05-07 15:04   ` James Smart
2018-05-07 15:04     ` James Smart
2018-05-10 20:53     ` Ming Lei
2018-05-10 20:53       ` Ming Lei
2018-05-05 13:59 ` [PATCH V4 7/7] nvme: pci: support nested EH Ming Lei
2018-05-05 13:59   ` Ming Lei
2018-05-05 23:11 ` [PATCH V4 0/7] nvme: pci: fix & improve timeout handling Laurence Oberman
2018-05-05 23:11   ` Laurence Oberman
2018-05-05 23:31   ` Laurence Oberman
2018-05-05 23:31     ` Laurence Oberman
2018-05-05 23:51     ` Laurence Oberman
2018-05-05 23:51       ` Laurence Oberman
2018-05-08 15:09       ` Keith Busch
2018-05-08 15:09         ` Keith Busch
2018-05-10 10:28   ` Ming Lei
2018-05-10 10:28     ` Ming Lei
2018-05-10 21:59     ` Laurence Oberman
2018-05-10 21:59       ` Laurence Oberman
2018-05-10 22:10       ` Ming Lei
2018-05-10 22:10         ` Ming Lei
2018-05-09  5:46 ` jianchao.wang
2018-05-09  5:46   ` jianchao.wang
2018-05-10  2:09   ` Ming Lei
2018-05-10  2:09     ` Ming Lei
