* [PATCH 1/2] NVMe: Make surprise removal work again
@ 2016-01-25 21:23 Keith Busch
  2016-01-25 21:23 ` [PATCH 2/2] NVMe: Fix io incapable return values Keith Busch
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Keith Busch @ 2016-01-25 21:23 UTC (permalink / raw)


Ends all IO on disk removal when the controller can't respond. For
device failure or surprise removal, the driver ends new requests after
disabling the controller and setting the queue to dying. The h/w queue
is restarted to flush pending commands so they can be failed.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/core.c | 19 ++++++++++++++-----
 drivers/nvme/host/pci.c  | 13 +++++++++++++
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c5bf001..37815c9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1178,6 +1178,13 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	kfree(ns);
 }
 
+static void __nvme_start_queue_locked(struct nvme_ns *ns)
+{
+	queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
+	blk_mq_start_stopped_hw_queues(ns->queue, true);
+	blk_mq_kick_requeue_list(ns->queue);
+}
+
 static void nvme_ns_remove(struct nvme_ns *ns)
 {
 	bool kill = nvme_io_incapable(ns->ctrl) &&
@@ -1187,15 +1194,20 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
 	if (kill) {
 		blk_set_queue_dying(ns->queue);
+		mb();
 
 		/*
 		 * The controller was shutdown first if we got here through
 		 * device removal. The shutdown may requeue outstanding
 		 * requests. These need to be aborted immediately so
 		 * del_gendisk doesn't block indefinitely for their completion.
+		 * The queue needs to be restarted to let pending requests
+		 * fail.
 		 */
 		blk_mq_abort_requeue_list(ns->queue);
+		__nvme_start_queue_locked(ns);
 	}
+
 	if (ns->disk->flags & GENHD_FL_UP) {
 		if (blk_get_integrity(ns->disk))
 			blk_integrity_unregister(ns->disk);
@@ -1424,11 +1436,8 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
-		blk_mq_start_stopped_hw_queues(ns->queue, true);
-		blk_mq_kick_requeue_list(ns->queue);
-	}
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		__nvme_start_queue_locked(ns);
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
 
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 72ef832..bdf148e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -640,6 +640,10 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct nvme_command cmnd;
 	int ret = BLK_MQ_RQ_QUEUE_OK;
 
+	if (unlikely(blk_queue_dying(req->q))) {
+		blk_mq_end_request(req, -EIO);
+		return BLK_MQ_RQ_QUEUE_OK;
+	}
 	/*
 	 * If formated with metadata, require the block layer provide a buffer
 	 * unless this namespace is formated such that the metadata can be
@@ -2118,6 +2122,15 @@ static void nvme_remove(struct pci_dev *pdev)
 	pci_set_drvdata(pdev, NULL);
 	flush_work(&dev->reset_work);
 	flush_work(&dev->scan_work);
+
+	/*
+	 * If the controller can't do IO (surprise removal, for example), we
+	 * need to quiesce prior to deleting namespaces. This ends outstanding
+	 * requests and prevents attempts to sync dirty data.
+	 */
+	if (nvme_io_incapable(&dev->ctrl))
+		nvme_dev_disable(dev, true);
+
 	nvme_remove_namespaces(&dev->ctrl);
 	nvme_uninit_ctrl(&dev->ctrl);
 	nvme_dev_disable(dev, true);
-- 
2.6.2.307.g37023ba


* [PATCH 2/2] NVMe: Fix io incapable return values
  2016-01-25 21:23 [PATCH 1/2] NVMe: Make surprise removal work again Keith Busch
@ 2016-01-25 21:23 ` Keith Busch
  2016-01-26 16:54   ` Christoph Hellwig
  2016-01-26 16:53 ` [PATCH 1/2] NVMe: Make surprise removal work again Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2016-01-25 21:23 UTC (permalink / raw)


The function returns true when the controller can't handle IO.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/nvme.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 4fb5bb7..9664d07 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -139,9 +139,9 @@ static inline bool nvme_io_incapable(struct nvme_ctrl *ctrl)
 	u32 val = 0;
 
 	if (ctrl->ops->io_incapable(ctrl))
-		return false;
+		return true;
 	if (ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &val))
-		return false;
+		return true;
 	return val & NVME_CSTS_CFS;
 }
 
-- 
2.6.2.307.g37023ba


* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-01-25 21:23 [PATCH 1/2] NVMe: Make surprise removal work again Keith Busch
  2016-01-25 21:23 ` [PATCH 2/2] NVMe: Fix io incapable return values Keith Busch
@ 2016-01-26 16:53 ` Christoph Hellwig
  2016-01-26 23:59   ` Keith Busch
  2016-01-27 11:47 ` Sagi Grimberg
  2016-02-01 15:01 ` Wenbo Wang
  3 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2016-01-26 16:53 UTC (permalink / raw)


> +static void __nvme_start_queue_locked(struct nvme_ns *ns)
> +{
> +	queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
> +	blk_mq_start_stopped_hw_queues(ns->queue, true);
> +	blk_mq_kick_requeue_list(ns->queue);
> +}

This almost screams for a block level helper.
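
A minimal sketch of what such a block-level helper could look like, assuming it would live in blk-mq; the name blk_mq_restart_queue() is hypothetical, not an existing API:

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/*
 * Hypothetical blk-mq helper wrapping the restart sequence from the patch:
 * clear the stopped flag, restart any stopped hardware queues, and kick the
 * requeue list so parked requests are dispatched (or failed) again.
 */
void blk_mq_restart_queue(struct request_queue *q)
{
	queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, q);
	blk_mq_start_stopped_hw_queues(q, true);
	blk_mq_kick_requeue_list(q);
}

With something like this exported, __nvme_start_queue_locked() would reduce to a single call.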

> +
>  static void nvme_ns_remove(struct nvme_ns *ns)
>  {
>  	bool kill = nvme_io_incapable(ns->ctrl) &&
> @@ -1187,15 +1194,20 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>  
>  	if (kill) {
>  		blk_set_queue_dying(ns->queue);
> +		mb();

Please always comment memory barriers, I can't really make any sense
of this one.

>  		/*
>  		 * The controller was shutdown first if we got here through
>  		 * device removal. The shutdown may requeue outstanding
>  		 * requests. These need to be aborted immediately so
>  		 * del_gendisk doesn't block indefinitely for their completion.
> +		 * The queue needs to be restarted to let pending requests
> +		 * fail.
>  		 */
>  		blk_mq_abort_requeue_list(ns->queue);
> +		__nvme_start_queue_locked(ns);

Also why do we need a kick of the requeue list above if we just aborted
all requests on it?

> index 72ef832..bdf148e 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -640,6 +640,10 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	struct nvme_command cmnd;
>  	int ret = BLK_MQ_RQ_QUEUE_OK;
>  
> +	if (unlikely(blk_queue_dying(req->q))) {
> +		blk_mq_end_request(req, -EIO);
> +		return BLK_MQ_RQ_QUEUE_OK;
> +	}

Seems like this is something blk-mq should be doing.
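
A hedged sketch of how blk-mq itself could fail such requests; the helper name and its call site in the dispatch path are assumptions, only blk_queue_dying() and blk_mq_end_request() are taken from the patch:

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/*
 * Hypothetical blk-mq-side check: end a request with -EIO when its queue has
 * been marked dying, so the driver's ->queue_rq() never sees it.  Returns
 * true if the request was terminated and the dispatch path should skip it.
 */
static bool blk_mq_fail_request_if_dying(struct request *rq)
{
	if (likely(!blk_queue_dying(rq->q)))
		return false;

	blk_mq_end_request(rq, -EIO);
	return true;
}

Called just before ->queue_rq() in the dispatch loop, a check like this would make the per-driver test added in pci.c above unnecessary.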


* [PATCH 2/2] NVMe: Fix io incapable return values
  2016-01-25 21:23 ` [PATCH 2/2] NVMe: Fix io incapable return values Keith Busch
@ 2016-01-26 16:54   ` Christoph Hellwig
  2016-01-26 18:11     ` Ming Lin
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2016-01-26 16:54 UTC (permalink / raw)


On Mon, Jan 25, 2016 at 02:23:37PM -0700, Keith Busch wrote:
> The function returns true when the controller can't handle IO.

Ooops.  I thought I had actually seen a fix for this a short while
ago, not sure why that got missed.

Acked-by: Christoph Hellwig <hch@lst.de>


* [PATCH 2/2] NVMe: Fix io incapable return values
  2016-01-26 16:54   ` Christoph Hellwig
@ 2016-01-26 18:11     ` Ming Lin
  2016-01-27 11:19       ` Sagi Grimberg
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lin @ 2016-01-26 18:11 UTC (permalink / raw)


On Tue, Jan 26, 2016 at 8:54 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Jan 25, 2016 at 02:23:37PM -0700, Keith Busch wrote:
>> The function returns true when the controller can't handle IO.
>
> Ooops.  I thought I had actually seen a fix for this a short while
> ago, not sure why that got missed.

Yes, you saw that. It is in the NVMeOF driver.

>
> Acked-by: Christoph Hellwig <hch@lst.de>


* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-01-26 16:53 ` [PATCH 1/2] NVMe: Make surprise removal work again Christoph Hellwig
@ 2016-01-26 23:59   ` Keith Busch
  0 siblings, 0 replies; 17+ messages in thread
From: Keith Busch @ 2016-01-26 23:59 UTC (permalink / raw)


On Tue, Jan 26, 2016 at 08:53:34AM -0800, Christoph Hellwig wrote:
> >  	if (kill) {
> >  		blk_set_queue_dying(ns->queue);
> > +		mb();
> 
> Please always comment memory barriers, I can't really make any sense
> of this one.

I meant to remove this before posting. It was for an error I thought
occurred when testing the patch, but didn't confirm: did a h/w queue
restart happen without seeing the dying flag in nvme_queue_rq? I don't
think that's what happened, but I left it in by mistake.

I'll respin.

> >  		blk_mq_abort_requeue_list(ns->queue);
> > +		__nvme_start_queue_locked(ns);
> 
> Also why do we need a kick of the requeue list above if we just aborted
> all requests on it?

Just reusing existing code. We don't need to abort the requeue list
first; it's just a shortcut to ending commands.

I don't think restarting the queue was necessary before, since requests
used to wait on a frozen queue before failing. Now they wait on tags, so
we need some way to flush them to failure.

> > +++ b/drivers/nvme/host/pci.c
> > @@ -640,6 +640,10 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  	struct nvme_command cmnd;
> >  	int ret = BLK_MQ_RQ_QUEUE_OK;
> >  
> > +	if (unlikely(blk_queue_dying(req->q))) {
> > +		blk_mq_end_request(req, -EIO);
> > +		return BLK_MQ_RQ_QUEUE_OK;
> > +	}
> 
> Seems like this is something blk-mq should be doing.

Sounds good to me.

I was afraid to handle this at the generic layer since it could change
what other protocols see. It looks safe, though SCSI tracks device state
differently, and I've a bad record of breaking other drivers with block
layer "fixes" recently. :)


* [PATCH 2/2] NVMe: Fix io incapable return values
  2016-01-26 18:11     ` Ming Lin
@ 2016-01-27 11:19       ` Sagi Grimberg
  0 siblings, 0 replies; 17+ messages in thread
From: Sagi Grimberg @ 2016-01-27 11:19 UTC (permalink / raw)



>>> The function returns true when the controller can't handle IO.
>>
>> Ooops.  I thought I had actually seen a fix for this a short while
>> ago, not sure why that got missed.
>
> Yes, you saw that. It is in NVMeOF driver.

Yea...

>> Acked-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>


* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-01-25 21:23 [PATCH 1/2] NVMe: Make surprise removal work again Keith Busch
  2016-01-25 21:23 ` [PATCH 2/2] NVMe: Fix io incapable return values Keith Busch
  2016-01-26 16:53 ` [PATCH 1/2] NVMe: Make surprise removal work again Christoph Hellwig
@ 2016-01-27 11:47 ` Sagi Grimberg
  2016-01-27 14:19   ` Keith Busch
  2016-02-01 15:01 ` Wenbo Wang
  3 siblings, 1 reply; 17+ messages in thread
From: Sagi Grimberg @ 2016-01-27 11:47 UTC (permalink / raw)



> @@ -1187,15 +1194,20 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>
>   	if (kill) {
>   		blk_set_queue_dying(ns->queue);
> +		mb();
>
>   		/*
>   		 * The controller was shutdown first if we got here through
>   		 * device removal. The shutdown may requeue outstanding
>   		 * requests. These need to be aborted immediately so
>   		 * del_gendisk doesn't block indefinitely for their completion.
> +		 * The queue needs to be restarted to let pending requests
> +		 * fail.
>   		 */
>   		blk_mq_abort_requeue_list(ns->queue);
> +		__nvme_start_queue_locked(ns);

Why not make sure that all the pending requests are moved to
the requeue list before we even get here? Call nvme_cancel_io on
pending requests, which would either fail the requests (blk_queue_dying)
or move them to the requeue list?
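
A rough sketch of the cancel callback being suggested, assuming it is run over busy tags with the usual (request, data, reserved) iterator signature; the nvme_cancel_io() shown here is illustrative, not the driver's actual implementation:

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/*
 * Illustrative per-request cancel callback: fail the request outright if its
 * queue is dying, otherwise park it on the requeue list so it can be retried
 * once the controller is reset.
 */
static void nvme_cancel_io(struct request *req, void *data, bool reserved)
{
	if (blk_queue_dying(req->q))
		blk_mq_end_request(req, -EIO);
	else
		blk_mq_requeue_request(req);
}

Requests parked this way would still need a blk_mq_kick_requeue_list() once the queues restart, which is the kick __nvme_start_queue_locked() already performs.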

> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 72ef832..bdf148e 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -640,6 +640,10 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>   	struct nvme_command cmnd;
>   	int ret = BLK_MQ_RQ_QUEUE_OK;
>
> +	if (unlikely(blk_queue_dying(req->q))) {
> +		blk_mq_end_request(req, -EIO);
> +		return BLK_MQ_RQ_QUEUE_OK;
> +	}

This is something we should try our best to move away from IMO...


* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-01-27 11:47 ` Sagi Grimberg
@ 2016-01-27 14:19   ` Keith Busch
  2016-01-28 14:48     ` Sagi Grimberg
  0 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2016-01-27 14:19 UTC (permalink / raw)


On Wed, Jan 27, 2016 at 01:47:45PM +0200, Sagi Grimberg wrote:
> >  		blk_mq_abort_requeue_list(ns->queue);
> >+		__nvme_start_queue_locked(ns);
> 
> Why not make sure that all the pending requests are moved to
> the requeue list before we even get here? Call nvme_cancel_io on
> pending requests, which would either fail the requests (blk_queue_dying)
> or move them to the requeue list?

That works only for active requests. There could be processes that entered
the queue and are waiting for request tags to become available. These need
to be flushed to completion somehow ... maybe they shouldn't even succeed
in getting a request on a dying queue?

> >+	if (unlikely(blk_queue_dying(req->q))) {
> >+		blk_mq_end_request(req, -EIO);
> >+		return BLK_MQ_RQ_QUEUE_OK;
> >+	}
> 
> This is something we should try our best to move away from IMO...

Agreed, though I think this is better than relying on queue freeze that
used to happen.


* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-01-27 14:19   ` Keith Busch
@ 2016-01-28 14:48     ` Sagi Grimberg
  2016-01-28 14:55       ` Keith Busch
  0 siblings, 1 reply; 17+ messages in thread
From: Sagi Grimberg @ 2016-01-28 14:48 UTC (permalink / raw)



>>>   		blk_mq_abort_requeue_list(ns->queue);
>>> +		__nvme_start_queue_locked(ns);
>>
>> Why not make sure that all the pending requests are moved to
>> the requeue list before we even get here? Call nvme_cancel_io on
>> pending requests, which would either fail the requests (blk_queue_dying)
>> or move them to the requeue list?
>
> That works only for active requests. There could be processes that entered
> the queue and are waiting for request tags to become available. These need
> to be flushed to completion somehow ... maybe they shouldn't even succeed
> in getting a request on a dying queue?

Isn't that what blk_mq_wake_waiters() is for?


* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-01-28 14:48     ` Sagi Grimberg
@ 2016-01-28 14:55       ` Keith Busch
  2016-01-28 15:20         ` Sagi Grimberg
  0 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2016-01-28 14:55 UTC (permalink / raw)


On Thu, Jan 28, 2016 at 04:48:16PM +0200, Sagi Grimberg wrote:
> >That works only for active requests. There could be processes that entered
> >the queue and are waiting for request tags to become available. These need
> >to be flushed to completion somehow ... maybe they shouldn't even succeed
> >in getting a request on a dying queue?
> 
> Isn't that what blk_mq_wake_waiters() is for?

Yes, that'll wake processes waiting on tags, but not to their
demise. They'll get a tag and send it to the h/w context. We need to
start these back up so the driver can end those requests, or we need to
do something to end them at the block layer. It sounds like developers
are preferring the latter.


* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-01-28 14:55       ` Keith Busch
@ 2016-01-28 15:20         ` Sagi Grimberg
  0 siblings, 0 replies; 17+ messages in thread
From: Sagi Grimberg @ 2016-01-28 15:20 UTC (permalink / raw)



>> Isn't that what blk_mq_wake_waiters() is for?
>
> Yes, that'll wake processes waiting on tags, but not to their
> demise. They'll get a tag and send it to the h/w context. We need to
> start these back up so the driver can end those requests, or we need to
> do something to end them at the block layer. It sounds like developers
> are preferring the latter.

Yes please.


* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-01-25 21:23 [PATCH 1/2] NVMe: Make surprise removal work again Keith Busch
                   ` (2 preceding siblings ...)
  2016-01-27 11:47 ` Sagi Grimberg
@ 2016-02-01 15:01 ` Wenbo Wang
  2016-02-01 15:27   ` Busch, Keith
       [not found]   ` <B58D82457FDA0744A320A2FC5AC253B93D3D7E57@fmsmsx104.amr.corp.intel.com>
  3 siblings, 2 replies; 17+ messages in thread
From: Wenbo Wang @ 2016-02-01 15:01 UTC (permalink / raw)


>		blk_mq_abort_requeue_list(ns->queue);
>+		__nvme_start_queue_locked(ns);

Why is the queue stopped? Is it due to an unsuccessful device reset (since the device was hot removed)?



* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-02-01 15:01 ` Wenbo Wang
@ 2016-02-01 15:27   ` Busch, Keith
       [not found]   ` <B58D82457FDA0744A320A2FC5AC253B93D3D7E57@fmsmsx104.amr.corp.intel.com>
  1 sibling, 0 replies; 17+ messages in thread
From: Busch, Keith @ 2016-02-01 15:27 UTC (permalink / raw)


> Subject: RE: [PATCH 1/2] NVMe: Make surprise removal work again
> 
> >		blk_mq_abort_requeue_list(ns->queue);
> >+		__nvme_start_queue_locked(ns);
> 
> Why is the queue stopped? Is it due to unsuccessful device reset (since device is hot removed)?

It could be from an unsuccessful reset if that's the path we took. We could
also go straight to the removal case if the PCI hotplug driver notifies us;
in that case this patch disables the controller first to halt new IO and
reap outstanding requests. That also stops the queues, so they need to be
restarted to flush out the pending requests.

Anyway, the direction I was given was to move the request ending to the
block layer when we kill it, so this won't be necessary in the next
revision (will be sent out today).


* [PATCH 1/2] NVMe: Make surprise removal work again
       [not found]   ` <B58D82457FDA0744A320A2FC5AC253B93D3D7E57@fmsmsx104.amr.corp.intel.com>
@ 2016-02-02  1:17     ` Keith Busch
  2016-02-02  2:41       ` Wenbo Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2016-02-02  1:17 UTC (permalink / raw)


On Mon, Feb 01, 2016 at 07:27:23AM -0800, Busch, Keith wrote:
> the direction I was given was to move the request ending to the block layer
> when we kill it, so this won't be necessary in the next revision (will be
> sent out today).

After merging and moving IO ending to the block layer, something appears
broken. I'm not sure what's going on, so I'm just posting here in case
there are better ideas.

The test runs buffered writes to an nvme drive (ex: dd if=/dev/zero
of=/dev/nvme0n1 bs=16M), then yanks the drive when that ramps up.

Device removal completes after a few seconds, and /dev/nvme0n1 is no
longer present. However, the 'dd' task never completes, with kernel
stack trace:

[<ffffffff81093fb0>] __mod_timer+0xd4/0xe6
[<ffffffff810939bf>] process_timeout+0x0/0xc
[<ffffffff810fcad7>] balance_dirty_pages_ratelimited+0x8b1/0xa05
[<ffffffff8116a2a3>] __set_page_dirty.constprop.61+0x81/0x9f
[<ffffffff810f3457>] generic_perform_write+0x15a/0x1d1
[<ffffffff81157095>] generic_update_time+0x9f/0xaa
[<ffffffff810f4965>] __generic_file_write_iter+0xea/0x146
[<ffffffff8116d2a3>] blkdev_write_iter+0x78/0xf5
[<ffffffff811429b4>] __vfs_write+0x83/0xab
[<ffffffff81143cda>] vfs_write+0x87/0xdd
[<ffffffff81143ed7>] SyS_write+0x56/0x8a
[<ffffffff81472657>] entry_SYSCALL_64_fastpath+0x12/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff

If the driver ends all IOs it knows about and blk_cleanup_queue returns,
then the driver did its part as far as I know. I'm not sure how to get this
to end with the expected IO error, but I'm pretty sure this used to work.


* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-02-02  1:17     ` Keith Busch
@ 2016-02-02  2:41       ` Wenbo Wang
  2016-02-02  5:29         ` Keith Busch
  0 siblings, 1 reply; 17+ messages in thread
From: Wenbo Wang @ 2016-02-02  2:41 UTC (permalink / raw)


Is it confirmed that blk_cleanup_queue has returned? It is likely the driver is stuck in del_gendisk.



* [PATCH 1/2] NVMe: Make surprise removal work again
  2016-02-02  2:41       ` Wenbo Wang
@ 2016-02-02  5:29         ` Keith Busch
  0 siblings, 0 replies; 17+ messages in thread
From: Keith Busch @ 2016-02-02  5:29 UTC (permalink / raw)


On Tue, Feb 02, 2016 at 02:41:53AM +0000, Wenbo Wang wrote:
> Is it confirmed that blk_cleanup_queue has returned? It is likely the driver is stuck in del_gendisk.

Yes, the entire driver's remove completes, which includes
blk_cleanup_queue and del_gendisk.

I don't know what the task is waiting for, but I recall this worked
before. I might bisect.
