* [PATCH 1/2] nvme: don't schedule multiple resets
From: Keith Busch @ 2016-10-05 20:32 UTC


queue_work() fails only if the work is pending but not yet running. If
the work is already running, the work item gets requeued, triggering a
double reset. If the first reset fails for any reason, the second
reset triggers:

	WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING)

Hitting that schedules controller deletion for a second time, which
potentially takes a reference on the device that is being deleted.
If the reset occurs at the same time as a hot removal event, this causes
a double-free.

This patch has the reset helper function check whether the work is busy
prior to queueing, and changes all places that schedule resets to use
this function. Since most callers don't want to sync with that work, the
flush_work() call is moved to the only caller that wants to sync.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/pci.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 47a44e9..108d301 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -892,7 +892,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
 		nvme_dev_disable(dev, false);
-		queue_work(nvme_workq, &dev->reset_work);
+		nvme_reset(dev);
 
 		/*
 		 * Mark the request as handled, since the inline shutdown
@@ -1290,7 +1290,7 @@ static void nvme_watchdog_timer(unsigned long data)
 
 	/* Skip controllers under certain specific conditions. */
 	if (nvme_should_reset(dev, csts)) {
-		if (queue_work(nvme_workq, &dev->reset_work))
+		if (!nvme_reset(dev))
 			dev_warn(dev->dev,
 				"Failed status: 0x%x, reset controller.\n",
 				csts);
@@ -1812,11 +1812,10 @@ static int nvme_reset(struct nvme_dev *dev)
 {
 	if (!dev->ctrl.admin_q || blk_queue_dying(dev->ctrl.admin_q))
 		return -ENODEV;
-
+	if (work_busy(&dev->reset_work))
+		return -ENODEV;
 	if (!queue_work(nvme_workq, &dev->reset_work))
 		return -EBUSY;
-
-	flush_work(&dev->reset_work);
 	return 0;
 }
 
@@ -1840,7 +1839,12 @@ static int nvme_pci_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
 
 static int nvme_pci_reset_ctrl(struct nvme_ctrl *ctrl)
 {
-	return nvme_reset(to_nvme_dev(ctrl));
+	struct nvme_dev *dev = to_nvme_dev(ctrl);
+	int ret = nvme_reset(dev);
+
+	if (!ret)
+		flush_work(&dev->reset_work);
+	return ret;
 }
 
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
@@ -1934,7 +1938,7 @@ static void nvme_reset_notify(struct pci_dev *pdev, bool prepare)
 	if (prepare)
 		nvme_dev_disable(dev, false);
 	else
-		queue_work(nvme_workq, &dev->reset_work);
+		nvme_reset(dev);
 }
 
 static void nvme_shutdown(struct pci_dev *pdev)
@@ -2003,7 +2007,7 @@ static int nvme_resume(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
-	queue_work(nvme_workq, &ndev->reset_work);
+	nvme_reset(ndev);
 	return 0;
 }
 #endif
@@ -2042,7 +2046,7 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
 
 	dev_info(dev->ctrl.device, "restart after slot reset\n");
 	pci_restore_state(pdev);
-	queue_work(nvme_workq, &dev->reset_work);
+	nvme_reset(dev);
 	return PCI_ERS_RESULT_RECOVERED;
 }
 
-- 
2.7.2


* [PATCH 2/2] nvme: Delete created IO queues on reset
From: Keith Busch @ 2016-10-05 20:32 UTC


Commit c21377f8 ("Suspend all queues before deletion") decrements the
online queue count prior to our attempt to delete those IO queues, so
the driver ends up not asking the controller to delete any. This patch
uses queue_count instead of online_queues.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 108d301..670f0cb 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1512,7 +1512,7 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
 
 static void nvme_disable_io_queues(struct nvme_dev *dev)
 {
-	int pass, queues = dev->online_queues - 1;
+	int pass, queues = dev->queue_count - 1;
 	unsigned long timeout;
 	u8 opcode = nvme_admin_delete_sq;
 
-- 
2.7.2


* [PATCH 1/2] nvme: don't schedule multiple resets
From: Sagi Grimberg @ 2016-10-05 22:57 UTC


Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* [PATCH 2/2] nvme: Delete created IO queues on reset
From: Sagi Grimberg @ 2016-10-05 22:58 UTC


> Commit c21377f8 ("Suspend all queues before deletion") decrements the
> online queue count prior to our attempt to delete those IO queues, so
> the driver ends up not asking the controller to delete any. This patch
> uses queue_count instead of online_queues.

A fixes tag would be useful here.

Other than that looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* [PATCH 2/2] nvme: Delete created IO queues on reset
From: Gabriel Krisman Bertazi @ 2016-10-06  1:47 UTC


Keith Busch <keith.busch@intel.com> writes:

> Commit c21377f8 ("Suspend all queues before deletion") decrements the
> online queue count prior to our attempt to delete those IO queues, so
> the driver ends up not asking the controller to delete any. This patch
> uses queue_count instead of online_queues.

Hmm, this is obviously correct.  Nice catch.  Shame I missed it :(

Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>

>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/nvme/host/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 108d301..670f0cb 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1512,7 +1512,7 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
>
>  static void nvme_disable_io_queues(struct nvme_dev *dev)
>  {
> -	int pass, queues = dev->online_queues - 1;
> +	int pass, queues = dev->queue_count - 1;
>  	unsigned long timeout;
>  	u8 opcode = nvme_admin_delete_sq;

-- 
Gabriel Krisman Bertazi


* [PATCH 2/2] nvme: Delete created IO queues on reset
From: Christoph Hellwig @ 2016-10-06  9:32 UTC


On Wed, Oct 05, 2016 at 04:32:46PM -0400, Keith Busch wrote:
> Commit c21377f8 ("Suspend all queues before deletion") decrements the
> online queue count prior to our attempt to delete those IO queues, so
> the driver ends up not asking the controller to delete any. This patch
> uses queue_count instead of online_queues.

What if not all queues were online before?  Should we take a
snapshot of ->online_queues before suspending the queues and
then use that later?


* [PATCH 1/2] nvme: don't schedule multiple resets
From: Christoph Hellwig @ 2016-10-06  9:34 UTC


On Wed, Oct 05, 2016 at 04:32:45PM -0400, Keith Busch wrote:
> queue_work() fails only if the work is pending but not yet running. If
> the work is already running, the work item gets requeued, triggering a
> double reset. If the first reset fails for any reason, the second
> reset triggers:
> 
> 	WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING)
> 
> Hitting that schedules controller deletion for a second time, which
> potentially takes a reference on the device that is being deleted.
> If the reset occurs at the same time as a hot removal event, this causes
> a double-free.
> 
> This patch has the reset helper function check whether the work is busy
> prior to queueing, and changes all places that schedule resets to use
> this function. Since most callers don't want to sync with that work, the
> flush_work() call is moved to the only caller that wants to sync.

Looks fine.  I actually have something very similar in an old
branch, except that I also moved nvme_reset to common code
and made the fabrics drivers use it.  I'll really need to get
back to that stuff...
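
A minimal sketch of what such a shared helper might look like
(hypothetical: this thread doesn't show it, and both the nvme_wq name
and a reset_work member in struct nvme_ctrl are assumptions):

	/*
	 * Hypothetical transport-independent reset scheduling, living in
	 * core code so the PCIe and fabrics drivers share one busy-check.
	 */
	int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
	{
		/* Don't requeue a reset that is already running. */
		if (work_busy(&ctrl->reset_work))
			return -EBUSY;
		if (!queue_work(nvme_wq, &ctrl->reset_work))
			return -EBUSY;
		return 0;
	}

Each transport would keep its own reset_work handler for the
device-specific teardown and re-init; only the scheduling guard moves
to common code.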

Reviewed-by: Christoph Hellwig <hch@lst.de>


* [PATCH 2/2] nvme: Delete created IO queues on reset
From: Keith Busch @ 2016-10-06 15:23 UTC


On Thu, Oct 06, 2016 at 11:32:52AM +0200, Christoph Hellwig wrote:
> On Wed, Oct 05, 2016 at 04:32:46PM -0400, Keith Busch wrote:
> > Commit c21377f8 ("Suspend all queues before deletion") decrements the
> > online queue count prior to our attempt to delete those IO queues, so
> > the driver ends up not asking the controller to delete any. This patch
> > uses queue_count instead of online_queues.
> 
> What if not all queues were online before?  Should we take a
> snapshot of ->online_queues before suspending the queues and
> then use that later?

That sounds good. In the worst case, using queue_count may attempt to
delete a queue the adapter didn't create, which just returns an invalid
QID error, and everything continues as normal. But it's easily
avoidable, so I'll send a v2 of this one.
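
A rough sketch of what that snapshot approach could look like
(hypothetical; the actual v2 may differ, and error paths plus admin
queue handling are elided):

	/*
	 * Sketch only: capture the IO queue count while the queues are
	 * still marked online, because nvme_suspend_queue() decrements
	 * dev->online_queues as it suspends each one.
	 */
	int i, queues = dev->online_queues - 1;	/* -1 excludes the admin queue */

	for (i = dev->queue_count - 1; i > 0; i--)
		nvme_suspend_queue(dev->queues[i]);

	/* nvme_disable_io_queues() would take the snapshot as a parameter */
	nvme_disable_io_queues(dev, queues);

This deletes exactly the queues that were online before the suspend
loop, rather than reading a counter that has already been decremented
to zero.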

