* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
@ 2019-05-22  0:37 Kenneth Heitke
  2019-05-22 19:26 ` Keith Busch
  0 siblings, 1 reply; 7+ messages in thread
From: Kenneth Heitke @ 2019-05-22  0:37 UTC (permalink / raw)


If an admin command timeout occurs while a PCIe reset (FLR) is
pending, the CSTS bits may not be valid, which could result in
the controller being removed.

[372337.996566] nvme nvme0: I/O 0 QID 0 timeout, reset controller
[372339.984662] nvme 0000:1c:00.0: enabling device (0000 -> 0002)
[372339.984951] nvme nvme0: Removing after probe failure status: -19
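
For context on the log above: the admin timeout kicks off a controller
reset, and the reset work then validates CSTS while the FLR is still in
progress. A paraphrased sketch of that check (from nvme_pci_enable() in
drivers/nvme/host/pci.c of roughly this era, not quoted from this
thread; details may differ by version):

	/*
	 * While the FLR is in progress the MMIO read below returns all
	 * 1s, so the reset fails with -ENODEV (-19) and the controller
	 * is removed, as in the last log line above.
	 */
	if (readl(dev->bar + NVME_REG_CSTS) == -1) {
		result = -ENODEV;
		goto disable;
	}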

Signed-off-by: Kenneth Heitke <kenneth.heitke@intel.com>
---
 drivers/nvme/host/pci.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2a8708c9ac18..aa9ea64a8b53 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -118,6 +118,8 @@ struct nvme_dev {
 	struct nvme_ctrl ctrl;
 
 	mempool_t *iod_mempool;
+	unsigned long flags;
+#define NVME_CTRL_PCI_RESET_PENDING	0
 
 	/* shadow doorbell buffer support: */
 	u32 *dbbuf_dbs;
@@ -1250,6 +1252,11 @@ static void nvme_warn_reset(struct nvme_dev *dev, u32 csts)
 			 csts, result);
 }
 
+static bool nvme_pci_reset_pending(const struct nvme_dev *dev)
+{
+	return !!test_bit(NVME_CTRL_PCI_RESET_PENDING, &dev->flags);
+}
+
 static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -1267,6 +1274,10 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	if (pci_channel_offline(to_pci_dev(dev->dev)))
 		return BLK_EH_RESET_TIMER;
 
+	/* If a PCIe reset (FLR) is pending, wait for it to complete */
+	if (nvme_pci_reset_pending(dev))
+		return BLK_EH_RESET_TIMER;
+
 	/*
 	 * Reset immediately if the controller is failed
 	 */
@@ -2780,12 +2791,14 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static void nvme_reset_prepare(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
+	set_bit(NVME_CTRL_PCI_RESET_PENDING, &dev->flags);
 	nvme_dev_disable(dev, false);
 }
 
 static void nvme_reset_done(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
+	clear_bit(NVME_CTRL_PCI_RESET_PENDING, &dev->flags);
 	nvme_reset_ctrl_sync(&dev->ctrl);
 }
 
-- 
2.17.1


* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
  2019-05-22  0:37 [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending Kenneth Heitke
@ 2019-05-22 19:26 ` Keith Busch
  2019-05-22 20:09   ` Keith Busch
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2019-05-22 19:26 UTC (permalink / raw)


On Tue, May 21, 2019 at 06:37:41PM -0600, Kenneth Heitke wrote:
> If an admin command timeout occurs while a PCIe reset (FLR) is
> pending, the CSTS bits may not be valid which could result in
> the controller being removed.
> 
> [372337.996566] nvme nvme0: I/O 0 QID 0 timeout, reset controller
> [372339.984662] nvme 0000:1c:00.0: enabling device (0000 -> 0002)
> [372339.984951] nvme nvme0: Removing after probe failure status: -19

The disable reclaims all commands, including the ones it dispatches, so
it sounds like you're talking about a race between the ones it dispatched
and its timeout work. If so, we can just make sure commands sent during
nvme_dev_disable never time out; those are just the delete queue commands:

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f562154551ce..4678704c2138 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2210,7 +2210,7 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	req->timeout = ADMIN_TIMEOUT;
+	req->timeout = UINT_MAX;
 	req->end_io_data = nvmeq;
 
 	init_completion(&nvmeq->delete_done);
--
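
(The "reclaims all commands" above refers to the cancel pass in
nvme_dev_disable(); a paraphrased fragment from drivers/nvme/host/pci.c
of roughly this era, details may differ by version:)

	/*
	 * Force-complete every busy request on both tag sets so nothing
	 * is left waiting on a controller that is being torn down.
	 */
	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);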


* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
  2019-05-22 19:26 ` Keith Busch
@ 2019-05-22 20:09   ` Keith Busch
  2019-05-23 21:57     ` Heitke, Kenneth
  2019-05-24  6:45     ` Sagi Grimberg
  0 siblings, 2 replies; 7+ messages in thread
From: Keith Busch @ 2019-05-22 20:09 UTC (permalink / raw)


On Wed, May 22, 2019 at 01:26:57PM -0600, Keith Busch wrote:
> The disable reclaims all commands, including the ones it dispatches, so
> it sounds like you're talking about a race between the ones it dispatched
> and its timeout work. If so, we can just make sure commands sent during
> nvme_dev_disable never time out; those are just the delete queue commands:
> 
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index f562154551ce..4678704c2138 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2210,7 +2210,7 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
>  	if (IS_ERR(req))
>  		return PTR_ERR(req);
>  
> -	req->timeout = ADMIN_TIMEOUT;
> +	req->timeout = UINT_MAX;
>  	req->end_io_data = nvmeq;
>  
>  	init_completion(&nvmeq->delete_done);
> --

I think we should do the above anyway, but it isn't going to help if
commands dispatched outside of the disable path time out. This should fix that.
Note, we never needed to have a sync'ed reset on reset_done(), but
this makes it necessary.

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f562154551ce..3edb9d098eb8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1257,13 +1257,14 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	struct nvme_dev *dev = nvmeq->dev;
 	struct request *abort_req;
 	struct nvme_command cmd;
+	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
 	/* If PCI error recovery process is happening, we cannot reset or
 	 * the recovery mechanism will surely fail.
 	 */
 	mb();
-	if (pci_channel_offline(to_pci_dev(dev->dev)))
+	if (pci_channel_offline(pdev) || pdev->block_cfg_access)
 		return BLK_EH_RESET_TIMER;
 
 	/*
@@ -2782,12 +2783,13 @@ static void nvme_reset_prepare(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 	nvme_dev_disable(dev, false);
+	nvme_sync_queues(&dev->ctrl);
 }
 
 static void nvme_reset_done(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_reset_ctrl_sync(&dev->ctrl);
+	nvme_reset_ctrl(&dev->ctrl);
 }
 
 static void nvme_shutdown(struct pci_dev *pdev)
--


* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
  2019-05-22 20:09   ` Keith Busch
@ 2019-05-23 21:57     ` Heitke, Kenneth
  2019-05-23 21:59       ` Keith Busch
  2019-05-24  6:45     ` Sagi Grimberg
  1 sibling, 1 reply; 7+ messages in thread
From: Heitke, Kenneth @ 2019-05-23 21:57 UTC (permalink / raw)



On 5/22/2019 2:09 PM, Keith Busch wrote:
> On Wed, May 22, 2019 at 01:26:57PM -0600, Keith Busch wrote:
>> The disable reclaims all commands, including the ones it dispatches, so
>> it sounds like you're talking about a race between the ones it dispatched
>> and its timeout work. If so, we can just make sure commands sent during
>> nvme_dev_disable never time out; those are just the delete queue commands:
>>
>> ---
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index f562154551ce..4678704c2138 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -2210,7 +2210,7 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
>>   	if (IS_ERR(req))
>>   		return PTR_ERR(req);
>>   
>> -	req->timeout = ADMIN_TIMEOUT;
>> +	req->timeout = UINT_MAX;
>>   	req->end_io_data = nvmeq;
>>   
>>   	init_completion(&nvmeq->delete_done);
>> --
> 
> I think we should do the above anyway, but it isn't going to help if
> commands dispatched outside of the disable path time out. This should fix that.
> Note, we never needed to have a sync'ed reset on reset_done(), but
> this makes it necessary.
> 
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index f562154551ce..3edb9d098eb8 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1257,13 +1257,14 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>   	struct nvme_dev *dev = nvmeq->dev;
>   	struct request *abort_req;
>   	struct nvme_command cmd;
> +	struct pci_dev *pdev = to_pci_dev(dev->dev);
>   	u32 csts = readl(dev->bar + NVME_REG_CSTS);
>   
>   	/* If PCI error recovery process is happening, we cannot reset or
>   	 * the recovery mechanism will surely fail.
>   	 */
>   	mb();
> -	if (pci_channel_offline(to_pci_dev(dev->dev)))
> +	if (pci_channel_offline(pdev) || pdev->block_cfg_access)
>   		return BLK_EH_RESET_TIMER;
>

Thanks Keith. The block_cfg_access is exactly what I was looking for.

The use case that I have is NVMe format, which can run a long time. If
an FLR occurs while the format command is pending, the FLR will be held
off while nvme_dev_disable() waits for the queues to quiesce (which
doesn't happen until the command completes or times out).
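
(A minimal userspace sketch of that scenario, assuming the standard NVMe
admin passthrough ioctl and the PCI sysfs reset attribute; the device
paths and Format parameters are placeholders, not taken from this
thread. WARNING: Format NVM destroys data.)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	struct nvme_admin_cmd cmd;
	int fd = open("/dev/nvme0", O_RDWR);	/* controller char device (placeholder) */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(&cmd, 0, sizeof(cmd));
	cmd.opcode = 0x80;		/* Format NVM */
	cmd.nsid = 1;			/* placeholder namespace */
	cmd.cdw10 = 0;			/* LBAF 0, no secure erase (placeholder) */
	cmd.timeout_ms = 600000;	/* a format can take minutes */

	/*
	 * While this blocks, triggering the FLR from another shell, e.g.
	 * "echo 1 > /sys/bus/pci/devices/0000:1c:00.0/reset", reproduces
	 * the sequence described above.
	 */
	if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0)
		perror("NVME_IOCTL_ADMIN_CMD");

	close(fd);
	return 0;
}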

>   	/*
> @@ -2782,12 +2783,13 @@ static void nvme_reset_prepare(struct pci_dev *pdev)
>   {
>   	struct nvme_dev *dev = pci_get_drvdata(pdev);
>   	nvme_dev_disable(dev, false);
> +	nvme_sync_queues(&dev->ctrl);
>   }
>   
>   static void nvme_reset_done(struct pci_dev *pdev)
>   {
>   	struct nvme_dev *dev = pci_get_drvdata(pdev);
> -	nvme_reset_ctrl_sync(&dev->ctrl);
> +	nvme_reset_ctrl(&dev->ctrl);
>   }
>   
>   static void nvme_shutdown(struct pci_dev *pdev)
> --
> 

For my specific case, are the sync_queues and reset_ctrl changes needed as
well?


* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
  2019-05-23 21:57     ` Heitke, Kenneth
@ 2019-05-23 21:59       ` Keith Busch
  0 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2019-05-23 21:59 UTC (permalink / raw)


On Thu, May 23, 2019 at 03:57:22PM -0600, Heitke, Kenneth wrote:
> On 5/22/2019 2:09 PM, Keith Busch wrote:
> > @@ -2782,12 +2783,13 @@ static void nvme_reset_prepare(struct pci_dev *pdev)
> >   {
> >   	struct nvme_dev *dev = pci_get_drvdata(pdev);
> >   	nvme_dev_disable(dev, false);
> > +	nvme_sync_queues(&dev->ctrl);
> >   }
> >   static void nvme_reset_done(struct pci_dev *pdev)
> >   {
> >   	struct nvme_dev *dev = pci_get_drvdata(pdev);
> > -	nvme_reset_ctrl_sync(&dev->ctrl);
> > +	nvme_reset_ctrl(&dev->ctrl);
> >   }
> >   static void nvme_shutdown(struct pci_dev *pdev)
> > --
> > 
> 
> For my specific case, is the sync_queues and reset_ctrl change needed as
> well?

I shouldn't have included the sync_queue part.

We definitely need the nvme_reset_ctrl change: block_cfg_access is still
set here, so the reset has to run asynchronously to unblock new timeouts.
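
(For reference, block_cfg_access covers the whole reset window because of
the ordering in the PCI core; a paraphrased sketch of pci_reset_function()
from drivers/pci/pci.c of roughly this era, locking details omitted:)

	pci_cfg_access_lock(pdev);		/* sets pdev->block_cfg_access */
	pci_dev_save_and_disable(pdev);		/* -> err_handler->reset_prepare() */
	__pci_reset_function_locked(pdev);	/* the actual FLR */
	pci_dev_restore(pdev);			/* -> err_handler->reset_done() */
	pci_cfg_access_unlock(pdev);		/* clears block_cfg_access, wakes waiters */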


* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
  2019-05-22 20:09   ` Keith Busch
  2019-05-23 21:57     ` Heitke, Kenneth
@ 2019-05-24  6:45     ` Sagi Grimberg
  2019-05-24 21:05       ` Keith Busch
  1 sibling, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2019-05-24  6:45 UTC (permalink / raw)


Keith,

> I think we should do the above anyway, but it isn't going to help if
> commands dispatched outside of the disable path time out. This should fix that.
> Note, we never needed to have a sync'ed reset on reset_done(), but
> this makes it necessary.

With an async reset in reset_done(), what guarantees that nvme_dev_disable
does not run concurrently with another instance of nvme_reset_work? Both
mangle the same queues while assuming they are not running
concurrently.
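
(For reference, the only guard on the reset side is the controller state
machine plus the single work item; a paraphrased sketch of
nvme_reset_ctrl() from drivers/nvme/host/core.c of roughly this era:)

	int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
	{
		/*
		 * The state change and the single work item keep two resets
		 * from being queued at once, but nothing here orders the
		 * running reset work against a later nvme_dev_disable()
		 * invoked from reset_prepare().
		 */
		if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
			return -EBUSY;
		if (!queue_work(nvme_reset_wq, &ctrl->reset_work))
			return -EBUSY;
		return 0;
	}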

A quick archive browse got me to:
http://lists.infradead.org/pipermail/linux-nvme/2017-December/014599.html

That was a discussion on my patch, but I think it was a side effect of
Ming's tests.


* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
  2019-05-24  6:45     ` Sagi Grimberg
@ 2019-05-24 21:05       ` Keith Busch
  0 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2019-05-24 21:05 UTC (permalink / raw)


On Thu, May 23, 2019 at 11:45:28PM -0700, Sagi Grimberg wrote:
> Keith,
> 
> > I think we should do the above anyway, but it isn't going to help if
> > commands dispatched outside disabling timeout. This should fix that.
> > Note, we never needed to have a sync'ed reset on reset_done(), but
> > this makes it necessary.
> 
> With async reset on reset_done() what guarantees that nvme_dev_disable
> does not run concurrently with another context of nvme_reset_work? both
> mangle with the same queues assuming that they are not running
> concurrently.
> 
> quick archive browse got me to:
> http://lists.infradead.org/pipermail/linux-nvme/2017-December/014599.html
> 
> discussion on my patch, but I think that it was a side effect from
> ming's tests..

Oh, you're right. I think Ming must have been triggering the PCI reset
repeatedly, in which case this proposal will have a problem coordinating.


