* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
From: Kenneth Heitke @ 2019-05-22 0:37 UTC (permalink / raw)
If an admin command timeout occurs while a PCIe reset (FLR) is
pending, the CSTS bits may not be valid, which could result in
the controller being removed.
[372337.996566] nvme nvme0: I/O 0 QID 0 timeout, reset controller
[372339.984662] nvme 0000:1c:00.0: enabling device (0000 -> 0002)
[372339.984951] nvme nvme0: Removing after probe failure status: -19
Signed-off-by: Kenneth Heitke <kenneth.heitke at intel.com>
---
drivers/nvme/host/pci.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2a8708c9ac18..aa9ea64a8b53 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -118,6 +118,8 @@ struct nvme_dev {
 	struct nvme_ctrl ctrl;
 	mempool_t *iod_mempool;
+	unsigned long flags;
+#define NVME_CTRL_PCI_RESET_PENDING 0
 	/* shadow doorbell buffer support: */
 	u32 *dbbuf_dbs;
@@ -1250,6 +1252,11 @@ static void nvme_warn_reset(struct nvme_dev *dev, u32 csts)
 			 csts, result);
 }
+static bool nvme_pci_reset_pending(const struct nvme_dev *dev)
+{
+	return !!test_bit(NVME_CTRL_PCI_RESET_PENDING, &dev->flags);
+}
+
 static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -1267,6 +1274,10 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	if (pci_channel_offline(to_pci_dev(dev->dev)))
 		return BLK_EH_RESET_TIMER;
+	/* If a PCIe reset (FLR) is pending, wait for it to complete */
+	if (nvme_pci_reset_pending(dev))
+		return BLK_EH_RESET_TIMER;
+
 	/*
 	 * Reset immediately if the controller is failed
 	 */
@@ -2780,12 +2791,14 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static void nvme_reset_prepare(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
+	set_bit(NVME_CTRL_PCI_RESET_PENDING, &dev->flags);
 	nvme_dev_disable(dev, false);
 }
 static void nvme_reset_done(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
+	clear_bit(NVME_CTRL_PCI_RESET_PENDING, &dev->flags);
 	nvme_reset_ctrl_sync(&dev->ctrl);
 }
--
2.17.1
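As a rough illustration of the gating logic in the patch above, the following userspace C sketch models the flag handling. It is not kernel code: struct model_dev, the model_* functions, and the EH_* values are hypothetical stand-ins for struct nvme_dev, the reset_prepare/reset_done callbacks, nvme_timeout(), and the blk_eh_timer_return values, and plain bit operations replace the kernel's atomic set_bit(), clear_bit(), and test_bit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; every name here is hypothetical. */
enum eh_timer_return { EH_DONE, EH_RESET_TIMER };

#define RESET_PENDING_BIT 0

struct model_dev {
	unsigned long flags;           /* bit 0: a PCIe reset (FLR) is in flight */
	bool channel_offline;          /* stands in for pci_channel_offline() */
	unsigned int resets_scheduled; /* controller resets the handler started */
};

static bool model_reset_pending(const struct model_dev *dev)
{
	assert(dev != NULL);
	return (dev->flags >> RESET_PENDING_BIT) & 1UL;
}

/* Models reset_prepare: mark the reset as pending (non-atomic in this model). */
static void model_reset_prepare(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->flags |= 1UL << RESET_PENDING_BIT;
}

/* Models reset_done: the reset is over, timeouts are handled normally again. */
static void model_reset_done(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->flags &= ~(1UL << RESET_PENDING_BIT);
}

/* Models the early exits of the timeout handler. */
static enum eh_timer_return model_timeout(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->channel_offline || model_reset_pending(dev))
		return EH_RESET_TIMER;  /* re-arm the timer, do nothing else */
	dev->resets_scheduled++;        /* normal path: escalate to a reset */
	return EH_DONE;
}
```

In this model, a timeout that fires between model_reset_prepare() and model_reset_done() only re-arms the timer; it does not schedule a controller reset.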
* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
From: Keith Busch @ 2019-05-22 19:26 UTC (permalink / raw)
On Tue, May 21, 2019 at 06:37:41PM -0600, Kenneth Heitke wrote:
> If an admin command timeout occurs while a PCIe reset (FLR) is
> pending, the CSTS bits may not be valid which could result in
> the controller being removed.
>
> [372337.996566] nvme nvme0: I/O 0 QID 0 timeout, reset controller
> [372339.984662] nvme 0000:1c:00.0: enabling device (0000 -> 0002)
> [372339.984951] nvme nvme0: Removing after probe failure status: -19
The disable reclaims all commands, including the ones it dispatches, so
it sounds like you're talking about a race between the commands it dispatched
and its timeout work. If so, we can just make sure commands sent during
nvme_dev_disable never time out; those are just the delete queue commands:
---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f562154551ce..4678704c2138 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2210,7 +2210,7 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	req->timeout = ADMIN_TIMEOUT;
+	req->timeout = UINT_MAX;
 	req->end_io_data = nvmeq;
 
 	init_completion(&nvmeq->delete_done);
--
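To put the UINT_MAX timeout in perspective, the sketch below converts a timeout in jiffies to whole days. It assumes req->timeout is counted in jiffies and that unsigned int is 32 bits wide; the helper name and the HZ values passed to it are illustrative and not taken from the thread.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: whole days covered by a timeout given in jiffies. */
static unsigned long long timeout_days(unsigned int timeout_jiffies,
				       unsigned int hz)
{
	assert(hz > 0);
	unsigned long long seconds = timeout_jiffies / hz;

	return seconds / (24ULL * 60 * 60);
}
```

Under those assumptions, timeout_days(UINT_MAX, 1000) is 49 and timeout_days(UINT_MAX, 250) is 198, so the delete queue commands would effectively never expire while the disable is in progress.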
* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
From: Keith Busch @ 2019-05-22 20:09 UTC (permalink / raw)
On Wed, May 22, 2019 at 01:26:57PM -0600, Keith Busch wrote:
> The disable reclaims all commands, including the ones it dispatches, so
> it sounds like you're talking about a race between the ones it dispatched
> and its timeout work. If so, we can just make sure commands sent during
> nvme_dev_disable never timeout, which are just the delete queue commands:
>
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index f562154551ce..4678704c2138 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2210,7 +2210,7 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - req->timeout = ADMIN_TIMEOUT;
> + req->timeout = UINT_MAX;
> req->end_io_data = nvmeq;
>
> init_completion(&nvmeq->delete_done);
> --
I think we should do the above anyway, but it isn't going to help if
commands dispatched outside of the disable path time out. This should fix that.
Note, we never needed a sync'ed reset in reset_done(), but this change
makes an asynchronous one necessary.
---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f562154551ce..3edb9d098eb8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1257,13 +1257,14 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	struct nvme_dev *dev = nvmeq->dev;
 	struct request *abort_req;
 	struct nvme_command cmd;
+	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
 	/* If PCI error recovery process is happening, we cannot reset or
 	 * the recovery mechanism will surely fail.
 	 */
 	mb();
-	if (pci_channel_offline(to_pci_dev(dev->dev)))
+	if (pci_channel_offline(pdev) || pdev->block_cfg_access)
 		return BLK_EH_RESET_TIMER;
 
 	/*
@@ -2782,12 +2783,13 @@ static void nvme_reset_prepare(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 	nvme_dev_disable(dev, false);
+	nvme_sync_queues(&dev->ctrl);
 }
 
 static void nvme_reset_done(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_reset_ctrl_sync(&dev->ctrl);
+	nvme_reset_ctrl(&dev->ctrl);
 }
 
 static void nvme_shutdown(struct pci_dev *pdev)
--
* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
From: Heitke, Kenneth @ 2019-05-23 21:57 UTC (permalink / raw)
On 5/22/2019 2:09 PM, Keith Busch wrote:
> On Wed, May 22, 2019 at 01:26:57PM -0600, Keith Busch wrote:
>> The disable reclaims all commands, including the ones it dispatches, so
>> it sounds like you're talking about a race between the ones it dispatched
>> and its timeout work. If so, we can just make sure commands sent during
>> nvme_dev_disable never timeout, which are just the delete queue commands:
>>
>> ---
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index f562154551ce..4678704c2138 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -2210,7 +2210,7 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
>> if (IS_ERR(req))
>> return PTR_ERR(req);
>>
>> - req->timeout = ADMIN_TIMEOUT;
>> + req->timeout = UINT_MAX;
>> req->end_io_data = nvmeq;
>>
>> init_completion(&nvmeq->delete_done);
>> --
>
> I think we should do the above anyway, but it isn't going to help if
> commands dispatched outside disabling timeout. This should fix that.
> Note, we never needed to have a sync'ed reset on reset_done(), but
> this makes it necessary.
>
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index f562154551ce..3edb9d098eb8 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1257,13 +1257,14 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> struct nvme_dev *dev = nvmeq->dev;
> struct request *abort_req;
> struct nvme_command cmd;
> + struct pci_dev *pdev = to_pci_dev(dev->dev);
> u32 csts = readl(dev->bar + NVME_REG_CSTS);
>
> /* If PCI error recovery process is happening, we cannot reset or
> * the recovery mechanism will surely fail.
> */
> mb();
> - if (pci_channel_offline(to_pci_dev(dev->dev)))
> + if (pci_channel_offline(pdev) || pdev->block_cfg_access)
> return BLK_EH_RESET_TIMER;
>
Thanks Keith. The block_cfg_access is exactly what I was looking for.
The use case that I have is NVMe format which can run a long time. If
the FLR occurs while the format command is pending, it will be held off
while the nvme_dev_disable() waits for the queues to quiesce (which
doesn't happen until the command completes or times out).
> /*
> @@ -2782,12 +2783,13 @@ static void nvme_reset_prepare(struct pci_dev *pdev)
> {
> struct nvme_dev *dev = pci_get_drvdata(pdev);
> nvme_dev_disable(dev, false);
> + nvme_sync_queues(&dev->ctrl);
> }
>
> static void nvme_reset_done(struct pci_dev *pdev)
> {
> struct nvme_dev *dev = pci_get_drvdata(pdev);
> - nvme_reset_ctrl_sync(&dev->ctrl);
> + nvme_reset_ctrl(&dev->ctrl);
> }
>
> static void nvme_shutdown(struct pci_dev *pdev)
> --
>
For my specific case, is the sync_queues and reset_ctrl change needed as
well?
* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
From: Keith Busch @ 2019-05-23 21:59 UTC (permalink / raw)
On Thu, May 23, 2019 at 03:57:22PM -0600, Heitke, Kenneth wrote:
> On 5/22/2019 2:09 PM, Keith Busch wrote:
> > @@ -2782,12 +2783,13 @@ static void nvme_reset_prepare(struct pci_dev *pdev)
> > {
> > struct nvme_dev *dev = pci_get_drvdata(pdev);
> > nvme_dev_disable(dev, false);
> > + nvme_sync_queues(&dev->ctrl);
> > }
> > static void nvme_reset_done(struct pci_dev *pdev)
> > {
> > struct nvme_dev *dev = pci_get_drvdata(pdev);
> > - nvme_reset_ctrl_sync(&dev->ctrl);
> > + nvme_reset_ctrl(&dev->ctrl);
> > }
> > static void nvme_shutdown(struct pci_dev *pdev)
> > --
> >
>
> For my specific case, is the sync_queues and reset_ctrl change needed as
> well?
I shouldn't have included the sync_queues part.
You definitely need the nvme_reset_ctrl change: block_cfg_access is still
set here, so the reset has to run asynchronously to unblock new timeouts.
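One way to read the point about the asynchronous reset is sketched below as a small userspace C model. It is only an interpretation under stated assumptions: block_cfg_access is assumed to stay set until the reset_done callback returns (as the message says, it is "still set here"), the stuck command is a hypothetical scenario, and every name in the model is invented for illustration. A real synchronous reset would wait indefinitely rather than give up after a tick budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_pdev {
	bool block_cfg_access; /* assumed held by the PCI core across the reset */
	bool cmd_stuck;        /* a reset-time command that never completes */
	bool reset_queued;     /* deferred reset requested from reset_done */
};

/* Timeout handler: only re-arms the timer while config access is blocked. */
static bool model_timeout(struct model_pdev *p)
{
	assert(p != NULL);
	if (p->block_cfg_access)
		return false;   /* ignored, the command stays stuck */
	p->cmd_stuck = false;   /* normal timeout handling reclaims it */
	return true;
}

/* Controller reset; the timer fires once per tick. True if it finished. */
static bool model_reset(struct model_pdev *p, int max_ticks)
{
	assert(p != NULL);
	for (int tick = 0; tick < max_ticks && p->cmd_stuck; tick++)
		model_timeout(p);
	return !p->cmd_stuck;
}

/* Synchronous reset_done: the reset runs while access is still blocked. */
static bool model_reset_done_sync(struct model_pdev *p, int max_ticks)
{
	return model_reset(p, max_ticks);
}

/* Asynchronous reset_done: only queue the work and return. */
static void model_reset_done_async(struct model_pdev *p)
{
	assert(p != NULL);
	p->reset_queued = true;
}

/* After the callback returns, the core unblocks and any queued work runs. */
static bool model_finish(struct model_pdev *p, int max_ticks)
{
	assert(p != NULL);
	p->block_cfg_access = false;
	if (!p->reset_queued)
		return !p->cmd_stuck;
	p->reset_queued = false;
	return model_reset(p, max_ticks);
}
```

In this model the synchronous variant never makes progress, because every timeout is ignored while the flag is set, whereas the queued reset runs after the flag is cleared and the timeout handler can recover the command.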
* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
From: Sagi Grimberg @ 2019-05-24 6:45 UTC (permalink / raw)
Keith,
> I think we should do the above anyway, but it isn't going to help if
> commands dispatched outside disabling timeout. This should fix that.
> Note, we never needed to have a sync'ed reset on reset_done(), but
> this makes it necessary.
With an async reset in reset_done(), what guarantees that nvme_dev_disable
does not run concurrently with another context of nvme_reset_work? Both
mangle the same queues on the assumption that they are not running
concurrently.
A quick browse of the archive got me to:
http://lists.infradead.org/pipermail/linux-nvme/2017-December/014599.html
which is a discussion on my patch, but I think that was a side effect of
Ming's tests.
* [RFC PATCH] nvme: Ignore timeouts while a PCIe reset is pending
From: Keith Busch @ 2019-05-24 21:05 UTC (permalink / raw)
On Thu, May 23, 2019 at 11:45:28PM -0700, Sagi Grimberg wrote:
> Keith,
>
> > I think we should do the above anyway, but it isn't going to help if
> > commands dispatched outside disabling timeout. This should fix that.
> > Note, we never needed to have a sync'ed reset on reset_done(), but
> > this makes it necessary.
>
> With async reset on reset_done() what guarantees that nvme_dev_disable
> does not run concurrently with another context of nvme_reset_work? both
> mangle with the same queues assuming that they are not running
> concurrently.
>
> quick archive browse got me to:
> http://lists.infradead.org/pipermail/linux-nvme/2017-December/014599.html
>
> discussion on my patch, but I think that it was a side effect from
> ming's tests..
Oh, you're right. I think Ming must have been writing to the pci reset
repeatedly, in which case this proposal will have a problem coordinating.