All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] nvme: fix reconnection stalls
@ 2020-06-04 11:56 Hannes Reinecke
  2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Hannes Reinecke @ 2020-06-04 11:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, Daniel Wagner, Sagi Grimberg, Keith Busch, Hannes Reinecke

Hi all,

during testing we have come across namespaces not being available
after reconnection. This is caused by (yet another) deadlock between
reconnect and scanning; reconnection will wait for the scan workqueue
to be flushed, but scanning cannot make progress if I/O is
outstanding, as the I/O will only be completed _after_ reconnection
completed.
The particular issue here is that nvme_revalidate_disk() might be
several I/O to be sent; the first one (IDENTIFY NAMESPACE) will be
terminated, but the subsequent ones are not.
These two patches fix this deadlock by checking the controller state
before sending the I/O.

As usual, comments and reviews are welcome.

Hannes Reinecke (2):
  nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids()
  nvme: do not update multipath disk information if the controller is
    down

 drivers/nvme/host/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.16.4


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids()
  2020-06-04 11:56 [PATCH 0/2] nvme: fix reconnection stalls Hannes Reinecke
@ 2020-06-04 11:56 ` Hannes Reinecke
  2020-06-04 13:22   ` Keith Busch
  2020-06-04 16:58   ` Sagi Grimberg
  2020-06-04 11:56 ` [PATCH 2/2] nvme: do not update multipath disk information if the controller is down Hannes Reinecke
  2020-06-04 16:53 ` [PATCH 0/2] nvme: fix reconnection stalls Sagi Grimberg
  2 siblings, 2 replies; 10+ messages in thread
From: Hannes Reinecke @ 2020-06-04 11:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, Daniel Wagner, Sagi Grimberg, Keith Busch, Hannes Reinecke

When a controller reset happens during scanning nvme_identify_ns()
will be aborted, but the subsequent call to nvme_identify_ns_descs()
will stall as it will only be completed once the controller reconnect
is finished.
But as the reconnect waits for scanning to complete the particular
namespace will deadlock and never reconnected again.

Reported-by: Martin George <martin.george@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 569671e264b5..b811787505f7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1792,7 +1792,7 @@ static int nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
 		memcpy(ids->eui64, id->eui64, sizeof(id->eui64));
 	if (ctrl->vs >= NVME_VS(1, 2, 0))
 		memcpy(ids->nguid, id->nguid, sizeof(id->nguid));
-	if (ctrl->vs >= NVME_VS(1, 3, 0))
+	if (ctrl->vs >= NVME_VS(1, 3, 0) && ctrl->state == NVME_CTRL_LIVE)
 		return nvme_identify_ns_descs(ctrl, nsid, ids);
 	return 0;
 }
-- 
2.16.4


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: do not update multipath disk information if the controller is down
  2020-06-04 11:56 [PATCH 0/2] nvme: fix reconnection stalls Hannes Reinecke
  2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke
@ 2020-06-04 11:56 ` Hannes Reinecke
  2020-06-04 16:53 ` [PATCH 0/2] nvme: fix reconnection stalls Sagi Grimberg
  2 siblings, 0 replies; 10+ messages in thread
From: Hannes Reinecke @ 2020-06-04 11:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, Daniel Wagner, Sagi Grimberg, Keith Busch, Hannes Reinecke

When nvme_revalidate_disk() is called while the controller is not live
we need to avoid revalidating the multipath head device, as any I/O
sent to the multipath device might not be possible and we would deadlock
trying to flush the scan thread during reconnect.

Reported-by: Martin George <martin.george@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b811787505f7..4b5bca5f2e4f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1971,7 +1971,7 @@ static int __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 		blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(iob));
 	nvme_update_disk_info(disk, ns, id);
 #ifdef CONFIG_NVME_MULTIPATH
-	if (ns->head->disk) {
+	if (ns->head->disk && ctrl->state == NVME_CTRL_LIVE) {
 		nvme_update_disk_info(ns->head->disk, ns, id);
 		blk_queue_stack_limits(ns->head->disk->queue, ns->queue);
 		revalidate_disk(ns->head->disk);
-- 
2.16.4


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids()
  2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke
@ 2020-06-04 13:22   ` Keith Busch
  2020-06-04 13:48     ` Hannes Reinecke
  2020-06-04 16:58   ` Sagi Grimberg
  1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2020-06-04 13:22 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Keith Busch, Daniel Wagner, Christoph Hellwig, linux-nvme, Sagi Grimberg

On Thu, Jun 04, 2020 at 01:56:01PM +0200, Hannes Reinecke wrote:
> When a controller reset happens during scanning nvme_identify_ns()
> will be aborted, but the subsequent call to nvme_identify_ns_descs()
> will stall as it will only be completed once the controller reconnect
> is finished.

If nvme_identify_ns() is aborted, shouldn't we not proceed to the next
command? It looks like the code already does that, from nvme_alloc_ns():

	ret = nvme_identify_ns(ctrl, nsid, &id);
	if (ret)
		goto out_free_queue;

And then from nvme_revalidate_disk():

	ret = nvme_identify_ns(ctrl, ns->head->ns_id, &id);
	if (ret)
		goto out;

Those are the only two paths to nvme_identify_ns_descs(), so it looks
like nvme_identify_ns() must be successful in order to get there.

> But as the reconnect waits for scanning to complete the particular
> namespace will deadlock and never reconnected again.

The fix looks a bit fragile. What if the controller is reset immediately
after the check for live? It'd be safer such that reconnect didn't have
to wait for scan_work, no?  The pci transport has no such dependency,
for example.

> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 569671e264b5..b811787505f7 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1792,7 +1792,7 @@ static int nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
>  		memcpy(ids->eui64, id->eui64, sizeof(id->eui64));
>  	if (ctrl->vs >= NVME_VS(1, 2, 0))
>  		memcpy(ids->nguid, id->nguid, sizeof(id->nguid));
> -	if (ctrl->vs >= NVME_VS(1, 3, 0))
> +	if (ctrl->vs >= NVME_VS(1, 3, 0) && ctrl->state == NVME_CTRL_LIVE)
>  		return nvme_identify_ns_descs(ctrl, nsid, ids);
>  	return 0;
>  }

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids()
  2020-06-04 13:22   ` Keith Busch
@ 2020-06-04 13:48     ` Hannes Reinecke
  2020-06-04 13:57       ` Keith Busch
  0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2020-06-04 13:48 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, Daniel Wagner, Christoph Hellwig, linux-nvme, Sagi Grimberg

On 6/4/20 3:22 PM, Keith Busch wrote:
> On Thu, Jun 04, 2020 at 01:56:01PM +0200, Hannes Reinecke wrote:
>> When a controller reset happens during scanning nvme_identify_ns()
>> will be aborted, but the subsequent call to nvme_identify_ns_descs()
>> will stall as it will only be completed once the controller reconnect
>> is finished.
> 
> If nvme_identify_ns() is aborted, shouldn't we not proceed to the next
> command? It looks like the code already does that, from nvme_alloc_ns():
> 
> 	ret = nvme_identify_ns(ctrl, nsid, &id);
> 	if (ret)
> 		goto out_free_queue;
> 
> And then from nvme_revalidate_disk():
> 
> 	ret = nvme_identify_ns(ctrl, ns->head->ns_id, &id);
> 	if (ret)
> 		goto out;
> 
> Those are the only two paths to nvme_identify_ns_descs(), so it looks
> like nvme_identify_ns() must be successful in order to get there.
> 

True. But if the controller is entering reset juat _after_ the call to 
nvme_identify_ns() there won't be any I/O to abort, and the scanning 
code will proceed to call nvme_report_ns_ids() exposing this issue.

>> But as the reconnect waits for scanning to complete the particular
>> namespace will deadlock and never reconnected again.
> 
> The fix looks a bit fragile. What if the controller is reset immediately
> after the check for live? It'd be safer such that reconnect didn't have
> to wait for scan_work, no?  The pci transport has no such dependency,
> for example.
> 
Oh, that wouldn't matter; once the controller is reset (ie enters a 
non-LIVE state) all outstanding I/O is aborted.

The point here is that only _outstanding_ I/O is aborted.
If the scan thread continues sending I/O after that things stall.

And I can't really see how we should be able to decouple the scan thread 
from reset; both will be modifying the namespace state (or even 
presence), so we have to flush one of them to avoid us having to lock 
each and every modification and killing performance.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids()
  2020-06-04 13:48     ` Hannes Reinecke
@ 2020-06-04 13:57       ` Keith Busch
  2020-06-04 15:38         ` Hannes Reinecke
  0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2020-06-04 13:57 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Keith Busch, Daniel Wagner, Christoph Hellwig, linux-nvme, Sagi Grimberg

On Thu, Jun 04, 2020 at 03:48:02PM +0200, Hannes Reinecke wrote:
> On 6/4/20 3:22 PM, Keith Busch wrote:
> > On Thu, Jun 04, 2020 at 01:56:01PM +0200, Hannes Reinecke wrote:
> > > When a controller reset happens during scanning nvme_identify_ns()
> > > will be aborted, but the subsequent call to nvme_identify_ns_descs()
> > > will stall as it will only be completed once the controller reconnect
> > > is finished.
> > 
> > If nvme_identify_ns() is aborted, shouldn't we not proceed to the next
> > command? It looks like the code already does that, from nvme_alloc_ns():
> > 
> > 	ret = nvme_identify_ns(ctrl, nsid, &id);
> > 	if (ret)
> > 		goto out_free_queue;
> > 
> > And then from nvme_revalidate_disk():
> > 
> > 	ret = nvme_identify_ns(ctrl, ns->head->ns_id, &id);
> > 	if (ret)
> > 		goto out;
> > 
> > Those are the only two paths to nvme_identify_ns_descs(), so it looks
> > like nvme_identify_ns() must be successful in order to get there.
> > 
> 
> True. But if the controller is entering reset juat _after_ the call to
> nvme_identify_ns() there won't be any I/O to abort, and the scanning code
> will proceed to call nvme_report_ns_ids() exposing this issue.
> 
> > > But as the reconnect waits for scanning to complete the particular
> > > namespace will deadlock and never reconnected again.
> > 
> > The fix looks a bit fragile. What if the controller is reset immediately
> > after the check for live? It'd be safer such that reconnect didn't have
> > to wait for scan_work, no?  The pci transport has no such dependency,
> > for example.
> > 
> Oh, that wouldn't matter; once the controller is reset (ie enters a non-LIVE
> state) all outstanding I/O is aborted.
> 
> The point here is that only _outstanding_ I/O is aborted.
> If the scan thread continues sending I/O after that things stall.

My concern is if the controller is LIVE at the moment of your new check,
but is reset immediately after that, and before the identify command is
sent. The identify won't be aborted since it hasn't been dispatched yet.
Would that get everything stuck again?

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids()
  2020-06-04 13:57       ` Keith Busch
@ 2020-06-04 15:38         ` Hannes Reinecke
  0 siblings, 0 replies; 10+ messages in thread
From: Hannes Reinecke @ 2020-06-04 15:38 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, Daniel Wagner, Christoph Hellwig, linux-nvme, Sagi Grimberg

On 6/4/20 3:57 PM, Keith Busch wrote:
> On Thu, Jun 04, 2020 at 03:48:02PM +0200, Hannes Reinecke wrote:
>> On 6/4/20 3:22 PM, Keith Busch wrote:
>>> On Thu, Jun 04, 2020 at 01:56:01PM +0200, Hannes Reinecke wrote:
>>>> When a controller reset happens during scanning nvme_identify_ns()
>>>> will be aborted, but the subsequent call to nvme_identify_ns_descs()
>>>> will stall as it will only be completed once the controller reconnect
>>>> is finished.
>>>
>>> If nvme_identify_ns() is aborted, shouldn't we not proceed to the next
>>> command? It looks like the code already does that, from nvme_alloc_ns():
>>>
>>> 	ret = nvme_identify_ns(ctrl, nsid, &id);
>>> 	if (ret)
>>> 		goto out_free_queue;
>>>
>>> And then from nvme_revalidate_disk():
>>>
>>> 	ret = nvme_identify_ns(ctrl, ns->head->ns_id, &id);
>>> 	if (ret)
>>> 		goto out;
>>>
>>> Those are the only two paths to nvme_identify_ns_descs(), so it looks
>>> like nvme_identify_ns() must be successful in order to get there.
>>>
>>
>> True. But if the controller is entering reset juat _after_ the call to
>> nvme_identify_ns() there won't be any I/O to abort, and the scanning code
>> will proceed to call nvme_report_ns_ids() exposing this issue.
>>
>>>> But as the reconnect waits for scanning to complete the particular
>>>> namespace will deadlock and never reconnected again.
>>>
>>> The fix looks a bit fragile. What if the controller is reset immediately
>>> after the check for live? It'd be safer such that reconnect didn't have
>>> to wait for scan_work, no?  The pci transport has no such dependency,
>>> for example.
>>>
>> Oh, that wouldn't matter; once the controller is reset (ie enters a non-LIVE
>> state) all outstanding I/O is aborted.
>>
>> The point here is that only _outstanding_ I/O is aborted.
>> If the scan thread continues sending I/O after that things stall.
> 
> My concern is if the controller is LIVE at the moment of your new check,
> but is reset immediately after that, and before the identify command is
> sent. The identify won't be aborted since it hasn't been dispatched yet.
> Would that get everything stuck again?
> 
Yes, but that's quite unlikely.
Or, to be precise, the likelyhood of anything triggering the state 
change after that test is highly unlikely, as there is nothing which 
could have triggered it. Remember, we are currently scanning the same 
namespace for which a fault has been detected. Which means that there 
cannot any other I/O being sent to this namespace (as it's not visible 
yet to userland) but the I/O generated during scanning itself.
And the state change will be happening during endio handling, so by the 
time we're checking the state any state change has already happened.

While there might bo other namespaces using the same physical 
connection, and these namespaces might develop a fault, too, this event 
would then signalled via AENs, and would handled in a completely 
different thread. So this scenario shouldn't matter for this particular 
namespace.

So I can't see how the state change should occur after that test.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] nvme: fix reconnection stalls
  2020-06-04 11:56 [PATCH 0/2] nvme: fix reconnection stalls Hannes Reinecke
  2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke
  2020-06-04 11:56 ` [PATCH 2/2] nvme: do not update multipath disk information if the controller is down Hannes Reinecke
@ 2020-06-04 16:53 ` Sagi Grimberg
  2 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2020-06-04 16:53 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, Daniel Wagner, linux-nvme


> Hi all,

And here we go again.. :)

> during testing we have come across namespaces not being available
> after reconnection. This is caused by (yet another) deadlock between
> reconnect and scanning; reconnection will wait for the scan workqueue
> to be flushed,

I have a deja vu moment here, but where exactly the reconnection waits
for the scan work to complete?

> but scanning cannot make progress if I/O is
> outstanding, as the I/O will only be completed _after_ reconnection
> completed.

Where is there a dependency for I/O complete only after reconnection?

> The particular issue here is that nvme_revalidate_disk() might be
> several I/O to be sent; the first one (IDENTIFY NAMESPACE) will be
> terminated, but the subsequent ones are not.

Why aren't they? we unquiesce the admin queue after we tear it down.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids()
  2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke
  2020-06-04 13:22   ` Keith Busch
@ 2020-06-04 16:58   ` Sagi Grimberg
  2020-06-05  6:20     ` Hannes Reinecke
  1 sibling, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2020-06-04 16:58 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, Daniel Wagner, linux-nvme


> When a controller reset happens during scanning nvme_identify_ns()
> will be aborted, but the subsequent call to nvme_identify_ns_descs()
> will stall as it will only be completed once the controller reconnect
> is finished.
> But as the reconnect waits for scanning to complete the particular
> namespace will deadlock and never reconnected again.
> 
> Reported-by: Martin George <martin.george@netapp.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>   drivers/nvme/host/core.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 569671e264b5..b811787505f7 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1792,7 +1792,7 @@ static int nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
>   		memcpy(ids->eui64, id->eui64, sizeof(id->eui64));
>   	if (ctrl->vs >= NVME_VS(1, 2, 0))
>   		memcpy(ids->nguid, id->nguid, sizeof(id->nguid));
> -	if (ctrl->vs >= NVME_VS(1, 3, 0))
> +	if (ctrl->vs >= NVME_VS(1, 3, 0) && ctrl->state == NVME_CTRL_LIVE)

I'm slightly allergic to these sort of state checks so untrivially placed...

Hannes, did you have the below applied when reproducing this?
--
commit 59c7c3caaaf8750df4ec3255082f15eb4e371514 (tag: block-5.7-2020-05-09)
Author: Sagi Grimberg <sagi@grimberg.me>
Date:   Wed May 6 15:44:02 2020 -0700

     nvme: fix possible hang when ns scanning fails during error recovery

     When the controller is reconnecting, the host fails I/O and admin
     commands as the host cannot reach the controller. ns scanning may
     revalidate namespaces during that period and it is wrong to remove
     namespaces due to these failures as we may hang (see 205da2434301).

     One command that may fail is nvme_identify_ns_descs. Since we return
     success due to having ns identify descriptor list optional, we continue
     to compare ns identifiers in nvme_revalidate_disk, obviously fail and
     return -ENODEV to nvme_validate_ns, which will remove the namespace.

     Exactly what we don't want to happen.

     Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list 
is optional")
     Tested-by: Anton Eidelman <anton@lightbitslabs.com>
     Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
     Reviewed-by: Keith Busch <kbusch@kernel.org>
     Signed-off-by: Christoph Hellwig <hch@lst.de>
     Signed-off-by: Jens Axboe <axboe@kernel.dk>
--

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids()
  2020-06-04 16:58   ` Sagi Grimberg
@ 2020-06-05  6:20     ` Hannes Reinecke
  0 siblings, 0 replies; 10+ messages in thread
From: Hannes Reinecke @ 2020-06-05  6:20 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: Keith Busch, Daniel Wagner, linux-nvme

On 6/4/20 6:58 PM, Sagi Grimberg wrote:
> 
>> When a controller reset happens during scanning nvme_identify_ns()
>> will be aborted, but the subsequent call to nvme_identify_ns_descs()
>> will stall as it will only be completed once the controller reconnect
>> is finished.
>> But as the reconnect waits for scanning to complete the particular
>> namespace will deadlock and never reconnected again.
>>
>> Reported-by: Martin George <martin.george@netapp.com>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/nvme/host/core.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 569671e264b5..b811787505f7 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1792,7 +1792,7 @@ static int nvme_report_ns_ids(struct nvme_ctrl 
>> *ctrl, unsigned int nsid,
>>           memcpy(ids->eui64, id->eui64, sizeof(id->eui64));
>>       if (ctrl->vs >= NVME_VS(1, 2, 0))
>>           memcpy(ids->nguid, id->nguid, sizeof(id->nguid));
>> -    if (ctrl->vs >= NVME_VS(1, 3, 0))
>> +    if (ctrl->vs >= NVME_VS(1, 3, 0) && ctrl->state == NVME_CTRL_LIVE)
> 
> I'm slightly allergic to these sort of state checks so untrivially 
> placed...
> 
> Hannes, did you have the below applied when reproducing this?
> -- 
> commit 59c7c3caaaf8750df4ec3255082f15eb4e371514 (tag: block-5.7-2020-05-09)
> Author: Sagi Grimberg <sagi@grimberg.me>
> Date:   Wed May 6 15:44:02 2020 -0700
> 
>      nvme: fix possible hang when ns scanning fails during error recovery
> 
>      When the controller is reconnecting, the host fails I/O and admin
>      commands as the host cannot reach the controller. ns scanning may
>      revalidate namespaces during that period and it is wrong to remove
>      namespaces due to these failures as we may hang (see 205da2434301).
> 
>      One command that may fail is nvme_identify_ns_descs. Since we return
>      success due to having ns identify descriptor list optional, we 
> continue
>      to compare ns identifiers in nvme_revalidate_disk, obviously fail and
>      return -ENODEV to nvme_validate_ns, which will remove the namespace.
> 
>      Exactly what we don't want to happen.
> 
>      Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list 
> is optional")
>      Tested-by: Anton Eidelman <anton@lightbitslabs.com>
>      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
>      Reviewed-by: Keith Busch <kbusch@kernel.org>
>      Signed-off-by: Christoph Hellwig <hch@lst.de>
>      Signed-off-by: Jens Axboe <axboe@kernel.dk>
> -- 
Ah. No, indeed I haven't.
(The joys of running a downstream kernel).

Will be retesting with that.
Thanks for the pointer.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-06-05  6:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-04 11:56 [PATCH 0/2] nvme: fix reconnection stalls Hannes Reinecke
2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke
2020-06-04 13:22   ` Keith Busch
2020-06-04 13:48     ` Hannes Reinecke
2020-06-04 13:57       ` Keith Busch
2020-06-04 15:38         ` Hannes Reinecke
2020-06-04 16:58   ` Sagi Grimberg
2020-06-05  6:20     ` Hannes Reinecke
2020-06-04 11:56 ` [PATCH 2/2] nvme: do not update multipath disk information if the controller is down Hannes Reinecke
2020-06-04 16:53 ` [PATCH 0/2] nvme: fix reconnection stalls Sagi Grimberg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.