* [PATCH 0/2] nvme: fix reconnection stalls @ 2020-06-04 11:56 Hannes Reinecke 2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Hannes Reinecke @ 2020-06-04 11:56 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-nvme, Daniel Wagner, Sagi Grimberg, Keith Busch, Hannes Reinecke Hi all, during testing we have come across namespaces not being available after reconnection. This is caused by (yet another) deadlock between reconnect and scanning; reconnection will wait for the scan workqueue to be flushed, but scanning cannot make progress if I/O is outstanding, as the I/O will only be completed _after_ reconnection completed. The particular issue here is that nvme_revalidate_disk() might be several I/O to be sent; the first one (IDENTIFY NAMESPACE) will be terminated, but the subsequent ones are not. These two patches fix this deadlock by checking the controller state before sending the I/O. As usual, comments and reviews are welcome. Hannes Reinecke (2): nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() nvme: do not update multipath disk information if the controller is down drivers/nvme/host/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- 2.16.4 _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() 2020-06-04 11:56 [PATCH 0/2] nvme: fix reconnection stalls Hannes Reinecke @ 2020-06-04 11:56 ` Hannes Reinecke 2020-06-04 13:22 ` Keith Busch 2020-06-04 16:58 ` Sagi Grimberg 2020-06-04 11:56 ` [PATCH 2/2] nvme: do not update multipath disk information if the controller is down Hannes Reinecke 2020-06-04 16:53 ` [PATCH 0/2] nvme: fix reconnection stalls Sagi Grimberg 2 siblings, 2 replies; 10+ messages in thread From: Hannes Reinecke @ 2020-06-04 11:56 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-nvme, Daniel Wagner, Sagi Grimberg, Keith Busch, Hannes Reinecke When a controller reset happens during scanning nvme_identify_ns() will be aborted, but the subsequent call to nvme_identify_ns_descs() will stall as it will only be completed once the controller reconnect is finished. But as the reconnect waits for scanning to complete the particular namespace will deadlock and never reconnected again. Reported-by: Martin George <martin.george@netapp.com> Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/nvme/host/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 569671e264b5..b811787505f7 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1792,7 +1792,7 @@ static int nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid, memcpy(ids->eui64, id->eui64, sizeof(id->eui64)); if (ctrl->vs >= NVME_VS(1, 2, 0)) memcpy(ids->nguid, id->nguid, sizeof(id->nguid)); - if (ctrl->vs >= NVME_VS(1, 3, 0)) + if (ctrl->vs >= NVME_VS(1, 3, 0) && ctrl->state == NVME_CTRL_LIVE) return nvme_identify_ns_descs(ctrl, nsid, ids); return 0; } -- 2.16.4 _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() 2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke @ 2020-06-04 13:22 ` Keith Busch 2020-06-04 13:48 ` Hannes Reinecke 2020-06-04 16:58 ` Sagi Grimberg 1 sibling, 1 reply; 10+ messages in thread From: Keith Busch @ 2020-06-04 13:22 UTC (permalink / raw) To: Hannes Reinecke Cc: Keith Busch, Daniel Wagner, Christoph Hellwig, linux-nvme, Sagi Grimberg On Thu, Jun 04, 2020 at 01:56:01PM +0200, Hannes Reinecke wrote: > When a controller reset happens during scanning nvme_identify_ns() > will be aborted, but the subsequent call to nvme_identify_ns_descs() > will stall as it will only be completed once the controller reconnect > is finished. If nvme_identify_ns() is aborted, shouldn't we not proceed to the next command? It looks like the code already does that, from nvme_alloc_ns(): ret = nvme_identify_ns(ctrl, nsid, &id); if (ret) goto out_free_queue; And then from nvme_revalidate_disk(): ret = nvme_identify_ns(ctrl, ns->head->ns_id, &id); if (ret) goto out; Those are the only two paths to nvme_identify_ns_descs(), so it looks like nvme_identify_ns() must be successful in order to get there. > But as the reconnect waits for scanning to complete the particular > namespace will deadlock and never reconnected again. The fix looks a bit fragile. What if the controller is reset immediately after the check for live? It'd be safer such that reconnect didn't have to wait for scan_work, no? The pci transport has no such dependency, for example. > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 569671e264b5..b811787505f7 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -1792,7 +1792,7 @@ static int nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid, > memcpy(ids->eui64, id->eui64, sizeof(id->eui64)); > if (ctrl->vs >= NVME_VS(1, 2, 0)) > memcpy(ids->nguid, id->nguid, sizeof(id->nguid)); > - if (ctrl->vs >= NVME_VS(1, 3, 0)) > + if (ctrl->vs >= NVME_VS(1, 3, 0) && ctrl->state == NVME_CTRL_LIVE) > return nvme_identify_ns_descs(ctrl, nsid, ids); > return 0; > } _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() 2020-06-04 13:22 ` Keith Busch @ 2020-06-04 13:48 ` Hannes Reinecke 2020-06-04 13:57 ` Keith Busch 0 siblings, 1 reply; 10+ messages in thread From: Hannes Reinecke @ 2020-06-04 13:48 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, Daniel Wagner, Christoph Hellwig, linux-nvme, Sagi Grimberg On 6/4/20 3:22 PM, Keith Busch wrote: > On Thu, Jun 04, 2020 at 01:56:01PM +0200, Hannes Reinecke wrote: >> When a controller reset happens during scanning nvme_identify_ns() >> will be aborted, but the subsequent call to nvme_identify_ns_descs() >> will stall as it will only be completed once the controller reconnect >> is finished. > > If nvme_identify_ns() is aborted, shouldn't we not proceed to the next > command? It looks like the code already does that, from nvme_alloc_ns(): > > ret = nvme_identify_ns(ctrl, nsid, &id); > if (ret) > goto out_free_queue; > > And then from nvme_revalidate_disk(): > > ret = nvme_identify_ns(ctrl, ns->head->ns_id, &id); > if (ret) > goto out; > > Those are the only two paths to nvme_identify_ns_descs(), so it looks > like nvme_identify_ns() must be successful in order to get there. > True. But if the controller is entering reset juat _after_ the call to nvme_identify_ns() there won't be any I/O to abort, and the scanning code will proceed to call nvme_report_ns_ids() exposing this issue. >> But as the reconnect waits for scanning to complete the particular >> namespace will deadlock and never reconnected again. > > The fix looks a bit fragile. What if the controller is reset immediately > after the check for live? It'd be safer such that reconnect didn't have > to wait for scan_work, no? The pci transport has no such dependency, > for example. > Oh, that wouldn't matter; once the controller is reset (ie enters a non-LIVE state) all outstanding I/O is aborted. The point here is that only _outstanding_ I/O is aborted. If the scan thread continues sending I/O after that things stall. And I can't really see how we should be able to decouple the scan thread from reset; both will be modifying the namespace state (or even presence), so we have to flush one of them to avoid us having to lock each and every modification and killing performance. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() 2020-06-04 13:48 ` Hannes Reinecke @ 2020-06-04 13:57 ` Keith Busch 2020-06-04 15:38 ` Hannes Reinecke 0 siblings, 1 reply; 10+ messages in thread From: Keith Busch @ 2020-06-04 13:57 UTC (permalink / raw) To: Hannes Reinecke Cc: Keith Busch, Daniel Wagner, Christoph Hellwig, linux-nvme, Sagi Grimberg On Thu, Jun 04, 2020 at 03:48:02PM +0200, Hannes Reinecke wrote: > On 6/4/20 3:22 PM, Keith Busch wrote: > > On Thu, Jun 04, 2020 at 01:56:01PM +0200, Hannes Reinecke wrote: > > > When a controller reset happens during scanning nvme_identify_ns() > > > will be aborted, but the subsequent call to nvme_identify_ns_descs() > > > will stall as it will only be completed once the controller reconnect > > > is finished. > > > > If nvme_identify_ns() is aborted, shouldn't we not proceed to the next > > command? It looks like the code already does that, from nvme_alloc_ns(): > > > > ret = nvme_identify_ns(ctrl, nsid, &id); > > if (ret) > > goto out_free_queue; > > > > And then from nvme_revalidate_disk(): > > > > ret = nvme_identify_ns(ctrl, ns->head->ns_id, &id); > > if (ret) > > goto out; > > > > Those are the only two paths to nvme_identify_ns_descs(), so it looks > > like nvme_identify_ns() must be successful in order to get there. > > > > True. But if the controller is entering reset juat _after_ the call to > nvme_identify_ns() there won't be any I/O to abort, and the scanning code > will proceed to call nvme_report_ns_ids() exposing this issue. > > > > But as the reconnect waits for scanning to complete the particular > > > namespace will deadlock and never reconnected again. > > > > The fix looks a bit fragile. What if the controller is reset immediately > > after the check for live? It'd be safer such that reconnect didn't have > > to wait for scan_work, no? The pci transport has no such dependency, > > for example. > > > Oh, that wouldn't matter; once the controller is reset (ie enters a non-LIVE > state) all outstanding I/O is aborted. > > The point here is that only _outstanding_ I/O is aborted. > If the scan thread continues sending I/O after that things stall. My concern is if the controller is LIVE at the moment of your new check, but is reset immediately after that, and before the identify command is sent. The identify won't be aborted since it hasn't been dispatched yet. Would that get everything stuck again? _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() 2020-06-04 13:57 ` Keith Busch @ 2020-06-04 15:38 ` Hannes Reinecke 0 siblings, 0 replies; 10+ messages in thread From: Hannes Reinecke @ 2020-06-04 15:38 UTC (permalink / raw) To: Keith Busch Cc: Keith Busch, Daniel Wagner, Christoph Hellwig, linux-nvme, Sagi Grimberg On 6/4/20 3:57 PM, Keith Busch wrote: > On Thu, Jun 04, 2020 at 03:48:02PM +0200, Hannes Reinecke wrote: >> On 6/4/20 3:22 PM, Keith Busch wrote: >>> On Thu, Jun 04, 2020 at 01:56:01PM +0200, Hannes Reinecke wrote: >>>> When a controller reset happens during scanning nvme_identify_ns() >>>> will be aborted, but the subsequent call to nvme_identify_ns_descs() >>>> will stall as it will only be completed once the controller reconnect >>>> is finished. >>> >>> If nvme_identify_ns() is aborted, shouldn't we not proceed to the next >>> command? It looks like the code already does that, from nvme_alloc_ns(): >>> >>> ret = nvme_identify_ns(ctrl, nsid, &id); >>> if (ret) >>> goto out_free_queue; >>> >>> And then from nvme_revalidate_disk(): >>> >>> ret = nvme_identify_ns(ctrl, ns->head->ns_id, &id); >>> if (ret) >>> goto out; >>> >>> Those are the only two paths to nvme_identify_ns_descs(), so it looks >>> like nvme_identify_ns() must be successful in order to get there. >>> >> >> True. But if the controller is entering reset juat _after_ the call to >> nvme_identify_ns() there won't be any I/O to abort, and the scanning code >> will proceed to call nvme_report_ns_ids() exposing this issue. >> >>>> But as the reconnect waits for scanning to complete the particular >>>> namespace will deadlock and never reconnected again. >>> >>> The fix looks a bit fragile. What if the controller is reset immediately >>> after the check for live? It'd be safer such that reconnect didn't have >>> to wait for scan_work, no? The pci transport has no such dependency, >>> for example. >>> >> Oh, that wouldn't matter; once the controller is reset (ie enters a non-LIVE >> state) all outstanding I/O is aborted. >> >> The point here is that only _outstanding_ I/O is aborted. >> If the scan thread continues sending I/O after that things stall. > > My concern is if the controller is LIVE at the moment of your new check, > but is reset immediately after that, and before the identify command is > sent. The identify won't be aborted since it hasn't been dispatched yet. > Would that get everything stuck again? > Yes, but that's quite unlikely. Or, to be precise, the likelyhood of anything triggering the state change after that test is highly unlikely, as there is nothing which could have triggered it. Remember, we are currently scanning the same namespace for which a fault has been detected. Which means that there cannot any other I/O being sent to this namespace (as it's not visible yet to userland) but the I/O generated during scanning itself. And the state change will be happening during endio handling, so by the time we're checking the state any state change has already happened. While there might bo other namespaces using the same physical connection, and these namespaces might develop a fault, too, this event would then signalled via AENs, and would handled in a completely different thread. So this scenario shouldn't matter for this particular namespace. So I can't see how the state change should occur after that test. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() 2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke 2020-06-04 13:22 ` Keith Busch @ 2020-06-04 16:58 ` Sagi Grimberg 2020-06-05 6:20 ` Hannes Reinecke 1 sibling, 1 reply; 10+ messages in thread From: Sagi Grimberg @ 2020-06-04 16:58 UTC (permalink / raw) To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, Daniel Wagner, linux-nvme > When a controller reset happens during scanning nvme_identify_ns() > will be aborted, but the subsequent call to nvme_identify_ns_descs() > will stall as it will only be completed once the controller reconnect > is finished. > But as the reconnect waits for scanning to complete the particular > namespace will deadlock and never reconnected again. > > Reported-by: Martin George <martin.george@netapp.com> > Signed-off-by: Hannes Reinecke <hare@suse.de> > --- > drivers/nvme/host/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 569671e264b5..b811787505f7 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -1792,7 +1792,7 @@ static int nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid, > memcpy(ids->eui64, id->eui64, sizeof(id->eui64)); > if (ctrl->vs >= NVME_VS(1, 2, 0)) > memcpy(ids->nguid, id->nguid, sizeof(id->nguid)); > - if (ctrl->vs >= NVME_VS(1, 3, 0)) > + if (ctrl->vs >= NVME_VS(1, 3, 0) && ctrl->state == NVME_CTRL_LIVE) I'm slightly allergic to these sort of state checks so untrivially placed... Hannes, did you have the below applied when reproducing this? -- commit 59c7c3caaaf8750df4ec3255082f15eb4e371514 (tag: block-5.7-2020-05-09) Author: Sagi Grimberg <sagi@grimberg.me> Date: Wed May 6 15:44:02 2020 -0700 nvme: fix possible hang when ns scanning fails during error recovery When the controller is reconnecting, the host fails I/O and admin commands as the host cannot reach the controller. ns scanning may revalidate namespaces during that period and it is wrong to remove namespaces due to these failures as we may hang (see 205da2434301). One command that may fail is nvme_identify_ns_descs. Since we return success due to having ns identify descriptor list optional, we continue to compare ns identifiers in nvme_revalidate_disk, obviously fail and return -ENODEV to nvme_validate_ns, which will remove the namespace. Exactly what we don't want to happen. Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list is optional") Tested-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> -- _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() 2020-06-04 16:58 ` Sagi Grimberg @ 2020-06-05 6:20 ` Hannes Reinecke 0 siblings, 0 replies; 10+ messages in thread From: Hannes Reinecke @ 2020-06-05 6:20 UTC (permalink / raw) To: Sagi Grimberg, Christoph Hellwig; +Cc: Keith Busch, Daniel Wagner, linux-nvme On 6/4/20 6:58 PM, Sagi Grimberg wrote: > >> When a controller reset happens during scanning nvme_identify_ns() >> will be aborted, but the subsequent call to nvme_identify_ns_descs() >> will stall as it will only be completed once the controller reconnect >> is finished. >> But as the reconnect waits for scanning to complete the particular >> namespace will deadlock and never reconnected again. >> >> Reported-by: Martin George <martin.george@netapp.com> >> Signed-off-by: Hannes Reinecke <hare@suse.de> >> --- >> drivers/nvme/host/core.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c >> index 569671e264b5..b811787505f7 100644 >> --- a/drivers/nvme/host/core.c >> +++ b/drivers/nvme/host/core.c >> @@ -1792,7 +1792,7 @@ static int nvme_report_ns_ids(struct nvme_ctrl >> *ctrl, unsigned int nsid, >> memcpy(ids->eui64, id->eui64, sizeof(id->eui64)); >> if (ctrl->vs >= NVME_VS(1, 2, 0)) >> memcpy(ids->nguid, id->nguid, sizeof(id->nguid)); >> - if (ctrl->vs >= NVME_VS(1, 3, 0)) >> + if (ctrl->vs >= NVME_VS(1, 3, 0) && ctrl->state == NVME_CTRL_LIVE) > > I'm slightly allergic to these sort of state checks so untrivially > placed... > > Hannes, did you have the below applied when reproducing this? > -- > commit 59c7c3caaaf8750df4ec3255082f15eb4e371514 (tag: block-5.7-2020-05-09) > Author: Sagi Grimberg <sagi@grimberg.me> > Date: Wed May 6 15:44:02 2020 -0700 > > nvme: fix possible hang when ns scanning fails during error recovery > > When the controller is reconnecting, the host fails I/O and admin > commands as the host cannot reach the controller. ns scanning may > revalidate namespaces during that period and it is wrong to remove > namespaces due to these failures as we may hang (see 205da2434301). > > One command that may fail is nvme_identify_ns_descs. Since we return > success due to having ns identify descriptor list optional, we > continue > to compare ns identifiers in nvme_revalidate_disk, obviously fail and > return -ENODEV to nvme_validate_ns, which will remove the namespace. > > Exactly what we don't want to happen. > > Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list > is optional") > Tested-by: Anton Eidelman <anton@lightbitslabs.com> > Signed-off-by: Sagi Grimberg <sagi@grimberg.me> > Reviewed-by: Keith Busch <kbusch@kernel.org> > Signed-off-by: Christoph Hellwig <hch@lst.de> > Signed-off-by: Jens Axboe <axboe@kernel.dk> > -- Ah. No, indeed I haven't. (The joys of running a downstream kernel). Will be retesting with that. Thanks for the pointer. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] nvme: do not update multipath disk information if the controller is down 2020-06-04 11:56 [PATCH 0/2] nvme: fix reconnection stalls Hannes Reinecke 2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke @ 2020-06-04 11:56 ` Hannes Reinecke 2020-06-04 16:53 ` [PATCH 0/2] nvme: fix reconnection stalls Sagi Grimberg 2 siblings, 0 replies; 10+ messages in thread From: Hannes Reinecke @ 2020-06-04 11:56 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-nvme, Daniel Wagner, Sagi Grimberg, Keith Busch, Hannes Reinecke When nvme_revalidate_disk() is called while the controller is not live we need to avoid revalidating the multipath head device, as any I/O sent to the multipath device might not be possible and we would deadlock trying to flush the scan thread during reconnect. Reported-by: Martin George <martin.george@netapp.com> Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/nvme/host/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index b811787505f7..4b5bca5f2e4f 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1971,7 +1971,7 @@ static int __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id) blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(iob)); nvme_update_disk_info(disk, ns, id); #ifdef CONFIG_NVME_MULTIPATH - if (ns->head->disk) { + if (ns->head->disk && ctrl->state == NVME_CTRL_LIVE) { nvme_update_disk_info(ns->head->disk, ns, id); blk_queue_stack_limits(ns->head->disk->queue, ns->queue); revalidate_disk(ns->head->disk); -- 2.16.4 _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] nvme: fix reconnection stalls 2020-06-04 11:56 [PATCH 0/2] nvme: fix reconnection stalls Hannes Reinecke 2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke 2020-06-04 11:56 ` [PATCH 2/2] nvme: do not update multipath disk information if the controller is down Hannes Reinecke @ 2020-06-04 16:53 ` Sagi Grimberg 2 siblings, 0 replies; 10+ messages in thread From: Sagi Grimberg @ 2020-06-04 16:53 UTC (permalink / raw) To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, Daniel Wagner, linux-nvme > Hi all, And here we go again.. :) > during testing we have come across namespaces not being available > after reconnection. This is caused by (yet another) deadlock between > reconnect and scanning; reconnection will wait for the scan workqueue > to be flushed, I have a deja vu moment here, but where exactly the reconnection waits for the scan work to complete? > but scanning cannot make progress if I/O is > outstanding, as the I/O will only be completed _after_ reconnection > completed. Where is there a dependency for I/O complete only after reconnection? > The particular issue here is that nvme_revalidate_disk() might be > several I/O to be sent; the first one (IDENTIFY NAMESPACE) will be > terminated, but the subsequent ones are not. Why aren't they? we unquiesce the admin queue after we tear it down. _______________________________________________ linux-nvme mailing list linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-06-05 6:20 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-04 11:56 [PATCH 0/2] nvme: fix reconnection stalls Hannes Reinecke 2020-06-04 11:56 ` [PATCH 1/2] nvme: check for NVME_CTRL_LIVE in nvme_report_ns_ids() Hannes Reinecke 2020-06-04 13:22 ` Keith Busch 2020-06-04 13:48 ` Hannes Reinecke 2020-06-04 13:57 ` Keith Busch 2020-06-04 15:38 ` Hannes Reinecke 2020-06-04 16:58 ` Sagi Grimberg 2020-06-05 6:20 ` Hannes Reinecke 2020-06-04 11:56 ` [PATCH 2/2] nvme: do not update multipath disk information if the controller is down Hannes Reinecke 2020-06-04 16:53 ` [PATCH 0/2] nvme: fix reconnection stalls Sagi Grimberg
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.