* [PATCH] nvme: don't wait freeze during resetting
@ 2022-09-20  1:57 Ming Lei
  2022-09-20  8:18 ` Sagi Grimberg
  2022-09-22 14:22 ` Christoph Hellwig
  0 siblings, 2 replies; 7+ messages in thread
From: Ming Lei @ 2022-09-20  1:57 UTC
  To: Christoph Hellwig, linux-nvme
  Cc: Yi Zhang, Ming Lei, Sagi Grimberg, Chao Leng, Keith Busch

First, it isn't necessary to call nvme_wait_freeze() during reset.
For nvme-pci, if the tagset isn't allocated, there can't be any inflight
IOs; otherwise blk_mq_update_nr_hw_queues() freezes the queues and waits
for them itself.

Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
while requests are in progress"), it is fine to unfreeze a queue without
draining inflight IOs.

Also, the timeout handlers of both nvme-rdma and nvme-tcp provide forward
progress if the controller state isn't LIVE, so it is fine to drop
the bounded wait done by nvme_wait_freeze_timeout().

Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chao Leng <lengchao@huawei.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/apple.c |  1 -
 drivers/nvme/host/pci.c   |  1 -
 drivers/nvme/host/rdma.c  | 13 -------------
 drivers/nvme/host/tcp.c   | 13 -------------
 4 files changed, 28 deletions(-)

diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index 5fc5ea196b40..9cd02b57fc85 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -1126,7 +1126,6 @@ static void apple_nvme_reset_work(struct work_struct *work)
 	anv->ctrl.queue_count = nr_io_queues + 1;
 
 	nvme_start_queues(&anv->ctrl);
-	nvme_wait_freeze(&anv->ctrl);
 	blk_mq_update_nr_hw_queues(&anv->tagset, 1);
 	nvme_unfreeze(&anv->ctrl);
 
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 98864b853eef..985b216907fc 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
 		nvme_free_tagset(dev);
 	} else {
 		nvme_start_queues(&dev->ctrl);
-		nvme_wait_freeze(&dev->ctrl);
 		if (!dev->ctrl.tagset)
 			nvme_pci_alloc_tag_set(dev);
 		else
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 3100643be299..beb0d1a6a84d 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -986,15 +986,6 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 
 	if (!new) {
 		nvme_start_queues(&ctrl->ctrl);
-		if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
-			/*
-			 * If we timed out waiting for freeze we are likely to
-			 * be stuck.  Fail the controller initialization just
-			 * to be safe.
-			 */
-			ret = -ENODEV;
-			goto out_wait_freeze_timed_out;
-		}
 		blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
 			ctrl->ctrl.queue_count - 1);
 		nvme_unfreeze(&ctrl->ctrl);
@@ -1002,10 +993,6 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 
 	return 0;
 
-out_wait_freeze_timed_out:
-	nvme_stop_queues(&ctrl->ctrl);
-	nvme_sync_io_queues(&ctrl->ctrl);
-	nvme_rdma_stop_io_queues(ctrl);
 out_cleanup_connect_q:
 	nvme_cancel_tagset(&ctrl->ctrl);
 	if (new)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index d5871fd6f769..49d9bef806f9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1920,15 +1920,6 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 
 	if (!new) {
 		nvme_start_queues(ctrl);
-		if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
-			/*
-			 * If we timed out waiting for freeze we are likely to
-			 * be stuck.  Fail the controller initialization just
-			 * to be safe.
-			 */
-			ret = -ENODEV;
-			goto out_wait_freeze_timed_out;
-		}
 		blk_mq_update_nr_hw_queues(ctrl->tagset,
 			ctrl->queue_count - 1);
 		nvme_unfreeze(ctrl);
@@ -1936,10 +1927,6 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 
 	return 0;
 
-out_wait_freeze_timed_out:
-	nvme_stop_queues(ctrl);
-	nvme_sync_io_queues(ctrl);
-	nvme_tcp_stop_io_queues(ctrl);
 out_cleanup_connect_q:
 	nvme_cancel_tagset(ctrl);
 	if (new)
-- 
2.31.1




* Re: [PATCH] nvme: don't wait freeze during resetting
  2022-09-20  1:57 [PATCH] nvme: don't wait freeze during resetting Ming Lei
@ 2022-09-20  8:18 ` Sagi Grimberg
  2022-09-21  1:25   ` Ming Lei
  2022-09-22 14:22 ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2022-09-20  8:18 UTC
  To: Ming Lei, Christoph Hellwig, linux-nvme; +Cc: Yi Zhang, Chao Leng, Keith Busch


> First it isn't necessary to call nvme_wait_freeze during reset.
> For nvme-pci, if tagset isn't allocated, there can't be any inflight
> IOs; otherwise blk_mq_update_nr_hw_queues can freeze & wait queues.
> 
> Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
> while requests are in progress"), it is fine to unfreeze queue without
> draining inflight IOs.
> 
> Also both nvme-rdma and nvme-tcp's timeout handler provides forward
> progress if the controller state isn't LIVE, so it is fine to drop
> the timeout function of nvme_wait_freeze_timeout().

The rdma/tcp should probably be split to separate patches.

> 
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Chao Leng <lengchao@huawei.com>
> Cc: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   drivers/nvme/host/apple.c |  1 -
>   drivers/nvme/host/pci.c   |  1 -
>   drivers/nvme/host/rdma.c  | 13 -------------
>   drivers/nvme/host/tcp.c   | 13 -------------
>   4 files changed, 28 deletions(-)
> 
> diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
> index 5fc5ea196b40..9cd02b57fc85 100644
> --- a/drivers/nvme/host/apple.c
> +++ b/drivers/nvme/host/apple.c
> @@ -1126,7 +1126,6 @@ static void apple_nvme_reset_work(struct work_struct *work)
>   	anv->ctrl.queue_count = nr_io_queues + 1;
>   
>   	nvme_start_queues(&anv->ctrl);
> -	nvme_wait_freeze(&anv->ctrl);
>   	blk_mq_update_nr_hw_queues(&anv->tagset, 1);
>   	nvme_unfreeze(&anv->ctrl);
>   
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 98864b853eef..985b216907fc 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
>   		nvme_free_tagset(dev);
>   	} else {
>   		nvme_start_queues(&dev->ctrl);
> -		nvme_wait_freeze(&dev->ctrl);
>   		if (!dev->ctrl.tagset)
>   			nvme_pci_alloc_tag_set(dev);
>   		else
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 3100643be299..beb0d1a6a84d 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -986,15 +986,6 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
>   
>   	if (!new) {
>   		nvme_start_queues(&ctrl->ctrl);
> -		if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
> -			/*
> -			 * If we timed out waiting for freeze we are likely to
> -			 * be stuck.  Fail the controller initialization just
> -			 * to be safe.
> -			 */
> -			ret = -ENODEV;
> -			goto out_wait_freeze_timed_out;
> -		}

So here is the description from the patch that introduced this:
--
nvme-rdma: fix reset hang if controller died in the middle of a reset

If the controller becomes unresponsive in the middle of a reset, we
will hang because we are waiting for the freeze to complete, but that
cannot happen since we have commands that are inflight holding the
q_usage_counter, and we can't blindly fail requests that time out.

So give a timeout and if we cannot wait for queue freeze before
unfreezing, fail and have the error handling take care of how to
proceed (either schedule a reconnect or remove the controller).
--

So if the controller becomes unresponsive between nvme_start_queues()
and the freeze (with a full wait) that is done in
blk_mq_update_nr_hw_queues(), we may hang, blocked on I/O that was
pending and got requeued after nvme_start_queues().

The problem is that we cannot do any error recovery, because the
controller is in the middle of a reset/reconnect.
The code that you deleted was designed to detect this state and
schedule another reconnect if the controller became unresponsive.

What is preventing this from happening now?

I wish we had a test for this... It is very difficult to hit because the
controller needs to become non-responsive exactly at this point in the
reset...
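
To make the concern above concrete, here is a minimal user-space sketch
of the bounded wait. It is not the kernel code: struct toy_queue,
toy_tick(), toy_wait_freeze_timeout() and toy_reconfigure() are all
hypothetical names, and the "usage" field only stands in for
q_usage_counter. It shows why a bounded wait can turn a hang into an
error return when the controller stops completing requests.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's blk-mq structures. */
struct toy_queue {
	int usage;       /* stands in for q_usage_counter: one per inflight request */
	bool ctrl_alive; /* whether the controller still completes requests */
};

/* One "tick": a live controller completes one inflight request. */
static void toy_tick(struct toy_queue *q)
{
	if (q->ctrl_alive && q->usage > 0)
		q->usage--;
}

/*
 * Bounded wait in the spirit of nvme_wait_freeze_timeout(): returns a
 * non-zero value if the queue drained, or 0 if the timeout elapsed
 * while requests were still inflight.
 */
static int toy_wait_freeze_timeout(struct toy_queue *q, int ticks)
{
	while (ticks > 0) {
		if (q->usage == 0)
			return ticks;
		toy_tick(q);
		ticks--;
	}
	return q->usage == 0 ? 1 : 0;
}

/* Same shape as the removed check: fail instead of blocking forever. */
static int toy_reconfigure(struct toy_queue *q, int timeout_ticks)
{
	if (!toy_wait_freeze_timeout(q, timeout_ticks))
		return -ENODEV;
	/* An unbounded wait at this point would never return for a dead controller. */
	return 0;
}
```

With a queue whose controller is dead and still has inflight requests,
toy_reconfigure() returns -ENODEV after the timeout; with a live
controller it drains the queue and returns 0.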

>   		blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
>   			ctrl->ctrl.queue_count - 1);
>   		nvme_unfreeze(&ctrl->ctrl);
> @@ -1002,10 +993,6 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
>   
>   	return 0;
>   
> -out_wait_freeze_timed_out:
> -	nvme_stop_queues(&ctrl->ctrl);
> -	nvme_sync_io_queues(&ctrl->ctrl);
> -	nvme_rdma_stop_io_queues(ctrl);
>   out_cleanup_connect_q:
>   	nvme_cancel_tagset(&ctrl->ctrl);
>   	if (new)
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index d5871fd6f769..49d9bef806f9 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1920,15 +1920,6 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
>   
>   	if (!new) {
>   		nvme_start_queues(ctrl);
> -		if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
> -			/*
> -			 * If we timed out waiting for freeze we are likely to
> -			 * be stuck.  Fail the controller initialization just
> -			 * to be safe.
> -			 */
> -			ret = -ENODEV;
> -			goto out_wait_freeze_timed_out;
> -		}
>   		blk_mq_update_nr_hw_queues(ctrl->tagset,
>   			ctrl->queue_count - 1);
>   		nvme_unfreeze(ctrl);
> @@ -1936,10 +1927,6 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
>   
>   	return 0;
>   
> -out_wait_freeze_timed_out:
> -	nvme_stop_queues(ctrl);
> -	nvme_sync_io_queues(ctrl);
> -	nvme_tcp_stop_io_queues(ctrl);
>   out_cleanup_connect_q:
>   	nvme_cancel_tagset(ctrl);
>   	if (new)



* Re: [PATCH] nvme: don't wait freeze during resetting
  2022-09-20  8:18 ` Sagi Grimberg
@ 2022-09-21  1:25   ` Ming Lei
  2022-09-21  8:19     ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2022-09-21  1:25 UTC
  To: Sagi Grimberg
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Chao Leng, Keith Busch,
	ming.lei

On Tue, Sep 20, 2022 at 11:18:33AM +0300, Sagi Grimberg wrote:
> 
> > First it isn't necessary to call nvme_wait_freeze during reset.
> > For nvme-pci, if tagset isn't allocated, there can't be any inflight
> > IOs; otherwise blk_mq_update_nr_hw_queues can freeze & wait queues.
> > 
> > Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
> > while requests are in progress"), it is fine to unfreeze queue without
> > draining inflight IOs.
> > 
> > Also both nvme-rdma and nvme-tcp's timeout handler provides forward
> > progress if the controller state isn't LIVE, so it is fine to drop
> > the timeout function of nvme_wait_freeze_timeout().
> 
> The rdma/tcp should probably be split to separate patches.
> 
> > 
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: Chao Leng <lengchao@huawei.com>
> > Cc: Keith Busch <kbusch@kernel.org>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   drivers/nvme/host/apple.c |  1 -
> >   drivers/nvme/host/pci.c   |  1 -
> >   drivers/nvme/host/rdma.c  | 13 -------------
> >   drivers/nvme/host/tcp.c   | 13 -------------
> >   4 files changed, 28 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
> > index 5fc5ea196b40..9cd02b57fc85 100644
> > --- a/drivers/nvme/host/apple.c
> > +++ b/drivers/nvme/host/apple.c
> > @@ -1126,7 +1126,6 @@ static void apple_nvme_reset_work(struct work_struct *work)
> >   	anv->ctrl.queue_count = nr_io_queues + 1;
> >   	nvme_start_queues(&anv->ctrl);
> > -	nvme_wait_freeze(&anv->ctrl);
> >   	blk_mq_update_nr_hw_queues(&anv->tagset, 1);
> >   	nvme_unfreeze(&anv->ctrl);
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 98864b853eef..985b216907fc 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
> >   		nvme_free_tagset(dev);
> >   	} else {
> >   		nvme_start_queues(&dev->ctrl);
> > -		nvme_wait_freeze(&dev->ctrl);
> >   		if (!dev->ctrl.tagset)
> >   			nvme_pci_alloc_tag_set(dev);
> >   		else
> > diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> > index 3100643be299..beb0d1a6a84d 100644
> > --- a/drivers/nvme/host/rdma.c
> > +++ b/drivers/nvme/host/rdma.c
> > @@ -986,15 +986,6 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
> >   	if (!new) {
> >   		nvme_start_queues(&ctrl->ctrl);
> > -		if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
> > -			/*
> > -			 * If we timed out waiting for freeze we are likely to
> > -			 * be stuck.  Fail the controller initialization just
> > -			 * to be safe.
> > -			 */
> > -			ret = -ENODEV;
> > -			goto out_wait_freeze_timed_out;
> > -		}
> 
> So here is the description from the patch that introduced this:
> --
> nvme-rdma: fix reset hang if controller died in the middle of a reset
> 
> If the controller becomes unresponsive in the middle of a reset, we
> will hang because we are waiting for the freeze to complete, but that
> cannot happen since we have commands that are inflight holding the
> q_usage_counter, and we can't blindly fail requests that times out.
> 
> So give a timeout and if we cannot wait for queue freeze before
> unfreezing, fail and have the error handling take care how to
> proceed (either schedule a reconnect of remove the controller).
> --
> 
> So if between nvme_start_queues() and the freeze (with a full wait)
> that is done in blk_mq_update_nr_hw_queues() the controller becomes
> non responsive, in this case we may hang blocking on I/O that was
> pending and requeued after nvme_start_queues().
> 
> The problem is, that we cannot do any error recovery because the
> controller is in the middle of a reset/reconnect...
> So the code that you deleted was designed to detect this state, and
> reschedule another reconnect if the controller became non responsive.
> 
> What is preventing this from happening now?

Please see nvme_rdma_timeout() and nvme_tcp_timeout(): if the controller
state isn't LIVE, the request is aborted.
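
The decision described here can be sketched roughly as follows. This is
a simplified user-space model, not the code of nvme_rdma_timeout() or
nvme_tcp_timeout(): every type and name prefixed with toy_ or TOY_ is
hypothetical, and the real handlers also tear down the affected queue
before completing the request.

```c
#include <stdbool.h>

/* Hypothetical names; loosely modeled on the fabrics timeout handlers. */
enum toy_ctrl_state { TOY_CTRL_LIVE, TOY_CTRL_RESETTING, TOY_CTRL_CONNECTING };
enum toy_eh_ret { TOY_EH_DONE, TOY_EH_RESET_TIMER };

struct toy_request {
	bool completed; /* set once the timed-out request has been completed */
};

struct toy_ctrl {
	enum toy_ctrl_state state;
	bool recovery_scheduled;
};

static enum toy_eh_ret toy_timeout(struct toy_ctrl *ctrl,
				   struct toy_request *rq)
{
	if (ctrl->state != TOY_CTRL_LIVE) {
		/*
		 * A reset/connect is in flight, so no other context will
		 * recover this request: complete it from the handler.
		 */
		rq->completed = true;
		return TOY_EH_DONE;
	}

	/* LIVE: let error recovery cancel the request later. */
	ctrl->recovery_scheduled = true;
	return TOY_EH_RESET_TIMER;
}
```

In this model a request that times out while the controller is
RESETTING or CONNECTING is completed immediately, which is the forward
progress the patch relies on.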


Thanks, 
Ming




* Re: [PATCH] nvme: don't wait freeze during resetting
  2022-09-21  1:25   ` Ming Lei
@ 2022-09-21  8:19     ` Sagi Grimberg
  2022-09-22  8:36       ` Ming Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2022-09-21  8:19 UTC
  To: Ming Lei; +Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Chao Leng, Keith Busch



On 9/21/22 04:25, Ming Lei wrote:
> On Tue, Sep 20, 2022 at 11:18:33AM +0300, Sagi Grimberg wrote:
>>
>>> First it isn't necessary to call nvme_wait_freeze during reset.
>>> For nvme-pci, if tagset isn't allocated, there can't be any inflight
>>> IOs; otherwise blk_mq_update_nr_hw_queues can freeze & wait queues.
>>>
>>> Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
>>> while requests are in progress"), it is fine to unfreeze queue without
>>> draining inflight IOs.
>>>
>>> Also both nvme-rdma and nvme-tcp's timeout handler provides forward
>>> progress if the controller state isn't LIVE, so it is fine to drop
>>> the timeout function of nvme_wait_freeze_timeout().
>>
>> The rdma/tcp should probably be split to separate patches.
>>
>>>
>>> Cc: Sagi Grimberg <sagi@grimberg.me>
>>> Cc: Chao Leng <lengchao@huawei.com>
>>> Cc: Keith Busch <kbusch@kernel.org>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>    drivers/nvme/host/apple.c |  1 -
>>>    drivers/nvme/host/pci.c   |  1 -
>>>    drivers/nvme/host/rdma.c  | 13 -------------
>>>    drivers/nvme/host/tcp.c   | 13 -------------
>>>    4 files changed, 28 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
>>> index 5fc5ea196b40..9cd02b57fc85 100644
>>> --- a/drivers/nvme/host/apple.c
>>> +++ b/drivers/nvme/host/apple.c
>>> @@ -1126,7 +1126,6 @@ static void apple_nvme_reset_work(struct work_struct *work)
>>>    	anv->ctrl.queue_count = nr_io_queues + 1;
>>>    	nvme_start_queues(&anv->ctrl);
>>> -	nvme_wait_freeze(&anv->ctrl);
>>>    	blk_mq_update_nr_hw_queues(&anv->tagset, 1);
>>>    	nvme_unfreeze(&anv->ctrl);
>>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>>> index 98864b853eef..985b216907fc 100644
>>> --- a/drivers/nvme/host/pci.c
>>> +++ b/drivers/nvme/host/pci.c
>>> @@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
>>>    		nvme_free_tagset(dev);
>>>    	} else {
>>>    		nvme_start_queues(&dev->ctrl);
>>> -		nvme_wait_freeze(&dev->ctrl);
>>>    		if (!dev->ctrl.tagset)
>>>    			nvme_pci_alloc_tag_set(dev);
>>>    		else
>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>> index 3100643be299..beb0d1a6a84d 100644
>>> --- a/drivers/nvme/host/rdma.c
>>> +++ b/drivers/nvme/host/rdma.c
>>> @@ -986,15 +986,6 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
>>>    	if (!new) {
>>>    		nvme_start_queues(&ctrl->ctrl);
>>> -		if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
>>> -			/*
>>> -			 * If we timed out waiting for freeze we are likely to
>>> -			 * be stuck.  Fail the controller initialization just
>>> -			 * to be safe.
>>> -			 */
>>> -			ret = -ENODEV;
>>> -			goto out_wait_freeze_timed_out;
>>> -		}
>>
>> So here is the description from the patch that introduced this:
>> --
>> nvme-rdma: fix reset hang if controller died in the middle of a reset
>>
>> If the controller becomes unresponsive in the middle of a reset, we
>> will hang because we are waiting for the freeze to complete, but that
>> cannot happen since we have commands that are inflight holding the
>> q_usage_counter, and we can't blindly fail requests that times out.
>>
>> So give a timeout and if we cannot wait for queue freeze before
>> unfreezing, fail and have the error handling take care how to
>> proceed (either schedule a reconnect of remove the controller).
>> --
>>
>> So if between nvme_start_queues() and the freeze (with a full wait)
>> that is done in blk_mq_update_nr_hw_queues() the controller becomes
>> non responsive, in this case we may hang blocking on I/O that was
>> pending and requeued after nvme_start_queues().
>>
>> The problem is, that we cannot do any error recovery because the
>> controller is in the middle of a reset/reconnect...
>> So the code that you deleted was designed to detect this state, and
>> reschedule another reconnect if the controller became non responsive.
>>
>> What is preventing this from happening now?
> 
> Please see nvme_rdma_timeout() & nvme_tcp_timeout(), if controller state
> isn't live, request will be aborted.

I agree with you. However, non-mpath devices will most likely retry the
command rather than fail it as in the multipath case (see
nvme_decide_disposition()), which will cause the I/O to block.

While it is arguable whether non-mpath fabrics devices are important in
any capacity, the design was that I/O is not completed until the
controller either successfully reconnects (and the I/O is retried), or
disconnects (and the I/O is failed), or fast_io_fail_tmo expires.

Hence for non-mpath controllers, the request(s) will time out and be
aborted, but nvme will opt to retry them instead of completing them
with a failure (at least until fast_io_fail_tmo expires, but that can
be arbitrarily long).
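
As a rough illustration of the mpath vs. non-mpath difference, here is a
simplified user-space sketch in the spirit of nvme_decide_disposition().
It is not the kernel function: struct toy_req, its fields, toy_decide()
and TOY_MAX_RETRIES are hypothetical, and the checks for dying queues
are left out.

```c
#include <stdbool.h>

enum toy_disposition { TOY_COMPLETE, TOY_RETRY, TOY_FAILOVER };

/* Hypothetical request summary; the kernel derives these from the request. */
struct toy_req {
	bool success;    /* command completed without error */
	bool no_retry;   /* e.g. a do-not-retry status or a failfast request */
	bool mpath;      /* request was issued through the multipath layer */
	bool path_error; /* status is a path-related error, e.g. host abort */
	int retries;     /* retries already attempted */
};

#define TOY_MAX_RETRIES 5 /* assumed retry limit for this sketch */

static enum toy_disposition toy_decide(const struct toy_req *rq)
{
	if (rq->success)
		return TOY_COMPLETE;

	/* Give up: the error is reported to the submitter. */
	if (rq->no_retry || rq->retries >= TOY_MAX_RETRIES)
		return TOY_COMPLETE;

	/* Multipath can move the I/O to another path. */
	if (rq->mpath && rq->path_error)
		return TOY_FAILOVER;

	/* Non-mpath: requeue on the same (possibly dead) controller. */
	return TOY_RETRY;
}
```

In this model an aborted request on a multipath device fails over,
while the same abort on a non-mpath device is retried against the same
controller, which is how the I/O can stay blocked.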



* Re: [PATCH] nvme: don't wait freeze during resetting
  2022-09-21  8:19     ` Sagi Grimberg
@ 2022-09-22  8:36       ` Ming Lei
  0 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2022-09-22  8:36 UTC
  To: Sagi Grimberg
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Chao Leng, Keith Busch

On Wed, Sep 21, 2022 at 11:19:21AM +0300, Sagi Grimberg wrote:
> 
> 
> On 9/21/22 04:25, Ming Lei wrote:
> > On Tue, Sep 20, 2022 at 11:18:33AM +0300, Sagi Grimberg wrote:
> > > 
> > > > First it isn't necessary to call nvme_wait_freeze during reset.
> > > > For nvme-pci, if tagset isn't allocated, there can't be any inflight
> > > > IOs; otherwise blk_mq_update_nr_hw_queues can freeze & wait queues.
> > > > 
> > > > Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
> > > > while requests are in progress"), it is fine to unfreeze queue without
> > > > draining inflight IOs.
> > > > 
> > > > Also both nvme-rdma and nvme-tcp's timeout handler provides forward
> > > > progress if the controller state isn't LIVE, so it is fine to drop
> > > > the timeout function of nvme_wait_freeze_timeout().
> > > 
> > > The rdma/tcp should probably be split to separate patches.
> > > 
> > > > 
> > > > Cc: Sagi Grimberg <sagi@grimberg.me>
> > > > Cc: Chao Leng <lengchao@huawei.com>
> > > > Cc: Keith Busch <kbusch@kernel.org>
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >    drivers/nvme/host/apple.c |  1 -
> > > >    drivers/nvme/host/pci.c   |  1 -
> > > >    drivers/nvme/host/rdma.c  | 13 -------------
> > > >    drivers/nvme/host/tcp.c   | 13 -------------
> > > >    4 files changed, 28 deletions(-)
> > > > 
> > > > diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
> > > > index 5fc5ea196b40..9cd02b57fc85 100644
> > > > --- a/drivers/nvme/host/apple.c
> > > > +++ b/drivers/nvme/host/apple.c
> > > > @@ -1126,7 +1126,6 @@ static void apple_nvme_reset_work(struct work_struct *work)
> > > >    	anv->ctrl.queue_count = nr_io_queues + 1;
> > > >    	nvme_start_queues(&anv->ctrl);
> > > > -	nvme_wait_freeze(&anv->ctrl);
> > > >    	blk_mq_update_nr_hw_queues(&anv->tagset, 1);
> > > >    	nvme_unfreeze(&anv->ctrl);
> > > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > > index 98864b853eef..985b216907fc 100644
> > > > --- a/drivers/nvme/host/pci.c
> > > > +++ b/drivers/nvme/host/pci.c
> > > > @@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
> > > >    		nvme_free_tagset(dev);
> > > >    	} else {
> > > >    		nvme_start_queues(&dev->ctrl);
> > > > -		nvme_wait_freeze(&dev->ctrl);
> > > >    		if (!dev->ctrl.tagset)
> > > >    			nvme_pci_alloc_tag_set(dev);
> > > >    		else
> > > > diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> > > > index 3100643be299..beb0d1a6a84d 100644
> > > > --- a/drivers/nvme/host/rdma.c
> > > > +++ b/drivers/nvme/host/rdma.c
> > > > @@ -986,15 +986,6 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
> > > >    	if (!new) {
> > > >    		nvme_start_queues(&ctrl->ctrl);
> > > > -		if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
> > > > -			/*
> > > > -			 * If we timed out waiting for freeze we are likely to
> > > > -			 * be stuck.  Fail the controller initialization just
> > > > -			 * to be safe.
> > > > -			 */
> > > > -			ret = -ENODEV;
> > > > -			goto out_wait_freeze_timed_out;
> > > > -		}
> > > 
> > > So here is the description from the patch that introduced this:
> > > --
> > > nvme-rdma: fix reset hang if controller died in the middle of a reset
> > > 
> > > If the controller becomes unresponsive in the middle of a reset, we
> > > will hang because we are waiting for the freeze to complete, but that
> > > cannot happen since we have commands that are inflight holding the
> > > q_usage_counter, and we can't blindly fail requests that times out.
> > > 
> > > So give a timeout and if we cannot wait for queue freeze before
> > > unfreezing, fail and have the error handling take care how to
> > > proceed (either schedule a reconnect of remove the controller).
> > > --
> > > 
> > > So if between nvme_start_queues() and the freeze (with a full wait)
> > > that is done in blk_mq_update_nr_hw_queues() the controller becomes
> > > non responsive, in this case we may hang blocking on I/O that was
> > > pending and requeued after nvme_start_queues().
> > > 
> > > The problem is, that we cannot do any error recovery because the
> > > controller is in the middle of a reset/reconnect...
> > > So the code that you deleted was designed to detect this state, and
> > > reschedule another reconnect if the controller became non responsive.
> > > 
> > > What is preventing this from happening now?
> > 
> > Please see nvme_rdma_timeout() & nvme_tcp_timeout(), if controller state
> > isn't live, request will be aborted.
> 
> I agree with you. However non-mpath devices will most likely retry the
> command and not fail it like in the multipath case (see
> nvme_decide_disposition) and will cause the I/O to block.
> 
> While it is arguable if non-mpath fabrics devices are important in any
> capacity, the design was that IO is not completed until the controller
> either successfully reconnects (and retried), or it disconnects
> (failed), or fast_io_fail_tmo expires.
> 
> Hence for non-mpath controllers, the request(s) will timeout, and
> aborted, but nvme will opt to retry them instead of completing them
> with a failure (at least until fast_io_fail_tmo expires, but that can
> be arbitrarily long).

OK, I think it is better not to change the behavior for non-mpath
rdma/tcp; I will remove the rdma/tcp changes in the next version.


thanks,
Ming




* Re: [PATCH] nvme: don't wait freeze during resetting
  2022-09-20  1:57 [PATCH] nvme: don't wait freeze during resetting Ming Lei
  2022-09-20  8:18 ` Sagi Grimberg
@ 2022-09-22 14:22 ` Christoph Hellwig
  2022-09-22 14:22   ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2022-09-22 14:22 UTC
  To: Ming Lei
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
	Chao Leng, Keith Busch

Thanks,

applied for nvme-6.1.



* Re: [PATCH] nvme: don't wait freeze during resetting
  2022-09-22 14:22 ` Christoph Hellwig
@ 2022-09-22 14:22   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2022-09-22 14:22 UTC
  To: Ming Lei
  Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
	Chao Leng, Keith Busch

On Thu, Sep 22, 2022 at 04:22:11PM +0200, Christoph Hellwig wrote:
> Thanks,
> 
> applied for nvme-6.1.

Sorry, I replied to the wrong mail, so strike this.

