Re: [PATCH] nvme-rdma: fix deadlock when delete ctrl due to reconnect fail

From: Sagi Grimberg <sagi@grimberg.me>
To: Chao Leng <lengchao@huawei.com>, linux-nvme@lists.infradead.org
Cc: kbusch@kernel.org, axboe@fb.com, hch@lst.de
Subject: Re: [PATCH] nvme-rdma: fix deadlock when delete ctrl due to reconnect fail
Date: Mon, 27 Jul 2020 20:32:53 -0700	[thread overview]
Message-ID: <9e6242b2-fc04-632d-0ea4-7ca877c5da2c@grimberg.me> (raw)
In-Reply-To: <c229ca6e-646a-372a-2526-ad90e2da1abf@huawei.com>

>>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>>> index f8f856dc0c67..b381e2cde50a 100644
>>>> --- a/drivers/nvme/host/rdma.c
>>>> +++ b/drivers/nvme/host/rdma.c
>>>> @@ -989,8 +989,7 @@ static void nvme_rdma_teardown_io_queues(struct 
>>>> nvme_rdma_ctrl *ctrl,
>>>>                   nvme_cancel_request, &ctrl->ctrl);
>>>>               blk_mq_tagset_wait_completed_request(ctrl->ctrl.tagset);
>>>>           }
>>>> -        if (remove)
>>>> -            nvme_start_queues(&ctrl->ctrl);
>>>> +        nvme_start_queues(&ctrl->ctrl);
>>>
>>> This will fail I/O during controller reset, so nak on this.
> The io will do not fail. If work with native multipath or dm-multipath,
> nvme_rdma_queue_rq will return io error, and then multipath will
> fail over to other path and retry io, this is we expected. If work
> without multipath, nvme_rdma_queue_rq will return BLK_STS_RESOURCE,
> and then the upper layer will requeue and retry. Surely there is a
> weakness:the io will retry repeated every BLK_MQ_RESOURCE_DELAY(3ms)
> while reconnecting. Because controller reset may need long time,
> and nvme over roce is mainly used with multipath software, so when
> controller reset we expect fail over to other path and retry io,, just
> like error recovery. If work without multipath, we tolerate repeated
> I/O retries during error recovery or controller reset.

I/O should not fail during reset, mpath or not, period.

>> Can you try this:
>> -- 
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index d58231636d11..96c0d664fe9b 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -1149,6 +1149,11 @@ static void 
>> nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
>>          return;
>>
>>   requeue:
>> +       /*
>> +        * make sure queues are not quiesced due to a reconnect
>> +        * sequence that failed after creating some I/O queues
>> +        */
>> +       nvme_start_queues(ctrl);
>>          dev_info(ctrl->ctrl.device, "Failed reconnect attempt %d\n",
>>                          ctrl->ctrl.nr_reconnects);
>>          nvme_rdma_reconnect_or_remove(ctrl);
>> -- 
>> .Surely this can solve the problem, but when controller reset in
> multipath environment we expect fail over to other path and retry io,
> just like error recovery. Both for controller reset and error recovery,
> we need unquiesce the queue. So nvme_rdma_teardown_io_queues should
> directly unquiesce queues after cancel request, do not need care
> the parameter:remove.
> For this solution, if nvme_rdma_setup_ctrl failed in 
> nvme_rdma_reset_ctrl_work,
> will start nvme_rdma_reconnect_ctrl_work, thus no difference.
> If we do not tolerate repeated I/O retries during controller reset,
> we can do like this:
> ---
>   drivers/nvme/host/rdma.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index b381e2cde50a..ac79c4d294c2 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1936,6 +1936,7 @@ nvme_rdma_timeout(struct request *rq, bool reserved)
>            */
>           flush_work(&ctrl->err_work);
>           nvme_rdma_teardown_io_queues(ctrl, false);
> +        nvme_start_queues(&ctrl->ctrl);
>           nvme_rdma_teardown_admin_queue(ctrl, false);
>           return BLK_EH_DONE;
>       }

That is fine too, just do not unquiesce the queues in a normal reset.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme