linux-nvme.lists.infradead.org archive mirror
* [PATCH] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout
@ 2020-08-05  6:33 Chao Leng
  2020-08-05  7:10 ` Sagi Grimberg
  0 siblings, 1 reply; 5+ messages in thread
From: Chao Leng @ 2020-08-05  6:33 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, lengchao, sagi

A deadlock happens when we test NVMe over RoCE with link blink. The
reason: a link blink triggers error recovery and then a reconnect. If
the reconnect fails because nvme_set_queue_count times out, the
reconnect process sets the queue count to 0 and continues; then
nvme_start_ctrl calls nvme_enable_aen, and a deadlock happens
because the admin queue is quiesced.

log:
Aug  3 22:47:24 localhost kernel: nvme nvme2: I/O 22 QID 0 timeout
Aug  3 22:47:24 localhost kernel: nvme nvme2: Could not set queue count
(881)
stack:
root     23848  0.0  0.0      0     0 ?        D    Aug03   0:00
[kworker/u12:4+nvme-wq]
[<0>] blk_execute_rq+0x69/0xa0
[<0>] __nvme_submit_sync_cmd+0xaf/0x1b0 [nvme_core]
[<0>] nvme_features+0x73/0xb0 [nvme_core]
[<0>] nvme_start_ctrl+0xa4/0x100 [nvme_core]
[<0>] nvme_rdma_setup_ctrl+0x438/0x700 [nvme_rdma]
[<0>] nvme_rdma_reconnect_ctrl_work+0x22/0x30 [nvme_rdma]
[<0>] process_one_work+0x1a7/0x370
[<0>] worker_thread+0x30/0x380
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x35/0x40

Many callers of __nvme_submit_sync_cmd treat the return code in two
ways: a code less than 0 means the command failed, while a code greater
than 0 means the target does not support the feature or reported some
other status. So when cancelling I/O with the status
NVME_SC_HOST_PATH_ERROR or NVME_SC_HOST_ABORTED_CMD, we need to set the
NVME_REQ_CANCELLED flag. __nvme_submit_sync_cmd then returns a negative
error, nvme_set_queue_count returns an error, and the reconnect fails
instead of continuing.
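
For illustration only (not part of the patch): a minimal standalone C
model of this return-value convention; the example_* names and numeric
values are invented for the sketch, not kernel definitions.

#include <stdio.h>

#define EXAMPLE_EINTR          4      /* used as a negative errno below */
#define EXAMPLE_NVME_STATUS    0x371  /* a positive NVMe status from the target */

/* Models a caller such as nvme_set_queue_count interpreting the result of
 * the synchronous Set Features command. */
static int example_handle_result(int status, int *queue_count)
{
	if (status < 0)
		return status;  /* command failed/cancelled: abort the reconnect */
	if (status > 0) {
		/* NVMe status from the target: tolerated, continue with 0 queues */
		fprintf(stderr, "Could not set queue count (%d)\n", status);
		*queue_count = 0;
	}
	return 0;
}

int main(void)
{
	int qcount = 8;

	/* Without NVME_REQ_CANCELLED a timed-out command surfaces as a
	 * positive NVMe status, so the reconnect "succeeds" with 0 queues
	 * and later hangs on the quiesced admin queue. */
	printf("positive status -> ret %d, queues %d\n",
	       example_handle_result(EXAMPLE_NVME_STATUS, &qcount), qcount);

	/* With the flag set, the sync command returns a negative errno and
	 * the reconnect fails cleanly instead of continuing. */
	qcount = 8;
	printf("negative errno  -> ret %d, queues %d\n",
	       example_handle_result(-EXAMPLE_EINTR, &qcount), qcount);
	return 0;
}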

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/core.c    | 1 +
 drivers/nvme/host/fabrics.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c2c5bc4fb702..865645577f2c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -303,6 +303,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 	if (blk_mq_request_completed(req))
 		return true;
 
+	nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
 	blk_mq_force_complete_rq(req);
 	return true;
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 2a6c8190eeb7..4e745603a3af 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -552,6 +552,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
 		return BLK_STS_RESOURCE;
 
+	nvme_req(rq)->flags |= NVME_REQ_CANCELLED;
 	nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
 	blk_mq_start_request(rq);
 	nvme_complete_rq(rq);
-- 
2.16.4




* Re: [PATCH] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout
  2020-08-05  6:33 [PATCH] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout Chao Leng
@ 2020-08-05  7:10 ` Sagi Grimberg
  2020-08-05  7:27   ` Chao Leng
  0 siblings, 1 reply; 5+ messages in thread
From: Sagi Grimberg @ 2020-08-05  7:10 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch


> A deadlock happens when we test NVMe over RoCE with link blink. The
> reason: a link blink triggers error recovery and then a reconnect. If
> the reconnect fails because nvme_set_queue_count times out, the
> reconnect process sets the queue count to 0 and continues; then
> nvme_start_ctrl calls nvme_enable_aen, and a deadlock happens
> because the admin queue is quiesced.

Why is the admin queue quiesced? If we are calling set_queue_count,
it was already unquiesced?

> log:
> Aug  3 22:47:24 localhost kernel: nvme nvme2: I/O 22 QID 0 timeout
> Aug  3 22:47:24 localhost kernel: nvme nvme2: Could not set queue count
> (881)
> stack:
> root     23848  0.0  0.0      0     0 ?        D    Aug03   0:00
> [kworker/u12:4+nvme-wq]
> [<0>] blk_execute_rq+0x69/0xa0
> [<0>] __nvme_submit_sync_cmd+0xaf/0x1b0 [nvme_core]
> [<0>] nvme_features+0x73/0xb0 [nvme_core]
> [<0>] nvme_start_ctrl+0xa4/0x100 [nvme_core]
> [<0>] nvme_rdma_setup_ctrl+0x438/0x700 [nvme_rdma]
> [<0>] nvme_rdma_reconnect_ctrl_work+0x22/0x30 [nvme_rdma]
> [<0>] process_one_work+0x1a7/0x370
> [<0>] worker_thread+0x30/0x380
> [<0>] kthread+0x112/0x130
> [<0>] ret_from_fork+0x35/0x40
> 
> Many callers of __nvme_submit_sync_cmd treat the return code in two
> ways: a code less than 0 means the command failed, while a code greater
> than 0 means the target does not support the feature or reported some
> other status.

We rely in a lot of places on the nvme status being returned from
nvme_submit_sync_cmd (especially in nvme_revalidate_disk and for
path/aborted cancellations), and this patch breaks it. You need to find
a solution that does not prevent the nvme status code from propagating
back.
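
To make the distinction concrete, a minimal standalone sketch modelled
on the nvme_revalidate_disk check quoted later in this thread; the
example_* constants are illustrative stand-ins, not the kernel's
definitions.

#include <stdio.h>

#define EXAMPLE_NVME_SC_DNR   0x4000  /* "do not retry" bit in an NVMe status */
#define EXAMPLE_ENOMEM        12
#define EXAMPLE_EINTR         4

/*
 * A positive NVMe status without DNR is ignored, so a transient path or
 * abort status does not cause the namespace to be removed, while a
 * negative errno is treated as a real failure.
 */
static int example_revalidate_result(int ret)
{
	if (ret == -EXAMPLE_ENOMEM || (ret > 0 && !(ret & EXAMPLE_NVME_SC_DNR)))
		return 0;       /* transient: keep the namespace */
	return ret;             /* fatal: the namespace may be removed */
}

int main(void)
{
	/* A positive host-path style status without DNR is tolerated ... */
	printf("nvme status 0x370 -> %d\n", example_revalidate_result(0x370));
	/* ... but the same condition collapsed into a negative errno is not. */
	printf("-EINTR            -> %d\n",
	       example_revalidate_result(-EXAMPLE_EINTR));
	return 0;
}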



* Re: [PATCH] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout
  2020-08-05  7:10 ` Sagi Grimberg
@ 2020-08-05  7:27   ` Chao Leng
  2020-08-05  8:22     ` Sagi Grimberg
  0 siblings, 1 reply; 5+ messages in thread
From: Chao Leng @ 2020-08-05  7:27 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch



On 2020/8/5 15:10, Sagi Grimberg wrote:
> 
>> A deadlock happens when we test NVMe over RoCE with link blink. The
>> reason: a link blink triggers error recovery and then a reconnect. If
>> the reconnect fails because nvme_set_queue_count times out, the
>> reconnect process sets the queue count to 0 and continues; then
>> nvme_start_ctrl calls nvme_enable_aen, and a deadlock happens
>> because the admin queue is quiesced.
> 
> Why is the admin queue quiesced? If we are calling set_queue_count,
> it was already unquiesced?
A nvme_set_queue_count timeout triggers nvme_rdma_teardown_admin_queue,
and the admin queue is quiesced in nvme_rdma_teardown_admin_queue.
> 
>> log:
>> Aug  3 22:47:24 localhost kernel: nvme nvme2: I/O 22 QID 0 timeout
>> Aug  3 22:47:24 localhost kernel: nvme nvme2: Could not set queue count
>> (881)
>> stack:
>> root     23848  0.0  0.0      0     0 ?        D    Aug03   0:00
>> [kworker/u12:4+nvme-wq]
>> [<0>] blk_execute_rq+0x69/0xa0
>> [<0>] __nvme_submit_sync_cmd+0xaf/0x1b0 [nvme_core]
>> [<0>] nvme_features+0x73/0xb0 [nvme_core]
>> [<0>] nvme_start_ctrl+0xa4/0x100 [nvme_core]
>> [<0>] nvme_rdma_setup_ctrl+0x438/0x700 [nvme_rdma]
>> [<0>] nvme_rdma_reconnect_ctrl_work+0x22/0x30 [nvme_rdma]
>> [<0>] process_one_work+0x1a7/0x370
>> [<0>] worker_thread+0x30/0x380
>> [<0>] kthread+0x112/0x130
>> [<0>] ret_from_fork+0x35/0x40
>>
>> Many callers of __nvme_submit_sync_cmd treat the return code in two
>> ways: a code less than 0 means the command failed, while a code greater
>> than 0 means the target does not support the feature or reported some
>> other status.
> 
> We rely in a lot of places on the nvme status being returned from
> nvme_submit_sync_cmd (especially in nvme_revalidate_disk and for
> path/aborted cancellations), and this patch breaks it. You need to find
> a solution that does not prevent the nvme status code from propagating
> back.
The difference is just EINTR versus EIO; there is no real impact.
The PCI driver already does it this way.



* Re: [PATCH] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout
  2020-08-05  7:27   ` Chao Leng
@ 2020-08-05  8:22     ` Sagi Grimberg
  2020-08-05  9:42       ` Chao Leng
  0 siblings, 1 reply; 5+ messages in thread
From: Sagi Grimberg @ 2020-08-05  8:22 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch


>>> A deadlock happens when we test NVMe over RoCE with link blink. The
>>> reason: a link blink triggers error recovery and then a reconnect. If
>>> the reconnect fails because nvme_set_queue_count times out, the
>>> reconnect process sets the queue count to 0 and continues; then
>>> nvme_start_ctrl calls nvme_enable_aen, and a deadlock happens
>>> because the admin queue is quiesced.
>>
>> Why is the admin queue quiesced? If we are calling set_queue_count,
>> it was already unquiesced?
> A nvme_set_queue_count timeout triggers nvme_rdma_teardown_admin_queue,

Not in the patchset I sent.

> and the admin queue is quiesced in
> nvme_rdma_teardown_admin_queue.
>>
>>> log:
>>> Aug  3 22:47:24 localhost kernel: nvme nvme2: I/O 22 QID 0 timeout
>>> Aug  3 22:47:24 localhost kernel: nvme nvme2: Could not set queue count
>>> (881)
>>> stack:
>>> root     23848  0.0  0.0      0     0 ?        D    Aug03   0:00
>>> [kworker/u12:4+nvme-wq]
>>> [<0>] blk_execute_rq+0x69/0xa0
>>> [<0>] __nvme_submit_sync_cmd+0xaf/0x1b0 [nvme_core]
>>> [<0>] nvme_features+0x73/0xb0 [nvme_core]
>>> [<0>] nvme_start_ctrl+0xa4/0x100 [nvme_core]
>>> [<0>] nvme_rdma_setup_ctrl+0x438/0x700 [nvme_rdma]
>>> [<0>] nvme_rdma_reconnect_ctrl_work+0x22/0x30 [nvme_rdma]
>>> [<0>] process_one_work+0x1a7/0x370
>>> [<0>] worker_thread+0x30/0x380
>>> [<0>] kthread+0x112/0x130
>>> [<0>] ret_from_fork+0x35/0x40
>>>
>>> Many callers of __nvme_submit_sync_cmd treat the return code in two
>>> ways: a code less than 0 means the command failed, while a code greater
>>> than 0 means the target does not support the feature or reported some
>>> other status.
>>
>> We rely in a lot of places on the nvme status being returned from
>> nvme_submit_sync_cmd (especially in nvme_revalidate_disk and for
>> path/aborted cancellations), and this patch breaks it. You need to find
>> a solution that does not prevent the nvme status code from propagating
>> back.
> The difference is just EINTR versus EIO; there is no real impact.

It's not EIO, it's propagating back the nvme status. And we need the
nvme status back to not falsely remove namespaces when we have
ns scanning during controller resets or network disconnects.

So as I said, you need to solve this issue without preventing the
nvme status propagate back.



* Re: [PATCH] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout
  2020-08-05  8:22     ` Sagi Grimberg
@ 2020-08-05  9:42       ` Chao Leng
  0 siblings, 0 replies; 5+ messages in thread
From: Chao Leng @ 2020-08-05  9:42 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch



On 2020/8/5 16:22, Sagi Grimberg wrote:
> 
>>>> A deadlock happens when we test NVMe over RoCE with link blink. The
>>>> reason: a link blink triggers error recovery and then a reconnect. If
>>>> the reconnect fails because nvme_set_queue_count times out, the
>>>> reconnect process sets the queue count to 0 and continues; then
>>>> nvme_start_ctrl calls nvme_enable_aen, and a deadlock happens
>>>> because the admin queue is quiesced.
>>>
>>> Why is the admin queue quiesced? If we are calling set_queue_count,
>>> it was already unquiesced?
>> A nvme_set_queue_count timeout triggers nvme_rdma_teardown_admin_queue,
> 
> Not in the patchset I sent.
Yes, nvme_rdma_teardown_admin_queue has already been removed there. Still,
the connect should terminate instead of continuing, because continuing
cannot succeed; it is just an ineffective action.
> 
>> and the admin queue is quiesced in
>> nvme_rdma_teardown_admin_queue.
>>>
>>>> log:
>>>> Aug  3 22:47:24 localhost kernel: nvme nvme2: I/O 22 QID 0 timeout
>>>> Aug  3 22:47:24 localhost kernel: nvme nvme2: Could not set queue count
>>>> (881)
>>>> stack:
>>>> root     23848  0.0  0.0      0     0 ?        D    Aug03   0:00
>>>> [kworker/u12:4+nvme-wq]
>>>> [<0>] blk_execute_rq+0x69/0xa0
>>>> [<0>] __nvme_submit_sync_cmd+0xaf/0x1b0 [nvme_core]
>>>> [<0>] nvme_features+0x73/0xb0 [nvme_core]
>>>> [<0>] nvme_start_ctrl+0xa4/0x100 [nvme_core]
>>>> [<0>] nvme_rdma_setup_ctrl+0x438/0x700 [nvme_rdma]
>>>> [<0>] nvme_rdma_reconnect_ctrl_work+0x22/0x30 [nvme_rdma]
>>>> [<0>] process_one_work+0x1a7/0x370
>>>> [<0>] worker_thread+0x30/0x380
>>>> [<0>] kthread+0x112/0x130
>>>> [<0>] ret_from_fork+0x35/0x40
>>>>
>>>> Many callers of __nvme_submit_sync_cmd treat the return code in two
>>>> ways: a code less than 0 means the command failed, while a code greater
>>>> than 0 means the target does not support the feature or reported some
>>>> other status.
>>>
>>> We rely in a lot of places on the nvme status being returned from
>>> nvme_submit_sync_cmd (especially in nvme_revalidate_disk and for
>>> path/aborted cancellations), and this patch breaks it. You need to find
>>> a solution that does not prevent the nvme status code from propagating
>>> back.
>> The difference is just EINTR versus EIO; there is no real impact.
> 
> It's not EIO, it's propagating back the nvme status. And we need the
> nvme status back to not falsely remove namespaces when we have
> ns scanning during controller resets or network disconnects.
I see your point. I think we can already falsely remove namespaces today
if the returned error is EAGAIN or EBUSY, etc. Maybe we need to improve
the error-code handling in nvme_revalidate_disk, like this:
---
  drivers/nvme/host/core.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 865645577f2c..d2a61798e9a1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2022,10 +2022,10 @@ static int nvme_revalidate_disk(struct gendisk *disk)
          * Only fail the function if we got a fatal error back from the
          * device, otherwise ignore the error and just move on.
          */
-       if (ret == -ENOMEM || (ret > 0 && !(ret & NVME_SC_DNR)))
-               ret = 0;
-       else if (ret > 0)
+       if (ret > 0 && (ret & NVME_SC_DNR))
                 ret = blk_status_to_errno(nvme_error_status(ret));
+       else if (ret != -ENODEV)
+               ret = 0;
         return ret;
  }

-- 
2.16.4

> 
> So as I said, you need to solve this issue without preventing the
> nvme status propagate back.
> .
We can solve this issue by checking the status, but then we would need to
modify many functions, which is ugly. To distinguish it from other error
codes, we could return EINTR for a host-cancelled request
(NVME_SC_HOST_ABORTED_CMD or NVME_SC_HOST_PATH_ERROR), and callers would
treat EINTR as a cancelled request. That might be more appropriate.
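
A rough standalone sketch of that idea (illustrative only, not a
proposed patch; the example_* helper is invented):

#include <stdio.h>
#include <errno.h>

/*
 * The sync-command path would map a host-cancelled request
 * (NVME_SC_HOST_ABORTED_CMD / NVME_SC_HOST_PATH_ERROR) to -EINTR, and
 * callers would treat -EINTR as "cancelled, give up" while still seeing
 * real NVMe status codes for everything else.
 */
static int example_caller(int status)
{
	if (status == -EINTR)
		return status;  /* cancelled by the host: abort the reconnect */
	if (status < 0)
		return status;  /* other transport or submission error */
	return 0;               /* 0 or a positive NVMe status: tolerated here */
}

int main(void)
{
	printf("cancelled   -> %d\n", example_caller(-EINTR));
	printf("nvme status -> %d\n", example_caller(0x371));
	printf("success     -> %d\n", example_caller(0));
	return 0;
}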





Thread overview: 5 messages
2020-08-05  6:33 [PATCH] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout Chao Leng
2020-08-05  7:10 ` Sagi Grimberg
2020-08-05  7:27   ` Chao Leng
2020-08-05  8:22     ` Sagi Grimberg
2020-08-05  9:42       ` Chao Leng
