* [PATCH 2/3] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout
@ 2020-08-20  3:54 Chao Leng
  2020-08-21  7:50 ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Chao Leng @ 2020-08-20  3:54 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, kbusch, axboe, hch, sagi, lengchao

A deadlock happens when we test nvme over RoCE with link blink. The
reason: a link blink triggers error recovery and then a reconnect.
If the reconnect fails because nvme_set_queue_count times out, the
reconnect process sets the queue count to 0 and continues; then
nvme_start_ctrl calls nvme_enable_aen, which deadlocks because the
admin queue is quiesced.
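
For context, the behaviour that lets the reconnect continue with zero
I/O queues is the degraded-controller handling in nvme_set_queue_count.
Roughly (a simplified sketch from memory, not a verbatim copy of the
source):

  int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
  {
  	u32 q_count = (*count - 1) | ((*count - 1) << 16);
  	u32 result;
  	int status, nr_io_queues;

  	status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count,
  			NULL, 0, &result);
  	if (status < 0)
  		return status;		/* host-side error: reconnect fails */

  	if (status > 0) {
  		/* degraded controller: carry on with only the admin queue */
  		dev_err(ctrl->device, "Could not set queue count (%d)\n", status);
  		*count = 0;
  	} else {
  		nr_io_queues = min(result & 0xffff, result >> 16) + 1;
  		*count = min(*count, nr_io_queues);
  	}
  	return 0;
  }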

log:
Aug  3 22:47:24 localhost kernel: nvme nvme2: I/O 22 QID 0 timeout
Aug  3 22:47:24 localhost kernel: nvme nvme2: Could not set queue count
(881)
stack:
root     23848  0.0  0.0      0     0 ?        D    Aug03   0:00
[kworker/u12:4+nvme-wq]
[<0>] blk_execute_rq+0x69/0xa0
[<0>] __nvme_submit_sync_cmd+0xaf/0x1b0 [nvme_core]
[<0>] nvme_features+0x73/0xb0 [nvme_core]
[<0>] nvme_start_ctrl+0xa4/0x100 [nvme_core]
[<0>] nvme_rdma_setup_ctrl+0x438/0x700 [nvme_rdma]
[<0>] nvme_rdma_reconnect_ctrl_work+0x22/0x30 [nvme_rdma]
[<0>] process_one_work+0x1a7/0x370
[<0>] worker_thread+0x30/0x380
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x35/0x40

Many callers of __nvme_submit_sync_cmd treat the return code in two
ways: a value less than 0 means the command failed, while a value
greater than 0 means the target does not support the command (or
similar) and the caller continues. NVME_SC_HOST_ABORTED_CMD and
NVME_SC_HOST_PATH_ERROR both mean the I/O was cancelled by the host;
they are not real error codes returned from the target. So we need
to set the NVME_REQ_CANCELLED flag. __nvme_submit_sync_cmd then
translates the error to -EINTR, nvme_set_queue_count returns an
error, and the reconnect process terminates instead of continuing.
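
The return path in __nvme_submit_sync_cmd that this depends on looks
roughly like this (simplified sketch; the commit text above refers to
this NVME_REQ_CANCELLED to -EINTR translation):

  	blk_execute_rq(req->q, NULL, req, at_head);
  	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
  		ret = -EINTR;			/* cancelled by the host */
  	else
  		ret = nvme_req(req)->status;	/* real status from the target */

With the flag now set in nvme_cancel_request and
nvmf_fail_nonready_command (see the hunks below), nvme_set_queue_count
sees a negative return value and the reconnect fails instead of
continuing with zero queues.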

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/core.c    | 1 +
 drivers/nvme/host/fabrics.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 43ac8a1ad65d..74f76aa78b02 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -307,6 +307,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 	if (blk_mq_request_completed(req))
 		return true;
 
+	nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
 	blk_mq_complete_request(req);
 	return true;
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 4ec4829d6233..6c40054f9fb4 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -552,6 +552,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
 		return BLK_STS_RESOURCE;
 
+	nvme_req(rq)->flags |= NVME_REQ_CANCELLED;
 	nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
 	blk_mq_start_request(rq);
 	nvme_complete_rq(rq);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/3] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout
  2020-08-20  3:54 [PATCH 2/3] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout Chao Leng
@ 2020-08-21  7:50 ` Christoph Hellwig
  2020-08-21 20:20   ` Sagi Grimberg
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2020-08-21  7:50 UTC (permalink / raw)
  To: Chao Leng; +Cc: linux-nvme, linux-block, kbusch, axboe, hch, sagi

On Thu, Aug 20, 2020 at 11:54:06AM +0800, Chao Leng wrote:
> A deadlock happens when we test nvme over RoCE with link blink. The
> reason: a link blink triggers error recovery and then a reconnect.
> If the reconnect fails because nvme_set_queue_count times out, the
> reconnect process sets the queue count to 0 and continues; then
> nvme_start_ctrl calls nvme_enable_aen, which deadlocks because the
> admin queue is quiesced.
> 
> log:
> Aug  3 22:47:24 localhost kernel: nvme nvme2: I/O 22 QID 0 timeout
> Aug  3 22:47:24 localhost kernel: nvme nvme2: Could not set queue count
> (881)
> stack:
> root     23848  0.0  0.0      0     0 ?        D    Aug03   0:00
> [kworker/u12:4+nvme-wq]
> [<0>] blk_execute_rq+0x69/0xa0
> [<0>] __nvme_submit_sync_cmd+0xaf/0x1b0 [nvme_core]
> [<0>] nvme_features+0x73/0xb0 [nvme_core]
> [<0>] nvme_start_ctrl+0xa4/0x100 [nvme_core]
> [<0>] nvme_rdma_setup_ctrl+0x438/0x700 [nvme_rdma]
> [<0>] nvme_rdma_reconnect_ctrl_work+0x22/0x30 [nvme_rdma]
> [<0>] process_one_work+0x1a7/0x370
> [<0>] worker_thread+0x30/0x380
> [<0>] kthread+0x112/0x130
> [<0>] ret_from_fork+0x35/0x40
> 
> Many callers of __nvme_submit_sync_cmd treat the return code in two
> ways: a value less than 0 means the command failed, while a value
> greater than 0 means the target does not support the command (or
> similar) and the caller continues. NVME_SC_HOST_ABORTED_CMD and
> NVME_SC_HOST_PATH_ERROR both mean the I/O was cancelled by the host;
> they are not real error codes returned from the target. So we need
> to set the NVME_REQ_CANCELLED flag. __nvme_submit_sync_cmd then
> translates the error to -EINTR, nvme_set_queue_count returns an
> error, and the reconnect process terminates instead of continuing.

But we could still race with a real completion.  I suspect the right
answer is to translate NVME_SC_HOST_ABORTED_CMD and
NVME_SC_HOST_PATH_ERROR to a negative error code in
__nvme_submit_sync_cmd.
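
One possible shape of that suggestion (a hypothetical sketch, not an
actual patch), at the point where __nvme_submit_sync_cmd picks its
return value:

  	/* hypothetical: map host-generated status codes to negative errnos */
  	switch (nvme_req(req)->status) {
  	case NVME_SC_HOST_ABORTED_CMD:
  	case NVME_SC_HOST_PATH_ERROR:
  		ret = -EINTR;	/* the command never got a target response */
  		break;
  	default:
  		ret = nvme_req(req)->status;
  	}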

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/3] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout
  2020-08-21  7:50 ` Christoph Hellwig
@ 2020-08-21 20:20   ` Sagi Grimberg
  2020-08-25  7:15     ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Sagi Grimberg @ 2020-08-21 20:20 UTC (permalink / raw)
  To: Christoph Hellwig, Chao Leng; +Cc: linux-nvme, linux-block, kbusch, axboe


>> A deadlock happens when we test nvme over RoCE with link blink. The
>> reason: a link blink triggers error recovery and then a reconnect.
>> If the reconnect fails because nvme_set_queue_count times out, the
>> reconnect process sets the queue count to 0 and continues; then
>> nvme_start_ctrl calls nvme_enable_aen, which deadlocks because the
>> admin queue is quiesced.
>>
>> log:
>> Aug  3 22:47:24 localhost kernel: nvme nvme2: I/O 22 QID 0 timeout
>> Aug  3 22:47:24 localhost kernel: nvme nvme2: Could not set queue count
>> (881)
>> stack:
>> root     23848  0.0  0.0      0     0 ?        D    Aug03   0:00
>> [kworker/u12:4+nvme-wq]
>> [<0>] blk_execute_rq+0x69/0xa0
>> [<0>] __nvme_submit_sync_cmd+0xaf/0x1b0 [nvme_core]
>> [<0>] nvme_features+0x73/0xb0 [nvme_core]
>> [<0>] nvme_start_ctrl+0xa4/0x100 [nvme_core]
>> [<0>] nvme_rdma_setup_ctrl+0x438/0x700 [nvme_rdma]
>> [<0>] nvme_rdma_reconnect_ctrl_work+0x22/0x30 [nvme_rdma]
>> [<0>] process_one_work+0x1a7/0x370
>> [<0>] worker_thread+0x30/0x380
>> [<0>] kthread+0x112/0x130
>> [<0>] ret_from_fork+0x35/0x40
>>
>> Many callers of __nvme_submit_sync_cmd treat the return code in two
>> ways: a value less than 0 means the command failed, while a value
>> greater than 0 means the target does not support the command (or
>> similar) and the caller continues. NVME_SC_HOST_ABORTED_CMD and
>> NVME_SC_HOST_PATH_ERROR both mean the I/O was cancelled by the host;
>> they are not real error codes returned from the target. So we need
>> to set the NVME_REQ_CANCELLED flag. __nvme_submit_sync_cmd then
>> translates the error to -EINTR, nvme_set_queue_count returns an
>> error, and the reconnect process terminates instead of continuing.
> 
> But we could still race with a real completion.  I suspect the right
> answer is to translate NVME_SC_HOST_ABORTED_CMD and
> NVME_SC_HOST_PATH_ERROR to a negative error code in
> __nvme_submit_sync_cmd.

So the scheme you suggest is:
- treat any negative status or !DNR as "we never made it to
the target"
- treat any positive status with DNR as a "controller generated status"

This will need a careful audit of all the call-sites where we place
such assumptions...
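
As a sketch, the distinction described above could be expressed with a
helper like this (hypothetical name, using the existing NVME_SC_DNR
bit):

  /* hypothetical: true only for a status the controller itself generated */
  static inline bool nvme_status_from_ctrl(int status)
  {
  	return status > 0 && (status & NVME_SC_DNR);
  }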

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/3] nvme-core: fix deadlock when reconnect failed due to nvme_set_queue_count timeout
  2020-08-21 20:20   ` Sagi Grimberg
@ 2020-08-25  7:15     ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2020-08-25  7:15 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Chao Leng, linux-nvme, linux-block, kbusch, axboe

On Fri, Aug 21, 2020 at 01:20:44PM -0700, Sagi Grimberg wrote:
>>> Many callers of __nvme_submit_sync_cmd treat the return code in two
>>> ways: a value less than 0 means the command failed, while a value
>>> greater than 0 means the target does not support the command (or
>>> similar) and the caller continues. NVME_SC_HOST_ABORTED_CMD and
>>> NVME_SC_HOST_PATH_ERROR both mean the I/O was cancelled by the host;
>>> they are not real error codes returned from the target. So we need
>>> to set the NVME_REQ_CANCELLED flag. __nvme_submit_sync_cmd then
>>> translates the error to -EINTR, nvme_set_queue_count returns an
>>> error, and the reconnect process terminates instead of continuing.
>>
>> But we could still race with a real completion.  I suspect the right
>> answer is to translate NVME_SC_HOST_ABORTED_CMD and
>> NVME_SC_HOST_PATH_ERROR to a negative error code in
>> __nvme_submit_sync_cmd.
>
> So the scheme you suggest is:
> - treat any negative status or !DNR as "we never made it to
> the target"
> - treat any positive status with DNR as a "controller generated status"
>
> This will need a careful audit of all the call-sites where we place
> such assumptions...

No.  A negative error means the command never made it to the
controller, and we need to map the magic NVME_SC_HOST_ABORTED_CMD and
NVME_SC_HOST_PATH_ERROR errors to negative error codes if we want to
keep using them (IIRC we started using them because it solved an
issue, but my memory is foggy).  All the real NVMe status codes come
from the controller, so the command must have made it there.

^ permalink raw reply	[flat|nested] 4+ messages in thread

