* [PATCH v3 0/5] avoid double request completion and IO error
@ 2021-01-21  7:03 Chao Leng
  2021-01-21  7:03 ` [PATCH v3 1/5] blk-mq: introduce blk_mq_set_request_complete Chao Leng
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Chao Leng @ 2021-01-21  7:03 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

Add blk_mq_set_request_complete and nvme_complete_failed_req for two
bug fixes.
First, avoid double request completion in nvmf_fail_nonready_command.
Second, avoid IO errors for nvme native multipath.

V3:
	- complete the request only for HBA-driver path-related errors.
V2:
	- use "switch" instead of "if" to check the return status.

Chao Leng (5):
  blk-mq: introduce blk_mq_set_request_complete
  nvme-core: introduce complete failed request
  nvme-fabrics: avoid double request completion for
    nvmf_fail_nonready_command
  nvme-rdma: avoid IO error for nvme native multipath
  nvme-fc: avoid IO error for nvme native multipath

 drivers/nvme/host/fabrics.c | 4 +---
 drivers/nvme/host/fc.c      | 7 ++++++-
 drivers/nvme/host/nvme.h    | 8 ++++++++
 drivers/nvme/host/rdma.c    | 9 ++++++++-
 include/linux/blk-mq.h      | 5 +++++
 5 files changed, 28 insertions(+), 5 deletions(-)

-- 
2.16.4



* [PATCH v3 1/5] blk-mq: introduce blk_mq_set_request_complete
  2021-01-21  7:03 [PATCH v3 0/5] avoid double request completion and IO error Chao Leng
@ 2021-01-21  7:03 ` Chao Leng
  2021-01-21  8:40   ` Christoph Hellwig
  2021-01-21  7:03 ` [PATCH v3 2/5] nvme-core: introduce complete failed request Chao Leng
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Chao Leng @ 2021-01-21  7:03 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

NVMe drivers need to set the request state to MQ_RQ_COMPLETE when they
complete a request directly in queue_rq.
So add blk_mq_set_request_complete.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 include/linux/blk-mq.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 47b021952ac7..fc096b04bb7a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -494,6 +494,11 @@ static inline int blk_mq_request_completed(struct request *rq)
 	return blk_mq_rq_state(rq) == MQ_RQ_COMPLETE;
 }
 
+static inline void blk_mq_set_request_complete(struct request *rq)
+{
+	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
+}
+
 void blk_mq_start_request(struct request *rq);
 void blk_mq_end_request(struct request *rq, blk_status_t error);
 void __blk_mq_end_request(struct request *rq, blk_status_t error);
-- 
2.16.4



* [PATCH v3 2/5] nvme-core: introduce complete failed request
  2021-01-21  7:03 [PATCH v3 0/5] avoid double request completion and IO error Chao Leng
  2021-01-21  7:03 ` [PATCH v3 1/5] blk-mq: introduce blk_mq_set_request_complete Chao Leng
@ 2021-01-21  7:03 ` Chao Leng
  2021-01-21  8:41   ` Christoph Hellwig
  2021-01-21  7:03 ` [PATCH v3 3/5] nvme-fabrics: avoid double request completion for nvmf_fail_nonready_command Chao Leng
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Chao Leng @ 2021-01-21  7:03 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

With nvme native multipath, if a path-related error occurs while
queue_rq calls the HBA driver to send a request, queue_rq needs to
complete the request with NVME_SC_HOST_PATH_ERROR so that the request
can fail over to another path and be retried if needed.
So introduce nvme_complete_failed_req for use by queue_rq and
nvmf_fail_nonready_command.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/nvme.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 88a6b97247f5..01fb54ba3d1f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -575,6 +575,14 @@ static inline bool nvme_is_aen_req(u16 qid, __u16 command_id)
 }
 
 void nvme_complete_rq(struct request *req);
+
+static inline void nvme_complete_failed_req(struct request *req)
+{
+	nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
+	blk_mq_set_request_complete(req);
+	nvme_complete_rq(req);
+}
+
 bool nvme_cancel_request(struct request *req, void *data, bool reserved);
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state);
-- 
2.16.4



* [PATCH v3 3/5] nvme-fabrics: avoid double request completion for nvmf_fail_nonready_command
  2021-01-21  7:03 [PATCH v3 0/5] avoid double request completion and IO error Chao Leng
  2021-01-21  7:03 ` [PATCH v3 1/5] blk-mq: introduce blk_mq_set_request_complete Chao Leng
  2021-01-21  7:03 ` [PATCH v3 2/5] nvme-core: introduce complete failed request Chao Leng
@ 2021-01-21  7:03 ` Chao Leng
  2021-01-21  8:58   ` Hannes Reinecke
  2021-01-21  7:03 ` [PATCH v3 4/5] nvme-rdma: avoid IO error for nvme native multipath Chao Leng
  2021-01-21  7:03 ` [PATCH v3 5/5] nvme-fc: " Chao Leng
  4 siblings, 1 reply; 15+ messages in thread
From: Chao Leng @ 2021-01-21  7:03 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

During reconnect, a request may be completed with NVME_SC_HOST_PATH_ERROR
in nvmf_fail_nonready_command. The request state is changed to
MQ_RQ_IN_FLIGHT before nvme_complete_rq is called. If the request is
freed asynchronously, as in nvme_submit_user_cmd, in an extreme scenario
the request may be completed again by the teardown process.
nvmf_fail_nonready_command does not need to call blk_mq_start_request
before completing the request; it should instead set the request state
to MQ_RQ_COMPLETE before completing it.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/fabrics.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 72ac00173500..874e4320e214 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -553,9 +553,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
 		return BLK_STS_RESOURCE;
 
-	nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
-	blk_mq_start_request(rq);
-	nvme_complete_rq(rq);
+	nvme_complete_failed_req(rq);
 	return BLK_STS_OK;
 }
 EXPORT_SYMBOL_GPL(nvmf_fail_nonready_command);
-- 
2.16.4



* [PATCH v3 4/5] nvme-rdma: avoid IO error for nvme native multipath
  2021-01-21  7:03 [PATCH v3 0/5] avoid double request completion and IO error Chao Leng
                   ` (2 preceding siblings ...)
  2021-01-21  7:03 ` [PATCH v3 3/5] nvme-fabrics: avoid double request completion for nvmf_fail_nonready_command Chao Leng
@ 2021-01-21  7:03 ` Chao Leng
  2021-01-21  7:03 ` [PATCH v3 5/5] nvme-fc: " Chao Leng
  4 siblings, 0 replies; 15+ messages in thread
From: Chao Leng @ 2021-01-21  7:03 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

With nvme native multipath, if a path-related error occurs while
queue_rq calls the HBA driver to send a request, queue_rq returns
BLK_STS_IOERR to blk-mq and the request is completed with BLK_STS_IOERR
instead of failing over to another path for a retry.
queue_rq needs to call nvme_complete_rq to complete the request with
NVME_SC_HOST_PATH_ERROR so that it can fail over and be retried if
needed.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/rdma.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index cf6c49d09c82..5fb0838a5f8b 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2030,6 +2030,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	bool queue_ready = test_bit(NVME_RDMA_Q_LIVE, &queue->flags);
 	blk_status_t ret;
 	int err;
+	bool driver_error = false;
 
 	WARN_ON_ONCE(rq->tag < 0);
 
@@ -2077,8 +2078,10 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
 			req->mr ? &req->reg_wr.wr : NULL);
-	if (unlikely(err))
+	if (unlikely(err)) {
+		driver_error = true;
 		goto err_unmap;
+	}
 
 	return BLK_STS_OK;
 
@@ -2093,6 +2096,10 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 unmap_qe:
 	ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
 			    DMA_TO_DEVICE);
+	if (driver_error && ret == BLK_STS_IOERR) {
+		nvme_complete_failed_req(rq);
+		ret = BLK_STS_OK;
+	}
 	return ret;
 }
 
-- 
2.16.4



* [PATCH v3 5/5] nvme-fc: avoid IO error for nvme native multipath
  2021-01-21  7:03 [PATCH v3 0/5] avoid double request completion and IO error Chao Leng
                   ` (3 preceding siblings ...)
  2021-01-21  7:03 ` [PATCH v3 4/5] nvme-rdma: avoid IO error for nvme native multipath Chao Leng
@ 2021-01-21  7:03 ` Chao Leng
  4 siblings, 0 replies; 15+ messages in thread
From: Chao Leng @ 2021-01-21  7:03 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

With nvme native multipath, if a path-related error occurs while
queue_rq calls the HBA driver to send a request, queue_rq returns
BLK_STS_IOERR to blk-mq and the request is completed with BLK_STS_IOERR
instead of failing over to another path for a retry.
queue_rq needs to call nvme_complete_rq to complete the request with
NVME_SC_HOST_PATH_ERROR so that it can fail over and be retried if
needed.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/fc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 5f36cfa8136c..ebc9911f9528 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2791,7 +2791,12 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx,
 	}
 
 
-	return nvme_fc_start_fcp_op(ctrl, queue, op, data_len, io_dir);
+	ret = nvme_fc_start_fcp_op(ctrl, queue, op, data_len, io_dir);
+	if (ret == BLK_STS_IOERR) {
+		nvme_complete_failed_req(rq);
+		ret = BLK_STS_OK;
+	}
+	return ret;
 }
 
 static void
-- 
2.16.4



* Re: [PATCH v3 1/5] blk-mq: introduce blk_mq_set_request_complete
  2021-01-21  7:03 ` [PATCH v3 1/5] blk-mq: introduce blk_mq_set_request_complete Chao Leng
@ 2021-01-21  8:40   ` Christoph Hellwig
  2021-01-22  1:46     ` Chao Leng
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2021-01-21  8:40 UTC (permalink / raw)
  To: Chao Leng; +Cc: linux-nvme, kbusch, axboe, hch, sagi, linux-block, axboe

On Thu, Jan 21, 2021 at 03:03:26PM +0800, Chao Leng wrote:
> NVMe drivers need to set the request state to MQ_RQ_COMPLETE when they
> complete a request directly in queue_rq.
> So add blk_mq_set_request_complete.

So I'm not happy with this helper.  It should at least:

 a) be named and documented to only apply for the ->queue_rq failure case
 b) check that the request is in MQ_RQ_IDLE state
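
For illustration only, a sketch of what the helper might look like with
both points applied (documentation tied to the ->queue_rq failure case
plus a state check); the idle-state check is debated further down the
thread, and this is not part of the posted series:

/*
 * Mark a request as completed without running the normal completion
 * path.  Only intended for drivers that fail a request inside
 * ->queue_rq() and complete it themselves, so that a later tag
 * iteration (e.g. controller teardown) does not complete it again.
 */
static inline void blk_mq_set_request_complete(struct request *rq)
{
	WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE);
	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
}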


* Re: [PATCH v3 2/5] nvme-core: introduce complete failed request
  2021-01-21  7:03 ` [PATCH v3 2/5] nvme-core: introduce complete failed request Chao Leng
@ 2021-01-21  8:41   ` Christoph Hellwig
  2021-01-22  1:46     ` Chao Leng
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2021-01-21  8:41 UTC (permalink / raw)
  To: Chao Leng; +Cc: linux-nvme, kbusch, axboe, hch, sagi, linux-block, axboe

> +static inline void nvme_complete_failed_req(struct request *req)

I think the name is too generic, and the function also needs a little
comment, especially as it forces a specific error code.

> +{
> +	nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
> +	blk_mq_set_request_complete(req);
> +	nvme_complete_rq(req);
> +}

Also no need to mark this as an inline function.


* Re: [PATCH v3 3/5] nvme-fabrics: avoid double request completion for nvmf_fail_nonready_command
  2021-01-21  7:03 ` [PATCH v3 3/5] nvme-fabrics: avoid double request completion for nvmf_fail_nonready_command Chao Leng
@ 2021-01-21  8:58   ` Hannes Reinecke
  2021-01-21  9:00     ` Christoph Hellwig
  2021-01-22  1:48     ` Chao Leng
  0 siblings, 2 replies; 15+ messages in thread
From: Hannes Reinecke @ 2021-01-21  8:58 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: axboe, linux-block, sagi, axboe, kbusch, hch

On 1/21/21 8:03 AM, Chao Leng wrote:
> During reconnect, a request may be completed with NVME_SC_HOST_PATH_ERROR
> in nvmf_fail_nonready_command. The request state is changed to
> MQ_RQ_IN_FLIGHT before nvme_complete_rq is called. If the request is
> freed asynchronously, as in nvme_submit_user_cmd, in an extreme scenario
> the request may be completed again by the teardown process.
> nvmf_fail_nonready_command does not need to call blk_mq_start_request
> before completing the request; it should instead set the request state
> to MQ_RQ_COMPLETE before completing it.
> 

So what you are saying is that there is a race condition between
blk_mq_start_request()
and
nvme_complete_request()

> Signed-off-by: Chao Leng <lengchao@huawei.com>
> ---
>   drivers/nvme/host/fabrics.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 72ac00173500..874e4320e214 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -553,9 +553,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
>   	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>   		return BLK_STS_RESOURCE;
>   
> -	nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
> -	blk_mq_start_request(rq);
> -	nvme_complete_rq(rq);
> +	nvme_complete_failed_req(rq);
>   	return BLK_STS_OK;
>   }
>   EXPORT_SYMBOL_GPL(nvmf_fail_nonready_command);

I'd rather have 'nvme_complete_failed_req()' accept the status as an
argument, like

nvme_complete_failed_request(rq, NVME_SC_HOST_PATH_ERROR)

that way it's obvious what is happening, and the status isn't hidden in 
the function.
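
For illustration, a sketch of that variant (not from a posted patch;
the signature and placement are assumed):

static inline void nvme_complete_failed_req(struct request *req,
					    u16 status)
{
	nvme_req(req)->status = status;
	blk_mq_set_request_complete(req);
	nvme_complete_rq(req);
}

The call in nvmf_fail_nonready_command() would then become
nvme_complete_failed_req(rq, NVME_SC_HOST_PATH_ERROR), which keeps the
error code visible at the call site.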

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH v3 3/5] nvme-fabrics: avoid double request completion for nvmf_fail_nonready_command
  2021-01-21  8:58   ` Hannes Reinecke
@ 2021-01-21  9:00     ` Christoph Hellwig
  2021-01-21  9:27       ` Hannes Reinecke
  2021-01-22  1:48     ` Chao Leng
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2021-01-21  9:00 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chao Leng, linux-nvme, axboe, linux-block, sagi, axboe, kbusch, hch

On Thu, Jan 21, 2021 at 09:58:37AM +0100, Hannes Reinecke wrote:
> On 1/21/21 8:03 AM, Chao Leng wrote:
>> During reconnect, a request may be completed with NVME_SC_HOST_PATH_ERROR
>> in nvmf_fail_nonready_command. The request state is changed to
>> MQ_RQ_IN_FLIGHT before nvme_complete_rq is called. If the request is
>> freed asynchronously, as in nvme_submit_user_cmd, in an extreme scenario
>> the request may be completed again by the teardown process.
>> nvmf_fail_nonready_command does not need to call blk_mq_start_request
>> before completing the request; it should instead set the request state
>> to MQ_RQ_COMPLETE before completing it.
>>
>
> So what you are saying is that there is a race condition between
> blk_mq_start_request()
> and
> nvme_complete_request()

Between those two, a teardown that cancels all requests can come in.


* Re: [PATCH v3 3/5] nvme-fabrics: avoid double request completion for nvmf_fail_nonready_command
  2021-01-21  9:00     ` Christoph Hellwig
@ 2021-01-21  9:27       ` Hannes Reinecke
  2021-01-22  1:50         ` Chao Leng
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2021-01-21  9:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, axboe, sagi, linux-nvme, linux-block, Chao Leng, kbusch

On 1/21/21 10:00 AM, Christoph Hellwig wrote:
> On Thu, Jan 21, 2021 at 09:58:37AM +0100, Hannes Reinecke wrote:
>> On 1/21/21 8:03 AM, Chao Leng wrote:
>>> During reconnect, a request may be completed with NVME_SC_HOST_PATH_ERROR
>>> in nvmf_fail_nonready_command. The request state is changed to
>>> MQ_RQ_IN_FLIGHT before nvme_complete_rq is called. If the request is
>>> freed asynchronously, as in nvme_submit_user_cmd, in an extreme scenario
>>> the request may be completed again by the teardown process.
>>> nvmf_fail_nonready_command does not need to call blk_mq_start_request
>>> before completing the request; it should instead set the request state
>>> to MQ_RQ_COMPLETE before completing it.
>>>
>>
>> So what you are saying is that there is a race condition between
>> blk_mq_start_request()
>> and
>> nvme_complete_request()
> 
> Between those two, a teardown that cancels all requests can come in.
> 
Doesn't nvme_complete_request() insulate against a double completion?
I seem to remember we've gone to great lengths ensuring that.

And if this is just about setting the correct error code on completion 
I'd really prefer to stick with the current code. Moving that into a 
helper is fine, but I'd rather not introduce our own code modifying 
request state.

If there really is a race condition, this feels like a more generic 
problem; calling blk_mq_start_request() followed by blk_mq_end_request() 
is quite a common pattern and, from my impression, the recommended way.
So if there is an issue it would need to be addressed for all drivers, 
not just some nvme-specific way.
Plus I'd like to have Jens' opinion here.
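
For reference, the common pattern being referred to looks roughly like
this in a hypothetical driver (foo_queue_rq() and foo_submit() are made
up purely for illustration, not real kernel code):

static blk_status_t foo_queue_rq(struct blk_mq_hw_ctx *hctx,
				 const struct blk_mq_queue_data *bd)
{
	struct request *rq = bd->rq;

	blk_mq_start_request(rq);

	if (foo_submit(hctx->queue->queuedata, rq)) {
		/* complete the failed request in place ... */
		blk_mq_end_request(rq, BLK_STS_IOERR);
		/* ... and report the dispatch itself as handled */
	}

	return BLK_STS_OK;
}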

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH v3 1/5] blk-mq: introduce blk_mq_set_request_complete
  2021-01-21  8:40   ` Christoph Hellwig
@ 2021-01-22  1:46     ` Chao Leng
  0 siblings, 0 replies; 15+ messages in thread
From: Chao Leng @ 2021-01-22  1:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, kbusch, axboe, sagi, linux-block, axboe



On 2021/1/21 16:40, Christoph Hellwig wrote:
> On Thu, Jan 21, 2021 at 03:03:26PM +0800, Chao Leng wrote:
>> NVMe drivers need to set the request state to MQ_RQ_COMPLETE when they
>> complete a request directly in queue_rq.
>> So add blk_mq_set_request_complete.
> 
> So I'm not happy with this helper.  It should at least:
> 
>   a) be named and documented to only apply for the ->queue_rq failure case
Although blk_mq_complete_request_remote currently sets the state
directly, it could also use this helper, so the helper does not need
any special treatment of the ->queue_rq failure case.
>   b) check that the request is in MQ_RQ_IDLE state
No, the request may also be in the MQ_RQ_IN_FLIGHT state. There is no
need to care whether the original state is MQ_RQ_IDLE or MQ_RQ_IN_FLIGHT.
> .
> 


* Re: [PATCH v3 2/5] nvme-core: introduce complete failed request
  2021-01-21  8:41   ` Christoph Hellwig
@ 2021-01-22  1:46     ` Chao Leng
  0 siblings, 0 replies; 15+ messages in thread
From: Chao Leng @ 2021-01-22  1:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, kbusch, axboe, sagi, linux-block, axboe



On 2021/1/21 16:41, Christoph Hellwig wrote:
>> +static inline void nvme_complete_failed_req(struct request *req)
> 
> I think the name is too generic, and the function also needs a little
> comment, especially as it forces a specific error code.
Ok, thank you for your suggestion. Using the error status as a
parameter may be a better choice.
> 
>> +{
>> +	nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
>> +	blk_mq_set_request_complete(req);
>> +	nvme_complete_rq(req);
>> +}
> 
> Also no need to mark this as an inline function.
Yes, exporting the function is ok.
> .
> 


* Re: [PATCH v3 3/5] nvme-fabrics: avoid double request completion for nvmf_fail_nonready_command
  2021-01-21  8:58   ` Hannes Reinecke
  2021-01-21  9:00     ` Christoph Hellwig
@ 2021-01-22  1:48     ` Chao Leng
  1 sibling, 0 replies; 15+ messages in thread
From: Chao Leng @ 2021-01-22  1:48 UTC (permalink / raw)
  To: Hannes Reinecke, linux-nvme; +Cc: axboe, linux-block, sagi, axboe, kbusch, hch



On 2021/1/21 16:58, Hannes Reinecke wrote:
> On 1/21/21 8:03 AM, Chao Leng wrote:
>> During reconnect, a request may be completed with NVME_SC_HOST_PATH_ERROR
>> in nvmf_fail_nonready_command. The request state is changed to
>> MQ_RQ_IN_FLIGHT before nvme_complete_rq is called. If the request is
>> freed asynchronously, as in nvme_submit_user_cmd, in an extreme scenario
>> the request may be completed again by the teardown process.
>> nvmf_fail_nonready_command does not need to call blk_mq_start_request
>> before completing the request; it should instead set the request state
>> to MQ_RQ_COMPLETE before completing it.
>>
> 
> So what you are saying is that there is a race condition between
> blk_mq_start_request()
> and
> nvme_complete_request()
Yes. The race is:
process 1: error recovery -> tear down -> quiesce queue (wait for
    dispatch to finish)
process 2: dispatch -> queue_rq -> nvmf_fail_nonready_command ->
    nvme_complete_rq (if the request is freed asynchronously, this wakes
    nvme_submit_user_cmd, for example, which gets no chance to run yet).
process 1: continues -> cancels the outstanding request: it sees that
    the state is neither MQ_RQ_IDLE nor MQ_RQ_COMPLETE and completes
    (frees) the request.
process 3: nvme_submit_user_cmd now gets a chance to run and frees the
    request again.
Test injection method: insert an msleep before the call to
blk_mq_free_request in nvme_submit_user_cmd.
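
A hypothetical sketch of that injection (the exact placement inside
nvme_submit_user_cmd() is assumed), just to show the idea:

	/* test-only delay: widen the race window so the teardown path
	 * can complete the request before the asynchronous free runs */
	msleep(1000);
	blk_mq_free_request(req);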
> 
>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>> ---
>>   drivers/nvme/host/fabrics.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>> index 72ac00173500..874e4320e214 100644
>> --- a/drivers/nvme/host/fabrics.c
>> +++ b/drivers/nvme/host/fabrics.c
>> @@ -553,9 +553,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
>>           !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
>>           return BLK_STS_RESOURCE;
>> -    nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
>> -    blk_mq_start_request(rq);
>> -    nvme_complete_rq(rq);
>> +    nvme_complete_failed_req(rq);
>>       return BLK_STS_OK;
>>   }
>>   EXPORT_SYMBOL_GPL(nvmf_fail_nonready_command);
>
> I'd rather have 'nvme_complete_failed_req()' accept the status as an
> argument, like
> 
> nvme_complete_failed_request(rq, NVME_SC_HOST_PATH_ERROR)
> 
> that way it's obvious what is happening, and the status isn't hidden in the function.
Ok, good idea. Thank you for your suggestion.
> 
> Cheers,
> 
> Hannes


* Re: [PATCH v3 3/5] nvme-fabrics: avoid double request completion for nvmf_fail_nonready_command
  2021-01-21  9:27       ` Hannes Reinecke
@ 2021-01-22  1:50         ` Chao Leng
  0 siblings, 0 replies; 15+ messages in thread
From: Chao Leng @ 2021-01-22  1:50 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig
  Cc: axboe, axboe, sagi, linux-nvme, linux-block, kbusch



On 2021/1/21 17:27, Hannes Reinecke wrote:
> On 1/21/21 10:00 AM, Christoph Hellwig wrote:
>> On Thu, Jan 21, 2021 at 09:58:37AM +0100, Hannes Reinecke wrote:
>>> On 1/21/21 8:03 AM, Chao Leng wrote:
>>>> During reconnect, a request may be completed with NVME_SC_HOST_PATH_ERROR
>>>> in nvmf_fail_nonready_command. The request state is changed to
>>>> MQ_RQ_IN_FLIGHT before nvme_complete_rq is called. If the request is
>>>> freed asynchronously, as in nvme_submit_user_cmd, in an extreme scenario
>>>> the request may be completed again by the teardown process.
>>>> nvmf_fail_nonready_command does not need to call blk_mq_start_request
>>>> before completing the request; it should instead set the request state
>>>> to MQ_RQ_COMPLETE before completing it.
>>>>
>>>
>>> So what you are saying is that there is a race condition between
>>> blk_mq_start_request()
>>> and
>>> nvme_complete_request()
>>
>> Between those two, a teardown that cancels all requests can come in.
>>
> Doesn't nvme_complete_request() insulate against a double completion?
nvme_complete_request cannot insulate against double completion.
Setting the request state to MQ_RQ_COMPLETE avoids double completion:
teardown (nvme_cancel_request) checks the state of the request and, if
it is MQ_RQ_COMPLETE, skips the completion.
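
For context, the relevant guard in nvme_cancel_request() looks roughly
like this (paraphrased from the nvme core of that time, not quoted
verbatim):

bool nvme_cancel_request(struct request *req, void *data, bool reserved)
{
	/* don't complete an already completed request a second time */
	if (blk_mq_request_completed(req))
		return true;

	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
	blk_mq_complete_request(req);
	return true;
}
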
> I seem to remember we've gone to great lengths ensuring that.
> 
> And if this is just about setting the correct error code on completion I'd really prefer to stick with the current code. Moving that into a helper is fine, but I'd rather not introduce our own code modifying request state.
> 
> If there really is a race condition, this feels like a more generic problem; calling blk_mq_start_request() followed by blk_mq_end_request() is quite a common pattern and, from my impression, the recommended way.
> So if there is an issue it would need to be addressed for all drivers, not just some nvme-specific way.
Currently it is not safe for nvme, though the probability is very low.
I am not sure whether something similar occurs in other scenarios.
> Plus I'd like to have Jens' opinion here.
> 
> Cheers,
> 
> Hannes

