* [PATCH v2 0/6] avoid repeated request completion and IO error
@ 2021-01-07  3:31 ` Chao Leng
From: Chao Leng @ 2021-01-07  3:31 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

First, avoid repeated request completion for nvmf_fail_nonready_command.
Second, avoid I/O errors and repeated request completion for failed
queue_rq calls.
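
For illustration, the queue_rq error handling that the transport patches
converge on looks roughly like the sketch below (demo_queue_rq and
demo_setup_and_send are made-up names, not real driver code; see patches
2 and 4-6 for the actual changes):

static blk_status_t demo_queue_rq(struct blk_mq_hw_ctx *hctx,
				  const struct blk_mq_queue_data *bd)
{
	struct request *rq = bd->rq;
	blk_status_t ret;

	ret = demo_setup_and_send(rq);	/* transport specific submission */
	if (likely(ret == BLK_STS_OK))
		return BLK_STS_OK;

	/*
	 * Resource errors are still returned to blk-mq so the request is
	 * requeued.  Any other error is completed through nvme_complete_rq,
	 * which can retry or fail over, after the request has been marked
	 * MQ_RQ_COMPLETE so that teardown cannot complete it again.
	 */
	return nvme_try_complete_failed_req(rq, ret);
}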

V2:
	- use "switch" instead of "if" to check the return status
Chao Leng (6):
  blk-mq: introduce blk_mq_set_request_complete
  nvme-core: introduce complete failed request
  nvme-fabrics: avoid repeated request completion for
    nvmf_fail_nonready_command
  nvme-rdma: avoid IO error and repeated request completion
  nvme-tcp: avoid IO error and repeated request completion
  nvme-fc: avoid IO error and repeated request completion

 drivers/nvme/host/fabrics.c |  4 +---
 drivers/nvme/host/fc.c      |  6 ++++--
 drivers/nvme/host/nvme.h    | 21 +++++++++++++++++++++
 drivers/nvme/host/rdma.c    |  2 +-
 drivers/nvme/host/tcp.c     |  2 +-
 include/linux/blk-mq.h      |  5 +++++
 6 files changed, 33 insertions(+), 7 deletions(-)

-- 
2.16.4


* [PATCH v2 1/6] blk-mq: introduce blk_mq_set_request_complete
  2021-01-07  3:31 ` Chao Leng
@ 2021-01-07  3:31   ` Chao Leng
From: Chao Leng @ 2021-01-07  3:31 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

In some scenarios the nvme driver needs to set the state of a request
to MQ_RQ_COMPLETE. Add an inline helper, blk_mq_set_request_complete,
for this purpose. For details, see the subsequent patches.
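
For example, a failed request will be completed from queue_rq roughly
like this (sketch; the real helper that does this is added in the next
patch):

	nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
	blk_mq_set_request_complete(req);
	nvme_complete_rq(req);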

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 include/linux/blk-mq.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index e7482e6ad3ec..cee72d31054d 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -493,6 +493,11 @@ static inline int blk_mq_request_completed(struct request *rq)
 	return blk_mq_rq_state(rq) == MQ_RQ_COMPLETE;
 }
 
+static inline void blk_mq_set_request_complete(struct request *rq)
+{
+	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
+}
+
 void blk_mq_start_request(struct request *rq);
 void blk_mq_end_request(struct request *rq, blk_status_t error);
 void __blk_mq_end_request(struct request *rq, blk_status_t error);
-- 
2.16.4


* [PATCH v2 2/6] nvme-core: introduce complete failed request
  2021-01-07  3:31 ` Chao Leng
@ 2021-01-07  3:31   ` Chao Leng
From: Chao Leng @ 2021-01-07  3:31 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

When queueing a request fails and the failure status is not
BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE or BLK_STS_ZONE_RESOURCE, the
request needs to be completed with nvme_complete_rq in queue_rq, so
introduce nvme_try_complete_failed_req.
The request needs to be completed with NVME_SC_HOST_PATH_ERROR in
nvmf_fail_nonready_command and in queue_rq, so introduce
nvme_complete_failed_req.
For details, see the subsequent patches.
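
The intended call sites look roughly like this (sketch; the real call
sites are added in the following patches):

	/* in nvmf_fail_nonready_command() */
	nvme_complete_failed_req(rq);
	return BLK_STS_OK;

	/* at the end of a transport's queue_rq() error path */
	return nvme_try_complete_failed_req(rq, ret);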

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/nvme.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bfcedfa4b057..fc4eefdfbb34 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -649,6 +649,27 @@ void nvme_put_ns_from_disk(struct nvme_ns_head *head, int idx);
 extern const struct attribute_group *nvme_ns_id_attr_groups[];
 extern const struct block_device_operations nvme_ns_head_ops;
 
+static inline void nvme_complete_failed_req(struct request *req)
+{
+	nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
+	blk_mq_set_request_complete(req);
+	nvme_complete_rq(req);
+}
+
+static inline blk_status_t nvme_try_complete_failed_req(struct request *req,
+							blk_status_t ret)
+{
+	switch (ret) {
+	case BLK_STS_RESOURCE:
+	case BLK_STS_DEV_RESOURCE:
+	case BLK_STS_ZONE_RESOURCE:
+		return ret;
+	default:
+		nvme_complete_failed_req(req);
+		return BLK_STS_OK;
+	}
+}
+
 #ifdef CONFIG_NVME_MULTIPATH
 static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
 {
-- 
2.16.4


* [PATCH v2 3/6] nvme-fabrics: avoid repeated request completion for nvmf_fail_nonready_command
  2021-01-07  3:31 ` Chao Leng
@ 2021-01-07  3:31   ` Chao Leng
From: Chao Leng @ 2021-01-07  3:31 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

A request may be completed with NVME_SC_HOST_PATH_ERROR in
nvmf_fail_nonready_command. The state of the request is changed to
MQ_RQ_IN_FLIGHT before nvme_complete_rq is called. If the request is
then freed asynchronously, such as in nvme_submit_user_cmd, in an
extreme scenario it is completed again during teardown.
nvmf_fail_nonready_command does not need to call blk_mq_start_request
before completing the request; it should set the state of the request
to MQ_RQ_COMPLETE before completing it instead.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/fabrics.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 72ac00173500..874e4320e214 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -553,9 +553,7 @@ blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
 		return BLK_STS_RESOURCE;
 
-	nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
-	blk_mq_start_request(rq);
-	nvme_complete_rq(rq);
+	nvme_complete_failed_req(rq);
 	return BLK_STS_OK;
 }
 EXPORT_SYMBOL_GPL(nvmf_fail_nonready_command);
-- 
2.16.4


* [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-07  3:31 ` Chao Leng
@ 2021-01-07  3:31   ` Chao Leng
From: Chao Leng @ 2021-01-07  3:31 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

When queueing a request fails, the blk_status_t is returned directly
to blk-mq. If the status is not BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE
or BLK_STS_ZONE_RESOURCE, blk-mq calls blk_mq_end_request to complete
the request with BLK_STS_IOERR.
In two scenarios the request should instead be retried and may succeed.
First, with nvme multipath the request may be retried successfully on
another path, because the error is probably related to the path.
Second, without multipath software the request may be retried
successfully after error recovery.
If the request is completed with BLK_STS_IOERR in
blk_mq_dispatch_rq_list, the state of the request may have been changed
to MQ_RQ_IN_FLIGHT. If the request is then freed asynchronously, such
as in nvme_submit_user_cmd, in an extreme scenario it is completed a
second time during teardown.
If a non-resource error occurs in queue_rq, directly call
nvme_complete_rq to complete the request and set its state to
MQ_RQ_COMPLETE. nvme_complete_rq then decides whether to retry, fail
over or end the request.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index df9f6f4549f1..4a89bf44ecdc 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 unmap_qe:
 	ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
 			    DMA_TO_DEVICE);
-	return ret;
+	return nvme_try_complete_failed_req(rq, ret);
 }
 
 static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx)
-- 
2.16.4


* [PATCH v2 5/6] nvme-tcp: avoid IO error and repeated request completion
  2021-01-07  3:31 ` Chao Leng
@ 2021-01-07  3:31   ` Chao Leng
From: Chao Leng @ 2021-01-07  3:31 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

When queueing a request fails, the blk_status_t is returned directly
to blk-mq. If the status is not BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE
or BLK_STS_ZONE_RESOURCE, blk-mq calls blk_mq_end_request to complete
the request with BLK_STS_IOERR.
In two scenarios the request should instead be retried and may succeed.
First, with nvme multipath the request may be retried successfully on
another path, because the error is probably related to the path.
Second, without multipath software the request may be retried
successfully after error recovery.
If the request is completed with BLK_STS_IOERR in
blk_mq_dispatch_rq_list, the state of the request may have been changed
to MQ_RQ_IN_FLIGHT. If the request is then freed asynchronously, such
as in nvme_submit_user_cmd, in an extreme scenario it is completed a
second time during teardown.
If a non-resource error occurs in queue_rq, directly call
nvme_complete_rq to complete the request and set its state to
MQ_RQ_COMPLETE. nvme_complete_rq then decides whether to retry, fail
over or end the request.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 1ba659927442..a81683ce8cff 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2306,7 +2306,7 @@ static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	ret = nvme_tcp_setup_cmd_pdu(ns, rq);
 	if (unlikely(ret))
-		return ret;
+		return nvme_try_complete_failed_req(rq, ret);
 
 	blk_mq_start_request(rq);
 
-- 
2.16.4


* [PATCH v2 6/6] nvme-fc: avoid IO error and repeated request completion
  2021-01-07  3:31 ` Chao Leng
@ 2021-01-07  3:31   ` Chao Leng
From: Chao Leng @ 2021-01-07  3:31 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi, linux-block, axboe

When queueing a request fails, the blk_status_t is returned directly
to blk-mq. If the status is not BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE
or BLK_STS_ZONE_RESOURCE, blk-mq calls blk_mq_end_request to complete
the request with BLK_STS_IOERR.
In two scenarios the request should instead be retried and may succeed.
First, with nvme multipath the request may be retried successfully on
another path, because the error is probably related to the path.
Second, without multipath software the request may be retried
successfully after error recovery.
If the request is completed with BLK_STS_IOERR in
blk_mq_dispatch_rq_list, the state of the request may have been changed
to MQ_RQ_IN_FLIGHT. If the request is then freed asynchronously, such
as in nvme_submit_user_cmd, in an extreme scenario it is completed a
second time during teardown.
If a non-resource error occurs in queue_rq, directly call
nvme_complete_rq to complete the request and set its state to
MQ_RQ_COMPLETE. nvme_complete_rq then decides whether to retry, fail
over or end the request.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/fc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 38373a0e86ef..f6a5758ef1ea 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2761,7 +2761,7 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	ret = nvme_setup_cmd(ns, rq, sqe);
 	if (ret)
-		return ret;
+		goto fail;
 
 	/*
 	 * nvme core doesn't quite treat the rq opaquely. Commands such
@@ -2781,7 +2781,9 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx,
 	}
 
 
-	return nvme_fc_start_fcp_op(ctrl, queue, op, data_len, io_dir);
+	ret = nvme_fc_start_fcp_op(ctrl, queue, op, data_len, io_dir);
+fail:
+	return nvme_try_complete_failed_req(rq, ret);
 }
 
 static void
-- 
2.16.4


* Re: [PATCH v2 0/6] avoid repeated request completion and IO error
  2021-01-07  3:31 ` Chao Leng
@ 2021-01-14  0:15   ` Sagi Grimberg
From: Sagi Grimberg @ 2021-01-14  0:15 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe

> First avoid repeated request completion for nvmf_fail_nonready_command.
> Second avoid IO error and repeated request completion for queue_rq.

Maybe this is me chiming in at v2, but what is this fixing? What
is the bug you are seeing?

* Re: [PATCH v2 1/6] blk-mq: introduce blk_mq_set_request_complete
  2021-01-07  3:31   ` Chao Leng
@ 2021-01-14  0:17     ` Sagi Grimberg
From: Sagi Grimberg @ 2021-01-14  0:17 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe


> In some scenarios, nvme need setting the state of request to
> MQ_RQ_COMPLETE. So add an inline function blk_mq_set_request_complete.
> For details, see the subsequent patches.

It's kinda difficult to understand the meaning of all of this...
the cover letter tells us nothing, and patches 1/2 also tell us
to see subsequent patches.

This is saved in the git changelog history, so please try to
describe what it is you are going for with this, even if there is
some overlap between the patch descriptions.

* Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-07  3:31   ` Chao Leng
@ 2021-01-14  0:19     ` Sagi Grimberg
From: Sagi Grimberg @ 2021-01-14  0:19 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe


> When a request is queued failed, blk_status_t is directly returned
> to the blk-mq. If blk_status_t is not BLK_STS_RESOURCE,
> BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE, blk-mq call
> blk_mq_end_request to complete the request with BLK_STS_IOERR.
> In two scenarios, the request should be retried and may succeed.
> First, if work with nvme multipath, the request may be retried
> successfully in another path, because the error is probably related to
> the path. Second, if work without multipath software, the request may
> be retried successfully after error recovery.
> If the request is complete with BLK_STS_IOERR in blk_mq_dispatch_rq_list.
> The state of request may be changed to MQ_RQ_IN_FLIGHT. If free the
> request asynchronously such as in nvme_submit_user_cmd, in extreme
> scenario the request will be repeated freed in tear down.
> If a non-resource error occurs in queue_rq, should directly call
> nvme_complete_rq to complete request and set the state of request to
> MQ_RQ_COMPLETE. nvme_complete_rq will decide to retry, fail over or end
> the request.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> ---
>   drivers/nvme/host/rdma.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index df9f6f4549f1..4a89bf44ecdc 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>   unmap_qe:
>   	ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
>   			    DMA_TO_DEVICE);
> -	return ret;
> +	return nvme_try_complete_failed_req(rq, ret);

I don't understand this. There are errors that may not be related to
anything that is pathing related (sw bug, memory leak, mapping error,
etc.), so why should we return this one-shot error?

* Re: [PATCH v2 0/6] avoid repeated request completion and IO error
  2021-01-14  0:15   ` Sagi Grimberg
@ 2021-01-14  6:50     ` Chao Leng
From: Chao Leng @ 2021-01-14  6:50 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe



On 2021/1/14 8:15, Sagi Grimberg wrote:
>> First avoid repeated request completion for nvmf_fail_nonready_command.
>> Second avoid IO error and repeated request completion for queue_rq.
> 
> Maybe this is me chiming in v2, but what is this fixing? what
> is the bug you are seeing?

The bug is a crash and an I/O error, seen in two scenarios.

First, inject a request timeout: a crash happens due to double
completion of a request, with very low probability. The reason: we do
error recovery for the request timeout. During error recovery a new
request can be completed by nvmf_fail_nonready_command in queue_rq,
which changes the state of the request to MQ_RQ_IN_FLIGHT. The request
is freed asynchronously in nvme_submit_user_cmd, which may run after
error recovery has already cancelled the request (while its state is
still MQ_RQ_IN_FLIGHT). The request is then completed twice.

Second, use two HBAs for nvme native multipath and inject a fault on
one HBA: an I/O error happens, and with low probability a crash as
well. The reason for the I/O error is that queue_rq returns
BLK_STS_IOERR and blk-mq calls blk_mq_end_request to complete the
request. We expect the request to fail over to the healthy HBA, but it
is completed directly with BLK_STS_IOERR. The reason for the crash is
similar to the first scenario.
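
Roughly, the double completion in the first scenario looks like this
(simplified timeline; the exact call chain depends on the transport):

  submitter (e.g. nvme_submit_user_cmd)    error recovery / teardown
  -------------------------------------    -------------------------
  queue_rq()
    nvmf_fail_nonready_command()
      blk_mq_start_request()
        -> state = MQ_RQ_IN_FLIGHT
      nvme_complete_rq()
        -> request ends, submitter
           will free it
                                            nvme_cancel_request() walks the
                                            busy tags, still sees the request
                                            as MQ_RQ_IN_FLIGHT and completes
                                            it a second time
  blk_mq_free_request()
    -> double completion / use after free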

* Re: [PATCH v2 1/6] blk-mq: introduce blk_mq_set_request_complete
  2021-01-14  0:17     ` Sagi Grimberg
@ 2021-01-14  6:50       ` Chao Leng
From: Chao Leng @ 2021-01-14  6:50 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe



On 2021/1/14 8:17, Sagi Grimberg wrote:
> 
>> In some scenarios, nvme need setting the state of request to
>> MQ_RQ_COMPLETE. So add an inline function blk_mq_set_request_complete.
>> For details, see the subsequent patches.
> 
> Its kinda difficult to understand the meaning of all of this...
> the cover letter tells us nothing, and patches 1/2 also tells us
> to see subsequent patches.
> 
> This is saved in the git change log history, so please try
> describe what it is you are going with this, even if there are
> overlaps between patches.
OK, thanks for your suggestion.
> .

* Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-14  0:19     ` Sagi Grimberg
@ 2021-01-14  6:55       ` Chao Leng
From: Chao Leng @ 2021-01-14  6:55 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe



On 2021/1/14 8:19, Sagi Grimberg wrote:
> 
>> When a request is queued failed, blk_status_t is directly returned
>> to the blk-mq. If blk_status_t is not BLK_STS_RESOURCE,
>> BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE, blk-mq call
>> blk_mq_end_request to complete the request with BLK_STS_IOERR.
>> In two scenarios, the request should be retried and may succeed.
>> First, if work with nvme multipath, the request may be retried
>> successfully in another path, because the error is probably related to
>> the path. Second, if work without multipath software, the request may
>> be retried successfully after error recovery.
>> If the request is complete with BLK_STS_IOERR in blk_mq_dispatch_rq_list.
>> The state of request may be changed to MQ_RQ_IN_FLIGHT. If free the
>> request asynchronously such as in nvme_submit_user_cmd, in extreme
>> scenario the request will be repeated freed in tear down.
>> If a non-resource error occurs in queue_rq, should directly call
>> nvme_complete_rq to complete request and set the state of request to
>> MQ_RQ_COMPLETE. nvme_complete_rq will decide to retry, fail over or end
>> the request.
>>
>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>> ---
>>   drivers/nvme/host/rdma.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index df9f6f4549f1..4a89bf44ecdc 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>>   unmap_qe:
>>       ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
>>                   DMA_TO_DEVICE);
>> -    return ret;
>> +    return nvme_try_complete_failed_req(rq, ret);
> 
> I don't understand this. There are errors that may not be related to
> anything that is pathing related (sw bug, memory leak, mapping error,
> etc, etc) why should we return this one-shot error?
Although failover retry is not required here, if we return the error
to blk-mq, a crash may happen with low probability: blk-mq does not
set the state of the request to MQ_RQ_COMPLETE before completing it,
and the request may be freed asynchronously, such as in
nvme_submit_user_cmd. If this races with error recovery, the request
may be completed twice.

So we cannot return the error to blk-mq if the blk_status_t is not
BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE or BLK_STS_ZONE_RESOURCE.
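
For reference, this is roughly what blk-mq does with the status
returned from queue_rq (condensed from blk_mq_dispatch_rq_list(), not
the literal code):

	ret = q->mq_ops->queue_rq(hctx, &bd);
	switch (ret) {
	case BLK_STS_OK:
		break;
	case BLK_STS_RESOURCE:
	case BLK_STS_DEV_RESOURCE:
	case BLK_STS_ZONE_RESOURCE:
		/* request is put back on a list and dispatched again later */
		break;
	default:
		/*
		 * Any other status fails the request immediately:
		 * nvme_complete_rq is never called, so nvme multipath cannot
		 * fail it over, and the state is not set to MQ_RQ_COMPLETE
		 * before the completion.
		 */
		blk_mq_end_request(rq, ret);
		break;
	}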
> .

* Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-14  6:55       ` Chao Leng
@ 2021-01-14 21:25         ` Sagi Grimberg
From: Sagi Grimberg @ 2021-01-14 21:25 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe


>>> When a request is queued failed, blk_status_t is directly returned
>>> to the blk-mq. If blk_status_t is not BLK_STS_RESOURCE,
>>> BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE, blk-mq call
>>> blk_mq_end_request to complete the request with BLK_STS_IOERR.
>>> In two scenarios, the request should be retried and may succeed.
>>> First, if work with nvme multipath, the request may be retried
>>> successfully in another path, because the error is probably related to
>>> the path. Second, if work without multipath software, the request may
>>> be retried successfully after error recovery.
>>> If the request is complete with BLK_STS_IOERR in 
>>> blk_mq_dispatch_rq_list.
>>> The state of request may be changed to MQ_RQ_IN_FLIGHT. If free the
>>> request asynchronously such as in nvme_submit_user_cmd, in extreme
>>> scenario the request will be repeated freed in tear down.
>>> If a non-resource error occurs in queue_rq, should directly call
>>> nvme_complete_rq to complete request and set the state of request to
>>> MQ_RQ_COMPLETE. nvme_complete_rq will decide to retry, fail over or end
>>> the request.
>>>
>>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>>> ---
>>>   drivers/nvme/host/rdma.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>> index df9f6f4549f1..4a89bf44ecdc 100644
>>> --- a/drivers/nvme/host/rdma.c
>>> +++ b/drivers/nvme/host/rdma.c
>>> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct 
>>> blk_mq_hw_ctx *hctx,
>>>   unmap_qe:
>>>       ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct 
>>> nvme_command),
>>>                   DMA_TO_DEVICE);
>>> -    return ret;
>>> +    return nvme_try_complete_failed_req(rq, ret);
>>
>> I don't understand this. There are errors that may not be related to
>> anything that is pathing related (sw bug, memory leak, mapping error,
>> etc, etc) why should we return this one-shot error?
> Although fail over retry is not required, if we return the error to
> blk-mq, a low probability crash may happen. because blk-mq do not set
> the state of request to MQ_RQ_COMPLETE before complete the request,
> the request may be freed asynchronously such as in nvme_submit_user_cmd.
> If race with error recovery, request double completion may happens.

Then fix that, don't work around it.

> 
> So we can not return the error to blk-mq if the blk_status_t is not
> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE.

This is not something we should be handling in nvme. Block drivers
should be able to fail queue_rq, and this all should live in the
block layer.

* Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-14 21:25         ` Sagi Grimberg
@ 2021-01-15  2:53           ` Chao Leng
From: Chao Leng @ 2021-01-15  2:53 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe



On 2021/1/15 5:25, Sagi Grimberg wrote:
> 
>>>> When a request is queued failed, blk_status_t is directly returned
>>>> to the blk-mq. If blk_status_t is not BLK_STS_RESOURCE,
>>>> BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE, blk-mq call
>>>> blk_mq_end_request to complete the request with BLK_STS_IOERR.
>>>> In two scenarios, the request should be retried and may succeed.
>>>> First, if work with nvme multipath, the request may be retried
>>>> successfully in another path, because the error is probably related to
>>>> the path. Second, if work without multipath software, the request may
>>>> be retried successfully after error recovery.
>>>> If the request is complete with BLK_STS_IOERR in blk_mq_dispatch_rq_list.
>>>> The state of request may be changed to MQ_RQ_IN_FLIGHT. If free the
>>>> request asynchronously such as in nvme_submit_user_cmd, in extreme
>>>> scenario the request will be repeated freed in tear down.
>>>> If a non-resource error occurs in queue_rq, should directly call
>>>> nvme_complete_rq to complete request and set the state of request to
>>>> MQ_RQ_COMPLETE. nvme_complete_rq will decide to retry, fail over or end
>>>> the request.
>>>>
>>>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>>>> ---
>>>>   drivers/nvme/host/rdma.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>>> index df9f6f4549f1..4a89bf44ecdc 100644
>>>> --- a/drivers/nvme/host/rdma.c
>>>> +++ b/drivers/nvme/host/rdma.c
>>>> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>   unmap_qe:
>>>>       ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
>>>>                   DMA_TO_DEVICE);
>>>> -    return ret;
>>>> +    return nvme_try_complete_failed_req(rq, ret);
>>>
>>> I don't understand this. There are errors that may not be related to
>>> anything that is pathing related (sw bug, memory leak, mapping error,
>>> etc, etc) why should we return this one-shot error?
>> Although fail over retry is not required, if we return the error to
>> blk-mq, a low probability crash may happen. because blk-mq do not set
>> the state of request to MQ_RQ_COMPLETE before complete the request,
>> the request may be freed asynchronously such as in nvme_submit_user_cmd.
>> If race with error recovery, request double completion may happens.
> 
> Then fix that, don't work around it.
I'm not trying to work around it. The purpose of this is to solve
the problem of nvme native multipathing at the same time.
> 
>>
>> So we can not return the error to blk-mq if the blk_status_t is not
>> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE.
> 
> This is not something we should be handling in nvme. block drivers
> should be able to fail queue_rq, and this all should live in the
> block layer.
Of course, fixing this in the block layer directly is also an option.
However, the block layer is unaware of nvme native multipathing, so the
request would still be completed with an error that should be avoided.
The scenario: use two HBAs for nvme native multipath, then one HBA
faults. queue_rq returns BLK_STS_IOERR and blk-mq calls
blk_mq_end_request to complete the request, which bypasses nvme native
multipath. We expect the request to fail over to the healthy HBA, but
it is completed directly with BLK_STS_IOERR.
Both scenarios can be fixed by completing the request directly in queue_rq.
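
For context, completing the request through nvme_complete_rq keeps the
failover decision inside nvme core. Its logic is roughly the following
(condensed sketch of the current drivers/nvme/host/core.c, not the
literal code):

void nvme_complete_rq(struct request *req)
{
	switch (nvme_decide_disposition(req)) {
	case COMPLETE:
		nvme_end_req(req);	/* finish with the real status */
		return;
	case RETRY:
		nvme_retry_req(req);	/* requeue on this controller */
		return;
	case FAILOVER:
		nvme_failover_req(req);	/* retry on another multipath path */
		return;
	}
}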

> .

* Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-15  2:53           ` Chao Leng
@ 2021-01-16  1:18             ` Sagi Grimberg
  -1 siblings, 0 replies; 44+ messages in thread
From: Sagi Grimberg @ 2021-01-16  1:18 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe


>>>>> When a request is queued failed, blk_status_t is directly returned
>>>>> to the blk-mq. If blk_status_t is not BLK_STS_RESOURCE,
>>>>> BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE, blk-mq call
>>>>> blk_mq_end_request to complete the request with BLK_STS_IOERR.
>>>>> In two scenarios, the request should be retried and may succeed.
>>>>> First, if work with nvme multipath, the request may be retried
>>>>> successfully in another path, because the error is probably related to
>>>>> the path. Second, if work without multipath software, the request may
>>>>> be retried successfully after error recovery.
>>>>> If the request is complete with BLK_STS_IOERR in 
>>>>> blk_mq_dispatch_rq_list.
>>>>> The state of request may be changed to MQ_RQ_IN_FLIGHT. If free the
>>>>> request asynchronously such as in nvme_submit_user_cmd, in extreme
>>>>> scenario the request will be repeated freed in tear down.
>>>>> If a non-resource error occurs in queue_rq, should directly call
>>>>> nvme_complete_rq to complete request and set the state of request to
>>>>> MQ_RQ_COMPLETE. nvme_complete_rq will decide to retry, fail over or 
>>>>> end
>>>>> the request.
>>>>>
>>>>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>>>>> ---
>>>>>   drivers/nvme/host/rdma.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>>>> index df9f6f4549f1..4a89bf44ecdc 100644
>>>>> --- a/drivers/nvme/host/rdma.c
>>>>> +++ b/drivers/nvme/host/rdma.c
>>>>> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct 
>>>>> blk_mq_hw_ctx *hctx,
>>>>>   unmap_qe:
>>>>>       ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct 
>>>>> nvme_command),
>>>>>                   DMA_TO_DEVICE);
>>>>> -    return ret;
>>>>> +    return nvme_try_complete_failed_req(rq, ret);
>>>>
>>>> I don't understand this. There are errors that may not be related to
>>>> anything that is pathing related (sw bug, memory leak, mapping error,
>>>> etc, etc) why should we return this one-shot error?
>>> Although fail over retry is not required, if we return the error to
>>> blk-mq, a low probability crash may happen. because blk-mq do not set
>>> the state of request to MQ_RQ_COMPLETE before complete the request,
>>> the request may be freed asynchronously such as in nvme_submit_user_cmd.
>>> If race with error recovery, request double completion may happens.
>>
>> Then fix that, don't work around it.
> I'm not trying to work around it. The purpose of this is to solve
> the problem of nvme native multipathing at the same time.

Please explain how this is an nvme-multipath issue?

>>
>>>
>>> So we can not return the error to blk-mq if the blk_status_t is not
>>> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE.
>>
>> This is not something we should be handling in nvme. block drivers
>> should be able to fail queue_rq, and this all should live in the
>> block layer.
> Of course, it is also an idea to repair the block drivers directly.
> However, block layer is unaware of nvme native multipathing,

Nor should it be

> will cause the request return error which should be avoided.

Not sure I understand...
requests should fail over for path-related errors;
which queue_rq errors are expected to be failed over, from your
perspective?

> The scenario: use two HBAs for nvme native multipath, and then one HBA
> fault,

What is the specific error the driver sees?

> the blk_status_t of queue_rq is BLK_STS_IOERR, blk-mq will call
> blk_mq_end_request to complete the request which bypass name native
> multipath. We expect the request fail over to normal HBA, but the request
> is directly completed with BLK_STS_IOERR.
> The two scenarios can be fixed by directly completing the request in 
> queue_rq.
Well, this one-shot "always return 0 and complete the command with a
HOST_PATH error" is certainly not a good approach IMO

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-16  1:18             ` Sagi Grimberg
@ 2021-01-18  3:22               ` Chao Leng
  -1 siblings, 0 replies; 44+ messages in thread
From: Chao Leng @ 2021-01-18  3:22 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe



On 2021/1/16 9:18, Sagi Grimberg wrote:
> 
>>>>>> When a request is queued failed, blk_status_t is directly returned
>>>>>> to the blk-mq. If blk_status_t is not BLK_STS_RESOURCE,
>>>>>> BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE, blk-mq call
>>>>>> blk_mq_end_request to complete the request with BLK_STS_IOERR.
>>>>>> In two scenarios, the request should be retried and may succeed.
>>>>>> First, if work with nvme multipath, the request may be retried
>>>>>> successfully in another path, because the error is probably related to
>>>>>> the path. Second, if work without multipath software, the request may
>>>>>> be retried successfully after error recovery.
>>>>>> If the request is complete with BLK_STS_IOERR in blk_mq_dispatch_rq_list.
>>>>>> The state of request may be changed to MQ_RQ_IN_FLIGHT. If free the
>>>>>> request asynchronously such as in nvme_submit_user_cmd, in extreme
>>>>>> scenario the request will be repeated freed in tear down.
>>>>>> If a non-resource error occurs in queue_rq, should directly call
>>>>>> nvme_complete_rq to complete request and set the state of request to
>>>>>> MQ_RQ_COMPLETE. nvme_complete_rq will decide to retry, fail over or end
>>>>>> the request.
>>>>>>
>>>>>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>>>>>> ---
>>>>>>   drivers/nvme/host/rdma.c | 2 +-
>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>>>>> index df9f6f4549f1..4a89bf44ecdc 100644
>>>>>> --- a/drivers/nvme/host/rdma.c
>>>>>> +++ b/drivers/nvme/host/rdma.c
>>>>>> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>>   unmap_qe:
>>>>>>       ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
>>>>>>                   DMA_TO_DEVICE);
>>>>>> -    return ret;
>>>>>> +    return nvme_try_complete_failed_req(rq, ret);
>>>>>
>>>>> I don't understand this. There are errors that may not be related to
>>>>> anything that is pathing related (sw bug, memory leak, mapping error,
>>>>> etc, etc) why should we return this one-shot error?
>>>> Although fail over retry is not required, if we return the error to
>>>> blk-mq, a low probability crash may happen. because blk-mq do not set
>>>> the state of request to MQ_RQ_COMPLETE before complete the request,
>>>> the request may be freed asynchronously such as in nvme_submit_user_cmd.
>>>> If race with error recovery, request double completion may happens.
>>>
>>> Then fix that, don't work around it.
>> I'm not trying to work around it. The purpose of this is to solve
>> the problem of nvme native multipathing at the same time.
> 
> Please explain how this is an nvme-multipath issue?
> 
>>>
>>>>
>>>> So we can not return the error to blk-mq if the blk_status_t is not
>>>> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE.
>>>
>>> This is not something we should be handling in nvme. block drivers
>>> should be able to fail queue_rq, and this all should live in the
>>> block layer.
>> Of course, it is also an idea to repair the block drivers directly.
>> However, block layer is unaware of nvme native multipathing,
> 
> Nor it should be
> 
>> will cause the request return error which should be avoided.
> 
> Not sure I understand..
> requests should failover for path related errors,
> what queue_rq errors are expected to be failed over from your
> perspective?
Although failing over only for path-related errors would be the best
choice, it's almost impossible to achieve.
The probability of non-path-related errors is very low. Although these
errors do not require a failover retry, the cost of a failover retry
is that the request is completed with an error after a somewhat longer
delay (it is retried several times). It's not the best choice, but I
think it's acceptable, because the HBA driver does not have path-related
error codes, only general error codes. It is difficult to identify
whether a general error code is path-related.
> 
>> The scenario: use two HBAs for nvme native multipath, and then one HBA
>> fault,
> 
> What is the specific error the driver sees?
The path-related error code is closely tied to the HBA driver
implementation. In general it is EIO. I don't think it's a good idea to
assume which general error code the driver returns in the event of a path
error.
> 
>> the blk_status_t of queue_rq is BLK_STS_IOERR, blk-mq will call
>> blk_mq_end_request to complete the request which bypass name native
>> multipath. We expect the request fail over to normal HBA, but the request
>> is directly completed with BLK_STS_IOERR.
>> The two scenarios can be fixed by directly completing the request in queue_rq.
> Well, certainly this one-shot always return 0 and complete the command
> with HOST_PATH error is not a good approach IMO
So what's the better option? Just complete the request with a host path
error for everything other than ENOMEM and EAGAIN returned by the HBA
driver?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-18  3:22               ` Chao Leng
@ 2021-01-18 17:49                 ` Christoph Hellwig
  -1 siblings, 0 replies; 44+ messages in thread
From: Christoph Hellwig @ 2021-01-18 17:49 UTC (permalink / raw)
  To: Chao Leng
  Cc: Sagi Grimberg, linux-nvme, kbusch, axboe, hch, linux-block, axboe

On Mon, Jan 18, 2021 at 11:22:16AM +0800, Chao Leng wrote:
>> Well, certainly this one-shot always return 0 and complete the command
>> with HOST_PATH error is not a good approach IMO
> So what's the better option? Just complete the request with host path
> error for non-ENOMEM and EAGAIN returned by the HBA driver?

what HBA driver?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-18 17:49                 ` Christoph Hellwig
@ 2021-01-19  1:50                   ` Chao Leng
  -1 siblings, 0 replies; 44+ messages in thread
From: Chao Leng @ 2021-01-19  1:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, linux-nvme, kbusch, axboe, linux-block, axboe



On 2021/1/19 1:49, Christoph Hellwig wrote:
> On Mon, Jan 18, 2021 at 11:22:16AM +0800, Chao Leng wrote:
>>> Well, certainly this one-shot always return 0 and complete the command
>>> with HOST_PATH error is not a good approach IMO
>> So what's the better option? Just complete the request with host path
>> error for non-ENOMEM and EAGAIN returned by the HBA driver?
> 
> what HBA driver?
mlx4 and mlx5.
> .
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-18  3:22               ` Chao Leng
@ 2021-01-20 21:35                 ` Sagi Grimberg
  -1 siblings, 0 replies; 44+ messages in thread
From: Sagi Grimberg @ 2021-01-20 21:35 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe


>>>> This is not something we should be handling in nvme. block drivers
>>>> should be able to fail queue_rq, and this all should live in the
>>>> block layer.
>>> Of course, it is also an idea to repair the block drivers directly.
>>> However, block layer is unaware of nvme native multipathing,
>>
>> Nor it should be
>>
>>> will cause the request return error which should be avoided.
>>
>> Not sure I understand..
>> requests should failover for path related errors,
>> what queue_rq errors are expected to be failed over from your
>> perspective?
> Although fail over for only path related errors is the best choice, it's
> almost impossible to achieve.
> The probability of non-path-related errors is very low. Although these
> errors do not require fail over retry, the cost of fail over retry
> is complete the request with error delay a bit long time(retry several
> times). It's not the best choice, but I think it's acceptable, because
> HBA driver does not have path-related error codes but only general error
> codes. It is difficult to identify whether the general error codes are
> path-related.

If we have a SW bug or breakage that can happen occasionally, this can
result in a constant failover rather than a simple failure. This is just
not a good approach IMO.

>>> The scenario: use two HBAs for nvme native multipath, and then one HBA
>>> fault,
>>
>> What is the specific error the driver sees?
> The path related error code is closely related to HBA driver
> implementation. In general it is EIO. I don't think it's a good idea to
> assume what general error code the driver returns in the event of a path
> error.

But is assuming every error is a path error a good idea?

>>> the blk_status_t of queue_rq is BLK_STS_IOERR, blk-mq will call
>>> blk_mq_end_request to complete the request which bypass name native
>>> multipath. We expect the request fail over to normal HBA, but the 
>>> request
>>> is directly completed with BLK_STS_IOERR.
>>> The two scenarios can be fixed by directly completing the request in 
>>> queue_rq.
>> Well, certainly this one-shot always return 0 and complete the command
>> with HOST_PATH error is not a good approach IMO
> So what's the better option? Just complete the request with host path
> error for non-ENOMEM and EAGAIN returned by the HBA driver?

Well, the correct thing to do here would be to clone the bio and
fail over if the end_io error status is BLK_STS_IOERR. That sucks
because it adds overhead, but this proposal doesn't sit well; it
looks wrong to me.

Alternatively, a more creative idea would be to encode the error
status somehow in the cookie returned from submit_bio, but that
also feels like a small(er) hack.
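
A very rough illustration of the clone-and-failover idea (a sketch only;
nvme_mpath_requeue_bio() below is a made-up placeholder for whatever
would re-steer the original bio to another path):

	/* end_io of the cloned bio submitted down the current path */
	static void nvme_mpath_clone_end_io(struct bio *clone)
	{
		struct bio *parent = clone->bi_private;

		if (clone->bi_status == BLK_STS_IOERR) {
			/* treat as a path failure: resubmit the original
			 * bio on another path (placeholder helper) */
			nvme_mpath_requeue_bio(parent);
		} else {
			parent->bi_status = clone->bi_status;
			bio_endio(parent);
		}
		bio_put(clone);
	}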

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion
  2021-01-20 21:35                 ` Sagi Grimberg
@ 2021-01-21  1:34                   ` Chao Leng
  -1 siblings, 0 replies; 44+ messages in thread
From: Chao Leng @ 2021-01-21  1:34 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme; +Cc: kbusch, axboe, hch, linux-block, axboe



On 2021/1/21 5:35, Sagi Grimberg wrote:
> 
> is not something we should be handling in nvme. block drivers
>>>>> should be able to fail queue_rq, and this all should live in the
>>>>> block layer.
>>>> Of course, it is also an idea to repair the block drivers directly.
>>>> However, block layer is unaware of nvme native multipathing,
>>>
>>> Nor it should be
>>>
>>>> will cause the request return error which should be avoided.
>>>
>>> Not sure I understand..
>>> requests should failover for path related errors,
>>> what queue_rq errors are expected to be failed over from your
>>> perspective?
>> Although fail over for only path related errors is the best choice, it's
>> almost impossible to achieve.
>> The probability of non-path-related errors is very low. Although these
>> errors do not require fail over retry, the cost of fail over retry
>> is complete the request with error delay a bit long time(retry several
>> times). It's not the best choice, but I think it's acceptable, because
>> HBA driver does not have path-related error codes but only general error
>> codes. It is difficult to identify whether the general error codes are
>> path-related.
> 
> If we have a SW bug or breakage that can happen occasionally, this can
> result in a constant failover rather than a simple failure. This is just
> not a good approach IMO.
> 
>>>> The scenario: use two HBAs for nvme native multipath, and then one HBA
>>>> fault,
>>>
>>> What is the specific error the driver sees?
>> The path related error code is closely related to HBA driver
>> implementation. In general it is EIO. I don't think it's a good idea to
>> assume what general error code the driver returns in the event of a path
>> error.
> 
> But assuming every error is a path error a good idea?
Of course not. But according to the old code logic, any error other than
ENOMEM and EAGAIN from the HBA driver is assumed to be a path error. I
think that might be reasonable.
> 
>>>> the blk_status_t of queue_rq is BLK_STS_IOERR, blk-mq will call
>>>> blk_mq_end_request to complete the request which bypass name native
>>>> multipath. We expect the request fail over to normal HBA, but the request
>>>> is directly completed with BLK_STS_IOERR.
>>>> The two scenarios can be fixed by directly completing the request in queue_rq.
>>> Well, certainly this one-shot always return 0 and complete the command
>>> with HOST_PATH error is not a good approach IMO
>> So what's the better option? Just complete the request with host path
>> error for non-ENOMEM and EAGAIN returned by the HBA driver?
> 
> Well, the correct thing to do here would be to clone the bio and
> failover if the end_io error status is BLK_STS_IOERR. That sucks
> because it adds overhead, but this proposal doesn't sit well. it
> looks wrong to me.
> 
> Alternatively, a more creative idea would be to encode the error
> status somehow in the cookie returned from submit_bio, but that
> also feels like a small(er) hack.
If the HBA driver returns an error other than ENOMEM and EAGAIN, queue_rq
directly calls nvme_complete_rq with NVME_SC_HOST_PATH_ERROR, like
nvmf_fail_nonready_command does. nvme_complete_rq will decide whether to
retry, fail over or end the request. This may not be the best option, but
there seems to be no better choice.
I will try to send a patch v3.
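
For illustration, something along these lines (a sketch only, not the
posted code; the helper name is made up, the building blocks are the ones
already introduced by this series):

	static inline blk_status_t nvme_queue_rq_error_sketch(struct request *rq,
							      int err)
	{
		if (err == -ENOMEM || err == -EAGAIN)
			return BLK_STS_RESOURCE;	/* let blk-mq requeue */

		/* anything else: complete inside the driver so that
		 * nvme_complete_rq() can retry, fail over or fail it */
		nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
		blk_mq_set_request_complete(rq);
		nvme_complete_rq(rq);
		return BLK_STS_OK;
	}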

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/6] nvme-core: introduce complete failed request
  2021-01-07  3:31   ` Chao Leng
  (?)
@ 2021-01-21  8:14   ` Hannes Reinecke
  2021-01-22  1:45     ` Chao Leng
  -1 siblings, 1 reply; 44+ messages in thread
From: Hannes Reinecke @ 2021-01-21  8:14 UTC (permalink / raw)
  To: linux-nvme

On 1/7/21 4:31 AM, Chao Leng wrote:
> When a request is queued failed, if the fail status is not
> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE,
> the request is need to complete with nvme_complete_rq in queue_rq.
> So introduce nvme_try_complete_failed_req.
> The request is needed to complete with NVME_SC_HOST_PATH_ERROR in
> nvmf_fail_nonready_command and queue_rq.
> So introduce nvme_complete_failed_req.
> For details, see the subsequent patches.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> ---
>   drivers/nvme/host/nvme.h | 21 +++++++++++++++++++++
>   1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index bfcedfa4b057..fc4eefdfbb34 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -649,6 +649,27 @@ void nvme_put_ns_from_disk(struct nvme_ns_head *head, int idx);
>   extern const struct attribute_group *nvme_ns_id_attr_groups[];
>   extern const struct block_device_operations nvme_ns_head_ops;
>   
> +static inline void nvme_complete_failed_req(struct request *req)
> +{
> +	nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
> +	blk_mq_set_request_complete(req);
> +	nvme_complete_rq(req);
> +}
> +
> +static inline blk_status_t nvme_try_complete_failed_req(struct request *req,
> +							blk_status_t ret)
> +{
> +	switch (ret) {
> +	case BLK_STS_RESOURCE:
> +	case BLK_STS_DEV_RESOURCE:
> +	case BLK_STS_ZONE_RESOURCE:
> +		return ret;
> +	default:
> +		nvme_complete_failed_req(req);
> +		return BLK_STS_OK;
> +	}
> +}
> +
>   #ifdef CONFIG_NVME_MULTIPATH
>   static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
>   {
> 
This is not correct.
HOST_PATH_ERROR should be set _iff_ it is a pathing issue, and the
HBA/transport is equipped to determine that.
Any other error has other causes, and we need to look at the individual
error codes (and situations) to determine whether this really is a
pathing error.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/6] nvme-core: introduce complete failed request
  2021-01-21  8:14   ` Hannes Reinecke
@ 2021-01-22  1:45     ` Chao Leng
  0 siblings, 0 replies; 44+ messages in thread
From: Chao Leng @ 2021-01-22  1:45 UTC (permalink / raw)
  To: Hannes Reinecke, linux-nvme



On 2021/1/21 16:14, Hannes Reinecke wrote:
> On 1/7/21 4:31 AM, Chao Leng wrote:
>> When a request is queued failed, if the fail status is not
>> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, BLK_STS_ZONE_RESOURCE,
>> the request is need to complete with nvme_complete_rq in queue_rq.
>> So introduce nvme_try_complete_failed_req.
>> The request is needed to complete with NVME_SC_HOST_PATH_ERROR in
>> nvmf_fail_nonready_command and queue_rq.
>> So introduce nvme_complete_failed_req.
>> For details, see the subsequent patches.
>>
>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>> ---
>>   drivers/nvme/host/nvme.h | 21 +++++++++++++++++++++
>>   1 file changed, 21 insertions(+)
>>
>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>> index bfcedfa4b057..fc4eefdfbb34 100644
>> --- a/drivers/nvme/host/nvme.h
>> +++ b/drivers/nvme/host/nvme.h
>> @@ -649,6 +649,27 @@ void nvme_put_ns_from_disk(struct nvme_ns_head *head, int idx);
>>   extern const struct attribute_group *nvme_ns_id_attr_groups[];
>>   extern const struct block_device_operations nvme_ns_head_ops;
>> +static inline void nvme_complete_failed_req(struct request *req)
>> +{
>> +    nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
>> +    blk_mq_set_request_complete(req);
>> +    nvme_complete_rq(req);
>> +}
>> +
>> +static inline blk_status_t nvme_try_complete_failed_req(struct request *req,
>> +                            blk_status_t ret)
>> +{
>> +    switch (ret) {
>> +    case BLK_STS_RESOURCE:
>> +    case BLK_STS_DEV_RESOURCE:
>> +    case BLK_STS_ZONE_RESOURCE:
>> +        return ret;
>> +    default:
>> +        nvme_complete_failed_req(req);
>> +        return BLK_STS_OK;
>> +    }
>> +}
>> +
>>   #ifdef CONFIG_NVME_MULTIPATH
>>   static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
>>   {
>>
> This is not correct.
> HOST_PATH_ERROR only should be set _iff_ it is a pathing issue, and the HBA/transport is equipped to determine that.
> Any other error have other causes, and we need to look at the individual error codes (and situations) to determine if this really is a pathing error.
This has been improved in patch v3.
> 
> Cheers,
> 
> Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2021-01-22  1:46 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-07  3:31 [PATCH v2 0/6] avoid repeated request completion and IO error Chao Leng
2021-01-07  3:31 ` Chao Leng
2021-01-07  3:31 ` [PATCH v2 1/6] blk-mq: introduce blk_mq_set_request_complete Chao Leng
2021-01-07  3:31   ` Chao Leng
2021-01-14  0:17   ` Sagi Grimberg
2021-01-14  0:17     ` Sagi Grimberg
2021-01-14  6:50     ` Chao Leng
2021-01-14  6:50       ` Chao Leng
2021-01-07  3:31 ` [PATCH v2 2/6] nvme-core: introduce complete failed request Chao Leng
2021-01-07  3:31   ` Chao Leng
2021-01-21  8:14   ` Hannes Reinecke
2021-01-22  1:45     ` Chao Leng
2021-01-07  3:31 ` [PATCH v2 3/6] nvme-fabrics: avoid repeated request completion for nvmf_fail_nonready_command Chao Leng
2021-01-07  3:31   ` Chao Leng
2021-01-07  3:31 ` [PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion Chao Leng
2021-01-07  3:31   ` Chao Leng
2021-01-14  0:19   ` Sagi Grimberg
2021-01-14  0:19     ` Sagi Grimberg
2021-01-14  6:55     ` Chao Leng
2021-01-14  6:55       ` Chao Leng
2021-01-14 21:25       ` Sagi Grimberg
2021-01-14 21:25         ` Sagi Grimberg
2021-01-15  2:53         ` Chao Leng
2021-01-15  2:53           ` Chao Leng
2021-01-16  1:18           ` Sagi Grimberg
2021-01-16  1:18             ` Sagi Grimberg
2021-01-18  3:22             ` Chao Leng
2021-01-18  3:22               ` Chao Leng
2021-01-18 17:49               ` Christoph Hellwig
2021-01-18 17:49                 ` Christoph Hellwig
2021-01-19  1:50                 ` Chao Leng
2021-01-19  1:50                   ` Chao Leng
2021-01-20 21:35               ` Sagi Grimberg
2021-01-20 21:35                 ` Sagi Grimberg
2021-01-21  1:34                 ` Chao Leng
2021-01-21  1:34                   ` Chao Leng
2021-01-07  3:31 ` [PATCH v2 5/6] nvme-tcp: " Chao Leng
2021-01-07  3:31   ` Chao Leng
2021-01-07  3:31 ` [PATCH v2 6/6] nvme-fc: " Chao Leng
2021-01-07  3:31   ` Chao Leng
2021-01-14  0:15 ` [PATCH v2 0/6] avoid repeated request completion and IO error Sagi Grimberg
2021-01-14  0:15   ` Sagi Grimberg
2021-01-14  6:50   ` Chao Leng
2021-01-14  6:50     ` Chao Leng
