* [PATCH v5 0/7] nvme controller reset and namespace scan work race conditions
@ 2019-08-09 22:12 Sagi Grimberg
  2019-08-09 22:12 ` [PATCH v5 1/7] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-08-09 22:12 UTC (permalink / raw)


Hey Hannes and Keith,

This series handles the reset and scanning race saga.

The approach is to have the relevant admin commands return a proper
status code that reflects that we had a transport error, and to not
remove the namespace if that is indeed the case.

This should be a reliable way to know whether revalidate_disk failed
due to a transport error or not.
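
To summarize, the error flow this series sets up looks roughly like the
sketch below (not code taken verbatim from the patches; the helper name
is made up, and the check mirrors patch 7/7):

  /*
   * Error flow across the series (sketch):
   *
   *   transport failure
   *     -> nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR   (patches 1, 5, 6)
   *     -> nvme_error_status()   = BLK_STS_TRANSPORT         (patch 1)
   *     -> blk_status_to_errno() = -ENOLINK                  (patches 2-4)
   *     -> scan work keeps the namespace instead of removing it (patch 7)
   */
  static bool nvme_scan_error_is_transient(int ret)
  {
  	return ret == -ENOLINK || ret == -ENOMEM ||
  	       ret == -EAGAIN || ret == -EBUSY;
  }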

I am able to reproduce this race with the following command (using
tcp/rdma):
for j in `seq 50`; do nvme connect-all; for i in `seq 50`; do nvme reset /dev/nvme0; done ; nvme disconnect-all; done

With this patch set (plus two more tcp/rdma transport-specific patches
that address other issues) I was able to pass the test without
reproducing the hang you reported, Hannes.

Changes from v4:
- return nvme_error_status in __nvme_submit_sync_cmd and cast to
  errno in nvme_identify_ns
- modified status print in nvme_report_ns_ids

Changes from v3:
- return blk_status_to_errno instead of blk_status_t in sync cmds
- check the normal errno returned from revalidate_disk; this covers
  transport errors, but also spurious allocation errors and any
  type of transient error.

Changes from v2:
- added fc patch from James (can you test please?)
- made nvme_identify_ns return id or PTR_ERR (Hannes)

Changes from v1:
- different approach

James Smart (1):
  nvme-fc: Fail transport errors with NVME_SC_HOST_PATH

Sagi Grimberg (6):
  nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR
  nvme: return a proper status for sync commands failure
  nvme: make nvme_identify_ns propagate errors back
  nvme: make nvme_report_ns_ids propagate error back
  nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed
  nvme: fix ns removal hang when failing to revalidate due to a
    transient error

 drivers/nvme/host/core.c | 60 ++++++++++++++++++++++++++++------------
 drivers/nvme/host/fc.c   | 37 ++++++++++++++++++++-----
 drivers/nvme/host/tcp.c  |  2 +-
 3 files changed, 74 insertions(+), 25 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v5 1/7] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR
  2019-08-09 22:12 [PATCH v5 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
@ 2019-08-09 22:12 ` Sagi Grimberg
  2019-08-09 22:12 ` [PATCH v5 2/7] nvme: return a proper status for sync commands failure Sagi Grimberg
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-08-09 22:12 UTC (permalink / raw)


NVME_SC_ABORT_REQ means that the request was aborted due to
an abort command being received. In our case, this is a transport
cancellation, so a host path error is much more appropriate.

Also, convert NVME_SC_HOST_PATH_ERROR to BLK_STS_TRANSPORT so
that callers can understand that the status is a transport-related
error. This will be used by the ns scanning code to tell whether
it got an error from the controller or whether the controller
happens to be unreachable over the transport.

Reviewed-by: Minwoo Im <minwoo.im.dev at gmail.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6956041224ec..867c8977eb3e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -228,6 +228,8 @@ static blk_status_t nvme_error_status(struct request *req)
 		return BLK_STS_PROTECTION;
 	case NVME_SC_RESERVATION_CONFLICT:
 		return BLK_STS_NEXUS;
+	case NVME_SC_HOST_PATH_ERROR:
+		return BLK_STS_TRANSPORT;
 	default:
 		return BLK_STS_IOERR;
 	}
@@ -294,7 +296,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
 				"Cancelling I/O %d", req->tag);
 
-	nvme_req(req)->status = NVME_SC_ABORT_REQ;
+	nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
 	blk_mq_complete_request_sync(req);
 	return true;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 2/7] nvme: return a proper status for sync commands failure
  2019-08-09 22:12 [PATCH v5 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
  2019-08-09 22:12 ` [PATCH v5 1/7] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
@ 2019-08-09 22:12 ` Sagi Grimberg
  2019-08-10 14:19   ` Keith Busch
  2019-08-09 22:12 ` [PATCH v5 3/7] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2019-08-09 22:12 UTC (permalink / raw)


Callers should not rely on the raw NVMe status; instead, return
the more appropriate blk_status_t.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 867c8977eb3e..cba0008dab5a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -805,7 +805,7 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
 		ret = -EINTR;
 	else
-		ret = nvme_req(req)->status;
+		ret = nvme_error_status(req);
  out:
 	blk_mq_free_request(req);
 	return ret;
@@ -896,7 +896,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
 		ret = -EINTR;
 	else
-		ret = nvme_req(req)->status;
+		ret = nvme_error_status(req);
 	if (result)
 		*result = le32_to_cpu(nvme_req(req)->result.u32);
 	if (meta && !ret && !write) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 3/7] nvme: make nvme_identify_ns propagate errors back
  2019-08-09 22:12 [PATCH v5 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
  2019-08-09 22:12 ` [PATCH v5 1/7] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
  2019-08-09 22:12 ` [PATCH v5 2/7] nvme: return a proper status for sync commands failure Sagi Grimberg
@ 2019-08-09 22:12 ` Sagi Grimberg
  2019-08-09 22:12 ` [PATCH v5 4/7] nvme: make nvme_report_ns_ids propagate error back Sagi Grimberg
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-08-09 22:12 UTC (permalink / raw)


Right now callers of nvme_identify_ns only know that it failed,
but don't know why. Make nvme_identify_ns propagate the error back.
Note that if the return status from nvme_submit_sync_cmd() is positive
then we convert it from a blk_status_t to a negative errno.
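
For clarity, the return convention of nvme_submit_sync_cmd() at this
point in the series is three-way: a negative errno for submission
failures, 0 on success, or a positive blk_status_t for an NVMe-level
failure. A minimal sketch of the normalization done here (helper name
made up for illustration):

/* Sketch: normalize nvme_submit_sync_cmd()'s return value to an errno. */
static int nvme_sync_status_to_errno(int status)
{
	if (status > 0)		/* positive: a blk_status_t, e.g. BLK_STS_TRANSPORT */
		return blk_status_to_errno(status);	/* -> -ENOLINK */
	return status;		/* 0 on success, or a negative errno */
}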

Reviewed-by: Minwoo Im <minwoo.im.dev at gmail.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cba0008dab5a..c01610733c20 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1110,13 +1110,16 @@ static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
 
 	id = kmalloc(sizeof(*id), GFP_KERNEL);
 	if (!id)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	error = nvme_submit_sync_cmd(ctrl->admin_q, &c, id, sizeof(*id));
 	if (error) {
 		dev_warn(ctrl->device, "Identify namespace failed (%d)\n", error);
 		kfree(id);
-		return NULL;
+		if (error > 0)
+			return ERR_PTR(blk_status_to_errno(error));
+		else
+			return ERR_PTR(error);
 	}
 
 	return id;
@@ -1750,8 +1753,8 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 	}
 
 	id = nvme_identify_ns(ctrl, ns->head->ns_id);
-	if (!id)
-		return -ENODEV;
+	if (IS_ERR(id))
+		return PTR_ERR(id);
 
 	if (id->ncap == 0) {
 		ret = -ENODEV;
@@ -3339,8 +3342,8 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	nvme_set_queue_limits(ctrl, ns->queue);
 
 	id = nvme_identify_ns(ctrl, nsid);
-	if (!id) {
-		ret = -EIO;
+	if (IS_ERR(id)) {
+		ret = PTR_ERR(id);
 		goto out_free_queue;
 	}
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 4/7] nvme: make nvme_report_ns_ids propagate error back
  2019-08-09 22:12 [PATCH v5 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
                   ` (2 preceding siblings ...)
  2019-08-09 22:12 ` [PATCH v5 3/7] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
@ 2019-08-09 22:12 ` Sagi Grimberg
  2019-08-09 22:12 ` [PATCH v5 5/7] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed Sagi Grimberg
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-08-09 22:12 UTC (permalink / raw)


And make the callers check the return status and propagate it
back accordingly. Also print the return status.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c01610733c20..7c3bb7fb8eac 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1608,9 +1608,11 @@ static void nvme_config_write_zeroes(struct gendisk *disk, struct nvme_ns *ns)
 	blk_queue_max_write_zeroes_sectors(disk->queue, max_sectors);
 }
 
-static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
+static int nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
 		struct nvme_id_ns *id, struct nvme_ns_ids *ids)
 {
+	int ret = 0;
+
 	memset(ids, 0, sizeof(*ids));
 
 	if (ctrl->vs >= NVME_VS(1, 1, 0))
@@ -1621,10 +1623,12 @@ static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
 		 /* Don't treat error as fatal we potentially
 		  * already have a NGUID or EUI-64
 		  */
-		if (nvme_identify_ns_descs(ctrl, nsid, ids))
+		ret = nvme_identify_ns_descs(ctrl, nsid, ids);
+		if (ret)
 			dev_warn(ctrl->device,
-				 "%s: Identify Descriptors failed\n", __func__);
+				 "Identify Descriptors failed (%d)\n", ret);
 	}
+	return ret;
 }
 
 static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids)
@@ -1762,7 +1766,10 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 	}
 
 	__nvme_revalidate_disk(disk, id);
-	nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
+	ret = nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
+	if (ret)
+		goto out;
+
 	if (!nvme_ns_ids_equal(&ns->head->ids, &ids)) {
 		dev_err(ctrl->device,
 			"identifiers changed for nsid %d\n", ns->head->ns_id);
@@ -3187,7 +3194,9 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 	head->ns_id = nsid;
 	kref_init(&head->ref);
 
-	nvme_report_ns_ids(ctrl, nsid, id, &head->ids);
+	ret = nvme_report_ns_ids(ctrl, nsid, id, &head->ids);
+	if (ret)
+		goto out_cleanup_srcu;
 
 	ret = __nvme_check_ids(ctrl->subsys, head);
 	if (ret) {
@@ -3235,7 +3244,10 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 	} else {
 		struct nvme_ns_ids ids;
 
-		nvme_report_ns_ids(ctrl, nsid, id, &ids);
+		ret = nvme_report_ns_ids(ctrl, nsid, id, &ids);
+		if (ret)
+			goto out_unlock;
+
 		if (!nvme_ns_ids_equal(&head->ids, &ids)) {
 			dev_err(ctrl->device,
 				"IDs don't match for shared namespace %d\n",
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 5/7] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed
  2019-08-09 22:12 [PATCH v5 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
                   ` (3 preceding siblings ...)
  2019-08-09 22:12 ` [PATCH v5 4/7] nvme: make nvme_report_ns_ids propagate error back Sagi Grimberg
@ 2019-08-09 22:12 ` Sagi Grimberg
  2019-08-09 22:12 ` [PATCH v5 6/7] nvme-fc: Fail transport errors with NVME_SC_HOST_PATH Sagi Grimberg
  2019-08-09 22:12 ` [PATCH v5 7/7] nvme: fix ns removal hang when failing to revalidate due to a transient error Sagi Grimberg
  6 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-08-09 22:12 UTC (permalink / raw)


This is a more appropriate error status for a transport error
detected by us (the host).

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 2de33c4c5d14..7daf39f31b08 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -842,7 +842,7 @@ static inline void nvme_tcp_done_send_req(struct nvme_tcp_queue *queue)
 
 static void nvme_tcp_fail_request(struct nvme_tcp_request *req)
 {
-	nvme_tcp_end_request(blk_mq_rq_from_pdu(req), NVME_SC_DATA_XFER_ERROR);
+	nvme_tcp_end_request(blk_mq_rq_from_pdu(req), NVME_SC_HOST_PATH_ERROR);
 }
 
 static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 6/7] nvme-fc: Fail transport errors with NVME_SC_HOST_PATH
  2019-08-09 22:12 [PATCH v5 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
                   ` (4 preceding siblings ...)
  2019-08-09 22:12 ` [PATCH v5 5/7] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed Sagi Grimberg
@ 2019-08-09 22:12 ` Sagi Grimberg
  2019-08-09 22:12 ` [PATCH v5 7/7] nvme: fix ns removal hang when failing to revalidate due to a transient error Sagi Grimberg
  6 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-08-09 22:12 UTC (permalink / raw)


From: James Smart <james.smart@broadcom.com>

NVME_SC_INTERNAL should indicate an internal controller error,
not a host transport error. These errors will propagate to the
upper layers (essentially nvme core) and be interpreted as
transport errors, which should not be taken into account for
namespace state or condition.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/fc.c | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 3f3725d1eca1..c289f46f6d13 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1608,9 +1608,13 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 				sizeof(op->rsp_iu), DMA_FROM_DEVICE);
 
 	if (opstate == FCPOP_STATE_ABORTED)
-		status = cpu_to_le16(NVME_SC_ABORT_REQ << 1);
-	else if (freq->status)
-		status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+		status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+	else if (freq->status) {
+		status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+		dev_info(ctrl->ctrl.device,
+			"NVME-FC{%d}: io failed due to lldd error %d\n",
+			ctrl->cnum, freq->status);
+	}
 
 	/*
 	 * For the linux implementation, if we have an unsuccesful
@@ -1637,8 +1641,13 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 		 * no payload in the CQE by the transport.
 		 */
 		if (freq->transferred_length !=
-			be32_to_cpu(op->cmd_iu.data_len)) {
-			status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+		    be32_to_cpu(op->cmd_iu.data_len)) {
+			status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+			dev_info(ctrl->ctrl.device,
+				"NVME-FC{%d}: io failed due to bad transfer "
+				"length: %d vs expected %d\n",
+				ctrl->cnum, freq->transferred_length,
+				be32_to_cpu(op->cmd_iu.data_len));
 			goto done;
 		}
 		result.u64 = 0;
@@ -1655,7 +1664,17 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 					freq->transferred_length ||
 			     op->rsp_iu.status_code ||
 			     sqe->common.command_id != cqe->command_id)) {
-			status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+			status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+			dev_info(ctrl->ctrl.device,
+				"NVME-FC{%d}: io failed due to bad NVMe_ERSP: "
+				"iu len %d, xfr len %d vs %d, status code "
+				"%d, cmdid %d vs %d\n",
+				ctrl->cnum, be16_to_cpu(op->rsp_iu.iu_len),
+				be32_to_cpu(op->rsp_iu.xfrd_len),
+				freq->transferred_length,
+				op->rsp_iu.status_code,
+				sqe->common.command_id,
+				cqe->command_id);
 			goto done;
 		}
 		result = cqe->result;
@@ -1663,7 +1682,11 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 		break;
 
 	default:
-		status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+		status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+		dev_info(ctrl->ctrl.device,
+			"NVME-FC{%d}: io failed due to odd NVMe_xRSP iu "
+			"len %d\n",
+			ctrl->cnum, freq->rcv_rsplen);
 		goto done;
 	}
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 7/7] nvme: fix ns removal hang when failing to revalidate due to a transient error
  2019-08-09 22:12 [PATCH v5 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
                   ` (5 preceding siblings ...)
  2019-08-09 22:12 ` [PATCH v5 6/7] nvme-fc: Fail transport errors with NVME_SC_HOST_PATH Sagi Grimberg
@ 2019-08-09 22:12 ` Sagi Grimberg
  6 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-08-09 22:12 UTC (permalink / raw)


If a controller reset is racing with a namespace revalidation, the
revalidation (admin) I/O will surely fail, but we should not remove the
namespace as we will execute the I/O when the controller is back up.
Same for spurious allocation errors (return -ENOMEM).

Fix this by checking the specific error code that revalidate_disk
returns, and if it is a transient error (for example -ENOLINK, which
correlates to BLK_STS_TRANSPORT, or -ENOMEM, which correlates to
BLK_STS_RESOURCE or an allocation failure), do not remove the
namespace: it will either recover when the controller is back up and
a subsequent scan is scheduled, or the controller is going away and
the namespaces will be removed anyway.

This fixes a hang when namespace scanning races with a controller
reset, and also spurious I/O errors in path failover conditions where
the controller reset is racing with the namespace scan work with
multipath enabled.

Reported-by: Hannes Reinecke <hare at suse.de>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7c3bb7fb8eac..2154d47b34c2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3457,8 +3457,17 @@ static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 	ns = nvme_find_get_ns(ctrl, nsid);
 	if (ns) {
-		if (ns->disk && revalidate_disk(ns->disk))
-			nvme_ns_remove(ns);
+		if (ns->disk) {
+			int ret = revalidate_disk(ns->disk);
+
+			/*
+			 * remove the ns only if the return status is
+			 * not a temporal execution error.
+			 */
+			if (ret && ret != -ENOLINK && ret != -ENOMEM &&
+			    ret != -EAGAIN && ret != -EBUSY)
+				nvme_ns_remove(ns);
+		}
 		nvme_put_ns(ns);
 	} else
 		nvme_alloc_ns(ctrl, nsid);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 2/7] nvme: return a proper status for sync commands failure
  2019-08-09 22:12 ` [PATCH v5 2/7] nvme: return a proper status for sync commands failure Sagi Grimberg
@ 2019-08-10 14:19   ` Keith Busch
  2019-08-11  5:14     ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2019-08-10 14:19 UTC (permalink / raw)


On Fri, Aug 9, 2019 at 4:12 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -805,7 +805,7 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
>         if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
>                 ret = -EINTR;
>         else
> -               ret = nvme_req(req)->status;
> +               ret = nvme_error_status(req);
>   out:
>         blk_mq_free_request(req);
>         return ret;
> @@ -896,7 +896,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
>         if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
>                 ret = -EINTR;
>         else
> -               ret = nvme_req(req)->status;
> +               ret = nvme_error_status(req);
>         if (result)
>                 *result = le32_to_cpu(nvme_req(req)->result.u32);
>         if (meta && !ret && !write) {
> --

Fine for the driver-initiated commands, but not for user commands.
Tooling definitely wants to see the NVMe status directly, and we'll
break them all if we change this API.
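
For illustration, a hypothetical userspace snippet (not from this
thread; it assumes the standard linux/nvme_ioctl.h UAPI, and the
device path and namespace id are made up): the raw NVMe status comes
back to tooling as the positive ioctl return value, so translating it
inside nvme_submit_user_cmd() changes what userspace sees.

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	struct nvme_admin_cmd cmd;
	char id[4096];
	int fd, ret;

	fd = open("/dev/nvme0", O_RDONLY);
	if (fd < 0)
		return 1;

	memset(&cmd, 0, sizeof(cmd));
	cmd.opcode = 0x06;		/* Identify */
	cmd.nsid = 1;
	cmd.addr = (uintptr_t)id;	/* buffer for the Identify payload */
	cmd.data_len = sizeof(id);
	cmd.cdw10 = 0;			/* CNS 0: Identify Namespace */

	ret = ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
	/*
	 * ret < 0: errno from the kernel; ret > 0: the raw NVMe status code
	 * for the command.  Converting that to a blk_status_t in the kernel
	 * would break tools that key off this value.
	 */
	printf("ioctl returned %d\n", ret);
	return 0;
}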

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v5 2/7] nvme: return a proper status for sync commands failure
  2019-08-10 14:19   ` Keith Busch
@ 2019-08-11  5:14     ` Sagi Grimberg
  0 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-08-11  5:14 UTC (permalink / raw)



>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -805,7 +805,7 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
>>          if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
>>                  ret = -EINTR;
>>          else
>> -               ret = nvme_req(req)->status;
>> +               ret = nvme_error_status(req);
>>    out:
>>          blk_mq_free_request(req);
>>          return ret;
>> @@ -896,7 +896,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
>>          if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
>>                  ret = -EINTR;
>>          else
>> -               ret = nvme_req(req)->status;
>> +               ret = nvme_error_status(req);
>>          if (result)
>>                  *result = le32_to_cpu(nvme_req(req)->result.u32);
>>          if (meta && !ret && !write) {
>> --
> 
> Fine for the driver-initiated commands, but not for user commands.
> Tooling definitely wants to see the NVMe status directly, and we'll
> break them all if we change this API.

OK, I can leave that one alone...

^ permalink raw reply	[flat|nested] 10+ messages in thread


Thread overview: 10+ messages
2019-08-09 22:12 [PATCH v5 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
2019-08-09 22:12 ` [PATCH v5 1/7] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
2019-08-09 22:12 ` [PATCH v5 2/7] nvme: return a proper status for sync commands failure Sagi Grimberg
2019-08-10 14:19   ` Keith Busch
2019-08-11  5:14     ` Sagi Grimberg
2019-08-09 22:12 ` [PATCH v5 3/7] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
2019-08-09 22:12 ` [PATCH v5 4/7] nvme: make nvme_report_ns_ids propagate error back Sagi Grimberg
2019-08-09 22:12 ` [PATCH v5 5/7] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed Sagi Grimberg
2019-08-09 22:12 ` [PATCH v5 6/7] nvme-fc: Fail transport errors with NVME_SC_HOST_PATH Sagi Grimberg
2019-08-09 22:12 ` [PATCH v5 7/7] nvme: fix ns removal hang when failing to revalidate due to a transient error Sagi Grimberg
