* [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions
@ 2019-08-03  2:49 Sagi Grimberg
  2019-08-03  2:49 ` [PATCH rfc v2 1/6] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Sagi Grimberg @ 2019-08-03  2:49 UTC (permalink / raw)


Hey Hannes and Keith,

This is the second attempt to handle the reset and scanning race saga.

The approach is to have the relevant admin commands return a proper
status code that reflects that we had a transport error, and to
not remove the namespace if that is indeed the case.

This should be a reliable way to know whether revalidate_disk failed
due to a transport error or not.

I am able to reproduce this race with the following command (using
tcp/rdma):
for j in `seq 50`; do for i in `seq 50`; do nvme reset /dev/nvme0; done ; nvme disconnect-all; nvme connect-all; done

With this patch set (plus two more tcp/rdma transport specific patches
that address other issues) I was able to pass the test without
reproducing the hang that you, Hannes, reported.

In the patchset:
- First, make sure that transport related errors (such as
  nvme_cancel_request) reflect HOST_PATH_ERROR status.
- Make nvme_error_status convert NVME_SC_HOST_PATH_ERROR to
  BLK_STS_TRANSPORT.
- Make sure that the callers indeed propagate the status back.
- Then simply look at the status code when calling revalidate_disk
  in nvme_validate_ns, and only remove the namespace if the status
  is not a transport related status (sketched below).
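
Putting it together, the net effect is roughly the following (a sketch
of the combined hunks; the exact changes are in the individual patches):

	/* 1/6: transport cancellations now carry a host path error */
	nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;

	/* 6/6: nvme_error_status() translates it to a transport status */
	case NVME_SC_HOST_PATH_ERROR:
		return BLK_STS_TRANSPORT;

	/* 6/6: so the scan work keeps the namespace on a transport error */
	blk_status_t sts = revalidate_disk(ns->disk);
	if (sts && sts != BLK_STS_TRANSPORT)
		nvme_ns_remove(ns);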

Please let me know your thoughts.

Sagi Grimberg (6):
  nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR
  nvme: return nvme_error_status for sync commands failure
  nvme: make nvme_identify_ns propagate errors back
  nvme: make nvme_report_ns_ids propagate error back
  nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR if send failed
  nvme: don't remove namespace if revalidate failed because of a
    transport error

 drivers/nvme/host/core.c | 67 ++++++++++++++++++++++++----------------
 drivers/nvme/host/tcp.c  |  2 +-
 2 files changed, 41 insertions(+), 28 deletions(-)

-- 
2.17.1

* [PATCH rfc v2 1/6] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR
  2019-08-03  2:49 [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions Sagi Grimberg
@ 2019-08-03  2:49 ` Sagi Grimberg
  2019-08-04  7:56   ` Minwoo Im
  2019-08-03  2:49 ` [PATCH rfc v2 2/6] nvme: return nvme_error_status for sync commands failure Sagi Grimberg
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2019-08-03  2:49 UTC (permalink / raw)


NVME_SC_ABORT_REQ means that the request was aborted due to
a received abort command. In our case, this is a transport
cancellation, so a host path error is much more appropriate.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6956041224ec..d2b4f57fa67e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -294,7 +294,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
 				"Cancelling I/O %d", req->tag);
 
-	nvme_req(req)->status = NVME_SC_ABORT_REQ;
+	nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
 	blk_mq_complete_request_sync(req);
 	return true;
 }
-- 
2.17.1

* [PATCH rfc v2 2/6] nvme: return nvme_error_status for sync commands failure
  2019-08-03  2:49 [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions Sagi Grimberg
  2019-08-03  2:49 ` [PATCH rfc v2 1/6] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
@ 2019-08-03  2:49 ` Sagi Grimberg
  2019-08-04  7:57   ` Minwoo Im
  2019-08-03  2:49 ` [PATCH rfc v2 3/6] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2019-08-03  2:49 UTC (permalink / raw)


Callers should not rely on the raw NVMe status; blk_status_t
is more appropriate.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d2b4f57fa67e..f435c85c4062 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -803,7 +803,7 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
 		ret = -EINTR;
 	else
-		ret = nvme_req(req)->status;
+		ret = nvme_error_status(req);
  out:
 	blk_mq_free_request(req);
 	return ret;
@@ -894,7 +894,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
 		ret = -EINTR;
 	else
-		ret = nvme_req(req)->status;
+		ret = nvme_error_status(req);
 	if (result)
 		*result = le32_to_cpu(nvme_req(req)->result.u32);
 	if (meta && !ret && !write) {
-- 
2.17.1

* [PATCH rfc v2 3/6] nvme: make nvme_identify_ns propagate errors back
  2019-08-03  2:49 [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions Sagi Grimberg
  2019-08-03  2:49 ` [PATCH rfc v2 1/6] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
  2019-08-03  2:49 ` [PATCH rfc v2 2/6] nvme: return nvme_error_status for sync commands failure Sagi Grimberg
@ 2019-08-03  2:49 ` Sagi Grimberg
  2019-08-04  8:06   ` Minwoo Im
  2019-08-05 12:39   ` Hannes Reinecke
  2019-08-03  2:49 ` [PATCH rfc v2 4/6] nvme: make nvme_report_ns_ids propagate error back Sagi Grimberg
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 16+ messages in thread
From: Sagi Grimberg @ 2019-08-03  2:49 UTC (permalink / raw)


Right now, callers of nvme_identify_ns only know that
it failed, but don't know why. Make nvme_identify_ns propagate
the error back (like nvme_identify_ctrl).

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f435c85c4062..802d1b49ecff 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1094,10 +1094,9 @@ static int nvme_identify_ns_list(struct nvme_ctrl *dev, unsigned nsid, __le32 *n
 				    NVME_IDENTIFY_DATA_SIZE);
 }
 
-static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
-		unsigned nsid)
+static int nvme_identify_ns(struct nvme_ctrl *ctrl,
+		unsigned nsid, struct nvme_id_ns **id)
 {
-	struct nvme_id_ns *id;
 	struct nvme_command c = { };
 	int error;
 
@@ -1106,18 +1105,17 @@ static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
 	c.identify.nsid = cpu_to_le32(nsid);
 	c.identify.cns = NVME_ID_CNS_NS;
 
-	id = kmalloc(sizeof(*id), GFP_KERNEL);
-	if (!id)
-		return NULL;
+	*id = kmalloc(sizeof(**id), GFP_KERNEL);
+	if (!*id)
+		return -ENOMEM;
 
-	error = nvme_submit_sync_cmd(ctrl->admin_q, &c, id, sizeof(*id));
+	error = nvme_submit_sync_cmd(ctrl->admin_q, &c, *id, sizeof(**id));
 	if (error) {
 		dev_warn(ctrl->device, "Identify namespace failed (%d)\n", error);
-		kfree(id);
-		return NULL;
+		kfree(*id);
 	}
 
-	return id;
+	return error;
 }
 
 static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned int fid,
@@ -1747,9 +1745,9 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 		return -ENODEV;
 	}
 
-	id = nvme_identify_ns(ctrl, ns->head->ns_id);
-	if (!id)
-		return -ENODEV;
+	ret = nvme_identify_ns(ctrl, ns->head->ns_id, &id);
+	if (ret)
+		return ret;
 
 	if (id->ncap == 0) {
 		ret = -ENODEV;
@@ -3336,11 +3334,9 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
 	nvme_set_queue_limits(ctrl, ns->queue);
 
-	id = nvme_identify_ns(ctrl, nsid);
-	if (!id) {
-		ret = -EIO;
+	ret = nvme_identify_ns(ctrl, nsid, &id);
+	if (ret)
 		goto out_free_queue;
-	}
 
 	if (id->ncap == 0) {
 		ret = -EINVAL;
-- 
2.17.1

* [PATCH rfc v2 4/6] nvme: make nvme_report_ns_ids propagate error back
  2019-08-03  2:49 [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions Sagi Grimberg
                   ` (2 preceding siblings ...)
  2019-08-03  2:49 ` [PATCH rfc v2 3/6] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
@ 2019-08-03  2:49 ` Sagi Grimberg
  2019-08-03  2:49 ` [PATCH rfc v2 5/6] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR if send failed Sagi Grimberg
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2019-08-03  2:49 UTC (permalink / raw)


And make the callers check the return status and propagate
back accordingly.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 802d1b49ecff..b561b61b35d9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1601,9 +1601,11 @@ static void nvme_config_write_zeroes(struct gendisk *disk, struct nvme_ns *ns)
 	blk_queue_max_write_zeroes_sectors(disk->queue, max_sectors);
 }
 
-static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
+static int nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
 		struct nvme_id_ns *id, struct nvme_ns_ids *ids)
 {
+	int ret = 0;
+
 	memset(ids, 0, sizeof(*ids));
 
 	if (ctrl->vs >= NVME_VS(1, 1, 0))
@@ -1614,10 +1616,12 @@ static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
 		 /* Don't treat error as fatal we potentially
 		  * already have a NGUID or EUI-64
 		  */
-		if (nvme_identify_ns_descs(ctrl, nsid, ids))
+		ret = nvme_identify_ns_descs(ctrl, nsid, ids);
+		if (ret)
 			dev_warn(ctrl->device,
 				 "%s: Identify Descriptors failed\n", __func__);
 	}
+	return ret;
 }
 
 static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids)
@@ -1755,7 +1759,10 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 	}
 
 	__nvme_revalidate_disk(disk, id);
-	nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
+	ret = nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
+	if (ret)
+		goto out;
+
 	if (!nvme_ns_ids_equal(&ns->head->ids, &ids)) {
 		dev_err(ctrl->device,
 			"identifiers changed for nsid %d\n", ns->head->ns_id);
@@ -3180,7 +3187,9 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 	head->ns_id = nsid;
 	kref_init(&head->ref);
 
-	nvme_report_ns_ids(ctrl, nsid, id, &head->ids);
+	ret = nvme_report_ns_ids(ctrl, nsid, id, &head->ids);
+	if (ret)
+		goto out_cleanup_srcu;
 
 	ret = __nvme_check_ids(ctrl->subsys, head);
 	if (ret) {
@@ -3228,7 +3237,10 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 	} else {
 		struct nvme_ns_ids ids;
 
-		nvme_report_ns_ids(ctrl, nsid, id, &ids);
+		ret = nvme_report_ns_ids(ctrl, nsid, id, &ids);
+		if (ret)
+			goto out_unlock;
+
 		if (!nvme_ns_ids_equal(&head->ids, &ids)) {
 			dev_err(ctrl->device,
 				"IDs don't match for shared namespace %d\n",
-- 
2.17.1

* [PATCH rfc v2 5/6] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR if send failed
  2019-08-03  2:49 [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions Sagi Grimberg
                   ` (3 preceding siblings ...)
  2019-08-03  2:49 ` [PATCH rfc v2 4/6] nvme: make nvme_report_ns_ids propagate error back Sagi Grimberg
@ 2019-08-03  2:49 ` Sagi Grimberg
  2019-08-03  2:49 ` [PATCH rfc v2 6/6] nvme: don't remove namespace if revalidate failed because of a transport error Sagi Grimberg
       [not found] ` <e04be8bf-20cb-db7d-6046-74d8ec6fa485@broadcom.com>
  6 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2019-08-03  2:49 UTC (permalink / raw)


This is a more appropriate error status for a transport error
detected by us (the host).

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 2de33c4c5d14..7daf39f31b08 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -842,7 +842,7 @@ static inline void nvme_tcp_done_send_req(struct nvme_tcp_queue *queue)
 
 static void nvme_tcp_fail_request(struct nvme_tcp_request *req)
 {
-	nvme_tcp_end_request(blk_mq_rq_from_pdu(req), NVME_SC_DATA_XFER_ERROR);
+	nvme_tcp_end_request(blk_mq_rq_from_pdu(req), NVME_SC_HOST_PATH_ERROR);
 }
 
 static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
-- 
2.17.1

* [PATCH rfc v2 6/6] nvme: don't remove namespace if revalidate failed because of a transport error
  2019-08-03  2:49 [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions Sagi Grimberg
                   ` (4 preceding siblings ...)
  2019-08-03  2:49 ` [PATCH rfc v2 5/6] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR if send failed Sagi Grimberg
@ 2019-08-03  2:49 ` Sagi Grimberg
       [not found] ` <e04be8bf-20cb-db7d-6046-74d8ec6fa485@broadcom.com>
  6 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2019-08-03  2:49 UTC (permalink / raw)


If a controller reset is racing with a namespace revalidation, the
revalidation (admin) I/O will surely fail, but we should not remove the
namespace as we will execute the I/O when the controller is back up.

Fix this by checking the specific error code that revalidate_disk
returns, and if it is a transport related error, do not remove the
namespace, as it will either recover when the controller is back up
and a subsequent scan is scheduled, or the controller is going away
and the namespaces will be removed anyway.

This fixes a hang when namespace scanning races with a controller
reset, as well as spurious I/O errors in path failover conditions
where the controller reset races with the namespace scan work with
multipath enabled.

Reported-by: Hannes Reinecke <hare at suse.de>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b561b61b35d9..7412989a1f78 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -228,6 +228,8 @@ static blk_status_t nvme_error_status(struct request *req)
 		return BLK_STS_PROTECTION;
 	case NVME_SC_RESERVATION_CONFLICT:
 		return BLK_STS_NEXUS;
+	case NVME_SC_HOST_PATH_ERROR:
+		return BLK_STS_TRANSPORT;
 	default:
 		return BLK_STS_IOERR;
 	}
@@ -3448,8 +3450,11 @@ static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 	ns = nvme_find_get_ns(ctrl, nsid);
 	if (ns) {
-		if (ns->disk && revalidate_disk(ns->disk))
-			nvme_ns_remove(ns);
+		if (ns->disk) {
+			blk_status_t sts = revalidate_disk(ns->disk);
+			if (sts && sts != BLK_STS_TRANSPORT)
+				nvme_ns_remove(ns);
+		}
 		nvme_put_ns(ns);
 	} else
 		nvme_alloc_ns(ctrl, nsid);
-- 
2.17.1

* [PATCH rfc v2 1/6] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR
  2019-08-03  2:49 ` [PATCH rfc v2 1/6] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
@ 2019-08-04  7:56   ` Minwoo Im
  0 siblings, 0 replies; 16+ messages in thread
From: Minwoo Im @ 2019-08-04  7:56 UTC (permalink / raw)


On 19-08-02 19:49:50, Sagi Grimberg wrote:
> NVME_SC_ABORT_REQ means that the request was aborted due to
> a received abort command. In our case, this is a transport
> cancellation, so a host path error is much more appropriate.
> 
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

This looks good to me.

Reviewed-by: Minwoo Im <minwoo.im.dev at gmail.com>

* [PATCH rfc v2 2/6] nvme: return nvme_error_status for sync commands failure
  2019-08-03  2:49 ` [PATCH rfc v2 2/6] nvme: return nvme_error_status for sync commands failure Sagi Grimberg
@ 2019-08-04  7:57   ` Minwoo Im
  2019-08-05 18:12     ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Minwoo Im @ 2019-08-04  7:57 UTC (permalink / raw)


Hi Sagi,

On 19-08-02 19:49:51, Sagi Grimberg wrote:
> Callers should not rely on the raw NVMe status; blk_status_t
> is more appropriate.

May I ask why callers should not rely on the NVMe status?

Thanks!

* [PATCH rfc v2 3/6] nvme: make nvme_identify_ns propagate errors back
  2019-08-03  2:49 ` [PATCH rfc v2 3/6] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
@ 2019-08-04  8:06   ` Minwoo Im
  2019-08-05 12:39   ` Hannes Reinecke
  1 sibling, 0 replies; 16+ messages in thread
From: Minwoo Im @ 2019-08-04  8:06 UTC (permalink / raw)


On 19-08-02 19:49:52, Sagi Grimberg wrote:
> Right now, callers of nvme_identify_ns only know that
> it failed, but don't know why. Make nvme_identify_ns propagate
> the error back (like nvme_identify_ctrl).
> 
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

Reviewed-by: Minwoo Im <minwoo.im.dev at gmail.com>

* [PATCH rfc v2 3/6] nvme: make nvme_identify_ns propagate errors back
  2019-08-03  2:49 ` [PATCH rfc v2 3/6] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
  2019-08-04  8:06   ` Minwoo Im
@ 2019-08-05 12:39   ` Hannes Reinecke
  2019-08-05 18:18     ` Sagi Grimberg
  1 sibling, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2019-08-05 12:39 UTC (permalink / raw)


On 8/3/19 4:49 AM, Sagi Grimberg wrote:
> Right now, callers of nvme_identify_ns only know that
> it failed, but don't know why. Make nvme_identify_ns propagate
> the error back (like nvme_identify_ctrl).
> 
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
>  drivers/nvme/host/core.c | 30 +++++++++++++-----------------
>  1 file changed, 13 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index f435c85c4062..802d1b49ecff 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1094,10 +1094,9 @@ static int nvme_identify_ns_list(struct nvme_ctrl *dev, unsigned nsid, __le32 *n
>  				    NVME_IDENTIFY_DATA_SIZE);
>  }
>  
> -static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
> -		unsigned nsid)
> +static int nvme_identify_ns(struct nvme_ctrl *ctrl,
> +		unsigned nsid, struct nvme_id_ns **id)
>  {
> -	struct nvme_id_ns *id;
>  	struct nvme_command c = { };
>  	int error;
>  
> @@ -1106,18 +1105,17 @@ static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
>  	c.identify.nsid = cpu_to_le32(nsid);
>  	c.identify.cns = NVME_ID_CNS_NS;
>  
> -	id = kmalloc(sizeof(*id), GFP_KERNEL);
> -	if (!id)
> -		return NULL;
> +	*id = kmalloc(sizeof(**id), GFP_KERNEL);
> +	if (!*id)
> +		return -ENOMEM;
>  
> -	error = nvme_submit_sync_cmd(ctrl->admin_q, &c, id, sizeof(*id));
> +	error = nvme_submit_sync_cmd(ctrl->admin_q, &c, *id, sizeof(**id));
>  	if (error) {
>  		dev_warn(ctrl->device, "Identify namespace failed (%d)\n", error);
> -		kfree(id);
> -		return NULL;
> +		kfree(*id);
>  	}
>  
> -	return id;
> +	return error;
>  }
>  
I'd prefer using PTR_ERR() here; that would avoid the **id parameter.
(I'm really biased against this calling model, returning an error _and_
a structure. You can never be sure that both match, and you always get
into weird error handling routines.)
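
For illustration, the ERR_PTR() style being suggested would look roughly
like this (a sketch using the kernel's ERR_PTR()/IS_ERR()/PTR_ERR()
helpers, not code from the posted series):

static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
		unsigned nsid)
{
	struct nvme_id_ns *id;
	struct nvme_command c = { };
	int error;

	c.identify.opcode = nvme_admin_identify;
	c.identify.nsid = cpu_to_le32(nsid);
	c.identify.cns = NVME_ID_CNS_NS;

	id = kmalloc(sizeof(*id), GFP_KERNEL);
	if (!id)
		return ERR_PTR(-ENOMEM);

	error = nvme_submit_sync_cmd(ctrl->admin_q, &c, id, sizeof(*id));
	if (error) {
		dev_warn(ctrl->device, "Identify namespace failed (%d)\n", error);
		kfree(id);
		return ERR_PTR(error);
	}

	return id;
}

Callers would then check the single return value:

	id = nvme_identify_ns(ctrl, ns->head->ns_id);
	if (IS_ERR(id))
		return PTR_ERR(id);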

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare at suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

* [PATCH rfc v2 2/6] nvme: return nvme_error_status for sync commands failure
  2019-08-04  7:57   ` Minwoo Im
@ 2019-08-05 18:12     ` Sagi Grimberg
  0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2019-08-05 18:12 UTC (permalink / raw)



> Hi Sagi,
> 
> On 19-08-02 19:49:51, Sagi Grimberg wrote:
>> Callers should not rely on the raw NVMe status; blk_status_t
>> is more appropriate.
> 
> May I ask why callers should not rely on the NVMe status?

Because the origin of these calls may not be nvme at all, a generic
block status is definitely more appropriate.

* [PATCH rfc v2 3/6] nvme: make nvme_identify_ns propagate errors back
  2019-08-05 12:39   ` Hannes Reinecke
@ 2019-08-05 18:18     ` Sagi Grimberg
  2019-08-06  6:33       ` Hannes Reinecke
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2019-08-05 18:18 UTC (permalink / raw)



> I'd prefer using PTR_ERR() here; that would avoid the **id parameter.
> (I'm really biased against this calling model, returning an error _and_
> a structure.

I can do that, shouldn't be a problem.

> You can never be sure that both match, and you always get
> into weird error handling routines.)

Well, the semantics are clear. If the call succeeded, id was allocated,
and if not, id wasn't allocated. Not sure how this is confusing or
weird.

* Fwd: [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions
       [not found] ` <e04be8bf-20cb-db7d-6046-74d8ec6fa485@broadcom.com>
@ 2019-08-05 19:32   ` James Smart
  2019-08-05 19:48     ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: James Smart @ 2019-08-05 19:32 UTC (permalink / raw)


On 8/5/2019 10:47 AM, James Smart wrote:
> 
> Hey Hannes and Keith,
> 
> This is the second attempt to handle the reset and scanning race saga.
> 
> The approach is to have the relevant admin commands return a proper
> status code that reflects that we had a transport error, and to
> not remove the namespace if that is indeed the case.
> 
> This should be a reliable way to know whether revalidate_disk failed
> due to a transport error or not.
> 
> I am able to reproduce this race with the following command (using
> tcp/rdma):
> for j in `seq 50`; do for i in `seq 50`; do nvme reset /dev/nvme0; done 
> ; nvme disconnect-all; nvme connect-all; done
> 
> With this patch set (plus two more tcp/rdma transport specific patches
> that address other issues) I was able to pass the test without
> reproducing the hang that you, Hannes, reported.
> 
> In the patchset:
> - First, make sure that transport related errors (such as
> nvme_cancel_request) reflect HOST_PATH_ERROR status.
> - Make nvme_error_status convert NVME_SC_HOST_PATH_ERROR to
> BLK_STS_TRANSPORT.
> - Make sure that the callers indeed propagate the status back.
> - Then simply look at the status code when calling revalidate_disk
> in nvme_validate_ns, and only remove the namespace if the status
> is not a transport related status.
> 
> Please let me know your thoughts.
> 
> Sagi Grimberg (6):
> nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR
> nvme: return nvme_error_status for sync commands failure
> nvme: make nvme_identify_ns propagate errors back
> nvme: make nvme_report_ns_ids propagate error back
> nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR if send failed
> nvme: don't remove namespace if revalidate failed because of a
> transport error
> 
> drivers/nvme/host/core.c | 67 ++++++++++++++++++++++++----------------
> drivers/nvme/host/tcp.c | 2 +-


I have two complaints:

- FC is eliminated from this fix, as FC doesn't use 
nvme_cancel_request() to terminate I/O's on resets/error recovery.

- With this change, any error set by a transport must be either
NVME_SC_SUCCESS or NVME_SC_HOST_PATH_ERROR or it potentially misses the
fix. I'm not sure that I like that, although any error from the
transport is essentially a path error.  Do we lose anything by the loss
of error clarity?  Looking at what's present, we don't have much
diversity when an error is set, so I guess it's ok.
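
To make that concrete (a sketch referring to the nvme_error_status()
hunk in this series, not new code):

	switch (nvme_req(req)->status & 0x7ff) {
	...
	case NVME_SC_HOST_PATH_ERROR:
		return BLK_STS_TRANSPORT;	/* scan keeps the namespace */
	default:
		return BLK_STS_IOERR;	/* e.g. NVME_SC_INTERNAL: removed */
	}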

- james

Minimally, include the following fix for FC as well:


Signed-off-by: James Smart <james.smart at broadcom.com>
---
  fc.c.new |   35 +++++++++++++++++++++++++++++------
  1 file changed, 29 insertions(+), 6 deletions(-)

--- fc.c.old	2019-08-05 10:55:51.318627945 -0700
+++ fc.c.new	2019-08-05 11:01:46.278380191 -0700
@@ -1608,9 +1608,13 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
  				sizeof(op->rsp_iu), DMA_FROM_DEVICE);

  	if (opstate == FCPOP_STATE_ABORTED)
-		status = cpu_to_le16(NVME_SC_ABORT_REQ << 1);
-	else if (freq->status)
-		status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+		status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+	else if (freq->status) {
+		status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+		dev_info(ctrl->ctrl.device,
+			"NVME-FC{%d}: io failed due to lldd error %d\n",
+			ctrl->cnum, freq->status);
+	}

  	/*
 	 * For the Linux implementation, if we have an unsuccessful
@@ -1638,7 +1642,12 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
  		 */
  		if (freq->transferred_length !=
  			be32_to_cpu(op->cmd_iu.data_len)) {
-			status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+			status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+			dev_info(ctrl->ctrl.device,
+				"NVME-FC{%d}: io failed due to bad transfer "
+				"length: %d vs expected %d\n",
+				ctrl->cnum, freq->transferred_length,
+				be32_to_cpu(op->cmd_iu.data_len));
  			goto done;
  		}
  		result.u64 = 0;
@@ -1655,7 +1664,17 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
  					freq->transferred_length ||
  			     op->rsp_iu.status_code ||
  			     sqe->common.command_id != cqe->command_id)) {
-			status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+			status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+			dev_info(ctrl->ctrl.device,
+				"NVME-FC{%d}: io failed due to bad NVMe_ERSP: "
+				"iu len %d, xfr len %d vs %d, status code "
+				"%d, cmdid %d vs %d\n",
+				ctrl->cnum, be16_to_cpu(op->rsp_iu.iu_len),
+				be32_to_cpu(op->rsp_iu.xfrd_len),
+				freq->transferred_length,
+			     	op->rsp_iu.status_code,
+				sqe->common.command_id,
+				cqe->command_id);
  			goto done;
  		}
  		result = cqe->result;
@@ -1663,7 +1682,11 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
  		break;

  	default:
-		status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+		status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+		dev_info(ctrl->ctrl.device,
+			"NVME-FC{%d}: io failed due to odd NVMe_xRSP iu "
+			"len %d\n",
+			ctrl->cnum, freq->rcv_rsplen);
  		goto done;
  	}

* Fwd: [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions
  2019-08-05 19:32   ` Fwd: [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions James Smart
@ 2019-08-05 19:48     ` Sagi Grimberg
  0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2019-08-05 19:48 UTC (permalink / raw)



> I have two complaints:
> 
> - FC is eliminated from this fix, as FC doesn't use 
> nvme_cancel_request() to terminate I/O's on resets/error recovery.

Because I didn't find nvme_cancel_request in fc, I assumed that it
doesn't see the issue, but thanks for the extra hunk that is needed!

> - With this change, any error set by a transport must be either
> NVME_SC_SUCCESS or NVME_SC_HOST_PATH_ERROR or it potentially misses the
> fix. I'm not sure that I like that, although any error from the
> transport is essentially a path error.  Do we lose anything by the loss
> of error clarity?  Looking at what's present, we don't have much
> diversity when an error is set, so I guess it's ok.

The main objective is to avoid the issue if we have a transport related
error, which means we cannot communicate with the device (hence we set
BLK_STS_TRANSPORT). If we have another transport related error we should
assign that as well to this path.

> Minimally, include the following fix for FC as well:
> 
> 
> Signed-off-by: James Smart <james.smart at broadcom.com>
> ---
>   fc.c.new |   35 +++++++++++++++++++++++++++++------
>   1 file changed, 29 insertions(+), 6 deletions(-)
> 
> --- fc.c.old	2019-08-05 10:55:51.318627945 -0700
> +++ fc.c.new	2019-08-05 11:01:46.278380191 -0700
> @@ -1608,9 +1608,13 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
>  				sizeof(op->rsp_iu), DMA_FROM_DEVICE);
> 
>  	if (opstate == FCPOP_STATE_ABORTED)
> -		status = cpu_to_le16(NVME_SC_ABORT_REQ << 1);
> -	else if (freq->status)
> -		status = cpu_to_le16(NVME_SC_INTERNAL << 1);
> +		status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
> +	else if (freq->status) {
> +		status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
> +		dev_info(ctrl->ctrl.device,
> +			"NVME-FC{%d}: io failed due to lldd error %d\n",
> +			ctrl->cnum, freq->status);
> +	}
> 
>  	/*
>  	 * For the Linux implementation, if we have an unsuccessful
> @@ -1638,7 +1642,12 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
>  		 */
>  		if (freq->transferred_length !=
>  			be32_to_cpu(op->cmd_iu.data_len)) {
> -			status = cpu_to_le16(NVME_SC_INTERNAL << 1);
> +			status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
> +			dev_info(ctrl->ctrl.device,
> +				"NVME-FC{%d}: io failed due to bad transfer "
> +				"length: %d vs expected %d\n",
> +				ctrl->cnum, freq->transferred_length,
> +				be32_to_cpu(op->cmd_iu.data_len));
>  			goto done;
>  		}
>  		result.u64 = 0;
> @@ -1655,7 +1664,17 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
>  					freq->transferred_length ||
>  			     op->rsp_iu.status_code ||
>  			     sqe->common.command_id != cqe->command_id)) {
> -			status = cpu_to_le16(NVME_SC_INTERNAL << 1);
> +			status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
> +			dev_info(ctrl->ctrl.device,
> +				"NVME-FC{%d}: io failed due to bad NVMe_ERSP: "
> +				"iu len %d, xfr len %d vs %d, status code "
> +				"%d, cmdid %d vs %d\n",
> +				ctrl->cnum, be16_to_cpu(op->rsp_iu.iu_len),
> +				be32_to_cpu(op->rsp_iu.xfrd_len),
> +				freq->transferred_length,
> +				op->rsp_iu.status_code,
> +				sqe->common.command_id,
> +				cqe->command_id);
>  			goto done;
>  		}
>  		result = cqe->result;
> @@ -1663,7 +1682,11 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
>  		break;
> 
>  	default:
> -		status = cpu_to_le16(NVME_SC_INTERNAL << 1);
> +		status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
> +		dev_info(ctrl->ctrl.device,
> +			"NVME-FC{%d}: io failed due to odd NVMe_xRSP iu "
> +			"len %d\n",
> +			ctrl->cnum, freq->rcv_rsplen);
>  		goto done;
>  	}
> 

I will, thanks!

* [PATCH rfc v2 3/6] nvme: make nvme_identify_ns propagate errors back
  2019-08-05 18:18     ` Sagi Grimberg
@ 2019-08-06  6:33       ` Hannes Reinecke
  0 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2019-08-06  6:33 UTC (permalink / raw)


On 8/5/19 8:18 PM, Sagi Grimberg wrote:
> 
>> I'd prefer using PTR_ERR() here; that would avoid the **id parameter.
>> (I'm really biased against this calling model, returning an error _and_
>> a structure.
> 
> I can do that, shouldn't be a problem.
> 
>> You can never be sure that both match, and you always get
>> into weird error handling routines.)
> 
> Well, the semantics are clear. If the call succeeded, id was allocated,
> and if not, id wasn't allocated. Not sure how this is confusing or
> weird.
>
... In theory, yes. But there is no implicit guarantee that this is the
case; you always have _two_ return values which might be conflicting.
Hence I prefer the PTR_ERR() approach; that way we'll only ever have
one return value.

(Plus it'll be easier for me to backport :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare at suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
