* [PATCH v10 1/7] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR
2019-08-30 19:19 [PATCH v10 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
@ 2019-08-30 19:19 ` Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 2/7] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed Sagi Grimberg
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Sagi Grimberg @ 2019-08-30 19:19 UTC (permalink / raw)
To: linux-nvme; +Cc: Keith Busch, James Smart, Christoph Hellwig, Hannes Reinecke
NVME_SC_ABORT_REQ means that the request was aborted due to
an abort command received. In our case, this is a transport
cancellation, so host pathing error is much more appropriate.
Also, convert NVME_SC_HOST_PATH_ERROR to BLK_STS_TRANSPORT for
such that callers can understand that the status is a transport
related error. This will be used by the ns scanning code to
understand if it got an error from the controller or that the
controller happens to be unreachable by the transport.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0eb5c1bb2f48..11ef174e8399 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -226,6 +226,8 @@ static blk_status_t nvme_error_status(struct request *req)
return BLK_STS_PROTECTION;
case NVME_SC_RESERVATION_CONFLICT:
return BLK_STS_NEXUS;
+ case NVME_SC_HOST_PATH_ERROR:
+ return BLK_STS_TRANSPORT;
default:
return BLK_STS_IOERR;
}
@@ -294,7 +296,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved)
if (blk_mq_request_completed(req))
return true;
- nvme_req(req)->status = NVME_SC_ABORT_REQ;
+ nvme_req(req)->status = NVME_SC_HOST_PATH_ERROR;
blk_mq_complete_request(req);
return true;
}
--
2.17.1
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v10 2/7] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed
2019-08-30 19:19 [PATCH v10 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 1/7] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
@ 2019-08-30 19:19 ` Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 3/7] nvme-fc: Fail transport errors with NVME_SC_HOST_PATH Sagi Grimberg
` (4 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Sagi Grimberg @ 2019-08-30 19:19 UTC (permalink / raw)
To: linux-nvme; +Cc: Keith Busch, James Smart, Christoph Hellwig, Hannes Reinecke
This is a more appropriate error status for a transport error
detected by us (the host).
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 2d8ba31cb691..0a0263a364f2 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -842,7 +842,7 @@ static inline void nvme_tcp_done_send_req(struct nvme_tcp_queue *queue)
static void nvme_tcp_fail_request(struct nvme_tcp_request *req)
{
- nvme_tcp_end_request(blk_mq_rq_from_pdu(req), NVME_SC_DATA_XFER_ERROR);
+ nvme_tcp_end_request(blk_mq_rq_from_pdu(req), NVME_SC_HOST_PATH_ERROR);
}
static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
--
2.17.1
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v10 3/7] nvme-fc: Fail transport errors with NVME_SC_HOST_PATH
2019-08-30 19:19 [PATCH v10 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 1/7] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 2/7] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed Sagi Grimberg
@ 2019-08-30 19:19 ` Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 4/7] nvme: pass status to nvme_error_status Sagi Grimberg
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Sagi Grimberg @ 2019-08-30 19:19 UTC (permalink / raw)
To: linux-nvme; +Cc: Keith Busch, James Smart, Christoph Hellwig, Hannes Reinecke
From: James Smart <james.smart@broadcom.com>
NVME_SC_INTERNAL should indicate an internal controller errors
and not host transport errors. These errors will propagate to
upper layers (essentially nvme core) and be interpereted as
transport errors which should not be taken into account for
namespace state or condition.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/fc.c | 37 ++++++++++++++++++++++++++++++-------
1 file changed, 30 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index bafe35bdffac..265f89e11d8b 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1608,9 +1608,13 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
sizeof(op->rsp_iu), DMA_FROM_DEVICE);
if (opstate == FCPOP_STATE_ABORTED)
- status = cpu_to_le16(NVME_SC_ABORT_REQ << 1);
- else if (freq->status)
- status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+ status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+ else if (freq->status) {
+ status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+ dev_info(ctrl->ctrl.device,
+ "NVME-FC{%d}: io failed due to lldd error %d\n",
+ ctrl->cnum, freq->status);
+ }
/*
* For the linux implementation, if we have an unsuccesful
@@ -1637,8 +1641,13 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
* no payload in the CQE by the transport.
*/
if (freq->transferred_length !=
- be32_to_cpu(op->cmd_iu.data_len)) {
- status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+ be32_to_cpu(op->cmd_iu.data_len)) {
+ status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+ dev_info(ctrl->ctrl.device,
+ "NVME-FC{%d}: io failed due to bad transfer "
+ "length: %d vs expected %d\n",
+ ctrl->cnum, freq->transferred_length,
+ be32_to_cpu(op->cmd_iu.data_len));
goto done;
}
result.u64 = 0;
@@ -1655,7 +1664,17 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
freq->transferred_length ||
op->rsp_iu.status_code ||
sqe->common.command_id != cqe->command_id)) {
- status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+ status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+ dev_info(ctrl->ctrl.device,
+ "NVME-FC{%d}: io failed due to bad NVMe_ERSP: "
+ "iu len %d, xfr len %d vs %d, status code "
+ "%d, cmdid %d vs %d\n",
+ ctrl->cnum, be16_to_cpu(op->rsp_iu.iu_len),
+ be32_to_cpu(op->rsp_iu.xfrd_len),
+ freq->transferred_length,
+ op->rsp_iu.status_code,
+ sqe->common.command_id,
+ cqe->command_id);
goto done;
}
result = cqe->result;
@@ -1663,7 +1682,11 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
break;
default:
- status = cpu_to_le16(NVME_SC_INTERNAL << 1);
+ status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
+ dev_info(ctrl->ctrl.device,
+ "NVME-FC{%d}: io failed due to odd NVMe_xRSP iu "
+ "len %d\n",
+ ctrl->cnum, freq->rcv_rsplen);
goto done;
}
--
2.17.1
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v10 4/7] nvme: pass status to nvme_error_status
2019-08-30 19:19 [PATCH v10 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
` (2 preceding siblings ...)
2019-08-30 19:19 ` [PATCH v10 3/7] nvme-fc: Fail transport errors with NVME_SC_HOST_PATH Sagi Grimberg
@ 2019-08-30 19:19 ` Sagi Grimberg
2019-09-02 8:25 ` Christoph Hellwig
2019-08-30 19:19 ` [PATCH v10 5/7] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
` (2 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2019-08-30 19:19 UTC (permalink / raw)
To: linux-nvme; +Cc: Keith Busch, James Smart, Christoph Hellwig, Hannes Reinecke
No need for the full blown request structure.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 11ef174e8399..5e5ff7aab3ca 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -197,9 +197,9 @@ static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
return ns->pi_type && ns->ms == sizeof(struct t10_pi_tuple);
}
-static blk_status_t nvme_error_status(struct request *req)
+static blk_status_t nvme_error_status(u16 status)
{
- switch (nvme_req(req)->status & 0x7ff) {
+ switch (status & 0x7ff) {
case NVME_SC_SUCCESS:
return BLK_STS_OK;
case NVME_SC_CAP_EXCEEDED:
@@ -262,7 +262,7 @@ static void nvme_retry_req(struct request *req)
void nvme_complete_rq(struct request *req)
{
- blk_status_t status = nvme_error_status(req);
+ blk_status_t status = nvme_error_status(nvme_req(req)->status);
trace_nvme_complete_rq(req);
--
2.17.1
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v10 5/7] nvme: make nvme_identify_ns propagate errors back
2019-08-30 19:19 [PATCH v10 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
` (3 preceding siblings ...)
2019-08-30 19:19 ` [PATCH v10 4/7] nvme: pass status to nvme_error_status Sagi Grimberg
@ 2019-08-30 19:19 ` Sagi Grimberg
2019-09-02 8:25 ` Christoph Hellwig
2019-08-30 19:19 ` [PATCH v10 6/7] nvme: make nvme_report_ns_ids propagate error back Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 7/7] nvme: fix ns removal hang when failing to revalidate due to a transient error Sagi Grimberg
6 siblings, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2019-08-30 19:19 UTC (permalink / raw)
To: linux-nvme; +Cc: Keith Busch, James Smart, Christoph Hellwig, Hannes Reinecke
right now callers of nvme_identify_ns only know that it failed,
but don't know why. Make nvme_identify_ns propagate the error back.
Because nvme_submit_sync_cmd may return a positive status code, we
make nvme_identify_ns receive the id by reference and return that
status up the call chain, but make sure not to leak positive nvme
status codes to the upper layers.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/core.c | 39 ++++++++++++++++++++-------------------
1 file changed, 20 insertions(+), 19 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5e5ff7aab3ca..73d65172006d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1096,10 +1096,9 @@ static int nvme_identify_ns_list(struct nvme_ctrl *dev, unsigned nsid, __le32 *n
NVME_IDENTIFY_DATA_SIZE);
}
-static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
- unsigned nsid)
+static int nvme_identify_ns(struct nvme_ctrl *ctrl,
+ unsigned nsid, struct nvme_id_ns **id)
{
- struct nvme_id_ns *id;
struct nvme_command c = { };
int error;
@@ -1108,18 +1107,17 @@ static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
c.identify.nsid = cpu_to_le32(nsid);
c.identify.cns = NVME_ID_CNS_NS;
- id = kmalloc(sizeof(*id), GFP_KERNEL);
- if (!id)
- return NULL;
+ *id = kmalloc(sizeof(**id), GFP_KERNEL);
+ if (!*id)
+ return -ENOMEM;
- error = nvme_submit_sync_cmd(ctrl->admin_q, &c, id, sizeof(*id));
+ error = nvme_submit_sync_cmd(ctrl->admin_q, &c, *id, sizeof(**id));
if (error) {
dev_warn(ctrl->device, "Identify namespace failed (%d)\n", error);
- kfree(id);
- return NULL;
+ kfree(*id);
}
- return id;
+ return error;
}
static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned int fid,
@@ -1743,13 +1741,13 @@ static int nvme_revalidate_disk(struct gendisk *disk)
return -ENODEV;
}
- id = nvme_identify_ns(ctrl, ns->head->ns_id);
- if (!id)
- return -ENODEV;
+ ret = nvme_identify_ns(ctrl, ns->head->ns_id, &id);
+ if (ret)
+ goto out;
if (id->ncap == 0) {
ret = -ENODEV;
- goto out;
+ goto free_id;
}
__nvme_revalidate_disk(disk, id);
@@ -1760,8 +1758,11 @@ static int nvme_revalidate_disk(struct gendisk *disk)
ret = -ENODEV;
}
-out:
+free_id:
kfree(id);
+out:
+ if (ret > 0)
+ ret = blk_status_to_errno(nvme_error_status(ret));
return ret;
}
@@ -3332,11 +3333,9 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
nvme_set_queue_limits(ctrl, ns->queue);
- id = nvme_identify_ns(ctrl, nsid);
- if (!id) {
- ret = -EIO;
+ ret = nvme_identify_ns(ctrl, nsid, &id);
+ if (ret)
goto out_free_queue;
- }
if (id->ncap == 0) {
ret = -EINVAL;
@@ -3398,6 +3397,8 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
blk_cleanup_queue(ns->queue);
out_free_ns:
kfree(ns);
+ if (ret > 0)
+ ret = blk_status_to_errno(nvme_error_status(ret));
return ret;
}
--
2.17.1
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v10 5/7] nvme: make nvme_identify_ns propagate errors back
2019-08-30 19:19 ` [PATCH v10 5/7] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
@ 2019-09-02 8:25 ` Christoph Hellwig
0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2019-09-02 8:25 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, James Smart, Christoph Hellwig, linux-nvme, Hannes Reinecke
On Fri, Aug 30, 2019 at 12:19:12PM -0700, Sagi Grimberg wrote:
> right now callers of nvme_identify_ns only know that it failed,
> but don't know why. Make nvme_identify_ns propagate the error back.
> Because nvme_submit_sync_cmd may return a positive status code, we
> make nvme_identify_ns receive the id by reference and return that
> status up the call chain, but make sure not to leak positive nvme
> status codes to the upper layers.
>
> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> Reviewed-by: James Smart <james.smart@broadcom.com>
> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v10 6/7] nvme: make nvme_report_ns_ids propagate error back
2019-08-30 19:19 [PATCH v10 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
` (4 preceding siblings ...)
2019-08-30 19:19 ` [PATCH v10 5/7] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
@ 2019-08-30 19:19 ` Sagi Grimberg
2019-08-30 19:19 ` [PATCH v10 7/7] nvme: fix ns removal hang when failing to revalidate due to a transient error Sagi Grimberg
6 siblings, 0 replies; 11+ messages in thread
From: Sagi Grimberg @ 2019-08-30 19:19 UTC (permalink / raw)
To: linux-nvme; +Cc: Keith Busch, James Smart, Christoph Hellwig, Hannes Reinecke
Make the callers check the return status and propagate
back accordingly (casting to errno from a positive nvme status).
Also print the return status in nvme_report_ns_ids.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/core.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 73d65172006d..84ff3b12ed50 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1598,9 +1598,11 @@ static void nvme_config_write_zeroes(struct gendisk *disk, struct nvme_ns *ns)
blk_queue_max_write_zeroes_sectors(disk->queue, max_sectors);
}
-static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
+static int nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
struct nvme_id_ns *id, struct nvme_ns_ids *ids)
{
+ int ret = 0;
+
memset(ids, 0, sizeof(*ids));
if (ctrl->vs >= NVME_VS(1, 1, 0))
@@ -1611,10 +1613,12 @@ static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
/* Don't treat error as fatal we potentially
* already have a NGUID or EUI-64
*/
- if (nvme_identify_ns_descs(ctrl, nsid, ids))
+ ret = nvme_identify_ns_descs(ctrl, nsid, ids);
+ if (ret)
dev_warn(ctrl->device,
- "%s: Identify Descriptors failed\n", __func__);
+ "Identify Descriptors failed (%d)\n", ret);
}
+ return ret;
}
static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids)
@@ -1751,7 +1755,10 @@ static int nvme_revalidate_disk(struct gendisk *disk)
}
__nvme_revalidate_disk(disk, id);
- nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
+ ret = nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
+ if (ret)
+ goto free_id;
+
if (!nvme_ns_ids_equal(&ns->head->ids, &ids)) {
dev_err(ctrl->device,
"identifiers changed for nsid %d\n", ns->head->ns_id);
@@ -3179,7 +3186,9 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
head->ns_id = nsid;
kref_init(&head->ref);
- nvme_report_ns_ids(ctrl, nsid, id, &head->ids);
+ ret = nvme_report_ns_ids(ctrl, nsid, id, &head->ids);
+ if (ret)
+ goto out_cleanup_srcu;
ret = __nvme_check_ids(ctrl->subsys, head);
if (ret) {
@@ -3204,6 +3213,8 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
out_free_head:
kfree(head);
out:
+ if (ret > 0)
+ ret = blk_status_to_errno(nvme_error_status(ret));
return ERR_PTR(ret);
}
@@ -3227,7 +3238,10 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
} else {
struct nvme_ns_ids ids;
- nvme_report_ns_ids(ctrl, nsid, id, &ids);
+ ret = nvme_report_ns_ids(ctrl, nsid, id, &ids);
+ if (ret)
+ goto out_unlock;
+
if (!nvme_ns_ids_equal(&head->ids, &ids)) {
dev_err(ctrl->device,
"IDs don't match for shared namespace %d\n",
@@ -3242,6 +3256,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
out_unlock:
mutex_unlock(&ctrl->subsys->lock);
+ if (ret > 0)
+ ret = blk_status_to_errno(nvme_error_status(ret));
return ret;
}
--
2.17.1
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v10 7/7] nvme: fix ns removal hang when failing to revalidate due to a transient error
2019-08-30 19:19 [PATCH v10 0/7] nvme controller reset and namespace scan work race conditions Sagi Grimberg
` (5 preceding siblings ...)
2019-08-30 19:19 ` [PATCH v10 6/7] nvme: make nvme_report_ns_ids propagate error back Sagi Grimberg
@ 2019-08-30 19:19 ` Sagi Grimberg
2019-09-02 8:26 ` Christoph Hellwig
6 siblings, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2019-08-30 19:19 UTC (permalink / raw)
To: linux-nvme; +Cc: Keith Busch, James Smart, Christoph Hellwig, Hannes Reinecke
If a controller reset is racing with a namespace revalidation, the
revalidation (admin) I/O will surely fail, but we should not remove the
namespace as we will execute the I/O when the controller is back up.
Same for spurious allocation errors (return -ENOMEM).
Fix this by checking the specific error code in nvme_revalidate_disk and
if it is a transient error (for example non DNR nvme statuses or
a negative ENOMEM as allocation failure), do not remove the namespace as
it will either recover when the controller is back up and schedule
a subsequent scan, or the controller is going away and the namespaces
will be removed anyways.
This fixes a hang namespace scanning racing with a controller reset and
also sporious I/O errors in path failover coditions where the
controller reset is racing with the namespace scan work with multipath
enabled.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/core.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 84ff3b12ed50..bb685af53960 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1768,7 +1768,13 @@ static int nvme_revalidate_disk(struct gendisk *disk)
free_id:
kfree(id);
out:
- if (ret > 0)
+ /*
+ * Only fail the function if we got a fatal error back from the
+ * device, otherwise ignore the error and just move on.
+ */
+ if (ret == -ENOMEM || (ret > 0 && !(ret & NVME_SC_DNR)))
+ ret = 0;
+ else if (ret > 0)
ret = blk_status_to_errno(nvme_error_status(ret));
return ret;
}
--
2.17.1
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 11+ messages in thread