All of lore.kernel.org
 help / color / mirror / Atom feed
From: sagi@grimberg.me (Sagi Grimberg)
Subject: Fwd: [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions
Date: Mon, 5 Aug 2019 12:48:41 -0700	[thread overview]
Message-ID: <e76173c0-b398-5e57-7fe6-e943ecd7b625@grimberg.me> (raw)
In-Reply-To: <e80a1e7a-ab7d-fc89-0f62-2c88b7c75b85@gmail.com>


> I have two complaints:
> 
> - FC is eliminated from this fix, as FC doesn't use 
> nvme_cancel_request() to terminate I/O's on resets/error recovery.

Becuase I didn't find nvme_cancel_request in fc I assumed that it
doesn't see the issue, but thanks for the extra hunk that is needed!

> - With this change, any error set by a transport must be either 
> NVME_SC_SUCCESS or NVME_SC_HOST_PATH_ERROR or it potentially misses the 
> fix. I'm not sure that I like that, although any error from the 
> transport is essentially a path error.? Do we lose anything by the loss 
> of error clarity ??? Looking at what's present, we don't have much 
> diversity when an error is set, so I guess it's ok.

The main objective is to avoid the issue if we have a transport related
error which means we cannot communicate the the device (hence we set
BLK_STS_TRANSPORT). If we have another transport related error we should
assign that as well to this path.

> Minimally, include the following fix for FC as well:
> 
> 
> Signed-off-by: James Smart <james.smart at broadcom.com>
> ---
>  ?fc.c.new |?? 35 +++++++++++++++++++++++++++++------
>  ?1 file changed, 29 insertions(+), 6 deletions(-)
> 
> --- fc.c.old??? 2019-08-05 10:55:51.318627945 -0700
> +++ fc.c.new??? 2019-08-05 11:01:46.278380191 -0700
> @@ -1608,9 +1608,13 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
>  ???????????????? sizeof(op->rsp_iu), DMA_FROM_DEVICE);
> 
>  ???? if (opstate == FCPOP_STATE_ABORTED)
> -??????? status = cpu_to_le16(NVME_SC_ABORT_REQ << 1);
> -??? else if (freq->status)
> -??????? status = cpu_to_le16(NVME_SC_INTERNAL << 1);
> +??????? status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
> +??? else if (freq->status) {
> +??????? status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
> +??????? dev_info(ctrl->ctrl.device,
> +??????????? "NVME-FC{%d}: io failed due to lldd error %d\n",
> +??????????? ctrl->cnum, freq->status);
> +??? }
> 
>  ???? /*
>  ????? * For the linux implementation, if we have an unsuccesful
> @@ -1638,7 +1642,12 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
>  ????????? */
>  ???????? if (freq->transferred_length !=
>  ???????????? be32_to_cpu(op->cmd_iu.data_len)) {
> -??????????? status = cpu_to_le16(NVME_SC_INTERNAL << 1);
> +??????????? status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
> +??????????? dev_info(ctrl->ctrl.device,
> +??????????????? "NVME-FC{%d}: io failed due to bad transfer "
> +??????????????? "length: %d vs expected %d\n",
> +??????????????? ctrl->cnum, freq->transferred_length,
> +??????????????? be32_to_cpu(op->cmd_iu.data_len));
>  ???????????? goto done;
>  ???????? }
>  ???????? result.u64 = 0;
> @@ -1655,7 +1664,17 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
>  ???????????????????? freq->transferred_length ||
>  ????????????????? op->rsp_iu.status_code ||
>  ????????????????? sqe->common.command_id != cqe->command_id)) {
> -??????????? status = cpu_to_le16(NVME_SC_INTERNAL << 1);
> +??????????? status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
> +??????????? dev_info(ctrl->ctrl.device,
> +??????????????? "NVME-FC{%d}: io failed due to bad NVMe_ERSP: "
> +??????????????? "iu len %d, xfr len %d vs %d, status code "
> +??????????????? "%d, cmdid %d vs %d\n",
> +??????????????? ctrl->cnum, be16_to_cpu(op->rsp_iu.iu_len),
> +??????????????? be32_to_cpu(op->rsp_iu.xfrd_len),
> +??????????????? freq->transferred_length,
> +???????????????????? op->rsp_iu.status_code,
> +??????????????? sqe->common.command_id,
> +??????????????? cqe->command_id);
>  ???????????? goto done;
>  ???????? }
>  ???????? result = cqe->result;
> @@ -1663,7 +1682,11 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req
>  ???????? break;
> 
>  ???? default:
> -??????? status = cpu_to_le16(NVME_SC_INTERNAL << 1);
> +??????? status = cpu_to_le16(NVME_SC_HOST_PATH_ERROR << 1);
> +??????? dev_info(ctrl->ctrl.device,
> +??????????? "NVME-FC{%d}: io failed due to odd NVMe_xRSP iu "
> +??????????? "len %d\n",
> +??????????? ctrl->cnum, freq->rcv_rsplen);
>  ???????? goto done;
>  ???? }
> 

I will, thanks!

      reply	other threads:[~2019-08-05 19:48 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-03  2:49 [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions Sagi Grimberg
2019-08-03  2:49 ` [PATCH rfc v2 1/6] nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR Sagi Grimberg
2019-08-04  7:56   ` Minwoo Im
2019-08-03  2:49 ` [PATCH rfc v2 2/6] nvme: return nvme_error_status for sync commands failure Sagi Grimberg
2019-08-04  7:57   ` Minwoo Im
2019-08-05 18:12     ` Sagi Grimberg
2019-08-03  2:49 ` [PATCH rfc v2 3/6] nvme: make nvme_identify_ns propagate errors back Sagi Grimberg
2019-08-04  8:06   ` Minwoo Im
2019-08-05 12:39   ` Hannes Reinecke
2019-08-05 18:18     ` Sagi Grimberg
2019-08-06  6:33       ` Hannes Reinecke
2019-08-03  2:49 ` [PATCH rfc v2 4/6] nvme: make nvme_report_ns_ids propagate error back Sagi Grimberg
2019-08-03  2:49 ` [PATCH rfc v2 5/6] nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed Sagi Grimberg
2019-08-03  2:49 ` [PATCH rfc v2 6/6] nvme: don't remove namespace if revalidate failed because of a transport error Sagi Grimberg
     [not found] ` <e04be8bf-20cb-db7d-6046-74d8ec6fa485@broadcom.com>
2019-08-05 19:32   ` Fwd: [PATCH rfc v2 0/6] nvme controller reset and namespace scan work race conditions James Smart
2019-08-05 19:48     ` Sagi Grimberg [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e76173c0-b398-5e57-7fe6-e943ecd7b625@grimberg.me \
    --to=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.