* [PATCH V2] nvme: Add support for ACRE Command Interrupted status @ 2019-11-27 19:12 Meneghini, John 2019-12-03 17:38 ` Meneghini, John 0 siblings, 1 reply; 4+ messages in thread From: Meneghini, John @ 2019-11-27 19:12 UTC (permalink / raw) To: Christoph Hellwig, Sagi Grimberg, linux-nvme, Jen Axboe, linux-block, Knight, Frederick, Keith Busch Cc: Meneghini, John I’ve rebased this patch onto nvme-5.5 and removed the NVME_SC_CMD_INTERRUPTED definition. johnm@johnm-mac-1:nvme(fix_acre) > git logl -2 * b77e5bbd8847 2019-11-27 (HEAD -> fix_acre) nvme: Add support for ACRE Command Interrupted status [ John Meneghini / johnm@netapp.com ] * 6c6aa2f26c68 2019-11-22 (origin/nvme-5.5, nvme-5.5) nvme: hwmon: add quirk to avoid changing temperature threshold [ Keith Busch / akinobu.mita@gmail.com ] And I've replaced BLK_STS_RESOURCE with BLK_STS_DEV_RESOURCE. This should make it clear to the block layer that this is not a kernel resource problem, but a device resource problem. But the block layer should only see this error when the controller runs out of request retries. johnm@johnm-mac-1:nvme(fix_acre) > cat 0001-nvme-Add-support-for-ACRE-Command-Interrupted-status.patch From 8cff18fe2f52477bfa693ee89d374e689a47a79f Mon Sep 17 00:00:00 2001 From: John Meneghini <johnm@netapp.com> Date: Wed, 27 Nov 2019 13:27:35 -0500 Subject: [PATCH] nvme: Add support for ACRE Command Interrupted status - Fixes bug in nvme_complete_rq logic introduced by Enhanced Command Retry code. According to TP-4033 when ACRE is enabled the host needs to support the Command Interrupted status. - The current code interprets Command Interrupted status as a BLK_STS_IOERR. This results in a controller reset when REQ_NVME_MPATH is set; in nvme_failover_req. Fixes: 49cd84b6f8b677e ("nvme: implement Enhanced Command Retry") Signed-off-by: John Meneghini <johnm@netapp.com> Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/nvme/host/core.c | 2 ++ include/linux/blk_types.h | 1 + 2 files changed, 3 insertions(+) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 9696404a6182..24dc9ed1a11b 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status) return BLK_STS_NEXUS; case NVME_SC_HOST_PATH_ERROR: return BLK_STS_TRANSPORT; + case NVME_SC_CMD_INTERRUPTED: + return BLK_STS_DEV_RESOURCE; default: return BLK_STS_IOERR; } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index d688b96d1d63..6efee8f1b91b 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -84,6 +84,7 @@ static inline bool blk_path_error(blk_status_t error) case BLK_STS_NEXUS: case BLK_STS_MEDIUM: case BLK_STS_PROTECTION: + case BLK_STS_DEV_RESOURCE: return false; } -- 2.21.0 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status 2019-11-27 19:12 [PATCH V2] nvme: Add support for ACRE Command Interrupted status Meneghini, John @ 2019-12-03 17:38 ` Meneghini, John 2019-12-03 21:00 ` Keith Busch 0 siblings, 1 reply; 4+ messages in thread From: Meneghini, John @ 2019-12-03 17:38 UTC (permalink / raw) To: Christoph Hellwig, Sagi Grimberg, linux-nvme, Jen Axboe, linux-block, Knight, Frederick, Keith Busch Cc: Meneghini, John This is an update to say that I've tested this patch and it works as expected. When the controller returns a Command Interrupted status the request is avoids nvme_failover_req() and goes down the nvme_retry_req() path where the CRD is implemented and the command is retried after a delay. If the controllers returns Command Interrupted too many times, and nvme_req(req)->retries runs down, this results in a device resource error returned to the block layer. But I think we'll have this problem with any error. [Tue Dec 3 08:18:33 2019] print_req_error: device resource error, dev nvme0c0n1, sector 4610048 [Tue Dec 3 08:18:33 2019] print_req_error: device resource error, dev nvme0c0n1, sector 7112704 The alternative is to stop incrementing nvme_req(req)->retries in nvme_retry_req() when CRD is set. diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 24dc9ed1a11b..ec9794698a20 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -258,8 +258,8 @@ static void nvme_retry_req(struct request *req) crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11; if (ns && crd) delay = ns->ctrl->crdt[crd - 1] * 100; - - nvme_req(req)->retries++; + else + nvme_req(req)->retries++; blk_mq_requeue_request(req, false); blk_mq_delay_kick_requeue_list(req->q, delay); } Thoughts? /John On 11/27/19, 2:12 PM, "Meneghini, John" <John.Meneghini@netapp.com> wrote: From: John Meneghini <johnm@netapp.com> - Fixes bug in nvme_complete_rq logic introduced by Enhanced Command Retry code. According to TP-4033 when ACRE is enabled the host needs to support the Command Interrupted status. - The current code interprets Command Interrupted status as a BLK_STS_IOERR. This results in a controller reset when REQ_NVME_MPATH is set; in nvme_failover_req. Fixes: 49cd84b6f8b677e ("nvme: implement Enhanced Command Retry") Signed-off-by: John Meneghini <johnm@netapp.com> Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/nvme/host/core.c | 2 ++ include/linux/blk_types.h | 1 + 2 files changed, 3 insertions(+) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 9696404a6182..24dc9ed1a11b 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status) return BLK_STS_NEXUS; case NVME_SC_HOST_PATH_ERROR: return BLK_STS_TRANSPORT; + case NVME_SC_CMD_INTERRUPTED: + return BLK_STS_DEV_RESOURCE; default: return BLK_STS_IOERR; } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index d688b96d1d63..6efee8f1b91b 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -84,6 +84,7 @@ static inline bool blk_path_error(blk_status_t error) case BLK_STS_NEXUS: case BLK_STS_MEDIUM: case BLK_STS_PROTECTION: + case BLK_STS_DEV_RESOURCE: return false; } -- 2.21.0 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status 2019-12-03 17:38 ` Meneghini, John @ 2019-12-03 21:00 ` Keith Busch [not found] ` <04835a2e-1769-36c9-63b8-173f247c862b@suse.de> 0 siblings, 1 reply; 4+ messages in thread From: Keith Busch @ 2019-12-03 21:00 UTC (permalink / raw) To: Meneghini, John Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme, Jen Axboe, linux-block, Knight, Frederick On Tue, Dec 03, 2019 at 05:38:04PM +0000, Meneghini, John wrote: > This is an update to say that I've tested this patch and it works as expected. > > When the controller returns a Command Interrupted status the request is avoids nvme_failover_req() > and goes down the nvme_retry_req() path where the CRD is implemented and the command is > retried after a delay. > > If the controllers returns Command Interrupted too many times, and nvme_req(req)->retries > runs down, this results in a device resource error returned to the block layer. But I think we'll > have this problem with any error. Why is the controller returning the same error so many times? Are we not waiting the requested delay timed? If so, the controller told us retrying should be successful. It is possible we kick the requeue list early if one command error has a valid CRD, but a subsequent retryable command does not. Is that what's happening? I'm just concerned because if we just skip counting the retry, a broken device could have the driver retry the same command indefinitely, which often leaves a task in an uninterruptible sleep state forever. > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 9696404a6182..24dc9ed1a11b 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status) > return BLK_STS_NEXUS; > case NVME_SC_HOST_PATH_ERROR: > return BLK_STS_TRANSPORT; > + case NVME_SC_CMD_INTERRUPTED: > + return BLK_STS_DEV_RESOURCE; Just for the sake of keeping this change isloted to nvme, perhaps use an existing blk_status_t value that already maps to not path error, like BLK_STS_TARGET. ^ permalink raw reply [flat|nested] 4+ messages in thread
[parent not found: <04835a2e-1769-36c9-63b8-173f247c862b@suse.de>]
* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status [not found] ` <04835a2e-1769-36c9-63b8-173f247c862b@suse.de> @ 2019-12-04 14:48 ` Meneghini, John 0 siblings, 0 replies; 4+ messages in thread From: Meneghini, John @ 2019-12-04 14:48 UTC (permalink / raw) To: Keith Busch, Christoph Hellwig, Sagi Grimberg, linux-nvme, Jen Axboe, linux-block, Knight, Frederick Cc: Hannes Reinecke, Meneghini, John On 12/3/19 10:00 PM, Keith Busch wrote: >> If the controllers returns Command Interrupted too many times, and nvme_req(req)->retries >> runs down, this results in a device resource error returned to the block layer. But I think we'll >> have this problem with any error. > > Why is the controller returning the same error so many times? Are we > not waiting the requested delay timed? If so, the controller told us > retrying should be successful. Yes, this a problem on the controller... but I only did this to test the pathological case. I think we can all agree that if the controller is going to continually return Command Interrupted, the controller is broken. > It is possible we kick the requeue list early if one command error > has a valid CRD, but a subsequent retryable command does not. Is that > what's happening? Yes, as Hannes said, in the current code: NVME_SC_CMD_INTERRUPTED is not handled in nvme_error_status() so it's translated as: default: return BLK_STS_IOERR; This works fine with a single controller, but when REQ_NVME_MPATH is set the code goes down the nvme_failover_req() path, which doesn't handle NVME_SC_CMD_INTERRUPTED either, and we end up with: default: /* * Reset the controller for any non-ANA error as we don't know * what caused the error. */ nvme_reset_ctrl(ns->ctrl); break; } So, the first time a controller with REQ_NVME_MPATH enabled returns NVME_SC_CMD_INTERRUPTED it gets a controller reset. > I'm just concerned because if we just skip counting the retry, a broken > device could have the driver retry the same command indefinitely, which > often leaves a task in an uninterruptible sleep state forever. No, I'm not recommending that we skip retries. My diff was not a part of this patch. I agree that it's not safe to skip retry counting. >> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c >> index 9696404a6182..24dc9ed1a11b 100644 >> --- a/drivers/nvme/host/core.c >> +++ b/drivers/nvme/host/core.c >> @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status) >> return BLK_STS_NEXUS; >> case NVME_SC_HOST_PATH_ERROR: >> return BLK_STS_TRANSPORT; >> + case NVME_SC_CMD_INTERRUPTED: >> + return BLK_STS_DEV_RESOURCE; > > Just for the sake of keeping this change isloted to nvme, perhaps use an > existing blk_status_t value that already maps to not path error, like > BLK_STS_TARGET. I can make that change... but I think BLK_STS_DEV_RESOURCE might be, semantically, a better choice. [BLK_STS_TARGET] = { -EREMOTEIO, "critical target" }, [BLK_STS_DEV_RESOURCE] = { -EBUSY, "device resource" }, The one use case we have for NVME_SC_CMD_INTERRUPTED in the Linux NVMe-oF target is a resource allocation failure (e.g. ENOMEM). I think Hannes came across this once while he was prototyping the ANA code in the Linux NVMe-oF target. Another potential use case in the controller might be deadlock avoidance. I was experimenting with NVME_SC_CMD_INTERRUPTED in my controller as a QOS mechanism.... but I don't think NVME_SC_CMD_INTERRUPTED /CRD is well suited for that use case. This that's how I created the pathological error case in my test. Either way, I don't think that running out of retries when NVME_SC_CMD_INTERRUPTED Is returned a critical target error. Moreover, it appears BLK_STS_TARGET is, everywhere, related to some kind of LBA range error. /John ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-12-04 14:48 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-27 19:12 [PATCH V2] nvme: Add support for ACRE Command Interrupted status Meneghini, John 2019-12-03 17:38 ` Meneghini, John 2019-12-03 21:00 ` Keith Busch [not found] ` <04835a2e-1769-36c9-63b8-173f247c862b@suse.de> 2019-12-04 14:48 ` Meneghini, John
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).