linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2] nvme: Add support for ACRE Command Interrupted status
@ 2019-11-27 19:12 Meneghini, John
  2019-12-03 17:38 ` Meneghini, John
  0 siblings, 1 reply; 4+ messages in thread
From: Meneghini, John @ 2019-11-27 19:12 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg, linux-nvme, Jen Axboe,
	linux-block, Knight, Frederick, Keith Busch
  Cc: Meneghini, John

I’ve rebased this patch onto nvme-5.5 and removed the NVME_SC_CMD_INTERRUPTED definition.

johnm@johnm-mac-1:nvme(fix_acre) > git logl -2 
* b77e5bbd8847 2019-11-27  (HEAD -> fix_acre) nvme: Add support for ACRE Command Interrupted status [ John Meneghini / johnm@netapp.com ]
* 6c6aa2f26c68 2019-11-22  (origin/nvme-5.5, nvme-5.5) nvme: hwmon: add quirk to avoid changing temperature threshold [ Keith Busch / akinobu.mita@gmail.com ]

And I've replaced BLK_STS_RESOURCE with BLK_STS_DEV_RESOURCE.  This should make it clear to the block layer that this is not a kernel resource problem,
but a device resource problem.  But the block layer should only see this error when the controller runs out of request retries.

johnm@johnm-mac-1:nvme(fix_acre) > cat 0001-nvme-Add-support-for-ACRE-Command-Interrupted-status.patch
From 8cff18fe2f52477bfa693ee89d374e689a47a79f Mon Sep 17 00:00:00 2001
From: John Meneghini <johnm@netapp.com>
Date: Wed, 27 Nov 2019 13:27:35 -0500
Subject: [PATCH] nvme: Add support for ACRE Command Interrupted status

 - Fixes bug in nvme_complete_rq logic introduced by
   Enhanced Command Retry code. According to TP-4033
   when ACRE is enabled the host needs to support
   the Command Interrupted status.
 - The current code interprets Command Interrupted status
   as a BLK_STS_IOERR. This results in a controller reset
   when REQ_NVME_MPATH is set; in nvme_failover_req.

Fixes: 49cd84b6f8b677e ("nvme: implement Enhanced Command Retry")
Signed-off-by: John Meneghini <johnm@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c  | 2 ++
 include/linux/blk_types.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9696404a6182..24dc9ed1a11b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status)
                return BLK_STS_NEXUS;
        case NVME_SC_HOST_PATH_ERROR:
                return BLK_STS_TRANSPORT;
+       case NVME_SC_CMD_INTERRUPTED:
+               return BLK_STS_DEV_RESOURCE;
        default:
                return BLK_STS_IOERR;
        }
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d688b96d1d63..6efee8f1b91b 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -84,6 +84,7 @@ static inline bool blk_path_error(blk_status_t error)
        case BLK_STS_NEXUS:
        case BLK_STS_MEDIUM:
        case BLK_STS_PROTECTION:
+       case BLK_STS_DEV_RESOURCE:
                return false;
        }
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status
  2019-11-27 19:12 [PATCH V2] nvme: Add support for ACRE Command Interrupted status Meneghini, John
@ 2019-12-03 17:38 ` Meneghini, John
  2019-12-03 21:00   ` Keith Busch
  0 siblings, 1 reply; 4+ messages in thread
From: Meneghini, John @ 2019-12-03 17:38 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg, linux-nvme, Jen Axboe,
	linux-block, Knight, Frederick, Keith Busch
  Cc: Meneghini, John

This is an update to say that I've tested this patch and it works as expected. 

When the controller returns a Command Interrupted status the request is avoids nvme_failover_req()
and goes down the nvme_retry_req() path where the CRD is implemented and the command is 
retried after a delay.

If the controllers returns Command Interrupted too many times, and nvme_req(req)->retries
runs down, this results in a device resource error returned to the block layer.  But I think we'll
have this problem with any error.  

[Tue Dec  3 08:18:33 2019] print_req_error: device resource error, dev nvme0c0n1, sector 4610048
[Tue Dec  3 08:18:33 2019] print_req_error: device resource error, dev nvme0c0n1, sector 7112704

The alternative is to stop incrementing nvme_req(req)->retries in nvme_retry_req() when CRD is set.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 24dc9ed1a11b..ec9794698a20 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -258,8 +258,8 @@ static void nvme_retry_req(struct request *req)
        crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
        if (ns && crd)
                delay = ns->ctrl->crdt[crd - 1] * 100;
-
-       nvme_req(req)->retries++;
+       else
+               nvme_req(req)->retries++;
        blk_mq_requeue_request(req, false);
        blk_mq_delay_kick_requeue_list(req->q, delay);
 }

Thoughts?

/John

On 11/27/19, 2:12 PM, "Meneghini, John" <John.Meneghini@netapp.com> wrote:

    From: John Meneghini <johnm@netapp.com>

     - Fixes bug in nvme_complete_rq logic introduced by
       Enhanced Command Retry code. According to TP-4033
       when ACRE is enabled the host needs to support
       the Command Interrupted status.
     - The current code interprets Command Interrupted status
       as a BLK_STS_IOERR. This results in a controller reset
       when REQ_NVME_MPATH is set; in nvme_failover_req.
    
    Fixes: 49cd84b6f8b677e ("nvme: implement Enhanced Command Retry")
    Signed-off-by: John Meneghini <johnm@netapp.com>
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    ---
     drivers/nvme/host/core.c  | 2 ++
     include/linux/blk_types.h | 1 +
     2 files changed, 3 insertions(+)
    
    diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
    index 9696404a6182..24dc9ed1a11b 100644
    --- a/drivers/nvme/host/core.c
    +++ b/drivers/nvme/host/core.c
    @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status)
                    return BLK_STS_NEXUS;
            case NVME_SC_HOST_PATH_ERROR:
                    return BLK_STS_TRANSPORT;
    +       case NVME_SC_CMD_INTERRUPTED:
    +               return BLK_STS_DEV_RESOURCE;
            default:
                    return BLK_STS_IOERR;
            }
    diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
    index d688b96d1d63..6efee8f1b91b 100644
    --- a/include/linux/blk_types.h
    +++ b/include/linux/blk_types.h
    @@ -84,6 +84,7 @@ static inline bool blk_path_error(blk_status_t error)
            case BLK_STS_NEXUS:
            case BLK_STS_MEDIUM:
            case BLK_STS_PROTECTION:
    +       case BLK_STS_DEV_RESOURCE:
                    return false;
            }
     
    -- 
    2.21.0
    


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status
  2019-12-03 17:38 ` Meneghini, John
@ 2019-12-03 21:00   ` Keith Busch
       [not found]     ` <04835a2e-1769-36c9-63b8-173f247c862b@suse.de>
  0 siblings, 1 reply; 4+ messages in thread
From: Keith Busch @ 2019-12-03 21:00 UTC (permalink / raw)
  To: Meneghini, John
  Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme, Jen Axboe,
	linux-block, Knight, Frederick

On Tue, Dec 03, 2019 at 05:38:04PM +0000, Meneghini, John wrote:
> This is an update to say that I've tested this patch and it works as expected. 
> 
> When the controller returns a Command Interrupted status the request is avoids nvme_failover_req()
> and goes down the nvme_retry_req() path where the CRD is implemented and the command is 
> retried after a delay.
> 
> If the controllers returns Command Interrupted too many times, and nvme_req(req)->retries
> runs down, this results in a device resource error returned to the block layer.  But I think we'll
> have this problem with any error.  

Why is the controller returning the same error so many times? Are we
not waiting the requested delay timed? If so, the controller told us
retrying should be successful.

It is possible we kick the requeue list early if one command error
has a valid CRD, but a subsequent retryable command does not. Is that
what's happening?

I'm just concerned because if we just skip counting the retry, a broken
device could have the driver retry the same command indefinitely, which
often leaves a task in an uninterruptible sleep state forever.

>     diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>     index 9696404a6182..24dc9ed1a11b 100644
>     --- a/drivers/nvme/host/core.c
>     +++ b/drivers/nvme/host/core.c
>     @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status)
>                     return BLK_STS_NEXUS;
>             case NVME_SC_HOST_PATH_ERROR:
>                     return BLK_STS_TRANSPORT;
>     +       case NVME_SC_CMD_INTERRUPTED:
>     +               return BLK_STS_DEV_RESOURCE;

Just for the sake of keeping this change isloted to nvme, perhaps use an
existing blk_status_t value that already maps to not path error, like
BLK_STS_TARGET.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status
       [not found]     ` <04835a2e-1769-36c9-63b8-173f247c862b@suse.de>
@ 2019-12-04 14:48       ` Meneghini, John
  0 siblings, 0 replies; 4+ messages in thread
From: Meneghini, John @ 2019-12-04 14:48 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Jen Axboe, linux-block, Knight, Frederick
  Cc: Hannes Reinecke, Meneghini, John

    On 12/3/19 10:00 PM, Keith Busch wrote:
    >> If the controllers returns Command Interrupted too many times, and nvme_req(req)->retries
    >> runs down, this results in a device resource error returned to the block layer.  But I think we'll
    >> have this problem with any error.
    >
    > Why is the controller returning the same error so many times? Are we
    > not waiting the requested delay timed? If so, the controller told us
    > retrying should be successful.

Yes, this a problem on the controller... but I only did this to test the pathological case.
I think we can all agree that if the controller is going to continually return Command
Interrupted, the controller is broken.

    > It is possible we kick the requeue list early if one command error
    > has a valid CRD, but a subsequent retryable command does not. Is that
    > what's happening?

Yes, as Hannes said, in the current code: NVME_SC_CMD_INTERRUPTED is not
handled in nvme_error_status() so it's translated as:

        default:
                return BLK_STS_IOERR;
 
This works fine with a single controller,  but when REQ_NVME_MPATH
is set the code goes down the nvme_failover_req() path, which
doesn't handle NVME_SC_CMD_INTERRUPTED either, and we end up
with:

       default:
                /*
                 * Reset the controller for any non-ANA error as we don't know
                 * what caused the error.
                 */
                nvme_reset_ctrl(ns->ctrl);
                break;
         }
    
So, the first time a controller with REQ_NVME_MPATH enabled returns
NVME_SC_CMD_INTERRUPTED it gets a controller reset.
    
    > I'm just concerned because if we just skip counting the retry, a broken
    > device could have the driver retry the same command indefinitely, which
    > often leaves a task in an uninterruptible sleep state forever.

No, I'm not recommending that we skip retries.  My diff was not a part
of this patch.  I agree that it's not safe to skip retry counting.

    >>     diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
    >>     index 9696404a6182..24dc9ed1a11b 100644
    >>     --- a/drivers/nvme/host/core.c
    >>     +++ b/drivers/nvme/host/core.c
    >>     @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status)
    >>                     return BLK_STS_NEXUS;
    >>             case NVME_SC_HOST_PATH_ERROR:
    >>                     return BLK_STS_TRANSPORT;
    >>     +       case NVME_SC_CMD_INTERRUPTED:
    >>     +               return BLK_STS_DEV_RESOURCE;
    >
    > Just for the sake of keeping this change isloted to nvme, perhaps use an
    > existing blk_status_t value that already maps to not path error, like
    > BLK_STS_TARGET.

I can make that change... but I think BLK_STS_DEV_RESOURCE might
be, semantically, a better choice.

[BLK_STS_TARGET]        = { -EREMOTEIO, "critical target" },
[BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },

The one use case we have for NVME_SC_CMD_INTERRUPTED in the Linux NVMe-oF target
is a resource allocation failure (e.g. ENOMEM).  I think Hannes came across this once 
while he was prototyping the ANA code in the Linux NVMe-oF target.

Another potential use case in the controller might be deadlock avoidance.

I was experimenting with NVME_SC_CMD_INTERRUPTED in my controller as a 
QOS mechanism.... but I don't think NVME_SC_CMD_INTERRUPTED /CRD is well suited
for that use case.  This that's how I created the pathological error case in my test.

Either way, I don't think that running out of retries when NVME_SC_CMD_INTERRUPTED
Is returned a critical target error. Moreover, it appears BLK_STS_TARGET is, everywhere,
related to some kind of LBA range error.

/John


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-12-04 14:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-27 19:12 [PATCH V2] nvme: Add support for ACRE Command Interrupted status Meneghini, John
2019-12-03 17:38 ` Meneghini, John
2019-12-03 21:00   ` Keith Busch
     [not found]     ` <04835a2e-1769-36c9-63b8-173f247c862b@suse.de>
2019-12-04 14:48       ` Meneghini, John

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).