Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH V2] nvme: Add support for ACRE Command Interrupted status
@ 2019-11-27 19:12 Meneghini, John
  2019-12-03 17:38 ` Meneghini, John
  0 siblings, 1 reply; 7+ messages in thread
From: Meneghini, John @ 2019-11-27 19:12 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg, linux-nvme, Jen Axboe,
	linux-block, Knight, Frederick, Keith Busch
  Cc: Meneghini, John

I’ve rebased this patch onto nvme-5.5 and removed the NVME_SC_CMD_INTERRUPTED definition.

johnm@johnm-mac-1:nvme(fix_acre) > git logl -2 
* b77e5bbd8847 2019-11-27  (HEAD -> fix_acre) nvme: Add support for ACRE Command Interrupted status [ John Meneghini / johnm@netapp.com ]
* 6c6aa2f26c68 2019-11-22  (origin/nvme-5.5, nvme-5.5) nvme: hwmon: add quirk to avoid changing temperature threshold [ Keith Busch / akinobu.mita@gmail.com ]

And I've replaced BLK_STS_RESOURCE with BLK_STS_DEV_RESOURCE.  This should make it clear to the block layer that this is not a kernel resource problem,
but a device resource problem.  But the block layer should only see this error when the controller runs out of request retries.

johnm@johnm-mac-1:nvme(fix_acre) > cat 0001-nvme-Add-support-for-ACRE-Command-Interrupted-status.patch
From 8cff18fe2f52477bfa693ee89d374e689a47a79f Mon Sep 17 00:00:00 2001
From: John Meneghini <johnm@netapp.com>
Date: Wed, 27 Nov 2019 13:27:35 -0500
Subject: [PATCH] nvme: Add support for ACRE Command Interrupted status

 - Fixes bug in nvme_complete_rq logic introduced by
   Enhanced Command Retry code. According to TP-4033
   when ACRE is enabled the host needs to support
   the Command Interrupted status.
 - The current code interprets Command Interrupted status
   as a BLK_STS_IOERR. This results in a controller reset
   when REQ_NVME_MPATH is set; in nvme_failover_req.

Fixes: 49cd84b6f8b677e ("nvme: implement Enhanced Command Retry")
Signed-off-by: John Meneghini <johnm@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c  | 2 ++
 include/linux/blk_types.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9696404a6182..24dc9ed1a11b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status)
                return BLK_STS_NEXUS;
        case NVME_SC_HOST_PATH_ERROR:
                return BLK_STS_TRANSPORT;
+       case NVME_SC_CMD_INTERRUPTED:
+               return BLK_STS_DEV_RESOURCE;
        default:
                return BLK_STS_IOERR;
        }
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d688b96d1d63..6efee8f1b91b 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -84,6 +84,7 @@ static inline bool blk_path_error(blk_status_t error)
        case BLK_STS_NEXUS:
        case BLK_STS_MEDIUM:
        case BLK_STS_PROTECTION:
+       case BLK_STS_DEV_RESOURCE:
                return false;
        }
 
-- 
2.21.0


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status
  2019-11-27 19:12 [PATCH V2] nvme: Add support for ACRE Command Interrupted status Meneghini, John
@ 2019-12-03 17:38 ` Meneghini, John
  2019-12-03 21:00   ` Keith Busch
  0 siblings, 1 reply; 7+ messages in thread
From: Meneghini, John @ 2019-12-03 17:38 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg, linux-nvme, Jen Axboe,
	linux-block, Knight, Frederick, Keith Busch
  Cc: Meneghini, John

This is an update to say that I've tested this patch and it works as expected. 

When the controller returns a Command Interrupted status the request is avoids nvme_failover_req()
and goes down the nvme_retry_req() path where the CRD is implemented and the command is 
retried after a delay.

If the controllers returns Command Interrupted too many times, and nvme_req(req)->retries
runs down, this results in a device resource error returned to the block layer.  But I think we'll
have this problem with any error.  

[Tue Dec  3 08:18:33 2019] print_req_error: device resource error, dev nvme0c0n1, sector 4610048
[Tue Dec  3 08:18:33 2019] print_req_error: device resource error, dev nvme0c0n1, sector 7112704

The alternative is to stop incrementing nvme_req(req)->retries in nvme_retry_req() when CRD is set.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 24dc9ed1a11b..ec9794698a20 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -258,8 +258,8 @@ static void nvme_retry_req(struct request *req)
        crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
        if (ns && crd)
                delay = ns->ctrl->crdt[crd - 1] * 100;
-
-       nvme_req(req)->retries++;
+       else
+               nvme_req(req)->retries++;
        blk_mq_requeue_request(req, false);
        blk_mq_delay_kick_requeue_list(req->q, delay);
 }

Thoughts?

/John

On 11/27/19, 2:12 PM, "Meneghini, John" <John.Meneghini@netapp.com> wrote:

    From: John Meneghini <johnm@netapp.com>

     - Fixes bug in nvme_complete_rq logic introduced by
       Enhanced Command Retry code. According to TP-4033
       when ACRE is enabled the host needs to support
       the Command Interrupted status.
     - The current code interprets Command Interrupted status
       as a BLK_STS_IOERR. This results in a controller reset
       when REQ_NVME_MPATH is set; in nvme_failover_req.
    
    Fixes: 49cd84b6f8b677e ("nvme: implement Enhanced Command Retry")
    Signed-off-by: John Meneghini <johnm@netapp.com>
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    ---
     drivers/nvme/host/core.c  | 2 ++
     include/linux/blk_types.h | 1 +
     2 files changed, 3 insertions(+)
    
    diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
    index 9696404a6182..24dc9ed1a11b 100644
    --- a/drivers/nvme/host/core.c
    +++ b/drivers/nvme/host/core.c
    @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status)
                    return BLK_STS_NEXUS;
            case NVME_SC_HOST_PATH_ERROR:
                    return BLK_STS_TRANSPORT;
    +       case NVME_SC_CMD_INTERRUPTED:
    +               return BLK_STS_DEV_RESOURCE;
            default:
                    return BLK_STS_IOERR;
            }
    diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
    index d688b96d1d63..6efee8f1b91b 100644
    --- a/include/linux/blk_types.h
    +++ b/include/linux/blk_types.h
    @@ -84,6 +84,7 @@ static inline bool blk_path_error(blk_status_t error)
            case BLK_STS_NEXUS:
            case BLK_STS_MEDIUM:
            case BLK_STS_PROTECTION:
    +       case BLK_STS_DEV_RESOURCE:
                    return false;
            }
     
    -- 
    2.21.0
    

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status
  2019-12-03 17:38 ` Meneghini, John
@ 2019-12-03 21:00   ` Keith Busch
  2019-12-04 13:26     ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2019-12-03 21:00 UTC (permalink / raw)
  To: Meneghini, John
  Cc: Jen Axboe, Sagi Grimberg, linux-nvme, linux-block,
	Christoph Hellwig, Knight, Frederick

On Tue, Dec 03, 2019 at 05:38:04PM +0000, Meneghini, John wrote:
> This is an update to say that I've tested this patch and it works as expected. 
> 
> When the controller returns a Command Interrupted status the request is avoids nvme_failover_req()
> and goes down the nvme_retry_req() path where the CRD is implemented and the command is 
> retried after a delay.
> 
> If the controllers returns Command Interrupted too many times, and nvme_req(req)->retries
> runs down, this results in a device resource error returned to the block layer.  But I think we'll
> have this problem with any error.  

Why is the controller returning the same error so many times? Are we
not waiting the requested delay timed? If so, the controller told us
retrying should be successful.

It is possible we kick the requeue list early if one command error
has a valid CRD, but a subsequent retryable command does not. Is that
what's happening?

I'm just concerned because if we just skip counting the retry, a broken
device could have the driver retry the same command indefinitely, which
often leaves a task in an uninterruptible sleep state forever.

>     diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>     index 9696404a6182..24dc9ed1a11b 100644
>     --- a/drivers/nvme/host/core.c
>     +++ b/drivers/nvme/host/core.c
>     @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status)
>                     return BLK_STS_NEXUS;
>             case NVME_SC_HOST_PATH_ERROR:
>                     return BLK_STS_TRANSPORT;
>     +       case NVME_SC_CMD_INTERRUPTED:
>     +               return BLK_STS_DEV_RESOURCE;

Just for the sake of keeping this change isloted to nvme, perhaps use an
existing blk_status_t value that already maps to not path error, like
BLK_STS_TARGET.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status
  2019-12-03 21:00   ` Keith Busch
@ 2019-12-04 13:26     ` Hannes Reinecke
  2019-12-04 14:48       ` Meneghini, John
  2019-12-04 16:07       ` Keith Busch
  0 siblings, 2 replies; 7+ messages in thread
From: Hannes Reinecke @ 2019-12-04 13:26 UTC (permalink / raw)
  To: linux-nvme

On 12/3/19 10:00 PM, Keith Busch wrote:
> On Tue, Dec 03, 2019 at 05:38:04PM +0000, Meneghini, John wrote:
>> This is an update to say that I've tested this patch and it works as expected. 
>>
>> When the controller returns a Command Interrupted status the request is avoids nvme_failover_req()
>> and goes down the nvme_retry_req() path where the CRD is implemented and the command is 
>> retried after a delay.
>>
>> If the controllers returns Command Interrupted too many times, and nvme_req(req)->retries
>> runs down, this results in a device resource error returned to the block layer.  But I think we'll
>> have this problem with any error.  
> 
> Why is the controller returning the same error so many times? Are we
> not waiting the requested delay timed? If so, the controller told us
> retrying should be successful.
> 
> It is possible we kick the requeue list early if one command error
> has a valid CRD, but a subsequent retryable command does not. Is that
> what's happening?
> 
No. The problem is that CRD needs to be observed _in all cases_ whenever
we get a COMMAND INTERRUPTED status.
Currently we're only observing CRD when multipathing is _not_ active,
which is wrong.

And this is what the patch is trying to achieve.

> I'm just concerned because if we just skip counting the retry, a broken
> device could have the driver retry the same command indefinitely, which
> often leaves a task in an uninterruptible sleep state forever.
> 
Oh, but we're not skipping the retries; in fact we do _enable_ retry
counting with this patch as we're skipping the nvme_failover_req()
branch and moving to nvme_retry_req().

>>     diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>     index 9696404a6182..24dc9ed1a11b 100644
>>     --- a/drivers/nvme/host/core.c
>>     +++ b/drivers/nvme/host/core.c
>>     @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status)
>>                     return BLK_STS_NEXUS;
>>             case NVME_SC_HOST_PATH_ERROR:
>>                     return BLK_STS_TRANSPORT;
>>     +       case NVME_SC_CMD_INTERRUPTED:
>>     +               return BLK_STS_DEV_RESOURCE;
> 
> Just for the sake of keeping this change isloted to nvme, perhaps use an
> existing blk_status_t value that already maps to not path error, like
> BLK_STS_TARGET.
> 
I don't mind; whichever works for you.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status
  2019-12-04 13:26     ` Hannes Reinecke
@ 2019-12-04 14:48       ` Meneghini, John
  2019-12-04 16:07       ` Keith Busch
  1 sibling, 0 replies; 7+ messages in thread
From: Meneghini, John @ 2019-12-04 14:48 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Jen Axboe, linux-block, Knight, Frederick
  Cc: Hannes Reinecke, Meneghini, John

    On 12/3/19 10:00 PM, Keith Busch wrote:
    >> If the controllers returns Command Interrupted too many times, and nvme_req(req)->retries
    >> runs down, this results in a device resource error returned to the block layer.  But I think we'll
    >> have this problem with any error.
    >
    > Why is the controller returning the same error so many times? Are we
    > not waiting the requested delay timed? If so, the controller told us
    > retrying should be successful.

Yes, this a problem on the controller... but I only did this to test the pathological case.
I think we can all agree that if the controller is going to continually return Command
Interrupted, the controller is broken.

    > It is possible we kick the requeue list early if one command error
    > has a valid CRD, but a subsequent retryable command does not. Is that
    > what's happening?

Yes, as Hannes said, in the current code: NVME_SC_CMD_INTERRUPTED is not
handled in nvme_error_status() so it's translated as:

        default:
                return BLK_STS_IOERR;
 
This works fine with a single controller,  but when REQ_NVME_MPATH
is set the code goes down the nvme_failover_req() path, which
doesn't handle NVME_SC_CMD_INTERRUPTED either, and we end up
with:

       default:
                /*
                 * Reset the controller for any non-ANA error as we don't know
                 * what caused the error.
                 */
                nvme_reset_ctrl(ns->ctrl);
                break;
         }
    
So, the first time a controller with REQ_NVME_MPATH enabled returns
NVME_SC_CMD_INTERRUPTED it gets a controller reset.
    
    > I'm just concerned because if we just skip counting the retry, a broken
    > device could have the driver retry the same command indefinitely, which
    > often leaves a task in an uninterruptible sleep state forever.

No, I'm not recommending that we skip retries.  My diff was not a part
of this patch.  I agree that it's not safe to skip retry counting.

    >>     diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
    >>     index 9696404a6182..24dc9ed1a11b 100644
    >>     --- a/drivers/nvme/host/core.c
    >>     +++ b/drivers/nvme/host/core.c
    >>     @@ -230,6 +230,8 @@ static blk_status_t nvme_error_status(u16 status)
    >>                     return BLK_STS_NEXUS;
    >>             case NVME_SC_HOST_PATH_ERROR:
    >>                     return BLK_STS_TRANSPORT;
    >>     +       case NVME_SC_CMD_INTERRUPTED:
    >>     +               return BLK_STS_DEV_RESOURCE;
    >
    > Just for the sake of keeping this change isloted to nvme, perhaps use an
    > existing blk_status_t value that already maps to not path error, like
    > BLK_STS_TARGET.

I can make that change... but I think BLK_STS_DEV_RESOURCE might
be, semantically, a better choice.

[BLK_STS_TARGET]        = { -EREMOTEIO, "critical target" },
[BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },

The one use case we have for NVME_SC_CMD_INTERRUPTED in the Linux NVMe-oF target
is a resource allocation failure (e.g. ENOMEM).  I think Hannes came across this once 
while he was prototyping the ANA code in the Linux NVMe-oF target.

Another potential use case in the controller might be deadlock avoidance.

I was experimenting with NVME_SC_CMD_INTERRUPTED in my controller as a 
QOS mechanism.... but I don't think NVME_SC_CMD_INTERRUPTED /CRD is well suited
for that use case.  This that's how I created the pathological error case in my test.

Either way, I don't think that running out of retries when NVME_SC_CMD_INTERRUPTED
Is returned a critical target error. Moreover, it appears BLK_STS_TARGET is, everywhere,
related to some kind of LBA range error.

/John

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status
  2019-12-04 13:26     ` Hannes Reinecke
  2019-12-04 14:48       ` Meneghini, John
@ 2019-12-04 16:07       ` Keith Busch
  2019-12-04 18:33         ` Meneghini, John
  1 sibling, 1 reply; 7+ messages in thread
From: Keith Busch @ 2019-12-04 16:07 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme

On Wed, Dec 04, 2019 at 02:26:06PM +0100, Hannes Reinecke wrote:
> > Why is the controller returning the same error so many times? Are we
> > not waiting the requested delay timed? If so, the controller told us
> > retrying should be successful.
> > 
> > It is possible we kick the requeue list early if one command error
> > has a valid CRD, but a subsequent retryable command does not. Is that
> > what's happening?
> > 
> No. The problem is that CRD needs to be observed _in all cases_ whenever
> we get a COMMAND INTERRUPTED status.
> Currently we're only observing CRD when multipathing is _not_ active,
> which is wrong.
> 
> And this is what the patch is trying to achieve.

Right, I totally got that part.

I just don't understand this follow-up from John that I cropped it out
of my original reply:

> The alternative is to stop incrementing nvme_req(req)->retries in nvme_retry_req() when CRD is set.
>
>  @@ -258,8 +258,8 @@ static void nvme_retry_req(struct request *req)
>        crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
>        if (ns && crd)
>                delay = ns->ctrl->crdt[crd - 1] * 100;
>-
>-       nvme_req(req)->retries++;
>+       else
>+               nvme_req(req)->retries++;

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] nvme: Add support for ACRE Command Interrupted status
  2019-12-04 16:07       ` Keith Busch
@ 2019-12-04 18:33         ` Meneghini, John
  0 siblings, 0 replies; 7+ messages in thread
From: Meneghini, John @ 2019-12-04 18:33 UTC (permalink / raw)
  To: Keith Busch, Hannes Reinecke; +Cc: linux-nvme, Meneghini, John

On 12/4/19, 11:12 AM, "linux-nvme on behalf of Keith Busch" <linux-nvme-bounces@lists.infradead.org on behalf of kbusch@kernel.org> wrote:
    
    I just don't understand this follow-up from John that I cropped it out
    of my original reply:
    
    > The alternative is to stop incrementing nvme_req(req)->retries in nvme_retry_req() when CRD is set.
    >
    >  @@ -258,8 +258,8 @@ static void nvme_retry_req(struct request *req)
    >        crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
    >        if (ns && crd)
    >                delay = ns->ctrl->crdt[crd - 1] * 100;
    >-
    >-       nvme_req(req)->retries++;
    >+       else
    >+               nvme_req(req)->retries++;

Sorry for the confusion Keith.  This was not a part of my patch.
Just a diff to illustrate what would need to be done to handle
NVME_SC_CMD_INTERRUPTED internally in the nvme layer...
so that it never returned an error to the block layer.

I agree this is not the right thing to do.

/John  
 
    _______________________________________________
    linux-nvme mailing list
    linux-nvme@lists.infradead.org
    http://lists.infradead.org/mailman/listinfo/linux-nvme
    

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-27 19:12 [PATCH V2] nvme: Add support for ACRE Command Interrupted status Meneghini, John
2019-12-03 17:38 ` Meneghini, John
2019-12-03 21:00   ` Keith Busch
2019-12-04 13:26     ` Hannes Reinecke
2019-12-04 14:48       ` Meneghini, John
2019-12-04 16:07       ` Keith Busch
2019-12-04 18:33         ` Meneghini, John

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org
	public-inbox-index linux-nvme

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git