linux-nvme.lists.infradead.org archive mirror
* [PATCH] nvme: Translate more status codes to blk_status_t
@ 2019-12-05 19:57 Keith Busch
  2019-12-12  9:20 ` Christoph Hellwig
  2019-12-12 19:41 ` Meneghini, John
  0 siblings, 2 replies; 7+ messages in thread
From: Keith Busch @ 2019-12-05 19:57 UTC (permalink / raw)
  To: linux-nvme, hch, sagi; +Cc: Keith Busch, Hannes Reinecke, John Meneghini

Decode interrupted command and not ready namespace NVMe status codes to
BLK_STS_TARGET. These are not generic IO errors and should use a
non-path-specific error so that they can use the non-failover retry path.

Reported-by: John Meneghini <John.Meneghini@netapp.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2a84e1402244..f1731d847e38 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -222,6 +222,8 @@ static blk_status_t nvme_error_status(u16 status)
 	case NVME_SC_CAP_EXCEEDED:
 		return BLK_STS_NOSPC;
 	case NVME_SC_LBA_RANGE:
+	case NVME_SC_CMD_INTERRUPTED:
+	case NVME_SC_NS_NOT_READY:
 		return BLK_STS_TARGET;
 	case NVME_SC_BAD_ATTRIBUTES:
 	case NVME_SC_ONCS_NOT_SUPPORTED:
-- 
2.21.0
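
With this applied, the relevant part of nvme_error_status() reads roughly
as follows (abbreviated sketch; unrelated cases elided):

static blk_status_t nvme_error_status(u16 status)
{
	switch (status & 0x7ff) {
	case NVME_SC_CAP_EXCEEDED:
		return BLK_STS_NOSPC;
	case NVME_SC_LBA_RANGE:
	case NVME_SC_CMD_INTERRUPTED:	/* new */
	case NVME_SC_NS_NOT_READY:	/* new */
		return BLK_STS_TARGET;
	/* ... other translations elided ... */
	default:
		return BLK_STS_IOERR;
	}
}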



* Re: [PATCH] nvme: Translate more status codes to blk_status_t
  2019-12-05 19:57 [PATCH] nvme: Translate more status codes to blk_status_t Keith Busch
@ 2019-12-12  9:20 ` Christoph Hellwig
  2019-12-12 19:41 ` Meneghini, John
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2019-12-12  9:20 UTC (permalink / raw)
  To: Keith Busch; +Cc: Hannes Reinecke, John Meneghini, hch, linux-nvme, sagi

On Fri, Dec 06, 2019 at 04:57:30AM +0900, Keith Busch wrote:
> Decode interrupted command and not ready namespace NVMe status codes to
> BLK_STS_TARGET. These are not generic IO errors and should use a
> non-path-specific error so that they can use the non-failover retry path.
> 
> Reported-by: John Meneghini <John.Meneghini@netapp.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Keith Busch <kbusch@kernel.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH] nvme: Translate more status codes to blk_status_t
  2019-12-05 19:57 [PATCH] nvme: Translate more status codes to blk_status_t Keith Busch
  2019-12-12  9:20 ` Christoph Hellwig
@ 2019-12-12 19:41 ` Meneghini, John
  2019-12-13  7:32   ` Meneghini, John
  1 sibling, 1 reply; 7+ messages in thread
From: Meneghini, John @ 2019-12-12 19:41 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, hch, sagi; +Cc: Hannes Reinecke, Meneghini, John

On 12/5/19, 2:58 PM, "Keith Busch" <kbusch@kernel.org> wrote:
    
    Decode interrupted command and not ready namespace NVMe status codes to
    BLK_STS_TARGET. These are not generic IO errors and should use a
    non-path-specific error so that they can use the non-failover retry path.
    
    Reported-by: John Meneghini <John.Meneghini@netapp.com>
    Cc: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    ---
     drivers/nvme/host/core.c | 2 ++
     1 file changed, 2 insertions(+)
    
    diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
    index 2a84e1402244..f1731d847e38 100644
    --- a/drivers/nvme/host/core.c
    +++ b/drivers/nvme/host/core.c
    @@ -222,6 +222,8 @@ static blk_status_t nvme_error_status(u16 status)
            case NVME_SC_CAP_EXCEEDED:
                    return BLK_STS_NOSPC;
            case NVME_SC_LBA_RANGE:
    +       case NVME_SC_CMD_INTERRUPTED:
    +       case NVME_SC_NS_NOT_READY:
                    return BLK_STS_TARGET;
            case NVME_SC_BAD_ATTRIBUTES:
            case NVME_SC_ONCS_NOT_SUPPORTED:
    --
    2.21.0
    
So... I think this will address the problem I reported in "[PATCH V2]
nvme: Add support for ACRE Command Interrupted status".

I guess we're not concerned about overloading BLK_STS_TARGET... I still
think using BLK_STS_DEV_RESOURCE to handle NVME_SC_CMD_INTERRUPTED is a
better idea. For comparison, from the blk_errors[] table in
block/blk-core.c:

[BLK_STS_TARGET]        = { -EREMOTEIO, "critical target" },
[BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },
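
Using it would look something like this in nvme_error_status() (just a
sketch of the alternative, not a tested patch):

	case NVME_SC_CMD_INTERRUPTED:
		/* retryable resource condition rather than a target error */
		return BLK_STS_DEV_RESOURCE;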


Let me test this out and I’ll see what happens.

Thanks,

/John



* Re: [PATCH] nvme: Translate more status codes to blk_status_t
  2019-12-12 19:41 ` Meneghini, John
@ 2019-12-13  7:32   ` Meneghini, John
  2019-12-13 21:02     ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Meneghini, John @ 2019-12-13  7:32 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, hch, sagi; +Cc: Hannes Reinecke, Meneghini, John

On 12/12/19, 2:41 PM, "Meneghini, John" <John.Meneghini@netapp.com> wrote:
    
    Let me test this out and I’ll see what happens.
    
Keith, I've tested this out, using CRD with both NVME_SC_CMD_INTERRUPTED and NVME_SC_NS_NOT_READY.

It works well enough, but I think the problem goes a little deeper than this.

> These are not generic IO errors and should use a non-path-specific
> error so that they can use the non-failover retry path.

Yes, agreed.  But we have this problem with any other NVMe status that gets returned as well.
It doesn't make sense to just keep overloading the half a dozen errors you have in blk_path_error().
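
For reference, blk_path_error() is just a short list of statuses that are
explicitly not path errors; everything else is treated as a potential path
failure. Quoted from memory from include/linux/blk_types.h, so treat it as
a sketch:

static inline bool blk_path_error(blk_status_t error)
{
	switch (error) {
	case BLK_STS_NOTSUPP:
	case BLK_STS_NOSPC:
	case BLK_STS_TARGET:
	case BLK_STS_NEXUS:
	case BLK_STS_MEDIUM:
	case BLK_STS_PROTECTION:
		return false;
	}

	/* Anything else could be a path failure, so should be retried */
	return true;
}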

I think the real problem is here:

 276         if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
 277                 if ((req->cmd_flags & REQ_NVME_MPATH) &&
 278                     blk_path_error(status)) {
 279                         nvme_failover_req(req);
 280                         return;
 281                 }     
"nvme/drivers/nvme/host/core.c" line 281 of 4267 --6%-- col 3-17

If we are really not allowed to change the blk_path_error() routine because it's a part of 
the block layer, then why do we have it stuck in the middle of our multipathing policy
logic?

Maybe we should create an nvme_path_error() function to replace the blk_path_error() 
function here.
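
For the sake of argument, such a helper could key off the NVMe status
directly instead of the coarse blk_status_t. Purely a sketch: the name is
hypothetical and the status list is illustrative, not exhaustive:

static inline bool nvme_path_error(struct request *req)
{
	/* fail over only for statuses that implicate this particular path */
	switch (nvme_req(req)->status & 0x7ff) {
	case NVME_SC_ANA_TRANSITION:
	case NVME_SC_ANA_INACCESSIBLE:
	case NVME_SC_ANA_PERSISTENT_LOSS:
	case NVME_SC_HOST_PATH_ERROR:
		return true;
	default:
		return false;
	}
}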

The other problem is that setting REQ_NVME_MPATH completely changes the
error handling logic.  If my controller has a single path it happily returns
all kinds of NVMe errors not handled by the nvme_error_status() whitelist.
Those errors all fall through your retry logic and end up returning
BLK_STS_IOERR.

However, as soon as we add another path to that same controller and turn on
REQ_NVME_MPATH, all of a sudden the controller gets a reset for returning
the very same errors that it returned before.

And that happens before even a single retry is attempted - unless it's an NVMe pathing error.

105         default:
106                 /*
107                  * Reset the controller for any non-ANA error as we don't know
108                  * what caused the error.
109                  */
110                 nvme_reset_ctrl(ns->ctrl);
111                 break;
112         }
"nvme/drivers/nvme/host/multipath.c" line 112 of 739 --15%-- col 1-8

This makes no sense.

/John


* Re: [PATCH] nvme: Translate more status codes to blk_status_t
  2019-12-13  7:32   ` Meneghini, John
@ 2019-12-13 21:02     ` Sagi Grimberg
  2019-12-16  8:02       ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2019-12-13 21:02 UTC (permalink / raw)
  To: Meneghini, John, Keith Busch, linux-nvme, hch; +Cc: Hannes Reinecke


>      Let me test this out and I’ll see what happens.
>      
> Keith, I've tested this out, using CRD with both NVME_SC_CMD_INTERRUPTED and NVME_SC_NS_NOT_READY.
> 
> It works well enough, but I think the problem goes a little deeper than this.
> 
>> These are not generic IO errors and should use a non-path-specific
>> error so that they can use the non-failover retry path.
> 
> Yes, agreed.  But we have this problem with any other NVMe status that gets returned as well.
> It doesn't make sense to just keep overloading the half a dozen errors you have in blk_path_error().
> 
> I think the real problem is here:
> 
>   276         if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
>   277                 if ((req->cmd_flags & REQ_NVME_MPATH) &&
>   278                     blk_path_error(status)) {
>   279                         nvme_failover_req(req);
>   280                         return;
>   281                 }
> "nvme/drivers/nvme/host/core.c" line 281 of 4267 --6%-- col 3-17
> 
> If we are really not allowed to change the blk_path_error() routine because it's a part of
> the block layer, then why do we have it stuck in the middle of our multipathing policy
> logic?
> 
> Maybe we should create an nvme_path_error() function to replace the blk_path_error()
> function here.
> 
> The other problem is that setting REQ_NVME_MPATH completely changes the
> error handling logic.  If my controller has a single path it happily returns
> all kinds of NVMe errors not handled by the nvme_error_status() whitelist.
> Those errors all fall through your retry logic and end up returning
> BLK_STS_IOERR.
> 
> However, as soon as we add another path to that same controller and turn on
> REQ_NVME_MPATH, all of a sudden the controller gets a reset for returning
> the very same errors that it returned before.

I agree we should lose this controller reset and only do it for
specific error cases where it's needed (none come to mind off the
top of my head).

> And that happens before even a single retry is attempted - unless it's an NVMe pathing error.
> 
> 105         default:
> 106                 /*
> 107                  * Reset the controller for any non-ANA error as we don't know
> 108                  * what caused the error.
> 109                  */
> 110                 nvme_reset_ctrl(ns->ctrl);
> 111                 break;
> 112         }
> "nvme/drivers/nvme/host/multipath.c" line 112 of 739 --15%-- col 1-8
> 
> This makes no sense.

So let's remove this reset. Have the transport take care of it, or do
it only when it is clearly needed.
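
In nvme_failover_req() that would amount to something like this (sketch,
untested):

	default:
		/*
		 * Don't reset the controller for a status we don't
		 * recognize; just fail over and leave escalation to the
		 * transport's own error handling.
		 */
		break;
	}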


* Re: [PATCH] nvme: Translate more status codes to blk_status_t
  2019-12-13 21:02     ` Sagi Grimberg
@ 2019-12-16  8:02       ` Hannes Reinecke
  2019-12-16 15:30         ` Keith Busch
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2019-12-16  8:02 UTC (permalink / raw)
  To: Sagi Grimberg, Meneghini, John, Keith Busch, linux-nvme, hch

On 12/13/19 10:02 PM, Sagi Grimberg wrote:
> 
>>      Let me test this out and I’ll see what happens.
>> Keith, I've tested this out, using CRD with both 
>> NVME_SC_CMD_INTERRUPTED and NVME_SC_NS_NOT_READY.
>>
>> It works well enough, but I think the problem goes a little deeper 
>> than this.
>>
>>> These are not generic IO errors and should use a non-path-specific
>>> error so that they can use the non-failover retry path.
>>
>> Yes, agreed.  But we have this problem with any other NVMe
>> status that gets returned as well.
>> It doesn't make sense to just keep overloading the half a dozen errors
>> you have in blk_path_error().
>>
>> I think the real problem is here:
>>
>>   276         if (unlikely(status != BLK_STS_OK && 
>> nvme_req_needs_retry(req))) {
>>   277                 if ((req->cmd_flags & REQ_NVME_MPATH) &&
>>   278                     blk_path_error(status)) {
>>   279                         nvme_failover_req(req);
>>   280                         return;
>>   281                 }
>> "nvme/drivers/nvme/host/core.c" line 281 of 4267 --6%-- col 3-17
>>
>> If we are really not allowed to change the blk_path_error() routine 
>> because it's a part of
>> the block layer, then why do we have it stuck in the middle of our 
>> multipathing policy
>> logic?
>>
>> Maybe we should create an nvme_path_error() function to replace the 
>> blk_path_error()
>> function here.
>>
>> The other problem is that setting REQ_NVME_MPATH completely changes the
>> error handling logic.  If my controller has a single path it happily
>> returns all kinds of NVMe errors not handled by the nvme_error_status()
>> whitelist.  Those errors all fall through your retry logic and end up
>> returning BLK_STS_IOERR.
>>
>> However, as soon as we add another path to that same controller and
>> turn on REQ_NVME_MPATH, all of a sudden the controller gets a reset for
>> returning the very same errors that it returned before.
> 
> I agree we should lose this controller reset and only do it for
> specific error cases where it's needed (none come to mind off the
> top of my head).
> 
>> And that happens before even a single retry is attempted - unless it's 
>> an NVMe pathing error.
>>
>> 105         default:
>> 106                 /*
>> 107                  * Reset the controller for any non-ANA error as 
>> we don't know
>> 108                  * what caused the error.
>> 109                  */
>> 110                 nvme_reset_ctrl(ns->ctrl);
>> 111                 break;
>> 112         }
>> "nvme/drivers/nvme/host/multipath.c" line 112 of 739 --15%-- col 1-8
>>
>> This makes no sense.
> 
> So let's remove this reset. Have the transport take care of it, or do
> it only when it is clearly needed.

Yes, this would be the best option.
I never liked the reset here in the first place; I/O errors, okay, but 
reset does imply that something unrecoverable happened.
Which clearly it didn't as the controller did provide us with a reply, 
and it's just us being too daft to understand it.

So I do agree with Sagi here; we should restrict controller reset to
real communication failures, not failed error handling.
We did that in SCSI (escalating failed commands to resets), and it
turned out to be a mess.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH] nvme: Translate more status codes to blk_status_t
  2019-12-16  8:02       ` Hannes Reinecke
@ 2019-12-16 15:30         ` Keith Busch
  0 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2019-12-16 15:30 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: hch, Sagi Grimberg, linux-nvme, Meneghini, John

On Mon, Dec 16, 2019 at 09:02:28AM +0100, Hannes Reinecke wrote:
> I never liked the reset here in the first place; I/O errors, okay, but a
> reset does imply that something unrecoverable happened. Clearly it didn't,
> as the controller did provide us with a reply; it's just us being too daft
> to understand it.
> 
> So I do agree with Sagi here; we should restrict controller reset to real
> communication failures, not failed error handling.
> We did that in SCSI (escalating failed commands to resets), and it turned
> out to be a mess.

I also think resetting as the default is too harsh a recovery action on the
first escalation of a potential path-related error. If we're really having
path communication problems, the timeout work will come in to clean up,
or we can whitelist some errors if we agree reset is an appropriate
action. We mentioned this very early in the native mpath development:

  https://patchwork.kernel.org/patch/9918145/#20876583

Beyond reducing the errors that trigger a reset, it would be good to make
mpath also honor the delays in ACRE's CRD values. It should work if we
convert the requeue work to a delayed_work, as in the (untested) patch below:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e6ee34376c5e..a39c125d6305 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -264,17 +264,11 @@ static inline bool nvme_req_needs_retry(struct request *req)
 static void nvme_retry_req(struct request *req)
 {
 	struct nvme_ns *ns = req->q->queuedata;
-	unsigned long delay = 0;
-	u16 crd;
-
-	/* The mask and shift result must be <= 3 */
-	crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
-	if (ns && crd)
-		delay = ns->ctrl->crdt[crd - 1] * 100;
+	u16 status = nvme_req(req)->status;
 
 	nvme_req(req)->retries++;
 	blk_mq_requeue_request(req, false);
-	blk_mq_delay_kick_requeue_list(req->q, delay);
+	blk_mq_delay_kick_requeue_list(req->q, nvme_retry_delay(status, ns));
 }
 
 void nvme_complete_rq(struct request *req)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 797c18337d96..40b33f2a19ad 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -111,7 +111,8 @@ void nvme_failover_req(struct request *req)
 		break;
 	}
 
-	kblockd_schedule_work(&ns->head->requeue_work);
+	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &ns->head->requeue_work,
+				    nvme_retry_delay(status, ns));
 }
 
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
@@ -121,7 +122,7 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		if (ns->head->disk)
-			kblockd_schedule_work(&ns->head->requeue_work);
+			kblockd_schedule_work(&ns->head->requeue_work.work);
 	}
 	up_read(&ctrl->namespaces_rwsem);
 }
@@ -162,7 +163,7 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
 		if (nvme_mpath_clear_current_path(ns))
-			kblockd_schedule_work(&ns->head->requeue_work);
+			kblockd_schedule_work(&ns->head->requeue_work.work);
 	up_read(&ctrl->namespaces_rwsem);
 	mutex_unlock(&ctrl->scan_lock);
 }
@@ -339,7 +340,7 @@ static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
 static void nvme_requeue_work(struct work_struct *work)
 {
 	struct nvme_ns_head *head =
-		container_of(work, struct nvme_ns_head, requeue_work);
+		container_of(work, struct nvme_ns_head, requeue_work.work);
 	struct bio *bio, *next;
 
 	spin_lock_irq(&head->requeue_lock);
@@ -367,7 +368,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 	mutex_init(&head->lock);
 	bio_list_init(&head->requeue_list);
 	spin_lock_init(&head->requeue_lock);
-	INIT_WORK(&head->requeue_work, nvme_requeue_work);
+	INIT_DELAYED_WORK(&head->requeue_work, nvme_requeue_work);
 
 	/*
 	 * Add a multipath node if the subsystems supports multiple controllers.
@@ -432,7 +433,7 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
 	}
 
 	synchronize_srcu(&ns->head->srcu);
-	kblockd_schedule_work(&ns->head->requeue_work);
+	kblockd_schedule_work(&ns->head->requeue_work.work);
 }
 
 static int nvme_parse_ana_log(struct nvme_ctrl *ctrl, void *data,
@@ -680,8 +681,8 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 		del_gendisk(head->disk);
 	blk_set_queue_dying(head->disk->queue);
 	/* make sure all pending bios are cleaned up */
-	kblockd_schedule_work(&head->requeue_work);
-	flush_work(&head->requeue_work);
+	kblockd_schedule_work(&head->requeue_work.work);
+	flush_work(&head->requeue_work.work);
 	blk_cleanup_queue(head->disk->queue);
 	put_disk(head->disk);
 }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1024fec7914c..929427defbcb 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -356,7 +356,7 @@ struct nvme_ns_head {
 	struct gendisk		*disk;
 	struct bio_list		requeue_list;
 	spinlock_t		requeue_lock;
-	struct work_struct	requeue_work;
+	struct delayed_work	requeue_work;
 	struct mutex		lock;
 	struct nvme_ns __rcu	*current_path[];
 #endif
@@ -393,6 +393,15 @@ struct nvme_ns {
 
 };
 
+static inline unsigned long nvme_retry_delay(u16 status, struct nvme_ns *ns)
+{
+	/* The mask and shift result must be <= 3 */
+	u16 crd = (status & NVME_SC_CRD) >> 11;
+	if (ns && crd)
+		return ns->ctrl->crdt[crd - 1] * 100;
+	return 0;
+}
+
 struct nvme_ctrl_ops {
 	const char *name;
 	struct module *module;
@@ -567,7 +576,7 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
 	struct nvme_ns_head *head = ns->head;
 
 	if (head->disk && list_empty(&head->list))
-		kblockd_schedule_work(&head->requeue_work);
+		kblockd_schedule_work(&head->requeue_work.work);
 }
 
 static inline void nvme_trace_bio_complete(struct request *req,
--
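
To make the delay concrete: CRD is bits 12:11 of the status word (hence
the NVME_SC_CRD mask and the shift by 11), and a nonzero value selects one
of the controller's CRDT1-3 fields from Identify Controller, which the
spec expresses in units of 100 milliseconds. A worked example with assumed
values:

	/* assume status carries CRD == 2 and CRDT2 (ctrl->crdt[1]) == 15 */
	unsigned long delay = nvme_retry_delay(nvme_req(req)->status, ns);
	/* delay == 15 * 100 == 1500, so the requeue is kicked after 1.5s */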

