From: "Meneghini, John" <>
To: Keith Busch <>,
	"" <>,
	"" <>, "" <>
Cc: Hannes Reinecke <>,
	"Meneghini, John" <>
Subject: Re: [PATCH] nvme: Translate more status codes to blk_status_t
Date: Fri, 13 Dec 2019 07:32:12 +0000
Message-ID: <>
In-Reply-To: <>

On 12/12/19, 2:41 PM, "Meneghini, John" <> wrote:
    Let me test this out and I’ll see what happens.

Keith, I've tested this out, using CRD with both NVME_SC_CMD_INTERRUPTED and NVME_SC_NS_NOT_READY.

It works well enough, but I think the problem goes a little deeper than this.

> These are not generic IO errors and should use a non-path
>  specific error so that it can use the non-failover retry path.

Yes, agreed.  But we have this problem with any other NVMe status that gets returned as well.
It doesn't make sense to keep overloading the half a dozen errors you have in blk_path_error().

I think the real problem is here:

        if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
                if ((req->cmd_flags & REQ_NVME_MPATH) &&
                    blk_path_error(status)) {
                        nvme_failover_req(req);
                        return;
                }
"nvme/drivers/nvme/host/core.c", line 281

If we are really not allowed to change the blk_path_error() routine because it's a part of
the block layer, then why do we have it stuck in the middle of our multipathing policy?
Maybe we should create an nvme_path_error() function to replace the blk_path_error() 
function here.
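
A rough sketch of what such a helper could look like, not a tested patch.
It keys off the NVMe status code type instead of the translated
blk_status_t. The name nvme_path_error() is the one proposed above; the
SKETCH_* constants and the mask values are assumptions about the status
encoding (status code type in bits 10:8, with 3h meaning "path related")
and should be checked against include/linux/nvme.h.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical constants for illustration only. The values are assumed
 * to mirror the kernel's status encoding, with the status code type
 * held in bits 10:8 of the status word.
 */
#define SKETCH_SCT_MASK            0x700
#define SKETCH_SCT_PATH_RELATED    0x300

#define SKETCH_SC_CMD_INTERRUPTED  0x021  /* generic status, not a path error */
#define SKETCH_SC_NS_NOT_READY     0x082  /* generic status, not a path error */
#define SKETCH_SC_ANA_INACCESSIBLE 0x302  /* path related status */
#define SKETCH_SC_HOST_PATH_ERROR  0x370  /* path related status */

/*
 * Hypothetical replacement for the blk_path_error() check: only a status
 * whose status code type is "path related" counts as a path error. Bits
 * outside the mask (retry delay, "more", "do not retry") are ignored.
 */
static bool nvme_path_error(uint16_t status)
{
        return (status & SKETCH_SCT_MASK) == SKETCH_SCT_PATH_RELATED;
}
```

With a check like this, NVME_SC_CMD_INTERRUPTED and NVME_SC_NS_NOT_READY
would stay on the ordinary retry path even when REQ_NVME_MPATH is set,
and only path related statuses would reach nvme_failover_req().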

The other problem is: setting REQ_NVME_MPATH completely changes the
error handling logic.  If my controller has a single path it happily returns all kinds
of NVMe errors not handled by the nvme_error_status() whitelist.  Those
errors all fall through your retry logic and end up returning BLK_STS_IOERR.

However, as soon as we add another path to that same controller, and turn on
REQ_NVME_MPATH, all of a sudden the controller gets a reset for returning
the very same errors that it returned before.

And that happens before even a single retry is attempted - unless it's an NVMe pathing error.

        default:
                /*
                 * Reset the controller for any non-ANA error as we don't know
                 * what caused the error.
                 */
                nvme_reset_ctrl(ns->ctrl);
                break;
        }
"nvme/drivers/nvme/host/multipath.c", line 112

This makes no sense.
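
To make the asymmetry concrete, here is a small userspace model of the
two excerpts quoted above. It is only a sketch: the enum, the function
name sketch_completion_action() and its three boolean inputs are
hypothetical stand-ins for REQ_NVME_MPATH being set, the result of
blk_path_error(), and whether the NVMe status is one of the ANA/path
statuses that nvme_failover_req() handles without a reset. It assumes
the request failed and is still eligible for retry.

```c
#include <stdbool.h>

/* Hypothetical outcomes, named for illustration only. */
enum sketch_action {
        SKETCH_RETRY,           /* ordinary retry path */
        SKETCH_FAILOVER,        /* requeue on another path, no reset */
        SKETCH_FAILOVER_RESET,  /* requeue and reset the controller */
};

/*
 * Model of the completion decision for a failed, retryable request.
 * Each argument is an assumed summary of state that the real code
 * derives from the request and its status.
 */
static enum sketch_action
sketch_completion_action(bool req_nvme_mpath, bool blk_path_err,
                         bool nvme_path_status)
{
        if (req_nvme_mpath && blk_path_err) {
                /* Failover path: only ANA/path statuses avoid the reset. */
                if (nvme_path_status)
                        return SKETCH_FAILOVER;
                return SKETCH_FAILOVER_RESET;
        }
        /* Single path, or a status that blk_path_error() rejects. */
        return SKETCH_RETRY;
}
```

For the same non-path status, sketch_completion_action(false, true, false)
gives SKETCH_RETRY while sketch_completion_action(true, true, false) gives
SKETCH_FAILOVER_RESET, which is the change in behavior described above.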


linux-nvme mailing list

Thread overview: 7+ messages
2019-12-05 19:57 [PATCH] nvme: Translate more status codes to blk_status_t Keith Busch
2019-12-12  9:20 ` Christoph Hellwig
2019-12-12 19:41 ` Meneghini, John
2019-12-13  7:32   ` Meneghini, John [this message]
2019-12-13 21:02     ` Sagi Grimberg
2019-12-16  8:02       ` Hannes Reinecke
2019-12-16 15:30         ` Keith Busch
