Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
From: "Meneghini, John" <John.Meneghini@netapp.com>
To: Keith Busch <kbusch@kernel.org>, Hannes Reinecke <hare@suse.de>
Cc: Keith Busch <keith.busch@intel.com>,
	"Meneghini, John" <John.Meneghini@netapp.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH] nvme-multipath: do not reset controller on unknown status
Date: Fri, 14 Feb 2020 14:22:12 +0000
Message-ID: <0BF304E1-CD61-4D46-AF0C-51B3FB84CD32@netapp.com> (raw)
In-Reply-To: <20200213170258.GC7634@redsun51.ssa.fujisawa.hgst.com>

This is exactly the problem I discussed in the last patch I sent out (maybe my only patch) to fix ACRE.

Kieth and I also "discussed" this topic at ALPSS.

At issue here is: in the current code, as soon as (req->cmd_flags & REQ_NVME_MPATH) is set the error handling in
nvme_complete_rq() completely changes. This means that a single port subsystem will receive one kind of error
handling treatment when it presents one subsystem port, and then a completely different treatment when
presenting more than one subsystem port (assuming namespace sharing).

I think the error handling on the host should be exactly identical between (REQ_NVME_MPATH)  and (~REQ_NVME_MPATH)
- modulo the path-related NVMe errors. 

        /*
         * Path-related Errors: 
         */
        NVME_SC_ANA_PERSISTENT_LOSS     = 0x301,
        NVME_SC_ANA_INACCESSIBLE        = 0x302,
        NVME_SC_ANA_TRANSITION          = 0x303,
        NVME_SC_HOST_PATH_ERROR         = 0x370,
        NVME_SC_HOST_ABORTED_CMD        = 0x371,

We can't have one error handling policy for namespace sharing and another for non-shared namespaces.

Resetting the controller like this only leads to a fatal embrace.

> I agree, the types of issues a reset may resolve don't seem applicable
> if we're actually getting response. Is there even a single defined NVMe
> status code where a reset would be an appropriate escalation?

I think, in all cases, errors that are not handled by nvme_retry_req(req) return
BLK_STS_IOERR.  This is what happens when REQ_NVME_MPATH is false, so I think
multipath controllers should get the same treatment.

/John

On 2/13/20, 12:03 PM, "Keith Busch" <kbusch@kernel.org> wrote:    
    
    On Thu, Feb 13, 2020 at 08:02:20AM +0100, Hannes Reinecke wrote:
    > But this is precisely the case I'm arguing against here.
    > One of the lessons learned from SCSI is that reset only makes sense if
    > the system misbehaves and resetting it would make this error go away.
    >
    > Receiving a status code which we don't know about does _not_ fall into
    > this category; the very fact that we have a status code proves that the
    > system does _not_ misbehave.
    >
    > So what exactly will be resolved by resetting?
    > There actually is a fair chance that we'll be getting the very same
    > error again...
    
    I agree, the types of issues a reset may resolve don't seem applicable
    if we're actually getting response. Is there even a single defined NVMe
    status code where a reset would be an appropriate escalation?
    
    

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

  reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-12 13:41 Hannes Reinecke
2020-02-12 17:53 ` Christoph Hellwig
2020-02-12 19:33   ` Sagi Grimberg
2020-02-13  7:02     ` Hannes Reinecke
2020-02-13  7:19       ` Sagi Grimberg
2020-02-13 17:02       ` Keith Busch
2020-02-14 14:22         ` Meneghini, John [this message]
2020-02-19 15:03           ` Christoph Hellwig
2020-02-13  6:53   ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0BF304E1-CD61-4D46-AF0C-51B3FB84CD32@netapp.com \
    --to=john.meneghini@netapp.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=keith.busch@intel.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org
	public-inbox-index linux-nvme

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git