From: Hannes Reinecke <hare@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@intel.com>,
	linux-nvme@lists.infradead.org,
	John Meneghini <john.meneghini@netapp.com>
Subject: Re: [PATCH] nvme-multipath: do not reset controller on unknown status
Date: Thu, 13 Feb 2020 08:02:20 +0100
Message-ID: <3345c55f-3a42-315b-1d62-20f9aaab296e@suse.de>
In-Reply-To: <6d4d18e3-c3a1-7d93-5abf-1ae46e18ca8c@grimberg.me>

On 2/12/20 8:33 PM, Sagi Grimberg wrote:
> 
> 
> On 2/12/20 9:53 AM, Christoph Hellwig wrote:
>> On Wed, Feb 12, 2020 at 02:41:40PM +0100, Hannes Reinecke wrote:
>>> We're seeing occasional controller resets during straight I/O,
>>> but only when multipath is active.
>>> The problem here is that nvme-multipath will reset the controller
>>> on every unknown status, which really is odd behaviour, seeing
>>> that the host already received a perfectly good status; it's just
>>> that the host is not smart enough to understand it.
>>> And resetting wouldn't help at all; the error status will continue
>>> to be received.
>>> So we should rather pass up any unknown error to the generic
>>> routines and let them deal with this situation.
>>
>> What unknown status do you see?
> 
> My question exactly; I don't understand what the good status is that
> the host is not smart enough to understand.
> 
> I have a vague recollection that John sent some questions in that area
> in the past...
> 
>>
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> Cc: John Meneghini <john.meneghini@netapp.com>
>>> ---
>>>   drivers/nvme/host/core.c      |  4 ++--
>>>   drivers/nvme/host/multipath.c | 18 ++++++++++--------
>>>   drivers/nvme/host/nvme.h      |  2 +-
>>>   3 files changed, 13 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 5dc32b72e7fa..edb081781ae7 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -293,8 +293,8 @@ void nvme_complete_rq(struct request *req)
>>>       if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
>>>           if ((req->cmd_flags & REQ_NVME_MPATH) &&
>>>               blk_path_error(status)) {
>>> -            nvme_failover_req(req);
>>> -            return;
>>> +            if (nvme_failover_req(req))
>>> +                return;
>>
>> This reads weird, why not:
>>
>>         if ((req->cmd_flags & REQ_NVME_MPATH) &&
>>             blk_path_error(status) && nvme_failover_req(req))
>>             return;
>>
>> ?
>>
>> But see below, with your updated checks (assuming this makes sense
>> as we need more explanation) we don't even need the blk_path_error
>> call.
> 
> I think it reads better if we call nvme_failover_req only for actual
> path errors.
> 
>>
>>> -void nvme_failover_req(struct request *req)
>>> +bool nvme_failover_req(struct request *req)
>>>   {
>>>       struct nvme_ns *ns = req->q->queuedata;
>>>       u16 status = nvme_req(req)->status;
>>>       unsigned long flags;
>>> +    bool handled = false;
>>>         spin_lock_irqsave(&ns->head->requeue_lock, flags);
>>>       blk_steal_bios(&ns->head->requeue_list, req);
>>>       spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
>>> -    blk_mq_end_request(req, 0);
>>
>> You can't just steal the bios without completing the request.
>>
>>>         switch (status & 0x7ff) {
>>>       case NVME_SC_ANA_TRANSITION:
>>> @@ -88,11 +88,13 @@ void nvme_failover_req(struct request *req)
>>>            * mark the path as pending and kick off a re-read of the ANA
>>>            * log page ASAP.
>>>            */
>>> +        blk_mq_end_request(req, 0);
>>>           nvme_mpath_clear_current_path(ns);
>>>           if (ns->ctrl->ana_log_buf) {
>>>               set_bit(NVME_NS_ANA_PENDING, &ns->flags);
>>>               queue_work(nvme_wq, &ns->ctrl->ana_work);
>>>           }
>>> +        handled = true;
>>>           break;
>>>       case NVME_SC_HOST_PATH_ERROR:
>>>       case NVME_SC_HOST_ABORTED_CMD:
>>> @@ -100,18 +102,18 @@ void nvme_failover_req(struct request *req)
>>>            * Temporary transport disruption in talking to the controller.
>>>            * Try to send on a new path.
>>>            */
>>> +        blk_mq_end_request(req, 0);
>>>           nvme_mpath_clear_current_path(ns);
>>> +        handled = true;
>>>           break;
>>>       default:
>>> -        /*
>>> -         * Reset the controller for any non-ANA error as we don't know
>>> -         * what caused the error.
>>> -         */
>>> -        nvme_reset_ctrl(ns->ctrl);
>>> +        /* Delegate to common error handling */
>>>           break;
> 
> What will happen in the common case? Right now it will just retry
> it on the same path; is that the desired behavior?
> 
> I guess we need to understand the phenomenon better.
> 
> Right now the code says: we don't know what went wrong here, so we
> are going to reset the controller because it is acting weird. That can
> make some sense, given that the host doesn't understand what is going
> on...
> 
But this is precisely the case I'm arguing against here.
One of the lessons learned from SCSI is that a reset only makes sense if
the system misbehaves and resetting it makes the error go away.

Receiving a status code which we don't know about does _not_ fall into
this category; the very fact that we have a status code proves that the
system does _not_ misbehave.

So what exactly will be resolved by resetting?
There actually is a fair chance that we'll be getting the very same
error again...

Consequently I think that resetting the controller here is wrong, and we
should just handle it as a normal but unknown error.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

