From: Chao Leng <lengchao@huawei.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>, Hannes Reinecke <hare@suse.de>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
Christoph Hellwig <hch@infradead.org>,
dm-devel@redhat.com, Ewan Milne <emilne@redhat.com>,
Keith Busch <kbusch@kernel.org>,
"Meneghini, John" <John.Meneghini@netapp.com>
Subject: Re: nvme: explicitly use normal NVMe error handling when appropriate
Date: Tue, 11 Aug 2020 14:17:34 +0800 [thread overview]
Message-ID: <07bd0219-b39b-dcec-35f0-edc22df0d2ae@huawei.com> (raw)
In-Reply-To: <20200811042000.GA22692@redhat.com>
On 2020/8/11 12:20, Mike Snitzer wrote:
> On Mon, Aug 10 2020 at 11:32pm -0400,
> Chao Leng <lengchao@huawei.com> wrote:
>
>>
>>
>> On 2020/8/11 1:22, Mike Snitzer wrote:
>>> On Mon, Aug 10 2020 at 10:36am -0400,
>>> Mike Snitzer <snitzer@redhat.com> wrote:
>>>
>>>> On Fri, Aug 07 2020 at 7:35pm -0400,
>>>> Sagi Grimberg <sagi@grimberg.me> wrote:
>>>>
>>>>>
>>>>>>> Hey Mike,
>>> ...
>>>>>> I think NVMe can easily fix this by having an earlier stage of checking,
>>>>>> e.g. nvme_local_retry_req(), that shortcircuits ever getting to
>>>>>> higher-level multipathing consideration (be it native NVMe or DM
>>>>>> multipathing) for cases like NVME_SC_CMD_INTERRUPTED.
>>>>>> To be clear: the "default" case of nvme_failover_req() that returns
>>>>>> false to fallback to NVMe's "local" normal NVMe error handling -- that
>>>>>> can stay.. but a more explicit handling of cases like
>>>>>> NVME_SC_CMD_INTERRUPTED should be added to a nvme_local_retry_req()
>>>>>> check that happens before nvme_failover_req() in nvme_complete_rq().
>>>>>
>>>>> I don't necessarily agree with having a dedicated nvme_local_retry_req().
>>>>> a request that isn't failed over, goes to local error handling (retry or
>>>>> not). I actually think that just adding the condition to
>>>>> nvme_complete_req and having nvme_failover_req reject it would work.
>>>>>
>>>>> Keith?
>>>>
>>>> I think that is basically what I'm thinking too.
>>>
>>> From: Mike Snitzer <snitzer@redhat.com>
>>> Subject: nvme: explicitly use normal NVMe error handling when appropriate
>>>
>>> Commit 764e9332098c0 ("nvme-multipath: do not reset on unknown
>>> status"), among other things, fixed NVME_SC_CMD_INTERRUPTED error
>>> handling by changing multipathing's nvme_failover_req() to short-circuit
>>> path failover and then fallback to NVMe's normal error handling (which
>>> takes care of NVME_SC_CMD_INTERRUPTED).
>>>
>>> This detour through native NVMe multipathing code is unwelcome because
>>> it prevents NVMe core from handling NVME_SC_CMD_INTERRUPTED independent
>>> of any multipathing concerns.
>>>
>>> Introduce nvme_status_needs_local_error_handling() to prioritize
>>> non-failover retry, when appropriate, in terms of normal NVMe error
>>> handling. nvme_status_needs_local_error_handling() will naturely evolve
>>> to include handling of any other errors that normal error handling must
>>> be used for.
>>>
>>> nvme_failover_req()'s ability to fallback to normal NVMe error handling
>>> has been preserved because it may be useful for future NVME_SC that
>>> nvme_status_needs_local_error_handling() hasn't yet been trained for.
>>>
>>> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
>>> ---
>>> drivers/nvme/host/core.c | 16 ++++++++++++++--
>>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 88cff309d8e4..be749b690af7 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -252,6 +252,16 @@ static inline bool nvme_req_needs_retry(struct request *req)
>>> return true;
>>> }
>>> +static inline bool nvme_status_needs_local_error_handling(u16 status)
>>> +{
>>> + switch (status & 0x7ff) {
>>> + case NVME_SC_CMD_INTERRUPTED:
>>> + return true;
>>> + default:
>>> + return false;
>>> + }
>>> +}
>>> +
>>> static void nvme_retry_req(struct request *req)
>>> {
>>> struct nvme_ns *ns = req->q->queuedata;
>>> @@ -270,7 +280,8 @@ static void nvme_retry_req(struct request *req)
>>> void nvme_complete_rq(struct request *req)
>>> {
>>> - blk_status_t status = nvme_error_status(nvme_req(req)->status);
>>> + u16 nvme_status = nvme_req(req)->status;
>>> + blk_status_t status = nvme_error_status(nvme_status);
>>> trace_nvme_complete_rq(req);
>>> @@ -280,7 +291,8 @@ void nvme_complete_rq(struct request *req)
>>> nvme_req(req)->ctrl->comp_seen = true;
>>> if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
>>> - if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
>>> + if (!nvme_status_needs_local_error_handling(nvme_status) &&
>>> + (req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
>>
>> This looks no affect. if work with nvme multipath, now is already retry local.
>
> Not if NVMe is built without multipathing configured.
If without nvme multipathing configured, now is also retry local, do not need
!nvme_status_needs_local_error_handling(nvme_status).
>
>> If work with dm-multipath, still return error.
>
> Yes, I'm aware. Use of REQ_FAILFAST_TRANSPORT isn't something that is
> needed for NVMe, so why are you proposing hacks in NVMe to deal with it?
I just describe the possible scenarios:1.nvme multipathing configured.
2.without any multipath.3. with dm-multipath.
>
>>> return;
>>> if (!blk_queue_dying(req->q)) {
>>>
>>
>> Suggest:
>> REQ_FAILFAST_TRANSPORT may be designed for scsi, because scsi protocol
>> do not difine the local retry mechanism. SCSI implements a fuzzy local
>> retry mechanism, so need the REQ_FAILFAST_TRANSPORT for multipath
>> software, multipath software retry according error code is expected.
>> nvme is different with scsi about this. It define local retry mechanism
>> and path error code, so nvme should not care REQ_FAILFAST_TRANSPORT.
>
> Exactly. Except by "nvme should not care REQ_FAILFAST_TRANSPORT." your
> patch says you mean "nvme shouldn't disallow retry if
> REQ_FAILFAST_TRANSPORT is it". I'm saying: don't try to get such
> changes into NVMe.
no. the patch just mean: if path error, fail over to retry by multipath
(nvme multipath or dm-multipath). Other need local retry local, retry
after a defined time according to status(CRD) and CRDT. Now nvme multipath
is already do like this, the patch make dm-multipath work like nvme multipath.
>
> In general, aspects of your patch may have merit but overall it is doing
> too much.
>
> Mike
>
> .
>
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
next prev parent reply other threads:[~2020-08-11 6:17 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-27 5:58 [PATCH] nvme-core: fix io interrupt when work with dm-multipah Chao Leng
2020-07-28 11:19 ` Christoph Hellwig
2020-07-29 2:54 ` Chao Leng
2020-07-29 5:59 ` Christoph Hellwig
2020-07-30 1:49 ` Chao Leng
2020-08-05 6:40 ` Chao Leng
2020-08-05 15:29 ` Keith Busch
2020-08-06 5:52 ` Chao Leng
2020-08-06 14:26 ` Keith Busch
2020-08-06 15:59 ` Meneghini, John
2020-08-06 16:17 ` Meneghini, John
2020-08-06 18:40 ` Mike Snitzer
2020-08-06 19:19 ` [PATCH] nvme: restore use of blk_path_error() in nvme_complete_rq() Mike Snitzer
2020-08-06 22:42 ` Meneghini, John
2020-08-07 0:07 ` Mike Snitzer
2020-08-07 1:21 ` Sagi Grimberg
2020-08-07 4:50 ` Mike Snitzer
2020-08-07 23:35 ` Sagi Grimberg
2020-08-08 21:08 ` Meneghini, John
2020-08-08 21:11 ` Meneghini, John
2020-08-10 14:48 ` Mike Snitzer
2020-08-11 12:54 ` Meneghini, John
2020-08-10 8:10 ` Chao Leng
2020-08-11 12:36 ` Meneghini, John
2020-08-12 7:51 ` Chao Leng
2020-08-10 14:36 ` Mike Snitzer
2020-08-10 17:22 ` [PATCH] nvme: explicitly use normal NVMe error handling when appropriate Mike Snitzer
2020-08-11 3:32 ` Chao Leng
2020-08-11 4:20 ` Mike Snitzer
2020-08-11 6:17 ` Chao Leng [this message]
2020-08-11 14:12 ` Mike Snitzer
2020-08-13 14:48 ` [RESEND PATCH] " Mike Snitzer
2020-08-13 15:29 ` Meneghini, John
2020-08-13 15:43 ` Mike Snitzer
2020-08-13 15:59 ` Meneghini, John
2020-08-13 15:36 ` Christoph Hellwig
2020-08-13 17:47 ` Mike Snitzer
2020-08-13 18:43 ` Christoph Hellwig
2020-08-13 19:03 ` Mike Snitzer
2020-08-14 4:26 ` Meneghini, John
2020-08-14 6:53 ` Sagi Grimberg
2020-08-14 6:55 ` Christoph Hellwig
2020-08-14 7:02 ` Sagi Grimberg
2020-08-14 3:23 ` Meneghini, John
2020-08-07 0:44 ` [PATCH] nvme: restore use of blk_path_error() in nvme_complete_rq() Sagi Grimberg
2020-08-10 12:43 ` Christoph Hellwig
2020-08-10 15:06 ` Mike Snitzer
2020-08-11 3:45 ` [PATCH] " Chao Leng
2020-08-07 0:03 ` [PATCH] nvme-core: fix io interrupt when work with dm-multipah Sagi Grimberg
2020-08-07 2:28 ` Chao Leng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=07bd0219-b39b-dcec-35f0-edc22df0d2ae@huawei.com \
--to=lengchao@huawei.com \
--cc=John.Meneghini@netapp.com \
--cc=dm-devel@redhat.com \
--cc=emilne@redhat.com \
--cc=hare@suse.de \
--cc=hch@infradead.org \
--cc=kbusch@kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).