From: Sagi Grimberg <sagi@grimberg.me>
To: Mike Snitzer <snitzer@redhat.com>,
"Meneghini, John" <John.Meneghini@netapp.com>
Cc: "linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
Christoph Hellwig <hch@infradead.org>,
dm-devel@redhat.com, Ewan Milne <emilne@redhat.com>,
Chao Leng <lengchao@huawei.com>, Keith Busch <kbusch@kernel.org>,
Hannes Reinecke <hare@suse.de>
Subject: Re: nvme: restore use of blk_path_error() in nvme_complete_rq()
Date: Thu, 6 Aug 2020 18:21:03 -0700
Message-ID: <510f5aff-0437-b1ce-f7ab-c812edbea880@grimberg.me>
In-Reply-To: <20200807000755.GA28957@redhat.com>
Hey Mike,
>> The point is: blk_path_error() has nothing to do with NVMe errors.
>> This is dm-multipath logic stuck in the middle of the NVMe error
>> handling code.
>
> No, it is a means to have multiple subsystems (to this point both SCSI
> and NVMe) doing the correct thing when translating subsystem specific
> error codes to BLK_STS codes.
Not exactly; I don't find any use of this in SCSI. The purpose is to make
sure that nvme and dm speak the same language.
> If you, or others you name drop below, understood the point we wouldn't
> be having this conversation. You'd accept the point of blk_path_error()
> to be valid and required codification of what constitutes a retryable
> path error for the Linux block layer.
This incident is a case where a specific nvme status
(NVME_SC_CMD_INTERRUPTED) was designed to be retried on the same path,
respecting the controller retry delay. Because nvme used
blk_path_error() at the time, we had to map it to a non-retryable block
status to get around that. Granted, no one had dm-multipath in mind.
So in a sense, there is consensus on changing commit 35038bffa87da
_because_ nvme no longer uses blk_path_error(). Otherwise it would break.
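For reference, the classification being argued over can be modelled in a
few lines of userspace C. This is a sketch from memory of how
blk_path_error() sorts block statuses, not the kernel source: the MODEL_*
names and their numeric values are made up for illustration, and the exact
set of non-path statuses is an assumption that should be checked against
include/linux/blk_types.h.

```c
#include <stdbool.h>

/*
 * Illustrative stand-ins for the kernel's BLK_STS_* codes.
 * The names mirror the kernel's, but the values are arbitrary.
 */
enum model_blk_status {
	MODEL_STS_OK,
	MODEL_STS_NOTSUPP,
	MODEL_STS_NOSPC,
	MODEL_STS_TARGET,
	MODEL_STS_NEXUS,
	MODEL_STS_MEDIUM,
	MODEL_STS_PROTECTION,
	MODEL_STS_TRANSPORT,
	MODEL_STS_IOERR,
};

/*
 * Returns false for errors that trying another path is not expected to
 * fix, and true for everything else. Callers are assumed to have
 * checked for MODEL_STS_OK before asking.
 */
static bool model_blk_path_error(enum model_blk_status error)
{
	switch (error) {
	case MODEL_STS_NOTSUPP:
	case MODEL_STS_NOSPC:
	case MODEL_STS_TARGET:
	case MODEL_STS_NEXUS:
	case MODEL_STS_MEDIUM:
	case MODEL_STS_PROTECTION:
		return false;
	default:
		/* Anything else might be a path failure, so it is retryable. */
		return true;
	}
}
```

The model only answers "retry on another path, yes or no". A status that
wants a delayed retry on the same path has no natural slot in it, which is
the gap described above.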
> Any BLK_STS mapping of NVMe specific error codes would need to not screw
> up by categorizing a retryable error as non-retryable (and vice-versa).
But it is a special type of retryable error. No existing block status
fits the semantics of the current behavior.
> Again, assuming proper testing, commit 35038bffa87 wouldn't have made it
> upstream if NVMe still used blk_path_error().
Agree.
> Yes, your commit 764e9332098c0 needlessly removed NVMe's use of
> blk_path_error(). If you're saying it wasn't needless please explain
> why.
>
>> Fixes: 764e9332098c0 ("nvme-multipath: do not reset on unknown status")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
>> ---
>> drivers/nvme/host/core.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 6585d57112ad..072f629da4d8 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -290,8 +290,13 @@ void nvme_complete_rq(struct request *req)
>>  		nvme_req(req)->ctrl->comp_seen = true;
>>
>>  	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
>> -		if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
>> -			return;
>> +		if (blk_path_error(status)) {
>> +			if (req->cmd_flags & REQ_NVME_MPATH) {
>> +				if (nvme_failover_req(req))
>> +					return;
>> +				/* fallthru to normal error handling */
>> +			}
>> +		}
>>
>> This would basically undo the patch Hannes, Christoph, and I put together in
>> commit 764e9332098c0. This patch greatly simplified and improved the
>> whole nvme_complete_rq() code path, and I don't support undoing that change.
>
> Please elaborate on how I've undone anything?
I think you're right: this hasn't undone John's patch, but it does
break the NVME_SC_CMD_INTERRUPTED error handling behavior.
> The only thing I may have done is forced NVMe to take more care about
> properly translating its NVME_SC to BLK_STS in nvme_error_status().
> Which is a good thing.
I don't think the issue here is an nvme status code being mistakenly
converted to the wrong block status code. The conversion is the way it
is because no block status gives us the semantics we need, which are
apparently specific to nvme.
I personally don't mind restoring blk_path_error() to nvme; I don't
feel strongly about it either way. But blk_path_error() first needs to
provide the semantics required for NVME_SC_CMD_INTERRUPTED.
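To spell out the semantics in question, here is a rough userspace sketch
of the three-way decision an interrupted command needs. It is not the
driver's code: every MODEL_* name, the status kinds, and the
model_decide() helper are hypothetical, and the "do not retry" flag is
reduced to a plain bool.

```c
#include <stdbool.h>

/* Hypothetical status kinds; real NVMe status codes are far richer. */
enum model_nvme_status {
	MODEL_SC_SUCCESS,
	MODEL_SC_CMD_INTERRUPTED,  /* controller asks for a later retry */
	MODEL_SC_PATH_ERROR,       /* stand-in for any path-related status */
	MODEL_SC_MEDIA_ERROR,      /* stand-in for any other error */
};

enum model_disposition {
	MODEL_COMPLETE,         /* finish the request, reporting any error */
	MODEL_RETRY_SAME_PATH,  /* requeue here after the controller's delay */
	MODEL_FAILOVER,         /* hand the request to another path */
};

/*
 * dnr:       the controller said "do not retry"
 * multipath: the request was issued through nvme multipath
 */
static enum model_disposition
model_decide(enum model_nvme_status status, bool dnr, bool multipath)
{
	if (status == MODEL_SC_SUCCESS || dnr)
		return MODEL_COMPLETE;

	/* Interrupted commands stay on this path, even under multipath. */
	if (status == MODEL_SC_CMD_INTERRUPTED)
		return MODEL_RETRY_SAME_PATH;

	if (multipath && status == MODEL_SC_PATH_ERROR)
		return MODEL_FAILOVER;

	/* Everything else takes the ordinary retry on the same path. */
	return MODEL_RETRY_SAME_PATH;
}
```

A yes/no path-error predicate can separate MODEL_FAILOVER from
MODEL_COMPLETE, but it cannot express MODEL_RETRY_SAME_PATH. That third
outcome is what a restored blk_path_error() check would need to preserve.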
...
> Anyway, no new BLK_STS is needed at this point. More discipline with
> how NVMe's error handling is changed is.
Please read the above.
> If NVMe wants to ensure its
> interface isn't broken regularly it _should_ use blk_path_error() to
> validate future nvme_error_status() changes. Miscategorizing NVMe
> errors to upper layers is a bug -- not open for debate.
Again, I don't agree that this is a miscategorization or a bug; it's
just something that is NVMe specific.