linux-nvme.lists.infradead.org archive mirror
From: Sagi Grimberg <sagi@grimberg.me>
To: James Smart <james.smart@broadcom.com>,
	Chao Leng <lengchao@huawei.com>,
	linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCH] nvme-fabrics: allow to queue requests for live queues
Date: Tue, 28 Jul 2020 16:39:26 -0700	[thread overview]
Message-ID: <8c96aa2a-89e3-b8e5-8506-f489aa9a69f1@grimberg.me> (raw)
In-Reply-To: <520f2317-533d-b546-dd35-80ae5f707aa7@broadcom.com>

>>>> Can you please give an example? NVME_REQ_USERCMD should not be any
>>>> different from any other type of I/O.
>>> ...
>>
>> The problem I have with this code is that checking for NVME_REQ_USERCMD
>> is a big hammer that lands on what can be normal I/O
>> (e.g. nvme read/write/write-zeroes), and we must not rely on this
>> indication to prevent what you are describing.
>>
>> It may be (just maybe) OK to check NVME_REQ_USERCMD only for the admin
>> queue, but we are really circling around the fact that we cannot reliably
>> guarantee the admin connect goes off first: we have to unquiesce the
>> admin queue before we issue the admin connect, and a stray command
>> might slip into execution before we submit the admin connect.
> 
> agree.
> 
> Didn't you suggest something separate to isolate the connect
> commands/init commands? Can we use a different tag set for
> "initialization" commands? We could use different queue_rq routines with
> different check-ready checks. I don't like the overhead, but it sure
> makes the logic more straightforward.

I think I've added this now that I'm looking back:
e7832cb48a65 ("nvme: make fabrics command run on a separate request queue")

But it still doesn't solve the user commands you are referring to.
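
For context, the idea of that commit, paraphrased (this is a sketch from
memory, not the exact upstream code, and the helper name is made up):
fabrics connect commands get their own request queue on top of the admin
tag set, so the real admin queue can stay quiesced across the reset.

/*
 * Rough sketch of e7832cb48a65 (helper name is hypothetical): allocate
 * a dedicated request queue on the admin tag set and use it only for
 * fabrics connect commands; admin_q itself stays quiesced until the
 * controller is ready.
 */
static int nvme_sketch_alloc_fabrics_q(struct nvme_ctrl *ctrl,
				       struct blk_mq_tag_set *admin_set)
{
	ctrl->fabrics_q = blk_mq_init_queue(admin_set);
	if (IS_ERR(ctrl->fabrics_q))
		return PTR_ERR(ctrl->fabrics_q);

	/*
	 * nvmf_connect_admin_queue() then submits the connect via
	 * __nvme_submit_sync_cmd(ctrl->fabrics_q, ...) instead of
	 * going through ctrl->admin_q.
	 */
	return 0;
}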

>>>>> As for the blk_rq_is_passthrough check - I guess I can see it being
>>>>> based on the queue state, and the check looks OK (we should never
>>>>> see !blk_rq_is_passthrough on the admin queue).
>>>>> But...
>>>>> - I don't know why it was that important to change it. On the
>>>>> connecting path, all you're doing is letting I/O start flowing
>>>>> before all the queues have been created. Did you really need to
>>>>> start that much sooner?
>>>>
>>>> The issue is that a controller in RESETTING state will have requests
>>>> that are being issued, and if we don't let them pass through, they
>>>> will hang around forever being requeued, preventing the queue freeze
>>>> from completing.
>>>
> 
> Isn't this the real problem: we require the request to complete in
> order to get off the request queue in order to freeze. But to complete,
> it has to finish with some status. This is OK if running multipath, as
> mpath queues and retries - but what if mpath is not running? Will the
> issuer of the I/O get the error?

We already fail no-retry errors when we cancel (or abort) anything that
was submitted before, so there is nothing new here. Non-mpath I/O does
not set FAILFAST, so it gets requeued; user commands do set it, so they
are failed. This is no different than before.
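
To spell out why, here is a paraphrased sketch of the retry decision the
core makes on completion (along the lines of nvme_req_needs_retry() in
the v5.8-era core; simplified, not the exact upstream code):

static bool sketch_req_needs_retry(struct request *req)
{
	/*
	 * User commands carry REQ_FAILFAST_*; regular non-mpath I/O
	 * does not, so it stays eligible for requeue.
	 */
	if (blk_noretry_request(req))
		return false;
	/* the controller asked us not to retry */
	if (nvme_req(req)->status & NVME_SC_DNR)
		return false;
	return true;
}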

> Should we always be putting even non-mp devices under mp, so that mp
> catches the errors rather than the block queue and can insert a short
> delay while the controller reconnects? At least normal I/O can be
> handled this way; we'd still have to figure out the ioctl path. But it
> should still allow the core I/O, such as scanning, to get the error and
> fail out, to be picked up after the reconnect.

Non-mpath I/O still gets requeued because it doesn't set any sort of
FAILFAST flag.
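
For completeness, the gating we keep circling around looks roughly like
this (a simplified, from-memory sketch along the lines of
__nvmf_check_ready(); not the posted patch):

static bool sketch_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
			       bool queue_live)
{
	struct nvme_request *req = nvme_req(rq);

	/*
	 * The "big hammer" under debate: during setup/teardown, reject
	 * anything that is not an internally generated passthrough
	 * command.
	 */
	if (!blk_rq_is_passthrough(rq) || (req->flags & NVME_REQ_USERCMD))
		return false;

	switch (ctrl->state) {
	case NVME_CTRL_CONNECTING:
		/* the connect itself must go out to bring the queue up */
		if (nvme_is_fabrics(req->cmd) &&
		    req->cmd->fabrics.fctype == nvme_fabrics_type_connect)
			return true;
		break;
	case NVME_CTRL_DEAD:
		return false;
	default:
		break;
	}

	return queue_live;
}

Returning false hands the request to nvmf_fail_nonready_command(), which
requeues or fails it based on the FAILFAST rules above.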


Thread overview: 14+ messages
2020-07-28  5:35 [PATCH] nvme-fabrics: allow to queue requests for live queues Sagi Grimberg
2020-07-28  6:44 ` Chao Leng
2020-07-28  6:49   ` Sagi Grimberg
2020-07-28 17:11     ` James Smart
2020-07-28 17:50       ` Sagi Grimberg
2020-07-28 20:11         ` James Smart
2020-07-28 20:38           ` Sagi Grimberg
2020-07-28 22:47             ` James Smart
2020-07-28 23:39               ` Sagi Grimberg [this message]
2020-07-28 10:50 ` Christoph Hellwig
2020-07-28 16:50   ` James Smart
2020-07-29  5:45 ` Christoph Hellwig
2020-07-29  5:53   ` Sagi Grimberg
2020-07-29  6:05     ` Christoph Hellwig
