All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sagi Grimberg <sagi@grimberg.me>
To: James Smart <james.smart@broadcom.com>,
	Chao Leng <lengchao@huawei.com>,
	linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCH] nvme-fabrics: allow to queue requests for live queues
Date: Tue, 28 Jul 2020 16:39:26 -0700	[thread overview]
Message-ID: <8c96aa2a-89e3-b8e5-8506-f489aa9a69f1@grimberg.me> (raw)
In-Reply-To: <520f2317-533d-b546-dd35-80ae5f707aa7@broadcom.com>

>>>> Can you please give an example? NVME_REQ_USERCMD should not be any
>>>> different from any other type of I/O.
>>> ...
>>
>> The problem I have with this code is that checking for NVME_REQ_USERCMD
>> is a big hammer that you land on what can be normal I/O
>> (e.g. nvme read/write/write-zeros), and we must not rely on this
>> indication to prevent what you are describing.
>>
>> It maybe (just maybe) OK to check NVME_REQ_USERCMD only for the admin
>> queue, but we are really circling over the fact that we cannot reliably
>> send admin connect before to go off first, because we have to unquiesce
>> it before we issue the admin connect, and there might be a stray command
>> going into execution before we submit the admin connect.
> 
> agree.
> 
> Didn't you suggest something separate to isolate the connect 
> commands/init commands ?   Can we use a different tag set for 
> "initialization" commands?  We can use different queue_rq routines with 
> different check ready checks.  I don't like the overhead, but sure makes 
> the logic more straightforward.

I think I've added this now that I'm looking back:
e7832cb48a65 ("nvme: make fabrics command run on a separate request queue")

But it still doesn't solve the user commands you are referring to.

>>>>> As for the blk_rq_is_passthrough check - I guess I can see it being 
>>>>> based on the queue state, and the check looks ok  (we should never 
>>>>> see !blk_rq_is_passthrough on the admin q).
>>>>> But...
>>>>> - I don't know why it was that important to change it. On the 
>>>>> connecting path, all you're doing is letting io start flowing 
>>>>> before all the queues have been created.  Did you really need to 
>>>>> start that much sooner ?
>>>>
>>>> The issue is that controller in RESETTING state will have requests that
>>>> are being issued, and if we don't let it pass through, it will hang
>>>> around forever being requeued preventing queue freeze to complete.
>>>
> 
> Isn't this the real problem:  we require the request to complete in 
> order to get off the request queue in order to freeze. But to complete - 
> it has to finish with some status. This is ok if running multipath, as 
> mpath queues and retries - but what mpath is not running ? The issuer of 
> the io will get the error ?

We already fail noretry errors when we cancel (or abort) anything that
was submitted before, so there is nothing new here. non-mpath
does not set FAILFAST so it gets requeued, user commands do, so they
are failed, this is not different than before.

> Should we always be putting even non-mp devices under mp, thus mp 
> catches the errors rather than the blk queue, and can have a short delay 
> while the controller reconnects ?  At least normal io can be handled 
> this way and we'd have to figure out the ioctl path.   But it should 
> still allow the core io, such as scanning, to get the error and fail 
> out, to be picked up after the reconnect.

non-mpath still gets requeued because it doesn't set any sort of
FAILFAST flag.
or

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

  reply	other threads:[~2020-07-28 23:39 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28  5:35 [PATCH] nvme-fabrics: allow to queue requests for live queues Sagi Grimberg
2020-07-28  6:44 ` Chao Leng
2020-07-28  6:49   ` Sagi Grimberg
2020-07-28 17:11     ` James Smart
2020-07-28 17:50       ` Sagi Grimberg
2020-07-28 20:11         ` James Smart
2020-07-28 20:38           ` Sagi Grimberg
2020-07-28 22:47             ` James Smart
2020-07-28 23:39               ` Sagi Grimberg [this message]
2020-07-28 10:50 ` Christoph Hellwig
2020-07-28 16:50   ` James Smart
2020-07-29  5:45 ` Christoph Hellwig
2020-07-29  5:53   ` Sagi Grimberg
2020-07-29  6:05     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8c96aa2a-89e3-b8e5-8506-f489aa9a69f1@grimberg.me \
    --to=sagi@grimberg.me \
    --cc=hch@lst.de \
    --cc=james.smart@broadcom.com \
    --cc=kbusch@kernel.org \
    --cc=lengchao@huawei.com \
    --cc=linux-nvme@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.