From: Sagi Grimberg <sagi@grimberg.me>
To: Daniel Wagner <dwagner@suse.de>
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
James Smart <james.smart@broadcom.com>,
Keith Busch <kbusch@kernel.org>, Ming Lei <ming.lei@redhat.com>,
Hannes Reinecke <hare@suse.de>, Wen Xiong <wenxiong@us.ibm.com>
Subject: Re: [PATCH v4 2/8] nvme-tcp: Update number of hardware queues before using them
Date: Tue, 10 Aug 2021 18:00:37 -0700
Message-ID: <01d7878c-e396-1d6b-c383-8739ca0b3d11@grimberg.me>
In-Reply-To: <20210809085250.xguvx5qiv2gxcoqk@carbon>

On 8/9/21 1:52 AM, Daniel Wagner wrote:
> Hi Sagi,
>
> On Fri, Aug 06, 2021 at 12:57:17PM -0700, Sagi Grimberg wrote:
>>> -	ret = nvme_tcp_start_io_queues(ctrl);
>>> -	if (ret)
>>> -		goto out_cleanup_connect_q;
>>> -
>>> -	if (!new) {
>>> -		nvme_start_queues(ctrl);
>>> +	} else if (prior_q_cnt != ctrl->queue_count) {
>>
>> So if the queue count did not change we don't wait to make sure
>> the queue q_usage_counter ref made it to zero? What guarantees that it
>> did?
>
> Hmm, good point. We should always call nvme_wait_freeze_timeout()
> for !new queues. Is this what you are implying?
I think we should always wait for the freeze to complete.
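
To make the difference concrete, here is a minimal userspace sketch of
the two conditions, not kernel code. The toy_* names are hypothetical,
and it assumes the "else if" in the quoted hunk is only reached for a
!new controller:

```c
#include <stdbool.h>

/*
 * Condition as it appears in the quoted hunk (assuming the branch is
 * only reached when !new): wait only if the queue count changed.
 */
bool toy_wait_as_posted(bool new_ctrl, int prior_q_cnt, int queue_count)
{
	return !new_ctrl && prior_q_cnt != queue_count;
}

/* Condition as suggested here: always wait on an existing controller. */
bool toy_wait_as_suggested(bool new_ctrl)
{
	return !new_ctrl;
}
```

The two only disagree for an existing controller whose queue count is
unchanged, which is exactly the case where nothing else guarantees that
the freeze has completed.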
>
>
>>>  		if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
>>>  			/*
>>>  			 * If we timed out waiting for freeze we are likely to
>>> @@ -1828,6 +1822,10 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
>>>  		nvme_unfreeze(ctrl);
>>>  	}
>>> +	ret = nvme_tcp_start_io_queues(ctrl);
>>> +	if (ret)
>>> +		goto out_cleanup_connect_q;
>>> +
>>
>> Did you test this with both heavy I/O, reset loop and ifdown/ifup
>> loop?
>
> Not sure if this classifies as heavy I/O (on an 80-CPU machine)
>
> fio --rw=readwrite --name=test --filename=/dev/nvme16n1 --size=50M \
> --direct=1 --bs=4k --numjobs=40 --group_reporting --runtime=4h \
> --time_based
>
> and then I installed iptables rules to block the traffic on the
> controller side. With this test it is pretty easy to get
> the host to hang. Let me know what tests you would like to see
> from me. I am glad to try to get them running.
Let's add iodepth=128.
>> If we unquiesce and unfreeze before we start the queues the pending I/Os
>> may resume before the connect and not allow the connect to make forward
>> progress.
>
> So the unfreeze should happen after the connect call? What about the
> newly created queues? Do they not suffer from the same problem? Isn't
> the NVME_TCP_Q_LIVE flag not enough?
Q_LIVE keeps the transport itself from queueing. However, when
multipath is not used, the transport returns BLK_STS_RESOURCE, which
immediately triggers re-submission in an endless loop, and that can
prevent forward progress. It is also consistent with what nvme-pci does.
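
The re-submission loop can be illustrated with a deliberately
simplified userspace model. This is a sketch, not the driver code: the
toy_* names and status values are hypothetical stand-ins, and the
multipath branch assumes such requests are failed back to the caller
instead of being retried on the same queue.

```c
#include <stdbool.h>

/* Toy stand-ins for block-layer status codes; not the kernel's definitions. */
enum toy_status {
	TOY_STS_OK,
	TOY_STS_RESOURCE,	/* "busy, resubmit" */
	TOY_STS_PATH_ERROR,	/* assumed: failed back to the caller */
};

/* Hypothetical model of a single dispatch attempt against one queue. */
enum toy_status toy_queue_rq(bool queue_live, bool multipath)
{
	if (queue_live)
		return TOY_STS_OK;
	return multipath ? TOY_STS_PATH_ERROR : TOY_STS_RESOURCE;
}

/*
 * Count the dispatch attempts one request takes before it stops being
 * resubmitted, giving up after max_attempts so the model terminates.
 */
unsigned int toy_dispatch_attempts(bool queue_live, bool multipath,
				   unsigned int max_attempts)
{
	unsigned int attempts = 0;

	while (attempts < max_attempts) {
		attempts++;
		if (toy_queue_rq(queue_live, multipath) != TOY_STS_RESOURCE)
			break;
	}
	return attempts;
}
```

In this model a request on a live queue, or a multipath request on a
non-live queue, takes a single attempt, while a non-multipath request on
a non-live queue spins until the cap is hit. The cap exists only to keep
the model finite.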
Thread overview: 30+ messages
2021-08-02 11:26 [PATCH RESEND v4 0/8] Handle update hardware queues and queue freeze more carefully Daniel Wagner
2021-08-02 11:26 ` [PATCH v4 1/8] nvme-fc: Update hardware queues before using them Daniel Wagner
2021-08-02 19:34 ` Himanshu Madhani
2021-08-02 11:26 ` [PATCH v4 2/8] nvme-tcp: Update number of " Daniel Wagner
2021-08-06 19:57 ` Sagi Grimberg
2021-08-09 8:52 ` Daniel Wagner
2021-08-11 1:00 ` Sagi Grimberg [this message]
2021-08-11 1:07 ` Keith Busch
2021-08-11 5:57 ` Sagi Grimberg
2021-08-11 10:25 ` Daniel Wagner
2021-08-02 11:26 ` [PATCH v4 3/8] nvme-rdma: " Daniel Wagner
2021-08-02 11:26 ` [PATCH v4 4/8] nvme-fc: Wait with a timeout for queue to freeze Daniel Wagner
2021-08-02 19:36 ` Himanshu Madhani
2021-08-02 11:26 ` [PATCH v4 5/8] nvme-fc: avoid race between time out and tear down Daniel Wagner
2021-08-02 19:38 ` Himanshu Madhani
2021-08-02 11:26 ` [PATCH v4 6/8] nvme-fc: fix controller reset hang during traffic Daniel Wagner
2021-08-02 19:39 ` Himanshu Madhani
2021-08-04 7:23 ` Hannes Reinecke
2021-08-04 8:08 ` Daniel Wagner
2021-08-11 1:05 ` Sagi Grimberg
2021-08-11 10:30 ` Daniel Wagner
2021-08-12 20:03 ` James Smart
2021-08-18 11:43 ` Daniel Wagner
2021-08-18 11:49 ` Daniel Wagner
2021-08-02 11:26 ` [PATCH v4 7/8] nvme-tcp: Unfreeze queues on reconnect Daniel Wagner
2021-08-02 11:26 ` [PATCH v4 8/8] nvme-rdma: " Daniel Wagner
2021-08-04 7:25 ` Hannes Reinecke
2021-08-06 19:59 ` Sagi Grimberg
2021-08-09 8:58 ` Daniel Wagner
-- strict thread matches above, loose matches on Subject: below --
2021-08-02 9:14 [PATCH v4 0/8] Handle update hardware queues and queue freeze more carefully Daniel Wagner
2021-08-02 9:14 ` [PATCH v4 2/8] nvme-tcp: Update number of hardware queues before using them Daniel Wagner