From: Sagi Grimberg <sagi@grimberg.me>
To: Daniel Wagner <dwagner@suse.de>
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
	James Smart <james.smart@broadcom.com>, Keith Busch <kbusch@kernel.org>,
	Ming Lei <ming.lei@redhat.com>, Hannes Reinecke <hare@suse.de>,
	Wen Xiong <wenxiong@us.ibm.com>
Subject: Re: [PATCH v4 2/8] nvme-tcp: Update number of hardware queues before using them
Date: Tue, 10 Aug 2021 18:00:37 -0700
Message-ID: <01d7878c-e396-1d6b-c383-8739ca0b3d11@grimberg.me>
In-Reply-To: <20210809085250.xguvx5qiv2gxcoqk@carbon>

On 8/9/21 1:52 AM, Daniel Wagner wrote:
> Hi Sagi,
>
> On Fri, Aug 06, 2021 at 12:57:17PM -0700, Sagi Grimberg wrote:
>>> -	ret = nvme_tcp_start_io_queues(ctrl);
>>> -	if (ret)
>>> -		goto out_cleanup_connect_q;
>>> -
>>> -	if (!new) {
>>> -		nvme_start_queues(ctrl);
>>> +	} else if (prior_q_cnt != ctrl->queue_count) {
>>
>> So if the queue count did not change, we don't wait to make sure the
>> queue q_usage_counter ref made it to zero? What guarantees that it
>> did?
>
> Hmm, good point. We should always call nvme_wait_freeze_timeout()
> for !new queues. Is this what you are implying?

I think we should always wait for the freeze to complete.

>>>  		if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
>>>  			/*
>>>  			 * If we timed out waiting for freeze we are likely to
>>> @@ -1828,6 +1822,10 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
>>>  		nvme_unfreeze(ctrl);
>>>  	}
>>> +	ret = nvme_tcp_start_io_queues(ctrl);
>>> +	if (ret)
>>> +		goto out_cleanup_connect_q;
>>> +
>>
>> Did you test this with heavy I/O combined with both a reset loop and
>> an ifdown/ifup loop?
>
> Not sure if this classifies as heavy I/O (on an 80 CPU machine):
>
>   fio --rw=readwrite --name=test --filename=/dev/nvme16n1 --size=50M \
>       --direct=1 --bs=4k --numjobs=40 --group_reporting --runtime=4h \
>       --time_based
>
> and then I installed iptables rules to block the traffic on the
> controller side. With this test it is pretty easy to get the host
> hanging. Let me know what tests you would like to see from me; I am
> glad to try to get them running.

Let's add iodepth=128.

>> If we unquiesce and unfreeze before we start the queues, the pending
>> I/Os may resume before the connect and not allow the connect to make
>> forward progress.
>
> So the unfreeze should happen after the connect call? What about the
> newly created queues? Do they not suffer from the same problem? Isn't
> the NVME_TCP_Q_LIVE flag enough?

Q_LIVE will prevent the transport itself from queueing; however, when
multipath is not used, the transport will return BLK_STS_RESOURCE, which
immediately triggers re-submission in an endless loop, and that can
prevent forward progress. This ordering is also consistent with what
nvme-pci does.
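For illustration, a minimal sketch of the reconnect ordering argued for
above: always wait for the freeze to complete, start (connect) the I/O
queues while I/O is still held off, and only then unquiesce and unfreeze.
The helper names follow the functions quoted in the hunks above, but the
function itself, its error handling, and the setup needed for a new
controller are simplified assumptions, not the actual patch.

/*
 * Sketch only: wait for every queue's q_usage_counter to drain, connect
 * the I/O queues before any pending I/O may resume, and only afterwards
 * unquiesce and unfreeze.  The real nvme_tcp_configure_io_queues()
 * contains considerably more setup and error handling.
 */
static int nvme_tcp_io_queue_reconnect_sketch(struct nvme_ctrl *ctrl, bool new)
{
	int ret;

	if (!new) {
		/* Always wait for the freeze to complete. */
		if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
			/*
			 * Timed out waiting for freeze; the controller is
			 * likely unresponsive, so give up on the reconnect.
			 */
			return -ENODEV;
		}
	}

	/* Connect/start the I/O queues while I/O is still frozen. */
	ret = nvme_tcp_start_io_queues(ctrl);
	if (ret)
		return ret;

	if (!new) {
		/* The queues are live; let pending I/O resume now. */
		nvme_start_queues(ctrl);
		nvme_unfreeze(ctrl);
	}

	return 0;
}

The unconditional wait in the !new branch reflects the "always wait for
the freeze to complete" point made above, independent of whether the
queue count changed.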