Linux-NVME Archive on lore.kernel.org
* [PATCH rfc 0/4] improve quiesce time for large amount of namespaces
@ 2020-07-24 21:54 Sagi Grimberg
  2020-07-24 21:54 ` [PATCH rfc 1/4] blk-mq: add async quiesce interface for blocking hw queues Sagi Grimberg
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Sagi Grimberg @ 2020-07-24 21:54 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Jens Axboe, Chao Leng

This series aims to improve quiesce time when a large number of
namespaces is configured, which in turn improves I/O failover time in a
multipath environment.

The improvement applies both to non-blocking hctxs (e.g. pci, fc, rdma)
and to blocking hctxs (e.g. tcp).

Chao's original patch targeted rdma, hence patch 4 is included just
for testing purposes, in case testing with nvme-tcp is an issue.

Chao Leng (1):
  nvme-core: reduce io failover time

Sagi Grimberg (3):
  blk-mq: add async quiesce interface for blocking hw queues
  nvme: improve quiesce for blocking queues
  nvme-rdma: use blocking quiesce interface

 block/blk-mq.c           | 31 +++++++++++++++++++++++++++++++
 drivers/nvme/host/core.c | 20 +++++++++++++++++++-
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/rdma.c |  5 +++--
 drivers/nvme/host/tcp.c  |  1 +
 include/linux/blk-mq.h   |  4 ++++
 6 files changed, 59 insertions(+), 3 deletions(-)

-- 
2.25.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* [PATCH rfc 1/4] blk-mq: add async quiesce interface for blocking hw queues
  2020-07-24 21:54 [PATCH rfc 0/4] improve quiesce time for large amount of namespaces Sagi Grimberg
@ 2020-07-24 21:54 ` Sagi Grimberg
  2020-07-24 21:54 ` [PATCH rfc 2/4] nvme: improve quiesce for blocking queues Sagi Grimberg
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2020-07-24 21:54 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Jens Axboe, Chao Leng

Drivers that use blocking hw queues may have to quiesce a large number
of request queues at once (e.g. on controller or adapter reset). These
drivers would benefit from an async quiesce interface such that they
can trigger the quiesce asynchronously and wait for all of them in
parallel.

This leaves the synchronization responsibility to the driver, but adds
a convenient interface to quiesce asynchronously and wait in a single
pass.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 block/blk-mq.c         | 31 +++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  4 ++++
 2 files changed, 35 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index abcf590f6238..7326709ed2d1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -209,6 +209,37 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
 
+void blk_mq_quiesce_blocking_queue_async(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+
+	blk_mq_quiesce_queue_nowait(q);
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+			continue;
+		init_completion(&hctx->rcu_sync.completion);
+		init_rcu_head(&hctx->rcu_sync.head);
+		call_srcu(hctx->srcu, &hctx->rcu_sync.head, wakeme_after_rcu);
+	}
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_blocking_queue_async);
+
+void blk_mq_quiesce_blocking_queue_async_wait(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+			continue;
+		wait_for_completion(&hctx->rcu_sync.completion);
+		destroy_rcu_head(&hctx->rcu_sync.head);
+	}
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_blocking_queue_async_wait);
+
 /**
  * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
  * @q: request queue.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 23230c1d031e..863b372d32aa 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -5,6 +5,7 @@
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
 #include <linux/srcu.h>
+#include <linux/rcupdate_wait.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -170,6 +171,7 @@ struct blk_mq_hw_ctx {
 	 */
 	struct list_head	hctx_list;
 
+	struct rcu_synchronize	rcu_sync;
 	/**
 	 * @srcu: Sleepable RCU. Use as lock when type of the hardware queue is
 	 * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also
@@ -532,6 +534,8 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap);
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
 
 void blk_mq_quiesce_queue_nowait(struct request_queue *q);
+void blk_mq_quiesce_blocking_queue_async(struct request_queue *q);
+void blk_mq_quiesce_blocking_queue_async_wait(struct request_queue *q);
 
 unsigned int blk_mq_rq_cpu(struct request *rq);
 
-- 
2.25.1



* [PATCH rfc 2/4] nvme: improve quiesce for blocking queues
  2020-07-24 21:54 [PATCH rfc 0/4] improve quiesce time for large amount of namespaces Sagi Grimberg
  2020-07-24 21:54 ` [PATCH rfc 1/4] blk-mq: add async quiesce interface for blocking hw queues Sagi Grimberg
@ 2020-07-24 21:54 ` Sagi Grimberg
  2020-07-24 23:01   ` Sagi Grimberg
  2020-07-24 21:54 ` [PATCH rfc 3/4] nvme-core: reduce io failover time Sagi Grimberg
  2020-07-24 21:54 ` [PATCH for-testing 4/4] nvme-rdma: use blocking quiesce interface Sagi Grimberg
  3 siblings, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2020-07-24 21:54 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Jens Axboe, Chao Leng

nvme transports that use blocking hw queues (e.g. nvme-tcp) currently
synchronize the queue quiesce for each namespace one at a time. This
slows down failover (which first quiesces all namespace queues) when a
large number of namespaces is configured. Instead, use an async
interface and quiesce the namespaces in parallel rather than serially.

Introduce nvme_stop_blocking_queues for transports that use blocking
hw queues, and convert nvme-tcp to use the new interface.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 13 +++++++++++++
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/tcp.c  |  1 +
 3 files changed, 15 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c16bfdff2953..f1a76bad226e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4529,6 +4529,19 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_start_freeze);
 
+void nvme_stop_blocking_queues(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	down_read(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_quiesce_blocking_queue_async(ns->queue);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_quiesce_blocking_queue_async_wait(ns->queue);
+	up_read(&ctrl->namespaces_rwsem);
+}
+EXPORT_SYMBOL_GPL(nvme_stop_blocking_queues);
+
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1609267a1f0e..f8c36176518e 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -568,6 +568,7 @@ int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
 void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 		volatile union nvme_result *res);
 
+void nvme_stop_blocking_queues(struct nvme_ctrl *ctrl);
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 7953362e7bb5..9534626379e7 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1887,6 +1887,7 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 {
 	if (ctrl->queue_count <= 1)
 		return;
+	nvme_start_freeze(ctrl);
 	nvme_stop_queues(ctrl);
 	nvme_tcp_stop_io_queues(ctrl);
 	if (ctrl->tagset) {
-- 
2.25.1



* [PATCH rfc 3/4] nvme-core: reduce io failover time
  2020-07-24 21:54 [PATCH rfc 0/4] improve quiesce time for large amount of namespaces Sagi Grimberg
  2020-07-24 21:54 ` [PATCH rfc 1/4] blk-mq: add async quiesce interface for blocking hw queues Sagi Grimberg
  2020-07-24 21:54 ` [PATCH rfc 2/4] nvme: improve quiesce for blocking queues Sagi Grimberg
@ 2020-07-24 21:54 ` Sagi Grimberg
  2020-07-24 21:54 ` [PATCH for-testing 4/4] nvme-rdma: use blocking quiesce interface Sagi Grimberg
  3 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2020-07-24 21:54 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Jens Axboe, Chao Leng

From: Chao Leng <lengchao@huawei.com>

We tested NVMe over RoCE failover with multipath and 1000 namespaces
configured, and saw I/O pause for more than 10 seconds. The reason:
nvme_stop_queues quiesces the queues of every namespace when an I/O
timeout causes a path error. Quiescing a queue waits for all ongoing
dispatches to finish via synchronize_rcu, which takes more than 10
milliseconds per wait, so with 1000 namespaces I/O pauses for more
than 10 seconds.

To reduce the I/O pause time, have nvme_stop_queues quiesce each queue
with blk_mq_quiesce_queue_nowait, and wait once for all ongoing
dispatches to complete after all queues have been quiesced.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f1a76bad226e..0796c448a3c3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4548,8 +4548,13 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_quiesce_queue(ns->queue);
+		blk_mq_quiesce_queue_nowait(ns->queue);
 	up_read(&ctrl->namespaces_rwsem);
+	/*
+	 * BLK_MQ_F_BLOCKING drivers should never call us
+	 */
+	WARN_ON_ONCE(ctrl->tagset->flags & BLK_MQ_F_BLOCKING);
+	synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
-- 
2.25.1



* [PATCH for-testing 4/4] nvme-rdma: use blocking quiesce interface
  2020-07-24 21:54 [PATCH rfc 0/4] improve quiesce time for large amount of namespaces Sagi Grimberg
                   ` (2 preceding siblings ...)
  2020-07-24 21:54 ` [PATCH rfc 3/4] nvme-core: reduce io failover time Sagi Grimberg
@ 2020-07-24 21:54 ` Sagi Grimberg
  3 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2020-07-24 21:54 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Jens Axboe, Chao Leng

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/rdma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5c3848974ccb..edd3cf1d5138 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -793,6 +793,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
 		set->reserved_tags = 2; /* connect + keep-alive */
 		set->numa_node = nctrl->numa_node;
+		set->flags = BLK_MQ_F_BLOCKING;
 		set->cmd_size = sizeof(struct nvme_rdma_request) +
 				NVME_RDMA_DATA_SGL_SIZE;
 		set->driver_data = ctrl;
@@ -806,7 +807,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->queue_depth = nctrl->sqsize + 1;
 		set->reserved_tags = 1; /* fabric connect */
 		set->numa_node = nctrl->numa_node;
-		set->flags = BLK_MQ_F_SHOULD_MERGE;
+		set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
 		set->cmd_size = sizeof(struct nvme_rdma_request) +
 				NVME_RDMA_DATA_SGL_SIZE;
 		if (nctrl->max_integrity_segments)
@@ -1008,7 +1009,7 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
 		bool remove)
 {
 	if (ctrl->ctrl.queue_count > 1) {
-		nvme_stop_queues(&ctrl->ctrl);
+		nvme_stop_blocking_queues(&ctrl->ctrl);
 		nvme_rdma_stop_io_queues(ctrl);
 		if (ctrl->ctrl.tagset) {
 			blk_mq_tagset_busy_iter(ctrl->ctrl.tagset,
-- 
2.25.1



* Re: [PATCH rfc 2/4] nvme: improve quiesce for blocking queues
  2020-07-24 21:54 ` [PATCH rfc 2/4] nvme: improve quiesce for blocking queues Sagi Grimberg
@ 2020-07-24 23:01   ` Sagi Grimberg
  0 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2020-07-24 23:01 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Jens Axboe, Chao Leng


> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 7953362e7bb5..9534626379e7 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1887,6 +1887,7 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
>   {
>   	if (ctrl->queue_count <= 1)
>   		return;
> +	nvme_start_freeze(ctrl);

Whoops... wrong patch... the nvme_start_freeze() hunk doesn't belong here.

