linux-block.vger.kernel.org archive mirror
* block: spread MSI(-X) vectors to all possible CPUs
@ 2017-06-26 10:20 Christoph Hellwig
  2017-06-26 10:20 ` [PATCH 1/3] blk-mq: include all present CPUs in the default queue mapping Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Christoph Hellwig @ 2017-06-26 10:20 UTC
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

Hi all,

this series contains the left-over block bits to spread the MSI-X
vectors over all CPUs.  Thomas already rewrote and then merged the
irq bits into the tip irq/core branch, and this is the remainder.

As there are no dependencies on other block changes, adding them
to the tip tree might be easiest if Jens could ACK them.


* [PATCH 1/3] blk-mq: include all present CPUs in the default queue mapping
  2017-06-26 10:20 block: spread MSI(-X) vectors to all possible CPUs Christoph Hellwig
@ 2017-06-26 10:20 ` Christoph Hellwig
  2017-07-02 17:47   ` Sagi Grimberg
  2017-06-26 10:20 ` [PATCH 2/3] blk-mq: create hctx for each present CPU Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2017-06-26 10:20 UTC
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

This way we get a nice distribution independent of the current CPU
online / offline state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq-cpumap.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 8e61e8640e17..5eaecd40f701 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -35,7 +35,6 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)
 {
 	unsigned int *map = set->mq_map;
 	unsigned int nr_queues = set->nr_hw_queues;
-	const struct cpumask *online_mask = cpu_online_mask;
 	unsigned int i, nr_cpus, nr_uniq_cpus, queue, first_sibling;
 	cpumask_var_t cpus;
 
@@ -44,7 +43,7 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)
 
 	cpumask_clear(cpus);
 	nr_cpus = nr_uniq_cpus = 0;
-	for_each_cpu(i, online_mask) {
+	for_each_present_cpu(i) {
 		nr_cpus++;
 		first_sibling = get_first_sibling(i);
 		if (!cpumask_test_cpu(first_sibling, cpus))
@@ -54,7 +53,7 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)
 
 	queue = 0;
 	for_each_possible_cpu(i) {
-		if (!cpumask_test_cpu(i, online_mask)) {
+		if (!cpumask_test_cpu(i, cpu_present_mask)) {
 			map[i] = 0;
 			continue;
 		}
-- 
2.11.0
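
For readers following along, here is a rough userspace sketch of the idea
in this patch: the mapping is derived from the set of present CPUs, and any
CPU that is not present falls back to queue 0.  It is not the kernel code.
The present[] array stands in for cpu_present_mask, map_queues_sketch() is a
hypothetical name, and the sibling-thread grouping done by the real
blk_mq_map_queues() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of a default CPU -> hardware queue mapping.
 * present[cpu] is an assumption of this sketch (a stand-in for
 * cpu_present_mask); sibling-thread grouping is intentionally omitted.
 */
static void map_queues_sketch(const bool *present, size_t nr_possible,
                              unsigned int nr_queues, unsigned int *map)
{
    size_t nr_present = 0, seen = 0, cpu;

    assert(nr_queues > 0);

    /* First pass: count the present CPUs. */
    for (cpu = 0; cpu < nr_possible; cpu++)
        if (present[cpu])
            nr_present++;

    /* Second pass: assign a queue to every possible CPU. */
    for (cpu = 0; cpu < nr_possible; cpu++) {
        if (!present[cpu]) {
            map[cpu] = 0;   /* non-present CPUs fall back to queue 0 */
            continue;
        }
        /* Spread the present CPUs evenly over the queues. */
        map[cpu] = (unsigned int)(seen * nr_queues / nr_present);
        seen++;
    }
}
```

Because the online state never enters the calculation, taking a CPU offline
and back online leaves the map untouched in this model.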


* [PATCH 2/3] blk-mq: create hctx for each present CPU
  2017-06-26 10:20 block: spread MSI(-X) vectors to all possible CPUs Christoph Hellwig
  2017-06-26 10:20 ` [PATCH 1/3] blk-mq: include all present CPUs in the default queue mapping Christoph Hellwig
@ 2017-06-26 10:20 ` Christoph Hellwig
  2017-07-02 17:47   ` Sagi Grimberg
  2017-06-26 10:20 ` [PATCH 3/3] nvme: allocate queues for all possible CPUs Christoph Hellwig
  2017-06-27 18:24 ` block: spread MSI(-X) vectors to " Jens Axboe
  3 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2017-06-26 10:20 UTC
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

Currently we only create hctxs for online CPUs, which can lead to a lot
of churn due to frequent soft offline / online operations.  Instead,
allocate one for each present CPU to avoid this and dramatically simplify
the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c             | 120 +++++----------------------------------------
 block/blk-mq.h             |   5 --
 include/linux/cpuhotplug.h |   1 -
 3 files changed, 11 insertions(+), 115 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bb66c96850b1..dd390e27824d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,9 +37,6 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
-static DEFINE_MUTEX(all_q_mutex);
-static LIST_HEAD(all_q_list);
-
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync);
@@ -1975,8 +1972,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		INIT_LIST_HEAD(&__ctx->rq_list);
 		__ctx->queue = q;
 
-		/* If the cpu isn't online, the cpu is mapped to first hctx */
-		if (!cpu_online(i))
+		/* If the cpu isn't present, the cpu is mapped to first hctx */
+		if (!cpu_present(i))
 			continue;
 
 		hctx = blk_mq_map_queue(q, i);
@@ -2019,8 +2016,7 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 	}
 }
 
-static void blk_mq_map_swqueue(struct request_queue *q,
-			       const struct cpumask *online_mask)
+static void blk_mq_map_swqueue(struct request_queue *q)
 {
 	unsigned int i, hctx_idx;
 	struct blk_mq_hw_ctx *hctx;
@@ -2038,13 +2034,11 @@ static void blk_mq_map_swqueue(struct request_queue *q,
 	}
 
 	/*
-	 * Map software to hardware queues
+	 * Map software to hardware queues.
+	 *
+	 * If the cpu isn't present, the cpu is mapped to first hctx.
 	 */
-	for_each_possible_cpu(i) {
-		/* If the cpu isn't online, the cpu is mapped to first hctx */
-		if (!cpumask_test_cpu(i, online_mask))
-			continue;
-
+	for_each_present_cpu(i) {
 		hctx_idx = q->mq_map[i];
 		/* unmapped hw queue can be remapped after CPU topo changed */
 		if (!set->tags[hctx_idx] &&
@@ -2330,16 +2324,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		blk_queue_softirq_done(q, set->ops->complete);
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
-
-	get_online_cpus();
-	mutex_lock(&all_q_mutex);
-
-	list_add_tail(&q->all_q_node, &all_q_list);
 	blk_mq_add_queue_tag_set(set, q);
-	blk_mq_map_swqueue(q, cpu_online_mask);
-
-	mutex_unlock(&all_q_mutex);
-	put_online_cpus();
+	blk_mq_map_swqueue(q);
 
 	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
 		int ret;
@@ -2365,18 +2351,12 @@ void blk_mq_free_queue(struct request_queue *q)
 {
 	struct blk_mq_tag_set	*set = q->tag_set;
 
-	mutex_lock(&all_q_mutex);
-	list_del_init(&q->all_q_node);
-	mutex_unlock(&all_q_mutex);
-
 	blk_mq_del_queue_tag_set(q);
-
 	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 }
 
 /* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q,
-				const struct cpumask *online_mask)
+static void blk_mq_queue_reinit(struct request_queue *q)
 {
 	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
 
@@ -2389,76 +2369,12 @@ static void blk_mq_queue_reinit(struct request_queue *q,
 	 * involves free and re-allocate memory, worthy doing?)
 	 */
 
-	blk_mq_map_swqueue(q, online_mask);
+	blk_mq_map_swqueue(q);
 
 	blk_mq_sysfs_register(q);
 	blk_mq_debugfs_register_hctxs(q);
 }
 
-/*
- * New online cpumask which is going to be set in this hotplug event.
- * Declare this cpumasks as global as cpu-hotplug operation is invoked
- * one-by-one and dynamically allocating this could result in a failure.
- */
-static struct cpumask cpuhp_online_new;
-
-static void blk_mq_queue_reinit_work(void)
-{
-	struct request_queue *q;
-
-	mutex_lock(&all_q_mutex);
-	/*
-	 * We need to freeze and reinit all existing queues.  Freezing
-	 * involves synchronous wait for an RCU grace period and doing it
-	 * one by one may take a long time.  Start freezing all queues in
-	 * one swoop and then wait for the completions so that freezing can
-	 * take place in parallel.
-	 */
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		blk_freeze_queue_start(q);
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		blk_mq_freeze_queue_wait(q);
-
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		blk_mq_queue_reinit(q, &cpuhp_online_new);
-
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		blk_mq_unfreeze_queue(q);
-
-	mutex_unlock(&all_q_mutex);
-}
-
-static int blk_mq_queue_reinit_dead(unsigned int cpu)
-{
-	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
-	blk_mq_queue_reinit_work();
-	return 0;
-}
-
-/*
- * Before hotadded cpu starts handling requests, new mappings must be
- * established.  Otherwise, these requests in hw queue might never be
- * dispatched.
- *
- * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
- * for CPU0, and ctx1 for CPU1).
- *
- * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
- * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
- *
- * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
- * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
- * But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
- * ignored.
- */
-static int blk_mq_queue_reinit_prepare(unsigned int cpu)
-{
-	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
-	cpumask_set_cpu(cpu, &cpuhp_online_new);
-	blk_mq_queue_reinit_work();
-	return 0;
-}
-
 static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
@@ -2669,7 +2585,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	blk_mq_update_queue_map(set);
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_realloc_hw_ctxs(set, q);
-		blk_mq_queue_reinit(q, cpu_online_mask);
+		blk_mq_queue_reinit(q);
 	}
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
@@ -2885,24 +2801,10 @@ bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 }
 EXPORT_SYMBOL_GPL(blk_mq_poll);
 
-void blk_mq_disable_hotplug(void)
-{
-	mutex_lock(&all_q_mutex);
-}
-
-void blk_mq_enable_hotplug(void)
-{
-	mutex_unlock(&all_q_mutex);
-}
-
 static int __init blk_mq_init(void)
 {
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
-
-	cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
-				  blk_mq_queue_reinit_prepare,
-				  blk_mq_queue_reinit_dead);
 	return 0;
 }
 subsys_initcall(blk_mq_init);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index cc67b48e3551..558df56544d2 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -56,11 +56,6 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 				bool at_head);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
-/*
- * CPU hotplug helpers
- */
-void blk_mq_enable_hotplug(void);
-void blk_mq_disable_hotplug(void);
 
 /*
  * CPU -> queue mappings
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index c15f22c54535..7f815d915977 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -58,7 +58,6 @@ enum cpuhp_state {
 	CPUHP_XEN_EVTCHN_PREPARE,
 	CPUHP_ARM_SHMOBILE_SCU_PREPARE,
 	CPUHP_SH_SH3X_PREPARE,
-	CPUHP_BLK_MQ_PREPARE,
 	CPUHP_NET_FLOW_PREPARE,
 	CPUHP_TOPOLOGY_PREPARE,
 	CPUHP_NET_IUCV_PREPARE,
-- 
2.11.0
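
To illustrate the churn argument from the commit message, the following
userspace sketch counts how many CPUs change queue when one CPU goes
offline.  It is only a toy model under stated assumptions: map_by_mask()
and count_remapped() are hypothetical helpers, the bool arrays stand in for
the kernel cpumasks, and the even-spread formula is a simplification of the
real mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Map every CPU set in mask[] to a queue, spreading them evenly;
 * CPUs outside the mask get queue 0.  Hypothetical helper, not a
 * kernel function.
 */
static void map_by_mask(const bool *mask, size_t nr_cpus,
                        unsigned int nr_queues, unsigned int *map)
{
    size_t nr_set = 0, seen = 0, cpu;

    assert(nr_queues > 0);

    for (cpu = 0; cpu < nr_cpus; cpu++)
        if (mask[cpu])
            nr_set++;

    for (cpu = 0; cpu < nr_cpus; cpu++) {
        if (!mask[cpu]) {
            map[cpu] = 0;
            continue;
        }
        map[cpu] = (unsigned int)(seen * nr_queues / nr_set);
        seen++;
    }
}

/* Count the CPUs whose queue differs between two mappings. */
static size_t count_remapped(const unsigned int *a, const unsigned int *b,
                             size_t nr_cpus)
{
    size_t changed = 0, cpu;

    for (cpu = 0; cpu < nr_cpus; cpu++)
        if (a[cpu] != b[cpu])
            changed++;
    return changed;
}
```

Feeding the online mask before and after an offline event into map_by_mask()
generally yields different maps, whereas the present mask is the same on
both sides of a soft offline, so nothing has to be remapped.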


* [PATCH 3/3] nvme: allocate queues for all possible CPUs
  2017-06-26 10:20 block: spread MSI(-X) vectors to all possible CPUs Christoph Hellwig
  2017-06-26 10:20 ` [PATCH 1/3] blk-mq: include all present CPUs in the default queue mapping Christoph Hellwig
  2017-06-26 10:20 ` [PATCH 2/3] blk-mq: create hctx for each present CPU Christoph Hellwig
@ 2017-06-26 10:20 ` Christoph Hellwig
  2017-07-02 17:47   ` Sagi Grimberg
  2017-06-27 18:24 ` block: spread MSI(-X) vectors to " Jens Axboe
  3 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2017-06-26 10:20 UTC
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

Unlike most drivers that simply pass the maximum possible vectors to
pci_alloc_irq_vectors, NVMe needs to configure the device before allocating
the vectors, so it needs a manual update for the new scheme of using
all present CPUs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 951042a375d6..b3dcd7abc6d7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1525,7 +1525,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	int result, nr_io_queues, size;
 
-	nr_io_queues = num_online_cpus();
+	nr_io_queues = num_present_cpus();
 	result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
 	if (result < 0)
 		return result;
-- 
2.11.0
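
As a rough illustration of the ordering described above, this userspace
sketch requests one I/O queue per present CPU and then clamps the request
to what the device is willing to grant.  The names count_present() and
negotiate_io_queues(), and the device_max_queues parameter, are assumptions
made for this example; they only model the role that num_present_cpus() and
nvme_set_queue_count() play in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the CPUs marked present; stands in for num_present_cpus(). */
static unsigned int count_present(const bool *present, size_t nr_possible)
{
    unsigned int n = 0;
    size_t cpu;

    for (cpu = 0; cpu < nr_possible; cpu++)
        if (present[cpu])
            n++;
    return n;
}

/*
 * Hypothetical model of the queue-count negotiation: the driver asks
 * for `requested` queues and the device may grant fewer.
 * device_max_queues is an assumed parameter, not a real structure field.
 */
static unsigned int negotiate_io_queues(unsigned int requested,
                                        unsigned int device_max_queues)
{
    assert(device_max_queues > 0);
    return requested < device_max_queues ? requested : device_max_queues;
}
```

In this model the vector allocation would only happen after the granted
count is known, which is why the request has to be sized from the present
CPUs up front.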


* Re: block: spread MSI(-X) vectors to all possible CPUs
  2017-06-26 10:20 block: spread MSI(-X) vectors to all possible CPUs Christoph Hellwig
                   ` (2 preceding siblings ...)
  2017-06-26 10:20 ` [PATCH 3/3] nvme: allocate queues for all possible CPUs Christoph Hellwig
@ 2017-06-27 18:24 ` Jens Axboe
  3 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2017-06-27 18:24 UTC
  To: Christoph Hellwig, Thomas Gleixner
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

On 06/26/2017 04:20 AM, Christoph Hellwig wrote:
> Hi all,
> 
> this series contains the left-over block bits to spread the MSI-X
> vectors over all CPUs.  Thomas already rewrote and then merged the
> irq bits into the tip irq/core branch, and this is the remainder.
> 
> As there are no dependencies on other block changes adding them
> to the tip tree might be easiest if Jens could ACK them.

Looks fine to me, you can add my Reviewed-by if you want to funnel
them through the tip tree.

-- 
Jens Axboe


* Re: [PATCH 1/3] blk-mq: include all present CPUs in the default queue mapping
  2017-06-26 10:20 ` [PATCH 1/3] blk-mq: include all present CPUs in the default queue mapping Christoph Hellwig
@ 2017-07-02 17:47   ` Sagi Grimberg
  0 siblings, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2017-07-02 17:47 UTC
  To: Christoph Hellwig, Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-block, linux-kernel, linux-nvme

Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 2/3] blk-mq: create hctx for each present CPU
  2017-06-26 10:20 ` [PATCH 2/3] blk-mq: create hctx for each present CPU Christoph Hellwig
@ 2017-07-02 17:47   ` Sagi Grimberg
  0 siblings, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2017-07-02 17:47 UTC
  To: Christoph Hellwig, Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-block, linux-kernel, linux-nvme

Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 3/3] nvme: allocate queues for all possible CPUs
  2017-06-26 10:20 ` [PATCH 3/3] nvme: allocate queues for all possible CPUs Christoph Hellwig
@ 2017-07-02 17:47   ` Sagi Grimberg
  0 siblings, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2017-07-02 17:47 UTC
  To: Christoph Hellwig, Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-block, linux-kernel, linux-nvme

Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

