* spread MSI(-X) vectors to all possible CPUs
@ 2017-02-03 14:35 Christoph Hellwig
  2017-02-03 14:35 ` [PATCH 1/6] genirq: allow assigning affinity to present but not online CPUs Christoph Hellwig
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-03 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

Hi all,

this series changes our automatic MSI-X vector assignment so that it
takes all present CPUs into account instead of all online ones.  This
allows us to deal better with CPU hotplug events, which can happen
frequently, for example due to power management.
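
For reference, the online/present distinction the series builds on can also
be seen from userspace; a minimal sketch (not part of the series;
_SC_NPROCESSORS_CONF reports the configured processors, which only
approximates the kernel's present mask):

	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		/* CPUs currently online vs. CPUs configured in the system */
		long online  = sysconf(_SC_NPROCESSORS_ONLN);
		long present = sysconf(_SC_NPROCESSORS_CONF);

		printf("online CPUs:             %ld\n", online);
		printf("present CPUs (approx.):  %ld\n", present);
		return 0;
	}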


* [PATCH 1/6] genirq: allow assigning affinity to present but not online CPUs
  2017-02-03 14:35 spread MSI(-X) vectors to all possible CPUs Christoph Hellwig
@ 2017-02-03 14:35 ` Christoph Hellwig
  2017-02-03 14:35 ` [PATCH 2/6] genirq/affinity: assign vectors to all present CPUs Christoph Hellwig
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-03 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

This will allow us to spread MSI/MSI-X affinity over all present CPUs and
thus deal better with systems where CPUs are taken online and offline all
the time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/irq/manage.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 6b669593e7eb..7a1424330f9c 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -354,22 +354,22 @@ static int setup_affinity(struct irq_desc *desc, struct cpumask *mask)
 
 	/*
 	 * Preserve the managed affinity setting and an userspace affinity
-	 * setup, but make sure that one of the targets is online.
+	 * setup, but make sure that one of the targets is present.
 	 */
 	if (irqd_affinity_is_managed(&desc->irq_data) ||
 	    irqd_has_set(&desc->irq_data, IRQD_AFFINITY_SET)) {
 		if (cpumask_intersects(desc->irq_common_data.affinity,
-				       cpu_online_mask))
+				       cpu_present_mask))
 			set = desc->irq_common_data.affinity;
 		else
 			irqd_clear(&desc->irq_data, IRQD_AFFINITY_SET);
 	}
 
-	cpumask_and(mask, cpu_online_mask, set);
+	cpumask_and(mask, cpu_present_mask, set);
 	if (node != NUMA_NO_NODE) {
 		const struct cpumask *nodemask = cpumask_of_node(node);
 
-		/* make sure at least one of the cpus in nodemask is online */
+		/* make sure at least one of the cpus in nodemask is present */
 		if (cpumask_intersects(mask, nodemask))
 			cpumask_and(mask, mask, nodemask);
 	}
-- 
2.11.0


* [PATCH 2/6] genirq/affinity: assign vectors to all present CPUs
  2017-02-03 14:35 spread MSI(-X) vectors to all possible CPUs Christoph Hellwig
  2017-02-03 14:35 ` [PATCH 1/6] genirq: allow assigning affinity to present but not online CPUs Christoph Hellwig
@ 2017-02-03 14:35 ` Christoph Hellwig
  2017-02-03 14:35 ` [PATCH 3/6] genirq/affinity: update CPU affinity for CPU hotplug events Christoph Hellwig
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-03 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

Currently we only assign the spread vectors to online CPUs, which ties the
IRQ mapping to the CPUs online at setup time and doesn't deal nicely with
the fact that CPUs can come and go rapidly due to e.g. power management.

Instead assign vectors to all present CPUs to avoid this churn.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/irq/affinity.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 4544b115f5eb..6cd20a569359 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -4,6 +4,8 @@
 #include <linux/slab.h>
 #include <linux/cpu.h>
 
+static cpumask_var_t node_to_present_cpumask[MAX_NUMNODES] __read_mostly;
+
 static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 				int cpus_per_vec)
 {
@@ -40,8 +42,8 @@ static int get_nodes_in_cpumask(const struct cpumask *mask, nodemask_t *nodemsk)
 	int n, nodes = 0;
 
 	/* Calculate the number of nodes in the supplied affinity mask */
-	for_each_online_node(n) {
-		if (cpumask_intersects(mask, cpumask_of_node(n))) {
+	for_each_node(n) {
+		if (cpumask_intersects(mask, node_to_present_cpumask[n])) {
 			node_set(n, *nodemsk);
 			nodes++;
 		}
@@ -77,9 +79,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
 		cpumask_copy(masks + curvec, irq_default_affinity);
 
-	/* Stabilize the cpumasks */
-	get_online_cpus();
-	nodes = get_nodes_in_cpumask(cpu_online_mask, &nodemsk);
+	nodes = get_nodes_in_cpumask(cpu_present_mask, &nodemsk);
 
 	/*
 	 * If the number of nodes in the mask is greater than or equal the
@@ -87,7 +87,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	 */
 	if (affv <= nodes) {
 		for_each_node_mask(n, nodemsk) {
-			cpumask_copy(masks + curvec, cpumask_of_node(n));
+			cpumask_copy(masks + curvec,
+				     node_to_present_cpumask[n]);
 			if (++curvec == last_affv)
 				break;
 		}
@@ -103,7 +104,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 		int ncpus, v, vecs_to_assign = vecs_per_node;
 
 		/* Get the cpus on this node which are in the mask */
-		cpumask_and(nmsk, cpu_online_mask, cpumask_of_node(n));
+		cpumask_and(nmsk, cpu_present_mask, node_to_present_cpumask[n]);
 
 		/* Calculate the number of cpus per vector */
 		ncpus = cpumask_weight(nmsk);
@@ -126,8 +127,6 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	}
 
 done:
-	put_online_cpus();
-
 	/* Fill out vectors at the end that don't need affinity */
 	for (; curvec < nvecs; curvec++)
 		cpumask_copy(masks + curvec, irq_default_affinity);
@@ -145,12 +144,26 @@ int irq_calc_affinity_vectors(int maxvec, const struct irq_affinity *affd)
 {
 	int resv = affd->pre_vectors + affd->post_vectors;
 	int vecs = maxvec - resv;
-	int cpus;
 
-	/* Stabilize the cpumasks */
-	get_online_cpus();
-	cpus = cpumask_weight(cpu_online_mask);
-	put_online_cpus();
+	return min_t(int, cpumask_weight(cpu_present_mask), vecs) + resv;
+}
+
+static int __init irq_build_cpumap(void)
+{
+	int node, cpu;
+
+	for (node = 0; node < nr_node_ids; node++) {
+		if (!zalloc_cpumask_var(&node_to_present_cpumask[node],
+				GFP_KERNEL))
+			panic("can't allocate early memory\n");
+	}
 
-	return min(cpus, vecs) + resv;
+	for_each_present_cpu(cpu) {
+		node = cpu_to_node(cpu);
+		cpumask_set_cpu(cpu, node_to_present_cpumask[node]);
+	}
+
+	return 0;
 }
+
+subsys_initcall(irq_build_cpumap);
-- 
2.11.0
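
As an aside, a simplified userspace sketch of the per-node spreading
arithmetic that irq_create_affinity_masks() performs above (assumptions: no
reserved pre/post vectors, and the per-node split below approximates rather
than copies the kernel's calculation):

	#include <stdio.h>

	/* Spread nvecs vectors over the present CPUs of each node. */
	static void spread(int nvecs, int nodes, const int *cpus_per_node)
	{
		int curvec = 0;

		for (int n = 0; n < nodes && curvec < nvecs; n++) {
			/* vectors left, split over the remaining nodes */
			int vecs_to_assign = (nvecs - curvec) / (nodes - n);
			int ncpus = cpus_per_node[n];

			for (int v = 0; v < vecs_to_assign && curvec < nvecs; v++) {
				/* ceil(remaining CPUs / remaining vectors) share one vector */
				int cpus_per_vec = (ncpus + vecs_to_assign - v - 1) /
						   (vecs_to_assign - v);

				printf("vector %d <- %d CPU(s) on node %d\n",
				       curvec++, cpus_per_vec, n);
				ncpus -= cpus_per_vec;
			}
		}
	}

	int main(void)
	{
		int cpus_per_node[] = { 6, 6 };	/* hypothetical 2-node, 12-CPU box */

		spread(8, 2, cpus_per_node);
		return 0;
	}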


* [PATCH 3/6] genirq/affinity: update CPU affinity for CPU hotplug events
  2017-02-03 14:35 spread MSI(-X) vectors to all possible CPUs Christoph Hellwig
  2017-02-03 14:35 ` [PATCH 1/6] genirq: allow assigning affinity to present but not online CPUs Christoph Hellwig
  2017-02-03 14:35 ` [PATCH 2/6] genirq/affinity: assign vectors to all present CPUs Christoph Hellwig
@ 2017-02-03 14:35 ` Christoph Hellwig
  2017-02-03 16:17   ` kbuild test robot
                     ` (2 more replies)
  2017-02-03 14:35 ` [PATCH 4/6] blk-mq: include all present CPUs in the default queue mapping Christoph Hellwig
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-03 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

Remove a CPU from the affinity mask when it goes offline and add it
back when it returns.  In case the vector was assigned only to the CPU
going offline, it will be shut down and re-started when the CPU
reappears.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/kernel/irq.c      |   3 +-
 include/linux/cpuhotplug.h |   1 +
 include/linux/irq.h        |   9 +++
 kernel/cpu.c               |   6 ++
 kernel/irq/affinity.c      | 157 ++++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 7c6e9ffe4424..285ef40ae290 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -449,7 +449,8 @@ void fixup_irqs(void)
 
 		data = irq_desc_get_irq_data(desc);
 		affinity = irq_data_get_affinity_mask(data);
-		if (!irq_has_action(irq) || irqd_is_per_cpu(data) ||
+		if (irqd_affinity_is_managed(data) ||
+		    !irq_has_action(irq) || irqd_is_per_cpu(data) ||
 		    cpumask_subset(affinity, cpu_online_mask)) {
 			raw_spin_unlock(&desc->lock);
 			continue;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index d936a0021839..63406ae5b2df 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -127,6 +127,7 @@ enum cpuhp_state {
 	CPUHP_AP_ONLINE_IDLE,
 	CPUHP_AP_SMPBOOT_THREADS,
 	CPUHP_AP_X86_VDSO_VMA_ONLINE,
+	CPUHP_AP_IRQ_AFFINIY_ONLINE,
 	CPUHP_AP_PERF_ONLINE,
 	CPUHP_AP_PERF_X86_ONLINE,
 	CPUHP_AP_PERF_X86_UNCORE_ONLINE,
diff --git a/include/linux/irq.h b/include/linux/irq.h
index e79875574b39..4b2a542b2591 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -214,6 +214,7 @@ enum {
 	IRQD_WAKEUP_ARMED		= (1 << 19),
 	IRQD_FORWARDED_TO_VCPU		= (1 << 20),
 	IRQD_AFFINITY_MANAGED		= (1 << 21),
+	IRQD_AFFINITY_SUSPENDED		= (1 << 22),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -312,6 +313,11 @@ static inline bool irqd_affinity_is_managed(struct irq_data *d)
 	return __irqd_to_state(d) & IRQD_AFFINITY_MANAGED;
 }
 
+static inline bool irqd_affinity_is_suspended(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_AFFINITY_SUSPENDED;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
@@ -989,4 +995,7 @@ int __ipi_send_mask(struct irq_desc *desc, const struct cpumask *dest);
 int ipi_send_single(unsigned int virq, unsigned int cpu);
 int ipi_send_mask(unsigned int virq, const struct cpumask *dest);
 
+int irq_affinity_online_cpu(unsigned int cpu);
+int irq_affinity_offline_cpu(unsigned int cpu);
+
 #endif /* _LINUX_IRQ_H */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 0a5f630f5c54..fe19af6a896b 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -25,6 +25,7 @@
 #include <linux/smpboot.h>
 #include <linux/relay.h>
 #include <linux/slab.h>
+#include <linux/irq.h>
 
 #include <trace/events/power.h>
 #define CREATE_TRACE_POINTS
@@ -1248,6 +1249,11 @@ static struct cpuhp_step cpuhp_ap_states[] = {
 		.startup.single		= smpboot_unpark_threads,
 		.teardown.single	= NULL,
 	},
+	[CPUHP_AP_IRQ_AFFINIY_ONLINE] = {
+		.name			= "irq/affinity:online",
+		.startup.single		= irq_affinity_online_cpu,
+		.teardown.single	= irq_affinity_offline_cpu,
+	},
 	[CPUHP_AP_PERF_ONLINE] = {
 		.name			= "perf:online",
 		.startup.single		= perf_event_init_cpu,
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 6cd20a569359..74006167892d 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -1,8 +1,12 @@
-
+/*
+ * Copyright (C) 2016 Thomas Gleixner.
+ * Copyright (C) 2016-2017 Christoph Hellwig.
+ */
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/cpu.h>
+#include "internals.h"
 
 static cpumask_var_t node_to_present_cpumask[MAX_NUMNODES] __read_mostly;
 
@@ -148,6 +152,157 @@ int irq_calc_affinity_vectors(int maxvec, const struct irq_affinity *affd)
 	return min_t(int, cpumask_weight(cpu_present_mask), vecs) + resv;
 }
 
+static void __irq_affinity_set(unsigned int irq, struct irq_desc *desc,
+		cpumask_t *mask)
+{
+	struct irq_data *data = irq_desc_get_irq_data(desc);
+	struct irq_chip *chip = irq_data_get_irq_chip(data);
+	int ret;
+
+	if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
+		chip->irq_mask(data);
+	ret = chip->irq_set_affinity(data, mask, true);
+	WARN_ON_ONCE(ret);
+
+	/*
+	 * We unmask if the irq was not marked masked by the core code.
+	 * That respects the lazy irq disable behaviour.
+	 */
+	if (!irqd_can_move_in_process_context(data) &&
+	    !irqd_irq_masked(data) && chip->irq_unmask)
+		chip->irq_unmask(data);
+}
+
+static void irq_affinity_online_irq(unsigned int irq, struct irq_desc *desc,
+		unsigned int cpu)
+{
+	const struct cpumask *affinity;
+	struct irq_data *data;
+	struct irq_chip *chip;
+	unsigned long flags;
+	cpumask_var_t mask;
+
+	if (!desc)
+		return;
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+
+	data = irq_desc_get_irq_data(desc);
+	affinity = irq_data_get_affinity_mask(data);
+	if (!irqd_affinity_is_managed(data) ||
+	    !irq_has_action(irq) ||
+	    !cpumask_test_cpu(cpu, affinity))
+		goto out_unlock;
+
+	/*
+	 * The interrupt descriptor might have been cleaned up
+	 * already, but it is not yet removed from the radix tree
+	 */
+	chip = irq_data_get_irq_chip(data);
+	if (!chip)
+		goto out_unlock;
+
+	if (WARN_ON_ONCE(!chip->irq_set_affinity))
+		goto out_unlock;
+
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+		pr_err("failed to allocate memory for cpumask\n");
+		goto out_unlock;
+	}
+
+	cpumask_and(mask, affinity, cpu_online_mask);
+	cpumask_set_cpu(cpu, mask);
+	if (irqd_has_set(data, IRQD_AFFINITY_SUSPENDED)) {
+		irq_startup(desc, false);
+		irqd_clear(data, IRQD_AFFINITY_SUSPENDED);
+	} else {
+		__irq_affinity_set(irq, desc, mask);
+	}
+
+	free_cpumask_var(mask);
+out_unlock:
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+}
+
+int irq_affinity_online_cpu(unsigned int cpu)
+{
+	struct irq_desc *desc;
+	unsigned int irq;
+
+	for_each_irq_desc(irq, desc)
+		irq_affinity_online_irq(irq, desc, cpu);
+	return 0;
+}
+
+static void irq_affinity_offline_irq(unsigned int irq, struct irq_desc *desc,
+		unsigned int cpu)
+{
+	const struct cpumask *affinity;
+	struct irq_data *data;
+	struct irq_chip *chip;
+	unsigned long flags;
+	cpumask_var_t mask;
+
+	if (!desc)
+		return;
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+
+	data = irq_desc_get_irq_data(desc);
+	affinity = irq_data_get_affinity_mask(data);
+	if (!irqd_affinity_is_managed(data) ||
+	    !irq_has_action(irq) ||
+	    irqd_has_set(data, IRQD_AFFINITY_SUSPENDED) ||
+	    !cpumask_test_cpu(cpu, affinity))
+		goto out_unlock;
+
+	/*
+	 * Complete the irq move. This cpu is going down and for
+	 * non intr-remapping case, we can't wait till this interrupt
+	 * arrives at this cpu before completing the irq move.
+	 */
+	irq_force_complete_move(desc);
+
+	/*
+	 * The interrupt descriptor might have been cleaned up
+	 * already, but it is not yet removed from the radix tree
+	 */
+	chip = irq_data_get_irq_chip(data);
+	if (!chip)
+		goto out_unlock;
+
+	if (WARN_ON_ONCE(!chip->irq_set_affinity))
+		goto out_unlock;
+
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+		pr_err("failed to allocate memory for cpumask\n");
+		goto out_unlock;
+	}
+
+	cpumask_copy(mask, affinity);
+	cpumask_clear_cpu(cpu, mask);
+	if (cpumask_empty(mask)) {
+		irqd_set(data, IRQD_AFFINITY_SUSPENDED);
+		irq_shutdown(desc);
+	} else {
+		__irq_affinity_set(irq, desc, mask);
+	}
+
+	free_cpumask_var(mask);
+out_unlock:
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+}
+
+int irq_affinity_offline_cpu(unsigned int cpu)
+{
+	struct irq_desc *desc;
+	unsigned int irq;
+
+	for_each_irq_desc(irq, desc)
+		irq_affinity_offline_irq(irq, desc, cpu);
+	return 0;
+}
+
 static int __init irq_build_cpumap(void)
 {
 	int node, cpu;
-- 
2.11.0
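
A minimal userspace sketch of the bookkeeping this patch adds (assumption: a
vector's affinity is modelled as a plain bitmask; the real code operates on
struct irq_desc under desc->lock and via the irq_chip callbacks):

	#include <stdbool.h>
	#include <stdio.h>

	struct vec {
		unsigned long affinity;		/* one bit per CPU */
		bool suspended;
	};

	/* CPU goes away: drop it; shut the vector down if no target is left. */
	static void cpu_offline(struct vec *v, int cpu)
	{
		v->affinity &= ~(1UL << cpu);
		if (!v->affinity && !v->suspended) {
			v->suspended = true;	/* irq_shutdown() */
			printf("CPU %d gone: vector shut down\n", cpu);
		}
	}

	/* CPU returns: add it back; restart the vector if it was shut down. */
	static void cpu_online(struct vec *v, int cpu)
	{
		v->affinity |= 1UL << cpu;
		if (v->suspended) {
			v->suspended = false;	/* irq_startup() */
			printf("CPU %d back: vector restarted\n", cpu);
		}
	}

	int main(void)
	{
		struct vec v = { .affinity = 1UL << 3 };	/* bound to CPU 3 only */

		cpu_offline(&v, 3);	/* last target gone -> shutdown */
		cpu_online(&v, 3);	/* target back -> restart */
		return 0;
	}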


* [PATCH 4/6] blk-mq: include all present CPUs in the default queue mapping
  2017-02-03 14:35 spread MSI(-X) vectors to all possible CPUs Christoph Hellwig
                   ` (2 preceding siblings ...)
  2017-02-03 14:35 ` [PATCH 3/6] genirq/affinity: update CPU affinity for CPU hotplug events Christoph Hellwig
@ 2017-02-03 14:35 ` Christoph Hellwig
  2017-02-03 14:35 ` [PATCH 5/6] blk-mq: create hctx for each present CPU Christoph Hellwig
  2017-02-03 14:36 ` [PATCH 6/6] nvme: allocate queues for all possible CPUs Christoph Hellwig
  5 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-03 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

This way we get a nice distribution independent of the current cpu
online / offline state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq-cpumap.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 8e61e8640e17..5eaecd40f701 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -35,7 +35,6 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)
 {
 	unsigned int *map = set->mq_map;
 	unsigned int nr_queues = set->nr_hw_queues;
-	const struct cpumask *online_mask = cpu_online_mask;
 	unsigned int i, nr_cpus, nr_uniq_cpus, queue, first_sibling;
 	cpumask_var_t cpus;
 
@@ -44,7 +43,7 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)
 
 	cpumask_clear(cpus);
 	nr_cpus = nr_uniq_cpus = 0;
-	for_each_cpu(i, online_mask) {
+	for_each_present_cpu(i) {
 		nr_cpus++;
 		first_sibling = get_first_sibling(i);
 		if (!cpumask_test_cpu(first_sibling, cpus))
@@ -54,7 +53,7 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)
 
 	queue = 0;
 	for_each_possible_cpu(i) {
-		if (!cpumask_test_cpu(i, online_mask)) {
+		if (!cpumask_test_cpu(i, cpu_present_mask)) {
 			map[i] = 0;
 			continue;
 		}
-- 
2.11.0
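
A small userspace sketch of the resulting mapping rule (hypothetical box
with four possible CPUs of which CPU 3 is not present, and two hardware
queues; the hyperthread-sibling grouping that blk_mq_map_queues() also does
is left out):

	#include <stdbool.h>
	#include <stdio.h>

	int main(void)
	{
		bool present[4] = { true, true, true, false };
		unsigned int map[4];
		unsigned int nr_queues = 2, queue = 0;

		for (int cpu = 0; cpu < 4; cpu++) {
			if (!present[cpu]) {
				map[cpu] = 0;	/* remapped if the CPU shows up later */
				continue;
			}
			map[cpu] = queue++ % nr_queues;
		}

		for (int cpu = 0; cpu < 4; cpu++)
			printf("cpu %d -> hctx %u\n", cpu, map[cpu]);
		return 0;
	}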


* [PATCH 5/6] blk-mq: create hctx for each present CPU
  2017-02-03 14:35 spread MSI(-X) vectors to all possible CPUs Christoph Hellwig
                   ` (3 preceding siblings ...)
  2017-02-03 14:35 ` [PATCH 4/6] blk-mq: include all present CPUs in the default queue mapping Christoph Hellwig
@ 2017-02-03 14:35 ` Christoph Hellwig
  2017-02-03 14:36 ` [PATCH 6/6] nvme: allocate queues for all possible CPUs Christoph Hellwig
  5 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-03 14:35 UTC (permalink / raw)
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

Currently we only create hctxs for online CPUs, which can lead to a lot
of churn due to frequent soft offline / online operations.  Instead
allocate one for each present CPU to avoid this and dramatically simplify
the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq-sysfs.c       |  26 +++------
 block/blk-mq.c             | 137 +++++----------------------------------------
 block/blk-mq.h             |   5 --
 include/linux/cpuhotplug.h |   1 -
 4 files changed, 20 insertions(+), 149 deletions(-)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 295e69670c39..c69ec94a2ad8 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -239,7 +239,7 @@ static int blk_mq_register_hctx(struct blk_mq_hw_ctx *hctx)
 	return ret;
 }
 
-static void __blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
+void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
 	struct blk_mq_ctx *ctx;
@@ -265,13 +265,6 @@ static void __blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
 	q->mq_sysfs_init_done = false;
 }
 
-void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
-{
-	blk_mq_disable_hotplug();
-	__blk_mq_unregister_dev(dev, q);
-	blk_mq_enable_hotplug();
-}
-
 void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx)
 {
 	kobject_init(&hctx->kobj, &blk_mq_hw_ktype);
@@ -295,13 +288,11 @@ int blk_mq_register_dev(struct device *dev, struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	int ret, i;
 
-	blk_mq_disable_hotplug();
-
 	blk_mq_sysfs_init(q);
 
 	ret = kobject_add(&q->mq_kobj, kobject_get(&dev->kobj), "%s", "mq");
 	if (ret < 0)
-		goto out;
+		return ret;
 
 	kobject_uevent(&q->mq_kobj, KOBJ_ADD);
 
@@ -310,16 +301,13 @@ int blk_mq_register_dev(struct device *dev, struct request_queue *q)
 	queue_for_each_hw_ctx(q, hctx, i) {
 		ret = blk_mq_register_hctx(hctx);
 		if (ret)
-			break;
+			goto fail;
 	}
 
-	if (ret)
-		__blk_mq_unregister_dev(dev, q);
-	else
-		q->mq_sysfs_init_done = true;
-out:
-	blk_mq_enable_hotplug();
-
+	q->mq_sysfs_init_done = true;
+	return 0;
+fail:
+	blk_mq_unregister_dev(dev, q);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blk_mq_register_dev);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index be183e6115a1..3578d678a871 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -34,9 +34,6 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
-static DEFINE_MUTEX(all_q_mutex);
-static LIST_HEAD(all_q_list);
-
 /*
  * Check if any of the ctx's have pending work in this hardware queue
  */
@@ -1948,8 +1945,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		blk_stat_init(&__ctx->stat[BLK_STAT_READ]);
 		blk_stat_init(&__ctx->stat[BLK_STAT_WRITE]);
 
-		/* If the cpu isn't online, the cpu is mapped to first hctx */
-		if (!cpu_online(i))
+		/* If the cpu isn't present, the cpu is mapped to first hctx */
+		if (!cpu_present(i))
 			continue;
 
 		hctx = blk_mq_map_queue(q, i);
@@ -1992,8 +1989,7 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 	}
 }
 
-static void blk_mq_map_swqueue(struct request_queue *q,
-			       const struct cpumask *online_mask)
+static void blk_mq_map_swqueue(struct request_queue *q)
 {
 	unsigned int i, hctx_idx;
 	struct blk_mq_hw_ctx *hctx;
@@ -2011,13 +2007,11 @@ static void blk_mq_map_swqueue(struct request_queue *q,
 	}
 
 	/*
-	 * Map software to hardware queues
+	 * Map software to hardware queues.
+	 *
+	 * If the cpu isn't present, the cpu is mapped to first hctx.
 	 */
-	for_each_possible_cpu(i) {
-		/* If the cpu isn't online, the cpu is mapped to first hctx */
-		if (!cpumask_test_cpu(i, online_mask))
-			continue;
-
+	for_each_present_cpu(i) {
 		hctx_idx = q->mq_map[i];
 		/* unmapped hw queue can be remapped after CPU topo changed */
 		if (!set->tags[hctx_idx] &&
@@ -2293,16 +2287,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		blk_queue_softirq_done(q, set->ops->complete);
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
-
-	get_online_cpus();
-	mutex_lock(&all_q_mutex);
-
-	list_add_tail(&q->all_q_node, &all_q_list);
 	blk_mq_add_queue_tag_set(set, q);
-	blk_mq_map_swqueue(q, cpu_online_mask);
-
-	mutex_unlock(&all_q_mutex);
-	put_online_cpus();
+	blk_mq_map_swqueue(q);
 
 	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
 		int ret;
@@ -2328,10 +2314,6 @@ void blk_mq_free_queue(struct request_queue *q)
 {
 	struct blk_mq_tag_set	*set = q->tag_set;
 
-	mutex_lock(&all_q_mutex);
-	list_del_init(&q->all_q_node);
-	mutex_unlock(&all_q_mutex);
-
 	wbt_exit(q);
 
 	blk_mq_del_queue_tag_set(q);
@@ -2340,89 +2322,6 @@ void blk_mq_free_queue(struct request_queue *q)
 	blk_mq_free_hw_queues(q, set);
 }
 
-/* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q,
-				const struct cpumask *online_mask)
-{
-	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
-
-	blk_mq_sysfs_unregister(q);
-
-	/*
-	 * redo blk_mq_init_cpu_queues and blk_mq_init_hw_queues. FIXME: maybe
-	 * we should change hctx numa_node according to new topology (this
-	 * involves free and re-allocate memory, worthy doing?)
-	 */
-
-	blk_mq_map_swqueue(q, online_mask);
-
-	blk_mq_sysfs_register(q);
-}
-
-/*
- * New online cpumask which is going to be set in this hotplug event.
- * Declare this cpumasks as global as cpu-hotplug operation is invoked
- * one-by-one and dynamically allocating this could result in a failure.
- */
-static struct cpumask cpuhp_online_new;
-
-static void blk_mq_queue_reinit_work(void)
-{
-	struct request_queue *q;
-
-	mutex_lock(&all_q_mutex);
-	/*
-	 * We need to freeze and reinit all existing queues.  Freezing
-	 * involves synchronous wait for an RCU grace period and doing it
-	 * one by one may take a long time.  Start freezing all queues in
-	 * one swoop and then wait for the completions so that freezing can
-	 * take place in parallel.
-	 */
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		blk_mq_freeze_queue_start(q);
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		blk_mq_freeze_queue_wait(q);
-
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		blk_mq_queue_reinit(q, &cpuhp_online_new);
-
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		blk_mq_unfreeze_queue(q);
-
-	mutex_unlock(&all_q_mutex);
-}
-
-static int blk_mq_queue_reinit_dead(unsigned int cpu)
-{
-	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
-	blk_mq_queue_reinit_work();
-	return 0;
-}
-
-/*
- * Before hotadded cpu starts handling requests, new mappings must be
- * established.  Otherwise, these requests in hw queue might never be
- * dispatched.
- *
- * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
- * for CPU0, and ctx1 for CPU1).
- *
- * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
- * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
- *
- * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
- * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
- * But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
- * ignored.
- */
-static int blk_mq_queue_reinit_prepare(unsigned int cpu)
-{
-	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
-	cpumask_set_cpu(cpu, &cpuhp_online_new);
-	blk_mq_queue_reinit_work();
-	return 0;
-}
-
 static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
@@ -2632,7 +2531,11 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
 		else
 			blk_queue_make_request(q, blk_sq_make_request);
 
-		blk_mq_queue_reinit(q, cpu_online_mask);
+		/* Basically redo blk_mq_init_queue with queue frozen */
+		WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
+		blk_mq_sysfs_unregister(q);
+		blk_mq_map_swqueue(q);
+		blk_mq_sysfs_register(q);
 	}
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
@@ -2802,24 +2705,10 @@ bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 }
 EXPORT_SYMBOL_GPL(blk_mq_poll);
 
-void blk_mq_disable_hotplug(void)
-{
-	mutex_lock(&all_q_mutex);
-}
-
-void blk_mq_enable_hotplug(void)
-{
-	mutex_unlock(&all_q_mutex);
-}
-
 static int __init blk_mq_init(void)
 {
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
-
-	cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
-				  blk_mq_queue_reinit_prepare,
-				  blk_mq_queue_reinit_dead);
 	return 0;
 }
 subsys_initcall(blk_mq_init);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 24b2256186f3..0d77b914d29f 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -57,11 +57,6 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 				bool at_head);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
-/*
- * CPU hotplug helpers
- */
-void blk_mq_enable_hotplug(void);
-void blk_mq_disable_hotplug(void);
 
 /*
  * CPU -> queue mappings
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 63406ae5b2df..992a09a297da 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -61,7 +61,6 @@ enum cpuhp_state {
 	CPUHP_XEN_EVTCHN_PREPARE,
 	CPUHP_ARM_SHMOBILE_SCU_PREPARE,
 	CPUHP_SH_SH3X_PREPARE,
-	CPUHP_BLK_MQ_PREPARE,
 	CPUHP_NET_FLOW_PREPARE,
 	CPUHP_TOPOLOGY_PREPARE,
 	CPUHP_NET_IUCV_PREPARE,
-- 
2.11.0


* [PATCH 6/6] nvme: allocate queues for all possible CPUs
  2017-02-03 14:35 spread MSI(-X) vectors to all possible CPUs Christoph Hellwig
                   ` (4 preceding siblings ...)
  2017-02-03 14:35 ` [PATCH 5/6] blk-mq: create hctx for each present CPU Christoph Hellwig
@ 2017-02-03 14:36 ` Christoph Hellwig
  5 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-03 14:36 UTC (permalink / raw)
  To: Thomas Gleixner, Jens Axboe
  Cc: Keith Busch, linux-nvme, linux-block, linux-kernel

Unlike most drivers that simply pass the maximum possible vectors to
pci_alloc_irq_vectors, NVMe needs to configure the device before allocating
the vectors, so it needs a manual update for the new scheme of using
all present CPUs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 032237c7ee56..3eaafa25b4fd 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1405,7 +1405,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	int result, nr_io_queues, size;
 
-	nr_io_queues = num_online_cpus();
+	nr_io_queues = num_present_cpus();
 	result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
 	if (result < 0)
 		return result;
-- 
2.11.0
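
For contrast, a sketch of the pattern most other drivers can use (the
helpers are the standard PCI IRQ API; pdev, max_vecs and the surrounding
error handling are hypothetical driver context):

	/*
	 * Typical driver: bound the range and let the PCI/IRQ core pick and
	 * spread the vectors itself -- no explicit CPU count needed.
	 */
	nr_vecs = pci_alloc_irq_vectors(pdev, 1, max_vecs,
					PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
	if (nr_vecs < 0)
		return nr_vecs;

NVMe cannot do that directly because the queue count has to be negotiated
with the controller via nvme_set_queue_count() before the vectors are
allocated, which is why the num_present_cpus() switch above is a separate,
manual hunk.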


* Re: [PATCH 3/6] genirq/affinity: update CPU affinity for CPU hotplug events
  2017-02-03 14:35 ` [PATCH 3/6] genirq/affinity: update CPU affinity for CPU hotplug events Christoph Hellwig
@ 2017-02-03 16:17   ` kbuild test robot
  2017-02-03 17:13   ` kbuild test robot
  2017-02-10 11:13   ` Thomas Gleixner
  2 siblings, 0 replies; 10+ messages in thread
From: kbuild test robot @ 2017-02-03 16:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kbuild-all, Thomas Gleixner, Jens Axboe, Keith Busch, linux-nvme,
	linux-block, linux-kernel

Hi Christoph,

[auto build test ERROR on block/for-next]
[also build test ERROR on v4.10-rc6]
[cannot apply to next-20170203]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Christoph-Hellwig/genirq-allow-assigning-affinity-to-present-but-not-online-CPUs/20170203-224056
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   kernel/irq/affinity.c: In function 'irq_affinity_offline_irq':
>> kernel/irq/affinity.c:264:2: error: implicit declaration of function 'irq_force_complete_move' [-Werror=implicit-function-declaration]
     irq_force_complete_move(desc);
     ^~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/irq_force_complete_move +264 kernel/irq/affinity.c

   258	
   259		/*
   260		 * Complete the irq move. This cpu is going down and for
   261		 * non intr-remapping case, we can't wait till this interrupt
   262		 * arrives at this cpu before completing the irq move.
   263		 */
 > 264		irq_force_complete_move(desc);
   265	
   266		/*
   267		 * The interrupt descriptor might have been cleaned up

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


* Re: [PATCH 3/6] genirq/affinity: update CPU affinity for CPU hotplug events
  2017-02-03 14:35 ` [PATCH 3/6] genirq/affinity: update CPU affinity for CPU hotplug events Christoph Hellwig
  2017-02-03 16:17   ` kbuild test robot
@ 2017-02-03 17:13   ` kbuild test robot
  2017-02-10 11:13   ` Thomas Gleixner
  2 siblings, 0 replies; 10+ messages in thread
From: kbuild test robot @ 2017-02-03 17:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kbuild-all, Thomas Gleixner, Jens Axboe, Keith Busch, linux-nvme,
	linux-block, linux-kernel

Hi Christoph,

[auto build test ERROR on block/for-next]
[also build test ERROR on v4.10-rc6]
[cannot apply to next-20170203]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Christoph-Hellwig/genirq-allow-assigning-affinity-to-present-but-not-online-CPUs/20170203-224056
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: arm-socfpga_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All error/warnings (new ones prefixed by >>):

   In file included from kernel/irq/internals.h:8:0,
                    from kernel/irq/affinity.c:9:
>> include/linux/irqdesc.h:52:25: error: field 'irq_common_data' has incomplete type
     struct irq_common_data irq_common_data;
                            ^~~~~~~~~~~~~~~
>> include/linux/irqdesc.h:53:19: error: field 'irq_data' has incomplete type
     struct irq_data  irq_data;
                      ^~~~~~~~
>> include/linux/irqdesc.h:55:2: error: unknown type name 'irq_flow_handler_t'
     irq_flow_handler_t handle_irq;
     ^~~~~~~~~~~~~~~~~~
   In file included from include/linux/interrupt.h:5:0,
                    from kernel/irq/affinity.c:5:
   include/linux/irqdesc.h: In function 'irq_data_to_desc':
>> include/linux/irqdesc.h:111:26: error: dereferencing pointer to incomplete type 'struct irq_data'
     return container_of(data->common, struct irq_desc, irq_common_data);
                             ^
   include/linux/kernel.h:850:49: note: in definition of macro 'container_of'
     const typeof( ((type *)0)->member ) *__mptr = (ptr); \
                                                    ^~~
   In file included from kernel/irq/internals.h:8:0,
                    from kernel/irq/affinity.c:9:
   include/linux/irqdesc.h: In function 'generic_handle_irq_desc':
>> include/linux/irqdesc.h:150:2: error: called object is not a function or function pointer
     desc->handle_irq(desc);
     ^~~~
   include/linux/irqdesc.h: At top level:
   include/linux/irqdesc.h:194:8: error: unknown type name 'irq_flow_handler_t'
           irq_flow_handler_t handler)
           ^~~~~~~~~~~~~~~~~~
   include/linux/irqdesc.h:215:6: error: unknown type name 'irq_flow_handler_t'
         irq_flow_handler_t handler, const char *name)
         ^~~~~~~~~~~~~~~~~~
   include/linux/irqdesc.h: In function 'irq_balancing_disabled':
>> include/linux/irqdesc.h:229:38: error: 'IRQ_NO_BALANCING_MASK' undeclared (first use in this function)
     return desc->status_use_accessors & IRQ_NO_BALANCING_MASK;
                                         ^~~~~~~~~~~~~~~~~~~~~
   include/linux/irqdesc.h:229:38: note: each undeclared identifier is reported only once for each function it appears in
   include/linux/irqdesc.h: In function 'irq_is_percpu':
>> include/linux/irqdesc.h:237:38: error: 'IRQ_PER_CPU' undeclared (first use in this function)
     return desc->status_use_accessors & IRQ_PER_CPU;
                                         ^~~~~~~~~~~
   In file included from kernel/irq/internals.h:62:0,
                    from kernel/irq/affinity.c:9:
   kernel/irq/debug.h: In function 'print_irq_desc':
>> kernel/irq/debug.h:16:28: warning: format '%p' expects argument of type 'void *', but argument 2 has type 'int' [-Wformat=]
     printk("->handle_irq():  %p, ", desc->handle_irq);
                               ^
>> kernel/irq/debug.h:26:7: error: 'IRQ_LEVEL' undeclared (first use in this function)
     ___P(IRQ_LEVEL);
          ^
   kernel/irq/debug.h:7:50: note: in definition of macro '___P'
    #define ___P(f) if (desc->status_use_accessors & f) printk("%14s set\n", #f)
                                                     ^
>> kernel/irq/debug.h:27:7: error: 'IRQ_PER_CPU' undeclared (first use in this function)
     ___P(IRQ_PER_CPU);
          ^
   kernel/irq/debug.h:7:50: note: in definition of macro '___P'
    #define ___P(f) if (desc->status_use_accessors & f) printk("%14s set\n", #f)
                                                     ^
>> kernel/irq/debug.h:28:7: error: 'IRQ_NOPROBE' undeclared (first use in this function)
     ___P(IRQ_NOPROBE);
          ^
   kernel/irq/debug.h:7:50: note: in definition of macro '___P'
    #define ___P(f) if (desc->status_use_accessors & f) printk("%14s set\n", #f)
                                                     ^
>> kernel/irq/debug.h:29:7: error: 'IRQ_NOREQUEST' undeclared (first use in this function)
     ___P(IRQ_NOREQUEST);
          ^
   kernel/irq/debug.h:7:50: note: in definition of macro '___P'
    #define ___P(f) if (desc->status_use_accessors & f) printk("%14s set\n", #f)
                                                     ^
>> kernel/irq/debug.h:30:7: error: 'IRQ_NOTHREAD' undeclared (first use in this function)
     ___P(IRQ_NOTHREAD);
          ^
   kernel/irq/debug.h:7:50: note: in definition of macro '___P'
    #define ___P(f) if (desc->status_use_accessors & f) printk("%14s set\n", #f)
                                                     ^
>> kernel/irq/debug.h:31:7: error: 'IRQ_NOAUTOEN' undeclared (first use in this function)
     ___P(IRQ_NOAUTOEN);
          ^
   kernel/irq/debug.h:7:50: note: in definition of macro '___P'
    #define ___P(f) if (desc->status_use_accessors & f) printk("%14s set\n", #f)
                                                     ^
   In file included from kernel/irq/internals.h:63:0,
                    from kernel/irq/affinity.c:9:
   kernel/irq/settings.h: At top level:
>> kernel/irq/settings.h:6:28: error: 'IRQ_DEFAULT_INIT_FLAGS' undeclared here (not in a function)
     _IRQ_DEFAULT_INIT_FLAGS = IRQ_DEFAULT_INIT_FLAGS,
                               ^~~~~~~~~~~~~~~~~~~~~~
>> kernel/irq/settings.h:7:18: error: 'IRQ_PER_CPU' undeclared here (not in a function)
     _IRQ_PER_CPU  = IRQ_PER_CPU,
                     ^~~~~~~~~~~
>> kernel/irq/settings.h:8:16: error: 'IRQ_LEVEL' undeclared here (not in a function)
     _IRQ_LEVEL  = IRQ_LEVEL,
                   ^~~~~~~~~
>> kernel/irq/settings.h:9:18: error: 'IRQ_NOPROBE' undeclared here (not in a function)
     _IRQ_NOPROBE  = IRQ_NOPROBE,
                     ^~~~~~~~~~~
>> kernel/irq/settings.h:10:20: error: 'IRQ_NOREQUEST' undeclared here (not in a function)
     _IRQ_NOREQUEST  = IRQ_NOREQUEST,
                       ^~~~~~~~~~~~~
>> kernel/irq/settings.h:11:19: error: 'IRQ_NOTHREAD' undeclared here (not in a function)
     _IRQ_NOTHREAD  = IRQ_NOTHREAD,
                      ^~~~~~~~~~~~

vim +/irq_data +53 include/linux/irqdesc.h

425a5072 Thomas Gleixner           2015-12-13   46   * @rcu:		rcu head for delayed free
ecb3f394 Craig Gallek              2016-09-13   47   * @kobj:		kobject used to represent this struct in sysfs
e144710b Thomas Gleixner           2010-10-01   48   * @dir:		/proc/irq/ procfs entry
e144710b Thomas Gleixner           2010-10-01   49   * @name:		flow handler name for /proc/interrupts output
e144710b Thomas Gleixner           2010-10-01   50   */
e144710b Thomas Gleixner           2010-10-01   51  struct irq_desc {
0d0b4c86 Jiang Liu                 2015-06-01  @52  	struct irq_common_data	irq_common_data;
e144710b Thomas Gleixner           2010-10-01  @53  	struct irq_data		irq_data;
6c9ae009 Eric Dumazet              2011-01-13   54  	unsigned int __percpu	*kstat_irqs;
e144710b Thomas Gleixner           2010-10-01  @55  	irq_flow_handler_t	handle_irq;
78129576 Thomas Gleixner           2011-02-10   56  #ifdef CONFIG_IRQ_PREFLOW_FASTEOI
78129576 Thomas Gleixner           2011-02-10   57  	irq_preflow_handler_t	preflow_handler;
78129576 Thomas Gleixner           2011-02-10   58  #endif
e144710b Thomas Gleixner           2010-10-01   59  	struct irqaction	*action;	/* IRQ action list */
a6967caf Thomas Gleixner           2011-02-10   60  	unsigned int		status_use_accessors;
dbec07ba Thomas Gleixner           2011-02-07   61  	unsigned int		core_internal_state__do_not_mess_with_it;
e144710b Thomas Gleixner           2010-10-01   62  	unsigned int		depth;		/* nested irq disables */
e144710b Thomas Gleixner           2010-10-01   63  	unsigned int		wake_depth;	/* nested wake enables */
e144710b Thomas Gleixner           2010-10-01   64  	unsigned int		irq_count;	/* For detecting broken IRQs */
e144710b Thomas Gleixner           2010-10-01   65  	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
e144710b Thomas Gleixner           2010-10-01   66  	unsigned int		irqs_unhandled;
1e77d0a1 Thomas Gleixner           2013-03-07   67  	atomic_t		threads_handled;
1e77d0a1 Thomas Gleixner           2013-03-07   68  	int			threads_handled_last;
e144710b Thomas Gleixner           2010-10-01   69  	raw_spinlock_t		lock;
31d9d9b6 Marc Zyngier              2011-09-23   70  	struct cpumask		*percpu_enabled;
222df54f Marc Zyngier              2016-04-11   71  	const struct cpumask	*percpu_affinity;
e144710b Thomas Gleixner           2010-10-01   72  #ifdef CONFIG_SMP
e144710b Thomas Gleixner           2010-10-01   73  	const struct cpumask	*affinity_hint;
cd7eab44 Ben Hutchings             2011-01-19   74  	struct irq_affinity_notify *affinity_notify;
e144710b Thomas Gleixner           2010-10-01   75  #ifdef CONFIG_GENERIC_PENDING_IRQ
e144710b Thomas Gleixner           2010-10-01   76  	cpumask_var_t		pending_mask;
e144710b Thomas Gleixner           2010-10-01   77  #endif
e144710b Thomas Gleixner           2010-10-01   78  #endif
b5faba21 Thomas Gleixner           2011-02-23   79  	unsigned long		threads_oneshot;
e144710b Thomas Gleixner           2010-10-01   80  	atomic_t		threads_active;
e144710b Thomas Gleixner           2010-10-01   81  	wait_queue_head_t       wait_for_threads;
cab303be Thomas Gleixner           2014-08-28   82  #ifdef CONFIG_PM_SLEEP
cab303be Thomas Gleixner           2014-08-28   83  	unsigned int		nr_actions;
cab303be Thomas Gleixner           2014-08-28   84  	unsigned int		no_suspend_depth;
17f48034 Rafael J. Wysocki         2015-02-27   85  	unsigned int		cond_suspend_depth;
cab303be Thomas Gleixner           2014-08-28   86  	unsigned int		force_resume_depth;
cab303be Thomas Gleixner           2014-08-28   87  #endif
e144710b Thomas Gleixner           2010-10-01   88  #ifdef CONFIG_PROC_FS
e144710b Thomas Gleixner           2010-10-01   89  	struct proc_dir_entry	*dir;
e144710b Thomas Gleixner           2010-10-01   90  #endif
425a5072 Thomas Gleixner           2015-12-13   91  #ifdef CONFIG_SPARSE_IRQ
425a5072 Thomas Gleixner           2015-12-13   92  	struct rcu_head		rcu;
ecb3f394 Craig Gallek              2016-09-13   93  	struct kobject		kobj;
425a5072 Thomas Gleixner           2015-12-13   94  #endif
293a7a0a Thomas Gleixner           2012-10-16   95  	int			parent_irq;
b6873807 Sebastian Andrzej Siewior 2011-07-11   96  	struct module		*owner;
e144710b Thomas Gleixner           2010-10-01   97  	const char		*name;
e144710b Thomas Gleixner           2010-10-01   98  } ____cacheline_internodealigned_in_smp;
e144710b Thomas Gleixner           2010-10-01   99  
a8994181 Thomas Gleixner           2015-07-05  100  #ifdef CONFIG_SPARSE_IRQ
a8994181 Thomas Gleixner           2015-07-05  101  extern void irq_lock_sparse(void);
a8994181 Thomas Gleixner           2015-07-05  102  extern void irq_unlock_sparse(void);
a8994181 Thomas Gleixner           2015-07-05  103  #else
a8994181 Thomas Gleixner           2015-07-05  104  static inline void irq_lock_sparse(void) { }
a8994181 Thomas Gleixner           2015-07-05  105  static inline void irq_unlock_sparse(void) { }
e144710b Thomas Gleixner           2010-10-01  106  extern struct irq_desc irq_desc[NR_IRQS];
e144710b Thomas Gleixner           2010-10-01  107  #endif
e144710b Thomas Gleixner           2010-10-01  108  
7bbf1dd2 Jiang Liu                 2015-06-01  109  static inline struct irq_desc *irq_data_to_desc(struct irq_data *data)
7bbf1dd2 Jiang Liu                 2015-06-01  110  {
755d119a Thomas Gleixner           2015-09-16 @111  	return container_of(data->common, struct irq_desc, irq_common_data);
7bbf1dd2 Jiang Liu                 2015-06-01  112  }
7bbf1dd2 Jiang Liu                 2015-06-01  113  
304adf8a Jiang Liu                 2015-06-04  114  static inline unsigned int irq_desc_get_irq(struct irq_desc *desc)
304adf8a Jiang Liu                 2015-06-04  115  {
304adf8a Jiang Liu                 2015-06-04  116  	return desc->irq_data.irq;
304adf8a Jiang Liu                 2015-06-04  117  }
304adf8a Jiang Liu                 2015-06-04  118  
d9936bb3 Thomas Gleixner           2011-03-11  119  static inline struct irq_data *irq_desc_get_irq_data(struct irq_desc *desc)
d9936bb3 Thomas Gleixner           2011-03-11  120  {
d9936bb3 Thomas Gleixner           2011-03-11  121  	return &desc->irq_data;
d9936bb3 Thomas Gleixner           2011-03-11 @122  }
d9936bb3 Thomas Gleixner           2011-03-11  123  
a0cd9ca2 Thomas Gleixner           2011-02-10  124  static inline struct irq_chip *irq_desc_get_chip(struct irq_desc *desc)
a0cd9ca2 Thomas Gleixner           2011-02-10  125  {
a0cd9ca2 Thomas Gleixner           2011-02-10  126  	return desc->irq_data.chip;
a0cd9ca2 Thomas Gleixner           2011-02-10  127  }
a0cd9ca2 Thomas Gleixner           2011-02-10  128  
a0cd9ca2 Thomas Gleixner           2011-02-10  129  static inline void *irq_desc_get_chip_data(struct irq_desc *desc)
a0cd9ca2 Thomas Gleixner           2011-02-10  130  {
a0cd9ca2 Thomas Gleixner           2011-02-10  131  	return desc->irq_data.chip_data;
a0cd9ca2 Thomas Gleixner           2011-02-10  132  }
a0cd9ca2 Thomas Gleixner           2011-02-10  133  
a0cd9ca2 Thomas Gleixner           2011-02-10  134  static inline void *irq_desc_get_handler_data(struct irq_desc *desc)
a0cd9ca2 Thomas Gleixner           2011-02-10  135  {
af7080e0 Jiang Liu                 2015-06-01  136  	return desc->irq_common_data.handler_data;
a0cd9ca2 Thomas Gleixner           2011-02-10  137  }
a0cd9ca2 Thomas Gleixner           2011-02-10  138  
a0cd9ca2 Thomas Gleixner           2011-02-10  139  static inline struct msi_desc *irq_desc_get_msi_desc(struct irq_desc *desc)
a0cd9ca2 Thomas Gleixner           2011-02-10  140  {
b237721c Jiang Liu                 2015-06-01  141  	return desc->irq_common_data.msi_desc;
a0cd9ca2 Thomas Gleixner           2011-02-10  142  }
a0cd9ca2 Thomas Gleixner           2011-02-10  143  
e144710b Thomas Gleixner           2010-10-01  144  /*
e144710b Thomas Gleixner           2010-10-01  145   * Architectures call this to let the generic IRQ layer
6584d84c Huang Shijie              2015-09-01  146   * handle an interrupt.
e144710b Thomas Gleixner           2010-10-01  147   */
bd0b9ac4 Thomas Gleixner           2015-09-14  148  static inline void generic_handle_irq_desc(struct irq_desc *desc)
e144710b Thomas Gleixner           2010-10-01  149  {
bd0b9ac4 Thomas Gleixner           2015-09-14 @150  	desc->handle_irq(desc);
e144710b Thomas Gleixner           2010-10-01  151  }
e144710b Thomas Gleixner           2010-10-01  152  
fe12bc2c Thomas Gleixner           2011-05-18  153  int generic_handle_irq(unsigned int irq);
e144710b Thomas Gleixner           2010-10-01  154  
76ba59f8 Marc Zyngier              2014-08-26  155  #ifdef CONFIG_HANDLE_DOMAIN_IRQ
76ba59f8 Marc Zyngier              2014-08-26  156  /*
76ba59f8 Marc Zyngier              2014-08-26  157   * Convert a HW interrupt number to a logical one using a IRQ domain,
76ba59f8 Marc Zyngier              2014-08-26  158   * and handle the result interrupt number. Return -EINVAL if
76ba59f8 Marc Zyngier              2014-08-26  159   * conversion failed. Providing a NULL domain indicates that the
76ba59f8 Marc Zyngier              2014-08-26  160   * conversion has already been done.
76ba59f8 Marc Zyngier              2014-08-26  161   */
76ba59f8 Marc Zyngier              2014-08-26  162  int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq,
76ba59f8 Marc Zyngier              2014-08-26  163  			bool lookup, struct pt_regs *regs);
76ba59f8 Marc Zyngier              2014-08-26  164  
76ba59f8 Marc Zyngier              2014-08-26  165  static inline int handle_domain_irq(struct irq_domain *domain,
76ba59f8 Marc Zyngier              2014-08-26  166  				    unsigned int hwirq, struct pt_regs *regs)
76ba59f8 Marc Zyngier              2014-08-26  167  {
76ba59f8 Marc Zyngier              2014-08-26  168  	return __handle_domain_irq(domain, hwirq, true, regs);
76ba59f8 Marc Zyngier              2014-08-26  169  }
76ba59f8 Marc Zyngier              2014-08-26  170  #endif
76ba59f8 Marc Zyngier              2014-08-26  171  
e144710b Thomas Gleixner           2010-10-01  172  /* Test to see if a driver has successfully requested an irq */
f61ae4fb Thomas Gleixner           2015-08-02  173  static inline int irq_desc_has_action(struct irq_desc *desc)
e144710b Thomas Gleixner           2010-10-01  174  {
e144710b Thomas Gleixner           2010-10-01  175  	return desc->action != NULL;
e144710b Thomas Gleixner           2010-10-01  176  }
e144710b Thomas Gleixner           2010-10-01  177  
f61ae4fb Thomas Gleixner           2015-08-02  178  static inline int irq_has_action(unsigned int irq)
f61ae4fb Thomas Gleixner           2015-08-02  179  {
f61ae4fb Thomas Gleixner           2015-08-02  180  	return irq_desc_has_action(irq_to_desc(irq));
f61ae4fb Thomas Gleixner           2015-08-02  181  }
f61ae4fb Thomas Gleixner           2015-08-02  182  
bbc9d21f Thomas Gleixner           2015-06-23  183  /**
bbc9d21f Thomas Gleixner           2015-06-23  184   * irq_set_handler_locked - Set irq handler from a locked region
bbc9d21f Thomas Gleixner           2015-06-23  185   * @data:	Pointer to the irq_data structure which identifies the irq
bbc9d21f Thomas Gleixner           2015-06-23  186   * @handler:	Flow control handler function for this interrupt
bbc9d21f Thomas Gleixner           2015-06-23  187   *
bbc9d21f Thomas Gleixner           2015-06-23  188   * Sets the handler in the irq descriptor associated to @data.
bbc9d21f Thomas Gleixner           2015-06-23  189   *
bbc9d21f Thomas Gleixner           2015-06-23  190   * Must be called with irq_desc locked and valid parameters. Typical
bbc9d21f Thomas Gleixner           2015-06-23  191   * call site is the irq_set_type() callback.
bbc9d21f Thomas Gleixner           2015-06-23  192   */
bbc9d21f Thomas Gleixner           2015-06-23  193  static inline void irq_set_handler_locked(struct irq_data *data,
bbc9d21f Thomas Gleixner           2015-06-23 @194  					  irq_flow_handler_t handler)
bbc9d21f Thomas Gleixner           2015-06-23  195  {
bbc9d21f Thomas Gleixner           2015-06-23  196  	struct irq_desc *desc = irq_data_to_desc(data);
bbc9d21f Thomas Gleixner           2015-06-23  197  
bbc9d21f Thomas Gleixner           2015-06-23  198  	desc->handle_irq = handler;
bbc9d21f Thomas Gleixner           2015-06-23  199  }
bbc9d21f Thomas Gleixner           2015-06-23  200  
bbc9d21f Thomas Gleixner           2015-06-23  201  /**
bbc9d21f Thomas Gleixner           2015-06-23  202   * irq_set_chip_handler_name_locked - Set chip, handler and name from a locked region
bbc9d21f Thomas Gleixner           2015-06-23  203   * @data:	Pointer to the irq_data structure for which the chip is set
bbc9d21f Thomas Gleixner           2015-06-23  204   * @chip:	Pointer to the new irq chip
bbc9d21f Thomas Gleixner           2015-06-23  205   * @handler:	Flow control handler function for this interrupt
bbc9d21f Thomas Gleixner           2015-06-23  206   * @name:	Name of the interrupt
bbc9d21f Thomas Gleixner           2015-06-23  207   *
bbc9d21f Thomas Gleixner           2015-06-23  208   * Replace the irq chip at the proper hierarchy level in @data and
bbc9d21f Thomas Gleixner           2015-06-23  209   * sets the handler and name in the associated irq descriptor.
bbc9d21f Thomas Gleixner           2015-06-23  210   *
bbc9d21f Thomas Gleixner           2015-06-23  211   * Must be called with irq_desc locked and valid parameters.
bbc9d21f Thomas Gleixner           2015-06-23  212   */
bbc9d21f Thomas Gleixner           2015-06-23  213  static inline void
bbc9d21f Thomas Gleixner           2015-06-23  214  irq_set_chip_handler_name_locked(struct irq_data *data, struct irq_chip *chip,
bbc9d21f Thomas Gleixner           2015-06-23 @215  				 irq_flow_handler_t handler, const char *name)
bbc9d21f Thomas Gleixner           2015-06-23  216  {
bbc9d21f Thomas Gleixner           2015-06-23  217  	struct irq_desc *desc = irq_data_to_desc(data);
bbc9d21f Thomas Gleixner           2015-06-23  218  
bbc9d21f Thomas Gleixner           2015-06-23  219  	desc->handle_irq = handler;
bbc9d21f Thomas Gleixner           2015-06-23  220  	desc->name = name;
bbc9d21f Thomas Gleixner           2015-06-23  221  	data->chip = chip;
bbc9d21f Thomas Gleixner           2015-06-23  222  }
bbc9d21f Thomas Gleixner           2015-06-23  223  
a2e8461a Thomas Gleixner           2011-03-23  224  static inline int irq_balancing_disabled(unsigned int irq)
a2e8461a Thomas Gleixner           2011-03-23  225  {
e144710b Thomas Gleixner           2010-10-01  226  	struct irq_desc *desc;
e144710b Thomas Gleixner           2010-10-01  227  
e144710b Thomas Gleixner           2010-10-01  228  	desc = irq_to_desc(irq);
0c6f8a8b Thomas Gleixner           2011-03-28 @229  	return desc->status_use_accessors & IRQ_NO_BALANCING_MASK;
e144710b Thomas Gleixner           2010-10-01  230  }
78129576 Thomas Gleixner           2011-02-10  231  
7f4a8e7b Vinayak Kale              2013-12-04  232  static inline int irq_is_percpu(unsigned int irq)
7f4a8e7b Vinayak Kale              2013-12-04  233  {
7f4a8e7b Vinayak Kale              2013-12-04  234  	struct irq_desc *desc;
7f4a8e7b Vinayak Kale              2013-12-04  235  
7f4a8e7b Vinayak Kale              2013-12-04  236  	desc = irq_to_desc(irq);
7f4a8e7b Vinayak Kale              2013-12-04 @237  	return desc->status_use_accessors & IRQ_PER_CPU;
7f4a8e7b Vinayak Kale              2013-12-04  238  }
7f4a8e7b Vinayak Kale              2013-12-04  239  
d3e17deb Thomas Gleixner           2011-03-22  240  static inline void

:::::: The code at line 53 was first introduced by commit
:::::: e144710b302525de5b90b9c3ba43562458d8957f genirq: Distangle irq.h

:::::: TO: Thomas Gleixner <tglx@linutronix.de>
:::::: CC: Thomas Gleixner <tglx@linutronix.de>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


* Re: [PATCH 3/6] genirq/affinity: update CPU affinity for CPU hotplug events
  2017-02-03 14:35 ` [PATCH 3/6] genirq/affinity: update CPU affinity for CPU hotplug events Christoph Hellwig
  2017-02-03 16:17   ` kbuild test robot
  2017-02-03 17:13   ` kbuild test robot
@ 2017-02-10 11:13   ` Thomas Gleixner
  2 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2017-02-10 11:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, linux-nvme, linux-block, linux-kernel

On Fri, 3 Feb 2017, Christoph Hellwig wrote:
> @@ -127,6 +127,7 @@ enum cpuhp_state {
>  	CPUHP_AP_ONLINE_IDLE,
>  	CPUHP_AP_SMPBOOT_THREADS,
>  	CPUHP_AP_X86_VDSO_VMA_ONLINE,
> +	CPUHP_AP_IRQ_AFFINIY_ONLINE,

s/AFFINIY/AFFINITY/ perhaps?

> +static void __irq_affinity_set(unsigned int irq, struct irq_desc *desc,
> +		cpumask_t *mask)

static void __irq_affinity_set(unsigned int irq, struct irq_desc *desc,
			       cpumask_t *mask)

Please

> +{
> +	struct irq_data *data = irq_desc_get_irq_data(desc);
> +	struct irq_chip *chip = irq_data_get_irq_chip(data);
> +	int ret;
> +
> +	if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
> +		chip->irq_mask(data);
> +	ret = chip->irq_set_affinity(data, mask, true);
> +	WARN_ON_ONCE(ret);
> +
> +	/*
> +	 * We unmask if the irq was not marked masked by the core code.
> +	 * That respects the lazy irq disable behaviour.
> +	 */
> +	if (!irqd_can_move_in_process_context(data) &&
> +	    !irqd_irq_masked(data) && chip->irq_unmask)
> +		chip->irq_unmask(data);
> +}

This looks very familiar. arch/x86/kernel/irq.c comes to mind

> +
> +static void irq_affinity_online_irq(unsigned int irq, struct irq_desc *desc,
> +		unsigned int cpu)
> +{
> +	const struct cpumask *affinity;
> +	struct irq_data *data;
> +	struct irq_chip *chip;
> +	unsigned long flags;
> +	cpumask_var_t mask;
> +
> +	if (!desc)
> +		return;
> +
> +	raw_spin_lock_irqsave(&desc->lock, flags);
> +
> +	data = irq_desc_get_irq_data(desc);
> +	affinity = irq_data_get_affinity_mask(data);
> +	if (!irqd_affinity_is_managed(data) ||
> +	    !irq_has_action(irq) ||
> +	    !cpumask_test_cpu(cpu, affinity))
> +		goto out_unlock;
> +
> +	/*
> +	 * The interrupt descriptor might have been cleaned up
> +	 * already, but it is not yet removed from the radix tree
> +	 */
> +	chip = irq_data_get_irq_chip(data);
> +	if (!chip)
> +		goto out_unlock;
> +
> +	if (WARN_ON_ONCE(!chip->irq_set_affinity))
> +		goto out_unlock;
> +
> +	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {

You really want to allocate that _BEFORE_ locking desc->lock. GFP_KERNEL
inside the lock held region is wrong and shows that this was never tested :)

And no, we don't want GFP_ATOMIC here. You can allocate it once at the call
site and hand it in, so you avoid the alloc/free dance when iterating over
a large number of descriptors.

> +		pr_err("failed to allocate memory for cpumask\n");
> +		goto out_unlock;
> +	}
> +
> +	cpumask_and(mask, affinity, cpu_online_mask);
> +	cpumask_set_cpu(cpu, mask);
> +	if (irqd_has_set(data, IRQD_AFFINITY_SUSPENDED)) {
> +		irq_startup(desc, false);
> +		irqd_clear(data, IRQD_AFFINITY_SUSPENDED);
> +	} else {
> +		__irq_affinity_set(irq, desc, mask);
> +	}
> +
> +	free_cpumask_var(mask);
> +out_unlock:
> +	raw_spin_unlock_irqrestore(&desc->lock, flags);
> +}



> +int irq_affinity_online_cpu(unsigned int cpu)
> +{
> +	struct irq_desc *desc;
> +	unsigned int irq;
> +
> +	for_each_irq_desc(irq, desc)
> +		irq_affinity_online_irq(irq, desc, cpu);

That lacks protection against concurrent irq setup/teardown. Wants to be
protected with irq_lock_sparse()

> +	return 0;
> +}
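
A minimal sketch of the combined suggestions above -- preallocate the
scratch mask once at the call site and serialize against setup/teardown with
irq_lock_sparse() -- where the extra mask argument handed down to
irq_affinity_online_irq() is a hypothetical signature change, not part of
the posted patch:

	int irq_affinity_online_cpu(unsigned int cpu)
	{
		struct irq_desc *desc;
		cpumask_var_t mask;
		unsigned int irq;

		/* one allocation, outside any raw spinlock */
		if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
			return -ENOMEM;

		irq_lock_sparse();
		for_each_irq_desc(irq, desc)
			irq_affinity_online_irq(irq, desc, cpu, mask);
		irq_unlock_sparse();

		free_cpumask_var(mask);
		return 0;
	}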
> +
> +static void irq_affinity_offline_irq(unsigned int irq, struct irq_desc *desc,
> +		unsigned int cpu)
> +{
> +	const struct cpumask *affinity;
> +	struct irq_data *data;
> +	struct irq_chip *chip;
> +	unsigned long flags;
> +	cpumask_var_t mask;
> +
> +	if (!desc)
> +		return;
> +
> +	raw_spin_lock_irqsave(&desc->lock, flags);
> +
> +	data = irq_desc_get_irq_data(desc);
> +	affinity = irq_data_get_affinity_mask(data);
> +	if (!irqd_affinity_is_managed(data) ||
> +	    !irq_has_action(irq) ||
> +	    irqd_has_set(data, IRQD_AFFINITY_SUSPENDED) ||
> +	    !cpumask_test_cpu(cpu, affinity))
> +		goto out_unlock;
> +
> +	/*
> +	 * Complete the irq move. This cpu is going down and for
> +	 * non intr-remapping case, we can't wait till this interrupt
> +	 * arrives at this cpu before completing the irq move.
> +	 */
> +	irq_force_complete_move(desc);

Hmm. That's what we do in x86 when the cpu is really dying, i.e. before it
really goes away. It's the last resort we have.

So if a move is pending, then you force it here and then you call
__irq_affinity_set() further down, which queues another pending move, which
then gets cleaned up in the cpu dying code.

If a move is pending, then you should first verify whether the pending
affinity mask is different from the one you are going to set. If it's the
same, you can just let the final cleanup code do its job. If not, then you
need to check whether it has something to do with the current affinity mask
or whether it's completely different. Otherwise you just destroy the
previous setting which tried to move the interrupt to some other place
already.

> +	/*
> +	 * The interrupt descriptor might have been cleaned up
> +	 * already, but it is not yet removed from the radix tree
> +	 */
> +	chip = irq_data_get_irq_chip(data);
> +	if (!chip)
> +		goto out_unlock;
> +
> +	if (WARN_ON_ONCE(!chip->irq_set_affinity))
> +		goto out_unlock;
> +
> +	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
> +		pr_err("failed to allocate memory for cpumask\n");
> +		goto out_unlock;
> +	}

Same allocation issue.

> +
> +	cpumask_copy(mask, affinity);
> +	cpumask_clear_cpu(cpu, mask);
> +	if (cpumask_empty(mask)) {
> +		irqd_set(data, IRQD_AFFINITY_SUSPENDED);
> +		irq_shutdown(desc);
> +	} else {
> +		__irq_affinity_set(irq, desc, mask);
> +	}
> +
> +	free_cpumask_var(mask);
> +out_unlock:
> +	raw_spin_unlock_irqrestore(&desc->lock, flags);
> +}
> +
> +int irq_affinity_offline_cpu(unsigned int cpu)
> +{
> +	struct irq_desc *desc;
> +	unsigned int irq;
> +
> +	for_each_irq_desc(irq, desc)
> +		irq_affinity_offline_irq(irq, desc, cpu);

Same protection issue.

Thanks,

	tglx

