[PATCH][RFC] x86/irq: Spread vectors on different CPUs

* [PATCH][RFC] x86/irq: Spread vectors on different CPUs
@ 2017-05-13 12:40 Chen Yu
  2017-06-28 19:03 ` Thomas Gleixner
  0 siblings, 1 reply; 3+ messages in thread
From: Chen Yu @ 2017-05-13 12:40 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Mika Westerberg, Rui Zhang, Chen Yu,
	Thomas Gleixner, Ingo Molnar, Anvin, H Peter, Van De Ven, Arjan,
	Brown, Len, Wysocki, Rafael J

Currently we encountered the CPU offline problem on a 16 cores server
when doing hibernation:

CPU 31 disable failed: CPU has 62 vectors assigned and there are only 0 available.

This is because:
1. One of the drivers has declare many vector resource via
   pci_enable_msix_range(), say, this driver might likely want
   to reserve 6 per logical CPU, then there would be 192 of them.
2. Besides, most of the vectors for this driver are allocated
   on CPU0 due to the current code strategy, so there would be
   insufficient slots left on CPU0 to receive any migrated
   IRQs from the other CPUs during CPU offine.
3. Furthermore, many vectors this driver reserved do no have
   any IRQ handler attached.

As a result, all vectors on CPU0 were used out and the last alive
CPU (31) failed to migrate its IRQs to the CPU0.

As we might have difficulty to reduce the number of vectors reserved
by that driver, there could be a compromising solution that, to spread
the vector allocation on different CPUs rather than always choosing
the *first* CPU in the cpumask. In this way, there would be a balanced
vector distribution. Because many vectors reserved but without used(point 3
above) will not be counted in during CPU offline, and they are now
on nonboot CPUs this problem will be solved.

Here's the trial version of this proposal and it works in my case,
it just tries to find the target CPU with the least vectors allocated(AKA,
the 'idlest' CPU). And the algorithm can be optimized but
firstly I'd like to get suggestions from you experts if this is in
the right direction, or what is the proper solution for such kind
of problems? Any comments/suggestions are appreciated.

Reported-by: Xiang Li <xiang.z.li@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Anvin, H Peter" <h.peter.anvin@intel.com>
Cc: "Van De Ven, Arjan" <arjan.van.de.ven@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: "Wysocki, Rafael J" <rafael.j.wysocki@intel.com>
Cc: x86@kernel.org
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 arch/x86/kernel/apic/vector.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index f3557a1..d220365 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -102,6 +102,27 @@ static void free_apic_chip_data(struct apic_chip_data *data)
 	}
 }
 
+static int pick_leisure_cpu(const struct cpumask *mask)
+{
+	int cpu, vector;
+	int min_nr_vector = NR_VECTORS;
+	int target_cpu = cpumask_first_and(mask, cpu_online_mask);
+
+	for_each_cpu_and(cpu, mask, cpu_online_mask) {
+		int nr_vectors = 0;
+
+		for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
+			if (!IS_ERR_OR_NULL(per_cpu(vector_irq, cpu)[vector]))
+				nr_vectors++;
+		}
+		if (nr_vectors < min_nr_vector) {
+			min_nr_vector = nr_vectors;
+			target_cpu = cpu;
+		}
+	}
+	return target_cpu;
+}
+
 static int __assign_irq_vector(int irq, struct apic_chip_data *d,
 			       const struct cpumask *mask)
 {
@@ -131,7 +152,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
 	/* Only try and allocate irqs on cpus that are present */
 	cpumask_clear(d->old_domain);
 	cpumask_clear(searched_cpumask);
-	cpu = cpumask_first_and(mask, cpu_online_mask);
+	cpu = pick_leisure_cpu(mask);
 	while (cpu < nr_cpu_ids) {
 		int new_cpu, offset;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 3+ messages in thread