[RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts

From: Dou Liyang <dou_liyang@163.com>
To: linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, kashyap.desai@broadcom.com,
	shivasharan.srikanteshwara@broadcom.com,
	sumit.saxena@broadcom.com, ming.lei@redhat.com, hch@lst.de,
	douly.fnst@cn.fujitsu.com
Subject: [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts
Date: Thu, 13 Sep 2018 11:10:11 +0800
Message-ID: <20180913031011.17376-1-dou_liyang@163.com>

From: Dou Liyang <douly.fnst@cn.fujitsu.com>

As Kashyap and Sumit reported, in the MSI/MSI-X subsystem the pre/post
vectors may be used for some extra reply queues for performance. The best
way to map the pre/post vectors is to map them to the local NUMA node.

But current Linux can't do that, because:

  The pre and post vectors are marked managed and their affinity mask
  is set to the irq default affinity mask.

  The default affinity mask is by default ALL cpus, but it can be tweaked
  both on the kernel command line and via proc.

  If that mask is only a subset of CPUs and all of them go offline,
  then these vectors are shut down in managed mode.

So, clear these affinity masks and check them in alloc_descs() to leave
these vectors as regular interrupts, which can be affinity-controlled and
can also move freely on hotplug.

Note: this breaks the validation of the affinity mask(s): alloc_descs()
no longer rejects an empty mask with -EINVAL.

Reported-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reported-by: Sumit Saxena <sumit.saxena@broadcom.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 kernel/irq/affinity.c |  9 ++++++---
 kernel/irq/irqdesc.c  | 24 ++++++++++--------------
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f4f29b9d90ee..ba35a5050dd2 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -204,7 +204,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 
 	/* Fill out vectors at the beginning that don't need affinity */
 	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
-		cpumask_copy(masks + curvec, irq_default_affinity);
+		cpumask_clear(masks + curvec);
 
 	/* Stabilize the cpumasks */
 	get_online_cpus();
@@ -234,10 +234,13 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	/* Fill out vectors at the end that don't need affinity */
 	if (usedvecs >= affvecs)
 		curvec = affd->pre_vectors + affvecs;
-	else
+	else {
 		curvec = affd->pre_vectors + usedvecs;
+		for (; curvec < affd->pre_vectors + affvecs; curvec++)
+			cpumask_copy(masks + curvec, irq_default_affinity);
+	}
 	for (; curvec < nvecs; curvec++)
-		cpumask_copy(masks + curvec, irq_default_affinity);
+		cpumask_clear(masks + curvec);
 
 outnodemsk:
 	free_node_to_cpumask(node_to_cpumask);
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 578d0e5f1b5b..5cffa791a20b 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -453,24 +453,20 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,
 {
 	const struct cpumask *mask = NULL;
 	struct irq_desc *desc;
-	unsigned int flags;
+	unsigned int flags = 0;
 	int i;
 
-	/* Validate affinity mask(s) */
-	if (affinity) {
-		for (i = 0, mask = affinity; i < cnt; i++, mask++) {
-			if (cpumask_empty(mask))
-				return -EINVAL;
-		}
-	}
-
-	flags = affinity ? IRQD_AFFINITY_MANAGED | IRQD_MANAGED_SHUTDOWN : 0;
-	mask = NULL;
-
 	for (i = 0; i < cnt; i++) {
 		if (affinity) {
-			node = cpu_to_node(cpumask_first(affinity));
-			mask = affinity;
+			if (cpumask_empty(affinity)) {
+				flags = 0;
+				mask = NULL;
+			} else {
+				flags = IRQD_AFFINITY_MANAGED |
+					IRQD_MANAGED_SHUTDOWN;
+				mask = affinity;
+				node = cpu_to_node(cpumask_first(affinity));
+			}
 			affinity++;
 		}
 		desc = alloc_desc(start + i, node, flags, mask, owner);
-- 
2.14.3