* [PATCH] genirq/affinity: Spread IRQs to all available NUMA nodes
From: Long Li @ 2018-11-01 23:51 UTC
  To: Thomas Gleixner, linux-kernel; +Cc: Long Li

From: Long Li <longli@microsoft.com>

On systems with a large number of NUMA nodes, there may be more NUMA nodes
than the number of MSI/MSI-X interrupts the device requests. The current code
always picks NUMA nodes starting from node 0, up to the number of interrupts
requested. This may leave the later NUMA nodes unused.

For example, if the system has 16 NUMA nodes and the device requests 8
interrupts, NUMA nodes 0 to 7 are assigned to those interrupts while NUMA
nodes 8 to 15 are left unused.

There are several problems with this approach:
1. Later, when those managed IRQs are allocated, they cannot be assigned to
NUMA nodes 8 to 15, which may concentrate IRQs on NUMA nodes 0 to 7.
2. Some upper layers assume the affinity masks completely cover all NUMA
nodes. For example, the block layer uses the affinity masks to decide how to
map CPU queues to hardware queues, and NUMA nodes missing from the masks can
result in an uneven mapping of queues (see the sketch after this list). For
the above example of 16 NUMA nodes, CPU queues on NUMA nodes 0 to 7 are
assigned to hardware queues 0 to 7, respectively, but CPU queues on NUMA
nodes 8 to 15 are all assigned to hardware queue 0.
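
A minimal stand-alone sketch of the uneven mapping in point 2 (user-space C,
not kernel code; the fall-back-to-queue-0 rule and the NR_NODES/NR_VECS
constants simply model the behavior described above):

#include <stdio.h>

#define NR_NODES 16	/* NUMA nodes in the system */
#define NR_VECS   8	/* interrupts the device requested */

int main(void)
{
	for (int node = 0; node < NR_NODES; node++) {
		/*
		 * Old scheme: only nodes 0..NR_VECS-1 appear in any
		 * affinity mask; CPUs on uncovered nodes fall back to
		 * hardware queue 0.
		 */
		int queue = node < NR_VECS ? node : 0;

		printf("NUMA node %2d -> hardware queue %d\n", node, queue);
	}
	return 0;
}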

Fix this problem by going over all NUMA nodes and assigning them round-robin
to all IRQs. Use cpumask_or() instead of cpumask_copy() so that, when the
vectors wrap around, a later node's CPUs are added to a vector's mask rather
than overwriting the CPUs already assigned to it.
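
A matching sketch of the round-robin spread (again a stand-alone model, not
the kernel loop itself), showing that every node now lands on some vector:

#include <stdio.h>

#define NR_NODES 16
#define NR_VECS   8

int main(void)
{
	int curvec = 0;

	for (int node = 0; node < NR_NODES; node++) {
		printf("NUMA node %2d -> vector %d\n", node, curvec);
		if (++curvec == NR_VECS)	/* wrap, like the patch */
			curvec = 0;
	}
	return 0;
}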

Signed-off-by: Long Li <longli@microsoft.com>
---
 kernel/irq/affinity.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f4f29b9d90ee..2d08b560d4b6 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -117,12 +117,13 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
 	 */
 	if (numvecs <= nodes) {
 		for_each_node_mask(n, nodemsk) {
-			cpumask_copy(masks + curvec, node_to_cpumask[n]);
-			if (++done == numvecs)
-				break;
+			cpumask_or(masks + curvec, masks + curvec, node_to_cpumask[n]);
+			done++;
 			if (++curvec == last_affv)
 				curvec = affd->pre_vectors;
 		}
+		if (done > numvecs)
+			done = numvecs;
 		goto out;
 	}
 
-- 
2.14.1



* RE: [PATCH] genirq/affinity: Spread IRQs to all available NUMA nodes
From: Michael Kelley @ 2018-11-02 17:37 UTC
  To: Long Li, Thomas Gleixner, linux-kernel

From: Long Li <longli@microsoft.com>  Sent: Thursday, November 1, 2018 4:52 PM
> 
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -117,12 +117,13 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
>  	 */
>  	if (numvecs <= nodes) {
>  		for_each_node_mask(n, nodemsk) {
> -			cpumask_copy(masks + curvec, node_to_cpumask[n]);
> -			if (++done == numvecs)
> -				break;
> +			cpumask_or(masks + curvec, masks + curvec, node_to_cpumask[n]);
> +			done++;
>  			if (++curvec == last_affv)
>  				curvec = affd->pre_vectors;
>  		}

When the above for loop is exited, 'done' will always be equal to 'nodes' since
there are no early exits from the loop.  Hence there's no need to be
incrementing 'done' in the loop. 

> +		if (done > numvecs)
> +			done = numvecs;

And if 'done' would always be equal to 'nodes', there is no need for the test.
Just always set 'done' to 'numvecs'.
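
Putting the two comments together, the loop might then look like this
(untested sketch, just to illustrate the suggestion):

	if (numvecs <= nodes) {
		for_each_node_mask(n, nodemsk) {
			cpumask_or(masks + curvec, masks + curvec,
				   node_to_cpumask[n]);
			if (++curvec == last_affv)
				curvec = affd->pre_vectors;
		}
		done = numvecs;	/* the loop covers all numvecs vectors */
		goto out;
	}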

>  		goto out;
>  	}
> 
> --
> 2.14.1


* RE: [PATCH] genirq/affinity: Spread IRQs to all available NUMA nodes
From: Long Li @ 2018-11-02 17:53 UTC
  To: Michael Kelley, Thomas Gleixner, linux-kernel

> Subject: RE: [PATCH] genirq/affinity: Spread IRQs to all available NUMA nodes
> 
> From: Long Li <longli@microsoft.com>  Sent: Thursday, November 1, 2018 4:52 PM
> >
> > --- a/kernel/irq/affinity.c
> > +++ b/kernel/irq/affinity.c
> > @@ -117,12 +117,13 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
> >  	 */
> >  	if (numvecs <= nodes) {
> >  		for_each_node_mask(n, nodemsk) {
> > -			cpumask_copy(masks + curvec, node_to_cpumask[n]);
> > -			if (++done == numvecs)
> > -				break;
> > +			cpumask_or(masks + curvec, masks + curvec, node_to_cpumask[n]);
> > +			done++;
> >  			if (++curvec == last_affv)
> >  				curvec = affd->pre_vectors;
> >  		}
> 
> When the above for loop is exited, 'done' will always be equal to 'nodes'
> since there are no early exits from the loop.  Hence there's no need to be
> incrementing 'done' in the loop.
> 
> > +		if (done > numvecs)
> > +			done = numvecs;
> 
> And if 'done' would always be equal to 'nodes', there is no need for the test.
> Just always set 'done' to 'numvecs'.

Thanks. I will fix this in v2.

> 
> >  		goto out;
> >  	}
> >
> > --
> > 2.14.1

