linux-kernel.vger.kernel.org archive mirror
* [Patch v2] genirq/matrix: Choose CPU for assigning interrupts based on allocated IRQs
@ 2018-11-01  3:13 Long Li
  2018-11-01  8:49 ` Thomas Gleixner
  0 siblings, 1 reply; 3+ messages in thread
From: Long Li @ 2018-11-01  3:13 UTC (permalink / raw)
  To: Michael Kelley, Thomas Gleixner, linux-kernel; +Cc: Long Li

From: Long Li <longli@microsoft.com>

On a large system with multiple devices of the same class (e.g. NVMe disks,
using managed IRQs), the kernel tends to concentrate their IRQs on several
CPUs.

The issue is that when NVMe calls irq_matrix_alloc_managed(), the assigned
CPU tends to be one of the first several CPUs in the cpumask, because the
selection checks cpumap->available, which does not change after managed IRQs
are reserved.

In irq_matrix->cpumap, "available" is set when IRQs are allocated earlier
in the IRQ allocation process. This value is calculated based on
1. how many unmanaged IRQs are allocated on this CPU
2. how many managed IRQs are reserved on this CPU

But "available" does not accurately account for the real IRQ load on a given
CPU.

A managed IRQ tends to reserve a vector on more than one CPU, based on the
cpumask passed to irq_matrix_reserve_managed(). But later, when a CPU is
actually allocated for this IRQ, only one CPU is used. Because "available" is
calculated at the time the managed IRQ is reserved, it tends to indicate that
a CPU has more IRQs than are actually assigned to it.

When a managed IRQ is assigned to a CPU in irq_matrix_alloc_managed(), it
decreases "allocated" based on the actual assignment of this IRQ to this CPU.
An unmanaged IRQ also decreases "allocated" after an IRQ is allocated on this
CPU. For this reason, checking "allocated" is more accurate than checking
"available" for a given CPU, and results in a more even distribution of IRQs
across all CPUs.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 kernel/irq/matrix.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 6e6d467f3dec..a51689e3e7c0 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -128,7 +128,7 @@ static unsigned int matrix_alloc_area(struct irq_matrix *m, struct cpumap *cm,
 static unsigned int matrix_find_best_cpu(struct irq_matrix *m,
 					const struct cpumask *msk)
 {
-	unsigned int cpu, best_cpu, maxavl = 0;
+	unsigned int cpu, best_cpu, min_allocated = UINT_MAX;
 	struct cpumap *cm;
 
 	best_cpu = UINT_MAX;
@@ -136,11 +136,11 @@ static unsigned int matrix_find_best_cpu(struct irq_matrix *m,
 	for_each_cpu(cpu, msk) {
 		cm = per_cpu_ptr(m->maps, cpu);
 
-		if (!cm->online || cm->available <= maxavl)
+		if (!cm->online || cm->allocated > min_allocated)
 			continue;
 
 		best_cpu = cpu;
-		maxavl = cm->available;
+		min_allocated = cm->allocated;
 	}
 	return best_cpu;
 }
-- 
2.14.1
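
For illustration only (not part of the patch): a minimal user-space sketch of
the two selection loops in the diff above, under the simplifying assumption
that "available" stays fixed while managed IRQs are assigned and only
"allocated" changes. struct cpumap_model, NR_CPUS, best_by_available(),
best_by_allocated() and assign_managed() are hypothetical names made up for
this sketch; they are not kernel APIs.

```c
#include <assert.h>
#include <limits.h>

#define NR_CPUS 4

/* Hypothetical, heavily simplified stand-in for the kernel's struct cpumap. */
struct cpumap_model {
	unsigned int available;	/* assumed fixed once managed IRQs are reserved */
	unsigned int allocated;	/* bumped on every actual assignment */
	int online;
};

/* Pre-patch policy: the online CPU with the most "available" vectors wins. */
static unsigned int best_by_available(const struct cpumap_model *maps)
{
	unsigned int cpu, best = UINT_MAX, maxavl = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!maps[cpu].online || maps[cpu].available <= maxavl)
			continue;
		best = cpu;
		maxavl = maps[cpu].available;
	}
	return best;
}

/* Patched policy: the online CPU with the fewest "allocated" vectors wins. */
static unsigned int best_by_allocated(const struct cpumap_model *maps)
{
	unsigned int cpu, best = UINT_MAX, min_allocated = UINT_MAX;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!maps[cpu].online || maps[cpu].allocated > min_allocated)
			continue;
		best = cpu;
		min_allocated = maps[cpu].allocated;
	}
	return best;
}

/* Assign n managed IRQs one at a time; only "allocated" is updated. */
static void assign_managed(struct cpumap_model *maps, unsigned int n,
			   unsigned int (*pick)(const struct cpumap_model *))
{
	while (n--) {
		unsigned int cpu = pick(maps);

		assert(cpu != UINT_MAX);	/* needs at least one online CPU */
		maps[cpu].allocated++;
	}
}
```

With four online CPUs that all start at the same "available" value, assigning
eight IRQs through best_by_available() puts all of them on CPU 0, whereas
best_by_allocated() spreads them evenly across the four CPUs in this model.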



* Re: [Patch v2] genirq/matrix: Choose CPU for assigning interrupts based on allocated IRQs
  2018-11-01  3:13 [Patch v2] genirq/matrix: Choose CPU for assigning interrupts based on allocated IRQs Long Li
@ 2018-11-01  8:49 ` Thomas Gleixner
  2018-11-01 16:39   ` Long Li
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Gleixner @ 2018-11-01  8:49 UTC (permalink / raw)
  To: Long Li; +Cc: Michael Kelley, linux-kernel

Long,

On Thu, 1 Nov 2018, Long Li wrote:
> On a large system with multiple devices of the same class (e.g. NVMe disks,
> using managed IRQs), the kernel tends to concentrate their IRQs on several
> CPUs.
> 
> The issue is that when NVMe calls irq_matrix_alloc_managed(), the assigned
> CPU tends to be the first several CPUs in the cpumask, because they check for
> cpumap->available that will not change after managed IRQs are reserved.
> 
> In irq_matrix->cpumap, "available" is set when IRQs are allocated earlier
> in the IRQ allocation process. This value is caculated based on

calculated

> 1. how many unmanaged IRQs are allocated on this CPU
> 2. how many managed IRQs are reserved on this CPU
> 
> But "available" is not accurate in accouting the real IRQs load on a given CPU.
> 
> For a managed IRQ, it tends to reserve more than one CPU, based on cpumask in
> irq_matrix_reserve_managed. But later when actually allocating CPU for this
> IRQ, only one CPU is allocated. Because "available" is calculated at the time
> managed IRQ is reserved, it tends to indicate a CPU has more IRQs than it's
> actually assigned.
> 
> When a managed IRQ is assigned to a CPU in irq_matrix_alloc_managed(), it
> decreases "allocated" based on the actually assignment of this IRQ to this CPU.

decreases?

> Unmanaged IRQ also decreases "allocated" after allocating an IRQ on this CPU.

ditto

> For this reason, checking "allocated" is more accurate than checking
> "available" for a given CPU, and result in a more evenly distributed IRQ
> across all CPUs.

Again, this approach is only correct for managed interrupts. Why?

Assume that total vector space size  = 10

CPU 0:
       allocated	=  8
       available	=  1

       i.e. there are 2 managed reserved, but not assigned interrupts

CPU 1:
       allocated	=  7
       available	=  0

       i.e. there are 3 managed reserved, but not assigned interrupts

Now allocate a non managed interrupt:

irq_matrix_alloc()

	cpu = find_best_cpu() <-- returns CPU1

	---> FAIL

The allocation fails because it cannot allocate from the managed reserved
space. The managed reserved space is guaranteed even if the vectors are not
assigned. This is required to make hotplug work and to allow late
activation without breaking the guarantees.

Non managed has no guarantees, it's a best effort approach, so it can fail.
But the fail above is just wrong.

You really need to treat managed and unmanaged CPU selection differently.
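
The scenario above can be replayed with a small user-space sketch. This is an
illustration under stated assumptions, not kernel code: struct cpu_state and
the helpers pick_min_allocated(), pick_max_available() and alloc_unmanaged()
are hypothetical names, and the model only tracks the two counters used in
the example.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical per-CPU model; far simpler than the kernel's struct cpumap. */
struct cpu_state {
	unsigned int allocated;	/* vectors currently assigned on this CPU */
	unsigned int available;	/* vectors usable by unmanaged allocations */
};

/* Selection as in the patch: the CPU with the fewest allocated vectors. */
static unsigned int pick_min_allocated(const struct cpu_state *c,
				       unsigned int nr_cpus)
{
	unsigned int cpu, best = UINT_MAX, min_allocated = UINT_MAX;

	assert(nr_cpus > 0);
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (c[cpu].allocated > min_allocated)
			continue;
		best = cpu;
		min_allocated = c[cpu].allocated;
	}
	return best;
}

/* Selection before the patch: the CPU with the most available vectors. */
static unsigned int pick_max_available(const struct cpu_state *c,
				       unsigned int nr_cpus)
{
	unsigned int cpu, best = UINT_MAX, maxavl = 0;

	assert(nr_cpus > 0);
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (c[cpu].available <= maxavl)
			continue;
		best = cpu;
		maxavl = c[cpu].available;
	}
	return best;
}

/*
 * Unmanaged allocation may only consume "available" space; the managed
 * reserved space is off limits. Returns 0 on success, -1 on failure.
 */
static int alloc_unmanaged(struct cpu_state *c, unsigned int cpu)
{
	if (cpu == UINT_MAX || c[cpu].available == 0)
		return -1;
	c[cpu].available--;
	c[cpu].allocated++;
	return 0;
}
```

With CPU 0 = {allocated 8, available 1} and CPU 1 = {allocated 7, available 0},
pick_min_allocated() selects CPU 1 and alloc_unmanaged() fails, while
pick_max_available() selects CPU 0 and the allocation succeeds. One possible
direction, offered here only as an assumption, is to keep the "available"
based selection for unmanaged interrupts and use an "allocated" based
selection for managed interrupts only.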

Thanks,

	tglx


* RE: [Patch v2] genirq/matrix: Choose CPU for assigning interrupts based on allocated IRQs
  2018-11-01  8:49 ` Thomas Gleixner
@ 2018-11-01 16:39   ` Long Li
  0 siblings, 0 replies; 3+ messages in thread
From: Long Li @ 2018-11-01 16:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Michael Kelley, linux-kernel

> Subject: Re: [Patch v2] genirq/matrix: Choose CPU for assigning interrupts
> based on allocated IRQs
> 
> Long,
> 
> On Thu, 1 Nov 2018, Long Li wrote:
> > On a large system with multiple devices of the same class (e.g. NVMe
> > disks, using managed IRQs), the kernel tends to concentrate their IRQs
> > on several CPUs.
> >
> > The issue is that when NVMe calls irq_matrix_alloc_managed(), the
> > assigned CPU tends to be the first several CPUs in the cpumask,
> > because they check for cpumap->available that will not change after
> > managed IRQs are reserved.
> >
> > In irq_matrix->cpumap, "available" is set when IRQs are allocated
> > earlier in the IRQ allocation process. This value is caculated based on
> 
> calculated
> 
> > 1. how many unmanaged IRQs are allocated on this CPU
> > 2. how many managed IRQs are reserved on this CPU
> >
> > But "available" is not accurate in accouting the real IRQs load on a given CPU.
> >
> > For a managed IRQ, it tends to reserve more than one CPU, based on
> > cpumask in irq_matrix_reserve_managed. But later when actually
> > allocating CPU for this IRQ, only one CPU is allocated. Because
> > "available" is calculated at the time managed IRQ is reserved, it
> > tends to indicate a CPU has more IRQs than it's actually assigned.
> >
> > When a managed IRQ is assigned to a CPU in irq_matrix_alloc_managed(),
> > it decreases "allocated" based on the actually assignment of this IRQ to
> > this CPU.
> 
> decreases?
> 
> > Unmanaged IRQ also decreases "allocated" after allocating an IRQ on this
> > CPU.
> 
> ditto
> 
> > For this reason, checking "allocated" is more accurate than checking
> > "available" for a given CPU, and result in a more evenly distributed
> > IRQ across all CPUs.
> 
> Again, this approach is only correct for managed interrupts. Why?
> 
> Assume that total vector space size  = 10
> 
> CPU 0:
>        allocated	=  8
>        available	=  1
> 
>        i.e. there are 2 managed reserved, but not assigned interrupts
> 
> CPU 1:
>        allocated	=  7
>        available	=  0
> 
>        i.e. there are 3 managed reserved, but not assigned interrupts
> 
> Now allocate a non managed interrupt:
> 
> irq_matrix_alloc()
> 
> 	cpu = find_best_cpu() <-- returns CPU1
> 
> 	---> FAIL
> 
> The allocation fails because it cannot allocate from the managed reserved
> space. The managed reserved space is guaranteed even if the vectors are not
> assigned. This is required to make hotplug work and to allow late activation
> without breaking the guarantees.
> 
> Non managed has no guarantees, it's a best effort approach, so it can fail.
> But the fail above is just wrong.
> 
> You really need to treat managed and unmanaged CPU selection differently.

Thank you for the explanation. I will send another patch to do it properly.

Long

> 
> Thanks,
> 
> 	tglx
