linux-pci.vger.kernel.org archive mirror
* PCI, isolcpus, and irq affinity
@ 2020-10-12 15:49 Chris Friesen
  2020-10-12 16:58 ` Bjorn Helgaas
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Friesen @ 2020-10-12 15:49 UTC (permalink / raw)
  To: linux-pci

Hi,

I'm not subscribed to the list so please CC me on replies.

I've got a linux system running the RT kernel with threaded irqs.  On 
startup we affine the various irq threads to the housekeeping CPUs, but 
I recently hit a scenario where after some days of uptime we ended up 
with a number of NVME irq threads affined to application cores instead 
(not good when we're trying to run low-latency applications).

Looking at the code, it appears that the NVME driver can in some 
scenarios end up calling pci_alloc_irq_vectors_affinity() after initial 
system startup, which seems to determine CPU affinity without any regard 
for things like "isolcpus" or "cset shield".
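
For context, the call pattern looks roughly like this (a simplified 
sketch loosely modelled on the nvme driver; pdev and nr_io_queues are 
just placeholders, not the exact driver code):

  /*
   * Managed-affinity allocation: the kernel spreads the vectors and
   * pins them itself; the driver only describes the layout.
   */
  struct irq_affinity affd = {
          .pre_vectors = 1,       /* e.g. one unmanaged admin vector */
  };
  int nvecs;

  nvecs = pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
                                          PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY,
                                          &affd);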

There seem to be other reports of similar issues:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566

It looks like some SCSI drivers and virtio_pci_common.c will also call 
pci_alloc_irq_vectors_affinity(), though I'm not sure if they would ever 
do it after system startup.

How does it make sense for the PCI subsystem to affine interrupts to 
CPUs which have explicitly been designated as "isolated"?

Thanks,

Chris




* Re: PCI, isolcpus, and irq affinity
  2020-10-12 15:49 PCI, isolcpus, and irq affinity Chris Friesen
@ 2020-10-12 16:58 ` Bjorn Helgaas
  2020-10-12 17:39   ` Sean V Kelley
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2020-10-12 16:58 UTC (permalink / raw)
  To: Chris Friesen
  Cc: linux-pci, Christoph Hellwig, Thomas Gleixner, Nitesh Narayan Lal

[+cc Christoph, Thomas, Nitesh]

On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
> I've got a linux system running the RT kernel with threaded irqs.  On
> startup we affine the various irq threads to the housekeeping CPUs, but I
> recently hit a scenario where after some days of uptime we ended up with a
> number of NVME irq threads affined to application cores instead (not good
> when we're trying to run low-latency applications).

pci_alloc_irq_vectors_affinity() basically just passes affinity
information through to kernel/irq/affinity.c, and the PCI core doesn't
change affinity after that.
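
Very roughly, the spreading done there amounts to something like this 
(a heavily condensed sketch, not the actual code):

  /*
   * Gist only: the managed masks are built by spreading
   * cpu_possible_mask across the vectors.  Nothing in here knows
   * about isolcpus or cpusets.
   */
  static void spread_vectors(unsigned int nvecs,
                             struct irq_affinity_desc *masks)
  {
          unsigned int cpu, v = 0;

          for_each_cpu(cpu, cpu_possible_mask) {
                  cpumask_set_cpu(cpu, &masks[v].mask);
                  masks[v].is_managed = 1;
                  v = (v + 1) % nvecs;
          }
  }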

> Looking at the code, it appears that the NVME driver can in some scenarios
> end up calling pci_alloc_irq_vectors_affinity() after initial system
> startup, which seems to determine CPU affinity without any regard for things
> like "isolcpus" or "cset shield".
> 
> There seem to be other reports of similar issues:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
> 
> It looks like some SCSI drivers and virtio_pci_common.c will also call
> pci_alloc_irq_vectors_affinity(), though I'm not sure if they would ever do
> it after system startup.
> 
> How does it make sense for the PCI subsystem to affine interrupts to CPUs
> which have explicitly been designated as "isolated"?

This recent thread may be useful:

  https://lore.kernel.org/linux-pci/20200928183529.471328-1-nitesh@redhat.com/

It contains a patch to "Limit pci_alloc_irq_vectors() to housekeeping
CPUs".  I'm not sure that patch summary is 100% accurate because IIUC
that particular patch only reduces the *number* of vectors allocated
and does not actually *limit* them to housekeeping CPUs.
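
If I read it correctly, the effect is roughly this (paraphrased, not 
the actual diff):

  /*
   * Clamp the number of requested vectors to the number of
   * housekeeping CPUs; the spreading itself is unchanged.
   */
  unsigned int hk_cpus;

  hk_cpus = cpumask_weight(housekeeping_cpumask(HK_FLAG_MANAGED_IRQ));
  if (hk_cpus < num_online_cpus())
          max_vecs = clamp(hk_cpus, min_vecs, max_vecs);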

Bjorn


* Re: PCI, isolcpus, and irq affinity
  2020-10-12 16:58 ` Bjorn Helgaas
@ 2020-10-12 17:39   ` Sean V Kelley
  2020-10-12 19:18     ` Chris Friesen
  2020-10-12 17:42   ` Nitesh Narayan Lal
  2020-10-12 17:50   ` Thomas Gleixner
  2 siblings, 1 reply; 14+ messages in thread
From: Sean V Kelley @ 2020-10-12 17:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Chris Friesen, linux-pci, Christoph Hellwig, Thomas Gleixner,
	Nitesh Narayan Lal

On 12 Oct 2020, at 9:58, Bjorn Helgaas wrote:

> [+cc Christoph, Thomas, Nitesh]
>
> On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
>> I've got a linux system running the RT kernel with threaded irqs.  
>> On
>> startup we affine the various irq threads to the housekeeping CPUs, 
>> but I
>> recently hit a scenario where after some days of uptime we ended up 
>> with a
>> number of NVME irq threads affined to application cores instead (not 
>> good
>> when we're trying to run low-latency applications).
>
> pci_alloc_irq_vectors_affinity() basically just passes affinity
> information through to kernel/irq/affinity.c, and the PCI core doesn't
> change affinity after that.
>
>> Looking at the code, it appears that the NVME driver can in some 
>> scenarios
>> end up calling pci_alloc_irq_vectors_affinity() after initial system
>> startup, which seems to determine CPU affinity without any regard for 
>> things
>> like "isolcpus" or "cset shield".
>>
>> There seem to be other reports of similar issues:
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
>>
>> It looks like some SCSI drivers and virtio_pci_common.c will also 
>> call
>> pci_alloc_irq_vectors_affinity(), though I'm not sure if they would 
>> ever do
>> it after system startup.
>>
>> How does it make sense for the PCI subsystem to affine interrupts to 
>> CPUs
>> which have explicitly been designated as "isolated"?
>
> This recent thread may be useful:
>
>   https://lore.kernel.org/linux-pci/20200928183529.471328-1-nitesh@redhat.com/
>
> It contains a patch to "Limit pci_alloc_irq_vectors() to housekeeping
> CPUs".  I'm not sure that patch summary is 100% accurate because IIUC
> that particular patch only reduces the *number* of vectors allocated
> and does not actually *limit* them to housekeeping CPUs.
>
> Bjorn


Chris,

Are you attempting a tickless run?  I’ve seen the NO_HZ_FULL (full 
dynticks) feature behave somewhat inconsistently when PREEMPT_RT is 
enabled.  The timer tick suppression feature can at times appear not to 
be functioning.  I’m curious how you are attempting to isolate the 
cores.

Thanks,

Sean




* Re: PCI, isolcpus, and irq affinity
  2020-10-12 16:58 ` Bjorn Helgaas
  2020-10-12 17:39   ` Sean V Kelley
@ 2020-10-12 17:42   ` Nitesh Narayan Lal
  2020-10-12 17:50   ` Thomas Gleixner
  2 siblings, 0 replies; 14+ messages in thread
From: Nitesh Narayan Lal @ 2020-10-12 17:42 UTC (permalink / raw)
  To: Bjorn Helgaas, Chris Friesen
  Cc: linux-pci, Christoph Hellwig, Thomas Gleixner




On 10/12/20 12:58 PM, Bjorn Helgaas wrote:
> [+cc Christoph, Thomas, Nitesh]
>
> On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
>> I've got a linux system running the RT kernel with threaded irqs.  On
>> startup we affine the various irq threads to the housekeeping CPUs, but I
>> recently hit a scenario where after some days of uptime we ended up with a
>> number of NVME irq threads affined to application cores instead (not good
>> when we're trying to run low-latency applications).
> pci_alloc_irq_vectors_affinity() basically just passes affinity
> information through to kernel/irq/affinity.c, and the PCI core doesn't
> change affinity after that.
>
>> Looking at the code, it appears that the NVME driver can in some scenarios
>> end up calling pci_alloc_irq_vectors_affinity() after initial system
>> startup, which seems to determine CPU affinity without any regard for things
>> like "isolcpus" or "cset shield".
>>
>> There seem to be other reports of similar issues:
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
>>
>> It looks like some SCSI drivers and virtio_pci_common.c will also call
>> pci_alloc_irq_vectors_affinity(), though I'm not sure if they would ever do
>> it after system startup.
>>
>> How does it make sense for the PCI subsystem to affine interrupts to CPUs
>> which have explicitly been designated as "isolated"?
> This recent thread may be useful:
>
>   https://lore.kernel.org/linux-pci/20200928183529.471328-1-nitesh@redhat.com/
>
> It contains a patch to "Limit pci_alloc_irq_vectors() to housekeeping
> CPUs".  I'm not sure that patch summary is 100% accurate because IIUC
> that particular patch only reduces the *number* of vectors allocated
> and does not actually *limit* them to housekeeping CPUs.

That is correct; the above-mentioned patch only reduces the number of
vectors.

Based on the problem described here, I think the issue could be the use
of cpu_online_mask/cpu_possible_mask when creating the affinity mask or
when distributing the jobs. In these cases we should be using the
housekeeping_cpumask instead.
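
The general shape of such a fix would be something like this (just a
sketch; dst_mask stands in for whatever mask the caller builds):

  /*
   * Restrict the mask to the housekeeping CPUs and only fall back
   * to the full online mask if the intersection is empty.
   */
  const struct cpumask *hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

  cpumask_and(dst_mask, cpu_online_mask, hk_mask);
  if (cpumask_empty(dst_mask))
          cpumask_copy(dst_mask, cpu_online_mask);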

A few months back a similar issue was fixed for cpumask_local_spread
and some other subsystems [1].

[1] https://lore.kernel.org/lkml/20200625223443.2684-1-nitesh@redhat.com/

-- 
Nitesh





* Re: PCI, isolcpus, and irq affinity
  2020-10-12 16:58 ` Bjorn Helgaas
  2020-10-12 17:39   ` Sean V Kelley
  2020-10-12 17:42   ` Nitesh Narayan Lal
@ 2020-10-12 17:50   ` Thomas Gleixner
  2020-10-12 18:58     ` Chris Friesen
  2 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2020-10-12 17:50 UTC (permalink / raw)
  To: Bjorn Helgaas, Chris Friesen
  Cc: linux-pci, Christoph Hellwig, Nitesh Narayan Lal

On Mon, Oct 12 2020 at 11:58, Bjorn Helgaas wrote:
> On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
>> I've got a linux system running the RT kernel with threaded irqs.  On
>> startup we affine the various irq threads to the housekeeping CPUs, but I
>> recently hit a scenario where after some days of uptime we ended up with a
>> number of NVME irq threads affined to application cores instead (not good
>> when we're trying to run low-latency applications).

These threads and the associated interrupt vectors are completely
harmless and fully idle as long as there is nothing on those isolated
CPUs which does disk I/O.

> pci_alloc_irq_vectors_affinity() basically just passes affinity
> information through to kernel/irq/affinity.c, and the PCI core doesn't
> change affinity after that.

Correct.

> This recent thread may be useful:
>
>   https://lore.kernel.org/linux-pci/20200928183529.471328-1-nitesh@redhat.com/
>
> It contains a patch to "Limit pci_alloc_irq_vectors() to housekeeping
> CPUs".  I'm not sure that patch summary is 100% accurate because IIUC
> that particular patch only reduces the *number* of vectors allocated
> and does not actually *limit* them to housekeeping CPUs.

That patch is a bandaid at best and for the managed interrupt scenario
not really preventing that interrupts + threads are affine to isolated
CPUs.

Thanks,

        tglx



* Re: PCI, isolcpus, and irq affinity
  2020-10-12 17:50   ` Thomas Gleixner
@ 2020-10-12 18:58     ` Chris Friesen
  2020-10-12 19:07       ` Keith Busch
  2020-10-12 19:31       ` Thomas Gleixner
  0 siblings, 2 replies; 14+ messages in thread
From: Chris Friesen @ 2020-10-12 18:58 UTC (permalink / raw)
  To: Thomas Gleixner, Bjorn Helgaas
  Cc: linux-pci, Christoph Hellwig, Nitesh Narayan Lal

On 10/12/2020 11:50 AM, Thomas Gleixner wrote:
> On Mon, Oct 12 2020 at 11:58, Bjorn Helgaas wrote:
>> On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
>>> I've got a linux system running the RT kernel with threaded irqs.  On
>>> startup we affine the various irq threads to the housekeeping CPUs, but I
>>> recently hit a scenario where after some days of uptime we ended up with a
>>> number of NVME irq threads affined to application cores instead (not good
>>> when we're trying to run low-latency applications).
> 
> These threads and the associated interrupt vectors are completely
> harmless and fully idle as long as there is nothing on those isolated
> CPUs which does disk I/O.

Some of the irq threads are affined (by the kernel, presumably) to 
multiple CPUs (nvme1q2 and nvme0q2 were both affined to 0x38000038, and 
a couple of other queues were affined to 0x1c00001c0).

In this case could disk I/O submitted by one of those CPUs end up 
interrupting another one?

Chris


* Re: PCI, isolcpus, and irq affinity
  2020-10-12 18:58     ` Chris Friesen
@ 2020-10-12 19:07       ` Keith Busch
  2020-10-12 19:44         ` Thomas Gleixner
  2020-10-15 18:47         ` Chris Friesen
  2020-10-12 19:31       ` Thomas Gleixner
  1 sibling, 2 replies; 14+ messages in thread
From: Keith Busch @ 2020-10-12 19:07 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Thomas Gleixner, Bjorn Helgaas, linux-pci, Christoph Hellwig,
	Nitesh Narayan Lal

On Mon, Oct 12, 2020 at 12:58:41PM -0600, Chris Friesen wrote:
> On 10/12/2020 11:50 AM, Thomas Gleixner wrote:
> > On Mon, Oct 12 2020 at 11:58, Bjorn Helgaas wrote:
> > > On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
> > > > I've got a linux system running the RT kernel with threaded irqs.  On
> > > > startup we affine the various irq threads to the housekeeping CPUs, but I
> > > > recently hit a scenario where after some days of uptime we ended up with a
> > > > number of NVME irq threads affined to application cores instead (not good
> > > > when we're trying to run low-latency applications).
> > 
> > These threads and the associated interrupt vectors are completely
> > harmless and fully idle as long as there is nothing on those isolated
> > CPUs which does disk I/O.
> 
> Some of the irq threads are affined (by the kernel presumably) to multiple
> CPUs (nvme1q2 and nvme0q2 were both affined 0x38000038, a couple of other
> queues were affined 0x1c00001c0).

That means you have more CPUs than your controller has queues. When that
happens, some sharing of the queue resources among CPUs is required.
 
> In this case could disk I/O submitted by one of those CPUs end up
> interrupting another one?

If you dispatch IO from any CPU in the mask, then the completion side
wakes the thread to run on one of the CPUs in the affinity mask.


* Re: PCI, isolcpus, and irq affinity
  2020-10-12 17:39   ` Sean V Kelley
@ 2020-10-12 19:18     ` Chris Friesen
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Friesen @ 2020-10-12 19:18 UTC (permalink / raw)
  To: Sean V Kelley, Bjorn Helgaas
  Cc: linux-pci, Christoph Hellwig, Thomas Gleixner, Nitesh Narayan Lal

On 10/12/2020 11:39 AM, Sean V Kelley wrote:

> Are you attempting a tickless run?  I’ve seen the NO_HZ_FULL (full 
> dynticks) feature behave somewhat inconsistently when PREEMPT_RT is 
> enabled.  The timer tick suppression feature can at times appear not to 
> be functioning.  I’m curious how you are attempting to isolate the 
> cores.

We're trying to run tickless on a subset of the CPUs, using a 
combination of isolcpus, irqaffinity, rcu_nocbs, and nohz_full boot 
args, as well as runtime affinity adjustment for tasks and IRQs.
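
For concreteness, the boot arguments look something like the following 
(the CPU ranges here are illustrative, not our exact layout):

  isolcpus=2-27 nohz_full=2-27 rcu_nocbs=2-27 irqaffinity=0-1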

I don't think we're seeing full suppression of timer ticks, but partial 
suppression is happening.  Application CPUs are showing as low as 10K 
interrupts for LOC while the housekeeping CPUs are 300-500 million.

Chris


* Re: PCI, isolcpus, and irq affinity
  2020-10-12 18:58     ` Chris Friesen
  2020-10-12 19:07       ` Keith Busch
@ 2020-10-12 19:31       ` Thomas Gleixner
  2020-10-12 20:24         ` David Woodhouse
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2020-10-12 19:31 UTC (permalink / raw)
  To: Chris Friesen, Bjorn Helgaas
  Cc: linux-pci, Christoph Hellwig, Nitesh Narayan Lal

On Mon, Oct 12 2020 at 12:58, Chris Friesen wrote:
> On 10/12/2020 11:50 AM, Thomas Gleixner wrote:
>> On Mon, Oct 12 2020 at 11:58, Bjorn Helgaas wrote:
>>> On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
>>>> I've got a linux system running the RT kernel with threaded irqs.  On
>>>> startup we affine the various irq threads to the housekeeping CPUs, but I
>>>> recently hit a scenario where after some days of uptime we ended up with a
>>>> number of NVME irq threads affined to application cores instead (not good
>>>> when we're trying to run low-latency applications).
>> 
>> These threads and the associated interrupt vectors are completely
>> harmless and fully idle as long as there is nothing on those isolated
>> CPUs which does disk I/O.
>
> Some of the irq threads are affined (by the kernel presumably) to 
> multiple CPUs (nvme1q2 and nvme0q2 were both affined 0x38000038, a 
> couple of other queues were affined 0x1c00001c0).
>
> In this case could disk I/O submitted by one of those CPUs end up 
> interrupting another one?

On older kernels, yes.

X86 enforces effective single CPU affinity for interrupts since v4.15.

The associated irq thread is always following the hardware effective
interrupt affinity since v4.17.

The hardware interrupt itself is routed to a housekeeping CPU in the
affinity mask as long as there is one online since v5.6.
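
Condensed, the v5.6 behaviour amounts to something like this (a sketch,
not the literal kernel code):

  /*
   * If the managed interrupt's mask contains housekeeping CPUs that
   * are online, program only those into the hardware; otherwise fall
   * back to the full mask.
   */
  static const struct cpumask *
  effective_target(const struct cpumask *mask, struct cpumask *tmp)
  {
          const struct cpumask *hk = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

          if (!housekeeping_enabled(HK_FLAG_MANAGED_IRQ))
                  return mask;

          cpumask_and(tmp, mask, hk);
          if (!cpumask_intersects(tmp, cpu_online_mask))
                  return mask;

          return tmp;
  }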

Thanks,

        tglx



* Re: PCI, isolcpus, and irq affinity
  2020-10-12 19:07       ` Keith Busch
@ 2020-10-12 19:44         ` Thomas Gleixner
  2020-10-15 18:47         ` Chris Friesen
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2020-10-12 19:44 UTC (permalink / raw)
  To: Keith Busch, Chris Friesen
  Cc: Bjorn Helgaas, linux-pci, Christoph Hellwig, Nitesh Narayan Lal

On Mon, Oct 12 2020 at 12:07, Keith Busch wrote:

> On Mon, Oct 12, 2020 at 12:58:41PM -0600, Chris Friesen wrote:
>> On 10/12/2020 11:50 AM, Thomas Gleixner wrote:
>> > On Mon, Oct 12 2020 at 11:58, Bjorn Helgaas wrote:
>> > > On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
>> > > > I've got a linux system running the RT kernel with threaded irqs.  On
>> > > > startup we affine the various irq threads to the housekeeping CPUs, but I
>> > > > recently hit a scenario where after some days of uptime we ended up with a
>> > > > number of NVME irq threads affined to application cores instead (not good
>> > > > when we're trying to run low-latency applications).
>> > 
>> > These threads and the associated interrupt vectors are completely
>> > harmless and fully idle as long as there is nothing on those isolated
>> > CPUs which does disk I/O.
>> 
>> Some of the irq threads are affined (by the kernel presumably) to multiple
>> CPUs (nvme1q2 and nvme0q2 were both affined 0x38000038, a couple of other
>> queues were affined 0x1c00001c0).
>
> That means you have more CPUs than your controller has queues. When that
> happens, some sharing of the queue resources among CPUs is required.
>  
>> In this case could disk I/O submitted by one of those CPUs end up
>> interrupting another one?
>
> If you dispatch IO from any CPU in the mask, then the completion side
> wakes the thread to run on one of the CPUs in the affinity mask.

Pre 4.17, yes.

From 4.17 onwards the irq thread is following the effective affinity of
the hardware interrupt which is a single CPU target.

Since 5.6 the effective affinity is steered to a housekeeping CPU if the
cpumask of a queue spawns multiple CPUs.

Thanks,

        tglx


* Re: PCI, isolcpus, and irq affinity
  2020-10-12 19:31       ` Thomas Gleixner
@ 2020-10-12 20:24         ` David Woodhouse
  2020-10-12 22:25           ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: David Woodhouse @ 2020-10-12 20:24 UTC (permalink / raw)
  To: Thomas Gleixner, Chris Friesen, Bjorn Helgaas
  Cc: linux-pci, Christoph Hellwig, Nitesh Narayan Lal


On Mon, 2020-10-12 at 21:31 +0200, Thomas Gleixner wrote:
> > In this case could disk I/O submitted by one of those CPUs end up 
> > interrupting another one?
> 
> On older kernels, yes.
> 
> X86 enforces effective single CPU affinity for interrupts since v4.15.

Is that here to stay? Because it means that sending external interrupts
in logical mode is kind of pointless, and we might as well do this...

--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -187,3 +187,3 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
        .irq_delivery_mode              = dest_Fixed,
-       .irq_dest_mode                  = 1, /* logical */
+       .irq_dest_mode                  = 0, /* physical */
 
@@ -205,3 +205,3 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
 
-       .calc_dest_apicid               = x2apic_calc_apicid,
+       .calc_dest_apicid               = apic_default_calc_apicid,
 

And then a bunch of things which currently set x2apic_phys just because
of *external* IRQ limitations, no longer have to, and can still benefit
from multicast of IPIs to whole clusters at a time.



* Re: PCI, isolcpus, and irq affinity
  2020-10-12 20:24         ` David Woodhouse
@ 2020-10-12 22:25           ` Thomas Gleixner
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2020-10-12 22:25 UTC (permalink / raw)
  To: David Woodhouse, Chris Friesen, Bjorn Helgaas
  Cc: linux-pci, Christoph Hellwig, Nitesh Narayan Lal

On Mon, Oct 12 2020 at 21:24, David Woodhouse wrote:
> On Mon, 2020-10-12 at 21:31 +0200, Thomas Gleixner wrote:
>> > In this case could disk I/O submitted by one of those CPUs end up 
>> > interrupting another one?
>> 
>> On older kernels, yes.
>> 
>> X86 enforces effective single CPU affinity for interrupts since v4.15.
>
> Is that here to stay?

Yes. The way logical mode works is that it sends the vast majority of
interrupts to the first CPU in the logical mask. So the benefit is
pretty much zero, and we haven't had anyone complaining since we
switched to that mode.

Having single CPU affinity enforced made the whole x86 affinity
disaster^Wlogic way simpler and also reduced vector pressure
significantly.

> Because it means that sending external interrupts
> in logical mode is kind of pointless, and we might as well do this...
>
> --- a/arch/x86/kernel/apic/x2apic_cluster.c
> +++ b/arch/x86/kernel/apic/x2apic_cluster.c
> @@ -187,3 +187,3 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
>         .irq_delivery_mode              = dest_Fixed,
> -       .irq_dest_mode                  = 1, /* logical */
> +       .irq_dest_mode                  = 0, /* physical */
>  
> @@ -205,3 +205,3 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
>  
> -       .calc_dest_apicid               = x2apic_calc_apicid,
> +       .calc_dest_apicid               = apic_default_calc_apicid,
>  
>
> And then a bunch of things which currently set x2apic_phys just because
> of *external* IRQ limitations, no longer have to, and can still benefit
> from multicast of IPIs to whole clusters at a time.

Indeed, never thought about that.

Thanks,

        tglx


* Re: PCI, isolcpus, and irq affinity
  2020-10-12 19:07       ` Keith Busch
  2020-10-12 19:44         ` Thomas Gleixner
@ 2020-10-15 18:47         ` Chris Friesen
  2020-10-15 19:02           ` Keith Busch
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Friesen @ 2020-10-15 18:47 UTC (permalink / raw)
  To: Keith Busch
  Cc: Thomas Gleixner, Bjorn Helgaas, linux-pci, Christoph Hellwig,
	Nitesh Narayan Lal

On 10/12/2020 1:07 PM, Keith Busch wrote:
> On Mon, Oct 12, 2020 at 12:58:41PM -0600, Chris Friesen wrote:
>> On 10/12/2020 11:50 AM, Thomas Gleixner wrote:
>>> On Mon, Oct 12 2020 at 11:58, Bjorn Helgaas wrote:
>>>> On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
>>>>> I've got a linux system running the RT kernel with threaded irqs.  On
>>>>> startup we affine the various irq threads to the housekeeping CPUs, but I
>>>>> recently hit a scenario where after some days of uptime we ended up with a
>>>>> number of NVME irq threads affined to application cores instead (not good
>>>>> when we're trying to run low-latency applications).
>>>
>>> These threads and the associated interrupt vectors are completely
>>> harmless and fully idle as long as there is nothing on those isolated
>>> CPUs which does disk I/O.
>>
>> Some of the irq threads are affined (by the kernel presumably) to multiple
>> CPUs (nvme1q2 and nvme0q2 were both affined 0x38000038, a couple of other
>> queues were affined 0x1c00001c0).
> 
> That means you have more CPUs than your controller has queues. When that
> happens, some sharing of the queue resources among CPUs is required.

Is it required that every CPU is part of the mask for at least one queue?

If we can preferentially route interrupts to the housekeeping CPUs (for 
queues with multiple CPUs in the mask), how is that different than just 
affining all the queues to the housekeeping CPUs and leaving the 
isolated CPUs out of the mask entirely?

Thanks,
Chris


* Re: PCI, isolcpus, and irq affinity
  2020-10-15 18:47         ` Chris Friesen
@ 2020-10-15 19:02           ` Keith Busch
  0 siblings, 0 replies; 14+ messages in thread
From: Keith Busch @ 2020-10-15 19:02 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Thomas Gleixner, Bjorn Helgaas, linux-pci, Christoph Hellwig,
	Nitesh Narayan Lal

On Thu, Oct 15, 2020 at 12:47:23PM -0600, Chris Friesen wrote:
> On 10/12/2020 1:07 PM, Keith Busch wrote:
> > On Mon, Oct 12, 2020 at 12:58:41PM -0600, Chris Friesen wrote:
> > > On 10/12/2020 11:50 AM, Thomas Gleixner wrote:
> > > > On Mon, Oct 12 2020 at 11:58, Bjorn Helgaas wrote:
> > > > > On Mon, Oct 12, 2020 at 09:49:37AM -0600, Chris Friesen wrote:
> > > > > > I've got a linux system running the RT kernel with threaded irqs.  On
> > > > > > startup we affine the various irq threads to the housekeeping CPUs, but I
> > > > > > recently hit a scenario where after some days of uptime we ended up with a
> > > > > > number of NVME irq threads affined to application cores instead (not good
> > > > > > when we're trying to run low-latency applications).
> > > > 
> > > > These threads and the associated interrupt vectors are completely
> > > > harmless and fully idle as long as there is nothing on those isolated
> > > > CPUs which does disk I/O.
> > > 
> > > Some of the irq threads are affined (by the kernel presumably) to multiple
> > > CPUs (nvme1q2 and nvme0q2 were both affined 0x38000038, a couple of other
> > > queues were affined 0x1c00001c0).
> > 
> > That means you have more CPUs than your controller has queues. When that
> > happens, some sharing of the queue resources among CPUs is required.
> 
> Is it required that every CPU is part of the mask for at least one queue?
>
> If we can preferentially route interrupts to the housekeeping CPUs (for
> queues with multiple CPUs in the mask), how is that different than just
> affining all the queues to the housekeeping CPUs and leaving the isolated
> CPUs out of the mask entirely?

The same mask is used for submission affinity. Any CPU can dispatch IO,
so every CPU has to be affined to a queue.
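
Condensed, the submission-side map is built from the same masks roughly
like this (close to what blk_mq_pci_map_queues() does; error handling
and the fallback path omitted):

  unsigned int queue, cpu;

  for (queue = 0; queue < qmap->nr_queues; queue++) {
          const struct cpumask *mask;

          mask = pci_irq_get_affinity(pdev, queue + offset);
          for_each_cpu(cpu, mask)
                  qmap->mq_map[cpu] = qmap->queue_offset + queue;
  }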

