linux-kernel.vger.kernel.org archive mirror
* [IRQ] IRQ affinity not working properly?
@ 2021-01-29 19:17 Chris Friesen
  2021-03-28 18:45 ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Friesen @ 2021-01-29 19:17 UTC (permalink / raw)
  To: Thomas Gleixner, LKML

Hi,

I'm not subscribed to the list, please cc me on replies.

I have a CentOS 7 Linux system with 48 logical CPUs and a number of 
Intel NICs running the i40e driver.  It was booted with 
irqaffinity=0-1,24-25 in the kernel boot args, resulting in 
/proc/irq/default_smp_affinity showing "0000,03000003".  CPUs 2-11 are 
set as "isolated" in the kernel boot args.  The irqbalance daemon is not 
running.
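(For reference, that default_smp_affinity value can be cross-checked 
against the boot argument; this is just a sketch that recomputes the 
expected mask from the CPU list, it doesn't read anything from the 
affected system.)

```shell
# Each CPU n sets bit n of the affinity mask, so CPUs 0-1,24-25
# should yield 0x03000003 in the low 32-bit word of the mask.
mask=0
for cpu in 0 1 24 25; do
    mask=$(( mask | (1 << cpu) ))
done
printf '%08x\n' "$mask"   # prints 03000003
```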

The problem I'm seeing is that /proc/interrupts shows iavf interrupts 
(associated with physical devices running the i40e driver) on CPUs 
other than the expected affinity.  For example, here are some iavf 
interrupts on CPU 4, where I would not expect to see any interrupts 
given that "cat /proc/irq/<NUM>/smp_affinity_list" reports "0-1,24-25" 
for all of these interrupts.

cat /proc/interrupts | grep -e CPU -e 941: -e 942: -e 943: -e 944: -e 945: -e 961: -e 962: -e 963: -e 964: -e 965:

              CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
 941:          0          0          0          0      28490          0   IR-PCI-MSI-edge  iavf-0000:b5:03.6:mbx
 942:          0          0          0          0     333832          0   IR-PCI-MSI-edge  iavf-net1-TxRx-0
 943:          0          0          0          0     300842          0   IR-PCI-MSI-edge  iavf-net1-TxRx-1
 944:          0          0          0          0     333845          0   IR-PCI-MSI-edge  iavf-net1-TxRx-2
 945:          0          0          0          0     333822          0   IR-PCI-MSI-edge  iavf-net1-TxRx-3
 961:          0          0          0          0      28492          0   IR-PCI-MSI-edge  iavf-0000:b5:02.7:mbx
 962:          0          0          0          0     435608          0   IR-PCI-MSI-edge  iavf-net1-TxRx-0
 963:          0          0          0          0     394832          0   IR-PCI-MSI-edge  iavf-net1-TxRx-1
 964:          0          0          0          0     398414          0   IR-PCI-MSI-edge  iavf-net1-TxRx-2
 965:          0          0          0          0     192847          0   IR-PCI-MSI-edge  iavf-net1-TxRx-3

There were IRQs coming in on the "iavf-0000:b5:02.7:mbx" interrupt at 
roughly 1 per second without any traffic, while the interrupt rate on 
the "iavf-net1-TxRx-<X>" seemed to be related to traffic.

Is this expected?  It seems like the IRQ subsystem is not respecting the 
configured SMP affinity for the interrupts in question.  I've also seen 
the same behaviour with igb interrupts.

Anyone have any ideas?

Thanks,

Chris

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [IRQ] IRQ affinity not working properly?
  2021-01-29 19:17 [IRQ] IRQ affinity not working properly? Chris Friesen
@ 2021-03-28 18:45 ` Thomas Gleixner
  2021-04-21 13:31   ` Nitesh Narayan Lal
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2021-03-28 18:45 UTC (permalink / raw)
  To: Chris Friesen, LKML

On Fri, Jan 29 2021 at 13:17, Chris Friesen wrote:
> I have a CentOS 7 linux system with 48 logical CPUs and a number of

Kernel version?

> Intel NICs running the i40e driver.  It was booted with 
> irqaffinity=0-1,24-25 in the kernel boot args, resulting in 
> /proc/irq/default_smp_affinity showing "0000,03000003".   CPUs 2-11 are 
> set as "isolated" in the kernel boot args.  The irqbalance daemon is not 
> running.
>
> The problem I'm seeing is that /proc/interrupts shows iavf interrupts 
> (associated with physical devices running the i40e driver) on other CPUs 
> than the expected affinity.  For example, here are some iavf interrupts 
> on CPU 4 where I would not expect to see any interrupts given that "cat 
> /proc/irq/<NUM>/smp_affinity_list" reports "0-1,24-25" for all these 
> interrupts.  (Sorry for the line wrapping.)
>
> cat /proc/interrupts | grep -e CPU -e 941: -e 942: -e 943: -e 944: -e 945: -e 961: -e 962: -e 963: -e 964: -e 965:
>
>               CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
>  941:          0          0          0          0      28490          0   IR-PCI-MSI-edge  iavf-0000:b5:03.6:mbx
>  942:          0          0          0          0     333832          0   IR-PCI-MSI-edge  iavf-net1-TxRx-0
>  943:          0          0          0          0     300842          0   IR-PCI-MSI-edge  iavf-net1-TxRx-1
>  944:          0          0          0          0     333845          0   IR-PCI-MSI-edge  iavf-net1-TxRx-2
>  945:          0          0          0          0     333822          0   IR-PCI-MSI-edge  iavf-net1-TxRx-3
>  961:          0          0          0          0      28492          0   IR-PCI-MSI-edge  iavf-0000:b5:02.7:mbx
>  962:          0          0          0          0     435608          0   IR-PCI-MSI-edge  iavf-net1-TxRx-0
>  963:          0          0          0          0     394832          0   IR-PCI-MSI-edge  iavf-net1-TxRx-1
>  964:          0          0          0          0     398414          0   IR-PCI-MSI-edge  iavf-net1-TxRx-2
>  965:          0          0          0          0     192847          0   IR-PCI-MSI-edge  iavf-net1-TxRx-3
>
> There were IRQs coming in on the "iavf-0000:b5:02.7:mbx" interrupt at 
> roughly 1 per second without any traffic, while the interrupt rate on 
> the "iavf-net1-TxRx-<X>" seemed to be related to traffic.
>
> Is this expected?  It seems like the IRQ subsystem is not respecting the 
> configured SMP affinity for the interrupt in question.  I've also seen 
> the same behaviour with igb interrupts.

No it's not expected. Do you see the same behaviour with a recent
mainline kernel, i.e. 5.10 or 5.11?

Thanks,

        tglx


* Re: [IRQ] IRQ affinity not working properly?
  2021-03-28 18:45 ` Thomas Gleixner
@ 2021-04-21 13:31   ` Nitesh Narayan Lal
  2021-04-22 15:42     ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Nitesh Narayan Lal @ 2021-04-21 13:31 UTC (permalink / raw)
  To: Thomas Gleixner, Chris Friesen, LKML, Jesse Brandeburg


On 3/28/21 2:45 PM, Thomas Gleixner wrote:
> On Fri, Jan 29 2021 at 13:17, Chris Friesen wrote:
>> I have a CentOS 7 linux system with 48 logical CPUs and a number of

<snip>

>>  961:          0          0          0          0      28492          0   IR-PCI-MSI-edge  iavf-0000:b5:02.7:mbx
>>  962:          0          0          0          0     435608          0   IR-PCI-MSI-edge  iavf-net1-TxRx-0
>>  963:          0          0          0          0     394832          0   IR-PCI-MSI-edge  iavf-net1-TxRx-1
>>  964:          0          0          0          0     398414          0   IR-PCI-MSI-edge  iavf-net1-TxRx-2
>>  965:          0          0          0          0     192847          0   IR-PCI-MSI-edge  iavf-net1-TxRx-3
>>
>> There were IRQs coming in on the "iavf-0000:b5:02.7:mbx" interrupt at 
>> roughly 1 per second without any traffic, while the interrupt rate on 
>> the "iavf-net1-TxRx-<X>" seemed to be related to traffic.
>>
>> Is this expected?  It seems like the IRQ subsystem is not respecting the 
>> configured SMP affinity for the interrupt in question.  I've also seen 
>> the same behaviour with igb interrupts.
> No it's not expected. Do you see the same behaviour with a recent
> mainline kernel, i.e. 5.10 or 5.11?
>
>

Jesse pointed me to this thread; apologies that it took a while for me
to respond here.

I agree it would be interesting to see which kernel version Chris is
using to reproduce the issue.

Initially, I thought that this issue was the same as the one that we have
been discussing in another thread [1].

However, in that case, the smp_affinity mask itself is incorrect and doesn't
follow the default smp_affinity mask (with irqbalance disabled).


[1] https://lore.kernel.org/lkml/1a044a14-0884-eedb-5d30-28b4bec24b23@redhat.com/
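(To make the distinction concrete, a comparison like this would tell the
two cases apart; the helper is hypothetical and the mask values are just
the ones from this report.)

```shell
# masks_equal A B: compare two /proc-style hex affinity masks, ignoring
# word separators and leading zeros ("0000,03000003" == "03000003").
masks_equal() {
    a=$(echo "$1" | tr -d ',' | sed 's/^0*//')
    b=$(echo "$2" | tr -d ',' | sed 's/^0*//')
    [ "$a" = "$b" ]
}
# In Chris's case the per-IRQ mask matches the default, so the bug must
# be elsewhere; in the other thread it would print "different".
masks_equal "0000,03000003" "03000003" && echo same || echo different
```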

-- 
Thanks
Nitesh



* Re: [IRQ] IRQ affinity not working properly?
  2021-04-21 13:31   ` Nitesh Narayan Lal
@ 2021-04-22 15:42     ` Thomas Gleixner
  2021-04-22 17:00       ` Chris Friesen
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2021-04-22 15:42 UTC (permalink / raw)
  To: Nitesh Narayan Lal, Chris Friesen, LKML, Jesse Brandeburg

On Wed, Apr 21 2021 at 09:31, Nitesh Narayan Lal wrote:
> On 3/28/21 2:45 PM, Thomas Gleixner wrote:
>> On Fri, Jan 29 2021 at 13:17, Chris Friesen wrote:
>>> I have a CentOS 7 linux system with 48 logical CPUs and a number of
>
> <snip>
>
>>>  961:          0          0          0          0      28492          0   IR-PCI-MSI-edge  iavf-0000:b5:02.7:mbx
>>>  962:          0          0          0          0     435608          0   IR-PCI-MSI-edge  iavf-net1-TxRx-0
>>>  963:          0          0          0          0     394832          0   IR-PCI-MSI-edge  iavf-net1-TxRx-1
>>>  964:          0          0          0          0     398414          0   IR-PCI-MSI-edge  iavf-net1-TxRx-2
>>>  965:          0          0          0          0     192847          0   IR-PCI-MSI-edge  iavf-net1-TxRx-3
>>>
>>> There were IRQs coming in on the "iavf-0000:b5:02.7:mbx" interrupt at 
>>> roughly 1 per second without any traffic, while the interrupt rate on 
>>> the "iavf-net1-TxRx-<X>" seemed to be related to traffic.
>>>
>>> Is this expected?  It seems like the IRQ subsystem is not respecting the 
>>> configured SMP affinity for the interrupt in question.  I've also seen 
>>> the same behaviour with igb interrupts.
>> No it's not expected. Do you see the same behaviour with a recent
>> mainline kernel, i.e. 5.10 or 5.11?
>>
>>
> Jesse pointed me to this thread and apologies that it took a while for me
> to respond here.
>
> I agree it will be interesting to see with which kernel version Chris is
> reproducing the issue.

And the output of

 /proc/irq/$NUMBER/smp_affinity_list
 /proc/irq/$NUMBER/effective_affinity_list
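
(Something along these lines would flag any vector where the two differ;
a sketch only, with the helper name and sample values made up here.  On
the affected box the two inputs would come from the smp_affinity_list
and effective_affinity_list files above, e.g. in a loop over the IRQ
numbers from the report.)

```shell
# check_affinity IRQ REQ EFF: print a warning when the effective
# affinity differs from the requested one (both in list format).
check_affinity() {
    irq=$1; req=$2; eff=$3
    if [ "$req" = "$eff" ]; then
        printf 'IRQ %s ok (%s)\n' "$irq" "$req"
    else
        printf 'IRQ %s MISMATCH: requested %s, effective %s\n' "$irq" "$req" "$eff"
    fi
}
# Sample values matching the symptom in this report:
check_affinity 961 "0-1,24-25" "4"
```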

> Initially, I thought that this issue is the same as the one that we have
> been discussing in another thread [1].
>
> However, in that case, the smp affinity mask itself is incorrect and doesn't
> follow the default smp affinity mask (with irqbalance disabled).

That's the question...

Thanks,

        tglx


* Re: [IRQ] IRQ affinity not working properly?
  2021-04-22 15:42     ` Thomas Gleixner
@ 2021-04-22 17:00       ` Chris Friesen
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Friesen @ 2021-04-22 17:00 UTC (permalink / raw)
  To: Thomas Gleixner, Nitesh Narayan Lal, LKML, Jesse Brandeburg

On 4/22/2021 9:42 AM, Thomas Gleixner wrote:
> On Wed, Apr 21 2021 at 09:31, Nitesh Narayan Lal wrote:
>> I agree it will be interesting to see with which kernel version Chris is
>> reproducing the issue.
> 
> And the output of
> 
>   /proc/irq/$NUMBER/smp_affinity_list
>   /proc/irq/$NUMBER/effective_affinity_list

I haven't forgotten about this, but I've had other priorities.  Hoping 
to get back to it in May sometime.

Chris


end of thread, other threads:[~2021-04-22 17:01 UTC | newest]

Thread overview: 5+ messages
-- links below jump to the message on this page --
2021-01-29 19:17 [IRQ] IRQ affinity not working properly? Chris Friesen
2021-03-28 18:45 ` Thomas Gleixner
2021-04-21 13:31   ` Nitesh Narayan Lal
2021-04-22 15:42     ` Thomas Gleixner
2021-04-22 17:00       ` Chris Friesen
