netdev.vger.kernel.org archive mirror
* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
       [not found]                               ` <1a044a14-0884-eedb-5d30-28b4bec24b23@redhat.com>
@ 2021-04-08 18:49                                 ` Nitesh Narayan Lal
  2021-04-14 16:11                                 ` Jesse Brandeburg
  1 sibling, 0 replies; 12+ messages in thread
From: Nitesh Narayan Lal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: Jesse Brandeburg, Marcelo Tosatti, Thomas Gleixner, frederic
  Cc: Robin Murphy, linux-kernel, linux-api, juri.lelli, abelits,
	bhelgaas, linux-pci, rostedt, mingo, peterz, davem, akpm, sfr,
	stephen, rppt, jinyuqi, zhangshaokun, Network Development,
	sassmann, Yang, Lihong


On 4/7/21 11:18 AM, Nitesh Narayan Lal wrote:
> On 4/6/21 1:22 PM, Jesse Brandeburg wrote:
>> Continuing a thread from a bit ago...
>>
>> Nitesh Narayan Lal wrote:
>>
>>>> After a little more digging, I found out why cpumask_local_spread change
>>>> affects the general/initial smp_affinity for certain device IRQs.
>>>>
>>>> After the introduction of the commit:
>>>>
>>>>     e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
>>>>
>>> Continuing the conversation about the above commit and adding Jesse.
>>> I was trying to understand the problem that the commit message explains
>>> "The default behavior of the kernel is somewhat undesirable as all
>>> requested interrupts end up on CPU0 after registration." I have also been
>>> trying to reproduce this behavior without the patch, but I failed to do
>>> so, maybe because I am missing something here.
>>>
>>> @Jesse Can you please explain? FWIU IRQ affinity should be decided based on
>>> the default affinity mask.
> Thanks, Jesse, for responding.
>
>> The original issue as seen, was that if you rmmod/insmod a driver
>> *without* irqbalance running, the default irq mask is -1, which means
>> any CPU. The older kernels (this issue was patched in 2014) used to use
>> that affinity mask, but the value programmed into all the interrupt
>> registers "actual affinity" would end up delivering all interrupts to
>> CPU0,
> So does that mean the affinity mask for the IRQs was different wrt where
> the IRQs were actually delivered?
> Or, the affinity mask itself for the IRQs after rmmod, insmod was changed
> to 0 instead of -1?
>
> I did a quick test on top of 5.12.0-rc6 by comparing the i40e IRQ affinity
> mask before removing the kernel module and after doing rmmod+insmod
> and didn't find any difference.
>
>>  and if the machine was under incoming traffic load when the
>> driver loaded, CPU0 would start to poll among all the different netdev
>> queues, all on CPU0.
>>
>> The above then leads to the condition that the device is stuck polling
>> even if the affinity gets updated from user space, and the polling will
>> continue until traffic stops.
>>
>>> The problem with the commit is that when we overwrite the affinity mask
>>> based on the hinting mask we completely ignore the default SMP affinity
>>> mask. If we do want to overwrite the affinity based on the hint mask we
>>> should at least consider the default SMP affinity.
> For the issue where the IRQs don't follow the default_smp_affinity mask
> because of this patch, the following are the steps by which it can be easily
> reproduced with the latest linux kernel:
>
> # Kernel
> 5.12.0-rc6+
>
> # Other parameters in the cmdline
> isolcpus=2-39,44-79 nohz=on nohz_full=2-39,44-79
> rcu_nocbs=2-39,44-79
>
> # cat /proc/irq/default_smp_affinity
> 0000,00000f00,00000003 [Corresponds to HK CPUs - 0, 1, 40, 41, 42 and 43]
>
> # Create VFs and check IRQ affinity mask
>
> /proc/irq/1423/iavf-ens1f1v3-TxRx-3
> 3
> /proc/irq/1424/iavf-0000:3b:0b.0:mbx
> 0
> 40
> 42
> /proc/irq/1425/iavf-ens1f1v8-TxRx-0
> 0
> /proc/irq/1426/iavf-ens1f1v8-TxRx-1
> 1
> /proc/irq/1427/iavf-ens1f1v8-TxRx-2
> 2
> /proc/irq/1428/iavf-ens1f1v8-TxRx-3
> 3
> ...
> /proc/irq/1475/iavf-ens1f1v15-TxRx-0
> 0
> /proc/irq/1476/iavf-ens1f1v15-TxRx-1
> 1
> /proc/irq/1477/iavf-ens1f1v15-TxRx-2
> 2
> /proc/irq/1478/iavf-ens1f1v15-TxRx-3
> 3
> /proc/irq/1479/iavf-0000:3b:0a.0:mbx
> 0
> 40
> 42
> ...
> /proc/irq/240/iavf-ens1f1v3-TxRx-0
> 0
> /proc/irq/248/iavf-ens1f1v3-TxRx-1
> 1
> /proc/irq/249/iavf-ens1f1v3-TxRx-2
> 2
>
>
> Trace dump:
> ----------
> ..
> 11551082:  NetworkManager-1734  [040]  8167.465719: vector_activate:    
>             irq=1478 is_managed=0 can_reserve=1 reserve=0
> 11551090:  NetworkManager-1734  [040]  8167.465720: vector_alloc:
>             irq=1478 vector=65 reserved=1 ret=0
> 11551093:  NetworkManager-1734  [040]  8167.465721: vector_update:      
>             irq=1478 vector=65 cpu=42 prev_vector=0 prev_cpu=0
> 11551097:  NetworkManager-1734  [040]  8167.465721: vector_config:      
>             irq=1478 vector=65 cpu=42 apicdest=0x00000200
> 11551357:  NetworkManager-1734  [040]  8167.465768: vector_alloc:        
>             irq=1478 vector=46 reserved=0 ret=0
>
> 11551360:  NetworkManager-1734  [040]  8167.465769: vector_update:      
>             irq=1478 vector=46 cpu=3 prev_vector=65 prev_cpu=42
>
> 11551364:  NetworkManager-1734  [040]  8167.465770: vector_config:      
>             irq=1478 vector=46 cpu=3 apicdest=0x00040100
> ..
>
> As we can see in the above trace the initial affinity for the IRQ 1478 was
> correctly set as per the default_smp_affinity mask which includes CPU 42,
> however, later on, it is updated with CPU3 which is returned from
> cpumask_local_spread().
>
>> Maybe the right thing is to fix which CPUs are passed in as the valid
>> mask, or make sure the kernel cross checks that what the driver asks
>> for is a "valid CPU"?
>>
> Sure, if we can still reproduce the problem that your patch was fixing then
> maybe we can consider adding a new API like cpumask_local_spread_irq in
> which we should consider the default_smp_affinity mask as well before returning
> the CPU.
>

Didn't realize that netdev ml was not included, so adding that.

-- 
Nitesh


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
       [not found]                               ` <1a044a14-0884-eedb-5d30-28b4bec24b23@redhat.com>
  2021-04-08 18:49                                 ` [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs Nitesh Narayan Lal
@ 2021-04-14 16:11                                 ` Jesse Brandeburg
  2021-04-15 22:11                                   ` Nitesh Narayan Lal
  1 sibling, 1 reply; 12+ messages in thread
From: Jesse Brandeburg @ 2021-04-14 16:11 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: Marcelo Tosatti, Thomas Gleixner, frederic, Robin Murphy,
	linux-kernel, linux-api, juri.lelli, abelits, bhelgaas,
	linux-pci, rostedt, mingo, peterz, davem, akpm, sfr, stephen,
	rppt, jinyuqi, zhangshaokun, netdev, chris.friesen

Nitesh Narayan Lal wrote:

> > The original issue as seen, was that if you rmmod/insmod a driver
> > *without* irqbalance running, the default irq mask is -1, which means
> > any CPU. The older kernels (this issue was patched in 2014) used to use
> > that affinity mask, but the value programmed into all the interrupt
> > registers "actual affinity" would end up delivering all interrupts to
> > CPU0,
> 
> So does that mean the affinity mask for the IRQs was different wrt where
> the IRQs were actually delivered?
> Or, the affinity mask itself for the IRQs after rmmod, insmod was changed
> to 0 instead of -1?

The smp_affinity was 0xfff, and the kernel chooses which CPU to place the
interrupt on, among any of the bits set.

 
> I did a quick test on top of 5.12.0-rc6 by comparing the i40e IRQ affinity
> mask before removing the kernel module and after doing rmmod+insmod
> and didn't find any difference.

with the patch in question removed? Sorry, I'm confused what you tried.

> 
> >  and if the machine was under incoming traffic load when the
> > driver loaded, CPU0 would start to poll among all the different netdev
> > queues, all on CPU0.
> >
> > The above then leads to the condition that the device is stuck polling
> > even if the affinity gets updated from user space, and the polling will
> > continue until traffic stops.
> >
> >> The problem with the commit is that when we overwrite the affinity mask
> >> based on the hinting mask we completely ignore the default SMP affinity
> >> mask. If we do want to overwrite the affinity based on the hint mask we
> >> should at least consider the default SMP affinity.
> 
> For the issue where the IRQs don't follow the default_smp_affinity mask
> because of this patch, the following are the steps by which it can be easily
> reproduced with the latest linux kernel:
> 
> # Kernel
> 5.12.0-rc6+

<snip>

> As we can see in the above trace the initial affinity for the IRQ 1478 was
> correctly set as per the default_smp_affinity mask which includes CPU 42,
> however, later on, it is updated with CPU3 which is returned from
> cpumask_local_spread().
> 
> > Maybe the right thing is to fix which CPUs are passed in as the valid
> > mask, or make sure the kernel cross checks that what the driver asks
> > for is a "valid CPU"?
> >
> 
> Sure, if we can still reproduce the problem that your patch was fixing then
> maybe we can consider adding a new API like cpumask_local_spread_irq in
> which we should consider the default_smp_affinity mask as well before returning
> the CPU.

I'm sure I don't have a reproducer of the original problem any more; it
was lost somewhere 8 years ago. I'd like to be able to repro the original
issue, but I can't.

Your description of the problem makes it obvious there is an issue. It
appears as if cpumask_local_spread() is the wrong function to use here.
If you have any suggestions please let me know.

We had one other report of this problem as well (I'm not sure if it's
the same as your report)
https://lkml.org/lkml/2021/3/28/206
https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210125/023120.html



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-14 16:11                                 ` Jesse Brandeburg
@ 2021-04-15 22:11                                   ` Nitesh Narayan Lal
  2021-04-29 21:44                                     ` Nitesh Lal
  0 siblings, 1 reply; 12+ messages in thread
From: Nitesh Narayan Lal @ 2021-04-15 22:11 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Marcelo Tosatti, Thomas Gleixner, frederic, Robin Murphy,
	linux-kernel, linux-api, juri.lelli, abelits, bhelgaas,
	linux-pci, rostedt, mingo, peterz, davem, akpm, sfr, stephen,
	rppt, jinyuqi, zhangshaokun, netdev, chris.friesen


On 4/14/21 12:11 PM, Jesse Brandeburg wrote:
> Nitesh Narayan Lal wrote:
>
>>> The original issue as seen, was that if you rmmod/insmod a driver
>>> *without* irqbalance running, the default irq mask is -1, which means
>>> any CPU. The older kernels (this issue was patched in 2014) used to use
>>> that affinity mask, but the value programmed into all the interrupt
>>> registers "actual affinity" would end up delivering all interrupts to
>>> CPU0,
>> So does that mean the affinity mask for the IRQs was different wrt where
>> the IRQs were actually delivered?
>> Or, the affinity mask itself for the IRQs after rmmod, insmod was changed
>> to 0 instead of -1?
> The smp_affinity was 0xfff, and the kernel chooses which CPU to place the
> interrupt on, among any of the bits set.


I think what you are referring to here is the effective affinity mask.
From one of Thomas's commit messages that you pointed me to:

"The affinity mask is either the system-wide default or set by userspace,
but the architecture can or even must reduce the mask to the effective set,
which means that checking the affinity mask itself does not really tell
about the actual target CPUs."

I was looking into the code changes around IRQs, and there was a major
rework from Thomas in 2017. I recently tried testing the kernel just before
those patches got merged.

Specifically on top of
05161b9cbe5:     x86/irq: Get rid of the 'first_system_vector' indirection
                 bogosity

On the box where I tested this, I was able to see the effective affinity
being set to 0 (not SMP affinity) for several network device IRQs.

and I think the reason for it is the usage of "cpumask_first_and(mask,
cpu_online_mask)" in __assign_irq_vector().

But with the latest kernel, this has been replaced and that's why I didn't
see the effective affinity being set to only 0 for all of the device IRQs.

Having said that, I am still not sure if I was able to mimic what you have
seen in the past. But it looked similar to what you were explaining.
What do you think?



>
>  
>> I did a quick test on top of 5.12.0-rc6 by comparing the i40e IRQ affinity
>> mask before removing the kernel module and after doing rmmod+insmod
>> and didn't find any difference.
> with the patch in question removed? Sorry, I'm confused what you tried.

Yeah, but I was only referring to the SMP affinity mask. Please see more
up-to-date testing results above.

>>>  and if the machine was under incoming traffic load when the
>>> driver loaded, CPU0 would start to poll among all the different netdev
>>> queues, all on CPU0.
>>>
>>> The above then leads to the condition that the device is stuck polling
>>> even if the affinity gets updated from user space, and the polling will
>>> continue until traffic stops.
>>>

[...]

>> As we can see in the above trace the initial affinity for the IRQ 1478 was
>> correctly set as per the default_smp_affinity mask which includes CPU 42,
>> however, later on, it is updated with CPU3 which is returned from
>> cpumask_local_spread().
>>
>>> Maybe the right thing is to fix which CPUs are passed in as the valid
>>> mask, or make sure the kernel cross checks that what the driver asks
>>> for is a "valid CPU"?
>>>
>> Sure, if we can still reproduce the problem that your patch was fixing then
>> maybe we can consider adding a new API like cpumask_local_spread_irq in
>> which we should consider the default_smp_affinity mask as well before returning
>> the CPU.
> I'm sure I don't have a reproducer of the original problem any more; it
> was lost somewhere 8 years ago. I'd like to be able to repro the original
> issue, but I can't.
>
> Your description of the problem makes it obvious there is an issue. It
> appears as if cpumask_local_spread() is the wrong function to use here.
> If you have any suggestions please let me know.
>
> We had one other report of this problem as well (I'm not sure if it's
> the same as your report)
> https://lkml.org/lkml/2021/3/28/206
> https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210125/023120.html


How about we introduce a new API just for IRQ spreading,
cpumask_local_spread_irq() and then utilize the default_smp_affinity mask
in that before returning the CPU?

Although, I think the right way to deal with this would be to fix this from
the source that is where the CPU mask is assigned to an IRQ for the very
first time.


-- 
Thanks
Nitesh



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-15 22:11                                   ` Nitesh Narayan Lal
@ 2021-04-29 21:44                                     ` Nitesh Lal
  2021-04-30  1:48                                       ` Jesse Brandeburg
  2021-04-30  7:10                                       ` Thomas Gleixner
  0 siblings, 2 replies; 12+ messages in thread
From: Nitesh Lal @ 2021-04-29 21:44 UTC (permalink / raw)
  To: Jesse Brandeburg, Thomas Gleixner, frederic, juri.lelli,
	Marcelo Tosatti, abelits
  Cc: Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

On Thu, Apr 15, 2021, at 6:11 PM Nitesh Narayan Lal <nitesh@redhat.com> wrote:
>
>
> On 4/14/21 12:11 PM, Jesse Brandeburg wrote:
> > Nitesh Narayan Lal wrote:
> >
> >>> The original issue as seen, was that if you rmmod/insmod a driver
> >>> *without* irqbalance running, the default irq mask is -1, which means
> >>> any CPU. The older kernels (this issue was patched in 2014) used to use
> >>> that affinity mask, but the value programmed into all the interrupt
> >>> registers "actual affinity" would end up delivering all interrupts to
> >>> CPU0,
> >> So does that mean the affinity mask for the IRQs was different wrt where
> >> the IRQs were actually delivered?
> >> Or, the affinity mask itself for the IRQs after rmmod, insmod was changed
> >> to 0 instead of -1?
> > The smp_affinity was 0xfff, and the kernel chooses which CPU to place the
> > interrupt on, among any of the bits set.
>
>

<snip>

> >
> > Your description of the problem makes it obvious there is an issue. It
> > appears as if cpumask_local_spread() is the wrong function to use here.
> > If you have any suggestions please let me know.
> >
> > We had one other report of this problem as well (I'm not sure if it's
> > the same as your report)
> > https://lkml.org/lkml/2021/3/28/206
> > https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210125/023120.html
>
>

So to understand further what the problem was with the older kernel based
on Jesse's description and whether it is still there, I did some more
digging. Following are some of the findings (kindly correct me if
there is a gap in my understanding):

Part-1: Why there was a problem with the older kernel?
------
With a kernel built on top of the tag v4.0.0 (with Jesse's patch reverted
and irqbalance disabled), if we observe /proc/irq for the ixgbe device IRQs
then there are two things to note:

# No separate effective affinity (Since it has been introduced as a part of
  the 2017 IRQ re-work)
  $ ls /proc/irq/86/
    affinity_hint  node  p2p1  smp_affinity  smp_affinity_list  spurious

# Multiple CPUs are set in the smp_affinity_list and the first CPU is CPU0:

  $ /proc/irq/60/p2p1-TxRx-0
    0,2,4,6,8,10,12,14,16,18,20,22

  $ /proc/irq/61/p2p1-TxRx-1
    0,2,4,6,8,10,12,14,16,18,20,22

  $ /proc/irq/62/p2p1-TxRx-2
    0,2,4,6,8,10,12,14,16,18,20,22
     ...


Now, if we read the commit message from Thomas's patch that was part of
this IRQ re-work:
fdba46ff:  x86/apic: Get rid of multi CPU affinity
"
..
2) Experiments have shown that the benefit of multi CPU affinity is close
   to zero and in some tests even worse than setting the affinity to a single
   CPU.

The reason for this is that the delivery targets the APIC with the lowest
ID first and only if that APIC is busy (servicing an interrupt, i.e. ISR is
not empty) it hands it over to the next APIC. In the conducted tests the
vast majority of interrupts ends up on the APIC with the lowest ID anyway,
so there is no natural spreading of the interrupts possible.
"

I think this explains why even if we have multiple CPUs in the SMP affinity
mask the interrupts may only land on CPU0.

With Jesse's patch in the kernel, the initial affinity mask that included
multiple CPUs is overwritten with a single CPU. This CPU was previously
selected based on vector_index; later on, this was replaced with logic
where the CPU is fetched from cpumask_local_spread(). Hence, in this
case, the interrupts are spread across different CPUs.

# listing the IRQ smp_affinity_list on the v4.0.0 kernel with Jesse's patch
  $ /proc/irq/60/p2p1-TxRx-0
    0
  $ /proc/irq/61/p2p1-TxRx-1
    1
  $ /proc/irq/62/p2p1-TxRx-2
    2
    ...
  $ /proc/irq/65/p2p1-TxRx-5
    5
  $ /proc/irq/66/p2p1-TxRx-6
    6
   ...



Part-2: Why this may not be a problem anymore?
------
With the latest kernel, if we check the effective_affinity_list for i40e
IRQs without irqbalance and with Jesse's patch reverted, it is already set
to a single CPU that is not always 0. This CPU is retrieved based on the
vector allocation count, i.e., we get the CPU that has the lowest vector
allocation count.

  $ /proc/irq/100/i40e-eno1-TxRx-5
    28
  $ /proc/irq/101/i40e-eno1-TxRx-6
    30
  $ /proc/irq/102/i40e-eno1-TxRx-7
    32
    …
  $ /proc/irq/121/i40e-eno1-TxRx-18
    16
  $ /proc/irq/122/i40e-eno1-TxRx-19
    18
    ..

@Jesse do you think the Part-1 findings explain the behavior that you have
observed in the past?

Also, let me know if there are any suggestions or experiments to try here.


--
Thanks
Nitesh



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-29 21:44                                     ` Nitesh Lal
@ 2021-04-30  1:48                                       ` Jesse Brandeburg
  2021-04-30 13:10                                         ` Nitesh Lal
  2021-04-30  7:10                                       ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: Jesse Brandeburg @ 2021-04-30  1:48 UTC (permalink / raw)
  To: Nitesh Lal
  Cc: Thomas Gleixner, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

Nitesh Lal wrote:

> @Jesse do you think the Part-1 findings explain the behavior that you have
> observed in the past?
> 
> Also, let me know if there are any suggestions or experiments to try here.

Wow Nitesh, nice work! That's quite a bit of spelunking you had to do
there!

Your results showing the older kernels' issues with ranged affinity are
consistent with what I remember from that time, and with the original
problem.

I'm glad to see that a) Thomas fixed the kernel to even do better than
ranged affinity masks, and that b) if you revert my patch, the new
behavior is better and still maintains the fix from a).

For me this explains the whole picture and makes me feel comfortable
with the patch that reverts the initial affinity mask (that also
introduces a subtle bug with the reserved CPUs that I believe you've
noted already).

Thanks for this work!
Jesse


* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-29 21:44                                     ` Nitesh Lal
  2021-04-30  1:48                                       ` Jesse Brandeburg
@ 2021-04-30  7:10                                       ` Thomas Gleixner
  2021-04-30 16:14                                         ` Nitesh Lal
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2021-04-30  7:10 UTC (permalink / raw)
  To: Nitesh Lal, Jesse Brandeburg, frederic, juri.lelli,
	Marcelo Tosatti, abelits
  Cc: Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

Nitesh,

On Thu, Apr 29 2021 at 17:44, Nitesh Lal wrote:

First of all: Nice analysis, well done!

> So to understand further what the problem was with the older kernel based
> on Jesse's description and whether it is still there, I did some more
> digging. Following are some of the findings (kindly correct me if
> there is a gap in my understanding):
>
> Part-1: Why there was a problem with the older kernel?
> ------
> With a kernel built on top of the tag v4.0.0 (with Jesse's patch reverted
> and irqbalance disabled), if we observe /proc/irq for the ixgbe device IRQs
> then there are two things to note:
>
> # No separate effective affinity (Since it has been introduced as a part of
>   the 2017 IRQ re-work)
>   $ ls /proc/irq/86/
>     affinity_hint  node  p2p1  smp_affinity  smp_affinity_list  spurious
>
> # Multiple CPUs are set in the smp_affinity_list and the first CPU is CPU0:
>
>   $ /proc/irq/60/p2p1-TxRx-0
>     0,2,4,6,8,10,12,14,16,18,20,22
>
>   $ /proc/irq/61/p2p1-TxRx-1
>     0,2,4,6,8,10,12,14,16,18,20,22
>
>   $ /proc/irq/62/p2p1-TxRx-2
>     0,2,4,6,8,10,12,14,16,18,20,22
>      ...
>
>
> Now,  if we read the commit message from Thomas's patch that was part of
> this IRQ re-work:
> fdba46ff:  x86/apic: Get rid of multi CPU affinity
> "
> ..
> 2) Experiments have shown that the benefit of multi CPU affinity is close
>    to zero and in some tests even worse than setting the affinity to a single
>    CPU.
>
> The reason for this is that the delivery targets the APIC with the lowest
> ID first and only if that APIC is busy (servicing an interrupt, i.e. ISR is
> not empty) it hands it over to the next APIC. In the conducted tests the
> vast majority of interrupts ends up on the APIC with the lowest ID anyway,
> so there is no natural spreading of the interrupts possible.
> "
>
> I think this explains why even if we have multiple CPUs in the SMP affinity
> mask the interrupts may only land on CPU0.

There are two issues in the pre rework vector management:

  1) The allocation logic itself which preferred lower numbered CPUs and
     did not try to spread out the vectors across CPUs. This was pretty
     much true for any APIC addressing mode.

  2) The multi CPU affinity support if supported by the APIC
     mode. That's restricted to logical APIC addressing mode. That is
     available for non X2APIC up to 8 CPUs and with X2APIC it requires
     to be in cluster mode.
     
     All other addressing modes had a single CPU target selected under
     the hood which due to #1 was ending up on CPU0 most of the time at
     least up to the point where it still had vectors available.

     Also logical addressing mode with multiple target CPUs was subject
     to #1 and due to the delivery logic the lowest numbered CPU (APIC)
     was where most interrupts ended up.

Thanks,

        tglx


* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-30  1:48                                       ` Jesse Brandeburg
@ 2021-04-30 13:10                                         ` Nitesh Lal
  0 siblings, 0 replies; 12+ messages in thread
From: Nitesh Lal @ 2021-04-30 13:10 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Thomas Gleixner, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

On Thu, Apr 29, 2021 at 9:48 PM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>
> Nitesh Lal wrote:
>
> > @Jesse do you think the Part-1 findings explain the behavior that you have
> > observed in the past?
> >
> > Also, let me know if there are any suggestions or experiments to try here.
>
> Wow Nitesh, nice work! That's quite a bit of spelunking you had to do
> there!
>
> Your results showing the older kernels' issues with ranged affinity are
> consistent with what I remember from that time, and with the original
> problem.

That's nice.

>
> I'm glad to see that a) Thomas fixed the kernel to even do better than
> ranged affinity masks, and that b) if you revert my patch, the new
> behavior is better and still maintains the fix from a).

Right, the interrupts are naturally spread now.

>
> For me this explains the whole picture and makes me feel comfortable
> with the patch that reverts the initial affinity mask (that also
> introduces a subtle bug with the reserved CPUs that I believe you've
> noted already).
>

Thank you for confirming!

--
Nitesh



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-30  7:10                                       ` Thomas Gleixner
@ 2021-04-30 16:14                                         ` Nitesh Lal
  2021-04-30 18:21                                           ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Nitesh Lal @ 2021-04-30 16:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jesse Brandeburg, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

On Fri, Apr 30, 2021 at 3:10 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Nitesh,
>
> On Thu, Apr 29 2021 at 17:44, Nitesh Lal wrote:
>
> First of all: Nice analysis, well done!

Thanks, Thomas.

>
> > So to understand further what the problem was with the older kernel based
> > on Jesse's description and whether it is still there, I did some more
> > digging. Following are some of the findings (kindly correct me if
> > there is a gap in my understanding):

<snip>

> >
> > I think this explains why even if we have multiple CPUs in the SMP affinity
> > mask the interrupts may only land on CPU0.
>
> There are two issues in the pre rework vector management:
>
>   1) The allocation logic itself which preferred lower numbered CPUs and
>      did not try to spread out the vectors across CPUs. This was pretty
>      much true for any APIC addressing mode.
>
>   2) The multi CPU affinity support if supported by the APIC
>      mode. That's restricted to logical APIC addressing mode. That is
>      available for non X2APIC up to 8 CPUs and with X2APIC it requires
>      to be in cluster mode.
>
>      All other addressing modes had a single CPU target selected under
>      the hood which due to #1 was ending up on CPU0 most of the time at
>      least up to the point where it still had vectors available.
>
>      Also logical addressing mode with multiple target CPUs was subject
>      to #1 and due to the delivery logic the lowest numbered CPU (APIC)
>      was where most interrupts ended up.
>

Right, thank you for confirming.


Based on this analysis and the fact that with your re-work the interrupts
seem to be naturally spread across the CPUs, will it be safe to revert
Jesse's patch

e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()

as it overwrites the previously set IRQ affinity mask for some of the
devices?

IMHO if we think that this patch is still solving some issue other than
what Jesse has mentioned then perhaps we should reproduce that and fix it
directly from the request_irq code path.

-- 
Nitesh



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-30 16:14                                         ` Nitesh Lal
@ 2021-04-30 18:21                                           ` Thomas Gleixner
  2021-04-30 21:07                                             ` Nitesh Lal
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2021-04-30 18:21 UTC (permalink / raw)
  To: Nitesh Lal
  Cc: Jesse Brandeburg, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

Nitesh,

On Fri, Apr 30 2021 at 12:14, Nitesh Lal wrote:
> Based on this analysis and the fact that with your re-work the interrupts
> seem to be naturally spread across the CPUs, will it be safe to revert
> Jesse's patch
>
> e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
>
> as it overwrites the previously set IRQ affinity mask for some of the
> devices?

That's a good question. My gut feeling says yes.

> IMHO if we think that this patch is still solving some issue other than
> what Jesse has mentioned then perhaps we should reproduce that and fix it
> directly from the request_irq code path.

Makes sense.

Thanks,

        tglx


* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-30 18:21                                           ` Thomas Gleixner
@ 2021-04-30 21:07                                             ` Nitesh Lal
  2021-05-01  2:21                                               ` Jesse Brandeburg
  0 siblings, 1 reply; 12+ messages in thread
From: Nitesh Lal @ 2021-04-30 21:07 UTC (permalink / raw)
  To: Thomas Gleixner, Jesse Brandeburg
  Cc: frederic, juri.lelli, Marcelo Tosatti, abelits, Robin Murphy,
	linux-kernel, linux-api, bhelgaas, linux-pci, rostedt, mingo,
	peterz, davem, akpm, sfr, stephen, rppt, jinyuqi, zhangshaokun,
	netdev, chris.friesen

On Fri, Apr 30, 2021 at 2:21 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Nitesh,
>
> On Fri, Apr 30 2021 at 12:14, Nitesh Lal wrote:
> > Based on this analysis and the fact that with your re-work the interrupts
> > seem to be naturally spread across the CPUs, will it be safe to revert
> > Jesse's patch
> >
> > e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
> >
> > as it overwrites the previously set IRQ affinity mask for some of the
> > devices?
>
> That's a good question. My gut feeling says yes.
>

Jesse, do you want to send the revert for the patch?

Also, I think it was you who suggested cc'ing the intel-wired-lan ML
so that Intel folks can do some initial testing?
If so, we can do that here (IMHO).

> > IMHO if we think that this patch is still solving some issue other than
> > what Jesse has mentioned, then perhaps we should reproduce that and fix it
> > directly in the request_irq code path.
>
> Makes sense.
>
> Thanks,
>
>         tglx
>


-- 
Thanks
Nitesh



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-30 21:07                                             ` Nitesh Lal
@ 2021-05-01  2:21                                               ` Jesse Brandeburg
  2021-05-03 13:15                                                 ` Nitesh Lal
  0 siblings, 1 reply; 12+ messages in thread
From: Jesse Brandeburg @ 2021-05-01  2:21 UTC (permalink / raw)
  To: Nitesh Lal
  Cc: Thomas Gleixner, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

Nitesh Lal wrote:

> On Fri, Apr 30, 2021 at 2:21 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Nitesh,
> >
> > On Fri, Apr 30 2021 at 12:14, Nitesh Lal wrote:
> > > Based on this analysis and the fact that with your re-work the interrupts
> > > seem to be naturally spread across the CPUs, will it be safe to revert
> > > Jesse's patch
> > >
> > > e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
> > >
> > > as it overwrites the previously set IRQ affinity mask for some of the
> > > devices?
> >
> > That's a good question. My gut feeling says yes.
> >
> 
> Jesse, do you want to send the revert for the patch?
> 
> Also, I think it was you who suggested cc'ing the intel-wired-lan ML
> so that Intel folks can do some initial testing?
> If so, we can do that here (IMHO).

Patch sent here:
https://lore.kernel.org/lkml/20210501021832.743094-1-jesse.brandeburg@intel.com/T/#u

Any testing appreciated!

Jesse


* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-05-01  2:21                                               ` Jesse Brandeburg
@ 2021-05-03 13:15                                                 ` Nitesh Lal
  0 siblings, 0 replies; 12+ messages in thread
From: Nitesh Lal @ 2021-05-03 13:15 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Thomas Gleixner, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

On Fri, Apr 30, 2021 at 10:21 PM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>
> Nitesh Lal wrote:
>
> > On Fri, Apr 30, 2021 at 2:21 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > Nitesh,
> > >
> > > On Fri, Apr 30 2021 at 12:14, Nitesh Lal wrote:
> > > > Based on this analysis and the fact that with your re-work the interrupts
> > > > seem to be naturally spread across the CPUs, will it be safe to revert
> > > > Jesse's patch
> > > >
> > > > e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
> > > >
> > > > as it overwrites the previously set IRQ affinity mask for some of the
> > > > devices?
> > >
> > > That's a good question. My gut feeling says yes.
> > >
> >
> > Jesse, do you want to send the revert for the patch?
> >
> > Also, I think it was you who suggested cc'ing the intel-wired-lan ML
> > so that Intel folks can do some initial testing?
> > If so, we can do that here (IMHO).
>
> Patch sent here:
> https://lore.kernel.org/lkml/20210501021832.743094-1-jesse.brandeburg@intel.com/T/#u
>
> Any testing appreciated!
>
> Jesse
>

Thank you!

--
Nitesh



end of thread, other threads:[~2021-05-03 13:16 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200625223443.2684-1-nitesh@redhat.com>
     [not found] ` <20200625223443.2684-2-nitesh@redhat.com>
     [not found]   ` <3e9ce666-c9cd-391b-52b6-3471fe2be2e6@arm.com>
     [not found]     ` <20210127121939.GA54725@fuller.cnet>
     [not found]       ` <87r1m5can2.fsf@nanos.tec.linutronix.de>
     [not found]         ` <20210128165903.GB38339@fuller.cnet>
     [not found]           ` <87h7n0de5a.fsf@nanos.tec.linutronix.de>
     [not found]             ` <20210204181546.GA30113@fuller.cnet>
     [not found]               ` <cfa138e9-38e3-e566-8903-1d64024c917b@redhat.com>
     [not found]                 ` <20210204190647.GA32868@fuller.cnet>
     [not found]                   ` <d8884413-84b4-b204-85c5-810342807d21@redhat.com>
     [not found]                     ` <87y2g26tnt.fsf@nanos.tec.linutronix.de>
     [not found]                       ` <d0aed683-87ae-91a2-d093-de3f5d8a8251@redhat.com>
     [not found]                         ` <7780ae60-efbd-2902-caaa-0249a1f277d9@redhat.com>
     [not found]                           ` <07c04bc7-27f0-9c07-9f9e-2d1a450714ef@redhat.com>
     [not found]                             ` <20210406102207.0000485c@intel.com>
     [not found]                               ` <1a044a14-0884-eedb-5d30-28b4bec24b23@redhat.com>
2021-04-08 18:49                                 ` [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs Nitesh Narayan Lal
2021-04-14 16:11                                 ` Jesse Brandeburg
2021-04-15 22:11                                   ` Nitesh Narayan Lal
2021-04-29 21:44                                     ` Nitesh Lal
2021-04-30  1:48                                       ` Jesse Brandeburg
2021-04-30 13:10                                         ` Nitesh Lal
2021-04-30  7:10                                       ` Thomas Gleixner
2021-04-30 16:14                                         ` Nitesh Lal
2021-04-30 18:21                                           ` Thomas Gleixner
2021-04-30 21:07                                             ` Nitesh Lal
2021-05-01  2:21                                               ` Jesse Brandeburg
2021-05-03 13:15                                                 ` Nitesh Lal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).