netdev.vger.kernel.org archive mirror
* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
       [not found]                               ` <1a044a14-0884-eedb-5d30-28b4bec24b23@redhat.com>
@ 2021-04-08 18:49                                 ` Nitesh Narayan Lal
  2021-04-14 16:11                                 ` Jesse Brandeburg
  1 sibling, 0 replies; 12+ messages in thread
From: Nitesh Narayan Lal @ 2021-04-08 18:49 UTC (permalink / raw)
  To: Jesse Brandeburg, Marcelo Tosatti, Thomas Gleixner, frederic
  Cc: Robin Murphy, linux-kernel, linux-api, juri.lelli, abelits,
	bhelgaas, linux-pci, rostedt, mingo, peterz, davem, akpm, sfr,
	stephen, rppt, jinyuqi, zhangshaokun, Network Development,
	sassmann, Yang, Lihong


On 4/7/21 11:18 AM, Nitesh Narayan Lal wrote:
> On 4/6/21 1:22 PM, Jesse Brandeburg wrote:
>> Continuing a thread from a bit ago...
>>
>> Nitesh Narayan Lal wrote:
>>
>>>> After a little more digging, I found out why cpumask_local_spread change
>>>> affects the general/initial smp_affinity for certain device IRQs.
>>>>
>>>> After the introduction of the commit:
>>>>
>>>>     e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
>>>>
>>> Continuing the conversation about the above commit and adding Jesse.
>>> I was trying to understand the problem that the commit message explains
>>> "The default behavior of the kernel is somewhat undesirable as all
>>> requested interrupts end up on CPU0 after registration." I have also been
>>> trying to reproduce this behavior without the patch, but I failed to do
>>> so, maybe because I am missing something here.
>>>
>>> @Jesse Can you please explain? FWIU IRQ affinity should be decided based on
>>> the default affinity mask.
> Thanks, Jesse, for responding.
>
>> The original issue as seen, was that if you rmmod/insmod a driver
>> *without* irqbalance running, the default irq mask is -1, which means
>> any CPU. The older kernels (this issue was patched in 2014) used to use
>> that affinity mask, but the value programmed into all the interrupt
>> registers "actual affinity" would end up delivering all interrupts to
>> CPU0,
> So does that mean the affinity mask for the IRQs was different wrt where
> the IRQs were actually delivered?
> Or, the affinity mask itself for the IRQs after rmmod, insmod was changed
> to 0 instead of -1?
>
> I did a quick test on top of 5.12.0-rc6 by comparing the i40e IRQ affinity
> mask before removing the kernel module and after doing rmmod+insmod
> and didn't find any difference.
>
>>  and if the machine was under incoming traffic load when the
>> driver loaded, CPU0 would start to poll among all the different netdev
>> queues, all on CPU0.
>>
>> The above then leads to the condition that the device is stuck polling
>> even if the affinity gets updated from user space, and the polling will
>> continue until traffic stops.
>>
>>> The problem with the commit is that when we overwrite the affinity mask
>>> based on the hinting mask we completely ignore the default SMP affinity
>>> mask. If we do want to overwrite the affinity based on the hint mask we
>>> should at least consider the default SMP affinity.
> For the issue where the IRQs don't follow the default_smp_affinity mask
> because of this patch, the following are the steps by which it can be easily
> reproduced with the latest linux kernel:
>
> # Kernel
> 5.12.0-rc6+
>
> # Other parameters in the cmdline
> isolcpus=2-39,44-79 nohz=on nohz_full=2-39,44-79
> rcu_nocbs=2-39,44-79
>
> # cat /proc/irq/default_smp_affinity
> 0000,00000f00,00000003 [Corresponds to HK CPUs - 0, 1, 40, 41, 42 and 43]
>
> # Create VFs and check IRQ affinity mask
>
> /proc/irq/1423/iavf-ens1f1v3-TxRx-3
> 3
> /proc/irq/1424/iavf-0000:3b:0b.0:mbx
> 0
> 40
> 42
> /proc/irq/1425/iavf-ens1f1v8-TxRx-0
> 0
> /proc/irq/1426/iavf-ens1f1v8-TxRx-1
> 1
> /proc/irq/1427/iavf-ens1f1v8-TxRx-2
> 2
> /proc/irq/1428/iavf-ens1f1v8-TxRx-3
> 3
> ...
> /proc/irq/1475/iavf-ens1f1v15-TxRx-0
> 0
> /proc/irq/1476/iavf-ens1f1v15-TxRx-1
> 1
> /proc/irq/1477/iavf-ens1f1v15-TxRx-2
> 2
> /proc/irq/1478/iavf-ens1f1v15-TxRx-3
> 3
> /proc/irq/1479/iavf-0000:3b:0a.0:mbx
> 0
> 40
> 42
> ...
> /proc/irq/240/iavf-ens1f1v3-TxRx-0
> 0
> /proc/irq/248/iavf-ens1f1v3-TxRx-1
> 1
> /proc/irq/249/iavf-ens1f1v3-TxRx-2
> 2
>
>
> Trace dump:
> ----------
> ..
> 11551082:  NetworkManager-1734  [040]  8167.465719: vector_activate:    
>             irq=1478 is_managed=0 can_reserve=1 reserve=0
> 11551090:  NetworkManager-1734  [040]  8167.465720: vector_alloc:
>             irq=1478 vector=65 reserved=1 ret=0
> 11551093:  NetworkManager-1734  [040]  8167.465721: vector_update:      
>             irq=1478 vector=65 cpu=42 prev_vector=0 prev_cpu=0
> 11551097:  NetworkManager-1734  [040]  8167.465721: vector_config:      
>             irq=1478 vector=65 cpu=42 apicdest=0x00000200
> 11551357:  NetworkManager-1734  [040]  8167.465768: vector_alloc:        
>             irq=1478 vector=46 reserved=0 ret=0
>
> 11551360:  NetworkManager-1734  [040]  8167.465769: vector_update:      
>             irq=1478 vector=46 cpu=3 prev_vector=65 prev_cpu=42
>
> 11551364:  NetworkManager-1734  [040]  8167.465770: vector_config:      
>             irq=1478 vector=46 cpu=3 apicdest=0x00040100
> ..
>
> As we can see in the above trace the initial affinity for the IRQ 1478 was
> correctly set as per the default_smp_affinity mask which includes CPU 42,
> however, later on, it is updated with CPU3 which is returned from
> cpumask_local_spread().
>
>> Maybe the right thing is to fix which CPUs are passed in as the valid
>> mask, or make sure the kernel cross checks that what the driver asks
>> for is a "valid CPU"?
>>
> Sure, if we can still reproduce the problem that your patch was fixing then
> maybe we can consider adding a new API like cpumask_local_spread_irq in
> which we should consider the default_smp_affinity mask as well before returning
> the CPU.
>

Didn't realize that netdev ml was not included, so adding that.

-- 
Nitesh


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
       [not found]                               ` <1a044a14-0884-eedb-5d30-28b4bec24b23@redhat.com>
  2021-04-08 18:49                                 ` [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs Nitesh Narayan Lal
@ 2021-04-14 16:11                                 ` Jesse Brandeburg
  2021-04-15 22:11                                   ` Nitesh Narayan Lal
  1 sibling, 1 reply; 12+ messages in thread
From: Jesse Brandeburg @ 2021-04-14 16:11 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: Marcelo Tosatti, Thomas Gleixner, frederic, Robin Murphy,
	linux-kernel, linux-api, juri.lelli, abelits, bhelgaas,
	linux-pci, rostedt, mingo, peterz, davem, akpm, sfr, stephen,
	rppt, jinyuqi, zhangshaokun, netdev, chris.friesen

Nitesh Narayan Lal wrote:

> > The original issue as seen, was that if you rmmod/insmod a driver
> > *without* irqbalance running, the default irq mask is -1, which means
> > any CPU. The older kernels (this issue was patched in 2014) used to use
> > that affinity mask, but the value programmed into all the interrupt
> > registers "actual affinity" would end up delivering all interrupts to
> > CPU0,
> 
> So does that mean the affinity mask for the IRQs was different wrt where
> the IRQs were actually delivered?
> Or, the affinity mask itself for the IRQs after rmmod, insmod was changed
> to 0 instead of -1?

The smp_affinity was 0xfff, and the kernel chooses which CPU to place the
interrupt on, among any of the bits set.

 
> I did a quick test on top of 5.12.0-rc6 by comparing the i40e IRQ affinity
> mask before removing the kernel module and after doing rmmod+insmod
> and didn't find any difference.

with the patch in question removed? Sorry, I'm confused what you tried.

> 
> >  and if the machine was under incoming traffic load when the
> > driver loaded, CPU0 would start to poll among all the different netdev
> > queues, all on CPU0.
> >
> > The above then leads to the condition that the device is stuck polling
> > even if the affinity gets updated from user space, and the polling will
> > continue until traffic stops.
> >
> >> The problem with the commit is that when we overwrite the affinity mask
> >> based on the hinting mask we completely ignore the default SMP affinity
> >> mask. If we do want to overwrite the affinity based on the hint mask we
> >> should at least consider the default SMP affinity.
> 
> For the issue where the IRQs don't follow the default_smp_affinity mask
> because of this patch, the following are the steps by which it can be easily
> reproduced with the latest linux kernel:
> 
> # Kernel
> 5.12.0-rc6+

<snip>

> As we can see in the above trace the initial affinity for the IRQ 1478 was
> correctly set as per the default_smp_affinity mask which includes CPU 42,
> however, later on, it is updated with CPU3 which is returned from
> cpumask_local_spread().
> 
> > Maybe the right thing is to fix which CPUs are passed in as the valid
> > mask, or make sure the kernel cross checks that what the driver asks
> > for is a "valid CPU"?
> >
> 
> Sure, if we can still reproduce the problem that your patch was fixing then
> maybe we can consider adding a new API like cpumask_local_spread_irq in
> which we should consider the default_smp_affinity mask as well before returning
> the CPU.

I'm sure I don't have a reproducer of the original problem any more; it
was lost somewhere 8 years ago. I'd like to be able to repro the original
issue, but I can't.

Your description of the problem makes it obvious there is an issue. It
appears as if cpumask_local_spread() is the wrong function to use here.
If you have any suggestions please let me know.

We had one other report of this problem as well (I'm not sure if it's
the same as your report)
https://lkml.org/lkml/2021/3/28/206
https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210125/023120.html



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-14 16:11                                 ` Jesse Brandeburg
@ 2021-04-15 22:11                                   ` Nitesh Narayan Lal
  2021-04-29 21:44                                     ` Nitesh Lal
  0 siblings, 1 reply; 12+ messages in thread
From: Nitesh Narayan Lal @ 2021-04-15 22:11 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Marcelo Tosatti, Thomas Gleixner, frederic, Robin Murphy,
	linux-kernel, linux-api, juri.lelli, abelits, bhelgaas,
	linux-pci, rostedt, mingo, peterz, davem, akpm, sfr, stephen,
	rppt, jinyuqi, zhangshaokun, netdev, chris.friesen


On 4/14/21 12:11 PM, Jesse Brandeburg wrote:
> Nitesh Narayan Lal wrote:
>
>>> The original issue as seen, was that if you rmmod/insmod a driver
>>> *without* irqbalance running, the default irq mask is -1, which means
>>> any CPU. The older kernels (this issue was patched in 2014) used to use
>>> that affinity mask, but the value programmed into all the interrupt
>>> registers "actual affinity" would end up delivering all interrupts to
>>> CPU0,
>> So does that mean the affinity mask for the IRQs was different wrt where
>> the IRQs were actually delivered?
>> Or, the affinity mask itself for the IRQs after rmmod, insmod was changed
>> to 0 instead of -1?
> The smp_affinity was 0xfff, and the kernel chooses which CPU to place the
> interrupt on, among any of the bits set.


I think what you are referring to here is the effective affinity mask.
From one of Thomas's commit messages that you pointed me to:

"The affinity mask is either the system-wide default or set by userspace,
but the architecture can or even must reduce the mask to the effective set,
which means that checking the affinity mask itself does not really tell
about the actual target CPUs."

I was looking into the code changes around IRQs, and there was a major
rework from Thomas in 2017. I recently tried testing the kernel just before
those patches got merged.

Specifically on top of
05161b9cbe5:     x86/irq: Get rid of the 'first_system_vector' indirection
                 bogosity

On the box where I tested this, I was able to see the effective affinity
being set to 0 (not SMP affinity) for several network device IRQs.

and I think the reason for it is the usage of "cpumask_first_and(mask,
cpu_online_mask)" in __assign_irq_vector().

But with the latest kernel, this has been replaced and that's why I didn't
see the effective affinity being set to only 0 for all of the device IRQs.

Having said that, I am still not sure if I was able to mimic what you have
seen in the past. But it looked similar to what you were explaining.
What do you think?



>
>  
>> I did a quick test on top of 5.12.0-rc6 by comparing the i40e IRQ affinity
>> mask before removing the kernel module and after doing rmmod+insmod
>> and didn't find any difference.
> with the patch in question removed? Sorry, I'm confused what you tried.

Yeah, but I was only referring to the SMP affinity mask. Please see more
up-to-date testing results above.

>>>  and if the machine was under incoming traffic load when the
>>> driver loaded, CPU0 would start to poll among all the different netdev
>>> queues, all on CPU0.
>>>
>>> The above then leads to the condition that the device is stuck polling
>>> even if the affinity gets updated from user space, and the polling will
>>> continue until traffic stops.
>>>

[...]

>> As we can see in the above trace the initial affinity for the IRQ 1478 was
>> correctly set as per the default_smp_affinity mask which includes CPU 42,
>> however, later on, it is updated with CPU3 which is returned from
>> cpumask_local_spread().
>>
>>> Maybe the right thing is to fix which CPUs are passed in as the valid
>>> mask, or make sure the kernel cross checks that what the driver asks
>>> for is a "valid CPU"?
>>>
>> Sure, if we can still reproduce the problem that your patch was fixing then
>> maybe we can consider adding a new API like cpumask_local_spread_irq in
>> which we should consider the default_smp_affinity mask as well before returning
>> the CPU.
> I'm sure I don't have a reproducer of the original problem any more; it
> was lost somewhere 8 years ago. I'd like to be able to repro the original
> issue, but I can't.
>
> Your description of the problem makes it obvious there is an issue. It
> appears as if cpumask_local_spread() is the wrong function to use here.
> If you have any suggestions please let me know.
>
> We had one other report of this problem as well (I'm not sure if it's
> the same as your report)
> https://lkml.org/lkml/2021/3/28/206
> https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210125/023120.html


How about we introduce a new API just for IRQ spreading,
cpumask_local_spread_irq() and then utilize the default_smp_affinity mask
in that before returning the CPU?

Although, I think the right way to deal with this would be to fix this from
the source that is where the CPU mask is assigned to an IRQ for the very
first time.


-- 
Thanks
Nitesh



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-15 22:11                                   ` Nitesh Narayan Lal
@ 2021-04-29 21:44                                     ` Nitesh Lal
  2021-04-30  1:48                                       ` Jesse Brandeburg
  2021-04-30  7:10                                       ` Thomas Gleixner
  0 siblings, 2 replies; 12+ messages in thread
From: Nitesh Lal @ 2021-04-29 21:44 UTC (permalink / raw)
  To: Jesse Brandeburg, Thomas Gleixner, frederic, juri.lelli,
	Marcelo Tosatti, abelits
  Cc: Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

On Thu, Apr 15, 2021, at 6:11 PM Nitesh Narayan Lal <nitesh@redhat.com> wrote:
>
>
> On 4/14/21 12:11 PM, Jesse Brandeburg wrote:
> > Nitesh Narayan Lal wrote:
> >
> >>> The original issue as seen, was that if you rmmod/insmod a driver
> >>> *without* irqbalance running, the default irq mask is -1, which means
> >>> any CPU. The older kernels (this issue was patched in 2014) used to use
> >>> that affinity mask, but the value programmed into all the interrupt
> >>> registers "actual affinity" would end up delivering all interrupts to
> >>> CPU0,
> >> So does that mean the affinity mask for the IRQs was different wrt where
> >> the IRQs were actually delivered?
> >> Or, the affinity mask itself for the IRQs after rmmod, insmod was changed
> >> to 0 instead of -1?
> > The smp_affinity was 0xfff, and the kernel chooses which CPU to place the
> > interrupt on, among any of the bits set.
>
>

<snip>

> >
> > Your description of the problem makes it obvious there is an issue. It
> > appears as if cpumask_local_spread() is the wrong function to use here.
> > If you have any suggestions please let me know.
> >
> > We had one other report of this problem as well (I'm not sure if it's
> > the same as your report)
> > https://lkml.org/lkml/2021/3/28/206
> > https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210125/023120.html
>
>

So to understand further what the problem was with the older kernel based
on Jesse's description and whether it is still there, I did some more
digging. Following are some of the findings (kindly correct me if
there is a gap in my understanding):

Part-1: Why there was a problem with the older kernel?
------
With a kernel built on top of the tag v4.0.0 (with Jesse's patch reverted
and irqbalance disabled), if we observe /proc/irq for the ixgbe device IRQs
then there are two things to note:

# No separate effective affinity (Since it has been introduced as a part of
  the 2017 IRQ re-work)
  $ ls /proc/irq/86/
    affinity_hint  node  p2p1  smp_affinity  smp_affinity_list  spurious

# Multiple CPUs are set in the smp_affinity_list and the first CPU is CPU0:

  $ /proc/irq/60/p2p1-TxRx-0
    0,2,4,6,8,10,12,14,16,18,20,22

  $ /proc/irq/61/p2p1-TxRx-1
    0,2,4,6,8,10,12,14,16,18,20,22

  $ /proc/irq/62/p2p1-TxRx-2
    0,2,4,6,8,10,12,14,16,18,20,22
     ...


Now, if we read the commit message from Thomas's patch that was part of
this IRQ re-work:
fdba46ff:  x86/apic: Get rid of multi CPU affinity
"
..
2) Experiments have shown that the benefit of multi CPU affinity is close
   to zero and in some tests even worse than setting the affinity to a single
   CPU.

The reason for this is that the delivery targets the APIC with the lowest
ID first and only if that APIC is busy (servicing an interrupt, i.e. ISR is
not empty) it hands it over to the next APIC. In the conducted tests the
vast majority of interrupts ends up on the APIC with the lowest ID anyway,
so there is no natural spreading of the interrupts possible.
"

I think this explains why even if we have multiple CPUs in the SMP affinity
mask the interrupts may only land on CPU0.

With Jesse's patch in the kernel, the initial affinity mask that included
multiple CPUs is overwritten with a single CPU. This CPU was previously
selected based on vector_index; later on, this was replaced with logic
where the CPU is fetched from cpumask_local_spread(). Hence, in this
case, the interrupts are spread across different CPUs.

# listing the IRQ smp_affinity_list on the v4.0.0 kernel with Jesse's patch
  $ /proc/irq/60/p2p1-TxRx-0
    0
  $ /proc/irq/61/p2p1-TxRx-1
    1
  $ /proc/irq/62/p2p1-TxRx-2
    2
    ...
  $ /proc/irq/65/p2p1-TxRx-5
    5
  $ /proc/irq/66/p2p1-TxRx-6
    6
   ...



Part-2: Why this may not be a problem anymore?
------
With the latest kernel, if we check the effective_affinity_list for i40e
IRQs without irqbalance and with Jesse's patch reverted, it is already set
to a single CPU that is not always 0. This CPU is retrieved based on the
vector allocation count, i.e., we get the CPU that has the lowest vector
allocation count.

  $ /proc/irq/100/i40e-eno1-TxRx-5
    28
  $ /proc/irq/101/i40e-eno1-TxRx-6
    30
  $ /proc/irq/102/i40e-eno1-TxRx-7
    32
    …
  $ /proc/irq/121/i40e-eno1-TxRx-18
    16
  $ /proc/irq/122/i40e-eno1-TxRx-19
    18
    ..

@Jesse do you think the Part-1 findings explain the behavior that you have
observed in the past?

Also, let me know if there are any suggestions or experiments to try here.


--
Thanks
Nitesh



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-29 21:44                                     ` Nitesh Lal
@ 2021-04-30  1:48                                       ` Jesse Brandeburg
  2021-04-30 13:10                                         ` Nitesh Lal
  2021-04-30  7:10                                       ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: Jesse Brandeburg @ 2021-04-30  1:48 UTC (permalink / raw)
  To: Nitesh Lal
  Cc: Thomas Gleixner, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

Nitesh Lal wrote:

> @Jesse do you think the Part-1 findings explain the behavior that you have
> observed in the past?
> 
> Also, let me know if there are any suggestions or experiments to try here.

Wow Nitesh, nice work! That's quite a bit of spelunking you had to do
there!

Your results showing the older kernels' issues with ranged affinity are
consistent with what I remember from that time, and with the original
problem.

I'm glad to see that a) Thomas fixed the kernel to even do better than
ranged affinity masks, and that b) if you revert my patch, the new
behavior is better and still maintains the fix from a).

For me this explains the whole picture and makes me feel comfortable
with the patch that reverts the initial affinity mask (that also
introduces a subtle bug with the reserved CPUs that I believe you've
noted already).

Thanks for this work!
Jesse


* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-29 21:44                                     ` Nitesh Lal
  2021-04-30  1:48                                       ` Jesse Brandeburg
@ 2021-04-30  7:10                                       ` Thomas Gleixner
  2021-04-30 16:14                                         ` Nitesh Lal
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2021-04-30  7:10 UTC (permalink / raw)
  To: Nitesh Lal, Jesse Brandeburg, frederic, juri.lelli,
	Marcelo Tosatti, abelits
  Cc: Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

Nitesh,

On Thu, Apr 29 2021 at 17:44, Nitesh Lal wrote:

First of all: Nice analysis, well done!

> So to understand further what the problem was with the older kernel based
> on Jesse's description and whether it is still there, I did some more
> digging. Following are some of the findings (kindly correct me if
> there is a gap in my understanding):
>
> Part-1: Why there was a problem with the older kernel?
> ------
> With a kernel built on top of the tag v4.0.0 (with Jesse's patch reverted
> and irqbalance disabled), if we observe /proc/irq for the ixgbe device IRQs
> then there are two things to note:
>
> # No separate effective affinity (Since it has been introduced as a part of
>   the 2017 IRQ re-work)
>   $ ls /proc/irq/86/
>     affinity_hint  node  p2p1  smp_affinity  smp_affinity_list  spurious
>
> # Multiple CPUs are set in the smp_affinity_list and the first CPU is CPU0:
>
>   $ /proc/irq/60/p2p1-TxRx-0
>     0,2,4,6,8,10,12,14,16,18,20,22
>
>   $ /proc/irq/61/p2p1-TxRx-1
>     0,2,4,6,8,10,12,14,16,18,20,22
>
>   $ /proc/irq/62/p2p1-TxRx-2
>     0,2,4,6,8,10,12,14,16,18,20,22
>      ...
>
>
> Now,  if we read the commit message from Thomas's patch that was part of
> this IRQ re-work:
> fdba46ff:  x86/apic: Get rid of multi CPU affinity
> "
> ..
> 2) Experiments have shown that the benefit of multi CPU affinity is close
>    to zero and in some tests even worse than setting the affinity to a single
>    CPU.
>
> The reason for this is that the delivery targets the APIC with the lowest
> ID first and only if that APIC is busy (servicing an interrupt, i.e. ISR is
> not empty) it hands it over to the next APIC. In the conducted tests the
> vast majority of interrupts ends up on the APIC with the lowest ID anyway,
> so there is no natural spreading of the interrupts possible.
> "
>
> I think this explains why even if we have multiple CPUs in the SMP affinity
> mask the interrupts may only land on CPU0.

There are two issues in the pre rework vector management:

  1) The allocation logic itself which preferred lower numbered CPUs and
     did not try to spread out the vectors across CPUs. This was pretty
     much true for any APIC addressing mode.

  2) The multi CPU affinity support if supported by the APIC
     mode. That's restricted to logical APIC addressing mode. That is
     available for non X2APIC up to 8 CPUs and with X2APIC it requires
     to be in cluster mode.
     
     All other addressing modes had a single CPU target selected under
     the hood which due to #1 was ending up on CPU0 most of the time at
     least up to the point where it still had vectors available.

     Also logical addressing mode with multiple target CPUs was subject
     to #1 and due to the delivery logic the lowest numbered CPU (APIC)
     was where most interrupts ended up.

Thanks,

        tglx


* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-30  1:48                                       ` Jesse Brandeburg
@ 2021-04-30 13:10                                         ` Nitesh Lal
  0 siblings, 0 replies; 12+ messages in thread
From: Nitesh Lal @ 2021-04-30 13:10 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Thomas Gleixner, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

On Thu, Apr 29, 2021 at 9:48 PM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>
> Nitesh Lal wrote:
>
> > @Jesse do you think the Part-1 findings explain the behavior that you have
> > observed in the past?
> >
> > Also, let me know if there are any suggestions or experiments to try here.
>
> Wow Nitesh, nice work! That's quite a bit of spelunking you had to do
> there!
>
> Your results showing the older kernels' issues with ranged affinity are
> consistent with what I remember from that time, and with the original
> problem.

That's nice.

>
> I'm glad to see that a) Thomas fixed the kernel to even do better than
> ranged affinity masks, and that b) if you revert my patch, the new
> behavior is better and still maintains the fix from a).

Right, the interrupts are naturally spread now.

>
> For me this explains the whole picture and makes me feel comfortable
> with the patch that reverts the initial affinity mask (that also
> introduces a subtle bug with the reserved CPUs that I believe you've
> noted already).
>

Thank you for confirming!

--
Nitesh



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-30  7:10                                       ` Thomas Gleixner
@ 2021-04-30 16:14                                         ` Nitesh Lal
  2021-04-30 18:21                                           ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Nitesh Lal @ 2021-04-30 16:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jesse Brandeburg, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

On Fri, Apr 30, 2021 at 3:10 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Nitesh,
>
> On Thu, Apr 29 2021 at 17:44, Nitesh Lal wrote:
>
> First of all: Nice analysis, well done!

Thanks, Thomas.

>
> > So to understand further what the problem was with the older kernel based
> > on Jesse's description and whether it is still there, I did some more
> > digging. Following are some of the findings (kindly correct me if
> > there is a gap in my understanding):

<snip>

> >
> > I think this explains why even if we have multiple CPUs in the SMP affinity
> > mask the interrupts may only land on CPU0.
>
> There are two issues in the pre rework vector management:
>
>   1) The allocation logic itself which preferred lower numbered CPUs and
>      did not try to spread out the vectors across CPUs. This was pretty
>      much true for any APIC addressing mode.
>
>   2) The multi CPU affinity support if supported by the APIC
>      mode. That's restricted to logical APIC addressing mode. That is
>      available for non X2APIC up to 8 CPUs and with X2APIC it requires
>      to be in cluster mode.
>
>      All other addressing modes had a single CPU target selected under
>      the hood which due to #1 was ending up on CPU0 most of the time at
>      least up to the point where it still had vectors available.
>
>      Also logical addressing mode with multiple target CPUs was subject
>      to #1 and due to the delivery logic the lowest numbered CPU (APIC)
>      was where most interrupts ended up.
>

Right, thank you for confirming.


Based on this analysis and the fact that with your re-work the interrupts
seem to be naturally spread across the CPUs, will it be safe to revert
Jesse's patch

e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()

as it overwrites the previously set IRQ affinity mask for some of the
devices?

IMHO if we think that this patch is still solving some issue other than
what Jesse has mentioned then perhaps we should reproduce that and fix it
directly from the request_irq code path.

-- 
Nitesh



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-30 16:14                                         ` Nitesh Lal
@ 2021-04-30 18:21                                           ` Thomas Gleixner
  2021-04-30 21:07                                             ` Nitesh Lal
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2021-04-30 18:21 UTC (permalink / raw)
  To: Nitesh Lal
  Cc: Jesse Brandeburg, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

Nitesh,

On Fri, Apr 30 2021 at 12:14, Nitesh Lal wrote:
> Based on this analysis and the fact that with your re-work the interrupts
> seem to be naturally spread across the CPUs, will it be safe to revert
> Jesse's patch
>
> e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
>
> as it overwrites the previously set IRQ affinity mask for some of the
> devices?

That's a good question. My gut feeling says yes.

> IMHO if we think that this patch is still solving some issue other than
> what Jesse has mentioned then perhaps we should reproduce that and fix it
> directly from the request_irq code path.

Makes sense.

Thanks,

        tglx


* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-30 18:21                                           ` Thomas Gleixner
@ 2021-04-30 21:07                                             ` Nitesh Lal
  2021-05-01  2:21                                               ` Jesse Brandeburg
  0 siblings, 1 reply; 12+ messages in thread
From: Nitesh Lal @ 2021-04-30 21:07 UTC (permalink / raw)
  To: Thomas Gleixner, Jesse Brandeburg
  Cc: frederic, juri.lelli, Marcelo Tosatti, abelits, Robin Murphy,
	linux-kernel, linux-api, bhelgaas, linux-pci, rostedt, mingo,
	peterz, davem, akpm, sfr, stephen, rppt, jinyuqi, zhangshaokun,
	netdev, chris.friesen

On Fri, Apr 30, 2021 at 2:21 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Nitesh,
>
> On Fri, Apr 30 2021 at 12:14, Nitesh Lal wrote:
> > Based on this analysis and the fact that with your re-work the interrupts
> > seem to be naturally spread across the CPUs, will it be safe to revert
> > Jesse's patch
> >
> > e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
> >
> > as it overwrites the previously set IRQ affinity mask for some of the
> > devices?
>
> That's a good question. My gut feeling says yes.
>

Jesse, do you want to send the revert for the patch?

Also, I think it was you who suggested cc'ing the intel-wired-lan ML
so that Intel folks can do some initial testing?
If so, we can do that here (IMHO).

> > IMHO if we think that this patch is still solving some issue other than
> > what Jesse has mentioned, then perhaps we should reproduce that and fix it
> > directly in the request_irq code path.
>
> Makes sense.
>
> Thanks,
>
>         tglx
>


-- 
Thanks
Nitesh



* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-04-30 21:07                                             ` Nitesh Lal
@ 2021-05-01  2:21                                               ` Jesse Brandeburg
  2021-05-03 13:15                                                 ` Nitesh Lal
  0 siblings, 1 reply; 12+ messages in thread
From: Jesse Brandeburg @ 2021-05-01  2:21 UTC (permalink / raw)
  To: Nitesh Lal
  Cc: Thomas Gleixner, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

Nitesh Lal wrote:

> On Fri, Apr 30, 2021 at 2:21 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Nitesh,
> >
> > On Fri, Apr 30 2021 at 12:14, Nitesh Lal wrote:
> > > Based on this analysis and the fact that with your re-work the interrupts
> > > seem to be naturally spread across the CPUs, will it be safe to revert
> > > Jesse's patch
> > >
> > > e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
> > >
> > > as it overwrites the previously set IRQ affinity mask for some of the
> > > devices?
> >
> > That's a good question. My gut feeling says yes.
> >
> 
> Jesse, do you want to send the revert for the patch?
> 
> Also, I think it was you who suggested cc'ing the intel-wired-lan ML
> so that Intel folks can do some initial testing?
> If so, we can do that here (IMHO).

Patch sent here:
https://lore.kernel.org/lkml/20210501021832.743094-1-jesse.brandeburg@intel.com/T/#u

Any testing appreciated!

Jesse


* Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
  2021-05-01  2:21                                               ` Jesse Brandeburg
@ 2021-05-03 13:15                                                 ` Nitesh Lal
  0 siblings, 0 replies; 12+ messages in thread
From: Nitesh Lal @ 2021-05-03 13:15 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Thomas Gleixner, frederic, juri.lelli, Marcelo Tosatti, abelits,
	Robin Murphy, linux-kernel, linux-api, bhelgaas, linux-pci,
	rostedt, mingo, peterz, davem, akpm, sfr, stephen, rppt, jinyuqi,
	zhangshaokun, netdev, chris.friesen

On Fri, Apr 30, 2021 at 10:21 PM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>
> Nitesh Lal wrote:
>
> > On Fri, Apr 30, 2021 at 2:21 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > Nitesh,
> > >
> > > On Fri, Apr 30 2021 at 12:14, Nitesh Lal wrote:
> > > > Based on this analysis and the fact that with your re-work the interrupts
> > > > seem to be naturally spread across the CPUs, will it be safe to revert
> > > > Jesse's patch
> > > >
> > > > e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
> > > >
> > > > as it overwrites the previously set IRQ affinity mask for some of the
> > > > devices?
> > >
> > > That's a good question. My gut feeling says yes.
> > >
> >
> > Jesse, do you want to send the revert for the patch?
> >
> > Also, I think it was you who suggested cc'ing the intel-wired-lan ML
> > so that Intel folks can do some initial testing?
> > If so, we can do that here (IMHO).
>
> Patch sent here:
> https://lore.kernel.org/lkml/20210501021832.743094-1-jesse.brandeburg@intel.com/T/#u
>
> Any testing appreciated!
>
> Jesse
>

Thank you!

--
Nitesh



end of thread, other threads:[~2021-05-03 13:16 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200625223443.2684-1-nitesh@redhat.com>
     [not found] ` <20200625223443.2684-2-nitesh@redhat.com>
     [not found]   ` <3e9ce666-c9cd-391b-52b6-3471fe2be2e6@arm.com>
     [not found]     ` <20210127121939.GA54725@fuller.cnet>
     [not found]       ` <87r1m5can2.fsf@nanos.tec.linutronix.de>
     [not found]         ` <20210128165903.GB38339@fuller.cnet>
     [not found]           ` <87h7n0de5a.fsf@nanos.tec.linutronix.de>
     [not found]             ` <20210204181546.GA30113@fuller.cnet>
     [not found]               ` <cfa138e9-38e3-e566-8903-1d64024c917b@redhat.com>
     [not found]                 ` <20210204190647.GA32868@fuller.cnet>
     [not found]                   ` <d8884413-84b4-b204-85c5-810342807d21@redhat.com>
     [not found]                     ` <87y2g26tnt.fsf@nanos.tec.linutronix.de>
     [not found]                       ` <d0aed683-87ae-91a2-d093-de3f5d8a8251@redhat.com>
     [not found]                         ` <7780ae60-efbd-2902-caaa-0249a1f277d9@redhat.com>
     [not found]                           ` <07c04bc7-27f0-9c07-9f9e-2d1a450714ef@redhat.com>
     [not found]                             ` <20210406102207.0000485c@intel.com>
     [not found]                               ` <1a044a14-0884-eedb-5d30-28b4bec24b23@redhat.com>
2021-04-08 18:49                                 ` [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs Nitesh Narayan Lal
2021-04-14 16:11                                 ` Jesse Brandeburg
2021-04-15 22:11                                   ` Nitesh Narayan Lal
2021-04-29 21:44                                     ` Nitesh Lal
2021-04-30  1:48                                       ` Jesse Brandeburg
2021-04-30 13:10                                         ` Nitesh Lal
2021-04-30  7:10                                       ` Thomas Gleixner
2021-04-30 16:14                                         ` Nitesh Lal
2021-04-30 18:21                                           ` Thomas Gleixner
2021-04-30 21:07                                             ` Nitesh Lal
2021-05-01  2:21                                               ` Jesse Brandeburg
2021-05-03 13:15                                                 ` Nitesh Lal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).