From: Nitesh Narayan Lal <nitesh@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	Marcelo Tosatti <mtosatti@redhat.com>
Cc: Robin Murphy <robin.murphy@arm.com>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	frederic@kernel.org, juri.lelli@redhat.com, abelits@marvell.com,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	rostedt@goodmis.org, mingo@kernel.org, peterz@infradead.org,
	davem@davemloft.net, akpm@linux-foundation.org,
	sfr@canb.auug.org.au, stephen@networkplumber.org,
	rppt@linux.vnet.ibm.com, jinyuqi@huawei.com,
	zhangshaokun@hisilicon.com
Subject: Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
Date: Sat, 6 Feb 2021 19:43:30 -0500	[thread overview]
Message-ID: <d0aed683-87ae-91a2-d093-de3f5d8a8251@redhat.com> (raw)
In-Reply-To: <87y2g26tnt.fsf@nanos.tec.linutronix.de>


On 2/5/21 5:23 PM, Thomas Gleixner wrote:
> On Thu, Feb 04 2021 at 14:17, Nitesh Narayan Lal wrote:
>> On 2/4/21 2:06 PM, Marcelo Tosatti wrote:
>>>>> How about adding a new flag for isolcpus instead?
>>>>>
>>>> Do you mean a flag based on which we can switch the affinity mask to
>>>> housekeeping for all the devices at the time of IRQ distribution?
>>> Yes a new flag for isolcpus. HK_FLAG_IRQ_SPREAD or some better name.
>> Does sound like a nice idea to explore; let's see what Thomas thinks about it.
> I just read back up on that whole discussion and stared into the usage
> sites a bit.
>
> There are a couple of issues here in a larger picture. Looking at it
> from the device side first:
>
> The spreading is done for non-managed queues/interrupts which makes them
> movable by user space. So it could be argued from both sides that the
> damage done by allowing the full online mask or by allowing only the
> housekeeping mask can be fixed up by user space.
>
> But that's the trivial part of the problem. The real problem is CPU
> hotplug, offline CPUs, and the way interrupts are set up for their
> initial affinity.
>
> As Robin noticed, the change in 1abdfe706a57 ("lib: Restrict
> cpumask_local_spread to houskeeping CPUs") is broken as it can return
> offline CPUs in both the NOHZ_FULL and the !NOHZ_FULL case.

A quick question here: is there any reason why we don't use cpu_online_mask
instead of cpu_possible_mask in housekeeping_cpumask()?
(Not for this particular patch, but in general.)
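
To make sure I am reading the flow correctly, below is a simplified sketch
of my understanding of the post-patch spread (my own simplification, not
the exact lib/cpumask.c code; the NUMA-local pass and the HK_FLAG_DOMAIN
handling are left out):

#include <linux/cpumask.h>
#include <linux/sched/isolation.h>

unsigned int cpumask_local_spread_sketch(unsigned int i, int node)
{
	const struct cpumask *hk_mask;
	int cpu;

	hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

	i %= cpumask_weight(hk_mask);
	for_each_cpu(cpu, hk_mask) {
		/* note: never intersected with cpu_online_mask */
		if (i-- == 0)
			return cpu;
	}
	return cpumask_first(hk_mask);
}

If the housekeeping mask is derived from cpu_possible_mask, the walk above
can hand back an offline housekeeping CPU, which I believe is exactly the
case Robin flagged.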

>
> The original code is racy vs. hotplug unless the callers block hotplug.
>
> Let's look at all the callers and what they do with it.
>
>   cptvf_set_irq_affinity()     affinity hint
>   safexcel_request_ring_irq()  affinity hint
>   mv_cesa_probe()              affinity hint
>   bnxt_request_irq()           affinity hint
>   nicvf_set_irq_affinity()     affinity hint
>   cxgb4_set_msix_aff()         affinity hint
>   enic_init_affinity_hint()    affinity hint
>   iavf_request_traffic_irqs()  affinity hint
>   ionic_alloc_qcq_interrupt()  affinity hint
>   efx_set_interrupt_affinity() affinity hint
>   i40e_vsi_request_irq_msix()  affinity hint
>
>   be_evt_queues_create()       affinity hint, queue affinity
>   hns3_nic_set_cpumask()       affinity hint, queue affinity
>   mlx4_en_init_affinity_hint() affinity hint, queue affinity
>   mlx4_en_create_tx_ring()     affinity hint, queue affinity
>   set_comp_irq_affinity_hint() affinity hint, queue affinity
>   i40e_config_xps_tx_ring()    affinity hint, queue affinity
>   
>   hclge_configure()            affinity hint, queue affinity, workqueue selection
>
>   ixgbe_alloc_q_vector()       node selection, affinity hint, queue affinity
>
> None of them cares about disabling hotplug. Taking cpus_read_lock()
> inside of that spread function would not solve anything because once the
> lock is dropped the CPU can go away.
>
> There are 3 classes of this:
>
>    1) Does not matter: affinity hint
>
>    2) Might fail to set up the network queue when the selected CPU
>       is offline.
>
>    3) Broken: The hclge driver, which uses the CPU to schedule work on
>       that CPU. That's broken, but unfortunately neither the workqueue
>       code nor the timer code will ever notice. The work just won't be
>       scheduled until the CPU comes online again, which might be never.

Agreed.
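
To make sure we are talking about the same pattern, here is a hypothetical
driver snippet that mirrors the three classes above (all the foo_* names
are made up and this is not lifted from any of the listed drivers):

#include <linux/interrupt.h>
#include <linux/cpumask.h>
#include <linux/netdevice.h>
#include <linux/workqueue.h>

/* Hypothetical per-queue state, for illustration only. */
struct foo_queue {
	int irq;
	struct net_device *netdev;
	struct delayed_work service_task;
};

static irqreturn_t foo_irq_handler(int irq, void *data)
{
	return IRQ_HANDLED;
}

static int foo_setup_queue(struct foo_queue *q, int qidx, int node)
{
	int cpu = cpumask_local_spread(qidx, node);
	int err;

	err = request_irq(q->irq, foo_irq_handler, 0, "foo", q);
	if (err)
		return err;

	/* Class 1: purely advisory, the core only stores the mask */
	irq_set_affinity_hint(q->irq, cpumask_of(cpu));

	/* Class 2: queue setup that may not work out if 'cpu' is offline */
	netif_set_xps_queue(q->netdev, cpumask_of(cpu), qidx);

	/* Class 3: nothing runs until 'cpu' comes online again */
	queue_delayed_work_on(cpu, system_wq, &q->service_task, HZ);

	return 0;
}

The class 3 line is the hclge-style problem: if the spread handed back an
offline CPU, the work is queued there and, as you say, nobody ever notices.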

> But looking at the above, I really have to ask what the commit in
> question is actually trying to solve.
>
> AFAICT, nothing at all. Why?
>
>   1) The majority of the drivers set the hint _after_ requesting the
>      interrupt
>
>   2) Even if set _before_ requesting the interrupt it does not solve
>      anything because it's a hint and the interrupt core code does
>      not care about it at all. It provides the storage and the procfs
>      interface nothing else.
>
> So how does that prevent the interrupt subsystem from assigning an
> interrupt to an isolated CPU? Not at all.
>
> Interrupts which are freshly allocated get the default interrupt
> affinity mask, which is either set on the command line or via /proc. The
> affinity of the interrupt can be changed after it has been populated in
> /proc.
>
> When the interrupt is requested, one of the online CPUs in its
> affinity mask is chosen.
>
> X86 is special here because this also requires that there are free
> vectors on one of the online CPUs in the mask. If the CPUs in the
> affinity mask run out of vectors then it will grab a vector from some
> other CPU which might be an isolated CPU.
>
> When the affinity mask of the interrupt contains an isolated CPU at the
> time it is actually requested, then nothing prevents the kernel from
> steering it to that isolated CPU. But that has absolutely nothing to do
> with that spreading thingy.
>
> The only difference this change makes is that the affinity hint
> changes. Nothing else.
>

Thanks for the detailed explanation.

Before I posted this patch, I was doing some debugging on a setup where I
was observing latency issues due to iavf IRQs that were pinned to the
isolated CPUs.

Based on some initial traces, I had the impression that the affinity hint
or cpumask_local_spread() was somehow playing a role in deciding the
affinity mask of these IRQs. That does look incorrect after going through
your explanation.
For some reason, with a kernel that had this patch, the iavf IRQs always
ended up on the housekeeping CPUs when I created the VFs.

The reason for the above is still not clear to me. I will investigate this
further to properly understand the behavior.
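
For my own notes, this is the sequence I take away from your explanation,
written as a simplified sketch (the naming is mine, it is not the actual
kernel/irq code, and it ignores the x86 vector-exhaustion case you
mention):

#include <linux/cpumask.h>

/*
 * At request time one of the online CPUs in the interrupt's affinity
 * mask (seeded from irq_default_affinity or /proc) is picked. The
 * affinity *hint* that the driver sets later is only stored for
 * /proc/irq/<N>/affinity_hint and never consulted.
 */
static int pick_target_cpu_sketch(const struct cpumask *irq_affinity)
{
	int cpu = cpumask_any_and(irq_affinity, cpu_online_mask);

	return cpu < nr_cpu_ids ? cpu : cpumask_any(cpu_online_mask);
}

So if the isolated CPUs are still present in irq_default_affinity (or in
the mask written via /proc) when the iavf interrupts are requested, nothing
in this patch should keep them off those CPUs. I will re-check the traces
with that in mind.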

-- 
Nitesh

