Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs

From: Robin Murphy <robin.murphy@arm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: Nitesh Narayan Lal <nitesh@redhat.com>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	frederic@kernel.org, juri.lelli@redhat.com, abelits@marvell.com,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	rostedt@goodmis.org, mingo@kernel.org, peterz@infradead.org,
	davem@davemloft.net, akpm@linux-foundation.org,
	sfr@canb.auug.org.au, stephen@networkplumber.org,
	rppt@linux.vnet.ibm.com, jinyuqi@huawei.com,
	zhangshaokun@hisilicon.com
Subject: Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
Date: Wed, 27 Jan 2021 13:49:48 +0000	[thread overview]
Message-ID: <87b81c08-878f-8394-be94-f5b99ee0be5b@arm.com> (raw)
In-Reply-To: <20210127130925.GA64740@fuller.cnet>

On 2021-01-27 13:09, Marcelo Tosatti wrote:
> On Wed, Jan 27, 2021 at 12:36:30PM +0000, Robin Murphy wrote:
>> On 2021-01-27 12:19, Marcelo Tosatti wrote:
>>> On Wed, Jan 27, 2021 at 11:57:16AM +0000, Robin Murphy wrote:
>>>> Hi,
>>>>
>>>> On 2020-06-25 23:34, Nitesh Narayan Lal wrote:
>>>>> From: Alex Belits <abelits@marvell.com>
>>>>>
>>>>> The current implementation of cpumask_local_spread() does not respect the
>>>>> isolated CPUs, i.e., even if a CPU has been isolated for Real-Time task,
>>>>> it will return it to the caller for pinning of its IRQ threads. Having
>>>>> these unwanted IRQ threads on an isolated CPU adds up to a latency
>>>>> overhead.
>>>>>
>>>>> Restrict the CPUs that are returned for spreading IRQs only to the
>>>>> available housekeeping CPUs.
>>>>>
>>>>> Signed-off-by: Alex Belits <abelits@marvell.com>
>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>>>> ---
>>>>>     lib/cpumask.c | 16 +++++++++++-----
>>>>>     1 file changed, 11 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/lib/cpumask.c b/lib/cpumask.c
>>>>> index fb22fb266f93..85da6ab4fbb5 100644
>>>>> --- a/lib/cpumask.c
>>>>> +++ b/lib/cpumask.c
>>>>> @@ -6,6 +6,7 @@
>>>>>     #include <linux/export.h>
>>>>>     #include <linux/memblock.h>
>>>>>     #include <linux/numa.h>
>>>>> +#include <linux/sched/isolation.h>
>>>>>     /**
>>>>>      * cpumask_next - get the next cpu in a cpumask
>>>>> @@ -205,22 +206,27 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
>>>>>      */
>>>>>     unsigned int cpumask_local_spread(unsigned int i, int node)
>>>>>     {
>>>>> -	int cpu;
>>>>> +	int cpu, hk_flags;
>>>>> +	const struct cpumask *mask;
>>>>> +	hk_flags = HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ;
>>>>> +	mask = housekeeping_cpumask(hk_flags);
>>>>
>>>> AFAICS, this generally resolves to something based on cpu_possible_mask
>>>> rather than cpu_online_mask as before, so could now potentially return an
>>>> offline CPU. Was that an intentional change?
>>>
>>> Robin,
>>>
>>> AFAICS online CPUs should be filtered.
>>
>> Apologies if I'm being thick, but can you explain how? In the case of
>> isolation being disabled or compiled out, housekeeping_cpumask() is
>> literally just "return cpu_possible_mask;". If we then iterate over that
>> with for_each_cpu() and just return the i'th possible CPU (e.g. in the
>> NUMA_NO_NODE case), what guarantees that CPU is actually online?
>>
>> Robin.
> 
> Nothing, but that was the situation before 1abdfe706a579a702799fce465bceb9fb01d407c
> as well.

True, if someone calls from a racy context then there's not much we can 
do to ensure that any CPU *remains* online after we initially observed 
it to be, but when it's called from somewhere safe like a cpuhp offline 
handler, then picking from cpu_online_mask *did* always do the right 
thing (by my interpretation), whereas picking from 
housekeeping_cpumask() might not.

This is why I decided to ask rather than just send a patch to fix what I 
think might be a bug - I have no objection if this *is* intended 
behaviour, other than suggesting we amend the "...selects an online 
CPU..." comment if that aspect was never meant to be relied upon.

Thanks,
Robin.

> 
> cpumask_local_spread() should probably be disabling CPU hotplug.
> 
> Thomas?
> 
>>
>>>> I was just looking at the current code since I had the rare presence of mind
>>>> to check if something suitable already existed before I start open-coding
>>>> "any online CPU, but local node preferred" logic for handling IRQ affinity
>>>> in a driver - cpumask_local_spread() appears to be almost what I want (if a
>>>> bit more heavyweight), if only it would actually guarantee an online CPU as
>>>> the kerneldoc claims :(
>>>>
>>>> Robin.
>>>>
>>>>>     	/* Wrap: we always want a cpu. */
>>>>> -	i %= num_online_cpus();
>>>>> +	i %= cpumask_weight(mask);
>>>>>     	if (node == NUMA_NO_NODE) {
>>>>> -		for_each_cpu(cpu, cpu_online_mask)
>>>>> +		for_each_cpu(cpu, mask) {
>>>>>     			if (i-- == 0)
>>>>>     				return cpu;
>>>>> +		}
>>>>>     	} else {
>>>>>     		/* NUMA first. */
>>>>> -		for_each_cpu_and(cpu, cpumask_of_node(node), cpu_online_mask)
>>>>> +		for_each_cpu_and(cpu, cpumask_of_node(node), mask) {
>>>>>     			if (i-- == 0)
>>>>>     				return cpu;
>>>>> +		}
>>>>> -		for_each_cpu(cpu, cpu_online_mask) {
>>>>> +		for_each_cpu(cpu, mask) {
>>>>>     			/* Skip NUMA nodes, done above. */
>>>>>     			if (cpumask_test_cpu(cpu, cpumask_of_node(node)))
>>>>>     				continue;
>>>>>
>>>
>