Re: [PATCH] driver core: ensure a device has valid node id in device_add()

From: Yunsheng Lin <linyunsheng@huawei.com>
To: Greg KH <gregkh@linuxfoundation.org>, Michal Hocko <mhocko@kernel.org>
Cc: <rafael@kernel.org>, <linux-kernel@vger.kernel.org>,
	<peterz@infradead.org>, <mingo@kernel.org>, <linuxarm@huawei.com>
Subject: Re: [PATCH] driver core: ensure a device has valid node id in device_add()
Date: Tue, 10 Sep 2019 20:47:40 +0800	[thread overview]
Message-ID: <5a5645d2-030f-7921-432f-ff7d657405b8@huawei.com> (raw)
In-Reply-To: <20190910111252.GA8970@kroah.com>

On 2019/9/10 19:12, Greg KH wrote:
> On Tue, Sep 10, 2019 at 01:04:51PM +0200, Michal Hocko wrote:
>> On Tue 10-09-19 18:58:05, Yunsheng Lin wrote:
>>> On 2019/9/10 17:31, Greg KH wrote:
>>>> On Tue, Sep 10, 2019 at 02:43:32PM +0800, Yunsheng Lin wrote:
>>>>> On 2019/9/9 17:53, Greg KH wrote:
>>>>>> On Mon, Sep 09, 2019 at 02:04:23PM +0800, Yunsheng Lin wrote:
>>>>>>> Currently a device does not belong to any of the numa nodes
>>>>>>> (dev->numa_node is NUMA_NO_NODE) when the node id is neither
>>>>>>> specified by fw nor by virtual device layer and the device has
>>>>>>> no parent device.
>>>>>>
>>>>>> Is this really a problem?
>>>>>
>>>>> Not really.
>>>>> Someone need to guess the node id when it is not specified, right?
>>>>
>>>> No, why?  Guessing guarantees you will get it wrong on some systems.
>>>>
>>>> Are you seeing real problems because the id is not being set?  What
>>>> problem is this fixing that you can actually observe?
>>>
>>> When passing the return value of dev_to_node() to cpumask_of_node()
>>> without checking the node id if the node id is not valid, there is
>>> global-out-of-bounds detected by KASAN as below:
>>
>> OK, I seem to remember this being brought up already. And now when I
>> think about it, we really want to make cpumask_of_node NUMA_NO_NODE
>> aware. That means using the same trick the allocator does for this
>> special case.
> 
> That seems reasonable to me, and much more "obvious" as to what is going
> on.
> 

Ok, thanks for the suggestion.

For arm64 and x86, there are two versions of cpumask_of_node().

when CONFIG_DEBUG_PER_CPU_MAPS is defined, the cpumask_of_node()
   in arch/x86/mm/numa.c is used, which does partial node id checking:

const struct cpumask *cpumask_of_node(int node)
{
        if (node >= nr_node_ids) {
                printk(KERN_WARNING
                        "cpumask_of_node(%d): node > nr_node_ids(%u)\n",
                        node, nr_node_ids);
                dump_stack();
                return cpu_none_mask;
        }
        if (node_to_cpumask_map[node] == NULL) {
                printk(KERN_WARNING
                        "cpumask_of_node(%d): no node_to_cpumask_map!\n",
                        node);
                dump_stack();
                return cpu_online_mask;
        }
        return node_to_cpumask_map[node];
}

when CONFIG_DEBUG_PER_CPU_MAPS is undefined, the cpumask_of_node()
   in arch/x86/include/asm/topology.h is used:

static inline const struct cpumask *cpumask_of_node(int node)
{
        return node_to_cpumask_map[node];
}

As discussion in [1], adding the checking in cpumask_of_node() with
CONFIG_DEBUG_PER_CPU_MAPS not defined increases overhead for everyone,
and it is already true that cpumask_of_node() requires a valid node_id.

So maybe the overhead is worth it?

Hi, Peter
	Does the argument in this thread about making cpumask_of_node()
NUMA_NO_NODE aware make sense to you?

[1] https://lore.kernel.org/patchwork/patch/1122516/