linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	rafael@kernel.org, linux-kernel@vger.kernel.org,
	peterz@infradead.org, mingo@kernel.org, linuxarm@huawei.com
Subject: Re: [PATCH] driver core: ensure a device has valid node id in device_add()
Date: Wed, 11 Sep 2019 08:49:26 +0200	[thread overview]
Message-ID: <20190911064926.GJ4023@dhcp22.suse.cz> (raw)
In-Reply-To: <ca590101-bfc8-3934-d803-537aacb707e0@huawei.com>

On Wed 11-09-19 14:15:51, Yunsheng Lin wrote:
> On 2019/9/11 13:33, Michal Hocko wrote:
> > On Tue 10-09-19 14:53:39, Michal Hocko wrote:
> >> On Tue 10-09-19 20:47:40, Yunsheng Lin wrote:
> >>> On 2019/9/10 19:12, Greg KH wrote:
> >>>> On Tue, Sep 10, 2019 at 01:04:51PM +0200, Michal Hocko wrote:
> >>>>> On Tue 10-09-19 18:58:05, Yunsheng Lin wrote:
> >>>>>> On 2019/9/10 17:31, Greg KH wrote:
> >>>>>>> On Tue, Sep 10, 2019 at 02:43:32PM +0800, Yunsheng Lin wrote:
> >>>>>>>> On 2019/9/9 17:53, Greg KH wrote:
> >>>>>>>>> On Mon, Sep 09, 2019 at 02:04:23PM +0800, Yunsheng Lin wrote:
> >>>>>>>>>> Currently a device does not belong to any of the numa nodes
> >>>>>>>>>> (dev->numa_node is NUMA_NO_NODE) when the node id is neither
> >>>>>>>>>> specified by fw nor by virtual device layer and the device has
> >>>>>>>>>> no parent device.
> >>>>>>>>>
> >>>>>>>>> Is this really a problem?
> >>>>>>>>
> >>>>>>>> Not really.
> >>>>>>>> Someone need to guess the node id when it is not specified, right?
> >>>>>>>
> >>>>>>> No, why?  Guessing guarantees you will get it wrong on some systems.
> >>>>>>>
> >>>>>>> Are you seeing real problems because the id is not being set?  What
> >>>>>>> problem is this fixing that you can actually observe?
> >>>>>>
> >>>>>> When passing the return value of dev_to_node() to cpumask_of_node()
> >>>>>> without checking the node id if the node id is not valid, there is
> >>>>>> global-out-of-bounds detected by KASAN as below:
> >>>>>
> >>>>> OK, I seem to remember this being brought up already. And now when I
> >>>>> think about it, we really want to make cpumask_of_node NUMA_NO_NODE
> >>>>> aware. That means using the same trick the allocator does for this
> >>>>> special case.
> >>>>
> >>>> That seems reasonable to me, and much more "obvious" as to what is going
> >>>> on.
> >>>>
> >>>
> >>> Ok, thanks for the suggestion.
> >>>
> >>> For arm64 and x86, there are two versions of cpumask_of_node().
> >>>
> >>> when CONFIG_DEBUG_PER_CPU_MAPS is defined, the cpumask_of_node()
> >>>    in arch/x86/mm/numa.c is used, which does partial node id checking:
> >>>
> >>> const struct cpumask *cpumask_of_node(int node)
> >>> {
> >>>         if (node >= nr_node_ids) {
> >>>                 printk(KERN_WARNING
> >>>                         "cpumask_of_node(%d): node > nr_node_ids(%u)\n",
> >>>                         node, nr_node_ids);
> >>>                 dump_stack();
> >>>                 return cpu_none_mask;
> >>>         }
> >>>         if (node_to_cpumask_map[node] == NULL) {
> >>>                 printk(KERN_WARNING
> >>>                         "cpumask_of_node(%d): no node_to_cpumask_map!\n",
> >>>                         node);
> >>>                 dump_stack();
> >>>                 return cpu_online_mask;
> >>>         }
> >>>         return node_to_cpumask_map[node];
> >>> }
> >>>
> >>> when CONFIG_DEBUG_PER_CPU_MAPS is undefined, the cpumask_of_node()
> >>>    in arch/x86/include/asm/topology.h is used:
> >>>
> >>> static inline const struct cpumask *cpumask_of_node(int node)
> >>> {
> >>>         return node_to_cpumask_map[node];
> >>> }
> >>
> >> I would simply go with. There shouldn't be any need for heavy weight
> >> checks that CONFIG_DEBUG_PER_CPU_MAPS has.
> >>
> >> static inline const struct cpumask *cpumask_of_node(int node)
> >> {
> >> 	/* A nice comment goes here */
> >> 	if (node == NUMA_NO_NODE)
> 
> How about "(unsigned int)node >= nr_node_ids", this is suggested
> by Peter, it checks the case where the node id set by fw is bigger
> or equal than nr_node_ids, and still handle the < 0 case, which
> includes NUMA_NO_NODE.

Isn't that a plain bug? Is something like that really happening?

> Maybe define a macro like below to do that in order to do
> the node checking consistently through kernel:
> 
> #define numa_node_valid(node)	((unsigned int)(node) < nr_node_ids)
> 
> 
> >> 		return node_to_cpumask_map[numa_mem_id()];
> >>         return node_to_cpumask_map[node];
> >> }
> > 
> > Sleeping over this and thinking more about the actual semantic the above
> > is wrong. We cannot really copy the page allocator logic. Why? Simply
> > because the page allocator doesn't enforce the near node affinity. It
> > just picks it up as a preferred node but then it is free to fallback to
> > any other numa node. This is not the case here and node_to_cpumask_map will
> > only restrict to the particular node's cpus which would have really non
> > deterministic behavior depending on where the code is executed. So in
> > fact we really want to return cpu_online_mask for NUMA_NO_NODE.
> 
> From below, if the __GFP_THISNODE is set, the fallback is not performed.

__GFP_THISNODE is a very specific situation when the caller knows which
node should be used for the allocation. NUMA_NO_NODE && __GFP_THISNODE
is a bug and I would dare to call it an undefined behavior.

> For node_to_cpumask_map() case, maybe we can return the cpumask that is
> on the node of cpu_to_node(raw_smp_processor_id()) for NUMA_NO_NODE,
> because the current cpu does belong to a node, and the node does have at
> least one cpu, which is the cpu is calling the node_to_cpumask_map().
> 
> Make any sense?

No. Please read the above paragraph again. NUMA_NO_NODE really means no
node affinity. So all cpus should be usable. Making any assumptions
about a local context is just wrong.
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2019-09-11  6:49 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-09  6:04 [PATCH] driver core: ensure a device has valid node id in device_add() Yunsheng Lin
2019-09-09  9:53 ` Greg KH
2019-09-10  6:43   ` Yunsheng Lin
2019-09-10  7:13     ` Michal Hocko
2019-09-10  9:31     ` Greg KH
2019-09-10 10:58       ` Yunsheng Lin
2019-09-10 11:04         ` Michal Hocko
2019-09-10 11:12           ` Greg KH
2019-09-10 12:47             ` Yunsheng Lin
2019-09-10 12:53               ` Michal Hocko
2019-09-11  5:33                 ` Michal Hocko
2019-09-11  6:15                   ` Yunsheng Lin
2019-09-11  6:49                     ` Michal Hocko [this message]
2019-09-11  7:22                       ` Yunsheng Lin
2019-09-11  7:34                         ` Michal Hocko
2019-09-11 11:03                           ` Yunsheng Lin
2019-09-11 11:41                             ` Yunsheng Lin
2019-09-11 12:02                               ` Michal Hocko
2019-09-23 15:09                       ` Peter Zijlstra
2019-09-09 18:50 ` Michal Hocko
2019-09-10  7:08   ` Yunsheng Lin
2019-09-10  7:24     ` Michal Hocko
2019-09-10 10:40       ` Yunsheng Lin
2019-09-10 11:01         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190911064926.GJ4023@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=linyunsheng@huawei.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).