All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: dalias@libc.org, linux-sh@vger.kernel.org,
	catalin.marinas@arm.com, dave.hansen@linux.intel.com,
	heiko.carstens@de.ibm.com, linuxarm@huawei.com,
	jiaxun.yang@flygoat.com, linux-kernel@vger.kernel.org,
	mwb@linux.vnet.ibm.com, paulus@samba.org, hpa@zytor.com,
	sparclinux@vger.kernel.org, chenhc@lemote.com, will@kernel.org,
	linux-s390@vger.kernel.org, ysato@users.sourceforge.jp,
	mpe@ellerman.id.au, x86@kernel.org, rppt@linux.ibm.com,
	borntraeger@de.ibm.com, dledford@redhat.com, mingo@redhat.com,
	jeffrey.t.kirsher@intel.com, benh@kernel.crashing.org,
	jhogan@kernel.org, nfont@linux.vnet.ibm.com, mattst88@gmail.com,
	len.brown@intel.com, gor@linux.ibm.com,
	anshuman.khandual@arm.com, ink@jurassic.park.msu.ru, cai@lca.pw,
	luto@kernel.org, tglx@linutronix.de,
	naveen.n.rao@linux.vnet.ibm.com,
	linux-arm-kernel@lists.infradead.org, rth@twiddle.net,
	axboe@kernel.dk, robin.murphy@arm.com,
	linux-mips@vger.kernel.org, ralf@linux-mips.org,
	tbogendoerfer@suse.de, paul.burton@mips.com,
	linux-alpha@vger.kernel.org, bp@alien8.de,
	akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
	davem@davemloft.net,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: Re: [PATCH v2 2/9] x86: numa: check the node id consistently for x86
Date: Tue, 03 Sep 2019 08:31:11 +0000	[thread overview]
Message-ID: <06eee8d0-ce56-03da-30a5-6b07e989a5e0@huawei.com> (raw)
In-Reply-To: <20190903071111.GU2369@hirez.programming.kicks-ass.net>

On 2019/9/3 15:11, Peter Zijlstra wrote:
> On Tue, Sep 03, 2019 at 02:19:04PM +0800, Yunsheng Lin wrote:
>> On 2019/9/2 20:56, Peter Zijlstra wrote:
>>> On Mon, Sep 02, 2019 at 08:25:24PM +0800, Yunsheng Lin wrote:
>>>> On 2019/9/2 15:25, Peter Zijlstra wrote:
>>>>> On Mon, Sep 02, 2019 at 01:46:51PM +0800, Yunsheng Lin wrote:
>>>>>> On 2019/9/1 0:12, Peter Zijlstra wrote:
>>>>>
>>>>>>> 1) because even it is not set, the device really does belong to a node.
>>>>>>> It is impossible a device will have magic uniform access to memory when
>>>>>>> CPUs cannot.
>>>>>>
>>>>>> So it means dev_to_node() will return either NUMA_NO_NODE or a
>>>>>> valid node id?
>>>>>
>>>>> NUMA_NO_NODE := -1, which is not a valid node number. It is also, like I
>>>>> said, not a valid device location on a NUMA system.
>>>>>
>>>>> Just because ACPI/BIOS is shit, doesn't mean the device doesn't have a
>>>>> node association. It just means we don't know and might have to guess.
>>>>
>>>> How do we guess the device's location when ACPI/BIOS does not set it?
>>>
>>> See device_add(), it looks to the device's parent and on NO_NODE, puts
>>> it there.
>>>
>>> Lacking any hints, just stick it to node0 and print a FW_BUG or
>>> something.
>>>
>>>> It seems dev_to_node() does not do anything about that and leave the
>>>> job to the caller or whatever function that get called with its return
>>>> value, such as cpumask_of_node().
>>>
>>> Well, dev_to_node() doesn't do anything; nor should it. It are the
>>> callers of set_dev_node() that should be taking care.
>>>
>>> Also note how device_add() sets the device node to the parent device's
>>> node on NUMA_NO_NODE. Arguably we should change it to complain when it
>>> finds NUMA_NO_NODE and !parent.
>>
>> Is it possible that the node id set by device_add() become invalid
>> if the node is offlined, then dev_to_node() may return a invalid
>> node id.
> 
> In that case I would expect the device to go away too. Once the memory
> controller goes away, the PCI bus connected to it cannot continue to
> function.

Ok. To summarize the discussion in order to for me to understand it
correctly:

1) Make sure device_add() set to default node0 to a device if
   ACPI/BIOS does not set the node id and it has not no parent device.

2) Use '(unsigned)node_id >= nr_node_ids' to fix the
   CONFIG_DEBUG_PER_CPU_MAPS version of cpumask_of_node() for x86
   and arm64, x86 just has a fix from you now, a patch for arm64 is
   also needed.

3) Maybe fix some other the sign bug for node id checking through the
   kernel using the '(unsigned)node_id >= nr_node_ids'.

Please see if I understand it correctly or miss something.
Maybe I can begin by sending a patch about item one to see if everyone
is ok with the idea?


> 
>> From the comment in select_fallback_rq(), it seems that a node can
>> be offlined, not sure if node offline process has taken cared of that?
>>
>> 	/*
>>          * If the node that the CPU is on has been offlined, cpu_to_node()
>>          * will return -1. There is no CPU on the node, and we should
>>          * select the CPU on the other node.
>>          */
> 
> Ugh, so I disagree with that notion. cpu_to_node() mapping should be
> fixed, you simply cannot change it after boot, too much stuff relies on
> it.
> 
> Setting cpu_to_node to -1 on node offline is just wrong. But alas, it
> seems this is already so.

WARNING: multiple messages have this Message-ID (diff)
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: <dalias@libc.org>, <linux-sh@vger.kernel.org>,
	<catalin.marinas@arm.com>, <dave.hansen@linux.intel.com>,
	<heiko.carstens@de.ibm.com>, <linuxarm@huawei.com>,
	<jiaxun.yang@flygoat.com>, <linux-kernel@vger.kernel.org>,
	<mwb@linux.vnet.ibm.com>, <paulus@samba.org>, <hpa@zytor.com>,
	<sparclinux@vger.kernel.org>, <chenhc@lemote.com>,
	<will@kernel.org>, <linux-s390@vger.kernel.org>,
	<ysato@users.sourceforge.jp>, <mpe@ellerman.id.au>,
	<x86@kernel.org>, <rppt@linux.ibm.com>, <borntraeger@de.ibm.com>,
	<dledford@redhat.com>, <mingo@redhat.com>,
	<jeffrey.t.kirsher@intel.com>, <benh@kernel.crashing.org>,
	<jhogan@kernel.org>, <nfont@linux.vnet.ibm.com>,
	<mattst88@gmail.com>, <len.brown@intel.com>, <gor@linux.ibm.com>,
	<anshuman.khandual@arm.com>, <ink@jurassic.park.msu.ru>,
	<cai@lca.pw>, <luto@kernel.org>, <tglx@linutronix.de>,
	<naveen.n.rao@linux.vnet.ibm.com>,
	<linux-arm-kernel@lists.infradead.org>, <rth@twiddle.net>,
	<axboe@kernel.dk>, <robin.murphy@arm.com>,
	<linux-mips@vger.kernel.org>, <ralf@linux-mips.org>,
	<tbogendoerfer@suse.de>, <paul.burton@mips.com>,
	<linux-alpha@vger.kernel.org>, <bp@alien8.de>,
	<akpm@linux-foundation.org>, <linuxppc-dev@lists.ozlabs.org>,
	<davem@davemloft.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: Re: [PATCH v2 2/9] x86: numa: check the node id consistently for x86
Date: Tue, 3 Sep 2019 16:31:11 +0800	[thread overview]
Message-ID: <06eee8d0-ce56-03da-30a5-6b07e989a5e0@huawei.com> (raw)
In-Reply-To: <20190903071111.GU2369@hirez.programming.kicks-ass.net>

On 2019/9/3 15:11, Peter Zijlstra wrote:
> On Tue, Sep 03, 2019 at 02:19:04PM +0800, Yunsheng Lin wrote:
>> On 2019/9/2 20:56, Peter Zijlstra wrote:
>>> On Mon, Sep 02, 2019 at 08:25:24PM +0800, Yunsheng Lin wrote:
>>>> On 2019/9/2 15:25, Peter Zijlstra wrote:
>>>>> On Mon, Sep 02, 2019 at 01:46:51PM +0800, Yunsheng Lin wrote:
>>>>>> On 2019/9/1 0:12, Peter Zijlstra wrote:
>>>>>
>>>>>>> 1) because even it is not set, the device really does belong to a node.
>>>>>>> It is impossible a device will have magic uniform access to memory when
>>>>>>> CPUs cannot.
>>>>>>
>>>>>> So it means dev_to_node() will return either NUMA_NO_NODE or a
>>>>>> valid node id?
>>>>>
>>>>> NUMA_NO_NODE := -1, which is not a valid node number. It is also, like I
>>>>> said, not a valid device location on a NUMA system.
>>>>>
>>>>> Just because ACPI/BIOS is shit, doesn't mean the device doesn't have a
>>>>> node association. It just means we don't know and might have to guess.
>>>>
>>>> How do we guess the device's location when ACPI/BIOS does not set it?
>>>
>>> See device_add(), it looks to the device's parent and on NO_NODE, puts
>>> it there.
>>>
>>> Lacking any hints, just stick it to node0 and print a FW_BUG or
>>> something.
>>>
>>>> It seems dev_to_node() does not do anything about that and leave the
>>>> job to the caller or whatever function that get called with its return
>>>> value, such as cpumask_of_node().
>>>
>>> Well, dev_to_node() doesn't do anything; nor should it. It are the
>>> callers of set_dev_node() that should be taking care.
>>>
>>> Also note how device_add() sets the device node to the parent device's
>>> node on NUMA_NO_NODE. Arguably we should change it to complain when it
>>> finds NUMA_NO_NODE and !parent.
>>
>> Is it possible that the node id set by device_add() become invalid
>> if the node is offlined, then dev_to_node() may return a invalid
>> node id.
> 
> In that case I would expect the device to go away too. Once the memory
> controller goes away, the PCI bus connected to it cannot continue to
> function.

Ok. To summarize the discussion in order to for me to understand it
correctly:

1) Make sure device_add() set to default node0 to a device if
   ACPI/BIOS does not set the node id and it has not no parent device.

2) Use '(unsigned)node_id >= nr_node_ids' to fix the
   CONFIG_DEBUG_PER_CPU_MAPS version of cpumask_of_node() for x86
   and arm64, x86 just has a fix from you now, a patch for arm64 is
   also needed.

3) Maybe fix some other the sign bug for node id checking through the
   kernel using the '(unsigned)node_id >= nr_node_ids'.

Please see if I understand it correctly or miss something.
Maybe I can begin by sending a patch about item one to see if everyone
is ok with the idea?


> 
>> From the comment in select_fallback_rq(), it seems that a node can
>> be offlined, not sure if node offline process has taken cared of that?
>>
>> 	/*
>>          * If the node that the CPU is on has been offlined, cpu_to_node()
>>          * will return -1. There is no CPU on the node, and we should
>>          * select the CPU on the other node.
>>          */
> 
> Ugh, so I disagree with that notion. cpu_to_node() mapping should be
> fixed, you simply cannot change it after boot, too much stuff relies on
> it.
> 
> Setting cpu_to_node to -1 on node offline is just wrong. But alas, it
> seems this is already so.



WARNING: multiple messages have this Message-ID (diff)
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: dalias@libc.org, linux-sh@vger.kernel.org,
	catalin.marinas@arm.com, dave.hansen@linux.intel.com,
	heiko.carstens@de.ibm.com, linuxarm@huawei.com,
	jiaxun.yang@flygoat.com, linux-kernel@vger.kernel.org,
	mwb@linux.vnet.ibm.com, paulus@samba.org, hpa@zytor.com,
	sparclinux@vger.kernel.org, chenhc@lemote.com, will@kernel.org,
	linux-s390@vger.kernel.org, ysato@users.sourceforge.jp,
	mpe@ellerman.id.au, x86@kernel.org, rppt@linux.ibm.com,
	borntraeger@de.ibm.com, dledford@redhat.com, mingo@redhat.com,
	jeffrey.t.kirsher@intel.com, benh@kernel.crashing.org,
	jhogan@kernel.org, nfont@linux.vnet.ibm.com, mattst88@gmail.com,
	len.brown@intel.com, gor@linux.ibm.com,
	anshuman.khandual@arm.com, ink@jurassic.park.msu.ru, cai@lca.pw,
	luto@kernel.org, tglx@linutronix.de,
	naveen.n.rao@linux.vnet.ibm.com,
	linux-arm-kernel@lists.infradead.org, rth@twiddle.net,
	axboe@kernel.dk, robin.murphy@arm.com,
	linux-mips@vger.kernel.org, ralf@linux-mips.org,
	tbogendoerfer@suse.de, paul.burton@mips.com,
	linux-alpha@vger.kernel.org, bp@alien8.de,
	akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
	davem@davemloft.net,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: Re: [PATCH v2 2/9] x86: numa: check the node id consistently for x86
Date: Tue, 3 Sep 2019 16:31:11 +0800	[thread overview]
Message-ID: <06eee8d0-ce56-03da-30a5-6b07e989a5e0@huawei.com> (raw)
In-Reply-To: <20190903071111.GU2369@hirez.programming.kicks-ass.net>

On 2019/9/3 15:11, Peter Zijlstra wrote:
> On Tue, Sep 03, 2019 at 02:19:04PM +0800, Yunsheng Lin wrote:
>> On 2019/9/2 20:56, Peter Zijlstra wrote:
>>> On Mon, Sep 02, 2019 at 08:25:24PM +0800, Yunsheng Lin wrote:
>>>> On 2019/9/2 15:25, Peter Zijlstra wrote:
>>>>> On Mon, Sep 02, 2019 at 01:46:51PM +0800, Yunsheng Lin wrote:
>>>>>> On 2019/9/1 0:12, Peter Zijlstra wrote:
>>>>>
>>>>>>> 1) because even it is not set, the device really does belong to a node.
>>>>>>> It is impossible a device will have magic uniform access to memory when
>>>>>>> CPUs cannot.
>>>>>>
>>>>>> So it means dev_to_node() will return either NUMA_NO_NODE or a
>>>>>> valid node id?
>>>>>
>>>>> NUMA_NO_NODE := -1, which is not a valid node number. It is also, like I
>>>>> said, not a valid device location on a NUMA system.
>>>>>
>>>>> Just because ACPI/BIOS is shit, doesn't mean the device doesn't have a
>>>>> node association. It just means we don't know and might have to guess.
>>>>
>>>> How do we guess the device's location when ACPI/BIOS does not set it?
>>>
>>> See device_add(), it looks to the device's parent and on NO_NODE, puts
>>> it there.
>>>
>>> Lacking any hints, just stick it to node0 and print a FW_BUG or
>>> something.
>>>
>>>> It seems dev_to_node() does not do anything about that and leave the
>>>> job to the caller or whatever function that get called with its return
>>>> value, such as cpumask_of_node().
>>>
>>> Well, dev_to_node() doesn't do anything; nor should it. It are the
>>> callers of set_dev_node() that should be taking care.
>>>
>>> Also note how device_add() sets the device node to the parent device's
>>> node on NUMA_NO_NODE. Arguably we should change it to complain when it
>>> finds NUMA_NO_NODE and !parent.
>>
>> Is it possible that the node id set by device_add() become invalid
>> if the node is offlined, then dev_to_node() may return a invalid
>> node id.
> 
> In that case I would expect the device to go away too. Once the memory
> controller goes away, the PCI bus connected to it cannot continue to
> function.

Ok. To summarize the discussion in order to for me to understand it
correctly:

1) Make sure device_add() set to default node0 to a device if
   ACPI/BIOS does not set the node id and it has not no parent device.

2) Use '(unsigned)node_id >= nr_node_ids' to fix the
   CONFIG_DEBUG_PER_CPU_MAPS version of cpumask_of_node() for x86
   and arm64, x86 just has a fix from you now, a patch for arm64 is
   also needed.

3) Maybe fix some other the sign bug for node id checking through the
   kernel using the '(unsigned)node_id >= nr_node_ids'.

Please see if I understand it correctly or miss something.
Maybe I can begin by sending a patch about item one to see if everyone
is ok with the idea?


> 
>> From the comment in select_fallback_rq(), it seems that a node can
>> be offlined, not sure if node offline process has taken cared of that?
>>
>> 	/*
>>          * If the node that the CPU is on has been offlined, cpu_to_node()
>>          * will return -1. There is no CPU on the node, and we should
>>          * select the CPU on the other node.
>>          */
> 
> Ugh, so I disagree with that notion. cpu_to_node() mapping should be
> fixed, you simply cannot change it after boot, too much stuff relies on
> it.
> 
> Setting cpu_to_node to -1 on node offline is just wrong. But alas, it
> seems this is already so.

WARNING: multiple messages have this Message-ID (diff)
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: dalias@libc.org, linux-sh@vger.kernel.org,
	catalin.marinas@arm.com, dave.hansen@linux.intel.com,
	heiko.carstens@de.ibm.com, linuxarm@huawei.com,
	jiaxun.yang@flygoat.com, linux-mips@vger.kernel.org,
	mwb@linux.vnet.ibm.com, paulus@samba.org, hpa@zytor.com,
	sparclinux@vger.kernel.org, chenhc@lemote.com, will@kernel.org,
	bp@alien8.de, linux-s390@vger.kernel.org,
	ysato@users.sourceforge.jp, x86@kernel.org, rppt@linux.ibm.com,
	borntraeger@de.ibm.com, dledford@redhat.com, mingo@redhat.com,
	jeffrey.t.kirsher@intel.com, jhogan@kernel.org,
	nfont@linux.vnet.ibm.com, mattst88@gmail.com,
	len.brown@intel.com, gor@linux.ibm.com,
	anshuman.khandual@arm.com,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	ink@jurassic.park.msu.ru, luto@kernel.org, tglx@linutronix.de,
	naveen.n.rao@linux.vnet.ibm.com,
	linux-arm-kernel@lists.infradead.org, rth@twiddle.net,
	axboe@kernel.dk, linuxppc-dev@lists.ozlabs.org,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	tbogendoerfer@suse.de, paul.burton@mips.com,
	linux-alpha@vger.kernel.org, cai@lca.pw,
	akpm@linux-foundation.org, robin.murphy@arm.com,
	davem@davemloft.net
Subject: Re: [PATCH v2 2/9] x86: numa: check the node id consistently for x86
Date: Tue, 3 Sep 2019 16:31:11 +0800	[thread overview]
Message-ID: <06eee8d0-ce56-03da-30a5-6b07e989a5e0@huawei.com> (raw)
In-Reply-To: <20190903071111.GU2369@hirez.programming.kicks-ass.net>

On 2019/9/3 15:11, Peter Zijlstra wrote:
> On Tue, Sep 03, 2019 at 02:19:04PM +0800, Yunsheng Lin wrote:
>> On 2019/9/2 20:56, Peter Zijlstra wrote:
>>> On Mon, Sep 02, 2019 at 08:25:24PM +0800, Yunsheng Lin wrote:
>>>> On 2019/9/2 15:25, Peter Zijlstra wrote:
>>>>> On Mon, Sep 02, 2019 at 01:46:51PM +0800, Yunsheng Lin wrote:
>>>>>> On 2019/9/1 0:12, Peter Zijlstra wrote:
>>>>>
>>>>>>> 1) because even it is not set, the device really does belong to a node.
>>>>>>> It is impossible a device will have magic uniform access to memory when
>>>>>>> CPUs cannot.
>>>>>>
>>>>>> So it means dev_to_node() will return either NUMA_NO_NODE or a
>>>>>> valid node id?
>>>>>
>>>>> NUMA_NO_NODE := -1, which is not a valid node number. It is also, like I
>>>>> said, not a valid device location on a NUMA system.
>>>>>
>>>>> Just because ACPI/BIOS is shit, doesn't mean the device doesn't have a
>>>>> node association. It just means we don't know and might have to guess.
>>>>
>>>> How do we guess the device's location when ACPI/BIOS does not set it?
>>>
>>> See device_add(), it looks to the device's parent and on NO_NODE, puts
>>> it there.
>>>
>>> Lacking any hints, just stick it to node0 and print a FW_BUG or
>>> something.
>>>
>>>> It seems dev_to_node() does not do anything about that and leave the
>>>> job to the caller or whatever function that get called with its return
>>>> value, such as cpumask_of_node().
>>>
>>> Well, dev_to_node() doesn't do anything; nor should it. It are the
>>> callers of set_dev_node() that should be taking care.
>>>
>>> Also note how device_add() sets the device node to the parent device's
>>> node on NUMA_NO_NODE. Arguably we should change it to complain when it
>>> finds NUMA_NO_NODE and !parent.
>>
>> Is it possible that the node id set by device_add() become invalid
>> if the node is offlined, then dev_to_node() may return a invalid
>> node id.
> 
> In that case I would expect the device to go away too. Once the memory
> controller goes away, the PCI bus connected to it cannot continue to
> function.

Ok. To summarize the discussion in order to for me to understand it
correctly:

1) Make sure device_add() set to default node0 to a device if
   ACPI/BIOS does not set the node id and it has not no parent device.

2) Use '(unsigned)node_id >= nr_node_ids' to fix the
   CONFIG_DEBUG_PER_CPU_MAPS version of cpumask_of_node() for x86
   and arm64, x86 just has a fix from you now, a patch for arm64 is
   also needed.

3) Maybe fix some other the sign bug for node id checking through the
   kernel using the '(unsigned)node_id >= nr_node_ids'.

Please see if I understand it correctly or miss something.
Maybe I can begin by sending a patch about item one to see if everyone
is ok with the idea?


> 
>> From the comment in select_fallback_rq(), it seems that a node can
>> be offlined, not sure if node offline process has taken cared of that?
>>
>> 	/*
>>          * If the node that the CPU is on has been offlined, cpu_to_node()
>>          * will return -1. There is no CPU on the node, and we should
>>          * select the CPU on the other node.
>>          */
> 
> Ugh, so I disagree with that notion. cpu_to_node() mapping should be
> fixed, you simply cannot change it after boot, too much stuff relies on
> it.
> 
> Setting cpu_to_node to -1 on node offline is just wrong. But alas, it
> seems this is already so.



WARNING: multiple messages have this Message-ID (diff)
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: dalias@libc.org, linux-sh@vger.kernel.org,
	catalin.marinas@arm.com, dave.hansen@linux.intel.com,
	heiko.carstens@de.ibm.com, linuxarm@huawei.com,
	jiaxun.yang@flygoat.com, linux-kernel@vger.kernel.org,
	mwb@linux.vnet.ibm.com, paulus@samba.org, hpa@zytor.com,
	sparclinux@vger.kernel.org, chenhc@lemote.com, will@kernel.org,
	linux-s390@vger.kernel.org, ysato@users.sourceforge.jp,
	mpe@ellerman.id.au, x86@kernel.org, rppt@linux.ibm.com,
	borntraeger@de.ibm.com, dledford@redhat.com, mingo@redhat.com,
	jeffrey.t.kirsher@intel.com, benh@kernel.crashing.org,
	jhogan@kernel.org, nfont@linux.vnet.ibm.com, mattst88@gmail.com,
	len.brown@intel.com, gor@linux.ibm.com,
	anshuman.khandual@arm.com, ink@jurassic.park.msu.ru, cai@lca.pw
Subject: Re: [PATCH v2 2/9] x86: numa: check the node id consistently for x86
Date: Tue, 3 Sep 2019 16:31:11 +0800	[thread overview]
Message-ID: <06eee8d0-ce56-03da-30a5-6b07e989a5e0@huawei.com> (raw)
In-Reply-To: <20190903071111.GU2369@hirez.programming.kicks-ass.net>

On 2019/9/3 15:11, Peter Zijlstra wrote:
> On Tue, Sep 03, 2019 at 02:19:04PM +0800, Yunsheng Lin wrote:
>> On 2019/9/2 20:56, Peter Zijlstra wrote:
>>> On Mon, Sep 02, 2019 at 08:25:24PM +0800, Yunsheng Lin wrote:
>>>> On 2019/9/2 15:25, Peter Zijlstra wrote:
>>>>> On Mon, Sep 02, 2019 at 01:46:51PM +0800, Yunsheng Lin wrote:
>>>>>> On 2019/9/1 0:12, Peter Zijlstra wrote:
>>>>>
>>>>>>> 1) because even it is not set, the device really does belong to a node.
>>>>>>> It is impossible a device will have magic uniform access to memory when
>>>>>>> CPUs cannot.
>>>>>>
>>>>>> So it means dev_to_node() will return either NUMA_NO_NODE or a
>>>>>> valid node id?
>>>>>
>>>>> NUMA_NO_NODE := -1, which is not a valid node number. It is also, like I
>>>>> said, not a valid device location on a NUMA system.
>>>>>
>>>>> Just because ACPI/BIOS is shit, doesn't mean the device doesn't have a
>>>>> node association. It just means we don't know and might have to guess.
>>>>
>>>> How do we guess the device's location when ACPI/BIOS does not set it?
>>>
>>> See device_add(), it looks to the device's parent and on NO_NODE, puts
>>> it there.
>>>
>>> Lacking any hints, just stick it to node0 and print a FW_BUG or
>>> something.
>>>
>>>> It seems dev_to_node() does not do anything about that and leave the
>>>> job to the caller or whatever function that get called with its return
>>>> value, such as cpumask_of_node().
>>>
>>> Well, dev_to_node() doesn't do anything; nor should it. It are the
>>> callers of set_dev_node() that should be taking care.
>>>
>>> Also note how device_add() sets the device node to the parent device's
>>> node on NUMA_NO_NODE. Arguably we should change it to complain when it
>>> finds NUMA_NO_NODE and !parent.
>>
>> Is it possible that the node id set by device_add() become invalid
>> if the node is offlined, then dev_to_node() may return a invalid
>> node id.
> 
> In that case I would expect the device to go away too. Once the memory
> controller goes away, the PCI bus connected to it cannot continue to
> function.

Ok. To summarize the discussion in order to for me to understand it
correctly:

1) Make sure device_add() set to default node0 to a device if
   ACPI/BIOS does not set the node id and it has not no parent device.

2) Use '(unsigned)node_id >= nr_node_ids' to fix the
   CONFIG_DEBUG_PER_CPU_MAPS version of cpumask_of_node() for x86
   and arm64, x86 just has a fix from you now, a patch for arm64 is
   also needed.

3) Maybe fix some other the sign bug for node id checking through the
   kernel using the '(unsigned)node_id >= nr_node_ids'.

Please see if I understand it correctly or miss something.
Maybe I can begin by sending a patch about item one to see if everyone
is ok with the idea?


> 
>> From the comment in select_fallback_rq(), it seems that a node can
>> be offlined, not sure if node offline process has taken cared of that?
>>
>> 	/*
>>          * If the node that the CPU is on has been offlined, cpu_to_node()
>>          * will return -1. There is no CPU on the node, and we should
>>          * select the CPU on the other node.
>>          */
> 
> Ugh, so I disagree with that notion. cpu_to_node() mapping should be
> fixed, you simply cannot change it after boot, too much stuff relies on
> it.
> 
> Setting cpu_to_node to -1 on node offline is just wrong. But alas, it
> seems this is already so.



  reply	other threads:[~2019-09-03  8:31 UTC|newest]

Thread overview: 164+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-31  5:58 [PATCH v2 0/9] check the node id consistently across different arches Yunsheng Lin
2019-08-31  5:58 ` Yunsheng Lin
2019-08-31  5:58 ` Yunsheng Lin
2019-08-31  5:58 ` Yunsheng Lin
2019-08-31  5:58 ` Yunsheng Lin
2019-08-31  5:58 ` Yunsheng Lin
2019-08-31  5:58 ` [PATCH v2 1/9] arm64: numa: check the node id consistently for arm64 Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58 ` [PATCH v2 2/9] x86: numa: check the node id consistently for x86 Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  8:55   ` Peter Zijlstra
2019-08-31  8:55     ` Peter Zijlstra
2019-08-31  8:55     ` Peter Zijlstra
2019-08-31  8:55     ` Peter Zijlstra
2019-08-31  8:55     ` Peter Zijlstra
2019-08-31 10:09     ` Yunsheng Lin
2019-08-31 10:09       ` Yunsheng Lin
2019-08-31 10:09       ` Yunsheng Lin
2019-08-31 10:09       ` Yunsheng Lin
2019-08-31 10:09       ` Yunsheng Lin
2019-08-31 10:09       ` Yunsheng Lin
2019-08-31 16:12       ` Peter Zijlstra
2019-08-31 16:12         ` Peter Zijlstra
2019-08-31 16:12         ` Peter Zijlstra
2019-08-31 16:12         ` Peter Zijlstra
2019-08-31 16:12         ` Peter Zijlstra
2019-09-01  4:45         ` Something about loongson_llsc_mb 陈华才
     [not found]           ` <2019090410032559707512@loongson.cn>
2019-09-04  9:21             ` Peter Zijlstra
2019-09-04 10:04               ` Peter Zijlstra
2019-09-04 12:57               ` Huang Pei
2019-09-02  5:46         ` [PATCH v2 2/9] x86: numa: check the node id consistently for x86 Yunsheng Lin
2019-09-02  5:46           ` Yunsheng Lin
2019-09-02  5:46           ` Yunsheng Lin
2019-09-02  5:46           ` Yunsheng Lin
2019-09-02  5:46           ` Yunsheng Lin
2019-09-02  5:46           ` Yunsheng Lin
2019-09-02  7:25           ` Peter Zijlstra
2019-09-02  7:25             ` Peter Zijlstra
2019-09-02  7:25             ` Peter Zijlstra
2019-09-02  7:25             ` Peter Zijlstra
2019-09-02  7:25             ` Peter Zijlstra
2019-09-02 12:25             ` Yunsheng Lin
2019-09-02 12:25               ` Yunsheng Lin
2019-09-02 12:25               ` Yunsheng Lin
2019-09-02 12:25               ` Yunsheng Lin
2019-09-02 12:25               ` Yunsheng Lin
2019-09-02 12:25               ` Yunsheng Lin
2019-09-02 12:56               ` Peter Zijlstra
2019-09-02 12:56                 ` Peter Zijlstra
2019-09-02 12:56                 ` Peter Zijlstra
2019-09-02 12:56                 ` Peter Zijlstra
2019-09-02 18:22                 ` Ingo Molnar
2019-09-02 18:22                   ` Ingo Molnar
2019-09-02 18:22                   ` Ingo Molnar
2019-09-02 18:22                   ` Ingo Molnar
2019-09-02 19:14                   ` Peter Zijlstra
2019-09-02 19:14                     ` Peter Zijlstra
2019-09-02 19:14                     ` Peter Zijlstra
2019-09-02 19:14                     ` Peter Zijlstra
2019-09-03  6:19                 ` Yunsheng Lin
2019-09-03  6:19                   ` Yunsheng Lin
2019-09-03  6:19                   ` Yunsheng Lin
2019-09-03  6:19                   ` Yunsheng Lin
2019-09-03  6:19                   ` Yunsheng Lin
2019-09-03  7:11                   ` Peter Zijlstra
2019-09-03  7:11                     ` Peter Zijlstra
2019-09-03  7:11                     ` Peter Zijlstra
2019-09-03  7:11                     ` Peter Zijlstra
2019-09-03  8:31                     ` Yunsheng Lin [this message]
2019-09-03  8:31                       ` Yunsheng Lin
2019-09-03  8:31                       ` Yunsheng Lin
2019-09-03  8:31                       ` Yunsheng Lin
2019-09-03  8:31                       ` Yunsheng Lin
2019-09-03 12:15                     ` Salil Mehta
2019-09-03 12:15                       ` Salil Mehta
2019-09-03 14:28                       ` Peter Zijlstra
2019-09-03 14:28                         ` Peter Zijlstra
2019-09-02 18:17             ` Ingo Molnar
2019-09-02 18:17               ` Ingo Molnar
2019-09-02 18:17               ` Ingo Molnar
2019-09-02 18:17               ` Ingo Molnar
2019-09-03  7:53               ` [PATCH] x86/mm: Fix cpumask_of_node() error condition Peter Zijlstra
2019-09-03  7:53                 ` Peter Zijlstra
2019-09-03  7:53                 ` Peter Zijlstra
2019-09-03  7:53                 ` Peter Zijlstra
2019-08-31  5:58 ` [PATCH v2 3/9] alpha: numa: check the node id consistently for alpha Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58 ` [PATCH v2 4/9] powerpc: numa: check the node id consistently for powerpc Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58 ` [PATCH v2 5/9] s390: numa: check the node id consistently for s390 Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-09-02  4:05   ` kbuild test robot
2019-09-02  4:05     ` kbuild test robot
2019-09-02  4:05     ` kbuild test robot
2019-09-02  4:05     ` kbuild test robot
2019-08-31  5:58 ` [PATCH v2 6/9] sh: numa: check the node id consistently for sh Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58 ` [PATCH v2 7/9] sparc64: numa: check the node id consistently for sparc64 Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  6:53   ` David Miller
2019-08-31  6:53     ` David Miller
2019-08-31  6:53     ` David Miller
2019-08-31  6:53     ` David Miller
2019-08-31  6:53     ` David Miller
2019-08-31  8:57     ` Yunsheng Lin
2019-08-31  8:57       ` Yunsheng Lin
2019-08-31  8:57       ` Yunsheng Lin
2019-08-31  8:57       ` Yunsheng Lin
2019-08-31  8:57       ` Yunsheng Lin
2019-08-31  8:57       ` Yunsheng Lin
2019-08-31 20:02       ` David Miller
2019-08-31 20:02         ` David Miller
2019-08-31 20:02         ` David Miller
2019-08-31 20:02         ` David Miller
2019-08-31 20:02         ` David Miller
2019-09-02  6:08         ` Yunsheng Lin
2019-09-02  6:08           ` Yunsheng Lin
2019-09-02  6:08           ` Yunsheng Lin
2019-09-02  6:08           ` Yunsheng Lin
2019-09-02  6:08           ` Yunsheng Lin
2019-09-02  6:08           ` Yunsheng Lin
2019-09-02 15:17           ` David Miller
2019-09-02 15:17             ` David Miller
2019-09-02 15:17             ` David Miller
2019-09-02 15:17             ` David Miller
2019-09-02 15:17             ` David Miller
2019-08-31  5:58 ` [PATCH v2 8/9] mips: numa: check the node id consistently for mips ip27 Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31 15:45   ` Paul Burton
2019-08-31 15:45     ` Paul Burton
2019-08-31 15:45     ` Paul Burton
2019-09-02  6:11     ` Yunsheng Lin
2019-09-02  6:11       ` Yunsheng Lin
2019-09-02  6:11       ` Yunsheng Lin
2019-08-31  5:58 ` [PATCH v2 9/9] mips: numa: check the node id consistently for mips loongson64 Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin
2019-08-31  5:58   ` Yunsheng Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=06eee8d0-ce56-03da-30a5-6b07e989a5e0@huawei.com \
    --to=linyunsheng@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=borntraeger@de.ibm.com \
    --cc=bp@alien8.de \
    --cc=cai@lca.pw \
    --cc=catalin.marinas@arm.com \
    --cc=chenhc@lemote.com \
    --cc=dalias@libc.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=dledford@redhat.com \
    --cc=gor@linux.ibm.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=heiko.carstens@de.ibm.com \
    --cc=hpa@zytor.com \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=jhogan@kernel.org \
    --cc=jiaxun.yang@flygoat.com \
    --cc=len.brown@intel.com \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=luto@kernel.org \
    --cc=mattst88@gmail.com \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=mwb@linux.vnet.ibm.com \
    --cc=naveen.n.rao@linux.vnet.ibm.com \
    --cc=nfont@linux.vnet.ibm.com \
    --cc=paul.burton@mips.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=ralf@linux-mips.org \
    --cc=rjw@rjwysocki.net \
    --cc=robin.murphy@arm.com \
    --cc=rppt@linux.ibm.com \
    --cc=rth@twiddle.net \
    --cc=sparclinux@vger.kernel.org \
    --cc=tbogendoerfer@suse.de \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=ysato@users.sourceforge.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.