linux-mips.vger.kernel.org archive mirror
* [PATCH v7] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
@ 2019-10-30  9:34 Yunsheng Lin
  2019-10-30 10:14 ` Peter Zijlstra
  2019-10-30 10:20 ` Michal Hocko
  0 siblings, 2 replies; 8+ messages in thread
From: Yunsheng Lin @ 2019-10-30  9:34 UTC (permalink / raw)
  To: catalin.marinas, will, mingo, bp, rth, ink, mattst88, benh,
	paulus, mpe, heiko.carstens, gor, borntraeger, ysato, dalias,
	davem, ralf, paul.burton, jhogan, jiaxun.yang, chenhc
  Cc: akpm, rppt, anshuman.khandual, tglx, cai, robin.murphy,
	linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	peterz, len.brown, axboe, dledford, jeffrey.t.kirsher,
	linux-alpha, naveen.n.rao, mwb, linuxppc-dev, linux-s390,
	linux-sh, sparclinux, tbogendoerfer, linux-mips, rafael, mhocko,
	gregkh, bhelgaas, linux-pci, rjw, lenb, linux-acpi

When the return value of dev_to_node() is passed to cpumask_of_node()
without checking whether the device's node id is NUMA_NO_NODE, KASAN
detects a global-out-of-bounds access.
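
For illustration only, a hypothetical caller (the helper name is made
up): calling cpumask_of_node(dev_to_node(dev)) directly triggers the
report, because on x86 and arm64 the non-debug cpumask_of_node() expands
to node_to_cpumask_map[node] and dev_to_node() may return -1. The check
below is what every such caller had to open-code before this patch:

#include <linux/device.h>
#include <linux/topology.h>
#include <linux/cpumask.h>
#include <linux/numa.h>

static const struct cpumask *example_dev_cpumask(struct device *dev)
{
	int node = dev_to_node(dev);	/* may be NUMA_NO_NODE (-1) */

	/*
	 * Without this check, cpumask_of_node(-1) indexes before the
	 * start of node_to_cpumask_map[] and KASAN reports it.
	 */
	if (node == NUMA_NO_NODE)
		return cpu_online_mask;

	return cpumask_of_node(node);
}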

From the discussion [1], NUMA_NO_NODE really means no node affinity,
which also means all cpus should be usable. So cpumask_of_node() should
always return all online cpus when the caller passes NUMA_NO_NODE as
the node id, mirroring the semantics the page allocator uses for
NUMA_NO_NODE.
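
For reference, the page allocator's NUMA_NO_NODE handling looks roughly
like the sketch below (paraphrased from alloc_pages_node() in
include/linux/gfp.h; check the tree for the exact form). The next
paragraph explains why this logic cannot simply be copied here:

#include <linux/gfp.h>
#include <linux/topology.h>

static inline struct page *
alloc_pages_node_sketch(int nid, gfp_t gfp_mask, unsigned int order)
{
	/* No node requested: fall back to the nearest memory node... */
	if (nid == NUMA_NO_NODE)
		nid = numa_mem_id();

	/* ...but only as a *preferred* node, not a hard binding. */
	return __alloc_pages_node(nid, gfp_mask, order);
}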

But we cannot simply copy the page allocator logic, because the page
allocator does not enforce node affinity: it only treats the node as a
preferred one and is free to fall back to any other NUMA node. That is
not the case here; node_to_cpumask_map would restrict the mask to one
particular node's cpus, which would give genuinely non-deterministic
behavior depending on where the code is executed. So in fact we really
want to return cpu_online_mask for NUMA_NO_NODE.

There is also a debugging version of node_to_cpumask_map() for x86 and
arm64, used only when CONFIG_DEBUG_PER_CPU_MAPS is defined; this patch
changes it to handle NUMA_NO_NODE the same way as the normal
node_to_cpumask_map().

[1] https://lkml.org/lkml/2019/9/11/66
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS bits
---
V7: replace -1 with NUMA_NO_NODE for mips ip27 as suggested by Paul.
V6: Drop the cpu_all_mask -> cpu_online_mask change for it seems a
    little controversial, may need deeper investigation, and rebased
    on the latest linux-next.
V5: Drop unsigned "fix" change for x86/arm64, and change comment log
    according to Michal's comment.
V4: Have all these changes in a single patch.
V3: Change to only handle NUMA_NO_NODE, and return cpu_online_mask
    for NUMA_NO_NODE case, and change the commit log to better justify
    the change.
V2: make the node id checking change to other arches too.
---
 arch/arm64/include/asm/numa.h                    | 3 +++
 arch/arm64/mm/numa.c                             | 3 +++
 arch/mips/include/asm/mach-ip27/topology.h       | 2 +-
 arch/mips/include/asm/mach-loongson64/topology.h | 4 +++-
 arch/s390/include/asm/topology.h                 | 3 +++
 arch/x86/include/asm/topology.h                  | 3 +++
 arch/x86/mm/numa.c                               | 3 +++
 7 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
index 626ad01..c8a4b31 100644
--- a/arch/arm64/include/asm/numa.h
+++ b/arch/arm64/include/asm/numa.h
@@ -25,6 +25,9 @@ const struct cpumask *cpumask_of_node(int node);
 /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
 static inline const struct cpumask *cpumask_of_node(int node)
 {
+	if (node == NUMA_NO_NODE)
+		return cpu_online_mask;
+
 	return node_to_cpumask_map[node];
 }
 #endif
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 4decf16..5ae7eea 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -46,6 +46,9 @@ EXPORT_SYMBOL(node_to_cpumask_map);
  */
 const struct cpumask *cpumask_of_node(int node)
 {
+	if (node == NUMA_NO_NODE)
+		return cpu_online_mask;
+
 	if (WARN_ON(node >= nr_node_ids))
 		return cpu_none_mask;
 
diff --git a/arch/mips/include/asm/mach-ip27/topology.h b/arch/mips/include/asm/mach-ip27/topology.h
index 965f079..db293cf 100644
--- a/arch/mips/include/asm/mach-ip27/topology.h
+++ b/arch/mips/include/asm/mach-ip27/topology.h
@@ -15,7 +15,7 @@ struct cpuinfo_ip27 {
 extern struct cpuinfo_ip27 sn_cpu_info[NR_CPUS];
 
 #define cpu_to_node(cpu)	(sn_cpu_info[(cpu)].p_nodeid)
-#define cpumask_of_node(node)	((node) == -1 ?				\
+#define cpumask_of_node(node)	((node) == NUMA_NO_NODE ?		\
 				 cpu_all_mask :				\
 				 &hub_data(node)->h_cpus)
 struct pci_bus;
diff --git a/arch/mips/include/asm/mach-loongson64/topology.h b/arch/mips/include/asm/mach-loongson64/topology.h
index 7ff819a..e78daa6 100644
--- a/arch/mips/include/asm/mach-loongson64/topology.h
+++ b/arch/mips/include/asm/mach-loongson64/topology.h
@@ -5,7 +5,9 @@
 #ifdef CONFIG_NUMA
 
 #define cpu_to_node(cpu)	(cpu_logical_map(cpu) >> 2)
-#define cpumask_of_node(node)	(&__node_data[(node)]->cpumask)
+#define cpumask_of_node(node)	((node) == NUMA_NO_NODE ?		\
+				 cpu_online_mask :			\
+				 &__node_data[(node)]->cpumask)
 
 struct pci_bus;
 extern int pcibus_to_node(struct pci_bus *);
diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
index cca406f..1bd2e73 100644
--- a/arch/s390/include/asm/topology.h
+++ b/arch/s390/include/asm/topology.h
@@ -78,6 +78,9 @@ static inline int cpu_to_node(int cpu)
 #define cpumask_of_node cpumask_of_node
 static inline const struct cpumask *cpumask_of_node(int node)
 {
+	if (node == NUMA_NO_NODE)
+		return cpu_online_mask;
+
 	return &node_to_cpumask_map[node];
 }
 
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 4b14d23..7fa82e1 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -69,6 +69,9 @@ extern const struct cpumask *cpumask_of_node(int node);
 /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
 static inline const struct cpumask *cpumask_of_node(int node)
 {
+	if (node == NUMA_NO_NODE)
+		return cpu_online_mask;
+
 	return node_to_cpumask_map[node];
 }
 #endif
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 4123100e..9859acb 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -861,6 +861,9 @@ void numa_remove_cpu(int cpu)
  */
 const struct cpumask *cpumask_of_node(int node)
 {
+	if (node == NUMA_NO_NODE)
+		return cpu_online_mask;
+
 	if ((unsigned)node >= nr_node_ids) {
 		printk(KERN_WARNING
 			"cpumask_of_node(%d): (unsigned)node >= nr_node_ids(%u)\n",
-- 
2.8.1



* Re: [PATCH v7] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-30  9:34 [PATCH v7] numa: make node_to_cpumask_map() NUMA_NO_NODE aware Yunsheng Lin
@ 2019-10-30 10:14 ` Peter Zijlstra
  2019-10-30 10:22   ` Michal Hocko
  2019-10-31  3:26   ` Yunsheng Lin
  2019-10-30 10:20 ` Michal Hocko
  1 sibling, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2019-10-30 10:14 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: catalin.marinas, will, mingo, bp, rth, ink, mattst88, benh,
	paulus, mpe, heiko.carstens, gor, borntraeger, ysato, dalias,
	davem, ralf, paul.burton, jhogan, jiaxun.yang, chenhc, akpm,
	rppt, anshuman.khandual, tglx, cai, robin.murphy,
	linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, mhocko, gregkh,
	bhelgaas, linux-pci, rjw, lenb, linux-acpi

On Wed, Oct 30, 2019 at 05:34:28PM +0800, Yunsheng Lin wrote:
> When the return value of dev_to_node() is passed to cpumask_of_node()
> without checking whether the device's node id is NUMA_NO_NODE, KASAN
> detects a global-out-of-bounds access.
> 
> From the discussion [1], NUMA_NO_NODE really means no node affinity,
> which also means all cpus should be usable. So cpumask_of_node() should
> always return all online cpus when the caller passes NUMA_NO_NODE as
> the node id, mirroring the semantics the page allocator uses for
> NUMA_NO_NODE.
> 
> But we cannot simply copy the page allocator logic, because the page
> allocator does not enforce node affinity: it only treats the node as a
> preferred one and is free to fall back to any other NUMA node. That is
> not the case here; node_to_cpumask_map would restrict the mask to one
> particular node's cpus, which would give genuinely non-deterministic
> behavior depending on where the code is executed. So in fact we really
> want to return cpu_online_mask for NUMA_NO_NODE.
> 
> There is also a debugging version of node_to_cpumask_map() for x86 and
> arm64, used only when CONFIG_DEBUG_PER_CPU_MAPS is defined; this patch
> changes it to handle NUMA_NO_NODE the same way as the normal
> node_to_cpumask_map().
> 
> [1] https://lkml.org/lkml/2019/9/11/66
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> Suggested-by: Michal Hocko <mhocko@kernel.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Paul Burton <paul.burton@mips.com> # MIPS bits

Still:

Nacked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* Re: [PATCH v7] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-30  9:34 [PATCH v7] numa: make node_to_cpumask_map() NUMA_NO_NODE aware Yunsheng Lin
  2019-10-30 10:14 ` Peter Zijlstra
@ 2019-10-30 10:20 ` Michal Hocko
  1 sibling, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2019-10-30 10:20 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: catalin.marinas, will, mingo, bp, rth, ink, mattst88, benh,
	paulus, mpe, heiko.carstens, gor, borntraeger, ysato, dalias,
	davem, ralf, paul.burton, jhogan, jiaxun.yang, chenhc, akpm,
	rppt, anshuman.khandual, tglx, cai, robin.murphy,
	linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	peterz, len.brown, axboe, dledford, jeffrey.t.kirsher,
	linux-alpha, naveen.n.rao, mwb, linuxppc-dev, linux-s390,
	linux-sh, sparclinux, tbogendoerfer, linux-mips, rafael, gregkh,
	bhelgaas, linux-pci, rjw, lenb, linux-acpi

On Wed 30-10-19 17:34:28, Yunsheng Lin wrote:
> When the return value of dev_to_node() is passed to cpumask_of_node()
> without checking whether the device's node id is NUMA_NO_NODE, KASAN
> detects a global-out-of-bounds access.
> 
> From the discussion [1], NUMA_NO_NODE really means no node affinity,
> which also means all cpus should be usable. So cpumask_of_node() should
> always return all online cpus when the caller passes NUMA_NO_NODE as
> the node id, mirroring the semantics the page allocator uses for
> NUMA_NO_NODE.
> 
> But we cannot simply copy the page allocator logic, because the page
> allocator does not enforce node affinity: it only treats the node as a
> preferred one and is free to fall back to any other NUMA node. That is
> not the case here; node_to_cpumask_map would restrict the mask to one
> particular node's cpus, which would give genuinely non-deterministic
> behavior depending on where the code is executed. So in fact we really
> want to return cpu_online_mask for NUMA_NO_NODE.
> 
> There is also a debugging version of node_to_cpumask_map() for x86 and
> arm64, used only when CONFIG_DEBUG_PER_CPU_MAPS is defined; this patch
> changes it to handle NUMA_NO_NODE the same way as the normal
> node_to_cpumask_map().
> 
> [1] https://lkml.org/lkml/2019/9/11/66

Please do not use lkml.org links. They tend to break quite often.
Use http://lkml.kernel.org/r/$msg_id or lore.kernel.org

> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> Suggested-by: Michal Hocko <mhocko@kernel.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Paul Burton <paul.burton@mips.com> # MIPS bits
> ---
> V7: replace -1 with NUMA_NO_NODE for mips ip27 as suggested by Paul.
> V6: Drop the cpu_all_mask -> cpu_online_mask change for it seems a
>     little controversial, may need deeper investigation, and rebased
>     on the latest linux-next.
> V5: Drop unsigned "fix" change for x86/arm64, and change comment log
>     according to Michal's comment.
> V4: Have all these changes in a single patch.
> V3: Change to only handle NUMA_NO_NODE, and return cpu_online_mask
>     for NUMA_NO_NODE case, and change the commit log to better justify
>     the change.
> V2: make the node id checking change to other arches too.
> ---
>  arch/arm64/include/asm/numa.h                    | 3 +++
>  arch/arm64/mm/numa.c                             | 3 +++
>  arch/mips/include/asm/mach-ip27/topology.h       | 2 +-
>  arch/mips/include/asm/mach-loongson64/topology.h | 4 +++-
>  arch/s390/include/asm/topology.h                 | 3 +++
>  arch/x86/include/asm/topology.h                  | 3 +++
>  arch/x86/mm/numa.c                               | 3 +++
>  7 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> index 626ad01..c8a4b31 100644
> --- a/arch/arm64/include/asm/numa.h
> +++ b/arch/arm64/include/asm/numa.h
> @@ -25,6 +25,9 @@ const struct cpumask *cpumask_of_node(int node);
>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>  static inline const struct cpumask *cpumask_of_node(int node)
>  {
> +	if (node == NUMA_NO_NODE)
> +		return cpu_online_mask;
> +
>  	return node_to_cpumask_map[node];
>  }
>  #endif
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 4decf16..5ae7eea 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -46,6 +46,9 @@ EXPORT_SYMBOL(node_to_cpumask_map);
>   */
>  const struct cpumask *cpumask_of_node(int node)
>  {
> +	if (node == NUMA_NO_NODE)
> +		return cpu_online_mask;
> +
>  	if (WARN_ON(node >= nr_node_ids))
>  		return cpu_none_mask;
>  
> diff --git a/arch/mips/include/asm/mach-ip27/topology.h b/arch/mips/include/asm/mach-ip27/topology.h
> index 965f079..db293cf 100644
> --- a/arch/mips/include/asm/mach-ip27/topology.h
> +++ b/arch/mips/include/asm/mach-ip27/topology.h
> @@ -15,7 +15,7 @@ struct cpuinfo_ip27 {
>  extern struct cpuinfo_ip27 sn_cpu_info[NR_CPUS];
>  
>  #define cpu_to_node(cpu)	(sn_cpu_info[(cpu)].p_nodeid)
> -#define cpumask_of_node(node)	((node) == -1 ?				\
> +#define cpumask_of_node(node)	((node) == NUMA_NO_NODE ?		\
>  				 cpu_all_mask :				\
>  				 &hub_data(node)->h_cpus)
>  struct pci_bus;
> diff --git a/arch/mips/include/asm/mach-loongson64/topology.h b/arch/mips/include/asm/mach-loongson64/topology.h
> index 7ff819a..e78daa6 100644
> --- a/arch/mips/include/asm/mach-loongson64/topology.h
> +++ b/arch/mips/include/asm/mach-loongson64/topology.h
> @@ -5,7 +5,9 @@
>  #ifdef CONFIG_NUMA
>  
>  #define cpu_to_node(cpu)	(cpu_logical_map(cpu) >> 2)
> -#define cpumask_of_node(node)	(&__node_data[(node)]->cpumask)
> +#define cpumask_of_node(node)	((node) == NUMA_NO_NODE ?		\
> +				 cpu_online_mask :			\
> +				 &__node_data[(node)]->cpumask)
>  
>  struct pci_bus;
>  extern int pcibus_to_node(struct pci_bus *);
> diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
> index cca406f..1bd2e73 100644
> --- a/arch/s390/include/asm/topology.h
> +++ b/arch/s390/include/asm/topology.h
> @@ -78,6 +78,9 @@ static inline int cpu_to_node(int cpu)
>  #define cpumask_of_node cpumask_of_node
>  static inline const struct cpumask *cpumask_of_node(int node)
>  {
> +	if (node == NUMA_NO_NODE)
> +		return cpu_online_mask;
> +
>  	return &node_to_cpumask_map[node];
>  }
>  
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index 4b14d23..7fa82e1 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -69,6 +69,9 @@ extern const struct cpumask *cpumask_of_node(int node);
>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>  static inline const struct cpumask *cpumask_of_node(int node)
>  {
> +	if (node == NUMA_NO_NODE)
> +		return cpu_online_mask;
> +
>  	return node_to_cpumask_map[node];
>  }
>  #endif
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 4123100e..9859acb 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -861,6 +861,9 @@ void numa_remove_cpu(int cpu)
>   */
>  const struct cpumask *cpumask_of_node(int node)
>  {
> +	if (node == NUMA_NO_NODE)
> +		return cpu_online_mask;
> +
>  	if ((unsigned)node >= nr_node_ids) {
>  		printk(KERN_WARNING
>  			"cpumask_of_node(%d): (unsigned)node >= nr_node_ids(%u)\n",
> -- 
> 2.8.1

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v7] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-30 10:14 ` Peter Zijlstra
@ 2019-10-30 10:22   ` Michal Hocko
  2019-10-30 10:28     ` Peter Zijlstra
  2019-10-31  3:26   ` Yunsheng Lin
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2019-10-30 10:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yunsheng Lin, catalin.marinas, will, mingo, bp, rth, ink,
	mattst88, benh, paulus, mpe, heiko.carstens, gor, borntraeger,
	ysato, dalias, davem, ralf, paul.burton, jhogan, jiaxun.yang,
	chenhc, akpm, rppt, anshuman.khandual, tglx, cai, robin.murphy,
	linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, gregkh, bhelgaas,
	linux-pci, rjw, lenb, linux-acpi

On Wed 30-10-19 11:14:49, Peter Zijlstra wrote:
> On Wed, Oct 30, 2019 at 05:34:28PM +0800, Yunsheng Lin wrote:
> > When the return value of dev_to_node() is passed to cpumask_of_node()
> > without checking whether the device's node id is NUMA_NO_NODE, KASAN
> > detects a global-out-of-bounds access.
> > 
> > From the discussion [1], NUMA_NO_NODE really means no node affinity,
> > which also means all cpus should be usable. So cpumask_of_node() should
> > always return all online cpus when the caller passes NUMA_NO_NODE as
> > the node id, mirroring the semantics the page allocator uses for
> > NUMA_NO_NODE.
> > 
> > But we cannot simply copy the page allocator logic, because the page
> > allocator does not enforce node affinity: it only treats the node as a
> > preferred one and is free to fall back to any other NUMA node. That is
> > not the case here; node_to_cpumask_map would restrict the mask to one
> > particular node's cpus, which would give genuinely non-deterministic
> > behavior depending on where the code is executed. So in fact we really
> > want to return cpu_online_mask for NUMA_NO_NODE.
> > 
> > There is also a debugging version of node_to_cpumask_map() for x86 and
> > arm64, used only when CONFIG_DEBUG_PER_CPU_MAPS is defined; this patch
> > changes it to handle NUMA_NO_NODE the same way as the normal
> > node_to_cpumask_map().
> > 
> > [1] https://lkml.org/lkml/2019/9/11/66
> > Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> > Suggested-by: Michal Hocko <mhocko@kernel.org>
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > Acked-by: Paul Burton <paul.burton@mips.com> # MIPS bits
> 
> Still:
> 
> Nacked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Do you have any other proposal that doesn't make any wild guesses about
which node to use instead of the undefined one?

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v7] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-30 10:22   ` Michal Hocko
@ 2019-10-30 10:28     ` Peter Zijlstra
  2019-10-30 11:33       ` Michal Hocko
  2019-10-30 12:28       ` Qian Cai
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2019-10-30 10:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Yunsheng Lin, catalin.marinas, will, mingo, bp, rth, ink,
	mattst88, benh, paulus, mpe, heiko.carstens, gor, borntraeger,
	ysato, dalias, davem, ralf, paul.burton, jhogan, jiaxun.yang,
	chenhc, akpm, rppt, anshuman.khandual, tglx, cai, robin.murphy,
	linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, gregkh, bhelgaas,
	linux-pci, rjw, lenb, linux-acpi

On Wed, Oct 30, 2019 at 11:22:29AM +0100, Michal Hocko wrote:
> On Wed 30-10-19 11:14:49, Peter Zijlstra wrote:
> > On Wed, Oct 30, 2019 at 05:34:28PM +0800, Yunsheng Lin wrote:
> > > When the return value of dev_to_node() is passed to cpumask_of_node()
> > > without checking whether the device's node id is NUMA_NO_NODE, KASAN
> > > detects a global-out-of-bounds access.
> > > 
> > > From the discussion [1], NUMA_NO_NODE really means no node affinity,
> > > which also means all cpus should be usable. So cpumask_of_node() should
> > > always return all online cpus when the caller passes NUMA_NO_NODE as
> > > the node id, mirroring the semantics the page allocator uses for
> > > NUMA_NO_NODE.
> > > 
> > > But we cannot simply copy the page allocator logic, because the page
> > > allocator does not enforce node affinity: it only treats the node as a
> > > preferred one and is free to fall back to any other NUMA node. That is
> > > not the case here; node_to_cpumask_map would restrict the mask to one
> > > particular node's cpus, which would give genuinely non-deterministic
> > > behavior depending on where the code is executed. So in fact we really
> > > want to return cpu_online_mask for NUMA_NO_NODE.
> > > 
> > > There is also a debugging version of node_to_cpumask_map() for x86 and
> > > arm64, used only when CONFIG_DEBUG_PER_CPU_MAPS is defined; this patch
> > > changes it to handle NUMA_NO_NODE the same way as the normal
> > > node_to_cpumask_map().
> > > 
> > > [1] https://lkml.org/lkml/2019/9/11/66
> > > Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> > > Suggested-by: Michal Hocko <mhocko@kernel.org>
> > > Acked-by: Michal Hocko <mhocko@suse.com>
> > > Acked-by: Paul Burton <paul.burton@mips.com> # MIPS bits
> > 
> > Still:
> > 
> > Nacked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Do you have any other proposal that doesn't make any wild guesses about
> which node to use instead of the undefined one?

It only makes 'wild' guesses when the BIOS is shit and it complains
about that.

Or do you like your BIOS broken?


* Re: [PATCH v7] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-30 10:28     ` Peter Zijlstra
@ 2019-10-30 11:33       ` Michal Hocko
  2019-10-30 12:28       ` Qian Cai
  1 sibling, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2019-10-30 11:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yunsheng Lin, catalin.marinas, will, mingo, bp, rth, ink,
	mattst88, benh, paulus, mpe, heiko.carstens, gor, borntraeger,
	ysato, dalias, davem, ralf, paul.burton, jhogan, jiaxun.yang,
	chenhc, akpm, rppt, anshuman.khandual, tglx, cai, robin.murphy,
	linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, gregkh, bhelgaas,
	linux-pci, rjw, lenb, linux-acpi

On Wed 30-10-19 11:28:00, Peter Zijlstra wrote:
> On Wed, Oct 30, 2019 at 11:22:29AM +0100, Michal Hocko wrote:
> > On Wed 30-10-19 11:14:49, Peter Zijlstra wrote:
> > > On Wed, Oct 30, 2019 at 05:34:28PM +0800, Yunsheng Lin wrote:
> > > > When the return value of dev_to_node() is passed to cpumask_of_node()
> > > > without checking whether the device's node id is NUMA_NO_NODE, KASAN
> > > > detects a global-out-of-bounds access.
> > > > 
> > > > From the discussion [1], NUMA_NO_NODE really means no node affinity,
> > > > which also means all cpus should be usable. So cpumask_of_node() should
> > > > always return all online cpus when the caller passes NUMA_NO_NODE as
> > > > the node id, mirroring the semantics the page allocator uses for
> > > > NUMA_NO_NODE.
> > > > 
> > > > But we cannot simply copy the page allocator logic, because the page
> > > > allocator does not enforce node affinity: it only treats the node as a
> > > > preferred one and is free to fall back to any other NUMA node. That is
> > > > not the case here; node_to_cpumask_map would restrict the mask to one
> > > > particular node's cpus, which would give genuinely non-deterministic
> > > > behavior depending on where the code is executed. So in fact we really
> > > > want to return cpu_online_mask for NUMA_NO_NODE.
> > > > 
> > > > There is also a debugging version of node_to_cpumask_map() for x86 and
> > > > arm64, used only when CONFIG_DEBUG_PER_CPU_MAPS is defined; this patch
> > > > changes it to handle NUMA_NO_NODE the same way as the normal
> > > > node_to_cpumask_map().
> > > > 
> > > > [1] https://lkml.org/lkml/2019/9/11/66
> > > > Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> > > > Suggested-by: Michal Hocko <mhocko@kernel.org>
> > > > Acked-by: Michal Hocko <mhocko@suse.com>
> > > > Acked-by: Paul Burton <paul.burton@mips.com> # MIPS bits
> > > 
> > > Still:
> > > 
> > > Nacked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > 
> > Do you have any other proposal that doesn't make any wild guesses about
> > which node to use instead of the undefined one?
> 
> It only makes 'wild' guesses when the BIOS is shit and it complains
> about that.

I really do not see how this is any better than simply using the online
cpu mask in the same "broken" situation. We are effectively talking
about a suboptimal path for suboptimal setups. I haven't heard any
actual technical argument why cpu_online_mask is any worse than adding
some sort of failover that guesses which node to use as a replacement.
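
To spell out the two options being compared (illustrative only, not
proposed code; numa_node_id() here is simply whatever node the calling
CPU happens to sit on):

static const struct cpumask *no_node_cpumask(int node, bool guess_a_node)
{
	if (node != NUMA_NO_NODE)
		return cpumask_of_node(node);

	if (!guess_a_node)
		return cpu_online_mask;	/* what the patch does */

	/* "failover" guess: the result depends on where this runs */
	return cpumask_of_node(numa_node_id());
}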

I completely do get your point about complaining loudly about broken
BIOS/fw. It seems we just disagree about where we should work around
those issues, because as of now we simply generate semi-random behavior
because of an uninitialized memory access.

> Or do you like your BIOS broken?

I do not see anything like that in my response or in my previous
communication. Moreover, a patch to warn about this should be on its
way to being merged, AFAIK.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v7] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-30 10:28     ` Peter Zijlstra
  2019-10-30 11:33       ` Michal Hocko
@ 2019-10-30 12:28       ` Qian Cai
  1 sibling, 0 replies; 8+ messages in thread
From: Qian Cai @ 2019-10-30 12:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michal Hocko, Yunsheng Lin, catalin.marinas, will, mingo, bp,
	rth, ink, mattst88, benh, paulus, mpe, heiko.carstens, gor,
	borntraeger, ysato, dalias, davem, ralf, paul.burton, jhogan,
	jiaxun.yang, chenhc, akpm, rppt, anshuman.khandual, tglx,
	robin.murphy, linux-arm-kernel, linux-kernel, hpa, x86,
	dave.hansen, luto, len.brown, axboe, dledford, jeffrey.t.kirsher,
	linux-alpha, naveen.n.rao, mwb, linuxppc-dev, linux-s390,
	linux-sh, sparclinux, tbogendoerfer, linux-mips, rafael, gregkh,
	bhelgaas, linux-pci, rjw, lenb, linux-acpi



> On Oct 30, 2019, at 6:28 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> It only makes 'wild' guesses when the BIOS is shit and it complains
> about that.
> 
> Or do you like your BIOS broken?

Agree. It is garbage in, garbage out. No need to complicate the existing code further.


* Re: [PATCH v7] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-30 10:14 ` Peter Zijlstra
  2019-10-30 10:22   ` Michal Hocko
@ 2019-10-31  3:26   ` Yunsheng Lin
  1 sibling, 0 replies; 8+ messages in thread
From: Yunsheng Lin @ 2019-10-31  3:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: catalin.marinas, will, mingo, bp, rth, ink, mattst88, benh,
	paulus, mpe, heiko.carstens, gor, borntraeger, ysato, dalias,
	davem, ralf, paul.burton, jhogan, jiaxun.yang, chenhc, akpm,
	rppt, anshuman.khandual, tglx, cai, robin.murphy,
	linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, mhocko, gregkh,
	bhelgaas, linux-pci, rjw, lenb, linux-acpi

On 2019/10/30 18:14, Peter Zijlstra wrote:
> On Wed, Oct 30, 2019 at 05:34:28PM +0800, Yunsheng Lin wrote:
>> When the return value of dev_to_node() is passed to cpumask_of_node()
>> without checking whether the device's node id is NUMA_NO_NODE, KASAN
>> detects a global-out-of-bounds access.
>>
>> From the discussion [1], NUMA_NO_NODE really means no node affinity,
>> which also means all cpus should be usable. So cpumask_of_node() should
>> always return all online cpus when the caller passes NUMA_NO_NODE as
>> the node id, mirroring the semantics the page allocator uses for
>> NUMA_NO_NODE.
>>
>> But we cannot simply copy the page allocator logic, because the page
>> allocator does not enforce node affinity: it only treats the node as a
>> preferred one and is free to fall back to any other NUMA node. That is
>> not the case here; node_to_cpumask_map would restrict the mask to one
>> particular node's cpus, which would give genuinely non-deterministic
>> behavior depending on where the code is executed. So in fact we really
>> want to return cpu_online_mask for NUMA_NO_NODE.
>>
>> There is also a debugging version of node_to_cpumask_map() for x86 and
>> arm64, used only when CONFIG_DEBUG_PER_CPU_MAPS is defined; this patch
>> changes it to handle NUMA_NO_NODE the same way as the normal
>> node_to_cpumask_map().
>>
>> [1] https://lkml.org/lkml/2019/9/11/66
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> Suggested-by: Michal Hocko <mhocko@kernel.org>
>> Acked-by: Michal Hocko <mhocko@suse.com>
>> Acked-by: Paul Burton <paul.burton@mips.com> # MIPS bits
> 
> Still:
> 
> Nacked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

It seems I have still misunderstood what you meant by "We must not silently
accept NO_NODE there" in [1].

I am not sure whether there is still disagreement over whether the NO_NODE
state for dev->numa_node should exist at all.

From the previous discussion [2], you seem to propose doing a "wild guess" or
"fixup" for all devices (including virtual and physical ones) with NO_NODE,
which would mean NO_NODE is not needed anymore and should be removed once the
"wild guess" or "fixup" is done. So maybe the reason for your nack here is
that there should be no other NO_NODE handling before that "wild guess" or
"fixup" process is finished, and so making node_to_cpumask_map() NUMA_NO_NODE
aware is unnecessary.
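
As a data point, one fixup of that kind already exists in the driver core;
roughly (paraphrased from device_add() in drivers/base/core.c), a child
device with no node set inherits its parent's node:

	/* use parent numa_node */
	if (parent && dev_to_node(dev) == NUMA_NO_NODE)
		set_dev_node(dev, dev_to_node(parent));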

Or is your reason for the nack still specific to PCIe devices without a numa
node, i.e. the "wild guess" needs to be done for that case before making
node_to_cpumask_map() NUMA_NO_NODE aware?

Please help clarify the reason for the nack. Or is there still some other
reason for the nack that I missed from the previous discussion?

Thanks.

[1] https://lore.kernel.org/lkml/20191011111539.GX2311@hirez.programming.kicks-ass.net/
[2] https://lore.kernel.org/lkml/20191014094912.GY2311@hirez.programming.kicks-ass.net/
> 
> .
> 


