* [PATCH] x86/NUMA: correct off-by-1 in node map size calculation
@ 2022-09-27 14:14 Jan Beulich
  2022-09-29 14:57 ` Roger Pau Monné
  2022-09-30  2:03 ` Wei Chen
  0 siblings, 2 replies; 4+ messages in thread
From: Jan Beulich @ 2022-09-27 14:14 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

extract_lsb_from_nodes() accumulates "memtop" from all PDXes one past
the covered ranges. Hence the maximum address which can validly be used
to index the node map is one below this value, and we may currently set
up a node map with an unused (and never initialized) trailing entry. In
boundary cases this may also mean we dynamically allocate a page when
the static (64-entry) map would suffice.
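
As a worked boundary case (numbers invented purely for illustration): with a
shift of 9 and memtop equal to 64 << 9, i.e. exactly 64 entries' worth of
PDXes,

    (memtop >> 9) + 1       = 65   (trailing entry never initialized)
    ((memtop - 1) >> 9) + 1 = 64   (fits the static map)

so before this change we'd allocate dynamically where the static 64-entry
map suffices.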

While there also correct the comment ahead of the function, for it to
match the actual code: Linux commit 54413927f022 ("x86-64:
x86_64-make-the-numa-hash-function-nodemap-allocation fix fix") removed
the ORing in of the end address before we actually cloned their code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Really the shift value may end up needlessly small when there's
discontiguous memory. Within a gap, any address can be taken for the
node boundary, and hence neither the end of the lower range nor the
start of the higher range necessarily is the best address to use. For
example with these two node ranges (numbers are frame addresses)

[10000,17fff]
[28000,2ffff]

we'd calculate the shift as 15 when 16 or even 17 (because the start of
the 1st range can also be ignored) would do. I haven't tried to properly
prove it yet, but it looks to me as if the top bit of the XOR of the lower
range's (inclusive) end and the higher range's start is what wants
accumulating (of course requiring the entries to be sorted, or to be
processed in address order). This would then "naturally" exclude lowest
range start and highest range end.
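
A self-contained sketch of that idea (my own illustration, not the in-tree
code; struct range, sketch_shift() and the exclusive-end convention are all
made up for the example):

    #include <stdio.h>

    struct range { unsigned long start, end; };    /* end is exclusive */

    /*
     * Per the (unproven) suggestion above: with the ranges sorted by
     * address, accumulate the XOR of each range's inclusive end with the
     * next range's start; the position of the highest set bit in the
     * result is then the candidate shift, naturally ignoring the lowest
     * range start and the highest range end.
     */
    static unsigned int sketch_shift(const struct range *r, unsigned int nr)
    {
        unsigned long acc = 0;
        unsigned int i, shift = 0;

        for ( i = 0; i + 1 < nr; ++i )
            acc |= (r[i].end - 1) ^ r[i + 1].start;

        while ( (acc >> shift) > 1 )
            ++shift;

        return shift;
    }

    int main(void)
    {
        /* The example above: frames [10000,17fff] and [28000,2ffff]. */
        struct range r[] = { { 0x10000, 0x18000 }, { 0x28000, 0x30000 } };

        printf("shift = %u\n", sketch_shift(r, 2));    /* prints 17 here */
        return 0;
    }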

--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -110,7 +110,7 @@ static int __init allocate_cachealigned_
 }
 
 /*
- * The LSB of all start and end addresses in the node map is the value of the
+ * The LSB of all start addresses in the node map is the value of the
  * maximum possible shift.
  */
 static int __init extract_lsb_from_nodes(const struct node *nodes,
@@ -135,7 +135,7 @@ static int __init extract_lsb_from_nodes
         i = BITS_PER_LONG - 1;
     else
         i = find_first_bit(&bitfield, sizeof(unsigned long)*8);
-    memnodemapsize = (memtop >> i) + 1;
+    memnodemapsize = ((memtop - 1) >> i) + 1;
     return i;
 }
 



* Re: [PATCH] x86/NUMA: correct off-by-1 in node map size calculation
  2022-09-27 14:14 [PATCH] x86/NUMA: correct off-by-1 in node map size calculation Jan Beulich
@ 2022-09-29 14:57 ` Roger Pau Monné
  2022-09-30  2:03 ` Wei Chen
  1 sibling, 0 replies; 4+ messages in thread
From: Roger Pau Monné @ 2022-09-29 14:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Tue, Sep 27, 2022 at 04:14:21PM +0200, Jan Beulich wrote:
> extract_lsb_from_nodes() accumulates "memtop" from all PDXes one past
> the covered ranges. Hence the maximum address which can validly be used
> to index the node map is one below this value, and we may currently set
> up a node map with an unused (and never initialized) trailing entry. In
> boundary cases this may also mean we dynamically allocate a page when
> the static (64-entry) map would suffice.
> 
> While there also correct the comment ahead of the function, for it to
> match the actual code: Linux commit 54413927f022 ("x86-64:
> x86_64-make-the-numa-hash-function-nodemap-allocation fix fix") removed
> the ORing in of the end address before we actually cloned their code.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> Really the shift value may end up needlessly small when there's
> discontiguous memory. Within a gap, any address can be taken for the
> node boundary, and hence neither the end of the lower range nor the
> start of the higher range necessarily is the best address to use. For
> example with these two node ranges (numbers are frame addresses)
> 
> [10000,17fff]
> [28000,2ffff]
> 
> we'd calculate the shift as 15 when 16 or even 17 (because the start of
> the 1st range can also be ignored) would do. I haven't tried to properly
> prove it yet, but it looks to me as if the top bit of the XOR of the lower
> range's (inclusive) end and the higher range's start is what wants
> accumulating (of course requiring the entries to be sorted, or to be
> processed in address order). This would then "naturally" exclude lowest
> range start and highest range end.

I'm not familiar with the logic in the NUMA code, but this seems like a
possible optimization.  It might be good to mention in which way a
bigger shift is beneficial.

Thanks, Roger.



* RE: [PATCH] x86/NUMA: correct off-by-1 in node map size calculation
  2022-09-27 14:14 [PATCH] x86/NUMA: correct off-by-1 in node map size calculation Jan Beulich
  2022-09-29 14:57 ` Roger Pau Monné
@ 2022-09-30  2:03 ` Wei Chen
  1 sibling, 0 replies; 4+ messages in thread
From: Wei Chen @ 2022-09-30  2:03 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Hi Jan,

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan
> Beulich
> Sent: September 27, 2022 22:14
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger
> Pau Monné <roger.pau@citrix.com>
> Subject: [PATCH] x86/NUMA: correct off-by-1 in node map size calculation
> 
> extract_lsb_from_nodes() accumulates "memtop" from all PDXes one past
> the covered ranges. Hence the maximum address which can validly be used
> to index the node map is one below this value, and we may currently set
> up a node map with an unused (and never initialized) trailing entry. In
> boundary cases this may also mean we dynamically allocate a page when
> the static (64-entry) map would suffice.
> 
> While there also correct the comment ahead of the function, for it to
> match the actual code: Linux commit 54413927f022 ("x86-64:
> x86_64-make-the-numa-hash-function-nodemap-allocation fix fix") removed
> the ORing in of the end address before we actually cloned their code.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Really the shift value may end up needlessly small when there's
> discontiguous memory. Within a gap, any address can be taken for the
> node boundary, and hence neither the end of the lower range nor the
> start of the higher range necessarily is the best address to use. For
> example with these two node ranges (numbers are frame addresses)
> 
> [10000,17fff]
> [28000,2ffff]
> 
> we'd calculate the shift as 15 when 16 or even 17 (because the start of
> the 1st range can also be ignored) would do. I haven't tried to properly
> prove it yet, but it looks to me as if the top bit of the XOR of the lower
> range's (inclusive) end and the higher range's start is what wants
> accumulating (of course requiring the entries to be sorted, or to be
> processed in address order). This would then "naturally" exclude lowest
> range start and highest range end.
> 
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -110,7 +110,7 @@ static int __init allocate_cachealigned_
>  }
> 
>  /*
> - * The LSB of all start and end addresses in the node map is the value of
> the
> + * The LSB of all start addresses in the node map is the value of the
>   * maximum possible shift.
>   */
>  static int __init extract_lsb_from_nodes(const struct node *nodes,
> @@ -135,7 +135,7 @@ static int __init extract_lsb_from_nodes
>          i = BITS_PER_LONG - 1;
>      else
>          i = find_first_bit(&bitfield, sizeof(unsigned long)*8);
> -    memnodemapsize = (memtop >> i) + 1;
> +    memnodemapsize = ((memtop - 1) >> i) + 1;
>      return i;
>  }
> 

Thanks for this fix.

Reviewed-by: Wei Chen <Wei.Chen@arm.com>



