linux-kernel.vger.kernel.org archive mirror
* [PATCH v3] x86/numa_emulation: Fix uniform-split numa emulation
@ 2018-10-25 20:26 Dan Williams
  2018-10-25 20:52 ` Dave Hansen
  2018-10-30 10:39 ` [tip:x86/urgent] " tip-bot for Dave Jiang
From: Dan Williams @ 2018-10-25 20:26 UTC (permalink / raw)
  To: mingo
  Cc: x86, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
	Thomas Gleixner, Peter Zijlstra, Dave Hansen, stable, Dave Jiang,
	Alexander Duyck, linux-kernel

From: Dave Jiang <dave.jiang@intel.com>

The numa_emulation() routine in the 'uniform' case walks through all the
physical 'memblk' instances and divides them into N emulated nodes with
split_nodes_size_interleave_uniform(). As each physical node is consumed,
it is removed from the physical memblk array by the
numa_remove_memblk_from() helper, which shifts the remaining entries
down. Since split_nodes_size_interleave_uniform() advances the array as
each 'memblk' is consumed, the base of the array is expected to always
be specified as the argument.

Otherwise, on configurations with more than two sockets, the
uniform-split capability can generate an invalid NUMA configuration,
leading to boot failures with signatures like the following:

    rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
    Sending NMI from CPU 0 to CPUs 2:
    NMI backtrace for cpu 2
    CPU: 2 PID: 1332 Comm: pgdatinit0 Not tainted 4.19.0-rc8-next-20181019-baseline #59
    RIP: 0010:__init_single_page.isra.74+0x81/0x90
    [..]
    Call Trace:
     deferred_init_pages+0xaa/0xe3
     deferred_init_memmap+0x18f/0x318
     kthread+0xf8/0x130
     ? deferred_free_pages.isra.105+0xc9/0xc9
     ? kthread_stop+0x110/0x110
     ret_from_fork+0x35/0x40

Cc: x86@kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Fixes: 1f6a2c6d9f121 ("x86/numa_emulation: Introduce uniform split capability")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes since v2: https://lore.kernel.org/patchwork/patch/988541/

* Update the changelog with details from testing by Alex

 arch/x86/mm/numa_emulation.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index b54d52a2d00a..d71d72cf6c66 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -400,9 +400,17 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 		n = simple_strtoul(emu_cmdline, &emu_cmdline, 0);
 		ret = -1;
 		for_each_node_mask(i, physnode_mask) {
+			/*
+			 * The reason we pass in blk[0] is due to
+			 * numa_remove_memblk_from() called by
+			 * emu_setup_memblk() will delete entry 0
+			 * and then move everything else up in the pi.blk
+			 * array. Therefore we should always be looking
+			 * at blk[0].
+			 */
 			ret = split_nodes_size_interleave_uniform(&ei, &pi,
-					pi.blk[i].start, pi.blk[i].end, 0,
-					n, &pi.blk[i], nid);
+					pi.blk[0].start, pi.blk[0].end, 0,
+					n, &pi.blk[0], nid);
 			if (ret < 0)
 				break;
 			if (ret < n) {


* Re: [PATCH v3] x86/numa_emulation: Fix uniform-split numa emulation
  2018-10-25 20:26 [PATCH v3] x86/numa_emulation: Fix uniform-split numa emulation Dan Williams
@ 2018-10-25 20:52 ` Dave Hansen
  2018-10-25 20:55   ` Dan Williams
  2018-10-30 10:39 ` [tip:x86/urgent] " tip-bot for Dave Jiang
From: Dave Hansen @ 2018-10-25 20:52 UTC (permalink / raw)
  To: Dan Williams, mingo
  Cc: x86, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
	Thomas Gleixner, Peter Zijlstra, Dave Hansen, stable, Dave Jiang,
	Alexander Duyck, linux-kernel

On 10/25/18 1:26 PM, Dan Williams wrote:
> --- a/arch/x86/mm/numa_emulation.c
> +++ b/arch/x86/mm/numa_emulation.c
> @@ -400,9 +400,17 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>  		n = simple_strtoul(emu_cmdline, &emu_cmdline, 0);
>  		ret = -1;
>  		for_each_node_mask(i, physnode_mask) {
> +			/*
> +			 * The reason we pass in blk[0] is due to
> +			 * numa_remove_memblk_from() called by
> +			 * emu_setup_memblk() will delete entry 0
> +			 * and then move everything else up in the pi.blk
> +			 * array. Therefore we should always be looking
> +			 * at blk[0].
> +			 */
>  			ret = split_nodes_size_interleave_uniform(&ei, &pi,
> -					pi.blk[i].start, pi.blk[i].end, 0,
> -					n, &pi.blk[i], nid);
> +					pi.blk[0].start, pi.blk[0].end, 0,
> +					n, &pi.blk[0], nid);

So, has this *ever* worked on a multi-socket configuration?  Or has it
just never been run on a multi-socket configuration?

Either way, nice changelog, and nice comments.  I'd have some minor nits
if you have to respin it, but otherwise:

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

* Re: [PATCH v3] x86/numa_emulation: Fix uniform-split numa emulation
  2018-10-25 20:52 ` Dave Hansen
@ 2018-10-25 20:55   ` Dan Williams
From: Dan Williams @ 2018-10-25 20:55 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ingo Molnar, X86 ML, Borislav Petkov, H. Peter Anvin,
	Andy Lutomirski, Thomas Gleixner, Peter Zijlstra, Dave Hansen,
	stable, Dave Jiang, alexander.h.duyck, Linux Kernel Mailing List

On Thu, Oct 25, 2018 at 1:52 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 10/25/18 1:26 PM, Dan Williams wrote:
> > --- a/arch/x86/mm/numa_emulation.c
> > +++ b/arch/x86/mm/numa_emulation.c
> > @@ -400,9 +400,17 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
> >               n = simple_strtoul(emu_cmdline, &emu_cmdline, 0);
> >               ret = -1;
> >               for_each_node_mask(i, physnode_mask) {
> > +                     /*
> > +                      * The reason we pass in blk[0] is due to
> > +                      * numa_remove_memblk_from() called by
> > +                      * emu_setup_memblk() will delete entry 0
> > +                      * and then move everything else up in the pi.blk
> > +                      * array. Therefore we should always be looking
> > +                      * at blk[0].
> > +                      */
> >                       ret = split_nodes_size_interleave_uniform(&ei, &pi,
> > -                                     pi.blk[i].start, pi.blk[i].end, 0,
> > -                                     n, &pi.blk[i], nid);
> > +                                     pi.blk[0].start, pi.blk[0].end, 0,
> > +                                     n, &pi.blk[0], nid);
>
> So, has this *ever* worked on a multi-socket configuration?  Or has it
> just never been run on a multi-socket configuration?

It happened to work on 2-socket. We only saw issues when moving to
4-socket and above, and sometimes the only symptom was a grey failure
(an odd-sized node) rather than an outright crash or boot failure.

> Either way, nice changelog, and nice comments.  I'd have some minor nits
> if you have to respin it, but otherwise:
>
> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

Thanks.

* [tip:x86/urgent] x86/numa_emulation: Fix uniform-split numa emulation
  2018-10-25 20:26 [PATCH v3] x86/numa_emulation: Fix uniform-split numa emulation Dan Williams
  2018-10-25 20:52 ` Dave Hansen
@ 2018-10-30 10:39 ` tip-bot for Dave Jiang
From: tip-bot for Dave Jiang @ 2018-10-30 10:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, dan.j.williams, peterz, tglx, luto, bp, mingo,
	linux-kernel, hpa, dave.jiang, alexander.h.duyck

Commit-ID:  c6ee7a548e2c291398b4f32c1f741c66b9f98e1c
Gitweb:     https://git.kernel.org/tip/c6ee7a548e2c291398b4f32c1f741c66b9f98e1c
Author:     Dave Jiang <dave.jiang@intel.com>
AuthorDate: Thu, 25 Oct 2018 13:26:45 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 30 Oct 2018 11:36:43 +0100

x86/numa_emulation: Fix uniform-split numa emulation

The numa_emulation() routine in the 'uniform' case walks through all the
physical 'memblk' instances and divides them into N emulated nodes with
split_nodes_size_interleave_uniform(). As each physical node is consumed, it
is removed from the physical memblk array by the numa_remove_memblk_from()
helper, which shifts the remaining entries down.

Since split_nodes_size_interleave_uniform() advances the array as each
'memblk' is consumed, the base of the array is expected to always be
specified as the argument.

Otherwise, on configurations with more than two sockets, the uniform-split
capability can generate an invalid NUMA configuration, leading to boot
failures with signatures like the following:

    rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
    Sending NMI from CPU 0 to CPUs 2:
    NMI backtrace for cpu 2
    CPU: 2 PID: 1332 Comm: pgdatinit0 Not tainted 4.19.0-rc8-next-20181019-baseline #59
    RIP: 0010:__init_single_page.isra.74+0x81/0x90
    [..]
    Call Trace:
     deferred_init_pages+0xaa/0xe3
     deferred_init_memmap+0x18f/0x318
     kthread+0xf8/0x130
     ? deferred_free_pages.isra.105+0xc9/0xc9
     ? kthread_stop+0x110/0x110
     ret_from_fork+0x35/0x40

Fixes: 1f6a2c6d9f121 ("x86/numa_emulation: Introduce uniform split capability")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/154049911459.2685845.9210186007479774286.stgit@dwillia2-desk3.amr.corp.intel.com

---
 arch/x86/mm/numa_emulation.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index b54d52a2d00a..d71d72cf6c66 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -400,9 +400,17 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 		n = simple_strtoul(emu_cmdline, &emu_cmdline, 0);
 		ret = -1;
 		for_each_node_mask(i, physnode_mask) {
+			/*
+			 * The reason we pass in blk[0] is due to
+			 * numa_remove_memblk_from() called by
+			 * emu_setup_memblk() will delete entry 0
+			 * and then move everything else up in the pi.blk
+			 * array. Therefore we should always be looking
+			 * at blk[0].
+			 */
 			ret = split_nodes_size_interleave_uniform(&ei, &pi,
-					pi.blk[i].start, pi.blk[i].end, 0,
-					n, &pi.blk[i], nid);
+					pi.blk[0].start, pi.blk[0].end, 0,
+					n, &pi.blk[0], nid);
 			if (ret < 0)
 				break;
 			if (ret < n) {
