linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86: fix the initialization of physnode_map
@ 2014-01-31 10:05 Petr Tesarik
  2014-01-31 21:02 ` David Rientjes
  2014-01-31 21:14 ` Dave Hansen
  0 siblings, 2 replies; 5+ messages in thread
From: Petr Tesarik @ 2014-01-31 10:05 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Jiang Liu, Andrew Morton, Dave Hansen, linux-kernel

With DISCONTIGMEM, the mapping between a pfn and its owning node is
initialized using data provided by the BIOS or from the command line.
However, the initialization may fail if the extents are not aligned
to a section boundary (64M).

The symptom of this bug is an early boot failure in pfn_to_page(),
as it tries to access NODE_DATA(__nid) using an index from an uninitialized
element of the physnode_map[] array.
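
A toy user-space model (not kernel code) makes the failure mode concrete. It assumes that, as in numa_32.c at the time, physnode_map[] is an s8 array pre-filled with -1 and that the 32-bit pfn_to_nid() simply indexes it with pfn / PAGES_PER_SECTION. The 64 MB section size with 4 KB pages, MAX_SECTIONS and the helper physnode_map_reset() are illustrative assumptions. A pfn in a section that was never registered maps to node -1, which then ends up as the node index handed to NODE_DATA().

```c
#include <stddef.h>

/*
 * Toy user-space model, not kernel code. Sizes are illustrative
 * assumptions: 64 MB sections with 4 KB pages.
 */
#define PAGES_PER_SECTION 16384UL
#define MAX_SECTIONS      64      /* hypothetical, kept small for the demo */

/* Assumed to mirror numa_32.c: every entry starts out as -1 ("no node"). */
static signed char physnode_map[MAX_SECTIONS];

/* Hypothetical helper: restore the "nothing registered yet" state. */
static void physnode_map_reset(void)
{
	for (size_t i = 0; i < MAX_SECTIONS; i++)
		physnode_map[i] = -1;
}

/* Simplified lookup in the spirit of the 32-bit DISCONTIGMEM pfn_to_nid(). */
static int pfn_to_nid(unsigned long pfn)
{
	return physnode_map[pfn / PAGES_PER_SECTION];
}
```

After physnode_map_reset() and physnode_map[0] = 0, pfn_to_nid(0x1000) returns 0, while pfn_to_nid(0x5000), which lies in section 1, returns -1.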

While the bug is always present, it is more likely to be hit in kdump
kernels on large machines, because:

1. The memory map for a kdump kernel is specified as exactmap, and
   exactmap is more likely to be unaligned.

2. Large reservations are more likely to span across a 64M boundary.

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
---
 arch/x86/mm/numa_32.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 0342d27..f278b04 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -46,15 +46,16 @@ EXPORT_SYMBOL(physnode_map);
 
 void memory_present(int nid, unsigned long start, unsigned long end)
 {
-	unsigned long pfn;
+	unsigned long sect, endsect;
 
 	printk(KERN_INFO "Node: %d, start_pfn: %lx, end_pfn: %lx\n",
 			nid, start, end);
 	printk(KERN_DEBUG "  Setting physnode_map array to node %d for pfns:\n", nid);
 	printk(KERN_DEBUG "  ");
-	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
-		physnode_map[pfn / PAGES_PER_SECTION] = nid;
-		printk(KERN_CONT "%lx ", pfn);
+	endsect = (end - 1) / PAGES_PER_SECTION;
+	for (sect = start / PAGES_PER_SECTION; sect <= endsect; ++sect) {
+		physnode_map[sect] = nid;
+		printk(KERN_CONT "%lx ", sect * PAGES_PER_SECTION);
 	}
 	printk(KERN_CONT "\n");
 }
-- 
1.8.4.5


* Re: [PATCH] x86: fix the initialization of physnode_map
  2014-01-31 10:05 [PATCH] x86: fix the initialization of physnode_map Petr Tesarik
@ 2014-01-31 21:02 ` David Rientjes
  2014-01-31 21:14 ` Dave Hansen
  1 sibling, 0 replies; 5+ messages in thread
From: David Rientjes @ 2014-01-31 21:02 UTC
  To: Petr Tesarik
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Jiang Liu,
	Andrew Morton, Dave Hansen, linux-kernel

On Fri, 31 Jan 2014, Petr Tesarik wrote:

> With DISCONTIGMEM, the mapping between a pfn and its owning node is
> initialized using data provided by the BIOS or from the command line.
> However, the initialization may fail if the extents are not aligned
> to a section boundary (64M).
> 
> The symptom of this bug is an early boot failure in pfn_to_page(),
> as it tries to access NODE_DATA(__nid) using an index from an uninitialized
> element of the physnode_map[] array.
> 
> While the bug is always present, it is more likely to be hit in kdump
> kernels on large machines, because:
> 
> 1. The memory map for a kdump kernel is specified as exactmap, and
>    exactmap is more likely to be unaligned.
> 
> 2. Large reservations are more likely to span across a 64M boundary.
> 
> Signed-off-by: Petr Tesarik <ptesarik@suse.cz>

What's missing here is how you're trying to fix the issue.

> diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
> index 0342d27..f278b04 100644
> --- a/arch/x86/mm/numa_32.c
> +++ b/arch/x86/mm/numa_32.c
> @@ -46,15 +46,16 @@ EXPORT_SYMBOL(physnode_map);
>  
>  void memory_present(int nid, unsigned long start, unsigned long end)
>  {
> -	unsigned long pfn;
> +	unsigned long sect, endsect;
>  
>  	printk(KERN_INFO "Node: %d, start_pfn: %lx, end_pfn: %lx\n",
>  			nid, start, end);
>  	printk(KERN_DEBUG "  Setting physnode_map array to node %d for pfns:\n", nid);
>  	printk(KERN_DEBUG "  ");
> -	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
> -		physnode_map[pfn / PAGES_PER_SECTION] = nid;
> -		printk(KERN_CONT "%lx ", pfn);
> +	endsect = (end - 1) / PAGES_PER_SECTION;
> +	for (sect = start / PAGES_PER_SECTION; sect <= endsect; ++sect) {
> +		physnode_map[sect] = nid;
> +		printk(KERN_CONT "%lx ", sect * PAGES_PER_SECTION);
>  	}
>  	printk(KERN_CONT "\n");
>  }

This looks more like refactoring than anything else and doesn't make it 
clear at all what the fix is.


* Re: [PATCH] x86: fix the initialization of physnode_map
  2014-01-31 10:05 [PATCH] x86: fix the initialization of physnode_map Petr Tesarik
  2014-01-31 21:02 ` David Rientjes
@ 2014-01-31 21:14 ` Dave Hansen
  2014-02-01 12:13   ` Petr Tesarik
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Hansen @ 2014-01-31 21:14 UTC
  To: Petr Tesarik, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Jiang Liu, Andrew Morton, Dave Hansen, linux-kernel

On 01/31/2014 02:05 AM, Petr Tesarik wrote:
> With DISCONTIGMEM, the mapping between a pfn and its owning node is
> initialized using data provided by the BIOS or from the command line.
> However, the initialization may fail if the extents are not aligned
> to a section boundary (64M).

So is this a problem that shows up with DISCONTIGMEM?  Just curious, but
what the heck kind of 32-bit NUMA hardware is still in the wild?  Did
someone buy a NUMA-Q on eBay? :)

>  void memory_present(int nid, unsigned long start, unsigned long end)
>  {
> -	unsigned long pfn;
> +	unsigned long sect, endsect;
>  
>  	printk(KERN_INFO "Node: %d, start_pfn: %lx, end_pfn: %lx\n",
>  			nid, start, end);
>  	printk(KERN_DEBUG "  Setting physnode_map array to node %d for pfns:\n", nid);
>  	printk(KERN_DEBUG "  ");
> -	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
> -		physnode_map[pfn / PAGES_PER_SECTION] = nid;
> -		printk(KERN_CONT "%lx ", pfn);
> +	endsect = (end - 1) / PAGES_PER_SECTION;
> +	for (sect = start / PAGES_PER_SECTION; sect <= endsect; ++sect) {
> +		physnode_map[sect] = nid;
> +		printk(KERN_CONT "%lx ", sect * PAGES_PER_SECTION);
>  	}
>  	printk(KERN_CONT "\n");
>  }

So, if start and end are not aligned to section boundaries, we will miss
setting physnode_map[] for the final section?

For instance, if we have a 64MB section size and try to call
memory_present(32MB -> 96MB), we will set 0->64MB present, but not set
the 64MB->128MB section as present.

Right?
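
Dave's example can be checked in a toy user-space model, assuming 4 KB pages (so a 64 MB section holds 16384 = 0x4000 pfns): 32 MB is pfn 0x2000 and 96 MB is pfn 0x6000. The old loop marks only section 0, because its next step, pfn 0x6000, is already equal to end. The patched loop computes endsect = (0x6000 - 1) / 0x4000 = 1 and marks sections 0 and 1. mark_old(), mark_new() and mb_to_pfn() are hypothetical helpers that copy the two loop bodies from the patch.

```c
#include <stdbool.h>

/* Illustrative assumptions: 4 KB pages and 64 MB sections. */
#define PAGE_SHIFT        12
#define PAGES_PER_SECTION ((64UL << 20) >> PAGE_SHIFT)   /* 16384 */
#define NSECT             8                              /* toy map size */

/* Old loop from memory_present(): steps one section's worth of pfns from start. */
static void mark_old(bool map[NSECT], unsigned long start, unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
		map[pfn / PAGES_PER_SECTION] = true;
}

/* Patched loop: every section from start's up to the one holding end - 1. */
static void mark_new(bool map[NSECT], unsigned long start, unsigned long end)
{
	unsigned long endsect = (end - 1) / PAGES_PER_SECTION;

	for (unsigned long sect = start / PAGES_PER_SECTION; sect <= endsect; ++sect)
		map[sect] = true;
}

/* Convert an amount in MB to a pfn. */
static unsigned long mb_to_pfn(unsigned long mb)
{
	return (mb << 20) >> PAGE_SHIFT;
}
```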

Can you just align 'start' down to the section's start and 'end' up to
the end of the section that contains it?  I guess you do that
implicitly, but you should be able to do it without refactoring the for
loop entirely.
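
Dave's alternative can be sketched in the same kind of toy model: round start down and end up to section boundaries, then keep the original loop unchanged. The sketch assumes a power-of-two PAGES_PER_SECTION (a tiny one, so an exhaustive check is cheap) and uses hypothetical SECT_DOWN()/SECT_UP() macros; in-kernel code would presumably use round_down()/round_up() instead. variants_agree() compares it against the loop from the posted patch for every non-empty range in a small map.

```c
#include <stdbool.h>

/* Illustrative assumption: a toy section size keeps exhaustive checks cheap. */
#define PAGES_PER_SECTION 4UL
#define NSECT             16

/* Round x down/up to a multiple of PAGES_PER_SECTION (a power of two here). */
#define SECT_DOWN(x) ((x) & ~(PAGES_PER_SECTION - 1))
#define SECT_UP(x)   SECT_DOWN((x) + PAGES_PER_SECTION - 1)

/* Sketch of Dave's suggestion: align the extent, keep the original loop. */
static void mark_aligned(bool map[NSECT], unsigned long start, unsigned long end)
{
	start = SECT_DOWN(start);
	end = SECT_UP(end);
	for (unsigned long pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
		map[pfn / PAGES_PER_SECTION] = true;
}

/* The loop from the posted patch: iterate over section numbers directly. */
static void mark_patched(bool map[NSECT], unsigned long start, unsigned long end)
{
	unsigned long endsect = (end - 1) / PAGES_PER_SECTION;

	for (unsigned long sect = start / PAGES_PER_SECTION; sect <= endsect; ++sect)
		map[sect] = true;
}

/* True if both variants mark the same sections for every range [start, end). */
static bool variants_agree(void)
{
	unsigned long limit = NSECT * PAGES_PER_SECTION;

	for (unsigned long start = 0; start < limit; start++) {
		for (unsigned long end = start + 1; end <= limit; end++) {
			bool a[NSECT] = {false};
			bool b[NSECT] = {false};

			mark_aligned(a, start, end);
			mark_patched(b, start, end);
			for (int i = 0; i < NSECT; i++)
				if (a[i] != b[i])
					return false;
		}
	}
	return true;
}
```

variants_agree() returns true: both versions mark sections start / P through (end - 1) / P, where P is PAGES_PER_SECTION, so the choice between them comes down to how easy the diff is to review.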



* Re: [PATCH] x86: fix the initialization of physnode_map
  2014-01-31 21:14 ` Dave Hansen
@ 2014-02-01 12:13   ` Petr Tesarik
  2014-02-01 16:43     ` Dave Hansen
  0 siblings, 1 reply; 5+ messages in thread
From: Petr Tesarik @ 2014-02-01 12:13 UTC
  To: Dave Hansen
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Jiang Liu,
	Andrew Morton, Dave Hansen, linux-kernel

On Fri, 31 Jan 2014 13:14:29 -0800
Dave Hansen <dave@sr71.net> wrote:

> On 01/31/2014 02:05 AM, Petr Tesarik wrote:
> > With DISCONTIGMEM, the mapping between a pfn and its owning node is
> > initialized using data provided by the BIOS or from the command line.
> > However, the initialization may fail if the extents are not aligned
> > to a section boundary (64M).
> 
> So is this a problem that shows up with DISCONTIGMEM?

Yes, that's it.

> Just curious, but
> what the heck kind of 32-bit NUMA hardware is still in the wild?  Did
> someone buy a NUMA-Q on eBay? :)

In fact, this is a patch that has been floating around in SUSE
Enterprise kernels for some time. It was originally added to pass
certification on IBM SurePOS 700 x4900-785.

When cleaning up our kernel patches, I noticed that the bug is still
present in the upstream kernel, so I posted this patch. While I don't
have any evidence that someone actually needs the fix today, it seems
wrong to leave buggy code in the kernel.

If you all agree that we rip out DISCONTIGMEM instead, I can post
patches to do that and be equally happy. ;-)

> >  void memory_present(int nid, unsigned long start, unsigned long end)
> >  {
> > -	unsigned long pfn;
> > +	unsigned long sect, endsect;
> >  
> >  	printk(KERN_INFO "Node: %d, start_pfn: %lx, end_pfn: %lx\n",
> >  			nid, start, end);
> >  	printk(KERN_DEBUG "  Setting physnode_map array to node %d for pfns:\n", nid);
> >  	printk(KERN_DEBUG "  ");
> > -	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
> > -		physnode_map[pfn / PAGES_PER_SECTION] = nid;
> > -		printk(KERN_CONT "%lx ", pfn);
> > +	endsect = (end - 1) / PAGES_PER_SECTION;
> > +	for (sect = start / PAGES_PER_SECTION; sect <= endsect; ++sect) {
> > +		physnode_map[sect] = nid;
> > +		printk(KERN_CONT "%lx ", sect * PAGES_PER_SECTION);
> >  	}
> >  	printk(KERN_CONT "\n");
> >  }
> 
> So, if start and end are not aligned to section boundaries, we will miss
> setting physnode_map[] for the final section?

If end belongs to a different section than start, the final section
will not be initialized, yes.
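
In a toy model the condition can be stated a little more precisely. Writing P for PAGES_PER_SECTION, the old loop visits pfns start, start + P, ..., which fall in consecutive sections beginning with start's, so only the last section can be skipped, and that happens exactly when (end - 1) % P < start % P. An aligned start is therefore never affected, and a range such as 32 MB -> 100 MB is fully covered. The brute-force check below, with a tiny section size chosen as an assumption to keep it cheap, compares that formula against the old loop; old_loop_misses() and condition_holds() are hypothetical helpers.

```c
#include <stdbool.h>

#define PAGES_PER_SECTION 4UL   /* toy section size, an assumption */
#define NSECT             16

/* Does the pre-patch loop leave any section of [start, end) unmarked? */
static bool old_loop_misses(unsigned long start, unsigned long end)
{
	bool map[NSECT] = {false};
	unsigned long last = (end - 1) / PAGES_PER_SECTION;

	for (unsigned long pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
		map[pfn / PAGES_PER_SECTION] = true;
	/* Every section from start's up to the one holding end - 1 should be set. */
	for (unsigned long s = start / PAGES_PER_SECTION; s <= last; s++)
		if (!map[s])
			return true;
	return false;
}

/* Check the closed-form condition against brute force for every range. */
static bool condition_holds(void)
{
	unsigned long limit = NSECT * PAGES_PER_SECTION;

	for (unsigned long start = 0; start < limit; start++) {
		for (unsigned long end = start + 1; end <= limit; end++) {
			bool predicted = (end - 1) % PAGES_PER_SECTION <
					 start % PAGES_PER_SECTION;

			if (old_loop_misses(start, end) != predicted)
				return false;
		}
	}
	return true;
}
```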

> For instance, if we have a 64MB section size and try to call
> memory_present(32MB -> 96MB), we will set 0->64MB present, but not set
> the 64MB->128MB section as present.
> 
> Right?

Exactly.

> Can you just align 'start' down to the section's start and 'end' up to
> the end of the section that contains it?  I guess you do that
> implicitly, but you should be able to do it without refactoring the for
> loop entirely.

Works for me.

Petr Tesarik


* Re: [PATCH] x86: fix the initialization of physnode_map
  2014-02-01 12:13   ` Petr Tesarik
@ 2014-02-01 16:43     ` Dave Hansen
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Hansen @ 2014-02-01 16:43 UTC
  To: Petr Tesarik
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Jiang Liu,
	Andrew Morton, linux-kernel

On 02/01/2014 04:13 AM, Petr Tesarik wrote:
>> > Just curious, but
>> > what the heck kind of 32-bit NUMA hardware is still in the wild?  Did
>> > someone buy a NUMA-Q on eBay? :)
> In fact, this is a patch that has been floating around in SUSE
> Enterprise kernels for some time. It was originally added to pass
> certification on IBM SurePOS 700 x4900-785.
> 
> When cleaning up our kernel patches, I noticed that the bug is still
> present in the upstream kernel, so I posted this patch. While I don't
> have any evidence that someone actually needs the fix today, it seems
> wrong to leave buggy code in the kernel.
> 
> If you all agree that we rip out DISCONTIGMEM instead, I can post
> patches to do that and be equally happy. ;-)

I have a soft spot in my heart for all that old 32-bit NUMA hardware.
I've been thinking about ripping the support out, but it usually sits
quietly not bothering anybody.

Your patch looks correct to me, but it would be easier to tell that it is
correct if you just changed the alignment.  The only bummer here is that
it's going to be hard to test for correctness since it sounds like you
don't have the hardware sitting in front of you.  In any case, feel free
to add my:

Acked-by: Dave Hansen <dave@sr71.net>

