* 32bit NUMA and fakeNUMA broken for AMD CPUs
@ 2011-06-21 15:41 Conny Seidel
  2011-06-26 10:22 ` Tejun Heo
  0 siblings, 1 reply; 28+ messages in thread
From: Conny Seidel @ 2011-06-21 15:41 UTC (permalink / raw)
  To: LKML, Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 5910 bytes --]

Hi,

Commit 797390d8554b1e07aabea37d0140933b0412dba0 breaks 32-bit boot on AMD
systems, with both native NUMA and fakeNUMA.

With native NUMA, the kernel still boots when the parameter numa=off is
added to the command line.

[    0.000000] BUG: unable to handle kernel paging request at 000012b0
[    0.000000] IP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2
[    0.000000] *pdpt = 0000000000000000 *pde = f000eef3f000ee00
[    0.000000] Oops: 0000 [#1] SMP
[    0.000000] last sysfs file:
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00164-g797390d #1 To Be Filled By O.E.M. To Be Filled By O.E.M./E350M1
[    0.000000] EIP: 0060:[<c1aa13ce>] EFLAGS: 00010012 CPU: 0
[    0.000000] EIP is at memmap_init_zone+0x6c/0xf2
[    0.000000] EAX: 00000000 EBX: 000a8000 ECX: 000a7fff EDX: f2c00b80
[    0.000000] ESI: 000a8000 EDI: f2c00800 EBP: c19ffe54 ESP: c19ffe34
[    0.000000]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    0.000000] Process swapper (pid: 0, ti=c19fe000 task=c1a07f60 task.ti=c19fe000)
[    0.000000] Stack:
[    0.000000]  00000002 00000000 0023f000 00000000 10000000 00000a00 f2c00000 f2c00b58
[    0.000000]  c19ffeb0 c1a80f24 000375fe 00000000 f2c00800 00000800 00000100 00000030
[    0.000000]  c1abb768 0000003c 00000000 00000000 00000004 00207a02 f2c00800 000375fe
[    0.000000] Call Trace:
[    0.000000]  [<c1a80f24>] free_area_init_node+0x358/0x385
[    0.000000]  [<c1a81384>] free_area_init_nodes+0x420/0x487
[    0.000000]  [<c1637323>] ? printk+0x14/0x16
[    0.000000]  [<c102489e>] ? memory_present+0x66/0x6f
[    0.000000]  [<c1a79326>] paging_init+0x114/0x11b
[    0.000000]  [<c101742f>] ? native_apic_mem_read+0x8/0x19
[    0.000000]  [<c1a6cb13>] setup_arch+0xb37/0xc0a
[    0.000000]  [<c1638f6d>] ? _raw_spin_unlock_irqrestore+0x19/0x25
[    0.000000]  [<c1638f6d>] ? _raw_spin_unlock_irqrestore+0x19/0x25
[    0.000000]  [<c1637323>] ? printk+0x14/0x16
[    0.000000]  [<c1a69554>] start_kernel+0x76/0x316
[    0.000000]  [<c1a690a8>] i386_start_kernel+0xa8/0xb0
[    0.000000] Code: 0a c1 e0 1d 89 45 ec 8b 45 e4 03 3c 85 e8 5b a6 c1 e9 8a 00 00 00 89 f0 89 f3 c1 e8 0e 0f be 80 a8 57 a6 c1 8b 04 85 e8 5b a6 c1 <2b> 98 b0 12 00 00 c1 e3 05 03 98 ac 12 00 00 8b 03 25 ff ff ff
[    0.000000] EIP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2 SS:ESP 0068:c19ffe34
[    0.000000] CR2: 00000000000012b0
[    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Pid: 0, comm: swapper Tainted: G      D     2.6.39-rc5-00164-g797390d #1
[    0.000000] Call Trace:
[    0.000000]  [<c1637213>] panic+0x55/0x151
[    0.000000]  [<c10507c9>] ? blocking_notifier_call_chain+0x11/0x13
[    0.000000]  [<c1038340>] do_exit+0x99/0x6fa
[    0.000000]  [<c1638f6d>] ? _raw_spin_unlock_irqrestore+0x19/0x25
[    0.000000]  [<c10356de>] ? kmsg_dump+0x3c/0xbe
[    0.000000]  [<c163a569>] oops_end+0x97/0x9f
[    0.000000]  [<c101e9a4>] no_context+0x144/0x14e
[    0.000000]  [<c101eada>] __bad_area_nosemaphore+0x12c/0x134
[    0.000000]  [<c1a83a75>] ? memblock_add_region+0xbf/0x4af
[    0.000000]  [<c101eaf4>] bad_area_nosemaphore+0x12/0x15
[    0.000000]  [<c163beb0>] do_page_fault+0x1e8/0x3c8
[    0.000000]  [<c1a82c5e>] ? __alloc_memory_core_early+0x86/0x94
[    0.000000]  [<c163bcc8>] ? spurious_fault+0xf2/0xf2
[    0.000000]  [<c1639c6b>] error_code+0x5f/0x64
[    0.000000]  [<c163bcc8>] ? spurious_fault+0xf2/0xf2
[    0.000000]  [<c1aa13ce>] ? memmap_init_zone+0x6c/0xf2
[    0.000000]  [<c1a80f24>] free_area_init_node+0x358/0x385
[    0.000000]  [<c1a81384>] free_area_init_nodes+0x420/0x487
[    0.000000]  [<c1637323>] ? printk+0x14/0x16
[    0.000000]  [<c102489e>] ? memory_present+0x66/0x6f
[    0.000000]  [<c1a79326>] paging_init+0x114/0x11b
[    0.000000]  [<c101742f>] ? native_apic_mem_read+0x8/0x19
[    0.000000]  [<c1a6cb13>] setup_arch+0xb37/0xc0a
[    0.000000]  [<c1638f6d>] ? _raw_spin_unlock_irqrestore+0x19/0x25
[    0.000000]  [<c1638f6d>] ? _raw_spin_unlock_irqrestore+0x19/0x25
[    0.000000]  [<c1637323>] ? printk+0x14/0x16
[    0.000000]  [<c1a69554>] start_kernel+0x76/0x316
[    0.000000]  [<c1a690a8>] i386_start_kernel+0xa8/0xb0



commit 797390d8554b1e07aabea37d0140933b0412dba0
Author: Tejun Heo <tj@kernel.org>
Date:   Mon May 2 14:18:52 2011 +0200

    x86-32, NUMA: use sparse_memory_present_with_active_regions()

    Instead of calling memory_present() for each region from NUMA init,
    call sparse_memory_present_with_active_regions() from paging_init()
    similarly to x86-64.

    For flat and numaq, this results in exactly the same memory_present()
    calls.  For srat, if there are multiple memory chunks for a node,
    after this change, memory_present() will be called separately for each
    chunk instead of being called once to encompass the whole range, which
    doesn't cause any harm and actually is the better behavior.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Yinghai Lu <yinghai@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>


##
##################################################################
# Email : conny.seidel@amd.com            GnuPG-Key : 0xA6AB055D #
# Fingerprint: 17C4 5DB2 7C4C C1C7 1452 8148 F139 7C09 A6AB 055D #
##################################################################
# Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach      #
# General Managers: Alberto Bozzoi                               #
# Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen #
#               HRB Nr. 43632                                    #
##################################################################

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-21 15:41 32bit NUMA and fakeNUMA broken for AMD CPUs Conny Seidel
@ 2011-06-26 10:22 ` Tejun Heo
       [not found]   ` <20110626223807.47cef5c6.conny.seidel_amd.com@marah.osrc.amd.com>
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2011-06-26 10:22 UTC (permalink / raw)
  To: Conny Seidel; +Cc: LKML

Hello,

On Tue, Jun 21, 2011 at 05:41:31PM +0200, Conny Seidel wrote:
> Commit 797390d8554b1e07aabea37d0140933b0412dba0 breaks 32-bit boot on AMD
> systems, with both native NUMA and fakeNUMA.
> 
> With native NUMA, the kernel still boots when the parameter numa=off is
> added to the command line.

I've been looking at it without much success yet.  Can you please
attach full kernel boot log and .config?

Thanks.

-- 
tejun


* [PATCH tip:x86/urgent] x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines
       [not found]   ` <20110626223807.47cef5c6.conny.seidel_amd.com@marah.osrc.amd.com>
@ 2011-06-28  9:41     ` Tejun Heo
  2011-06-28 12:35       ` Conny Seidel
  2011-07-01 15:26       ` [tip:x86/urgent] " tip-bot for Tejun Heo
       [not found]     ` <20110628174613.GP478@escobedo.osrc.amd.com>
  1 sibling, 2 replies; 28+ messages in thread
From: Tejun Heo @ 2011-06-28  9:41 UTC (permalink / raw)
  To: Conny Seidel, Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: hans.rosenfeld, linux-kernel, Christoph Lameter

During 32/64 NUMA init unification, commit 797390d855 "x86-32, NUMA:
use sparse_memory_present_with_active_regions()" made 32bit mm init
call memory_present() automatically from active_regions instead of
leaving it to each NUMA init path.

This commit description is inaccurate - memory_present() calls aren't
the same for flat and numaq.  After the commit, memory_present() is
only called for the intersection of e820 and NUMA layout.  Before, on
flatmem, memory_present() would be called from 0 to max_pfn.  After,
it would be called only on the areas that e820 indicates to be
populated.

This is how x86_64 works and should be okay as memmap is allowed to
contain holes; however, x86_32 DISCONTIGMEM is missing
early_pfn_valid(), which makes memmap_init_zone() assume that memmap
doesn't contain any hole.  This leads to the following oops if the e820
map contains holes, as it often does on machines with close to or more
than 4GiB of memory: memmap_init_zone() calls pfn_to_page() on a pfn
which isn't mapped to a NUMA node.

  BUG: unable to handle kernel paging request at 000012b0
  IP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2
  *pdpt = 0000000000000000 *pde = f000eef3f000ee00
  Oops: 0000 [#1] SMP
  last sysfs file:
  Modules linked in:

  Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00164-g797390d #1 To Be Filled By O.E.M. To Be Filled By O.E.M./E350M1
  EIP: 0060:[<c1aa13ce>] EFLAGS: 00010012 CPU: 0
  EIP is at memmap_init_zone+0x6c/0xf2
  EAX: 00000000 EBX: 000a8000 ECX: 000a7fff EDX: f2c00b80
  ESI: 000a8000 EDI: f2c00800 EBP: c19ffe54 ESP: c19ffe34
   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 0, ti=c19fe000 task=c1a07f60 task.ti=c19fe000)
  Stack:
   00000002 00000000 0023f000 00000000 10000000 00000a00 f2c00000 f2c00b58
   c19ffeb0 c1a80f24 000375fe 00000000 f2c00800 00000800 00000100 00000030
   c1abb768 0000003c 00000000 00000000 00000004 00207a02 f2c00800 000375fe
  Call Trace:
   [<c1a80f24>] free_area_init_node+0x358/0x385
   [<c1a81384>] free_area_init_nodes+0x420/0x487
   [<c1a79326>] paging_init+0x114/0x11b
   [<c1a6cb13>] setup_arch+0xb37/0xc0a
   [<c1a69554>] start_kernel+0x76/0x316
   [<c1a690a8>] i386_start_kernel+0xa8/0xb0

This patch fixes the bug by defining early_pfn_valid() to be the same
as pfn_valid() when DISCONTIGMEM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-bisected-by: Conny Seidel <conny.seidel@amd.com>
LKML-Reference: <20110621174131.054f0422.conny.seidel_amd.com@marah.osrc.amd.com>
---
Conny, can you please verify this fixes the boot problem you're
seeing?

Thanks.

 arch/x86/include/asm/mmzone_32.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 5e83a41..756d2a7 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -68,6 +68,8 @@ static inline int pfn_valid(int pfn)
 	return 0;
 }
 
+#define early_pfn_valid(pfn)	pfn_valid((pfn))
+
 #endif /* CONFIG_DISCONTIGMEM */
 
 #ifdef CONFIG_NEED_MULTIPLE_NODES
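
For readers following along, here is a minimal userspace sketch of why
the missing early_pfn_valid() matters.  It is not kernel code: the
toy_* names and the 16-entry node map are hypothetical, and the sketch
assumes that a pfn sitting in an e820 hole maps to no node (-1).  The
init loop skips such pfns instead of touching per-node data for them.

```c
#include <stddef.h>

#define TOY_NR_PFNS 16

/* Hypothetical node map: -1 marks a hole (pfn not backed by any node). */
static const int toy_node_of_pfn[TOY_NR_PFNS] = {
	0, 0, 0, 0, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, -1, -1,
};

/* Node id for @pfn, or -1 if the pfn is a hole or out of range. */
static int toy_pfn_to_nid(unsigned long pfn)
{
	return pfn < TOY_NR_PFNS ? toy_node_of_pfn[pfn] : -1;
}

/* Same idea as the one-line fix: valid only if the pfn maps to a node. */
static int toy_early_pfn_valid(unsigned long pfn)
{
	return toy_pfn_to_nid(pfn) >= 0;
}

/*
 * Walk [start, end) the way a memmap init loop would, skipping holes.
 * Returns the number of pfns that would have been initialized.
 */
static size_t toy_memmap_init(unsigned long start, unsigned long end)
{
	size_t initialized = 0;
	unsigned long pfn;

	for (pfn = start; pfn < end; pfn++) {
		if (!toy_early_pfn_valid(pfn))
			continue;	/* hole: nothing to set up */
		initialized++;
	}
	return initialized;
}
```

Calling toy_memmap_init(0, TOY_NR_PFNS) on this map visits only the
ten pfns that belong to node 0 or node 1.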


* Re: [PATCH tip:x86/urgent] x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines
  2011-06-28  9:41     ` [PATCH tip:x86/urgent] x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines Tejun Heo
@ 2011-06-28 12:35       ` Conny Seidel
  2011-07-01 15:26       ` [tip:x86/urgent] " tip-bot for Tejun Heo
  1 sibling, 0 replies; 28+ messages in thread
From: Conny Seidel @ 2011-06-28 12:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Rosenfeld, Hans,
	linux-kernel, Christoph Lameter

[-- Attachment #1: Type: text/plain, Size: 3923 bytes --]

On Tue, 28 Jun 2011 05:41:07 -0400
Tejun Heo <tj@kernel.org> wrote:

>During 32/64 NUMA init unification, commit 797390d855 "x86-32, NUMA:
>use sparse_memory_present_with_active_regions()" made 32bit mm init
>call memory_present() automatically from active_regions instead of
>leaving it to each NUMA init path.
>
>This commit description is inaccurate - memory_present() calls aren't
>the same for flat and numaq.  After the commit, memory_present() is
>only called for the intersection of e820 and NUMA layout.  Before, on
>flatmem, memory_present() would be called from 0 to max_pfn.  After,
>it would be called only on the areas that e820 indicates to be
>populated.
>
>This is how x86_64 works and should be okay as memmap is allowed to
>contain holes; however, x86_32 DISCONTIGMEM is missing
>early_pfn_valid(), which makes memmap_init_zone() assume that memmap
>doesn't contain any hole.  This leads to the following oops if the e820
>map contains holes, as it often does on machines with close to or more
>than 4GiB of memory: memmap_init_zone() calls pfn_to_page() on a pfn
>which isn't mapped to a NUMA node.
>
>  BUG: unable to handle kernel paging request at 000012b0
>  IP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2
>  *pdpt = 0000000000000000 *pde = f000eef3f000ee00
>  Oops: 0000 [#1] SMP
>  last sysfs file:
>  Modules linked in:
>
>  Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00164-g797390d #1 To Be Filled By O.E.M. To Be Filled By O.E.M./E350M1
>  EIP: 0060:[<c1aa13ce>] EFLAGS: 00010012 CPU: 0
>  EIP is at memmap_init_zone+0x6c/0xf2
>  EAX: 00000000 EBX: 000a8000 ECX: 000a7fff EDX: f2c00b80
>  ESI: 000a8000 EDI: f2c00800 EBP: c19ffe54 ESP: c19ffe34
>   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>  Process swapper (pid: 0, ti=c19fe000 task=c1a07f60 task.ti=c19fe000)
>  Stack:
>   00000002 00000000 0023f000 00000000 10000000 00000a00 f2c00000 f2c00b58
>   c19ffeb0 c1a80f24 000375fe 00000000 f2c00800 00000800 00000100 00000030
>   c1abb768 0000003c 00000000 00000000 00000004 00207a02 f2c00800 000375fe
>  Call Trace:
>   [<c1a80f24>] free_area_init_node+0x358/0x385
>   [<c1a81384>] free_area_init_nodes+0x420/0x487
>   [<c1a79326>] paging_init+0x114/0x11b
>   [<c1a6cb13>] setup_arch+0xb37/0xc0a
>   [<c1a69554>] start_kernel+0x76/0x316
>   [<c1a690a8>] i386_start_kernel+0xa8/0xb0
>
>This patch fixes the bug by defining early_pfn_valid() to be the same
>as pfn_valid() when DISCONTIGMEM.
>
>Signed-off-by: Tejun Heo <tj@kernel.org>
>Reported-and-bisected-by: Conny Seidel <conny.seidel@amd.com>
>LKML-Reference: <20110621174131.054f0422.conny.seidel_amd.com@marah.osrc.amd.com>
>---
>Conny, can you please verify this fixes the boot problem you're
>seeing?

Verified, the patch fixes our problem.

>Thanks.

Thanks for fixing this quickly.

> arch/x86/include/asm/mmzone_32.h |    2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
>index 5e83a41..756d2a7 100644
>--- a/arch/x86/include/asm/mmzone_32.h
>+++ b/arch/x86/include/asm/mmzone_32.h
>@@ -68,6 +68,8 @@ static inline int pfn_valid(int pfn)
> 	return 0;
> }
>
>+#define early_pfn_valid(pfn)	pfn_valid((pfn))
>+
> #endif /* CONFIG_DISCONTIGMEM */
>
> #ifdef CONFIG_NEED_MULTIPLE_NODES
>




* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
       [not found]     ` <20110628174613.GP478@escobedo.osrc.amd.com>
@ 2011-06-29  9:44       ` Tejun Heo
  2011-06-29 10:51         ` Tejun Heo
                           ` (2 more replies)
  2011-07-13  5:34       ` [tip:x86/numa] x86, numa: " tip-bot for Tejun Heo
  1 sibling, 3 replies; 28+ messages in thread
From: Tejun Heo @ 2011-06-29  9:44 UTC (permalink / raw)
  To: Hans Rosenfeld; +Cc: Conny Seidel, x86, linux-kernel

(cc'ing x86 and lkml.  Please keep them cc'd on x86 related issues).

Hello,

On Tue, Jun 28, 2011 at 07:46:14PM +0200, Hans Rosenfeld wrote:
> We found another related but different panic on a 4-socket 8-node system,
> caused by this commit:
> 
>     commit 2706a0bf7b02693ed88752df877f10c2206292ff
>     Author: Tejun Heo <tj@kernel.org>
>     Date:   Mon May 2 17:24:48 2011 +0200
> 
>     x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too
>     
>     Now that NUMA init path is unified, amdtopology can be enabled on
>     32bit.  Make amdtopology.c safe on 32bit by explicitly using u64 and
>     drop X86_64 dependency from Kconfig.
>     
>     Inclusion of bootmem.h is added for max_pfn declaration.
>     
>     Signed-off-by: Tejun Heo <tj@kernel.org>
>     Cc: Ingo Molnar <mingo@redhat.com>
>     Cc: Yinghai Lu <yinghai@kernel.org>
>     Cc: David Rientjes <rientjes@google.com>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Cc: "H. Peter Anvin" <hpa@zytor.com>
> 
> 
> The fix for the other panic does not fix this one.
> Full bootlog and config are attached.

Hmmm, interesting.

> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000]  BIOS-e820: 0000000000000000 - 0000000000087800 (usable)
> [    0.000000]  BIOS-e820: 0000000000087800 - 00000000000a0000 (reserved)
> [    0.000000]  BIOS-e820: 00000000000cc000 - 0000000000100000 (reserved)
> [    0.000000]  BIOS-e820: 0000000000100000 - 00000000c7e70000 (usable)
> [    0.000000]  BIOS-e820: 00000000c7e70000 - 00000000c7e8c000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000c7e8c000 - 00000000c7e8e000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000c7e8e000 - 00000000c8000000 (reserved)
> [    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
> [    0.000000]  BIOS-e820: 0000000100000000 - 0000001838000000 (usable)

Okay, a fairly large machine.  Memory goes over PAE limit.

> [    0.000000] Scanning NUMA topology in Northbridge 24
> [    0.000000] Number of physical nodes 8
> [    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000238000000
> [    0.000000] Node 1 MemBase 0000000238000000 Limit 0000000638000000
> [    0.000000] Node 2 MemBase 0000000638000000 Limit 0000000838000000
> [    0.000000] Node 3 MemBase 0000000838000000 Limit 0000000c38000000
> [    0.000000] Node 4 MemBase 0000000c38000000 Limit 0000000e38000000
> [    0.000000] Node 5 MemBase 0000000e38000000 Limit 0000001000000000
> [    0.000000] Node 6 bogus settings 1238000000-1000000000.
> [    0.000000] Node 7 bogus settings 1438000000-1000000000.

amdtopology code behaved correctly.  It trimmed node 5 which spans
over the PAE limit and squashed nodes above that.

> [    0.000000] BUG: Int 6: CR2   (null)
> [    0.000000]      EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
> [    0.000000]      EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
> [    0.000000]      err   (null)  EIP c16209aa   CS 00000060  flg 00010002
> [    0.000000] Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
> [    0.000000]          (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
> [    0.000000]        f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
> [    0.000000] Call Trace:
> [    0.000000]  [<c136b1e5>] ? early_fault+0x2e/0x2e
> [    0.000000]  [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
> [    0.000000]  [<c1620613>] ? memmap_init_zone+0xaf/0x10c
> [    0.000000]  [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
> [    0.000000]  [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
> [    0.000000]  [<c1601d80>] ? paging_init+0x112/0x118
> [    0.000000]  [<c15f578d>] ? setup_arch+0x791/0x82f
> [    0.000000]  [<c15f43d9>] ? start_kernel+0x6a/0x257

But it later tripped in mminit_verify_page_links().  Maybe
page_to_nid() doesn't match?

Hmmm... I can't see how it would have worked before.  amdtopology used
ulong for @end and would simply have been zero.  Maybe NUMA config
failed and it booted as flatmem instead?  Can you please post boot log
before the patch?

Also, can you please apply the following patch, reproduce the boot
failure and post the log?  Thank you.


diff --git a/mm/mm_init.c b/mm/mm_init.c
index 4e0e265..cb230bf 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -124,6 +124,12 @@ void __init mminit_verify_pageflags_layout(void)
 void __meminit mminit_verify_page_links(struct page *page, enum zone_type zone,
 			unsigned long nid, unsigned long pfn)
 {
+	if (page_to_nid(page) != nid || page_zonenum(page) != zone ||
+	    page_to_pfn(page) != pfn)
+		printk(KERN_CRIT "mminit_verify_page_links: nid=%lu/%lu zone=%d/%d pfn=0x%lx/0x%lx\n",
+		       page_to_nid(page), nid, page_zonenum(page), zone,
+		       page_to_pfn(page), pfn);
+
 	BUG_ON(page_to_nid(page) != nid);
 	BUG_ON(page_zonenum(page) != zone);
 	BUG_ON(page_to_pfn(page) != pfn);


* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-29  9:44       ` 32bit NUMA and fakeNUMA broken for AMD CPUs Tejun Heo
@ 2011-06-29 10:51         ` Tejun Heo
  2011-06-29 12:34         ` Tejun Heo
  2011-07-01 16:22         ` [PATCH x86/urgent 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ Tejun Heo
  2 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2011-06-29 10:51 UTC (permalink / raw)
  To: Hans Rosenfeld; +Cc: Conny Seidel, x86, linux-kernel

On Wed, Jun 29, 2011 at 11:44:51AM +0200, Tejun Heo wrote:
> Hmmm... I can't see how it would have worked before.  amdtopology used
> ulong for @end and would simply have been zero.  Maybe NUMA config
> failed and it booted as flatmem instead?  Can you please post boot log
> before the patch?

Ooh, please forget about this one.  I got confused and thought that
amdtopology was for 32bit only and then converted to apply to both 32
and 64.  It was the other way around, so the machine didn't use to get
NUMA configuration at all.

-- 
tejun


* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-29  9:44       ` 32bit NUMA and fakeNUMA broken for AMD CPUs Tejun Heo
  2011-06-29 10:51         ` Tejun Heo
@ 2011-06-29 12:34         ` Tejun Heo
  2011-06-29 12:55           ` Hans Rosenfeld
  2011-07-01 16:22         ` [PATCH x86/urgent 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ Tejun Heo
  2 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2011-06-29 12:34 UTC (permalink / raw)
  To: Hans Rosenfeld; +Cc: Conny Seidel, x86, linux-kernel

Hello, again.

I think I found what went wrong.

> > [    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000238000000
> > [    0.000000] Node 1 MemBase 0000000238000000 Limit 0000000638000000
> > [    0.000000] Node 2 MemBase 0000000638000000 Limit 0000000838000000
> > [    0.000000] Node 3 MemBase 0000000838000000 Limit 0000000c38000000
> > [    0.000000] Node 4 MemBase 0000000c38000000 Limit 0000000e38000000
> > [    0.000000] Node 5 MemBase 0000000e38000000 Limit 0000001000000000
> > [    0.000000] Node 6 bogus settings 1238000000-1000000000.
> > [    0.000000] Node 7 bogus settings 1438000000-1000000000.

NUMA nodes are aligned to 27bit - 128MiB.  SPARSEMEM is enabled but on
x86-32 w/ PAE SECTION_SIZE_BITS is 29 - 512MiB, which means that pages
living near the boundary will have wrong nid assigned to them.
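
The arithmetic behind this can be sketched in a few lines of userspace
C.  The toy_* names are hypothetical and 4KiB pages (PAGE_SHIFT 12) are
assumed.  The input is the node 0 limit from the log above,
0x238000000.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT		12	/* assumed 4KiB pages */
#define TOY_SECTION_SIZE_BITS	29	/* 512MiB sections, as stated above */
#define TOY_PAGES_PER_SECTION	(1UL << (TOY_SECTION_SIZE_BITS - TOY_PAGE_SHIFT))

/* Page frame number of a physical address. */
static unsigned long toy_phys_to_pfn(uint64_t phys)
{
	return (unsigned long)(phys >> TOY_PAGE_SHIFT);
}

/* Nonzero if a node boundary at @pfn falls inside a section. */
static int toy_boundary_splits_section(unsigned long pfn)
{
	return (pfn & (TOY_PAGES_PER_SECTION - 1)) != 0;
}

/* First pfn of the section that contains @pfn. */
static unsigned long toy_section_start(unsigned long pfn)
{
	return pfn & ~(TOY_PAGES_PER_SECTION - 1);
}
```

With these definitions the node 0/1 boundary is pfn 0x238000, which is
not a multiple of the 0x20000 pages in a section.  The section holding
it starts at pfn 0x220000, so one section would have to carry pages
from two nodes.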

> > [    0.000000] BUG: Int 6: CR2   (null)
> > [    0.000000]      EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
> > [    0.000000]      EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
> > [    0.000000]      err   (null)  EIP c16209aa   CS 00000060  flg 00010002
> > [    0.000000] Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
> > [    0.000000]          (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
> > [    0.000000]        f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
> > [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
> > [    0.000000] Call Trace:
> > [    0.000000]  [<c136b1e5>] ? early_fault+0x2e/0x2e
> > [    0.000000]  [<c16209aa>] ? mminit_verify_page_links+0x12/0x42

So, mminit_verify_page_links() detects it while the last 512MiB
highmem chunk of node 0 is being initialized and freaks out.

We definitely need a safeguard to check NUMA node alignment and
disable NUMA if it requires finer granularity than supported by the
memory model.  If you use DISCONTIGMEM, which has 64MiB granularity,
instead, it works, right?
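
To illustrate the 64MiB granularity argument only (not whether a given
kernel boots), here is a userspace sketch of a physnode_map-style
lookup.  The toy_* names are hypothetical; the constants assume a 64GiB
address space, 4KiB pages and a 1024-entry map, which gives 16384 pages
(64MiB) per entry.

```c
#define TOY_MAX_NR_PAGES	16777216UL	/* 64GiB / 4KiB, assumed */
#define TOY_MAX_ELEMENTS	1024
#define TOY_PAGES_PER_ELEMENT	(TOY_MAX_NR_PAGES / TOY_MAX_ELEMENTS)

static signed char toy_physnode_map[TOY_MAX_ELEMENTS];

/* Mark every map entry as belonging to no node. */
static void toy_map_reset(void)
{
	unsigned long i;

	for (i = 0; i < TOY_MAX_ELEMENTS; i++)
		toy_physnode_map[i] = -1;
}

/* Record that pfns in [start, end) belong to @nid, one entry per 64MiB. */
static void toy_memory_present(int nid, unsigned long start, unsigned long end)
{
	unsigned long pfn;

	for (pfn = start; pfn < end && pfn < TOY_MAX_NR_PAGES;
	     pfn += TOY_PAGES_PER_ELEMENT)
		toy_physnode_map[pfn / TOY_PAGES_PER_ELEMENT] = (signed char)nid;
}

/* Node id for @pfn, or -1 if unmapped or out of range. */
static int toy_map_pfn_to_nid(unsigned long pfn)
{
	if (pfn >= TOY_MAX_NR_PAGES)
		return -1;
	return toy_physnode_map[pfn / TOY_PAGES_PER_ELEMENT];
}
```

Feeding it node 0 as pfns [0, 0x238000) and node 1 as
[0x238000, 0x638000) gives a clean split at the boundary, because
0x238000 is a multiple of 16384 pages.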

-- 
tejun


* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-29 12:34         ` Tejun Heo
@ 2011-06-29 12:55           ` Hans Rosenfeld
  2011-06-29 13:03             ` Tejun Heo
  0 siblings, 1 reply; 28+ messages in thread
From: Hans Rosenfeld @ 2011-06-29 12:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Seidel, Conny, x86, linux-kernel

On Wed, Jun 29, 2011 at 08:34:09AM -0400, Tejun Heo wrote:
> So, mminit_verify_page_links() detects it while the last 512MiB
> highmem chunk of node 0 is being initialized and freaks out.
> 
> We definitely need a safeguard to check NUMA node alignment and
> disable NUMA if it requires finer granularity than supported by the
> memory model.  If you use DISCONTIGMEM, which has 64MiB granularity,
> instead, it works, right?

I had DISCONTIGMEM enabled in the kernel config, it does not work.

Hans


-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown



* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-29 12:55           ` Hans Rosenfeld
@ 2011-06-29 13:03             ` Tejun Heo
  2011-06-29 16:15               ` Tejun Heo
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2011-06-29 13:03 UTC (permalink / raw)
  To: Hans Rosenfeld; +Cc: Seidel, Conny, x86, linux-kernel

On Wed, Jun 29, 2011 at 02:55:08PM +0200, Hans Rosenfeld wrote:
> On Wed, Jun 29, 2011 at 08:34:09AM -0400, Tejun Heo wrote:
> > So, mminit_verify_page_links() detects it while the last 512MiB
> > highmem chunk of node 0 is being initialized and freaks out.
> > 
> > We definitely need a safeguard to check NUMA node alignment and
> > disable NUMA if it requires finer granularity than supported by the
> > memory model.  If you use DISCONTIGMEM, which has 64MiB granularity,
> > instead, it works, right?
> 
> I had DISCONTIGMEM enabled in the kernel config, it does not work.

Hmmm?  The following is the relevant part from your .config.

  CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
  CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y
  CONFIG_ARCH_SPARSEMEM_ENABLE=y
  CONFIG_ARCH_SELECT_MEMORY_MODEL=y
  CONFIG_ILLEGAL_POINTER_VALUE=0
  CONFIG_SELECT_MEMORY_MODEL=y
  # CONFIG_DISCONTIGMEM_MANUAL is not set
  CONFIG_SPARSEMEM_MANUAL=y
  CONFIG_SPARSEMEM=y

And it selects SPARSEMEM via SPARSEMEM_MANUAL.  You need to choose
DISCONTIGMEM_MANUAL in "Processor type and features" -> "Memory
Model".

Thanks.

-- 
tejun


* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-29 13:03             ` Tejun Heo
@ 2011-06-29 16:15               ` Tejun Heo
  2011-06-30 13:13                 ` Hans Rosenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2011-06-29 16:15 UTC (permalink / raw)
  To: Hans Rosenfeld; +Cc: Seidel, Conny, x86, linux-kernel

Hans, can you please apply the following patch and post the boot log
from both SPARSEMEM and DISCONTIGMEM kernels?  On SPARSEMEM, it should
reject NUMA config and boot w/ flatmem.

Thanks.

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 224e8c5..0b6c75b 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -34,15 +34,15 @@ static inline void resume_map_numa_kva(pgd_t *pgd) {}
  *    64Gb / 4096bytes/page = 16777216 pages
  */
 #define MAX_NR_PAGES 16777216
-#define MAX_ELEMENTS 1024
-#define PAGES_PER_ELEMENT (MAX_NR_PAGES/MAX_ELEMENTS)
+#define MAX_SECTIONS 1024
+#define PAGES_PER_SECTION (MAX_NR_PAGES/MAX_SECTIONS)
 
 extern s8 physnode_map[];
 
 static inline int pfn_to_nid(unsigned long pfn)
 {
 #ifdef CONFIG_NUMA
-	return((int) physnode_map[(pfn) / PAGES_PER_ELEMENT]);
+	return((int) physnode_map[(pfn) / PAGES_PER_SECTIONS]);
 #else
 	return 0;
 #endif
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index f5510d8..9d643e2 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -496,6 +496,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
+	unsigned long pfn_align;
 	int i, nid;
 
 	/* Account for nodes with cpus and no memory */
@@ -511,6 +512,15 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 
 	/* for out of order entries */
 	sort_node_map();
+
+	pfn_align = node_map_pfn_alignment();
+	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
+		printk(KERN_WARNING "Node alignment %LuMB < min %LuMB, rejecting NUMA config\n",
+		       (u64)pfn_align << PAGE_SHIFT >> 20,
+		       (u64)PAGES_PER_SECTION << PAGE_SHIFT >> 20);
+		return -EINVAL;
+	}
+
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 849a975..3adebe7 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -41,7 +41,7 @@
  *     physnode_map[16-31] = 1;
  *     physnode_map[32- ] = -1;
  */
-s8 physnode_map[MAX_ELEMENTS] __read_mostly = { [0 ... (MAX_ELEMENTS - 1)] = -1};
+s8 physnode_map[MAX_SECTIONS] __read_mostly = { [0 ... (MAX_SECTIONS - 1)] = -1};
 EXPORT_SYMBOL(physnode_map);
 
 void memory_present(int nid, unsigned long start, unsigned long end)
@@ -52,8 +52,8 @@ void memory_present(int nid, unsigned long start, unsigned long end)
 			nid, start, end);
 	printk(KERN_DEBUG "  Setting physnode_map array to node %d for pfns:\n", nid);
 	printk(KERN_DEBUG "  ");
-	for (pfn = start; pfn < end; pfn += PAGES_PER_ELEMENT) {
-		physnode_map[pfn / PAGES_PER_ELEMENT] = nid;
+	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
+		physnode_map[pfn / PAGES_PER_SECTION] = nid;
 		printk(KERN_CONT "%lx ", pfn);
 	}
 	printk(KERN_CONT "\n");
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9670f71..c70a326 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1313,6 +1313,7 @@ extern void remove_active_range(unsigned int nid, unsigned long start_pfn,
 					unsigned long end_pfn);
 extern void remove_all_active_ranges(void);
 void sort_node_map(void);
+unsigned long node_map_pfn_alignment(void);
 unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
 						unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e8985a..2ae7dbc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4585,6 +4585,34 @@ void __init sort_node_map(void)
 			cmp_node_active_region, NULL);
 }
 
+unsigned long __init node_map_pfn_alignment(void)
+{
+	unsigned long accl_mask = 0, last_end = 0;
+	int last_nid = -1;
+	int i;
+
+	for_each_active_range_index_in_nid(i, MAX_NUMNODES) {
+		int nid = early_node_map[i].nid;
+		unsigned long start = early_node_map[i].start_pfn;
+		unsigned long end = early_node_map[i].end_pfn;
+		unsigned long mask;
+
+		if (!start || last_nid < 0 || last_nid == nid) {
+			last_nid = nid;
+			last_end = end;
+			continue;
+		}
+
+		mask = ~((1 << __ffs(start)) - 1);
+		while (mask && last_end <= (start & (mask << 1)))
+			mask <<= 1;
+
+		accl_mask |= mask;
+	}
+
+	return ~accl_mask + 1;
+}
+
 /* Find the lowest pfn for a node */
 static unsigned long __init find_min_pfn_for_node(int nid)
 {
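
For illustration only, here is a userspace sketch of what node_map_pfn_alignment() above computes. It is not part of the patch. struct node_range, sample_map, lowest_bit() and pfn_alignment() are hypothetical stand-ins (for an early_node_map[] entry, sample data, __ffs() and the new function), and the sample ranges are made up for the example. __builtin_ctzl() is a GCC/Clang builtin, and the sketch uses 1UL in the shift so it also behaves on 64-bit hosts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one early_node_map[] entry. */
struct node_range {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Made-up sample layout, sorted by start_pfn as sort_node_map() would leave it. */
static const struct node_range sample_map[] = {
	{ 0, 0x000010, 0x000087 },
	{ 0, 0x000100, 0x0c7e70 },
	{ 0, 0x100000, 0x238000 },
	{ 1, 0x238000, 0x638000 },
	{ 2, 0x638000, 0x838000 },
};

/* Index of the lowest set bit; stands in for the kernel's __ffs(). */
static unsigned long lowest_bit(unsigned long x)
{
	assert(x != 0);	/* the builtin is undefined for 0 */
	return (unsigned long)__builtin_ctzl(x);
}

/* Same control flow as the patch, with the range table passed in. */
static unsigned long pfn_alignment(const struct node_range *map, size_t n)
{
	unsigned long accl_mask = 0, last_end = 0;
	int last_nid = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		int nid = map[i].nid;
		unsigned long start = map[i].start_pfn;
		unsigned long end = map[i].end_pfn;
		unsigned long mask;

		/* Not a node boundary: just remember where we are. */
		if (!start || last_nid < 0 || last_nid == nid) {
			last_nid = nid;
			last_end = end;
			continue;
		}

		/* Finest mask that still pins down start, then coarsen it
		 * while it keeps this node apart from the remembered end. */
		mask = ~((1UL << lowest_bit(start)) - 1);
		while (mask && last_end <= (start & (mask << 1)))
			mask <<= 1;

		accl_mask |= mask;
	}

	/* Convert the accumulated mask to a number of pages. */
	return ~accl_mask + 1;
}
```

With the sample ranges, pfn_alignment(sample_map, 5) yields 0x8000 pfns, i.e. 128MB with 4KB pages, because the node 0/1 boundary at pfn 0x238000 has bit 15 as its lowest set bit. A map that never changes node yields 0.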


* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-29 16:15               ` Tejun Heo
@ 2011-06-30 13:13                 ` Hans Rosenfeld
  2011-06-30 15:55                   ` Tejun Heo
  0 siblings, 1 reply; 28+ messages in thread
From: Hans Rosenfeld @ 2011-06-30 13:13 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Seidel, Conny, x86, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1207 bytes --]

On Wed, Jun 29, 2011 at 12:15:17PM -0400, Tejun Heo wrote:
> Hans, can you please apply the following patch and post the boot log
> from both SPARSEMEM and DISCONTIGMEM kernels?  On SPARSEMEM, it should
> reject NUMA config and boot w/ flatmem.

Bootlogs are attached. Now DISCONTIGMEM panics.

> diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
> index 224e8c5..0b6c75b 100644
> --- a/arch/x86/include/asm/mmzone_32.h
> +++ b/arch/x86/include/asm/mmzone_32.h
> @@ -34,15 +34,15 @@ static inline void resume_map_numa_kva(pgd_t *pgd) {}
>   *    64Gb / 4096bytes/page = 16777216 pages
>   */
>  #define MAX_NR_PAGES 16777216
> -#define MAX_ELEMENTS 1024
> -#define PAGES_PER_ELEMENT (MAX_NR_PAGES/MAX_ELEMENTS)
> +#define MAX_SECTIONS 1024
> +#define PAGES_PER_SECTION (MAX_NR_PAGES/MAX_SECTIONS)
>  
>  extern s8 physnode_map[];
>  
>  static inline int pfn_to_nid(unsigned long pfn)
>  {
>  #ifdef CONFIG_NUMA
> -	return((int) physnode_map[(pfn) / PAGES_PER_ELEMENT]);
> +	return((int) physnode_map[(pfn) / PAGES_PER_SECTIONS]);

This probably should be PAGES_PER_SECTION.

>  #else
>  	return 0;
>  #endif
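
As a side note on the arithmetic in the quoted hunk, the following userspace sketch shows how the physnode_map lookup works once the macro name is spelled consistently. It is only an illustration under assumptions: the array size, the int8_t type (in place of s8), and the mark_range() helper (which mirrors the loop in memory_present()) are not taken from the kernel tree.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NR_PAGES      16777216UL
#define MAX_SECTIONS      1024UL
/* 16384 pfns per entry, i.e. 64MB with 4KB pages */
#define PAGES_PER_SECTION (MAX_NR_PAGES / MAX_SECTIONS)

/* Assumed size: one entry per section. */
static int8_t physnode_map[MAX_SECTIONS];

/* Hypothetical helper mirroring the loop in memory_present(). */
static void mark_range(int nid, unsigned long start, unsigned long end)
{
	unsigned long pfn;

	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
		physnode_map[pfn / PAGES_PER_SECTION] = (int8_t)nid;
}

/* Lookup as in the quoted pfn_to_nid(), using PAGES_PER_SECTION. */
static int pfn_to_nid(unsigned long pfn)
{
	assert(pfn < MAX_NR_PAGES);
	return (int)physnode_map[pfn / PAGES_PER_SECTION];
}
```

After mark_range(0, 0x100000, 0x238000) and mark_range(1, 0x238000, 0x638000), pfn 0x237fff maps to node 0 and pfn 0x238000 to node 1. That works because 0x238000 is a multiple of PAGES_PER_SECTION (0x4000), so a node boundary with 128MB alignment never splits a 64MB entry.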

-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

[-- Attachment #2: bootlog_discontigmem --]
[-- Type: text/plain, Size: 19666 bytes --]

kernel /boot/vmlinuz.panic root=/dev/sda1 console=ttyS0,115200 console=tty0 ignore_loglevel earlyprintk=ttyS0,115200 debug
   [Linux-bzImage, setup=0x3200, size=0x34ced0]

early console in setup code
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[    0.000000] Linux version 2.6.39-rc5-00181-g2706a0b-dirty (root@worms) (gcc version 4.5.2 (Gentoo 4.5.2 p1.1, pie-0.4.5) ) #26 SMP Thu Jun 30 14:39:05 CEST 2011
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 0000000000087800 (usable)
[    0.000000]  BIOS-e820: 0000000000087800 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000cc000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000c7e70000 (usable)
[    0.000000]  BIOS-e820: 00000000c7e70000 - 00000000c7e8c000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000c7e8c000 - 00000000c7e8e000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000c7e8e000 - 00000000c8000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000001838000000 (usable)
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] bootconsole [earlyser0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI present.
[    0.000000] DMI: AMD DRACHMA/DRACHMA, BIOS PDPAX1-6 12/15/2009
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] last_pfn = 0x1000000 max_arch_pfn = 0x1000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-D7FFF write-protect
[    0.000000]   D8000-DFFFF uncachable
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000000 mask FFFF80000000 write-back
[    0.000000]   1 base 000080000000 mask FFFFC0000000 write-back
[    0.000000]   2 base 0000C0000000 mask FFFFF8000000 write-back
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] TOM2: 0000001838000000 aka 99200M
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] e820 update range: 00000000c8000000 - 0000000100000000 (usable) ==> (reserved)
[    0.000000] found SMP MP-table at [c00f7240] f7240
[    0.000000] initial memory mapped : 0 - 01c00000
[    0.000000] Base memory trampoline at [c0083000] 83000 size 16384
[    0.000000] init_memory_mapping: 0000000000000000-00000000375fe000
[    0.000000]  0000000000 - 0000200000 page 4k
[    0.000000]  0000200000 - 0037400000 page 2M
[    0.000000]  0037400000 - 00375fe000 page 4k
[    0.000000] kernel direct mapping tables up to 375fe000 @ 1bf8000-1c00000
[    0.000000] ACPI: RSDP 000f71c0 00024 (v02 PTLTD )
[    0.000000] ACPI: XSDT c7e78aee 00074 (v01 PTLTD  ? XSDT   06040000  LTP 00000000)
[    0.000000] ACPI: FACP c7e83b26 000F4 (v03 AMD    CHIPOTLE 06040000 AMD  000F4240)
[    0.000000] ACPI: DSDT c7e78b62 0AFC4 (v02    AMD    SB700 06040000 MSFT 03000000)
[    0.000000] ACPI: FACS c7e8dfc0 00040
[    0.000000] ACPI: TCPA c7e83c8e 00032 (v02 AMD             06040000 PTEC 00000000)
[    0.000000] ACPI: SLIT c7e83cc0 0006C (v01 AMD    F10      06040000 AMD  00000001)
[    0.000000] ACPI: SRAT c7e83d2c 004C0 (v02 AMD    F10      06040000 AMD  00000001)
[    0.000000] ACPI: IVRS c7e841ec 000E8 (v01  AMD     RD890S 06040000 AMD  00000000)
[    0.000000] ACPI: SSDT c7e842d4 078B4 (v01 AMD    POWERNOW 06040000 AMD  00000001)
[    0.000000] ACPI: SSDT c7e8bb88 0010A (v01 AMD-K8 AMD-ACPI 06040000  AMD 00000001)
[    0.000000] ACPI: APIC c7e8bc92 002FA (v01 PTLTD  ? APIC   06040000  LTP 00000000)
[    0.000000] ACPI: MCFG c7e8bf8c 0003C (v01 PTLTD    MCFG   06040000  LTP 00000000)
[    0.000000] ACPI: HPET c7e8bfc8 00038 (v01 PTLTD  HPETTBL  06040000  LTP 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] Number of physical nodes 8
[    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000238000000
[    0.000000] Node 1 MemBase 0000000238000000 Limit 0000000638000000
[    0.000000] Node 2 MemBase 0000000638000000 Limit 0000000838000000
[    0.000000] Node 3 MemBase 0000000838000000 Limit 0000000c38000000
[    0.000000] Node 4 MemBase 0000000c38000000 Limit 0000000e38000000
[    0.000000] Node 5 MemBase 0000000e38000000 Limit 0000001000000000
[    0.000000] Node 6 bogus settings 1238000000-1000000000.
[    0.000000] Node 7 bogus settings 1438000000-1000000000.
[    0.000000] BSP APIC ID: 10
[    0.000000] node 0 pfn: [0 - 238000]
[    0.000000] remap_alloc: node 0 [233800000-238000000) -> [f2c00000-f7400000)
[    0.000000] Initmem setup node 0 0000000000000000-0000000238000000
[    0.000000]   NODE_DATA [0000000032c00000 - 0000000032c01fff] (remapped)
[    0.000000] node 1 pfn: [238000 - 638000]
[    0.000000] remap_alloc: node 1 [62fe00000-638000000) -> [eaa00000-f2c00000)
[    0.000000] Initmem setup node 1 0000000238000000-0000000638000000
[    0.000000]   NODE_DATA [000000002aa00000 - 000000002aa01fff] (remapped)
[    0.000000] node 2 pfn: [638000 - 838000]
[    0.000000] remap_alloc: node 2 [833e00000-838000000) -> [e6800000-eaa00000)
[    0.000000] Initmem setup node 2 0000000638000000-0000000838000000
[    0.000000]   NODE_DATA [0000000026800000 - 0000000026801fff] (remapped)
[    0.000000] node 3 pfn: [838000 - c38000]
[    0.000000] remap_alloc: node 3 [c2fe00000-c38000000) -> [de600000-e6800000)
[    0.000000] Initmem setup node 3 0000000838000000-0000000c38000000
[    0.000000]   NODE_DATA [000000001e600000 - 000000001e601fff] (remapped)
[    0.000000] node 4 pfn: [c38000 - e38000]
[    0.000000] remap_alloc: node 4 [e33e00000-e38000000) -> [da400000-de600000)
[    0.000000] Initmem setup node 4 0000000c38000000-0000000e38000000
[    0.000000]   NODE_DATA [000000001a400000 - 000000001a401fff] (remapped)
[    0.000000] node 5 pfn: [e38000 - 1000000]
[    0.000000] remap_alloc: node 5 [ffc600000-1000000000) -> [d6a00000-da400000)
[    0.000000] Initmem setup node 5 0000000e38000000-0000001000000000
[    0.000000]   NODE_DATA [0000000016a00000 - 0000000016a01fff] (remapped)
[    0.000000] 64650MB HIGHMEM available.
[    0.000000] 885MB LOWMEM available.
[    0.000000] max_low_pfn = 375fe, highstart_pfn = 375fe
[    0.000000] Low memory ends at vaddr f75fe000
[    0.000000] High memory starts at vaddr f75fe000
[    0.000000]   mapped low ram: 0 - 375fe000
[    0.000000]   low ram: 0 - 375fe000
[    0.000000] Node: 0, start_pfn: 10, end_pfn: 87
[    0.000000]   Setting physnode_map array to node 0 for pfns:
[    0.000000]   10
[    0.000000] Node: 0, start_pfn: 100, end_pfn: c7e70
[    0.000000]   Setting physnode_map array to node 0 for pfns:
[    0.000000]   100 4100 8100 c100 10100 14100 18100 1c100 20100 24100 28100 2c100 30100 34100 38100 3c100 40100 44100 48100 4c100 50100 54100 58100 5c100 60100 64100 68100 6c100 70100 74100 78100 7c100 80100 84100 88100 8c100 90100 94100 98100 9c100 a0100 a4100 a8100 ac100 b0100 b4100 b8100 bc100 c0100 c4100
[    0.000000] Node: 0, start_pfn: 100000, end_pfn: 238000
[    0.000000]   Setting physnode_map array to node 0 for pfns:
[    0.000000]   100000 104000 108000 10c000 110000 114000 118000 11c000 120000 124000 128000 12c000 130000 134000 138000 13c000 140000 144000 148000 14c000 150000 154000 158000 15c000 160000 164000 168000 16c000 170000 174000 178000 17c000 180000 184000 188000 18c000 190000 194000 198000 19c000 1a0000 1a4000 1a8000 1ac000 1b0000 1b4000 1b8000 1bc000 1c0000 1c4000 1c8000 1cc000 1d0000 1d4000 1d8000 1dc000 1e0000 1e4000 1e8000 1ec000 1f0000 1f4000 1f8000 1fc000 200000 204000 208000 20c000 210000 214000 218000 21c000 220000 224000 228000 22c000 230000 234000
[    0.000000] Node: 1, start_pfn: 238000, end_pfn: 638000
[    0.000000]   Setting physnode_map array to node 1 for pfns:
[    0.000000]   238000 23c000 240000 244000 248000 24c000 250000 254000 258000 25c000 260000 264000 268000 26c000 270000 274000 278000 27c000 280000 284000 288000 28c000 290000 294000 298000 29c000 2a0000 2a4000 2a8000 2ac000 2b0000 2b4000 2b8000 2bc000 2c0000 2c4000 2c8000 2cc000 2d0000 2d4000 2d8000 2dc000 2e0000 2e4000 2e8000 2ec000 2f0000 2f4000 2f8000 2fc000 300000 304000 308000 30c000 310000 314000 318000 31c000 320000 324000 328000 32c000 330000 334000 338000 33c000 340000 344000 348000 34c000 350000 354000 358000 35c000 360000 364000 368000 36c000 370000 374000 378000 37c000 380000 384000 388000 38c000 390000 394000 398000 39c000 3a0000 3a4000 3a8000 3ac000 3b0000 3b4000 3b8000 3bc000 3c0000 3c4000 3c8000 3cc000 3d0000 3d4000 3d8000 3dc000 3e0000 3e4000 3e8000 3ec000 3f0000 3f4000 3f8000 3fc000 400000 404000 408000 40c000 410000 414000 418000 41c000 420000 424000 428000 42c000 430000 434000 438000 43c000 440000 444000 448000 44c000 450000 454000 458000 45c000 460000 464000 468000 46c000 470000 474000 478000 47c000 480000 484000 488000 48c000 490000 494000 498000 49c000 4a0000 4a4000 4a8000 4ac000 4b0000 4b4000 4b8000 4bc000 4c0000 4c4000 4c8000 4cc000 4d0000 4d4000 4d8000 4dc000 4e0000 4e4000 4e8000 4ec000 4f0000 4f4000 4f8000 4fc000 500000 504000 508000 50c000 510000 514000 518000 51c000 520000 524000 528000 52c000 530000 534000 538000 53c000 540000 544000 548000 54c000 550000 554000 558000 55c000 560000 564000 568000 56c000 570000 574000 578000 57c000 580000 584000 588000 58c000 590000 594000 598000 59c000 5a0000 5a4000 5a8000 5ac000 5b0000 5b4000 5b8000 5bc000 5c0000 5c4000 5c8000 5cc000 5d0000 5d4000 5d8000 5dc000 5e0000 5e4000 5e8000 5ec000 5f0000 5f4000 5f8000 5fc000 600000 604000 608000 60c000 610000 614000 618000 61c000 620000 624000 628000 62c000 630000 634000
[    0.000000] Node: 2, start_pfn: 638000, end_pfn: 838000
[    0.000000]   Setting physnode_map array to node 2 for pfns:
[    0.000000]   638000 63c000 640000 644000 648000 64c000 650000 654000 658000 65c000 660000 664000 668000 66c000 670000 674000 678000 67c000 680000 684000 688000 68c000 690000 694000 698000 69c000 6a0000 6a4000 6a8000 6ac000 6b0000 6b4000 6b8000 6bc000 6c0000 6c4000 6c8000 6cc000 6d0000 6d4000 6d8000 6dc000 6e0000 6e4000 6e8000 6ec000 6f0000 6f4000 6f8000 6fc000 700000 704000 708000 70c000 710000 714000 718000 71c000 720000 724000 728000 72c000 730000 734000 738000 73c000 740000 744000 748000 74c000 750000 754000 758000 75c000 760000 764000 768000 76c000 770000 774000 778000 77c000 780000 784000 788000 78c000 790000 794000 798000 79c000 7a0000 7a4000 7a8000 7ac000 7b0000 7b4000 7b8000 7bc000 7c0000 7c4000 7c8000 7cc000 7d0000 7d4000 7d8000 7dc000 7e0000 7e4000 7e8000 7ec000 7f0000 7f4000 7f8000 7fc000 800000 804000 808000 80c000 810000 814000 818000 81c000 820000 824000 828000 82c000 830000 834000
[    0.000000] Node: 3, start_pfn: 838000, end_pfn: c38000
[    0.000000]   Setting physnode_map array to node 3 for pfns:
[    0.000000]   838000 83c000 840000 844000 848000 84c000 850000 854000 858000 85c000 860000 864000 868000 86c000 870000 874000 878000 87c000 880000 884000 888000 88c000 890000 894000 898000 89c000 8a0000 8a4000 8a8000 8ac000 8b0000 8b4000 8b8000 8bc000 8c0000 8c4000 8c8000 8cc000 8d0000 8d4000 8d8000 8dc000 8e0000 8e4000 8e8000 8ec000 8f0000 8f4000 8f8000 8fc000 900000 904000 908000 90c000 910000 914000 918000 91c000 920000 924000 928000 92c000 930000 934000 938000 93c000 940000 944000 948000 94c000 950000 954000 958000 95c000 960000 964000 968000 96c000 970000 974000 978000 97c000 980000 984000 988000 98c000 990000 994000 998000 99c000 9a0000 9a4000 9a8000 9ac000 9b0000 9b4000 9b8000 9bc000 9c0000 9c4000 9c8000 9cc000 9d0000 9d4000 9d8000 9dc000 9e0000 9e4000 9e8000 9ec000 9f0000 9f4000 9f8000 9fc000 a00000 a04000 a08000 a0c000 a10000 a14000 a18000 a1c000 a20000 a24000 a28000 a2c000 a30000 a34000 a38000 a3c000 a40000 a44000 a48000 a4c000 a50000 a54000 a58000 a5c000 a60000 a64000 a68000 a6c000 a70000 a74000 a78000 a7c000 a80000 a84000 a88000 a8c000 a90000 a94000 a98000 a9c000 aa0000 aa4000 aa8000 aac000 ab0000 ab4000 ab8000 abc000 ac0000 ac4000 ac8000 acc000 ad0000 ad4000 ad8000 adc000 ae0000 ae4000 ae8000 aec000 af0000 af4000 af8000 afc000 b00000 b04000 b08000 b0c000 b10000 b14000 b18000 b1c000 b20000 b24000 b28000 b2c000 b30000 b34000 b38000 b3c000 b40000 b44000 b48000 b4c000 b50000 b54000 b58000 b5c000 b60000 b64000 b68000 b6c000 b70000 b74000 b78000 b7c000 b80000 b84000 b88000 b8c000 b90000 b94000 b98000 b9c000 ba0000 ba4000 ba8000 bac000 bb0000 bb4000 bb8000 bbc000 bc0000 bc4000 bc8000 bcc000 bd0000 bd4000 bd8000 bdc000 be0000 be4000 be8000 bec000 bf0000 bf4000 bf8000 bfc000 c00000 c04000 c08000 c0c000 c10000 c14000 c18000 c1c000 c20000 c24000 c28000 c2c000 c30000 c34000
[    0.000000] Node: 4, start_pfn: c38000, end_pfn: e38000
[    0.000000]   Setting physnode_map array to node 4 for pfns:
[    0.000000]   c38000 c3c000 c40000 c44000 c48000 c4c000 c50000 c54000 c58000 c5c000 c60000 c64000 c68000 c6c000 c70000 c74000 c78000 c7c000 c80000 c84000 c88000 c8c000 c90000 c94000 c98000 c9c000 ca0000 ca4000 ca8000 cac000 cb0000 cb4000 cb8000 cbc000 cc0000 cc4000 cc8000 ccc000 cd0000 cd4000 cd8000 cdc000 ce0000 ce4000 ce8000 cec000 cf0000 cf4000 cf8000 cfc000 d00000 d04000 d08000 d0c000 d10000 d14000 d18000 d1c000 d20000 d24000 d28000 d2c000 d30000 d34000 d38000 d3c000 d40000 d44000 d48000 d4c000 d50000 d54000 d58000 d5c000 d60000 d64000 d68000 d6c000 d70000 d74000 d78000 d7c000 d80000 d84000 d88000 d8c000 d90000 d94000 d98000 d9c000 da0000 da4000 da8000 dac000 db0000 db4000 db8000 dbc000 dc0000 dc4000 dc8000 dcc000 dd0000 dd4000 dd8000 ddc000 de0000 de4000 de8000 dec000 df0000 df4000 df8000 dfc000 e00000 e04000 e08000 e0c000 e10000 e14000 e18000 e1c000 e20000 e24000 e28000 e2c000 e30000 e34000
[    0.000000] Node: 5, start_pfn: e38000, end_pfn: 1000000
[    0.000000]   Setting physnode_map array to node 5 for pfns:
[    0.000000]   e38000 e3c000 e40000 e44000 e48000 e4c000 e50000 e54000 e58000 e5c000 e60000 e64000 e68000 e6c000 e70000 e74000 e78000 e7c000 e80000 e84000 e88000 e8c000 e90000 e94000 e98000 e9c000 ea0000 ea4000 ea8000 eac000 eb0000 eb4000 eb8000 ebc000 ec0000 ec4000 ec8000 ecc000 ed0000 ed4000 ed8000 edc000 ee0000 ee4000 ee8000 eec000 ef0000 ef4000 ef8000 efc000 f00000 f04000 f08000 f0c000 f10000 f14000 f18000 f1c000 f20000 f24000 f28000 f2c000 f30000 f34000 f38000 f3c000 f40000 f44000 f48000 f4c000 f50000 f54000 f58000 f5c000 f60000 f64000 f68000 f6c000 f70000 f74000 f78000 f7c000 f80000 f84000 f88000 f8c000 f90000 f94000 f98000 f9c000 fa0000 fa4000 fa8000 fac000 fb0000 fb4000 fb8000 fbc000 fc0000 fc4000 fc8000 fcc000 fd0000 fd4000 fd8000 fdc000 fe0000 fe4000 fe8000 fec000 ff0000 ff4000 ff8000 ffc000
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   Normal   0x00001000 -> 0x000375fe
[    0.000000]   HighMem  0x000375fe -> 0x01000000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x00000087
[    0.000000]     0: 0x00000100 -> 0x000c7e70
[    0.000000]     0: 0x00100000 -> 0x00238000
[    0.000000]     1: 0x00238000 -> 0x00638000
[    0.000000]     2: 0x00638000 -> 0x00838000
[    0.000000]     3: 0x00838000 -> 0x00c38000
[    0.000000]     4: 0x00c38000 -> 0x00e38000
[    0.000000]     5: 0x00e38000 -> 0x01000000
[    0.000000] On node 0 totalpages: 2096615
[    0.000000] free_area_init_node: node 0, pgdat f2c00000, node_mem_map f2c02200
[    0.000000]   DMA zone: 32 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 3927 pages, LIFO batch:0
[    0.000000]   Normal zone: 1740 pages used for memmap
[    0.000000]   Normal zone: 220978 pages, LIFO batch:31
[    0.000000]   HighMem zone: 16405 pages used for memmap
[    0.000000]   HighMem zone: 1853533 pages, LIFO batch:31
[    0.000000] BUG: unable to handle kernel paging request at 000716b6
[    0.000000] IP: [<c1620635>] memmap_init_zone+0x66/0xe9
[    0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00
[    0.000000] Oops: 0000 [#1] SMP
[    0.000000] last sysfs file:
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b-dirty #26 AMD DRACHMA/DRACHMA
[    0.000000] EIP: 0060:[<c1620635>] EFLAGS: 00010012 CPU: 0
[    0.000000] EIP is at memmap_init_zone+0x66/0xe9
[    0.000000] EAX: 00070406 EBX: 000c8000 ECX: 00000000 EDX: 00238000
[    0.000000] ESI: 000c8000 EDI: f2c00800 EBP: 00000000 ESP: c1543eec
[    0.000000]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    0.000000] Process swapper (pid: 0, ti=c1542000 task=c158ff20 task.ti=c1542000)
[    0.000000] Stack:
[    0.000000]  10000000 00000002 00238000 00000000 f2c00000 00000002 f2c00a00 f2c00a80
[    0.000000]  c1607e54 000375fe 00000000 f2c00b80 00000004 00000001 00200a02 f2c00800
[    0.000000]  00000000 000375fe 00000002 00000000 f2c00000 00000008 c14b2f24 c160827f
[    0.000000] Call Trace:
[    0.000000]  [<c1607e54>] ? free_area_init_node+0x327/0x351
[    0.000000]  [<c160827f>] ? free_area_init_nodes+0x3f2/0x451
[    0.000000]  [<c10205f8>] ? memory_present+0x5c/0x61
[    0.000000]  [<c1601d7b>] ? paging_init+0x10d/0x113
[    0.000000]  [<c15f578d>] ? setup_arch+0x791/0x82f
[    0.000000]  [<c15f43d9>] ? start_kernel+0x6a/0x257
[    0.000000] Code: 1d c1 e7 0a 89 44 24 0c 03 3c ad 08 1e 5f c1 e9 88 00 00 00 89 f0 89 f3 c1 e8 0e 89 e9 0f be 80 28 1e 5f c1 8b 04 85 08 1e 5f c1 <2b> 98 b0 12 00 00 c1 e3 05 03 98 ac 12 00 00 8b 03 25 ff ff ff
[    0.000000] EIP: [<c1620635>] memmap_init_zone+0x66/0xe9 SS:ESP 0068:c1543eec
[    0.000000] CR2: 00000000000716b6
[    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Pid: 0, comm: swapper Tainted: G      D     2.6.39-rc5-00181-g2706a0b-dirty #26
[    0.000000] Call Trace:
[    0.000000]  [<c137416c>] ? panic+0x4d/0x132
[    0.000000]  [<c102d17c>] ? do_exit+0x88/0x5f0
[    0.000000]  [<c102b854>] ? kmsg_dump+0x34/0xb4
[    0.000000]  [<c100490f>] ? oops_end+0x8f/0x93
[    0.000000]  [<c101beb5>] ? no_context+0x13d/0x147
[    0.000000]  [<c101c14b>] ? vmalloc_sync_all+0x1/0x1
[    0.000000]  [<c101bffc>] ? bad_area_nosemaphore+0xa/0xc
[    0.000000]  [<c101c297>] ? do_page_fault+0x14c/0x329
[    0.000000]  [<c16099f9>] ? __alloc_memory_core_early+0x88/0x95
[    0.000000]  [<c101c14b>] ? vmalloc_sync_all+0x1/0x1
[    0.000000]  [<c1376b2a>] ? error_code+0x5a/0x60
[    0.000000]  [<c101c14b>] ? vmalloc_sync_all+0x1/0x1
[    0.000000]  [<c1620635>] ? memmap_init_zone+0x66/0xe9
[    0.000000]  [<c1607e54>] ? free_area_init_node+0x327/0x351
[    0.000000]  [<c160827f>] ? free_area_init_nodes+0x3f2/0x451
[    0.000000]  [<c10205f8>] ? memory_present+0x5c/0x61
[    0.000000]  [<c1601d7b>] ? paging_init+0x10d/0x113
[    0.000000]  [<c15f578d>] ? setup_arch+0x791/0x82f
[    0.000000]  [<c15f43d9>] ? start_kernel+0x6a/0x257

[-- Attachment #3: bootlog_sparsemem --]
[-- Type: text/plain, Size: 6759 bytes --]

kernel /boot/vmlinuz.panic root=/dev/sda1 console=ttyS0,115200 console=tty0 ignore_loglevel earlyprintk=ttyS0,115200 debug
   [Linux-bzImage, setup=0x3200, size=0x34ce70]

early console in setup code
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[    0.000000] Linux version 2.6.39-rc5-00181-g2706a0b-dirty (root@worms) (gcc version 4.5.2 (Gentoo 4.5.2 p1.1, pie-0.4.5) ) #27 SMP Thu Jun 30 15:00:39 CEST 2011
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 0000000000087800 (usable)
[    0.000000]  BIOS-e820: 0000000000087800 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000cc000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000c7e70000 (usable)
[    0.000000]  BIOS-e820: 00000000c7e70000 - 00000000c7e8c000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000c7e8c000 - 00000000c7e8e000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000c7e8e000 - 00000000c8000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000001838000000 (usable)
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] bootconsole [earlyser0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI present.
[    0.000000] DMI: AMD DRACHMA/DRACHMA, BIOS PDPAX1-6 12/15/2009
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] last_pfn = 0x1000000 max_arch_pfn = 0x1000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-D7FFF write-protect
[    0.000000]   D8000-DFFFF uncachable
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000000 mask FFFF80000000 write-back
[    0.000000]   1 base 000080000000 mask FFFFC0000000 write-back
[    0.000000]   2 base 0000C0000000 mask FFFFF8000000 write-back
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] TOM2: 0000001838000000 aka 99200M
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] e820 update range: 00000000c8000000 - 0000000100000000 (usable) ==> (reserved)
[    0.000000] found SMP MP-table at [c00f7240] f7240
[    0.000000] initial memory mapped : 0 - 01c00000
[    0.000000] Base memory trampoline at [c0083000] 83000 size 16384
[    0.000000] init_memory_mapping: 0000000000000000-00000000375fe000
[    0.000000]  0000000000 - 0000200000 page 4k
[    0.000000]  0000200000 - 0037400000 page 2M
[    0.000000]  0037400000 - 00375fe000 page 4k
[    0.000000] kernel direct mapping tables up to 375fe000 @ 1bf8000-1c00000
[    0.000000] ACPI: RSDP 000f71c0 00024 (v02 PTLTD )
[    0.000000] ACPI: XSDT c7e78aee 00074 (v01 PTLTD  ? XSDT   06040000  LTP 00000000)
[    0.000000] ACPI: FACP c7e83b26 000F4 (v03 AMD    CHIPOTLE 06040000 AMD  000F4240)
[    0.000000] ACPI: DSDT c7e78b62 0AFC4 (v02    AMD    SB700 06040000 MSFT 03000000)
[    0.000000] ACPI: FACS c7e8dfc0 00040
[    0.000000] ACPI: TCPA c7e83c8e 00032 (v02 AMD             06040000 PTEC 00000000)
[    0.000000] ACPI: SLIT c7e83cc0 0006C (v01 AMD    F10      06040000 AMD  00000001)
[    0.000000] ACPI: SRAT c7e83d2c 004C0 (v02 AMD    F10      06040000 AMD  00000001)
[    0.000000] ACPI: IVRS c7e841ec 000E8 (v01  AMD     RD890S 06040000 AMD  00000000)
[    0.000000] ACPI: SSDT c7e842d4 078B4 (v01 AMD    POWERNOW 06040000 AMD  00000001)
[    0.000000] ACPI: SSDT c7e8bb88 0010A (v01 AMD-K8 AMD-ACPI 06040000  AMD 00000001)
[    0.000000] ACPI: APIC c7e8bc92 002FA (v01 PTLTD  ? APIC   06040000  LTP 00000000)
[    0.000000] ACPI: MCFG c7e8bf8c 0003C (v01 PTLTD    MCFG   06040000  LTP 00000000)
[    0.000000] ACPI: HPET c7e8bfc8 00038 (v01 PTLTD  HPETTBL  06040000  LTP 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] Number of physical nodes 8
[    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000238000000
[    0.000000] Node 1 MemBase 0000000238000000 Limit 0000000638000000
[    0.000000] Node 2 MemBase 0000000638000000 Limit 0000000838000000
[    0.000000] Node 3 MemBase 0000000838000000 Limit 0000000c38000000
[    0.000000] Node 4 MemBase 0000000c38000000 Limit 0000000e38000000
[    0.000000] Node 5 MemBase 0000000e38000000 Limit 0000001000000000
[    0.000000] Node 6 bogus settings 1238000000-1000000000.
[    0.000000] Node 7 bogus settings 1438000000-1000000000.
[    0.000000] BSP APIC ID: 10
[    0.000000] Node alignment 128MB < min 512MB, rejecting NUMA config
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-0000001000000000
[    0.000000] node 0 pfn: [0 - 1000000]
[    0.000000] remap_alloc: node 0 [fffe00000-1000000000) -> [f7200000-f7400000)
[    0.000000] Initmem setup node 0 0000000000000000-0000001000000000
[    0.000000]   NODE_DATA [0000000037200000 - 0000000037201fff] (remapped)
[    0.000000] 64650MB HIGHMEM available.
[    0.000000] 885MB LOWMEM available.
[    0.000000] max_low_pfn = 375fe, highstart_pfn = 375fe
[    0.000000] Low memory ends at vaddr f75fe000
[    0.000000] High memory starts at vaddr f75fe000
[    0.000000]   mapped low ram: 0 - 375fe000
[    0.000000]   low ram: 0 - 375fe000
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   Normal   0x00001000 -> 0x000375fe
[    0.000000]   HighMem  0x000375fe -> 0x01000000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[3] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x00000087
[    0.000000]     0: 0x00000100 -> 0x000c7e70
[    0.000000]     0: 0x00100000 -> 0x01000000
[    0.000000] On node 0 totalpages: 16547303
[    0.000000]   DMA zone: 32 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 3927 pages, LIFO batch:0
[    0.000000]   Normal zone: 1740 pages used for memmap
[    0.000000]   Normal zone: 220978 pages, LIFO batch:31
[    0.000000]   HighMem zone: 129301 pages used for memmap
[    0.000000]   HighMem zone: 16191325 pages, LIFO batch:31
[    0.000000] Using APIC driver default


* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-30 13:13                 ` Hans Rosenfeld
@ 2011-06-30 15:55                   ` Tejun Heo
  2011-06-30 16:32                     ` Hans Rosenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2011-06-30 15:55 UTC (permalink / raw)
  To: Hans Rosenfeld; +Cc: Seidel, Conny, x86, linux-kernel

Hello,

On Thu, Jun 30, 2011 at 03:13:38PM +0200, Hans Rosenfeld wrote:
> On Wed, Jun 29, 2011 at 12:15:17PM -0400, Tejun Heo wrote:
> > Hans, can you please apply the following patch and post the boot log
> > from both SPARSEMEM and DISCONTIGMEM kernels?  On SPARSEMEM, it should
> > reject NUMA config and boot w/ flatmem.
> 
> Bootlogs are attached. Now DISCONTIGMEM panics.
...
> [    0.000000] Linux version 2.6.39-rc5-00181-g2706a0b-dirty (root@worms) (gcc version 4.5.2 (Gentoo 4.5.2 p1.1, pie-0.4.5) ) #26 SMP Thu Jun 30 14:39:05 CEST 2011

Hmmm... it looks like the kernel is crashing from the other bug in
this thread.  Can you please apply both patches on top of 3.0-rc5 and
re-test?

Thank you.

-- 
tejun


* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-30 15:55                   ` Tejun Heo
@ 2011-06-30 16:32                     ` Hans Rosenfeld
  2011-06-30 16:42                       ` Tejun Heo
  0 siblings, 1 reply; 28+ messages in thread
From: Hans Rosenfeld @ 2011-06-30 16:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Seidel, Conny, x86, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1038 bytes --]

On Thu, Jun 30, 2011 at 11:55:57AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jun 30, 2011 at 03:13:38PM +0200, Hans Rosenfeld wrote:
> > On Wed, Jun 29, 2011 at 12:15:17PM -0400, Tejun Heo wrote:
> > > Hans, can you please apply the following patch and post the boot log
> > > from both SPARSEMEM and DISCONTIGMEM kernels?  On SPARSEMEM, it should
> > > reject NUMA config and boot w/ flatmem.
> > 
> > Bootlogs are attached. Now DISCONTIGMEM panics.
> ...
> > [    0.000000] Linux version 2.6.39-rc5-00181-g2706a0b-dirty (root@worms) (gcc version 4.5.2 (Gentoo 4.5.2 p1.1, pie-0.4.5) ) #26 SMP Thu Jun 30 14:39:05 CEST 2011
> 
> Hmmm... it looks like the kernel is crashing from the other bug in
> this thread.  Can you please apply both patches on top of 3.0-rc5 and
> re-test?

Oh, that's why it looked so familiar :)

I wasn't able to reproduce this panic on this machine earlier without
DISCONTIGMEM.

It works now with both patches, bootlog is attached.


Hans


-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

[-- Attachment #2: bootlog_discontigmem --]
[-- Type: text/plain, Size: 17790 bytes --]

kernel /boot/vmlinuz.panic root=/dev/sda1 console=ttyS0,115200 console=tty0 ignore_loglevel earlyprintk=ttyS0,115200 debug
   [Linux-bzImage, setup=0x3200, size=0x34ced0]

early console in setup code
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[    0.000000] Linux version 2.6.39-rc5-00181-g2706a0b-dirty (root@worms) (gcc version 4.5.2 (Gentoo 4.5.2 p1.1, pie-0.4.5) ) #37 SMP Thu Jun 30 18:18:48 CEST 2011
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 0000000000087800 (usable)
[    0.000000]  BIOS-e820: 0000000000087800 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000cc000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000c7e70000 (usable)
[    0.000000]  BIOS-e820: 00000000c7e70000 - 00000000c7e8c000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000c7e8c000 - 00000000c7e8e000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000c7e8e000 - 00000000c8000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000001838000000 (usable)
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] bootconsole [earlyser0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI present.
[    0.000000] DMI: AMD DRACHMA/DRACHMA, BIOS PDPAX1-6 12/15/2009
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] last_pfn = 0x1000000 max_arch_pfn = 0x1000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-D7FFF write-protect
[    0.000000]   D8000-DFFFF uncachable
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000000 mask FFFF80000000 write-back
[    0.000000]   1 base 000080000000 mask FFFFC0000000 write-back
[    0.000000]   2 base 0000C0000000 mask FFFFF8000000 write-back
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] TOM2: 0000001838000000 aka 99200M
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] e820 update range: 00000000c8000000 - 0000000100000000 (usable) ==> (reserved)
[    0.000000] found SMP MP-table at [c00f7240] f7240
[    0.000000] initial memory mapped : 0 - 01c00000
[    0.000000] Base memory trampoline at [c0083000] 83000 size 16384
[    0.000000] init_memory_mapping: 0000000000000000-00000000375fe000
[    0.000000]  0000000000 - 0000200000 page 4k
[    0.000000]  0000200000 - 0037400000 page 2M
[    0.000000]  0037400000 - 00375fe000 page 4k
[    0.000000] kernel direct mapping tables up to 375fe000 @ 1bf8000-1c00000
[    0.000000] ACPI: RSDP 000f71c0 00024 (v02 PTLTD )
[    0.000000] ACPI: XSDT c7e78aee 00074 (v01 PTLTD  ? XSDT   06040000  LTP 00000000)
[    0.000000] ACPI: FACP c7e83b26 000F4 (v03 AMD    CHIPOTLE 06040000 AMD  000F4240)
[    0.000000] ACPI: DSDT c7e78b62 0AFC4 (v02    AMD    SB700 06040000 MSFT 03000000)
[    0.000000] ACPI: FACS c7e8dfc0 00040
[    0.000000] ACPI: TCPA c7e83c8e 00032 (v02 AMD             06040000 PTEC 00000000)
[    0.000000] ACPI: SLIT c7e83cc0 0006C (v01 AMD    F10      06040000 AMD  00000001)
[    0.000000] ACPI: SRAT c7e83d2c 004C0 (v02 AMD    F10      06040000 AMD  00000001)
[    0.000000] ACPI: IVRS c7e841ec 000E8 (v01  AMD     RD890S 06040000 AMD  00000000)
[    0.000000] ACPI: SSDT c7e842d4 078B4 (v01 AMD    POWERNOW 06040000 AMD  00000001)
[    0.000000] ACPI: SSDT c7e8bb88 0010A (v01 AMD-K8 AMD-ACPI 06040000  AMD 00000001)
[    0.000000] ACPI: APIC c7e8bc92 002FA (v01 PTLTD  ? APIC   06040000  LTP 00000000)
[    0.000000] ACPI: MCFG c7e8bf8c 0003C (v01 PTLTD    MCFG   06040000  LTP 00000000)
[    0.000000] ACPI: HPET c7e8bfc8 00038 (v01 PTLTD  HPETTBL  06040000  LTP 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] Number of physical nodes 8
[    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000238000000
[    0.000000] Node 1 MemBase 0000000238000000 Limit 0000000638000000
[    0.000000] Node 2 MemBase 0000000638000000 Limit 0000000838000000
[    0.000000] Node 3 MemBase 0000000838000000 Limit 0000000c38000000
[    0.000000] Node 4 MemBase 0000000c38000000 Limit 0000000e38000000
[    0.000000] Node 5 MemBase 0000000e38000000 Limit 0000001000000000
[    0.000000] Node 6 bogus settings 1238000000-1000000000.
[    0.000000] Node 7 bogus settings 1438000000-1000000000.
[    0.000000] BSP APIC ID: 10
[    0.000000] node 0 pfn: [0 - 238000]
[    0.000000] remap_alloc: node 0 [233800000-238000000) -> [f2c00000-f7400000)
[    0.000000] Initmem setup node 0 0000000000000000-0000000238000000
[    0.000000]   NODE_DATA [0000000032c00000 - 0000000032c01fff] (remapped)
[    0.000000] node 1 pfn: [238000 - 638000]
[    0.000000] remap_alloc: node 1 [62fe00000-638000000) -> [eaa00000-f2c00000)
[    0.000000] Initmem setup node 1 0000000238000000-0000000638000000
[    0.000000]   NODE_DATA [000000002aa00000 - 000000002aa01fff] (remapped)
[    0.000000] node 2 pfn: [638000 - 838000]
[    0.000000] remap_alloc: node 2 [833e00000-838000000) -> [e6800000-eaa00000)
[    0.000000] Initmem setup node 2 0000000638000000-0000000838000000
[    0.000000]   NODE_DATA [0000000026800000 - 0000000026801fff] (remapped)
[    0.000000] node 3 pfn: [838000 - c38000]
[    0.000000] remap_alloc: node 3 [c2fe00000-c38000000) -> [de600000-e6800000)
[    0.000000] Initmem setup node 3 0000000838000000-0000000c38000000
[    0.000000]   NODE_DATA [000000001e600000 - 000000001e601fff] (remapped)
[    0.000000] node 4 pfn: [c38000 - e38000]
[    0.000000] remap_alloc: node 4 [e33e00000-e38000000) -> [da400000-de600000)
[    0.000000] Initmem setup node 4 0000000c38000000-0000000e38000000
[    0.000000]   NODE_DATA [000000001a400000 - 000000001a401fff] (remapped)
[    0.000000] node 5 pfn: [e38000 - 1000000]
[    0.000000] remap_alloc: node 5 [ffc600000-1000000000) -> [d6a00000-da400000)
[    0.000000] Initmem setup node 5 0000000e38000000-0000001000000000
[    0.000000]   NODE_DATA [0000000016a00000 - 0000000016a01fff] (remapped)
[    0.000000] 64650MB HIGHMEM available.
[    0.000000] 885MB LOWMEM available.
[    0.000000] max_low_pfn = 375fe, highstart_pfn = 375fe
[    0.000000] Low memory ends at vaddr f75fe000
[    0.000000] High memory starts at vaddr f75fe000
[    0.000000]   mapped low ram: 0 - 375fe000
[    0.000000]   low ram: 0 - 375fe000
[    0.000000] Node: 0, start_pfn: 10, end_pfn: 87
[    0.000000]   Setting physnode_map array to node 0 for pfns:
[    0.000000]   10
[    0.000000] Node: 0, start_pfn: 100, end_pfn: c7e70
[    0.000000]   Setting physnode_map array to node 0 for pfns:
[    0.000000]   100 4100 8100 c100 10100 14100 18100 1c100 20100 24100 28100 2c100 30100 34100 38100 3c100 40100 44100 48100 4c100 50100 54100 58100 5c100 60100 64100 68100 6c100 70100 74100 78100 7c100 80100 84100 88100 8c100 90100 94100 98100 9c100 a0100 a4100 a8100 ac100 b0100 b4100 b8100 bc100 c0100 c4100
[    0.000000] Node: 0, start_pfn: 100000, end_pfn: 238000
[    0.000000]   Setting physnode_map array to node 0 for pfns:
[    0.000000]   100000 104000 108000 10c000 110000 114000 118000 11c000 120000 124000 128000 12c000 130000 134000 138000 13c000 140000 144000 148000 14c000 150000 154000 158000 15c000 160000 164000 168000 16c000 170000 174000 178000 17c000 180000 184000 188000 18c000 190000 194000 198000 19c000 1a0000 1a4000 1a8000 1ac000 1b0000 1b4000 1b8000 1bc000 1c0000 1c4000 1c8000 1cc000 1d0000 1d4000 1d8000 1dc000 1e0000 1e4000 1e8000 1ec000 1f0000 1f4000 1f8000 1fc000 200000 204000 208000 20c000 210000 214000 218000 21c000 220000 224000 228000 22c000 230000 234000
[    0.000000] Node: 1, start_pfn: 238000, end_pfn: 638000
[    0.000000]   Setting physnode_map array to node 1 for pfns:
[    0.000000]   238000 23c000 240000 244000 248000 24c000 250000 254000 258000 25c000 260000 264000 268000 26c000 270000 274000 278000 27c000 280000 284000 288000 28c000 290000 294000 298000 29c000 2a0000 2a4000 2a8000 2ac000 2b0000 2b4000 2b8000 2bc000 2c0000 2c4000 2c8000 2cc000 2d0000 2d4000 2d8000 2dc000 2e0000 2e4000 2e8000 2ec000 2f0000 2f4000 2f8000 2fc000 300000 304000 308000 30c000 310000 314000 318000 31c000 320000 324000 328000 32c000 330000 334000 338000 33c000 340000 344000 348000 34c000 350000 354000 358000 35c000 360000 364000 368000 36c000 370000 374000 378000 37c000 380000 384000 388000 38c000 390000 394000 398000 39c000 3a0000 3a4000 3a8000 3ac000 3b0000 3b4000 3b8000 3bc000 3c0000 3c4000 3c8000 3cc000 3d0000 3d4000 3d8000 3dc000 3e0000 3e4000 3e8000 3ec000 3f0000 3f4000 3f8000 3fc000 400000 404000 408000 40c000 410000 414000 418000 41c000 420000 424000 428000 42c000 430000 434000 438000 43c000 440000 444000 448000 44c000 450000 454000 458000 45c000 460000 464000 468000 46c000 470000 474000 478000 47c000 480000 484000 488000 48c000 490000 494000 498000 49c000 4a0000 4a4000 4a8000 4ac000 4b0000 4b4000 4b8000 4bc000 4c0000 4c4000 4c8000 4cc000 4d0000 4d4000 4d8000 4dc000 4e0000 4e4000 4e8000 4ec000 4f0000 4f4000 4f8000 4fc000 500000 504000 508000 50c000 510000 514000 518000 51c000 520000 524000 528000 52c000 530000 534000 538000 53c000 540000 544000 548000 54c000 550000 554000 558000 55c000 560000 564000 568000 56c000 570000 574000 578000 57c000 580000 584000 588000 58c000 590000 594000 598000 59c000 5a0000 5a4000 5a8000 5ac000 5b0000 5b4000 5b8000 5bc000 5c0000 5c4000 5c8000 5cc000 5d0000 5d4000 5d8000 5dc000 5e0000 5e4000 5e8000 5ec000 5f0000 5f4000 5f8000 5fc000 600000 604000 608000 60c000 610000 614000 618000 61c000 620000 624000 628000 62c000 630000 634000
[    0.000000] Node: 2, start_pfn: 638000, end_pfn: 838000
[    0.000000]   Setting physnode_map array to node 2 for pfns:
[    0.000000]   638000 63c000 640000 644000 648000 64c000 650000 654000 658000 65c000 660000 664000 668000 66c000 670000 674000 678000 67c000 680000 684000 688000 68c000 690000 694000 698000 69c000 6a0000 6a4000 6a8000 6ac000 6b0000 6b4000 6b8000 6bc000 6c0000 6c4000 6c8000 6cc000 6d0000 6d4000 6d8000 6dc000 6e0000 6e4000 6e8000 6ec000 6f0000 6f4000 6f8000 6fc000 700000 704000 708000 70c000 710000 714000 718000 71c000 720000 724000 728000 72c000 730000 734000 738000 73c000 740000 744000 748000 74c000 750000 754000 758000 75c000 760000 764000 768000 76c000 770000 774000 778000 77c000 780000 784000 788000 78c000 790000 794000 798000 79c000 7a0000 7a4000 7a8000 7ac000 7b0000 7b4000 7b8000 7bc000 7c0000 7c4000 7c8000 7cc000 7d0000 7d4000 7d8000 7dc000 7e0000 7e4000 7e8000 7ec000 7f0000 7f4000 7f8000 7fc000 800000 804000 808000 80c000 810000 814000 818000 81c000 820000 824000 828000 82c000 830000 834000
[    0.000000] Node: 3, start_pfn: 838000, end_pfn: c38000
[    0.000000]   Setting physnode_map array to node 3 for pfns:
[    0.000000]   838000 83c000 840000 844000 848000 84c000 850000 854000 858000 85c000 860000 864000 868000 86c000 870000 874000 878000 87c000 880000 884000 888000 88c000 890000 894000 898000 89c000 8a0000 8a4000 8a8000 8ac000 8b0000 8b4000 8b8000 8bc000 8c0000 8c4000 8c8000 8cc000 8d0000 8d4000 8d8000 8dc000 8e0000 8e4000 8e8000 8ec000 8f0000 8f4000 8f8000 8fc000 900000 904000 908000 90c000 910000 914000 918000 91c000 920000 924000 928000 92c000 930000 934000 938000 93c000 940000 944000 948000 94c000 950000 954000 958000 95c000 960000 964000 968000 96c000 970000 974000 978000 97c000 980000 984000 988000 98c000 990000 994000 998000 99c000 9a0000 9a4000 9a8000 9ac000 9b0000 9b4000 9b8000 9bc000 9c0000 9c4000 9c8000 9cc000 9d0000 9d4000 9d8000 9dc000 9e0000 9e4000 9e8000 9ec000 9f0000 9f4000 9f8000 9fc000 a00000 a04000 a08000 a0c000 a10000 a14000 a18000 a1c000 a20000 a24000 a28000 a2c000 a30000 a34000 a38000 a3c000 a40000 a44000 a48000 a4c000 a50000 a54000 a58000 a5c000 a60000 a64000 a68000 a6c000 a70000 a74000 a78000 a7c000 a80000 a84000 a88000 a8c000 a90000 a94000 a98000 a9c000 aa0000 aa4000 aa8000 aac000 ab0000 ab4000 ab8000 abc000 ac0000 ac4000 ac8000 acc000 ad0000 ad4000 ad8000 adc000 ae0000 ae4000 ae8000 aec000 af0000 af4000 af8000 afc000 b00000 b04000 b08000 b0c000 b10000 b14000 b18000 b1c000 b20000 b24000 b28000 b2c000 b30000 b34000 b38000 b3c000 b40000 b44000 b48000 b4c000 b50000 b54000 b58000 b5c000 b60000 b64000 b68000 b6c000 b70000 b74000 b78000 b7c000 b80000 b84000 b88000 b8c000 b90000 b94000 b98000 b9c000 ba0000 ba4000 ba8000 bac000 bb0000 bb4000 bb8000 bbc000 bc0000 bc4000 bc8000 bcc000 bd0000 bd4000 bd8000 bdc000 be0000 be4000 be8000 bec000 bf0000 bf4000 bf8000 bfc000 c00000 c04000 c08000 c0c000 c10000 c14000 c18000 c1c000 c20000 c24000 c28000 c2c000 c30000 c34000
[    0.000000] Node: 4, start_pfn: c38000, end_pfn: e38000
[    0.000000]   Setting physnode_map array to node 4 for pfns:
[    0.000000]   c38000 c3c000 c40000 c44000 c48000 c4c000 c50000 c54000 c58000 c5c000 c60000 c64000 c68000 c6c000 c70000 c74000 c78000 c7c000 c80000 c84000 c88000 c8c000 c90000 c94000 c98000 c9c000 ca0000 ca4000 ca8000 cac000 cb0000 cb4000 cb8000 cbc000 cc0000 cc4000 cc8000 ccc000 cd0000 cd4000 cd8000 cdc000 ce0000 ce4000 ce8000 cec000 cf0000 cf4000 cf8000 cfc000 d00000 d04000 d08000 d0c000 d10000 d14000 d18000 d1c000 d20000 d24000 d28000 d2c000 d30000 d34000 d38000 d3c000 d40000 d44000 d48000 d4c000 d50000 d54000 d58000 d5c000 d60000 d64000 d68000 d6c000 d70000 d74000 d78000 d7c000 d80000 d84000 d88000 d8c000 d90000 d94000 d98000 d9c000 da0000 da4000 da8000 dac000 db0000 db4000 db8000 dbc000 dc0000 dc4000 dc8000 dcc000 dd0000 dd4000 dd8000 ddc000 de0000 de4000 de8000 dec000 df0000 df4000 df8000 dfc000 e00000 e04000 e08000 e0c000 e10000 e14000 e18000 e1c000 e20000 e24000 e28000 e2c000 e30000 e34000
[    0.000000] Node: 5, start_pfn: e38000, end_pfn: 1000000
[    0.000000]   Setting physnode_map array to node 5 for pfns:
[    0.000000]   e38000 e3c000 e40000 e44000 e48000 e4c000 e50000 e54000 e58000 e5c000 e60000 e64000 e68000 e6c000 e70000 e74000 e78000 e7c000 e80000 e84000 e88000 e8c000 e90000 e94000 e98000 e9c000 ea0000 ea4000 ea8000 eac000 eb0000 eb4000 eb8000 ebc000 ec0000 ec4000 ec8000 ecc000 ed0000 ed4000 ed8000 edc000 ee0000 ee4000 ee8000 eec000 ef0000 ef4000 ef8000 efc000 f00000 f04000 f08000 f0c000 f10000 f14000 f18000 f1c000 f20000 f24000 f28000 f2c000 f30000 f34000 f38000 f3c000 f40000 f44000 f48000 f4c000 f50000 f54000 f58000 f5c000 f60000 f64000 f68000 f6c000 f70000 f74000 f78000 f7c000 f80000 f84000 f88000 f8c000 f90000 f94000 f98000 f9c000 fa0000 fa4000 fa8000 fac000 fb0000 fb4000 fb8000 fbc000 fc0000 fc4000 fc8000 fcc000 fd0000 fd4000 fd8000 fdc000 fe0000 fe4000 fe8000 fec000 ff0000 ff4000 ff8000 ffc000
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   Normal   0x00001000 -> 0x000375fe
[    0.000000]   HighMem  0x000375fe -> 0x01000000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x00000087
[    0.000000]     0: 0x00000100 -> 0x000c7e70
[    0.000000]     0: 0x00100000 -> 0x00238000
[    0.000000]     1: 0x00238000 -> 0x00638000
[    0.000000]     2: 0x00638000 -> 0x00838000
[    0.000000]     3: 0x00838000 -> 0x00c38000
[    0.000000]     4: 0x00c38000 -> 0x00e38000
[    0.000000]     5: 0x00e38000 -> 0x01000000
[    0.000000] On node 0 totalpages: 2096615
[    0.000000] free_area_init_node: node 0, pgdat f2c00000, node_mem_map f2c02200
[    0.000000]   DMA zone: 32 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 3927 pages, LIFO batch:0
[    0.000000]   Normal zone: 1740 pages used for memmap
[    0.000000]   Normal zone: 220978 pages, LIFO batch:31
[    0.000000]   HighMem zone: 16405 pages used for memmap
[    0.000000]   HighMem zone: 1853533 pages, LIFO batch:31
[    0.000000] On node 1 totalpages: 4194304
[    0.000000] free_area_init_node: node 1, pgdat eaa00000, node_mem_map eaa02000
[    0.000000]   HighMem zone: 32768 pages used for memmap
[    0.000000]   HighMem zone: 4161536 pages, LIFO batch:31
[    0.000000] On node 2 totalpages: 2097152
[    0.000000] free_area_init_node: node 2, pgdat e6800000, node_mem_map e6802000
[    0.000000]   HighMem zone: 16384 pages used for memmap
[    0.000000]   HighMem zone: 2080768 pages, LIFO batch:31
[    0.000000] On node 3 totalpages: 4194304
[    0.000000] free_area_init_node: node 3, pgdat de600000, node_mem_map de602000
[    0.000000]   HighMem zone: 32768 pages used for memmap
[    0.000000]   HighMem zone: 4161536 pages, LIFO batch:31
[    0.000000] On node 4 totalpages: 2097152
[    0.000000] free_area_init_node: node 4, pgdat da400000, node_mem_map da402000
[    0.000000]   HighMem zone: 16384 pages used for memmap
[    0.000000]   HighMem zone: 2080768 pages, LIFO batch:31
[    0.000000] On node 5 totalpages: 1867776
[    0.000000] free_area_init_node: node 5, pgdat d6a00000, node_mem_map d6a02000
[    0.000000]   HighMem zone: 14592 pages used for memmap
[    0.000000]   HighMem zone: 1853184 pages, LIFO batch:31
[    0.000000] Using APIC driver default


* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-30 16:32                     ` Hans Rosenfeld
@ 2011-06-30 16:42                       ` Tejun Heo
  2011-06-30 17:04                         ` Hans Rosenfeld
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2011-06-30 16:42 UTC (permalink / raw)
  To: Hans Rosenfeld; +Cc: Seidel, Conny, x86, linux-kernel

Hello,

On Thu, Jun 30, 2011 at 06:32:28PM +0200, Hans Rosenfeld wrote:
> On Thu, Jun 30, 2011 at 11:55:57AM -0400, Tejun Heo wrote:
> > Hmmm... it looks like the kernel is crashing from the other bug in
> > this thread.  Can you please apply both patches on top of 3.0-rc5 and
> > re-test?
> 
> Oh, that's why it looked so familiar :)
> 
> I wasn't able to reproduce this panic on this machine earlier without
> DISCONTIGMEM.
> 
> It works now with both patches, bootlog is attached.

Can you please attach boot log w/ SPARSEMEM?  Let's see whether NUMA
config is being rejected correctly.

Thanks.

-- 
tejun


* Re: 32bit NUMA and fakeNUMA broken for AMD CPUs
  2011-06-30 16:42                       ` Tejun Heo
@ 2011-06-30 17:04                         ` Hans Rosenfeld
  0 siblings, 0 replies; 28+ messages in thread
From: Hans Rosenfeld @ 2011-06-30 17:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Seidel, Conny, x86, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 323 bytes --]

On Thu, Jun 30, 2011 at 12:42:16PM -0400, Tejun Heo wrote:
> Can you please attach boot log w/ SPARSEMEM?  Let's see whether NUMA
> config is being rejected correctly.

I already sent it in the earlier mail, but here it is again. NUMA is
rejected.


Hans


-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

[-- Attachment #2: bootlog_sparsemem --]
[-- Type: text/plain, Size: 6759 bytes --]

kernel /boot/vmlinuz.panic root=/dev/sda1 console=ttyS0,115200 console=tty0 ignore_loglevel earlyprintk=ttyS0,115200 debug
   [Linux-bzImage, setup=0x3200, size=0x34ce70]

early console in setup code
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[    0.000000] Linux version 2.6.39-rc5-00181-g2706a0b-dirty (root@worms) (gcc version 4.5.2 (Gentoo 4.5.2 p1.1, pie-0.4.5) ) #27 SMP Thu Jun 30 15:00:39 CEST 2011
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 0000000000087800 (usable)
[    0.000000]  BIOS-e820: 0000000000087800 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000cc000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000c7e70000 (usable)
[    0.000000]  BIOS-e820: 00000000c7e70000 - 00000000c7e8c000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000c7e8c000 - 00000000c7e8e000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000c7e8e000 - 00000000c8000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000001838000000 (usable)
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] bootconsole [earlyser0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI present.
[    0.000000] DMI: AMD DRACHMA/DRACHMA, BIOS PDPAX1-6 12/15/2009
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] last_pfn = 0x1000000 max_arch_pfn = 0x1000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-D7FFF write-protect
[    0.000000]   D8000-DFFFF uncachable
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000000 mask FFFF80000000 write-back
[    0.000000]   1 base 000080000000 mask FFFFC0000000 write-back
[    0.000000]   2 base 0000C0000000 mask FFFFF8000000 write-back
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] TOM2: 0000001838000000 aka 99200M
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] e820 update range: 00000000c8000000 - 0000000100000000 (usable) ==> (reserved)
[    0.000000] found SMP MP-table at [c00f7240] f7240
[    0.000000] initial memory mapped : 0 - 01c00000
[    0.000000] Base memory trampoline at [c0083000] 83000 size 16384
[    0.000000] init_memory_mapping: 0000000000000000-00000000375fe000
[    0.000000]  0000000000 - 0000200000 page 4k
[    0.000000]  0000200000 - 0037400000 page 2M
[    0.000000]  0037400000 - 00375fe000 page 4k
[    0.000000] kernel direct mapping tables up to 375fe000 @ 1bf8000-1c00000
[    0.000000] ACPI: RSDP 000f71c0 00024 (v02 PTLTD )
[    0.000000] ACPI: XSDT c7e78aee 00074 (v01 PTLTD  ? XSDT   06040000  LTP 00000000)
[    0.000000] ACPI: FACP c7e83b26 000F4 (v03 AMD    CHIPOTLE 06040000 AMD  000F4240)
[    0.000000] ACPI: DSDT c7e78b62 0AFC4 (v02    AMD    SB700 06040000 MSFT 03000000)
[    0.000000] ACPI: FACS c7e8dfc0 00040
[    0.000000] ACPI: TCPA c7e83c8e 00032 (v02 AMD             06040000 PTEC 00000000)
[    0.000000] ACPI: SLIT c7e83cc0 0006C (v01 AMD    F10      06040000 AMD  00000001)
[    0.000000] ACPI: SRAT c7e83d2c 004C0 (v02 AMD    F10      06040000 AMD  00000001)
[    0.000000] ACPI: IVRS c7e841ec 000E8 (v01  AMD     RD890S 06040000 AMD  00000000)
[    0.000000] ACPI: SSDT c7e842d4 078B4 (v01 AMD    POWERNOW 06040000 AMD  00000001)
[    0.000000] ACPI: SSDT c7e8bb88 0010A (v01 AMD-K8 AMD-ACPI 06040000  AMD 00000001)
[    0.000000] ACPI: APIC c7e8bc92 002FA (v01 PTLTD  ? APIC   06040000  LTP 00000000)
[    0.000000] ACPI: MCFG c7e8bf8c 0003C (v01 PTLTD    MCFG   06040000  LTP 00000000)
[    0.000000] ACPI: HPET c7e8bfc8 00038 (v01 PTLTD  HPETTBL  06040000  LTP 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] Number of physical nodes 8
[    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000238000000
[    0.000000] Node 1 MemBase 0000000238000000 Limit 0000000638000000
[    0.000000] Node 2 MemBase 0000000638000000 Limit 0000000838000000
[    0.000000] Node 3 MemBase 0000000838000000 Limit 0000000c38000000
[    0.000000] Node 4 MemBase 0000000c38000000 Limit 0000000e38000000
[    0.000000] Node 5 MemBase 0000000e38000000 Limit 0000001000000000
[    0.000000] Node 6 bogus settings 1238000000-1000000000.
[    0.000000] Node 7 bogus settings 1438000000-1000000000.
[    0.000000] BSP APIC ID: 10
[    0.000000] Node alignment 128MB < min 512MB, rejecting NUMA config
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-0000001000000000
[    0.000000] node 0 pfn: [0 - 1000000]
[    0.000000] remap_alloc: node 0 [fffe00000-1000000000) -> [f7200000-f7400000)
[    0.000000] Initmem setup node 0 0000000000000000-0000001000000000
[    0.000000]   NODE_DATA [0000000037200000 - 0000000037201fff] (remapped)
[    0.000000] 64650MB HIGHMEM available.
[    0.000000] 885MB LOWMEM available.
[    0.000000] max_low_pfn = 375fe, highstart_pfn = 375fe
[    0.000000] Low memory ends at vaddr f75fe000
[    0.000000] High memory starts at vaddr f75fe000
[    0.000000]   mapped low ram: 0 - 375fe000
[    0.000000]   low ram: 0 - 375fe000
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   Normal   0x00001000 -> 0x000375fe
[    0.000000]   HighMem  0x000375fe -> 0x01000000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[3] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x00000087
[    0.000000]     0: 0x00000100 -> 0x000c7e70
[    0.000000]     0: 0x00100000 -> 0x01000000
[    0.000000] On node 0 totalpages: 16547303
[    0.000000]   DMA zone: 32 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 3927 pages, LIFO batch:0
[    0.000000]   Normal zone: 1740 pages used for memmap
[    0.000000]   Normal zone: 220978 pages, LIFO batch:31
[    0.000000]   HighMem zone: 129301 pages used for memmap
[    0.000000]   HighMem zone: 16191325 pages, LIFO batch:31
[    0.000000] Using APIC driver default


* [tip:x86/urgent] x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines
  2011-06-28  9:41     ` [PATCH tip:x86/urgent] x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines Tejun Heo
  2011-06-28 12:35       ` Conny Seidel
@ 2011-07-01 15:26       ` tip-bot for Tejun Heo
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Tejun Heo @ 2011-07-01 15:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, conny.seidel, tj, cl, tglx, mingo

Commit-ID:  a26474e8649643e82d71e3a386d5c4bcc0b207ef
Gitweb:     http://git.kernel.org/tip/a26474e8649643e82d71e3a386d5c4bcc0b207ef
Author:     Tejun Heo <tj@kernel.org>
AuthorDate: Tue, 28 Jun 2011 11:41:07 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 1 Jul 2011 13:38:51 +0200

x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines

During 32/64 NUMA init unification, commit 797390d855 ("x86-32,
NUMA: use sparse_memory_present_with_active_regions()") made
32bit mm init call memory_present() automatically from
active_regions instead of leaving it to each NUMA init path.

That commit's description is inaccurate - memory_present() calls
aren't the same for flat and numaq.  After the commit,
memory_present() is only called for the intersection of e820 and
NUMA layout.  Before, on flatmem, memory_present() would be
called from 0 to max_pfn.  After, it would be called only on the
areas that e820 indicates to be populated.

This is how x86_64 works and should be okay as memmap is allowed
to contain holes; however, x86_32 DISCONTIGMEM is missing
early_pfn_valid(), which makes memmap_init_zone() assume that
memmap doesn't contain any holes.  If the e820 map contains
holes, as it often does on machines with close to or more than
4GiB of memory, memmap_init_zone() calls pfn_to_page() on a pfn
which isn't mapped to a NUMA node and triggers the following
oops, as reported by Conny Seidel:

  BUG: unable to handle kernel paging request at 000012b0
  IP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2
  *pdpt = 0000000000000000 *pde = f000eef3f000ee00
  Oops: 0000 [#1] SMP
  last sysfs file:
  Modules linked in:

  Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00164-g797390d #1 To Be Filled By O.E.M. To Be Filled By O.E.M./E350M1
  EIP: 0060:[<c1aa13ce>] EFLAGS: 00010012 CPU: 0
  EIP is at memmap_init_zone+0x6c/0xf2
  EAX: 00000000 EBX: 000a8000 ECX: 000a7fff EDX: f2c00b80
  ESI: 000a8000 EDI: f2c00800 EBP: c19ffe54 ESP: c19ffe34
   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 0, ti=c19fe000 task=c1a07f60 task.ti=c19fe000)
  Stack:
   00000002 00000000 0023f000 00000000 10000000 00000a00 f2c00000 f2c00b58
   c19ffeb0 c1a80f24 000375fe 00000000 f2c00800 00000800 00000100 00000030
   c1abb768 0000003c 00000000 00000000 00000004 00207a02 f2c00800 000375fe
  Call Trace:
   [<c1a80f24>] free_area_init_node+0x358/0x385
   [<c1a81384>] free_area_init_nodes+0x420/0x487
   [<c1a79326>] paging_init+0x114/0x11b
   [<c1a6cb13>] setup_arch+0xb37/0xc0a
   [<c1a69554>] start_kernel+0x76/0x316
   [<c1a690a8>] i386_start_kernel+0xa8/0xb0

This patch fixes the bug by defining early_pfn_valid() to be the
same as pfn_valid() when DISCONTIGMEM.

Reported-bisected-and-tested-by: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: hans.rosenfeld@amd.com
Cc: Christoph Lameter <cl@linux.com>
Cc: Conny Seidel <conny.seidel@amd.com>
Link: http://lkml.kernel.org/r/20110628094107.GB3386@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/mmzone_32.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 224e8c5..ffa037f 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -57,6 +57,8 @@ static inline int pfn_valid(int pfn)
 	return 0;
 }
 
+#define early_pfn_valid(pfn)	pfn_valid((pfn))
+
 #endif /* CONFIG_DISCONTIGMEM */
 
 #ifdef CONFIG_NEED_MULTIPLE_NODES
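
For readers following along, the effect of the one-line fix can be
sketched in user-space C.  This is a minimal model, not kernel code:
physnode_map, memory_present(), pfn_to_nid() and pfn_valid() are named
after the DISCONTIGMEM helpers quoted in this thread but are heavily
simplified (the real pfn_valid() also checks the node's end pfn), and
count_initialised(), physnode_map_reset() and the tiny table size are
assumptions made purely for illustration.

```c
#include <stddef.h>

/* Toy model: 16 sections of 4 pages each (the kernel uses 1024 sections). */
#define MAX_SECTIONS       16
#define PAGES_PER_SECTION  4

/* -1 means "no node owns this section", i.e. a hole in the memmap. */
static signed char physnode_map[MAX_SECTIONS];

/* Hypothetical helper: mark every section as unowned. */
static void physnode_map_reset(void)
{
	for (size_t i = 0; i < MAX_SECTIONS; i++)
		physnode_map[i] = -1;
}

/* Mark [start, end) as present on node nid, one section at a time. */
static void memory_present(int nid, unsigned long start, unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
		physnode_map[pfn / PAGES_PER_SECTION] = (signed char)nid;
}

static int pfn_to_nid(unsigned long pfn)
{
	return physnode_map[pfn / PAGES_PER_SECTION];
}

/* Simplified: a pfn is valid if some node claims the section it falls in. */
static int pfn_valid(unsigned long pfn)
{
	return pfn_to_nid(pfn) >= 0;
}

/* The fix: during early init, use the same test as pfn_valid(). */
#define early_pfn_valid(pfn)	pfn_valid((pfn))

/*
 * Hypothetical stand-in for the memmap_init_zone() loop: walk a pfn range
 * and count the pfns that would get a struct page initialised.  Holes are
 * skipped instead of being handed to pfn_to_page().
 */
static unsigned long count_initialised(unsigned long start, unsigned long end)
{
	unsigned long n = 0;

	for (unsigned long pfn = start; pfn < end; pfn++) {
		if (!early_pfn_valid(pfn))
			continue;	/* hole: no node, no struct page */
		n++;
	}
	return n;
}
```

In this model, registering pfns 0-7 and 16-23 leaves pfns 8-15 as a
hole; without the early_pfn_valid() test the loop would try to
initialise all 24 pfns, including the ones no node owns.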


* [PATCH x86/urgent 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/
  2011-06-29  9:44       ` 32bit NUMA and fakeNUMA broken for AMD CPUs Tejun Heo
  2011-06-29 10:51         ` Tejun Heo
  2011-06-29 12:34         ` Tejun Heo
@ 2011-07-01 16:22         ` Tejun Heo
  2011-07-01 16:23           ` [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check Tejun Heo
  2 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2011-07-01 16:22 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: Conny Seidel, x86, linux-kernel, Hans Rosenfeld

DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to
SPARSEMEM; however, it calls each mapping unit ELEMENT instead of
SECTION.  This patch renames it to SECTION so that PAGES_PER_SECTION
is valid for both DISCONTIGMEM and SPARSEMEM.  This will be used by
the next patch to implement the mapping granularity check.

This patch is a trivial constant rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
---
 arch/x86/include/asm/mmzone_32.h |    6 +++---
 arch/x86/mm/numa_32.c            |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

Index: work/arch/x86/include/asm/mmzone_32.h
===================================================================
--- work.orig/arch/x86/include/asm/mmzone_32.h
+++ work/arch/x86/include/asm/mmzone_32.h
@@ -34,15 +34,15 @@ static inline void resume_map_numa_kva(p
  *    64Gb / 4096bytes/page = 16777216 pages
  */
 #define MAX_NR_PAGES 16777216
-#define MAX_ELEMENTS 1024
-#define PAGES_PER_ELEMENT (MAX_NR_PAGES/MAX_ELEMENTS)
+#define MAX_SECTIONS 1024
+#define PAGES_PER_SECTION (MAX_NR_PAGES/MAX_SECTIONS)
 
 extern s8 physnode_map[];
 
 static inline int pfn_to_nid(unsigned long pfn)
 {
 #ifdef CONFIG_NUMA
-	return((int) physnode_map[(pfn) / PAGES_PER_ELEMENT]);
+	return((int) physnode_map[(pfn) / PAGES_PER_SECTION]);
 #else
 	return 0;
 #endif
Index: work/arch/x86/mm/numa_32.c
===================================================================
--- work.orig/arch/x86/mm/numa_32.c
+++ work/arch/x86/mm/numa_32.c
@@ -41,7 +41,7 @@
  *     physnode_map[16-31] = 1;
  *     physnode_map[32- ] = -1;
  */
-s8 physnode_map[MAX_ELEMENTS] __read_mostly = { [0 ... (MAX_ELEMENTS - 1)] = -1};
+s8 physnode_map[MAX_SECTIONS] __read_mostly = { [0 ... (MAX_SECTIONS - 1)] = -1};
 EXPORT_SYMBOL(physnode_map);
 
 void memory_present(int nid, unsigned long start, unsigned long end)
@@ -52,8 +52,8 @@ void memory_present(int nid, unsigned lo
 			nid, start, end);
 	printk(KERN_DEBUG "  Setting physnode_map array to node %d for pfns:\n", nid);
 	printk(KERN_DEBUG "  ");
-	for (pfn = start; pfn < end; pfn += PAGES_PER_ELEMENT) {
-		physnode_map[pfn / PAGES_PER_ELEMENT] = nid;
+	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
+		physnode_map[pfn / PAGES_PER_SECTION] = nid;
 		printk(KERN_CONT "%lx ", pfn);
 	}
 	printk(KERN_CONT "\n");

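A minimal userspace sketch of the lookup this rename touches, assuming
4KiB pages.  With MAX_NR_PAGES / MAX_SECTIONS = 16384 pages, each
physnode_map entry covers 64MiB, so a node boundary that falls inside a
section cannot be represented.  The sketch_* names are hypothetical and
not part of the patch, and the assert() bounds checks are additions for
the sketch only.

```c
#include <assert.h>

#define MAX_NR_PAGES      16777216UL	/* 64GiB / 4096 bytes per page */
#define MAX_SECTIONS      1024
#define PAGES_PER_SECTION (MAX_NR_PAGES / MAX_SECTIONS)

/* One node id per section; -1 means "no node owns this section". */
static signed char physnode_map[MAX_SECTIONS];

/* Hypothetical helper: mark every section as unowned. */
static void sketch_physnode_init(void)
{
	int i;

	for (i = 0; i < MAX_SECTIONS; i++)
		physnode_map[i] = -1;
}

/* Same loop shape as memory_present(): tag each section stepped through. */
static void sketch_memory_present(int nid, unsigned long start,
				  unsigned long end)
{
	unsigned long pfn;

	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
		assert(pfn / PAGES_PER_SECTION < MAX_SECTIONS);
		physnode_map[pfn / PAGES_PER_SECTION] = (signed char)nid;
	}
}

/* Same lookup as pfn_to_nid() with CONFIG_NUMA: one array index per pfn. */
static int sketch_pfn_to_nid(unsigned long pfn)
{
	assert(pfn / PAGES_PER_SECTION < MAX_SECTIONS);
	return physnode_map[pfn / PAGES_PER_SECTION];
}
```

For example, after sketch_physnode_init(), registering node 0 as pfns
[0x0, 0x6000) and node 1 as [0x6000, 0x10000) leaves section 1 (pfns
0x4000-0x7fff) owned by node 1, so sketch_pfn_to_nid(0x5000) yields 1
although that pfn lies in node 0's range.
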
^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check
  2011-07-01 16:22         ` [PATCH x86/urgent 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ Tejun Heo
@ 2011-07-01 16:23           ` Tejun Heo
  2011-07-09  8:32             ` Tejun Heo
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2011-07-01 16:23 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: Conny Seidel, x86, linux-kernel, Hans Rosenfeld

Both SPARSEMEM and DISCONTIGMEM have limited granularity when mapping
pfn to nid.  If NUMA nodes are laid out such that the mapping cannot
be accurate, boot will fail triggering BUG_ON() in
mminit_verify_page_links().

On 32bit, it's 512MiB w/ PAE and SPARSEMEM.  This seems to have been
granular enough until commit 2706a0bf7b (x86, NUMA: Enable
CONFIG_AMD_NUMA on 32bit too).  Apparently, there is a machine which
aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT.  As
x86_64 has granularity of 128MiB, NUMA config worked fine on the
machine; however, the commit enabled AMD NUMA config on 32bit too and
the 512MiB granularity wasn't enough.

 On node 0 totalpages: 2096615
   DMA zone: 32 pages used for memmap
   DMA zone: 0 pages reserved
   DMA zone: 3927 pages, LIFO batch:0
   Normal zone: 1740 pages used for memmap
   Normal zone: 220978 pages, LIFO batch:31
   HighMem zone: 16405 pages used for memmap
   HighMem zone: 1853533 pages, LIFO batch:31
 BUG: Int 6: CR2   (null)
      EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
      EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
      err   (null)  EIP c16209aa   CS 00000060  flg 00010002
 Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
          (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
        f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
 Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
 Call Trace:
  [<c136b1e5>] ? early_fault+0x2e/0x2e
  [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
  [<c1620613>] ? memmap_init_zone+0xaf/0x10c
  [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
  [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
  [<c1601d80>] ? paging_init+0x112/0x118
  [<c15f578d>] ? setup_arch+0x791/0x82f
  [<c15f43d9>] ? start_kernel+0x6a/0x257

This patch implements node_map_pfn_alignment(), which determines the
maximum internode alignment, and updates numa_register_memblks() to
reject the NUMA configuration if that alignment is finer than the
pfn -> nid mapping granularity of the memory model as determined by
PAGES_PER_SECTION.

This makes the problematic machine boot w/ flatmem by rejecting the
NUMA config and provides protection against crazy NUMA configurations.

Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Conny Seidel <conny.seidel@amd.com>
---
 arch/x86/mm/numa.c |   11 ++++++++++
 include/linux/mm.h |    1 
 mm/page_alloc.c    |   54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

Index: work/arch/x86/mm/numa.c
===================================================================
--- work.orig/arch/x86/mm/numa.c
+++ work/arch/x86/mm/numa.c
@@ -496,6 +496,7 @@ static bool __init numa_meminfo_cover_me
 
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
+	unsigned long pfn_align;
 	int i, nid;
 
 	/* Account for nodes with cpus and no memory */
@@ -511,6 +512,16 @@ static int __init numa_register_memblks(
 
 	/* for out of order entries */
 	sort_node_map();
+
+	/* check whether pfn -> nid mapping has enough granularity */
+	pfn_align = node_map_pfn_alignment();
+	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
+		printk(KERN_WARNING "Node alignment %LuMB < min %LuMB, rejecting NUMA config\n",
+		       (u64)pfn_align << PAGE_SHIFT >> 20,
+		       (u64)PAGES_PER_SECTION << PAGE_SHIFT >> 20);
+		return -EINVAL;
+	}
+
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
Index: work/include/linux/mm.h
===================================================================
--- work.orig/include/linux/mm.h
+++ work/include/linux/mm.h
@@ -1313,6 +1313,7 @@ extern void remove_active_range(unsigned
 					unsigned long end_pfn);
 extern void remove_all_active_ranges(void);
 void sort_node_map(void);
+unsigned long node_map_pfn_alignment(void);
 unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
 						unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
Index: work/mm/page_alloc.c
===================================================================
--- work.orig/mm/page_alloc.c
+++ work/mm/page_alloc.c
@@ -4585,6 +4585,60 @@ void __init sort_node_map(void)
 			cmp_node_active_region, NULL);
 }
 
+/**
+ * node_map_pfn_alignment - determine the maximum internode alignment
+ *
+ * This function should be called after node map is populated and sorted.
+ * It calculates the maximum power of two alignment which can distinguish
+ * all the nodes.
+ *
+ * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
+ * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
+ * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
+ * shifted, 1GiB is enough and this function will indicate so.
+ *
+ * This is used to test whether pfn -> nid mapping of the chosen memory
+ * model has fine enough granularity to avoid incorrect mapping for the
+ * populated node map.
+ *
+ * Returns the determined alignment in pfn's.  0 if there is no alignment
+ * requirement (single node).
+ */
+unsigned long __init node_map_pfn_alignment(void)
+{
+	unsigned long accl_mask = 0, last_end = 0;
+	int last_nid = -1;
+	int i;
+
+	for_each_active_range_index_in_nid(i, MAX_NUMNODES) {
+		int nid = early_node_map[i].nid;
+		unsigned long start = early_node_map[i].start_pfn;
+		unsigned long end = early_node_map[i].end_pfn;
+		unsigned long mask;
+
+		if (!start || last_nid < 0 || last_nid == nid) {
+			last_nid = nid;
+			last_end = end;
+			continue;
+		}
+
+		/*
+		 * Start with a mask granular enough to pin-point to the
+		 * start pfn and tick off bits one-by-one until it becomes
+		 * too coarse to separate the current node from the last.
+		 */
+		mask = ~((1 << __ffs(start)) - 1);
+		while (mask && last_end <= (start & (mask << 1)))
+			mask <<= 1;
+
+		/* accumulate all internode masks */
+		accl_mask |= mask;
+	}
+
+	/* convert mask to number of pages */
+	return ~accl_mask + 1;
+}
+
 /* Find the lowest pfn for a node */
 static unsigned long __init find_min_pfn_for_node(int nid)
 {

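A minimal userspace sketch of the mask computation in
node_map_pfn_alignment(), assuming 4KiB pages and a node map that is
already sorted by start pfn.  It follows the loop in the patch; the
struct, the sketch_* names, the portable bit scan standing in for
__ffs() and the 1UL constant are substitutions made for the sketch, not
the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one early_node_map[] entry. */
struct sketch_range {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Index of the lowest set bit; x must be non-zero (like __ffs()). */
static unsigned int sketch_ffs(unsigned long x)
{
	unsigned int bit = 0;

	assert(x != 0);
	while (!(x & 1UL)) {
		x >>= 1;
		bit++;
	}
	return bit;
}

/* Returns the alignment in pages, or 0 if no node boundary was seen. */
static unsigned long sketch_pfn_alignment(const struct sketch_range *map,
					  size_t n)
{
	unsigned long accl_mask = 0, last_end = 0;
	int last_nid = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long start = map[i].start_pfn;
		unsigned long mask;

		/* First range, range at pfn 0, or same node: just remember it. */
		if (!start || last_nid < 0 || last_nid == map[i].nid) {
			last_nid = map[i].nid;
			last_end = map[i].end_pfn;
			continue;
		}

		/* Finest mask that pins down start, then coarsen while safe. */
		mask = ~((1UL << sketch_ffs(start)) - 1);
		while (mask && last_end <= (start & (mask << 1)))
			mask <<= 1;

		/* OR keeps the finest (lowest) alignment bit across boundaries. */
		accl_mask |= mask;
	}

	/* Two's complement turns the mask into a page count. */
	return ~accl_mask + 1;
}
```

With two 1GiB nodes at [0x0, 0x40000) and [0x40000, 0x80000) this
returns 0x40000 pages (1GiB).  Moving the boundary to 0x50000 gives
0x10000 pages (256MiB), and a boundary at 0x48000 gives 0x8000 pages
(128MiB), which is below the 512MiB granularity described above.
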
^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check
  2011-07-01 16:23           ` [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check Tejun Heo
@ 2011-07-09  8:32             ` Tejun Heo
  2011-07-09  8:42               ` H. Peter Anvin
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2011-07-09  8:32 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: Conny Seidel, x86, linux-kernel, Hans Rosenfeld

On Fri, Jul 01, 2011 at 06:23:27PM +0200, Tejun Heo wrote:
> Both SPARSEMEM and DISCONTIGMEM have limited granularity when mapping
> pfn to nid.  If NUMA nodes are laid out such that the mapping cannot
> be accurate, boot will fail triggering BUG_ON() in
> mminit_verify_page_links().
> 
> On 32bit, it's 512MiB w/ PAE and SPARSEMEM.  This seems to have been
> granular enough until commit 2706a0bf7b (x86, NUMA: Enable
> CONFIG_AMD_NUMA on 32bit too).  Apparently, there is a machine which
> aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT.  As
> x86_64 has granularity of 128MiB, NUMA config worked fine on the
> machine; however, the commit enabled AMD NUMA config on 32bit too and
> the 512MiB granularity wasn't enough.
> 
>  On node 0 totalpages: 2096615
>    DMA zone: 32 pages used for memmap
>    DMA zone: 0 pages reserved
>    DMA zone: 3927 pages, LIFO batch:0
>    Normal zone: 1740 pages used for memmap
>    Normal zone: 220978 pages, LIFO batch:31
>    HighMem zone: 16405 pages used for memmap
>    HighMem zone: 1853533 pages, LIFO batch:31
>  BUG: Int 6: CR2   (null)
>       EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
>       EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
>       err   (null)  EIP c16209aa   CS 00000060  flg 00010002
>  Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
>           (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
>         f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
>  Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
>  Call Trace:
>   [<c136b1e5>] ? early_fault+0x2e/0x2e
>   [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
>   [<c1620613>] ? memmap_init_zone+0xaf/0x10c
>   [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
>   [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
>   [<c1601d80>] ? paging_init+0x112/0x118
>   [<c15f578d>] ? setup_arch+0x791/0x82f
>   [<c15f43d9>] ? start_kernel+0x6a/0x257
> 
> This patch implements node_map_pfn_alignment() which determines
> maximum internode alignment and update numa_register_memblks() to
> reject NUMA configuration if alignment exceeds the pfn -> nid mapping
> granularity of the memory model as determined by PAGES_PER_SECTION.
> 
> This makes the problematic machine boot w/ flatmem by rejecting the
> NUMA config and provides protection against crazy NUMA configurations.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
> Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
> Cc: Conny Seidel <conny.seidel@amd.com>

Ping?  If the change is too invasive at this stage, we can disable AMD
NUMA on x86_32 for 3.0 and queue these two for 3.1.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check
  2011-07-09  8:32             ` Tejun Heo
@ 2011-07-09  8:42               ` H. Peter Anvin
  2011-07-11  8:34                 ` [PATCH x86/urgent] x86: Disable AMD_NUMA for 32bit for now Tejun Heo
  2011-07-11 14:20                 ` [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check Hans Rosenfeld
  0 siblings, 2 replies; 28+ messages in thread
From: H. Peter Anvin @ 2011-07-09  8:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, Thomas Gleixner, Conny Seidel, x86, linux-kernel,
	Hans Rosenfeld

On 07/09/2011 01:32 AM, Tejun Heo wrote:
> 
> Ping?  If the change is too invasive at this stage, we can disable AMD
> NUMA on x86_32 for 3.0 and queue these two for 3.1.
> 

I am kind of leaning that way... Hans, what is your opinion?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH x86/urgent] x86: Disable AMD_NUMA for 32bit for now
  2011-07-09  8:42               ` H. Peter Anvin
@ 2011-07-11  8:34                 ` Tejun Heo
  2011-07-11 14:01                   ` Tejun Heo
  2011-07-11 18:58                   ` [tip:x86/urgent] " tip-bot for Tejun Heo
  2011-07-11 14:20                 ` [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check Hans Rosenfeld
  1 sibling, 2 replies; 28+ messages in thread
From: Tejun Heo @ 2011-07-11  8:34 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, Conny Seidel, x86, linux-kernel,
	Hans Rosenfeld

Commit 2706a0bf7b "x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too"
enabled AMD NUMA for 32bit too.  Unfortunately, SPARSEMEM on 32bit had
rather coarse (512MiB) addr->node mapping granularity due to lack of
space in page->flags.  This led to boot failure on certain AMD NUMA
machines which had 128MiB alignment on nodes.

Patches to properly detect this condition and reject NUMA
configuration are posted[1] but deemed too pervasive for merge at this
point (-rc6).  Disable AMD NUMA for 32bit for now and re-enable once
the detection logic is merged.

[1] http://thread.gmane.org/gmane.linux.kernel/1161279/focus=1162583

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
---
 arch/x86/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index da34972..37357a5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1170,7 +1170,7 @@ comment "NUMA (Summit) requires SMP, 64GB highmem support, ACPI"
 config AMD_NUMA
 	def_bool y
 	prompt "Old style AMD Opteron NUMA detection"
-	depends on NUMA && PCI
+	depends on X86_64 && NUMA && PCI
 	---help---
 	  Enable AMD NUMA node topology detection.  You should say Y here if
 	  you have a multi processor AMD system. This uses an old method to
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH x86/urgent] x86: Disable AMD_NUMA for 32bit for now
  2011-07-11  8:34                 ` [PATCH x86/urgent] x86: Disable AMD_NUMA for 32bit for now Tejun Heo
@ 2011-07-11 14:01                   ` Tejun Heo
  2011-07-11 18:58                   ` [tip:x86/urgent] " tip-bot for Tejun Heo
  1 sibling, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2011-07-11 14:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, Conny Seidel, x86, linux-kernel,
	Hans Rosenfeld

Hello,

I just found out that the original two-patch series was too strict for
cases where nid is recorded in page->flags (always the case on x86-64):
it would unnecessarily reject some NUMA emulation configs.  So let's go
with the simple one for this release.  I'll post updated patches for
the next merge window later.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check
  2011-07-09  8:42               ` H. Peter Anvin
  2011-07-11  8:34                 ` [PATCH x86/urgent] x86: Disable AMD_NUMA for 32bit for now Tejun Heo
@ 2011-07-11 14:20                 ` Hans Rosenfeld
  1 sibling, 0 replies; 28+ messages in thread
From: Hans Rosenfeld @ 2011-07-11 14:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tejun Heo, Ingo Molnar, Thomas Gleixner, Seidel, Conny, x86,
	linux-kernel

On Sat, Jul 09, 2011 at 04:42:31AM -0400, H. Peter Anvin wrote:
> On 07/09/2011 01:32 AM, Tejun Heo wrote:
> > 
> > Ping?  If the change is too invasive at this stage, we can disable AMD
> > NUMA on x86_32 for 3.0 and queue these two for 3.1.
> > 
> 
> I am kind of leaning that way... Hans, what is your opinion?

That's OK with me. Disable it in 3.0 and fix it properly in 3.1.


Hans


-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [tip:x86/urgent] x86: Disable AMD_NUMA for 32bit for now
  2011-07-11  8:34                 ` [PATCH x86/urgent] x86: Disable AMD_NUMA for 32bit for now Tejun Heo
  2011-07-11 14:01                   ` Tejun Heo
@ 2011-07-11 18:58                   ` tip-bot for Tejun Heo
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Tejun Heo @ 2011-07-11 18:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hans.rosenfeld, hpa, mingo, conny.seidel, tj, tglx, mingo

Commit-ID:  5da0ef9a8554a8d03dc880a53f213289fe7b576d
Gitweb:     http://git.kernel.org/tip/5da0ef9a8554a8d03dc880a53f213289fe7b576d
Author:     Tejun Heo <tj@kernel.org>
AuthorDate: Mon, 11 Jul 2011 10:34:32 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 11 Jul 2011 16:25:30 +0200

x86: Disable AMD_NUMA for 32bit for now

Commit 2706a0bf7b ("x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit
too") enabled AMD NUMA for 32bit too.  Unfortunately, SPARSEMEM
on 32bit had rather coarse (512MiB) addr->node mapping
granularity due to lack of space in page->flags.  This led to
boot failure on certain AMD NUMA machines which had 128MiB
alignment on nodes.

Patches to properly detect this condition and reject NUMA
configuration are posted[1] but deemed too pervasive for merge
at this point (-rc6).  Disable AMD NUMA for 32bit for now and
re-enable once the detection logic is merged.

[1] http://thread.gmane.org/gmane.linux.kernel/1161279/focus=1162583

Reported-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Conny Seidel <conny.seidel@amd.com>
Link: http://lkml.kernel.org/r/20110711083432.GC943@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index da34972..37357a5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1170,7 +1170,7 @@ comment "NUMA (Summit) requires SMP, 64GB highmem support, ACPI"
 config AMD_NUMA
 	def_bool y
 	prompt "Old style AMD Opteron NUMA detection"
-	depends on NUMA && PCI
+	depends on X86_64 && NUMA && PCI
 	---help---
 	  Enable AMD NUMA node topology detection.  You should say Y here if
 	  you have a multi processor AMD system. This uses an old method to

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH x86/mm 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/
@ 2011-07-12  7:44 Tejun Heo
  2011-07-12  7:45 ` [PATCH x86/mm 2/2] x86: Implement pfn -> nid mapping granularity check Tejun Heo
  2011-07-13  5:33 ` [tip:x86/numa] x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ tip-bot for Tejun Heo
  0 siblings, 2 replies; 28+ messages in thread
From: Tejun Heo @ 2011-07-12  7:44 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: Conny Seidel, x86, linux-kernel, Hans Rosenfeld

From 9f5e6296923d7cf47738dfcd38ab9e333d3fd356 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Fri, 1 Jul 2011 18:22:39 +0200

DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to
SPARSEMEM; however, it calls each mapping unit ELEMENT instead of
SECTION.  This patch renames it to SECTION so that PAGES_PER_SECTION
is valid for both DISCONTIGMEM and SPARSEMEM.  This will be used by
the next patch to implement mapping granularity check.

This patch is a trivial constant rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
---
This one is identical to the original posting[1].  Only the second
patch is updated.  Please schedule for 3.1-rc1.  Also available on the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-x86-mm-base

Thanks.

[1] http://thread.gmane.org/gmane.linux.kernel/1161279/focus=1162583

 arch/x86/include/asm/mmzone_32.h |    6 +++---
 arch/x86/mm/numa_32.c            |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index ffa037f..55728e1 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -34,15 +34,15 @@ static inline void resume_map_numa_kva(pgd_t *pgd) {}
  *    64Gb / 4096bytes/page = 16777216 pages
  */
 #define MAX_NR_PAGES 16777216
-#define MAX_ELEMENTS 1024
-#define PAGES_PER_ELEMENT (MAX_NR_PAGES/MAX_ELEMENTS)
+#define MAX_SECTIONS 1024
+#define PAGES_PER_SECTION (MAX_NR_PAGES/MAX_SECTIONS)
 
 extern s8 physnode_map[];
 
 static inline int pfn_to_nid(unsigned long pfn)
 {
 #ifdef CONFIG_NUMA
-	return((int) physnode_map[(pfn) / PAGES_PER_ELEMENT]);
+	return((int) physnode_map[(pfn) / PAGES_PER_SECTION]);
 #else
 	return 0;
 #endif
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 849a975..3adebe7 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -41,7 +41,7 @@
  *     physnode_map[16-31] = 1;
  *     physnode_map[32- ] = -1;
  */
-s8 physnode_map[MAX_ELEMENTS] __read_mostly = { [0 ... (MAX_ELEMENTS - 1)] = -1};
+s8 physnode_map[MAX_SECTIONS] __read_mostly = { [0 ... (MAX_SECTIONS - 1)] = -1};
 EXPORT_SYMBOL(physnode_map);
 
 void memory_present(int nid, unsigned long start, unsigned long end)
@@ -52,8 +52,8 @@ void memory_present(int nid, unsigned long start, unsigned long end)
 			nid, start, end);
 	printk(KERN_DEBUG "  Setting physnode_map array to node %d for pfns:\n", nid);
 	printk(KERN_DEBUG "  ");
-	for (pfn = start; pfn < end; pfn += PAGES_PER_ELEMENT) {
-		physnode_map[pfn / PAGES_PER_ELEMENT] = nid;
+	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
+		physnode_map[pfn / PAGES_PER_SECTION] = nid;
 		printk(KERN_CONT "%lx ", pfn);
 	}
 	printk(KERN_CONT "\n");
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH x86/mm 2/2] x86: Implement pfn -> nid mapping granularity check
  2011-07-12  7:44 [PATCH x86/mm 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ Tejun Heo
@ 2011-07-12  7:45 ` Tejun Heo
  2011-07-13  5:33 ` [tip:x86/numa] x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ tip-bot for Tejun Heo
  1 sibling, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2011-07-12  7:45 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: Conny Seidel, x86, linux-kernel, Hans Rosenfeld

From c11db5200bfdb371ebc590c5c17736288eeae5d4 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Tue, 12 Jul 2011 09:36:04 +0200

SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use a
sections array to map pfn to nid, which limits the granularity.  If
NUMA nodes are laid out such that the mapping cannot be accurate, boot
will fail triggering BUG_ON() in mminit_verify_page_links().

On 32bit, it's 512MiB w/ PAE and SPARSEMEM.  This seems to have been
granular enough until commit 2706a0bf7b (x86, NUMA: Enable
CONFIG_AMD_NUMA on 32bit too).  Apparently, there is a machine which
aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT.  This
led to the following BUG_ON().

 On node 0 totalpages: 2096615
   DMA zone: 32 pages used for memmap
   DMA zone: 0 pages reserved
   DMA zone: 3927 pages, LIFO batch:0
   Normal zone: 1740 pages used for memmap
   Normal zone: 220978 pages, LIFO batch:31
   HighMem zone: 16405 pages used for memmap
   HighMem zone: 1853533 pages, LIFO batch:31
 BUG: Int 6: CR2   (null)
      EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
      EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
      err   (null)  EIP c16209aa   CS 00000060  flg 00010002
 Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
          (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
        f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
 Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
 Call Trace:
  [<c136b1e5>] ? early_fault+0x2e/0x2e
  [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
  [<c1620613>] ? memmap_init_zone+0xaf/0x10c
  [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
  [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
  [<c1601d80>] ? paging_init+0x112/0x118
  [<c15f578d>] ? setup_arch+0x791/0x82f
  [<c15f43d9>] ? start_kernel+0x6a/0x257

This patch implements node_map_pfn_alignment(), which determines the
maximum internode alignment, and updates numa_register_memblks() to
reject the NUMA configuration if that alignment is finer than the
pfn -> nid mapping granularity of the memory model as determined by
PAGES_PER_SECTION.

This makes the problematic machine boot w/ flatmem by rejecting the
NUMA config and provides protection against crazy NUMA configurations.

Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Conny Seidel <conny.seidel@amd.com>
---
Updated such that the check is applied iff NODE_NOT_IN_PAGE_FLAGS.

Thanks.

 arch/x86/mm/numa.c |   15 ++++++++++++++
 include/linux/mm.h |    1 +
 mm/page_alloc.c    |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index f5510d8..fbeaaf4 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -496,6 +496,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
+	unsigned long uninitialized_var(pfn_align);
 	int i, nid;
 
 	/* Account for nodes with cpus and no memory */
@@ -511,6 +512,20 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 
 	/* for out of order entries */
 	sort_node_map();
+
+	/*
+	 * If sections array is gonna be used for pfn -> nid mapping, check
+	 * whether its granularity is fine enough.
+	 */
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+	pfn_align = node_map_pfn_alignment();
+	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
+		printk(KERN_WARNING "Node alignment %LuMB < min %LuMB, rejecting NUMA config\n",
+		       PFN_PHYS(pfn_align) >> 20,
+		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
+		return -EINVAL;
+	}
+#endif
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9670f71..c70a326 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1313,6 +1313,7 @@ extern void remove_active_range(unsigned int nid, unsigned long start_pfn,
 					unsigned long end_pfn);
 extern void remove_all_active_ranges(void);
 void sort_node_map(void);
+unsigned long node_map_pfn_alignment(void);
 unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
 						unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e8985a..9119faa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4585,6 +4585,60 @@ void __init sort_node_map(void)
 			cmp_node_active_region, NULL);
 }
 
+/**
+ * node_map_pfn_alignment - determine the maximum internode alignment
+ *
+ * This function should be called after node map is populated and sorted.
+ * It calculates the maximum power of two alignment which can distinguish
+ * all the nodes.
+ *
+ * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
+ * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
+ * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
+ * shifted, 1GiB is enough and this function will indicate so.
+ *
+ * This is used to test whether pfn -> nid mapping of the chosen memory
+ * model has fine enough granularity to avoid incorrect mapping for the
+ * populated node map.
+ *
+ * Returns the determined alignment in pfn's.  0 if there is no alignment
+ * requirement (single node).
+ */
+unsigned long __init node_map_pfn_alignment(void)
+{
+	unsigned long accl_mask = 0, last_end = 0;
+	int last_nid = -1;
+	int i;
+
+	for_each_active_range_index_in_nid(i, MAX_NUMNODES) {
+		int nid = early_node_map[i].nid;
+		unsigned long start = early_node_map[i].start_pfn;
+		unsigned long end = early_node_map[i].end_pfn;
+		unsigned long mask;
+
+		if (!start || last_nid < 0 || last_nid == nid) {
+			last_nid = nid;
+			last_end = end;
+			continue;
+		}
+
+		/*
+		 * Start with a mask granular enough to pin-point to the
+		 * start pfn and tick off bits one-by-one until it becomes
+		 * too coarse to separate the current node from the last.
+		 */
+		mask = ~((1 << __ffs(start)) - 1);
+		while (mask && last_end <= (start & (mask << 1)))
+			mask <<= 1;
+
+		/* accumulate all internode masks */
+		accl_mask |= mask;
+	}
+
+	/* convert mask to number of pages */
+	return ~accl_mask + 1;
+}
+
 /* Find the lowest pfn for a node */
 static unsigned long __init find_min_pfn_for_node(int nid)
 {
-- 
1.7.6

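A minimal sketch of the rejection test as updated in this version,
assuming 4KiB pages.  The nid_in_page_flags parameter is a hypothetical
runtime stand-in for the compile-time NODE_NOT_IN_PAGE_FLAGS guard, and
the sketch_* names are not kernel symbols.  The widening to 64 bits
before shifting mirrors what the (u64) cast and PFN_PHYS() do in the two
postings, so that a pfn at or above 4GiB does not overflow a 32bit
unsigned long.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4KiB pages */

/* Pages -> MiB, widening first so the left shift cannot overflow. */
static uint64_t sketch_pfn_to_mib(unsigned long pfn)
{
	return ((uint64_t)pfn << SKETCH_PAGE_SHIFT) >> 20;
}

/*
 * Returns 1 if the NUMA config should be rejected, 0 otherwise.
 * pfn_align == 0 means there is no alignment requirement.
 */
static int sketch_reject_numa(unsigned long pfn_align,
			      unsigned long pages_per_section,
			      int nid_in_page_flags)
{
	assert(pages_per_section != 0);

	/* nid stored in page->flags: the sections array is not consulted. */
	if (nid_in_page_flags)
		return 0;

	/* Node boundaries finer than one section cannot be mapped. */
	return pfn_align && pfn_align < pages_per_section;
}
```

With 512MiB sections (0x20000 pages) and nodes aligned to 128MiB
(0x8000 pages), sketch_reject_numa(0x8000, 0x20000, 0) returns 1, while
the same layout with nid in page->flags is accepted.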

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/numa] x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/
  2011-07-12  7:44 [PATCH x86/mm 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ Tejun Heo
  2011-07-12  7:45 ` [PATCH x86/mm 2/2] x86: Implement pfn -> nid mapping granularity check Tejun Heo
@ 2011-07-13  5:33 ` tip-bot for Tejun Heo
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Tejun Heo @ 2011-07-13  5:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: hans.rosenfeld, linux-kernel, hpa, mingo, tj, tglx, hpa

Commit-ID:  d0ead157387f19801beb1b419568723b2e9b7c79
Gitweb:     http://git.kernel.org/tip/d0ead157387f19801beb1b419568723b2e9b7c79
Author:     Tejun Heo <tj@kernel.org>
AuthorDate: Tue, 12 Jul 2011 09:44:22 +0200
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 12 Jul 2011 21:58:11 -0700

x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/

DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to
SPARSEMEM; however, it calls each mapping unit ELEMENT instead of
SECTION.  This patch renames it to SECTION so that PAGES_PER_SECTION
is valid for both DISCONTIGMEM and SPARSEMEM.  This will be used by
the next patch to implement mapping granularity check.

This patch is a trivial constant rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110712074422.GA2872@htj.dyndns.org
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/mmzone_32.h |    6 +++---
 arch/x86/mm/numa_32.c            |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index ffa037f..55728e1 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -34,15 +34,15 @@ static inline void resume_map_numa_kva(pgd_t *pgd) {}
  *    64Gb / 4096bytes/page = 16777216 pages
  */
 #define MAX_NR_PAGES 16777216
-#define MAX_ELEMENTS 1024
-#define PAGES_PER_ELEMENT (MAX_NR_PAGES/MAX_ELEMENTS)
+#define MAX_SECTIONS 1024
+#define PAGES_PER_SECTION (MAX_NR_PAGES/MAX_SECTIONS)
 
 extern s8 physnode_map[];
 
 static inline int pfn_to_nid(unsigned long pfn)
 {
 #ifdef CONFIG_NUMA
-	return((int) physnode_map[(pfn) / PAGES_PER_ELEMENT]);
+	return((int) physnode_map[(pfn) / PAGES_PER_SECTION]);
 #else
 	return 0;
 #endif
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 849a975..3adebe7 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -41,7 +41,7 @@
  *     physnode_map[16-31] = 1;
  *     physnode_map[32- ] = -1;
  */
-s8 physnode_map[MAX_ELEMENTS] __read_mostly = { [0 ... (MAX_ELEMENTS - 1)] = -1};
+s8 physnode_map[MAX_SECTIONS] __read_mostly = { [0 ... (MAX_SECTIONS - 1)] = -1};
 EXPORT_SYMBOL(physnode_map);
 
 void memory_present(int nid, unsigned long start, unsigned long end)
@@ -52,8 +52,8 @@ void memory_present(int nid, unsigned long start, unsigned long end)
 			nid, start, end);
 	printk(KERN_DEBUG "  Setting physnode_map array to node %d for pfns:\n", nid);
 	printk(KERN_DEBUG "  ");
-	for (pfn = start; pfn < end; pfn += PAGES_PER_ELEMENT) {
-		physnode_map[pfn / PAGES_PER_ELEMENT] = nid;
+	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
+		physnode_map[pfn / PAGES_PER_SECTION] = nid;
 		printk(KERN_CONT "%lx ", pfn);
 	}
 	printk(KERN_CONT "\n");

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/numa] x86, numa: Implement pfn -> nid mapping granularity check
       [not found]     ` <20110628174613.GP478@escobedo.osrc.amd.com>
  2011-06-29  9:44       ` 32bit NUMA and fakeNUMA broken for AMD CPUs Tejun Heo
@ 2011-07-13  5:34       ` tip-bot for Tejun Heo
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot for Tejun Heo @ 2011-07-13  5:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hans.rosenfeld, linux-kernel, hpa, mingo, conny.seidel, tj, tglx, hpa

Commit-ID:  1e01979c8f502ac13e3cdece4f38712c5944e6e8
Gitweb:     http://git.kernel.org/tip/1e01979c8f502ac13e3cdece4f38712c5944e6e8
Author:     Tejun Heo <tj@kernel.org>
AuthorDate: Tue, 12 Jul 2011 09:45:34 +0200
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 12 Jul 2011 21:58:29 -0700

x86, numa: Implement pfn -> nid mapping granularity check

SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use
sections array to map pfn to nid which is limited in granularity.  If
NUMA nodes are laid out such that the mapping cannot be accurate, boot
will fail triggering BUG_ON() in mminit_verify_page_links().

On 32bit, the granularity is 512MiB w/ PAE and SPARSEMEM.  This seems
to have been granular enough until commit 2706a0bf7b (x86, NUMA: Enable
CONFIG_AMD_NUMA on 32bit too).  Apparently, there is a machine which
aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT.  This
led to the following BUG_ON().

 On node 0 totalpages: 2096615
   DMA zone: 32 pages used for memmap
   DMA zone: 0 pages reserved
   DMA zone: 3927 pages, LIFO batch:0
   Normal zone: 1740 pages used for memmap
   Normal zone: 220978 pages, LIFO batch:31
   HighMem zone: 16405 pages used for memmap
   HighMem zone: 1853533 pages, LIFO batch:31
 BUG: Int 6: CR2   (null)
      EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
      EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
      err   (null)  EIP c16209aa   CS 00000060  flg 00010002
 Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
          (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
        f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
 Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
 Call Trace:
  [<c136b1e5>] ? early_fault+0x2e/0x2e
  [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
  [<c1620613>] ? memmap_init_zone+0xaf/0x10c
  [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
  [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
  [<c1601d80>] ? paging_init+0x112/0x118
  [<c15f578d>] ? setup_arch+0x791/0x82f
  [<c15f43d9>] ? start_kernel+0x6a/0x257

This patch implements node_map_pfn_alignment(), which determines the
maximum internode alignment, and updates numa_register_memblks() to
reject the NUMA configuration if that alignment is finer than the
pfn -> nid mapping granularity of the memory model as determined by
PAGES_PER_SECTION.

This makes the problematic machine boot w/ flatmem by rejecting the
NUMA config and provides protection against crazy NUMA configurations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org
LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/numa.c |   15 ++++++++++++++
 include/linux/mm.h |    1 +
 mm/page_alloc.c    |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index f5510d8..fbeaaf4 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -496,6 +496,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
+	unsigned long uninitialized_var(pfn_align);
 	int i, nid;
 
 	/* Account for nodes with cpus and no memory */
@@ -511,6 +512,20 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 
 	/* for out of order entries */
 	sort_node_map();
+
+	/*
+	 * If sections array is gonna be used for pfn -> nid mapping, check
+	 * whether its granularity is fine enough.
+	 */
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+	pfn_align = node_map_pfn_alignment();
+	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
+		printk(KERN_WARNING "Node alignment %LuMB < min %LuMB, rejecting NUMA config\n",
+		       PFN_PHYS(pfn_align) >> 20,
+		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
+		return -EINVAL;
+	}
+#endif
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9670f71..c70a326 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1313,6 +1313,7 @@ extern void remove_active_range(unsigned int nid, unsigned long start_pfn,
 					unsigned long end_pfn);
 extern void remove_all_active_ranges(void);
 void sort_node_map(void);
+unsigned long node_map_pfn_alignment(void);
 unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
 						unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e8985a..9119faa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4585,6 +4585,60 @@ void __init sort_node_map(void)
 			cmp_node_active_region, NULL);
 }
 
+/**
+ * node_map_pfn_alignment - determine the maximum internode alignment
+ *
+ * This function should be called after node map is populated and sorted.
+ * It calculates the maximum power of two alignment which can distinguish
+ * all the nodes.
+ *
+ * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
+ * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
+ * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
+ * shifted, 1GiB is enough and this function will indicate so.
+ *
+ * This is used to test whether pfn -> nid mapping of the chosen memory
+ * model has fine enough granularity to avoid incorrect mapping for the
+ * populated node map.
+ *
+ * Returns the determined alignment in pfn's.  0 if there is no alignment
+ * requirement (single node).
+ */
+unsigned long __init node_map_pfn_alignment(void)
+{
+	unsigned long accl_mask = 0, last_end = 0;
+	int last_nid = -1;
+	int i;
+
+	for_each_active_range_index_in_nid(i, MAX_NUMNODES) {
+		int nid = early_node_map[i].nid;
+		unsigned long start = early_node_map[i].start_pfn;
+		unsigned long end = early_node_map[i].end_pfn;
+		unsigned long mask;
+
+		if (!start || last_nid < 0 || last_nid == nid) {
+			last_nid = nid;
+			last_end = end;
+			continue;
+		}
+
+		/*
+		 * Start with a mask granular enough to pin-point to the
+		 * start pfn and tick off bits one-by-one until it becomes
+		 * too coarse to separate the current node from the last.
+		 */
+		mask = ~((1 << __ffs(start)) - 1);
+		while (mask && last_end <= (start & (mask << 1)))
+			mask <<= 1;
+
+		/* accumulate all internode masks */
+		accl_mask |= mask;
+	}
+
+	/* convert mask to number of pages */
+	return ~accl_mask + 1;
+}
+
 /* Find the lowest pfn for a node */
 static unsigned long __init find_min_pfn_for_node(int nid)
 {
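
To make the mask arithmetic in node_map_pfn_alignment() easier to follow,
here is a userspace sketch of the same loop. It rests on several
assumptions: struct node_range stands in for early_node_map[],
lowest_set_bit() stands in for __ffs(), and numa_config_acceptable() is a
hypothetical wrapper around the check added to numa_register_memblks().
All three names are made up for this example. The sketch also shifts 1UL
rather than 1 so the shift is done in unsigned long, which differs from
the patch text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one early_node_map[] entry (sorted by pfn). */
struct node_range {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Index of the lowest set bit, like __ffs(); word must be non-zero. */
static unsigned long lowest_set_bit(unsigned long word)
{
	unsigned long bit = 0;

	assert(word != 0);
	while (!(word & 1UL)) {
		word >>= 1;
		bit++;
	}
	return bit;
}

/* Returns the coarsest power-of-two pfn alignment separating the nodes. */
unsigned long node_map_pfn_alignment(const struct node_range *map, size_t n)
{
	unsigned long accl_mask = 0, last_end = 0;
	int last_nid = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long start = map[i].start_pfn;
		unsigned long mask;

		/* Same node as before (or first range): just remember it. */
		if (!start || last_nid < 0 || last_nid == map[i].nid) {
			last_nid = map[i].nid;
			last_end = map[i].end_pfn;
			continue;
		}

		/*
		 * Start with a mask that pin-points start, then coarsen it
		 * while the rounded-down start still lies at or beyond the
		 * end of the previous node.
		 */
		mask = ~((1UL << lowest_set_bit(start)) - 1);
		while (mask && last_end <= (start & (mask << 1)))
			mask <<= 1;

		/* accumulate all internode masks */
		accl_mask |= mask;
	}

	/* convert mask to number of pages; 0 when nothing was accumulated */
	return ~accl_mask + 1;
}

/* Hypothetical wrapper: non-zero if the layout passes the new check. */
int numa_config_acceptable(unsigned long pfn_align,
			   unsigned long pages_per_section)
{
	return !(pfn_align && pfn_align < pages_per_section);
}
```

With 4KiB pages, two 1GiB nodes at pfns 0x0-0x40000 and 0x40000-0x80000
yield 0x40000 (1GiB). Moving the boundary to 0x48000 yields 0x8000
(128MiB), which is below the 0x20000 pfns (512MiB) per section that the
changelog quotes for PAE and SPARSEMEM, so that layout would be rejected.
If node 0 ends at 0x30000 and node 1 starts at 0x48000, the hole lets the
result relax back to 0x40000.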


Thread overview: 28+ messages
2011-06-21 15:41 32bit NUMA and fakeNUMA broken for AMD CPUs Conny Seidel
2011-06-26 10:22 ` Tejun Heo
     [not found]   ` <20110626223807.47cef5c6.conny.seidel_amd.com@marah.osrc.amd.com>
2011-06-28  9:41     ` [PATCH tip:x86/urgent] x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines Tejun Heo
2011-06-28 12:35       ` Conny Seidel
2011-07-01 15:26       ` [tip:x86/urgent] " tip-bot for Tejun Heo
     [not found]     ` <20110628174613.GP478@escobedo.osrc.amd.com>
2011-06-29  9:44       ` 32bit NUMA and fakeNUMA broken for AMD CPUs Tejun Heo
2011-06-29 10:51         ` Tejun Heo
2011-06-29 12:34         ` Tejun Heo
2011-06-29 12:55           ` Hans Rosenfeld
2011-06-29 13:03             ` Tejun Heo
2011-06-29 16:15               ` Tejun Heo
2011-06-30 13:13                 ` Hans Rosenfeld
2011-06-30 15:55                   ` Tejun Heo
2011-06-30 16:32                     ` Hans Rosenfeld
2011-06-30 16:42                       ` Tejun Heo
2011-06-30 17:04                         ` Hans Rosenfeld
2011-07-01 16:22         ` [PATCH x86/urgent 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ Tejun Heo
2011-07-01 16:23           ` [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check Tejun Heo
2011-07-09  8:32             ` Tejun Heo
2011-07-09  8:42               ` H. Peter Anvin
2011-07-11  8:34                 ` [PATCH x86/urgent] x86: Disable AMD_NUMA for 32bit for now Tejun Heo
2011-07-11 14:01                   ` Tejun Heo
2011-07-11 18:58                   ` [tip:x86/urgent] " tip-bot for Tejun Heo
2011-07-11 14:20                 ` [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check Hans Rosenfeld
2011-07-13  5:34       ` [tip:x86/numa] x86, numa: " tip-bot for Tejun Heo
2011-07-12  7:44 [PATCH x86/mm 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ Tejun Heo
2011-07-12  7:45 ` [PATCH x86/mm 2/2] x86: Implement pfn -> nid mapping granularity check Tejun Heo
2011-07-13  5:33 ` [tip:x86/numa] x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ tip-bot for Tejun Heo
