* [PATCH] Fix early panic issue on machines with memless node
From: Zhang, Yanmin @ 2009-05-05  3:15 UTC
  To: Jack Steiner, David Rientjes; +Cc: alex.shi, LKML, Ingo Molnar, Andi Kleen

Kernel 2.6.30-rc4 panics at boot with the parameter mem=2G on a Nehalem machine.
The machine has 2 nodes, and each node has about 3G of memory.

Alex Shi bisected the failure and located the bad patch.

commit dc098551918093901d8ac8936e9d1a1b891b56ed
Author: Jack Steiner <steiner@sgi.com>
Date:   Fri Apr 17 09:22:42 2009 -0500

    x86/uv: fix init of memory-less nodes
    
    Add support for nodes that have cpus but no memory.
    The current code was failing to add these nodes
    to the nodes_present_map.
    
    v2: Fixes case caught by David Rientjes - missed support
        for the x2apic SRAT table.
    
    [ Impact: fix potential boot crash on memory-less UV nodes. ]
    
    Reported-by: David Rientjes <rientjes@google.com>
    Signed-off-by: Jack Steiner <steiner@sgi.com>
    LKML-Reference: <20090417142242.GA23743@sgi.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>



With the earlyprintk boot parameter, we captured the dump below.

<6>bootmem::alloc_bootmem_core nid=0 size=0 [0 pages] align=1000 goal=1000000 lim0
PANIC: early exception 06 rip 10:ffffffff80a2fbe4 error 0 cr2 0
Pid: 0, comm: swapper Not tainted 2.6.30-rc4-ymz #3
Call Trace:                                        
 [<ffffffff80a1a195>] ? early_idt_handler+0x55/0x68  
 [<ffffffff80a2fbe4>] ? alloc_bootmem_core+0x91/0x2ae
 [<ffffffff80a2fbdc>] ? alloc_bootmem_core+0x89/0x2ae     
 [<ffffffff80a2fe74>] ? ___alloc_bootmem_nopanic+0x73/0xab
 [<ffffffff80a2af73>] ? early_node_mem+0x54/0x78      
 [<ffffffff80a2b0ed>] ? setup_node_bootmem+0x156/0x282
 [<ffffffff80a2b880>] ? acpi_scan_nodes+0x207/0x303
 [<ffffffff80a2b255>] ? initmem_init+0x3c/0x14c
 [<ffffffff80a1e33b>] ? setup_arch+0x5ba/0x760       
 [<ffffffff80a2e904>] ? cgroup_init_subsys+0xfc/0x105
 [<ffffffff80a2ea5f>] ? cgroup_init_early+0x152/0x163
 [<ffffffff80a1a915>] ? start_kernel+0x84/0x35e      
 [<ffffffff80a1a37e>] ? x86_64_start_kernel+0xe5/0xeb
RIP alloc_bootmem_core+0x91/0x2ae

Consider the call chain below:
acpi_scan_nodes =>
		setup_node_bootmem
			 (twice) => early_node_mem

At the beginning, acpi_scan_nodes filters out memless nodes by calling
unparse_node. Patch dc098551918 effectively adds those nodes back.
acpi_scan_nodes has many comments around unparse_node.

The patch below fixes it by checking the node's memory. Another option is
simply to revert the bad patch.
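
To illustrate the check, here is a minimal user-space sketch of the bootmap
size calculation. It assumes 4 KiB pages and models bootmem_bootmap_pages
as one bit per page, rounded up to a long and then to whole pages; that is
an assumption about mm/bootmem.c, not a copy of it. bootmap_pages_for and
node_has_bootmap are hypothetical names used only for this illustration.
A node whose pfn range is empty needs 0 bootmap pages, so a later bootmap
allocation for it would be a size=0 request like the one in the dump.

```c
/* Assumed values for x86-64 with 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Bytes for a bitmap with one bit per page, rounded up to sizeof(long). */
static unsigned long bootmap_bytes(unsigned long pages)
{
	unsigned long bytes = (pages + 7) / 8;
	unsigned long align = sizeof(long);

	return (bytes + align - 1) & ~(align - 1);
}

/* Whole pages needed to hold that bitmap (assumed model of
 * bootmem_bootmap_pages; hypothetical name). */
static unsigned long bootmap_pages_for(unsigned long pages)
{
	return (bootmap_bytes(pages) + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Hypothetical helper mirroring the early-return test in the patch:
 * returns 0 when the node spans no pages and should be skipped. */
static int node_has_bootmap(unsigned long start, unsigned long end)
{
	unsigned long start_pfn = start >> PAGE_SHIFT;
	unsigned long last_pfn = end >> PAGE_SHIFT;

	return bootmap_pages_for(last_pfn - start_pfn) != 0;
}
```

With these definitions, node_has_bootmap(0x80000000UL, 0x80000000UL)
returns 0 (an empty range, assuming the node is clipped to nothing by
mem=2G), while a node covering 0 to 2G spans 524288 pages and needs
16 bootmap pages.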

David Rientjes, Jack Steiner,
Would you check whether the patch below satisfies your original objective?


Signed-off-by: Shi Alex <alex.shi@intel.com>
Signed-off-by: Zhang Yanmin <yanmin.zhang@linux.intel.com>


---

--- linux-2.6.30-rc4/arch/x86/mm/numa_64.c	2009-05-05 09:20:05.000000000 +0800
+++ linux-2.6.30-rc4_memlessnode/arch/x86/mm/numa_64.c	2009-05-05 10:28:34.000000000 +0800
@@ -199,6 +199,10 @@ void __init setup_node_bootmem(int nodei
 	start_pfn = start >> PAGE_SHIFT;
 	last_pfn = end >> PAGE_SHIFT;
 
+	bootmap_pages = bootmem_bootmap_pages(last_pfn - start_pfn);
+	if (bootmap_pages == 0)
+		return;
+
 	node_data[nodeid] = early_node_mem(nodeid, start, end, pgdat_size,
 					   SMP_CACHE_BYTES);
 	if (node_data[nodeid] == NULL)
@@ -219,7 +223,6 @@ void __init setup_node_bootmem(int nodei
 	 * early_node_mem will get that with find_e820_area instead
 	 * of alloc_bootmem, that could clash with reserved range
 	 */
-	bootmap_pages = bootmem_bootmap_pages(last_pfn - start_pfn);
 	nid = phys_to_nid(nodedata_phys);
 	if (nid == nodeid)
 		bootmap_start = roundup(nodedata_phys + pgdat_size, PAGE_SIZE);



Thread overview: 12+ messages
2009-05-05  3:15 [PATCH] Fix early panic issue on machines with memless node Zhang, Yanmin
2009-05-05  3:32 ` David Rientjes
2009-05-05  5:55   ` Zhang, Yanmin
2009-05-05 16:36   ` Jack Steiner
2009-05-05 19:50     ` [patch] srat: do not register nodes beyond e820 map David Rientjes
2009-05-06  8:58       ` [tip:x86/urgent] x86, " tip-bot for David Rientjes
2009-05-05 19:52     ` [PATCH] Fix early panic issue on machines with memless node David Rientjes
2009-05-05 20:27       ` Jack Steiner
2009-05-05 20:41         ` David Rientjes
2009-05-06  5:19         ` Zhang, Yanmin
2009-05-06 14:38           ` Jack Steiner
2009-05-06  8:50       ` Ingo Molnar
