From: Jack Steiner <steiner@sgi.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/3] x86: fix node_possible_map logic -v2
Date: Mon, 11 May 2009 12:53:12 -0500
Message-ID: <20090511175312.GA27905@sgi.com>
In-Reply-To: <4A0527CB.4020807@kernel.org>

On Fri, May 08, 2009 at 11:50:51PM -0700, Yinghai Lu wrote:
> 
> recently there are some changes to about meaning of node_possible_map
> 
> and it is some strange:
> the node without memory would be set in node_possible_map
> but some node with less NODE_MIN_SIZE will be kicked out of node_possible_map.
> 
> try to fix it by adding strict_setup_node_bootmem.
> also remove unparse_node.

I still see the same panic. Entry 0 of the node_data array is NULL, and
it is dereferenced while building the zonelists.
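
To illustrate the failure mode described above, here is a minimal user-space
sketch. It is not the kernel's zonelist code and not a proposed fix. The names
sketch_pgdat, sketch_node_data and sketch_total_pages are hypothetical
stand-ins, and the page count is taken from the node 1 total in the 2.6.27 log
in this message. It only shows that a walk over a per-node pointer array has
to tolerate a NULL entry, which is what entry 0 is in the panicking case.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 2	/* hypothetical; the kernel uses MAX_NUMNODES */

/* Simplified, hypothetical stand-in for a per-node descriptor. */
struct sketch_pgdat {
	int node_id;
	unsigned long present_pages;
};

/* Node 1 has memory; node 0 never got a descriptor, so its slot is NULL. */
static struct sketch_pgdat sketch_node1 = { .node_id = 1, .present_pages = 65030 };
static struct sketch_pgdat *sketch_node_data[SKETCH_MAX_NODES] = {
	NULL, &sketch_node1
};

/*
 * Sum the pages of every node that has a descriptor. The number of NULL
 * slots that were skipped is returned through *skipped.
 */
static unsigned long sketch_total_pages(int *skipped)
{
	unsigned long total = 0;
	int nid;

	assert(skipped != NULL);
	*skipped = 0;
	for (nid = 0; nid < SKETCH_MAX_NODES; nid++) {
		struct sketch_pgdat *pgdat = sketch_node_data[nid];

		/* Without this check, pgdat->present_pages faults for node 0. */
		if (!pgdat) {
			(*skipped)++;
			continue;
		}
		total += pgdat->present_pages;
	}
	return total;
}
```

Calling sketch_total_pages() with the array above skips one slot and counts
only node 1's pages. The 2.6.27 behavior shown in the log avoids the problem
differently: it allocates NODE_DATA(0) on node 1, so the slot is never NULL.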

I'm sure that you are way ahead of me in diagnosing this problem, but
this is a regression from previous behavior. For example, in 2.6.27, node_data
is created for both nodes even though node 0 contains no memory:

	(2.6.27)
	<6>SRAT: PXM 0 -> APIC 0 -> Node 0
	<6>SRAT: PXM 1 -> APIC 128 -> Node 1
	<6>SRAT: Node 1 PXM 1 0-fff6c000
	<7>NUMA: Using 63 for the hash shift.
	<6>Bootmem setup node 0 0000000000000000-0000000000000000
	<3>Cannot find 212992 bytes in node 0
	<6>Bootmem setup node 1 0000000000000000-0000000010000000
	<6>  NODE_DATA [000000000139be80 - 00000000013cfe7f]
	<6>  bootmap [00000000013d0000 -  00000000013d1fff] pages 2
	<6>(7 early reservations) ==> bootmem [0000000000 - 0010000000]
	<6>  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
	<6>  #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
	<6>  #2 [0000200000 - 000139be38]    TEXT DATA BSS ==> [0000200000 - 000139be38]
	<6>  #3 [000009f000 - 00000e0900]    BIOS reserved ==> [000009f000 - 00000e0900]
	<6>  #4 [00000e0a68 - 0000100000]    BIOS reserved ==> [00000e0a68 - 0000100000]
	<6>  #5 [00000e0900 - 00000e0a68]       EFI memmap ==> [00000e0900 - 00000e0a68]
	<6>  #6 [0000001000 - 0000001030]        ACPI SLIT ==> [0000001000 - 0000001030]
	<6>Bootmem setup node 0 0000000000000000-0000000000000000
	<6>  NODE_DATA [00000000013d2000 - 0000000001405fff]
	<6>  bootmap [0000000000000000 -  ffffffffffffffff] pages 0
	<6>(7 early reservations) ==> bootmem [0000000000 - 0000000000]
	<6>  #0 [0000000000 - 0000001000]   BIOS data page
	<6>  #1 [0000006000 - 0000008000]       TRAMPOLINE
	<6>  #2 [0000200000 - 000139be38]    TEXT DATA BSS
	<6>  #3 [000009f000 - 00000e0900]    BIOS reserved
	<6>  #4 [00000e0a68 - 0000100000]    BIOS reserved
	<6>  #5 [00000e0900 - 00000e0a68]       EFI memmap
	<6>  #6 [0000001000 - 0000001030]        ACPI SLIT
	<6>    NODE_DATA(0) on node 1
	<6>    bootmap(0) on node 1
	<7> [ffffe20000000000-ffffe200003fffff] PMD -> [ffff880001600000-ffff8800019fffff] on node 1
	<4>Zone PFN ranges:
	<4>  DMA      0x00000000 -> 0x00001000
	<4>  DMA32    0x00001000 -> 0x00100000
	<4>  Normal   0x00100000 -> 0x00100000
	<4>Movable zone start PFN for each node
	<4>early_node_map[2] active PFN ranges
	<4>    1: 0x00000000 -> 0x00000006
	<4>    1: 0x00000200 -> 0x00010000
	<4>Could not find start_pfn for node 0
	<7>On node 0 totalpages: 0
	<7>On node 1 totalpages: 65030
	<7>  DMA zone: 3427 pages, LIFO batch:0
	<7>  DMA32 zone: 60480 pages, LIFO batch:15

I have not seen any problems running on 2.6.27 using nodes that have no memory.


Do we have a clear and unambiguous definition of what a node really is?
In this case, is a board (socket) with cpus and a unique PXM but no memory
considered a node? Even though it has no memory, it is a node (depending on the
definition of "node") for purposes such as scheduling. The memoryless node also
has local IO buses that want to direct interrupts to node-local cpus.
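
One way to make the question above concrete is to track each property of a
node separately instead of folding them into a single "is a node" bit. The
sketch below is a user-space illustration under that assumption. The enum,
the masks and the rule "online if it has memory or cpus" are all hypothetical
and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_NODES 64	/* hypothetical limit for this sketch */

/* Hypothetical per-property masks, one bit per node id. */
enum sketch_state {
	SK_POSSIBLE,	/* node id may exist on this system */
	SK_ONLINE,	/* node is usable for at least one purpose */
	SK_HAS_MEMORY,	/* node owns some RAM */
	SK_HAS_CPU,	/* node owns at least one cpu */
	SK_NR_STATES
};

static unsigned long long sketch_mask[SK_NR_STATES];

static void sketch_set(int nid, enum sketch_state s)
{
	assert(nid >= 0 && nid < SKETCH_MAX_NODES);
	sketch_mask[s] |= 1ULL << nid;
}

static bool sketch_isset(int nid, enum sketch_state s)
{
	assert(nid >= 0 && nid < SKETCH_MAX_NODES);
	return (sketch_mask[s] >> nid) & 1ULL;
}

/* Record a node from two raw facts: its memory size and its cpu count. */
static void sketch_register(int nid, unsigned long long mem_bytes, int ncpus)
{
	sketch_set(nid, SK_POSSIBLE);
	if (mem_bytes)
		sketch_set(nid, SK_HAS_MEMORY);
	if (ncpus > 0)
		sketch_set(nid, SK_HAS_CPU);
	/* Assumed rule: either resource is enough to bring the node online. */
	if (mem_bytes || ncpus > 0)
		sketch_set(nid, SK_ONLINE);
}
```

Registering a node with cpus but zero bytes of memory leaves it online and
visible to the scheduler and to interrupt routing, while memory allocation
code can test SK_HAS_MEMORY and skip it.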



> 
> so result will be:
> 1. cpu_to_node will return online node only (nearest one)
> 2. apicid_to_node still return the node that could be not online but is set
>    in node_possible_map.
> 3. node_possible_map will include nodes that mem on it are less NODE_MIN_SIZE
> 
> v2: after move_cpus_to_node change.
> 
> [ Impact: get node_possible_map right ]
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  arch/x86/include/asm/numa_64.h |    4 ++++
>  arch/x86/mm/numa_64.c          |    7 +++++++
>  arch/x86/mm/srat_64.c          |   29 ++---------------------------
>  3 files changed, 13 insertions(+), 27 deletions(-)
> 
> Index: linux-2.6/arch/x86/mm/srat_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/srat_64.c
> +++ linux-2.6/arch/x86/mm/srat_64.c
> @@ -36,10 +36,6 @@ static int num_node_memblks __initdata;
>  static struct bootnode node_memblk_range[NR_NODE_MEMBLKS] __initdata;
>  static int memblk_nodeid[NR_NODE_MEMBLKS] __initdata;
>  
> -/* Too small nodes confuse the VM badly. Usually they result
> -   from BIOS bugs. */
> -#define NODE_MIN_SIZE (4*1024*1024)
> -
>  static __init int setup_node(int pxm)
>  {
>  	return acpi_map_pxm_to_node(pxm);
> @@ -338,17 +334,6 @@ static int __init nodes_cover_memory(con
>  	return 1;
>  }
>  
> -static void __init unparse_node(int node)
> -{
> -	int i;
> -	node_clear(node, nodes_parsed);
> -	node_clear(node, cpu_nodes_parsed);
> -	for (i = 0; i < MAX_LOCAL_APIC; i++) {
> -		if (apicid_to_node[i] == node)
> -			apicid_to_node[i] = NUMA_NO_NODE;
> -	}
> -}
> -
>  void __init acpi_numa_arch_fixup(void) {}
>  
>  /* Use the information discovered above to actually set up the nodes. */
> @@ -360,18 +345,8 @@ int __init acpi_scan_nodes(unsigned long
>  		return -1;
>  
>  	/* First clean up the node list */
> -	for (i = 0; i < MAX_NUMNODES; i++) {
> +	for (i = 0; i < MAX_NUMNODES; i++)
>  		cutoff_node(i, start, end);
> -		/*
> -		 * don't confuse VM with a node that doesn't have the
> -		 * minimum memory.
> -		 */
> -		if (nodes[i].end &&
> -			(nodes[i].end - nodes[i].start) < NODE_MIN_SIZE) {
> -			unparse_node(i);
> -			node_set_offline(i);
> -		}
> -	}
>  
>  	if (!nodes_cover_memory(nodes)) {
>  		bad_srat();
> @@ -404,7 +379,7 @@ int __init acpi_scan_nodes(unsigned long
>  
>  		if (node == NUMA_NO_NODE)
>  			continue;
> -		if (!node_isset(node, node_possible_map))
> +		if (!node_online(node))
>  			numa_clear_node(i);
>  	}
>  	numa_init_array();
> Index: linux-2.6/arch/x86/mm/numa_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/numa_64.c
> +++ linux-2.6/arch/x86/mm/numa_64.c
> @@ -192,6 +192,13 @@ void __init setup_node_bootmem(int nodei
>  	if (!end)
>  		return;
>  
> +	/*
> +	 * don't confuse VM with a node that doesn't have the
> +	 * minimum memory.
> +	 */
> +	if (end && (end - start) < NODE_MIN_SIZE)
> +		return;
> +
>  	start = roundup(start, ZONE_ALIGN);
>  
>  	printk(KERN_INFO "Bootmem setup node %d %016lx-%016lx\n", nodeid,
> Index: linux-2.6/arch/x86/include/asm/numa_64.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/numa_64.h
> +++ linux-2.6/arch/x86/include/asm/numa_64.h
> @@ -24,6 +24,10 @@ extern void setup_node_bootmem(int nodei
>  			       unsigned long end);
>  
>  #ifdef CONFIG_NUMA
> +/* Too small nodes confuse the VM badly. Usually they result
> +   from BIOS bugs. */
> +#define NODE_MIN_SIZE (4*1024*1024)
> +
>  extern void __init init_cpu_to_node(void);
>  extern void numa_set_node(int cpu, int node);
>  extern void numa_clear_node(int cpu);
