From: Jack Steiner <steiner@sgi.com>
To: David Rientjes <rientjes@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Yinghai Lu <yinghai@kernel.org>,
	Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andi Kleen <andi@firstfloor.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/3] x86: fix node_possible_map logic -v2
Date: Tue, 12 May 2009 10:06:22 -0500
Message-ID: <20090512150622.GA10015@sgi.com>
In-Reply-To: <alpine.DEB.2.00.0905111509570.2234@chino.kir.corp.google.com>

On Mon, May 11, 2009 at 03:25:39PM -0700, David Rientjes wrote:
> On Mon, 11 May 2009, H. Peter Anvin wrote:
> 
> > > In your example of two cpus (0-1) that are remote to the system's only
> > > memory and two cpus (2-3) that have affinity to that memory, it appears as
> > > though the kernel is considering cpus 2-3 and the memory to be a node and
> > > cpus 0-1 to be a memoryless node.
> > > 
> > > That's a pretty useless scenario for memoryless node support, actually,
> > > unless there's a third node with memory that cpus 0-1 have a different
> > > distance to.  cpus 0-1 have no memory that is local, so the "remote" memory
> > > should be considered local to them.
> > > 
> > 
> > Should it?  It seems to me that CPUs 0-1 should be antipreferentially
> > scheduled, since they will have slower access to the memory than CPUs 2-3.
> > Since in this case all the memory is in the same place you could argue that
> > SMP distances could do the same job, which is of course true.
> > 
> > However, consider now:
> > 
> > CPU [0-1]	- no memory
> > CPU [2-3]	- memory
> > CPU [4-5]	- memory
> > 
> > Each node is equidistant, but for the memory nodes there are differences
> > between their own local memory and the remote memory.
> > 
> > CPU [0-1] cannot be considered local in either node, since they are further
> > away from the memory than either, and furthermore, unlike either of the memory
> > nodes, they have no preference for memory from either of the other two nodes
> > (quite on the contrary; they would probably benefit from drawing from both.)
> > 
> 
> Right, there's no difference from Jack's scenario if the three nodes are 
> equidistant.  I was thinking of a topology where cpu 0-1 was closer to, 
> for example, cpu 2-3's memory than cpu 4-5's.

Agree.

We actually have configurations that match both scenarios above. The
system is a blade-based system with 2 processor sockets per blade.
Memory is socket attached and each socket is in a unique PXM.

For the case where 1 socket on a blade has memory & the other does not,
the memoryless socket is very close to its neighbor and much further from
memory on any other blade.

For the case where neither socket has memory, the blade is equidistant
from 14 nodes located on adjacent blades.
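
To make the two cases concrete, here is a rough sketch of how the set of
"closest" memory nodes differs between them. The helper name and all of the
distance values are made up for illustration; they are not taken from our
SLIT or from the kernel.

```python
def nearest_memory_nodes(distances, has_memory):
    """Return the memory nodes tied for the smallest distance.

    distances[n] is a SLIT-style distance from one memoryless node to node n;
    has_memory[n] says whether node n has OS-usable memory.
    """
    candidates = [n for n, mem in enumerate(has_memory) if mem]
    if not candidates:
        return []
    best = min(distances[n] for n in candidates)
    return [n for n in candidates if distances[n] == best]


# Case 1 (hypothetical numbers): node 0 is memoryless, node 1 is its
# neighbor on the same blade, nodes 2-3 live on other blades.
case1 = nearest_memory_nodes([10, 15, 40, 40], [False, True, True, True])

# Case 2 (hypothetical numbers): neither socket on the blade (nodes 0-1)
# has memory; 14 nodes on adjacent blades are all at the same distance.
case2 = nearest_memory_nodes([10, 15] + [40] * 14,
                             [False, False] + [True] * 14)
```

In the first case there is a single obvious choice (the neighbor socket); in
the second, all 14 nodes tie, so there is no single preferred node.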

One final point. In case you think this configuration makes no sense, the
sockets actually have memory. However, none of the memory is directly
accessible to the OS nor can it be referenced by cores located on the
processor sockets. The memory is reserved for high speed access to special
blade-attached IO devices. The IO devices need large 2**2n sized chunks of
memory. If the memory is fragmented so that a portion can be used by the
OS, then the max chunk size is reduced by a factor of 4.
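
The factor of 4 falls out of the 2**2n constraint: once any part of the
memory is carved off, the next allowed size down is 2**(2n-2). A small
sketch of that arithmetic, with a made-up helper name and made-up sizes
(1 GiB per socket, 64 MiB given to the OS), not the real hardware numbers:

```python
def largest_pow4_chunk(avail_bytes: int) -> int:
    """Largest 2**(2k) (power-of-4) size that fits in avail_bytes, or 0."""
    if avail_bytes < 1:
        return 0
    chunk = 1
    while chunk * 4 <= avail_bytes:
        chunk *= 4
    return chunk


full = 4 ** 15                # hypothetical socket memory: 2**30 bytes
reduced = full - (64 << 20)   # hypothetical: 64 MiB handed to the OS

chunk_full = largest_pow4_chunk(full)
chunk_reduced = largest_pow4_chunk(reduced)
ratio = chunk_full // chunk_reduced
```

With these numbers the unfragmented memory allows one 2**30 chunk, while the
fragmented memory allows at most a 2**28 chunk.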

> 
> The particular topology you're referring to should have a slit that 
> describes the relative distances in each direction differently.  The pxms 
> that these cpus belong to will always be local to itself, but ACPI 3.0 
> allows distances for different directions between the same pxms to be 
> different.
> 
> That means it's possible that cpus 0-1 above have local distance to all 
> memory and cpus 2-3 (and cpus 4-5) have remote distance to all nodes other 
> than itself.
> 
> numactl --hardware would show something like this:
> 
> 		0	1	2
> 	0	10	10	10
> 	1	20	10	20
> 	2	20	20	10
> 
> which is valid according to the ACPI specification.  This is based on the 
> pxms to which the cpus belong so this topology would describe all members 
> of those pxms and not just memory.

The BIOS currently defines unique PXMs for all nodes as implied above. The
SLIT currently looks like:
 		0	1	2
 	0	10	20	20
 	1	20	10	20
 	2	20	20	10

but I understand your point. This is an easy fix.
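
For comparison, a quick sketch that checks both tables and lists where they
are asymmetric. The checks only cover what I am sure of from the ACPI SLIT
definition (local distance is 10, values below 10 are reserved); the function
names are made up, and the kernel's own SLIT validation may well be stricter
than this.

```python
LOCAL_DISTANCE = 10  # ACPI SLIT value for a PXM's distance to itself


def check_slit(slit):
    """Return a list of problems; an empty list means these basic checks pass."""
    problems = []
    n = len(slit)
    for i, row in enumerate(slit):
        if len(row) != n:
            problems.append(f"row {i} has {len(row)} entries, expected {n}")
            continue
        for j, d in enumerate(row):
            if i == j and d != LOCAL_DISTANCE:
                problems.append(f"[{i}][{j}] is {d}, expected {LOCAL_DISTANCE}")
            elif d < LOCAL_DISTANCE:
                problems.append(f"[{i}][{j}] is {d}, values below 10 are reserved")
    return problems


def asymmetric_pairs(slit):
    """Return (i, j) pairs, i < j, whose two directions differ (square table)."""
    n = len(slit)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if slit[i][j] != slit[j][i]]


# The table the BIOS reports today.
current = [[10, 20, 20],
           [20, 10, 20],
           [20, 20, 10]]

# The table from David's numactl --hardware example.
proposed = [[10, 10, 10],
            [20, 10, 20],
            [20, 20, 10]]
```

The current table is fully symmetric; the proposed one differs by direction
only for the pairs involving node 0.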


--- jack
