* Externalize SLIT table
@ 2004-11-03 20:56 Jack Steiner
  2004-11-04  1:59   ` Takayoshi Kochi
  0 siblings, 1 reply; 58+ messages in thread
From: Jack Steiner @ 2004-11-03 20:56 UTC (permalink / raw)
  To: linux-ia64


The SLIT table provides useful information on internode
distances. Has anyone considered externalizing this
table via /proc or some equivalent mechanism?

For example, something like the following would be useful:

	# cat /proc/acpi/slit
	010 066 046 066
	066 010 066 046
	046 066 010 020
	066 046 020 010


If this looks ok (or something equivalent), I'll generate a patch....

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-03 20:56 Externalize SLIT table Jack Steiner
@ 2004-11-04  1:59   ` Takayoshi Kochi
  0 siblings, 0 replies; 58+ messages in thread
From: Takayoshi Kochi @ 2004-11-04  1:59 UTC (permalink / raw)
  To: steiner; +Cc: linux-ia64, linux-kernel

Hi,

For wider audience, added LKML.

From: Jack Steiner <steiner@sgi.com>
Subject: Externalize SLIT table
Date: Wed, 3 Nov 2004 14:56:56 -0600

> The SLIT table provides useful information on internode
> distances. Has anyone considered externalizing this
> table via /proc or some equivalent mechanism.
> 
> For example, something like the following would be useful:
> 
> 	# cat /proc/acpi/slit
> 	010 066 046 066
> 	066 010 066 046
> 	046 066 010 020
> 	066 046 020 010
> 
> If this looks ok (or something equivalent), I'll generate a patch....

For user space to manipulate scheduling domains, pin processes
to particular cpu groups, etc., that kind of information is very useful!
Without it, users have no notion of how far apart two nodes are.

But the ACPI SLIT table is too arch-specific (ia64 and x86 only),
and the user-visible logical node number and the ACPI proximity
domain number are not always identical.

Why not export node_distance() under sysfs?
I like (1).

(1) obey one-value-per-file sysfs principle

% cat /sys/devices/system/node/node0/distance0
10
% cat /sys/devices/system/node/node0/distance1
66

(2) one distance for each line

% cat /sys/devices/system/node/node0/distance
0:10
1:66
2:46
3:66

(3) all distances in one line like /proc/<PID>/stat

% cat /sys/devices/system/node/node0/distance
10 66 46 66
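
For illustration, user space could consume format (3) with something
as small as the sketch below (the node0/distance path and layout are
the proposal above, not an existing interface):

	/* Sketch: parse "10 66 46 66" from the proposed node0/distance
	 * file (format (3) above).  Path and layout are part of the
	 * proposal, not an existing kernel interface. */
	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/sys/devices/system/node/node0/distance", "r");
		int d, n = 0;

		if (!f) {
			perror("node0/distance");
			return 1;
		}
		while (fscanf(f, "%d", &d) == 1)
			printf("distance(node0, node%d) = %d\n", n++, d);
		fclose(f);
		return 0;
	}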

---
Takayoshi Kochi

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04  1:59   ` Takayoshi Kochi
@ 2004-11-04  4:07     ` Andi Kleen
  -1 siblings, 0 replies; 58+ messages in thread
From: Andi Kleen @ 2004-11-04  4:07 UTC (permalink / raw)
  To: Takayoshi Kochi; +Cc: steiner, linux-ia64, linux-kernel

On Thu, Nov 04, 2004 at 10:59:08AM +0900, Takayoshi Kochi wrote:
> Hi,
> 
> For wider audience, added LKML.
> 
> From: Jack Steiner <steiner@sgi.com>
> Subject: Externalize SLIT table
> Date: Wed, 3 Nov 2004 14:56:56 -0600
> 
> > The SLIT table provides useful information on internode
> > distances. Has anyone considered externalizing this
> > table via /proc or some equivalent mechanism.
> > 
> > For example, something like the following would be useful:
> > 
> > 	# cat /proc/acpi/slit
> > 	010 066 046 066
> > 	066 010 066 046
> > 	046 066 010 020
> > 	066 046 020 010
> > 
> > If this looks ok (or something equivalent), I'll generate a patch....

This isn't very useful without information about proximity domains;
e.g. on x86-64 the proximity domain number is not necessarily
the same as the node number.


> For user space to manipulate scheduling domains, pinning processes
> to some cpu groups etc, that kind of information is very useful!
> Without this, users have no notion about how far between two nodes.

Also some reporting of _PXM for PCI devices is needed. I had an
experimental patch for this on x86-64 (not ACPI based) that
reported nearby nodes for PCI busses.

> 
> But ACPI SLIT table is too arch specific (ia64 and x86 only) and
> user-visible logical number and ACPI proximity domain number is
> not always identical.

Exactly.

> 
> Why not export node_distance() under sysfs?
> I like (1).
> 
> (1) obey one-value-per-file sysfs principle
> 
> % cat /sys/devices/system/node/node0/distance0
> 10

Surely distance from 0 to 0 is 0?

> % cat /sys/devices/system/node/node0/distance1
> 66

> 
> (2) one distance for each line
> 
> % cat /sys/devices/system/node/node0/distance
> 0:10
> 1:66
> 2:46
> 3:66
> 
> (3) all distances in one line like /proc/<PID>/stat
> 
> % cat /sys/devices/system/node/node0/distance
> 10 66 46 66

I would prefer that. 

-Andi

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04  4:07     ` Andi Kleen
@ 2004-11-04  4:57       ` Takayoshi Kochi
  -1 siblings, 0 replies; 58+ messages in thread
From: Takayoshi Kochi @ 2004-11-04  4:57 UTC (permalink / raw)
  To: ak; +Cc: steiner, linux-ia64, linux-kernel

Hi,

From: Andi Kleen <ak@suse.de>
Subject: Re: Externalize SLIT table
Date: Thu, 4 Nov 2004 05:07:13 +0100

> > Why not export node_distance() under sysfs?
> > I like (1).
> > 
> > (1) obey one-value-per-file sysfs principle
> > 
> > % cat /sys/devices/system/node/node0/distance0
> > 10
> 
> Surely distance from 0 to 0 is 0?

According to the ACPI spec, 10 means local and other values
are ratios relative to 10.  But what the distance number should
actually mean is ambiguous in the spec (e.g. some vendors interpret
it as memory access latency, others as memory throughput, etc.).
However, relative distance just works for most uses, I believe.

Anyway, we should clarify how the numbers should be interpreted
to avoid confusion.

How about this?
"The distance of a node to itself is the base value.  Distances to
other nodes are relative to the base value.
0 means that node is unreachable (hot-removed or disabled)."

(Just FYI, numbers 0-9 are reserved and 255 (unsigned char -1) means
unreachable, according to the ACPI spec.)
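
As a rough illustration of that interpretation, a consumer could turn
a SLIT entry into a relative factor like this (a sketch only; the
wording above is still a proposal, and the helper name is made up):

	/* Sketch: interpret a SLIT entry per the proposal above.
	 * 10 is the local base value, other entries are ratios to it,
	 * 255 (and, per the proposal, 0) means unreachable.
	 * Illustration only, not an existing kernel or library API. */
	static double slit_relative_distance(unsigned char slit)
	{
		if (slit == 0 || slit == 255)
			return -1.0;		/* unreachable */
		return slit / 10.0;		/* e.g. 20 => 2x local */
	}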

> > % cat /sys/devices/system/node/node0/distance1
> > 66
> 
> > 
> > (2) one distance for each line
> > 
> > % cat /sys/devices/system/node/node0/distance
> > 0:10
> > 1:66
> > 2:46
> > 3:66
> > 
> > (3) all distances in one line like /proc/<PID>/stat
> > 
> > % cat /sys/devices/system/node/node0/distance
> > 10 66 46 66
> 
> I would prefer that. 

Ah, I missed the following last sentence in
Documentation/filesystems/sysfs.txt:

|Attributes should be ASCII text files, preferably with only one value
|per file. It is noted that it may not be efficient to contain only
|value per file, so it is socially acceptable to express an array of
|values of the same type. 

If an array is acceptable, I would prefer (3), too.

---
Takayoshi Kochi

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04  4:57       ` Takayoshi Kochi
@ 2004-11-04  6:37         ` Andi Kleen
  -1 siblings, 0 replies; 58+ messages in thread
From: Andi Kleen @ 2004-11-04  6:37 UTC (permalink / raw)
  To: Takayoshi Kochi; +Cc: ak, steiner, linux-ia64, linux-kernel

On Thu, Nov 04, 2004 at 01:57:21PM +0900, Takayoshi Kochi wrote:
> Hi,
> 
> From: Andi Kleen <ak@suse.de>
> Subject: Re: Externalize SLIT table
> Date: Thu, 4 Nov 2004 05:07:13 +0100
> 
> > > Why not export node_distance() under sysfs?
> > > I like (1).
> > > 
> > > (1) obey one-value-per-file sysfs principle
> > > 
> > > % cat /sys/devices/system/node/node0/distance0
> > > 10
> > 
> > Surely distance from 0 to 0 is 0?
> 
> According to the ACPI spec, 10 means local and other values
> mean ratio to 10.  But what the distance number should mean

Ah, missed that. OK, I guess it makes sense to use the same
encoding as ACPI; no need to be intentionally different.

> mean is ambiguous from the spec (e.g. some veondors interpret as
> memory access latency, others interpret as memory throughput
> etc.)
> However relative distance just works for most of uses, I believe.
> 
> Anyway, we should clarify how the numbers should be interpreted
> to avoid confusion.

Defining it as "as defined in the ACPI spec" should be ok. 
I guess even non ACPI architectures will be able to live with that.

Anyways, since we seem to agree and so far nobody has complained
it's just that somebody needs to do a patch?  If possible make
it generic code in drivers/acpi/numa.c, there won't be anything architecture 
specific in this and it should work for x86-64 too.


-Andi

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04  1:59   ` Takayoshi Kochi
@ 2004-11-04 14:13     ` Jack Steiner
  -1 siblings, 0 replies; 58+ messages in thread
From: Jack Steiner @ 2004-11-04 14:13 UTC (permalink / raw)
  To: Takayoshi Kochi; +Cc: linux-ia64, linux-kernel

On Thu, Nov 04, 2004 at 10:59:08AM +0900, Takayoshi Kochi wrote:
> Hi,
> 
> For wider audience, added LKML.
> 
> From: Jack Steiner <steiner@sgi.com>
> Subject: Externalize SLIT table
> Date: Wed, 3 Nov 2004 14:56:56 -0600
> 
> > The SLIT table provides useful information on internode
> > distances. Has anyone considered externalizing this
> > table via /proc or some equivalent mechanism.
> > 
> > For example, something like the following would be useful:
> > 
> > 	# cat /proc/acpi/slit
> > 	010 066 046 066
> > 	066 010 066 046
> > 	046 066 010 020
> > 	066 046 020 010
> > 
> > If this looks ok (or something equivalent), I'll generate a patch....
> 
> For user space to manipulate scheduling domains, pinning processes
> to some cpu groups etc, that kind of information is very useful!
> Without this, users have no notion about how far between two nodes.
> 
> But ACPI SLIT table is too arch specific (ia64 and x86 only) and
> user-visible logical number and ACPI proximity domain number is
> not always identical.
> 
> Why not export node_distance() under sysfs?
> I like (1).
> 
> (1) obey one-value-per-file sysfs principle
> 
> % cat /sys/devices/system/node/node0/distance0
> 10
> % cat /sys/devices/system/node/node0/distance1
> 66

I'm not familiar with the internals of sysfs. For example, on a 256-node
system, there will be 65536 (256 x 256) instances of
	 /sys/devices/system/node/node<M>/distance<N>

Does this require a significant amount of kernel resources to
maintain that much information?




> 
> (2) one distance for each line
> 
> % cat /sys/devices/system/node/node0/distance
> 0:10
> 1:66
> 2:46
> 3:66
> 
> (3) all distances in one line like /proc/<PID>/stat
> 
> % cat /sys/devices/system/node/node0/distance
> 10 66 46 66
> 


I like (3) the best.

I think it would also be useful to have a similar cpu-to-cpu distance
metric:
	% cat /sys/devices/system/cpu/cpu0/distance
	10 20 40 60 

This gives the same information but is cpu-centric rather than
node centric.



-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04 14:13     ` Jack Steiner
@ 2004-11-04 14:29       ` Andi Kleen
  -1 siblings, 0 replies; 58+ messages in thread
From: Andi Kleen @ 2004-11-04 14:29 UTC (permalink / raw)
  To: Jack Steiner; +Cc: Takayoshi Kochi, linux-ia64, linux-kernel

On Thu, Nov 04, 2004 at 08:13:37AM -0600, Jack Steiner wrote:
> On Thu, Nov 04, 2004 at 10:59:08AM +0900, Takayoshi Kochi wrote:
> > Hi,
> > 
> > For wider audience, added LKML.
> > 
> > From: Jack Steiner <steiner@sgi.com>
> > Subject: Externalize SLIT table
> > Date: Wed, 3 Nov 2004 14:56:56 -0600
> > 
> > > The SLIT table provides useful information on internode
> > > distances. Has anyone considered externalizing this
> > > table via /proc or some equivalent mechanism.
> > > 
> > > For example, something like the following would be useful:
> > > 
> > > 	# cat /proc/acpi/slit
> > > 	010 066 046 066
> > > 	066 010 066 046
> > > 	046 066 010 020
> > > 	066 046 020 010
> > > 
> > > If this looks ok (or something equivalent), I'll generate a patch....
> > 
> > For user space to manipulate scheduling domains, pinning processes
> > to some cpu groups etc, that kind of information is very useful!
> > Without this, users have no notion about how far between two nodes.
> > 
> > But ACPI SLIT table is too arch specific (ia64 and x86 only) and
> > user-visible logical number and ACPI proximity domain number is
> > not always identical.
> > 
> > Why not export node_distance() under sysfs?
> > I like (1).
> > 
> > (1) obey one-value-per-file sysfs principle
> > 
> > % cat /sys/devices/system/node/node0/distance0
> > 10
> > % cat /sys/devices/system/node/node0/distance1
> > 66
> 
> I'm not familar with the internals of sysfs. For example, on a 256 node
> system, there will be 65536 instances of
> 	 /sys/devices/system/node/node<M>/distance<N>
> 
> Does this require any significant amount of kernel resources to
> maintain this amount of information.

Yes it does, even with the new sysfs backing store. And reading
it would create all the inodes and dentries, which are quite
bloated.

> 
> I think it would also be useful to have a similar cpu-to-cpu distance
> metric:
> 	% cat /sys/devices/system/cpu/cpu0/distance
> 	10 20 40 60 
> 
> This gives the same information but is cpu-centric rather than
> node centric.


And the same thing for PCI busses, as in the patch below. However,
for strict ACPI systems this information would need to be obtained
from _PXM first. x86-64 on Opteron currently reads it directly
from the hardware and uses it to allocate DMA memory near the device.

-Andi


diff -urpN -X ../KDIFX linux-2.6.8rc3/drivers/pci/pci-sysfs.c linux-2.6.8rc3-amd64/drivers/pci/pci-sysfs.c
--- linux-2.6.8rc3/drivers/pci/pci-sysfs.c	2004-07-27 14:44:10.000000000 +0200
+++ linux-2.6.8rc3-amd64/drivers/pci/pci-sysfs.c	2004-08-04 02:42:11.000000000 +0200
@@ -17,6 +17,7 @@
 #include <linux/kernel.h>
 #include <linux/pci.h>
 #include <linux/stat.h>
+#include <linux/topology.h>
 
 #include "pci.h"
 
@@ -38,6 +39,15 @@ pci_config_attr(subsystem_device, "0x%04
 pci_config_attr(class, "0x%06x\n");
 pci_config_attr(irq, "%u\n");
 
+static ssize_t local_cpus_show(struct device *dev, char *buf)
+{		
+	struct pci_dev *pdev = to_pci_dev(dev);
+	cpumask_t mask = pcibus_to_cpumask(pdev->bus->number);
+	int len = cpumask_scnprintf(buf, PAGE_SIZE-1, mask);
+	strcat(buf,"\n"); 
+	return 1+len;
+}
+
 /* show resources */
 static ssize_t
 resource_show(struct device * dev, char * buf)
@@ -67,6 +77,7 @@ struct device_attribute pci_dev_attrs[] 
 	__ATTR_RO(subsystem_device),
 	__ATTR_RO(class),
 	__ATTR_RO(irq),
+	__ATTR_RO(local_cpus),
 	__ATTR_NULL,
 };
 



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04 14:13     ` Jack Steiner
@ 2004-11-04 15:31       ` Erich Focht
  -1 siblings, 0 replies; 58+ messages in thread
From: Erich Focht @ 2004-11-04 15:31 UTC (permalink / raw)
  To: Jack Steiner; +Cc: Takayoshi Kochi, linux-ia64, linux-kernel

On Thursday 04 November 2004 15:13, Jack Steiner wrote:
> I think it would also be useful to have a similar cpu-to-cpu distance
> metric:
>         % cat /sys/devices/system/cpu/cpu0/distance
>         10 20 40 60 
> 
> This gives the same information but is cpu-centric rather than
> node centric.

I don't see the use of that once you have some way to find the logical
CPU to node number mapping. The "node distances" are meant to be
proportional to the memory access latency ratios (20 means 2 times
larger than local (intra-node) access, which is by definition 10). 
If the cpu_to_cpu distance is necessary because there is a hierarchy
in the memory blocks inside one node, then maybe the definition of a
node should be changed...

We currently have (at least in -mm kernels):
       % ls /sys/devices/system/node/node0/cpu*
for finding out which CPUs belong to which nodes. Together with
       /sys/devices/system/node/node0/distances
this should be enough for user space NUMA tools.
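
For illustration, a user-space sketch that derives a cpu-to-cpu
distance from exactly those two pieces (the nodeN/cpuM entries plus a
nodeN/distance attribute in format (3) as proposed earlier in this
thread; the paths and helper names here are assumptions for the
sketch, not a stable ABI):

	#include <stdio.h>
	#include <unistd.h>

	#define MAX_NODES 256	/* arbitrary probe limit for the sketch */

	/* Map a cpu to its node by probing the nodeN/cpuM entries. */
	static int cpu_node(int cpu)
	{
		char path[128];
		int node;

		for (node = 0; node < MAX_NODES; node++) {
			snprintf(path, sizeof(path),
				 "/sys/devices/system/node/node%d/cpu%d",
				 node, cpu);
			if (access(path, F_OK) == 0)
				return node;
		}
		return -1;
	}

	/* Read the to-th entry of nodeN/distance (format (3) above). */
	static int node_dist(int from, int to)
	{
		char path[128];
		FILE *f;
		int i, d = -1;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/distance", from);
		f = fopen(path, "r");
		if (!f)
			return -1;
		for (i = 0; i <= to && fscanf(f, "%d", &d) == 1; i++)
			;
		fclose(f);
		return i > to ? d : -1;
	}

	int main(void)
	{
		int n0 = cpu_node(0), n8 = cpu_node(8);

		if (n0 < 0 || n8 < 0)
			return 1;
		printf("cpu0 <-> cpu8 distance: %d\n", node_dist(n0, n8));
		return 0;
	}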

Regards,
Erich


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04 15:31       ` Erich Focht
@ 2004-11-04 17:04         ` Andi Kleen
  -1 siblings, 0 replies; 58+ messages in thread
From: Andi Kleen @ 2004-11-04 17:04 UTC (permalink / raw)
  To: Erich Focht; +Cc: Jack Steiner, Takayoshi Kochi, linux-ia64, linux-kernel

On Thu, Nov 04, 2004 at 04:31:42PM +0100, Erich Focht wrote:
> On Thursday 04 November 2004 15:13, Jack Steiner wrote:
> > I think it would also be useful to have a similar cpu-to-cpu distance
> > metric:
> > 	% cat /sys/devices/system/cpu/cpu0/distance
> > 	10 20 40 60 
> > 
> > This gives the same information but is cpu-centric rather than
> > node centric.
> 
> I don't see the use of that once you have some way to find the logical
> CPU to node number mapping. The "node distances" are meant to be

I think he wants it just to have a more convenient interface,
which is not necessarily a bad thing.  But then one could put the 
convenience into libnuma anyways.

-Andi

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04 17:04         ` Andi Kleen
@ 2004-11-04 19:36           ` Jack Steiner
  -1 siblings, 0 replies; 58+ messages in thread
From: Jack Steiner @ 2004-11-04 19:36 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Erich Focht, Takayoshi Kochi, linux-ia64, linux-kernel

On Thu, Nov 04, 2004 at 06:04:35PM +0100, Andi Kleen wrote:
> On Thu, Nov 04, 2004 at 04:31:42PM +0100, Erich Focht wrote:
> > On Thursday 04 November 2004 15:13, Jack Steiner wrote:
> > > I think it would also be useful to have a similar cpu-to-cpu distance
> > > metric:
> > > 	% cat /sys/devices/system/cpu/cpu0/distance
> > > 	10 20 40 60 
> > > 
> > > This gives the same information but is cpu-centric rather than
> > > node centric.
> > 
> > I don't see the use of that once you have some way to find the logical
> > CPU to node number mapping. The "node distances" are meant to be
> 
> I think he wants it just to have a more convenient interface,
> which is not necessarily a bad thing.  But then one could put the 
> convenience into libnuma anyways.
> 
> -Andi

Yes, strictly convenience.  Most of the cases that I have seen deal with
cpu placement & cpu distances from each other. I agree that cpu-to-cpu
distances can be determined by converting to nodes & finding the 
node-to-node distance.

A second reason is symmetry. If there is a /sys/devices/system/node/node0/distance
metric, it seems as though there should also be a /sys/devices/system/cpu/cpu0/distance
metric.

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04  4:57       ` Takayoshi Kochi
@ 2004-11-05 16:08         ` Jack Steiner
  -1 siblings, 0 replies; 58+ messages in thread
From: Jack Steiner @ 2004-11-05 16:08 UTC (permalink / raw)
  To: Takayoshi Kochi; +Cc: ak, linux-ia64, linux-kernel

Based on the ideas from Andi & Takayoshi, I created a patch to
add the SLIT distance information to sysfs.

I've tested this on Altix/IA64 & it appears to work ok. I have
not tried it on other architectures.

Andi also posted a related patch for adding similar information 
for PCI busses.

Comments, suggestions, .....


	# cd /sys/devices/system
	# find .
	./node
	./node/node5
	./node/node5/cpu11
	./node/node5/cpu10
	./node/node5/distance
	./node/node5/numastat
	./node/node5/meminfo
	./node/node5/cpumap
	./node/node4
	./node/node4/cpu9
	./node/node4/cpu8
	./node/node4/distance
	./node/node4/numastat
	./node/node4/meminfo
	./node/node4/cpumap
	....
	./cpu
	./cpu/cpu11
	./cpu/cpu11/distance
	./cpu/cpu10
	./cpu/cpu10/distance
	./cpu/cpu9
	./cpu/cpu9/distance
	./cpu/cpu8
	...

	# cat ./node/node0/distance
	10 20 64 42 42 22

	# cat ./cpu/cpu8/distance
	42 42 64 64 22 22 42 42 10 10 20 20

	# cat node/*/distance
	10 20 64 42 42 22
	20 10 42 22 64 84
	64 42 10 20 22 42
	42 22 20 10 42 62
	42 64 22 42 10 20
	22 84 42 62 20 10

	# cat cpu/*/distance
	10 10 20 20 64 64 42 42 42 42 22 22
	10 10 20 20 64 64 42 42 42 42 22 22
	20 20 10 10 42 42 22 22 64 64 84 84
	20 20 10 10 42 42 22 22 64 64 84 84
	64 64 42 42 10 10 20 20 22 22 42 42
	64 64 42 42 10 10 20 20 22 22 42 42
	42 42 22 22 20 20 10 10 42 42 62 62
	42 42 22 22 20 20 10 10 42 42 62 62
	42 42 64 64 22 22 42 42 10 10 20 20
	42 42 64 64 22 22 42 42 10 10 20 20
	22 22 84 84 42 42 62 62 20 20 10 10
	22 22 84 84 42 42 62 62 20 20 10 10



Index: linux/drivers/base/node.c
===================================================================
--- linux.orig/drivers/base/node.c	2004-11-05 08:34:42.000000000 -0600
+++ linux/drivers/base/node.c	2004-11-05 09:00:01.000000000 -0600
@@ -111,6 +111,21 @@ static ssize_t node_read_numastat(struct
 }
 static SYSDEV_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
 
+static ssize_t node_read_distance(struct sys_device * dev, char * buf)
+{
+	int nid = dev->id;
+	int len = 0;
+	int i;
+
+	for (i = 0; i < numnodes; i++)
+		len += sprintf(buf + len, "%s%d", i ? " " : "", node_distance(nid, i));
+		
+	len += sprintf(buf + len, "\n");
+	return len;
+}
+static SYSDEV_ATTR(distance, S_IRUGO, node_read_distance, NULL);
+
+
 /*
  * register_node - Setup a driverfs device for a node.
  * @num - Node number to use when creating the device.
@@ -129,6 +144,7 @@ int __init register_node(struct node *no
 		sysdev_create_file(&node->sysdev, &attr_cpumap);
 		sysdev_create_file(&node->sysdev, &attr_meminfo);
 		sysdev_create_file(&node->sysdev, &attr_numastat);
+		sysdev_create_file(&node->sysdev, &attr_distance);
 	}
 	return error;
 }
Index: linux/drivers/base/cpu.c
===================================================================
--- linux.orig/drivers/base/cpu.c	2004-11-05 08:58:09.000000000 -0600
+++ linux/drivers/base/cpu.c	2004-11-05 08:59:25.000000000 -0600
@@ -8,6 +8,7 @@
 #include <linux/cpu.h>
 #include <linux/topology.h>
 #include <linux/device.h>
+#include <linux/cpumask.h>
 
 
 struct sysdev_class cpu_sysdev_class = {
@@ -58,6 +59,31 @@ static inline void register_cpu_control(
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
+#ifdef CONFIG_NUMA
+static ssize_t cpu_read_distance(struct sys_device * dev, char * buf)
+{
+	int nid = cpu_to_node(dev->id);
+	int len = 0;
+	int i;
+
+	for (i = 0; i < num_possible_cpus(); i++)
+		len += sprintf(buf + len, "%s%d", i ? " " : "", 
+			node_distance(nid, cpu_to_node(i)));
+	len += sprintf(buf + len, "\n");
+	return len;
+}
+static SYSDEV_ATTR(distance, S_IRUGO, cpu_read_distance, NULL);
+
+static inline void register_cpu_distance(struct cpu *cpu)
+{
+	sysdev_create_file(&cpu->sysdev, &attr_distance);
+}
+#else /* !CONFIG_NUMA */
+static inline void register_cpu_distance(struct cpu *cpu)
+{
+}
+#endif
+
 /*
  * register_cpu - Setup a driverfs device for a CPU.
  * @cpu - Callers can set the cpu->no_control field to 1, to indicate not to
@@ -81,6 +107,10 @@ int __init register_cpu(struct cpu *cpu,
 					  kobject_name(&cpu->sysdev.kobj));
 	if (!error && !cpu->no_control)
 		register_cpu_control(cpu);
+
+	if (!error)
+		register_cpu_distance(cpu);
+
 	return error;
 }
 

On Thu, Nov 04, 2004 at 01:57:21PM +0900, Takayoshi Kochi wrote:
> Hi,
> 
> From: Andi Kleen <ak@suse.de>
> Subject: Re: Externalize SLIT table
> Date: Thu, 4 Nov 2004 05:07:13 +0100
> 
> > > Why not export node_distance() under sysfs?
> > > I like (1).
> > > 
> > > (1) obey one-value-per-file sysfs principle
> > > 
> > > % cat /sys/devices/system/node/node0/distance0
> > > 10
> > 
> > Surely distance from 0 to 0 is 0?
> 
> According to the ACPI spec, 10 means local and other values
> mean ratio to 10.  But what the distance number should mean
> mean is ambiguous from the spec (e.g. some veondors interpret as
> memory access latency, others interpret as memory throughput
> etc.)
> However relative distance just works for most of uses, I believe.
> 
> Anyway, we should clarify how the numbers should be interpreted
> to avoid confusion.
> 
> How about this?
> "The distance to itself means the base value.  Distance to
> other nodes are relative to the base value.
> 0 means unreachable (hot-removed or disabled) to that node."
> 
> (Just FYI, numbers 0-9 are reserved and 255 (unsigned char -1) means
> unreachable, according to the ACPI spec.)
> 
> > > % cat /sys/devices/system/node/node0/distance1
> > > 66
> > 
> > > 
> > > (2) one distance for each line
> > > 
> > > % cat /sys/devices/system/node/node0/distance
> > > 0:10
> > > 1:66
> > > 2:46
> > > 3:66
> > > 
> > > (3) all distances in one line like /proc/<PID>/stat
> > > 
> > > % cat /sys/devices/system/node/node0/distance
> > > 10 66 46 66
> > 
> > I would prefer that. 
> 
> Ah, I missed the following last sentence in
> Documentation/filesystems/sysfs.txt:
> 
> |Attributes should be ASCII text files, preferably with only one value
> |per file. It is noted that it may not be efficient to contain only
> |value per file, so it is socially acceptable to express an array of
> |values of the same type. 
> 
> If an array is acceptable, I would prefer (3), too.
> 
> ---
> Takayoshi Kochi

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-05 16:08         ` Jack Steiner
@ 2004-11-05 16:26           ` Andreas Schwab
  -1 siblings, 0 replies; 58+ messages in thread
From: Andreas Schwab @ 2004-11-05 16:26 UTC (permalink / raw)
  To: Jack Steiner; +Cc: Takayoshi Kochi, ak, linux-ia64, linux-kernel

Jack Steiner <steiner@sgi.com> writes:

> @@ -111,6 +111,21 @@ static ssize_t node_read_numastat(struct
>  }
>  static SYSDEV_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
>  
> +static ssize_t node_read_distance(struct sys_device * dev, char * buf)
> +{
> +	int nid = dev->id;
> +	int len = 0;
> +	int i;
> +
> +	for (i = 0; i < numnodes; i++)
> +		len += sprintf(buf + len, "%s%d", i ? " " : "", node_distance(nid, i));

Can this overflow the space allocated for buf?

> @@ -58,6 +59,31 @@ static inline void register_cpu_control(
>  }
>  #endif /* CONFIG_HOTPLUG_CPU */
>  
> +#ifdef CONFIG_NUMA
> +static ssize_t cpu_read_distance(struct sys_device * dev, char * buf)
> +{
> +	int nid = cpu_to_node(dev->id);
> +	int len = 0;
> +	int i;
> +
> +	for (i = 0; i < num_possible_cpus(); i++)
> +		len += sprintf(buf + len, "%s%d", i ? " " : "", 
> +			node_distance(nid, cpu_to_node(i)));

Or this?

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-05 16:26           ` Andreas Schwab
@ 2004-11-05 16:44             ` Jack Steiner
  -1 siblings, 0 replies; 58+ messages in thread
From: Jack Steiner @ 2004-11-05 16:44 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Takayoshi Kochi, ak, linux-ia64, linux-kernel

On Fri, Nov 05, 2004 at 05:26:10PM +0100, Andreas Schwab wrote:
> Jack Steiner <steiner@sgi.com> writes:
> 
> > @@ -111,6 +111,21 @@ static ssize_t node_read_numastat(struct
> >  }
> >  static SYSDEV_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
> >  
> > +static ssize_t node_read_distance(struct sys_device * dev, char * buf)
> > +{
> > +	int nid = dev->id;
> > +	int len = 0;
> > +	int i;
> > +
> > +	for (i = 0; i < numnodes; i++)
> > +		len += sprintf(buf + len, "%s%d", i ? " " : "", node_distance(nid, i));
> 
> Can this overflow the space allocated for buf?


Good point. I think we are ok for now. AFAIK, the largest cpu count
currently supported is 512. That gives a max string of 2k (max of 3 
digits + space per cpu).

However, I should probably add a BUILD_BUG_ON to check for overflow.

	BUILD_BUG_ON(NR_NODES*4 > PAGE_SIZE/2);
	BUILD_BUG_ON(NR_CPUS*4 > PAGE_SIZE/2);
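
An alternative sketch is to bound the writes in the loop itself,
assuming scnprintf() (which returns the number of characters actually
written); this is an illustration against the node_read_distance()
above, not a replacement patch:

	static ssize_t node_read_distance(struct sys_device * dev, char * buf)
	{
		int nid = dev->id;
		int len = 0;
		int i;

		/* scnprintf() never writes past the remaining space and
		 * returns what was actually written, so len cannot run
		 * past the page even on very large systems. */
		for (i = 0; i < numnodes; i++)
			len += scnprintf(buf + len, PAGE_SIZE - len, "%s%d",
					 i ? " " : "", node_distance(nid, i));
		len += scnprintf(buf + len, PAGE_SIZE - len, "\n");
		return len;
	}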



> 
> > @@ -58,6 +59,31 @@ static inline void register_cpu_control(
> >  }
> >  #endif /* CONFIG_HOTPLUG_CPU */
> >  
> > +#ifdef CONFIG_NUMA
> > +static ssize_t cpu_read_distance(struct sys_device * dev, char * buf)
> > +{
> > +	int nid = cpu_to_node(dev->id);
> > +	int len = 0;
> > +	int i;
> > +
> > +	for (i = 0; i < num_possible_cpus(); i++)
> > +		len += sprintf(buf + len, "%s%d", i ? " " : "", 
> > +			node_distance(nid, cpu_to_node(i)));
> 
> Or this?
> 
> Andreas.
> 
> -- 
> Andreas Schwab, SuSE Labs, schwab@suse.de
> SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
> Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-05 16:08         ` Jack Steiner
@ 2004-11-05 17:13           ` Erich Focht
  -1 siblings, 0 replies; 58+ messages in thread
From: Erich Focht @ 2004-11-05 17:13 UTC (permalink / raw)
  To: Jack Steiner; +Cc: Takayoshi Kochi, ak, linux-ia64, linux-kernel

Hi Jack,

the patch looks fine, of course.
> 	# cat ./node/node0/distance
> 	10 20 64 42 42 22
Great!

But:
> 	# cat ./cpu/cpu8/distance
> 	42 42 64 64 22 22 42 42 10 10 20 20
...

what exactly do you mean by cpu_to_cpu distance? In analogy with the
node distance I'd say it is the time (latency) for moving data from
the register of one CPU into the register of another CPU:
        cpu*/distance :   cpu -> memory -> cpu
                         node1   node?    node2

On most architectures this means flushing a cacheline to memory on one
side and reading it on another side. What you actually implement is
the latency from memory (one node) to a particular cpu (on some
node). 
                       memory ->  cpu
                       node1     node2

That's only half of the story and actually misleading. I don't
think the complexity hiding is good in this place. Questions coming to
my mind are: Where is the memory? Is the SLIT matrix really symmetric
(cpu_to_cpu distance only makes sense for symmetric matrices)? I
remember talking to IBM people about hardware where the node distance
matrix was asymmetric.

Why do you want this distance anyway? libnuma offers you _node_ masks
for allocating memory from a particular node. And when you want to
arrange a complex MPI process structure you'll have to think about
latency for moving data from one process's buffer to the other
process's buffer. The buffers live on nodes, not on cpus.

Regards,
Erich


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-05 17:13           ` Erich Focht
@ 2004-11-05 19:13             ` Jack Steiner
  -1 siblings, 0 replies; 58+ messages in thread
From: Jack Steiner @ 2004-11-05 19:13 UTC (permalink / raw)
  To: Erich Focht; +Cc: Takayoshi Kochi, ak, linux-ia64, linux-kernel

On Fri, Nov 05, 2004 at 06:13:24PM +0100, Erich Focht wrote:
> Hi Jack,
> 
> the patch looks fine, of course.
> > 	# cat ./node/node0/distance
> > 	10 20 64 42 42 22
> Great!
> 
> But:
> > 	# cat ./cpu/cpu8/distance
> > 	42 42 64 64 22 22 42 42 10 10 20 20
> ...
> 
> what exactly do you mean by cpu_to_cpu distance? In analogy with the
> node distance I'd say it is the time (latency) for moving data from
> the register of one CPU into the register of another CPU:
>         cpu*/distance :   cpu -> memory -> cpu
>                          node1   node?    node2
> 

I'm trying to create an easy-to-use metric for finding sets of cpus that
are close to each other. By "close", I mean that the average distance of an
offnode reference from a cpu to remote memory in the set is minimized.

The numbers in cpuN/distance represent the distance from cpu N to 
the memory that is local to each of the other cpus. 

I agree that this can be derived from converting cpuN->node, finding
internode distances, then finding the cpus on each remote node.
The cpu metric is much easier to use. 
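
For illustration only, here is a minimal user space sketch of that
derivation from the sysfs files discussed in this thread (the paths,
the node-count limit and the helper names are assumptions of the
sketch, not a proposed interface; error handling is mostly omitted):

#include <stdio.h>
#include <unistd.h>

#define MAX_NODES 128			/* arbitrary bound for the sketch */

/* node a cpu belongs to, found by probing the nodeN/cpuM entries */
static int cpu_node(int cpu)
{
	char path[64];
	int node;

	for (node = 0; node < MAX_NODES; node++) {
		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/cpu%d", node, cpu);
		if (access(path, F_OK) == 0)
			return node;
	}
	return -1;
}

/* distance row of one node, parsed from nodeN/distance */
static int node_distance_row(int node, int *row, int max)
{
	char path[64];
	FILE *f;
	int n = 0;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/distance", node);
	f = fopen(path, "r");
	if (!f)
		return 0;
	while (n < max && fscanf(f, "%d", &row[n]) == 1)
		n++;
	fclose(f);
	return n;
}

/* distance from one cpu to the memory local to another cpu */
static int cpu_to_mem_distance(int from, int to)
{
	int row[MAX_NODES];
	int n = node_distance_row(cpu_node(from), row, MAX_NODES);
	int to_node = cpu_node(to);

	return (to_node >= 0 && to_node < n) ? row[to_node] : -1;
}

int main(void)
{
	printf("cpu0 -> cpu8 memory: %d\n", cpu_to_mem_distance(0, 8));
	return 0;
}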


> On most architectures this means flushing a cacheline to memory on one
> side and reading it on another side. What you actually implement is
> the latency from memory (one node) to a particular cpu (on some
> node). 
>                        memory ->  cpu
>                        node1     node2

I see how the term can be misleading. The metric is intended to 
represent ONLY the cost of remote access to another processor's local memory.
Is there a better way to describe the cpu-to-remote-cpu's-memory metric OR
should we let users construct their own matrix from the node data?


> 
> That's only half of the story and actually misleading. I don't
> think the complexity hiding is good in this place. Questions coming to
> my mind are: Where is the memory? Is the SLIT matrix really symmetric
> (cpu_to_cpu distance only makes sense for symmetric matrices)? I
> remember talking to IBM people about hardware where the node distance
> matrix was asymmetric.
> 
> Why do you want this distance anyway? libnuma offers you _node_ masks
> for allocating memory from a particular node. And when you want to
> arrange a complex MPI process structure you'll have to think about
> latency for moving data from one process's buffer to the other
> process's buffer. The buffers live on nodes, not on cpus.

One important use is in the creation of cpusets. The batch scheduler needs 
to pick a subset of cpus that are as close together as possible.


-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-05 16:44             ` Jack Steiner
@ 2004-11-06 11:50               ` Christoph Hellwig
  -1 siblings, 0 replies; 58+ messages in thread
From: Christoph Hellwig @ 2004-11-06 11:50 UTC (permalink / raw)
  To: Jack Steiner
  Cc: Andreas Schwab, Takayoshi Kochi, ak, linux-ia64, linux-kernel

On Fri, Nov 05, 2004 at 10:44:49AM -0600, Jack Steiner wrote:
> > > +	for (i = 0; i < numnodes; i++)
> > > +		len += sprintf(buf + len, "%s%d", i ? " " : "", node_distance(nid, i));
> > 
> > Can this overflow the space allocated for buf?
> 
> 
> Good point. I think we are ok for now. AFAIK, the largest cpu count
> currently supported is 512. That gives a max string of 2k (max of 3 
> digits + space per cpu).

I always wondered why sysfs doesn't use the seq_file interface that makes
life easier in the rest of the kernel.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-06 11:50               ` Christoph Hellwig
@ 2004-11-06 12:48                 ` Andi Kleen
  -1 siblings, 0 replies; 58+ messages in thread
From: Andi Kleen @ 2004-11-06 12:48 UTC (permalink / raw)
  To: Christoph Hellwig, Jack Steiner, Andreas Schwab, Takayoshi Kochi,
	ak, linux-ia64, linux-kernel

On Sat, Nov 06, 2004 at 11:50:29AM +0000, Christoph Hellwig wrote:
> On Fri, Nov 05, 2004 at 10:44:49AM -0600, Jack Steiner wrote:
> > > > +	for (i = 0; i < numnodes; i++)
> > > > +		len += sprintf(buf + len, "%s%d", i ? " " : "", node_distance(nid, i));
> > > 
> > > Can this overflow the space allocated for buf?
> > 
> > 
> > Good point. I think we are ok for now. AFAIK, the largest cpu count
> > currently supported is 512. That gives a max string of 2k (max of 3 
> > digits + space per cpu).
> 
> I always wondered why sysfs doesn't use the seq_file interface that makes
> life easier in the rest of the kernel.

Most fields only output a single number, and seq_file would be 
extreme overkill for that.

-Andi

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-06 12:48                 ` Andi Kleen
@ 2004-11-06 13:07                   ` Christoph Hellwig
  -1 siblings, 0 replies; 58+ messages in thread
From: Christoph Hellwig @ 2004-11-06 13:07 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Hellwig, Jack Steiner, Andreas Schwab, Takayoshi Kochi,
	linux-ia64, linux-kernel

On Sat, Nov 06, 2004 at 01:48:38PM +0100, Andi Kleen wrote:
> On Sat, Nov 06, 2004 at 11:50:29AM +0000, Christoph Hellwig wrote:
> > On Fri, Nov 05, 2004 at 10:44:49AM -0600, Jack Steiner wrote:
> > > > > +	for (i = 0; i < numnodes; i++)
> > > > > +		len += sprintf(buf + len, "%s%d", i ? " " : "", node_distance(nid, i));
> > > > 
> > > > Can this overflow the space allocated for buf?
> > > 
> > > 
> > > Good point. I think we are ok for now. AFAIK, the largest cpu count
> > > currently supported is 512. That gives a max string of 2k (max of 3 
> > > digits + space per cpu).
> > 
> > I always wondered why sysfs doesn't use the seq_file interface that makes
> > life easier in the rest of the kernel.
> 
> Most fields only output a single number, and seq_file would be 
> extreme overkill for that.

Personally I think even a:

static void
show_foo(struct device *dev, struct seq_file *s)
{
	seq_printf(s, "blafcsvsdfg\n");
}

static ssize_t
show_foo(struct device *dev, char *buf)
{
	return snprintf(buf, 20, "blafcsvsdfg\n");
}

would be a definitive improvement.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04  4:07     ` Andi Kleen
@ 2004-11-09 19:23       ` Matthew Dobson
  -1 siblings, 0 replies; 58+ messages in thread
From: Matthew Dobson @ 2004-11-09 19:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Takayoshi Kochi, steiner, linux-ia64, LKML

On Wed, 2004-11-03 at 20:07, Andi Kleen wrote:
> On Thu, Nov 04, 2004 at 10:59:08AM +0900, Takayoshi Kochi wrote:
> > (3) all distances in one line like /proc/<PID>/stat
> > 
> > % cat /sys/devices/system/node/node0/distance
> > 10 66 46 66
> 
> I would prefer that. 
> 
> -Andi

That would be my vote as well.  One line, space delimited.  Easy to
parse...  Plus you could easily reproduce the entire SLIT matrix by:

cd /sys/devices/system/node/
for i in `ls node*`; do cat $i/distance; done


-Matt


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04 15:31       ` Erich Focht
@ 2004-11-09 19:43         ` Matthew Dobson
  -1 siblings, 0 replies; 58+ messages in thread
From: Matthew Dobson @ 2004-11-09 19:43 UTC (permalink / raw)
  To: Erich Focht; +Cc: Jack Steiner, Takayoshi Kochi, linux-ia64, LKML

On Thu, 2004-11-04 at 07:31, Erich Focht wrote:
> On Thursday 04 November 2004 15:13, Jack Steiner wrote:
> > I think it would also be useful to have a similar cpu-to-cpu distance
> > metric:
> >         % cat /sys/devices/system/cpu/cpu0/distance
> >         10 20 40 60 
> > 
> > This gives the same information but is cpu-centric rather than
> > node centric.
> 
> I don't see the use of that once you have some way to find the logical
> CPU to node number mapping. The "node distances" are meant to be
> proportional to the memory access latency ratios (20 means 2 times
> larger than local (intra-node) access, which is by definition 10). 
> If the cpu_to_cpu distance is necessary because there is a hierarchy
> in the memory blocks inside one node, then maybe the definition of a
> node should be changed...
> 
> We currently have (at least in -mm kernels):
>        % ls /sys/devices/system/node/node0/cpu*
> for finding out which CPUs belong to which nodes. Together with
>        /sys/devices/system/node/node0/distances
> this should be enough for user space NUMA tools.
> 
> Regards,
> Erich

I have to agree with Erich here.  Node distances make sense, but adding
'cpu distances' which are just re-exporting the node distances in each
cpu's directory in sysfs doesn't make much sense to me.  Especially
because it is so trivial to get a list of which CPUs are on which node. 
If you're looking for groups of CPUs which are close, simply look for
groups of nodes that are close, then use the CPUs on those nodes.  If we
came up with some sort of different notion of 'distance' for CPUs and
exported that, I'd be OK with it, because it'd be new information.  I
don't think we should export the *exact same* node distance information
through the CPUs, though.

-Matt


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-04 17:04         ` Andi Kleen
@ 2004-11-09 19:45           ` Matthew Dobson
  -1 siblings, 0 replies; 58+ messages in thread
From: Matthew Dobson @ 2004-11-09 19:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Erich Focht, Jack Steiner, Takayoshi Kochi, linux-ia64, LKML

On Thu, 2004-11-04 at 09:04, Andi Kleen wrote:
> On Thu, Nov 04, 2004 at 04:31:42PM +0100, Erich Focht wrote:
> > On Thursday 04 November 2004 15:13, Jack Steiner wrote:
> > > I think it would also be useful to have a similar cpu-to-cpu distance
> > > metric:
> > >         % cat /sys/devices/system/cpu/cpu0/distance
> > >         10 20 40 60
> > > 
> > > This gives the same information but is cpu-centric rather than
> > > node centric.
> > 
> > I don't see the use of that once you have some way to find the logical
> > CPU to node number mapping. The "node distances" are meant to be
> 
> I think he wants it just to have a more convenient interface,
> which is not necessarily a bad thing.  But then one could put the 
> convenience into libnuma anyways.
> 
> -Andi

Using libnuma sounds fine to me.  On a 512 CPU system, with 4 CPUs/node,
we'd have 128 nodes.  Re-exporting ALL the same data, those huge strings
of node-to-node distances, 512 *additional* times in the per-CPU sysfs
directories seems like a waste.

-Matt


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-09 19:43         ` Matthew Dobson
@ 2004-11-09 20:34           ` Mark Goodwin
  -1 siblings, 0 replies; 58+ messages in thread
From: Mark Goodwin @ 2004-11-09 20:34 UTC (permalink / raw)
  To: Matthew Dobson
  Cc: Erich Focht, Jack Steiner, Takayoshi Kochi, linux-ia64, LKML


On Tue, 9 Nov 2004, Matthew Dobson wrote:
> ...
> I don't think we should export the *exact same* node distance information
> through the CPUs, though.

We should still export cpu distances though because the distance between
cpus on the same node may not be equal. e.g. consider a node with multiple
cpu sockets, each socket with a hyperthreaded (or dual core) cpu.

Once again however, it depends on the definition of distance. For nodes,
we've established it's the ACPI SLIT (relative distance to memory). For
cpus, should it be distance to memory? Distance to cache? Registers? Or
what?

-- Mark

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-09 20:34           ` Mark Goodwin
@ 2004-11-09 22:00             ` Jesse Barnes
  -1 siblings, 0 replies; 58+ messages in thread
From: Jesse Barnes @ 2004-11-09 22:00 UTC (permalink / raw)
  To: Mark Goodwin
  Cc: Matthew Dobson, Erich Focht, Jack Steiner, Takayoshi Kochi,
	linux-ia64, LKML

On Tuesday, November 09, 2004 3:34 pm, Mark Goodwin wrote:
> On Tue, 9 Nov 2004, Matthew Dobson wrote:
> > ...
> > I don't think we should export the *exact same* node distance information
> > through the CPUs, though.
>
> We should still export cpu distances though because the distance between
> cpus on the same node may not be equal. e.g. consider a node with multiple
> cpu sockets, each socket with a hyperthreaded (or dual core) cpu.
>
> Once again however, it depends on the definition of distance. For nodes,
> we've established it's the ACPI SLIT (relative distance to memory). For
> cpus, should it be distance to memory? Distance to cache? Registers? Or
> what?

Yeah, that's a tough call.  We should definitely get the node stuff in there 
now though, IMO.  We can always add the CPU distances later if we figure out 
what they should mean.

Jesse

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-09 20:34           ` Mark Goodwin
@ 2004-11-09 23:58             ` Matthew Dobson
  -1 siblings, 0 replies; 58+ messages in thread
From: Matthew Dobson @ 2004-11-09 23:58 UTC (permalink / raw)
  To: Mark Goodwin; +Cc: Erich Focht, Jack Steiner, Takayoshi Kochi, linux-ia64, LKML

On Tue, 2004-11-09 at 12:34, Mark Goodwin wrote:
> On Tue, 9 Nov 2004, Matthew Dobson wrote:
> > ...
> > I don't think we should export the *exact same* node distance information
> > through the CPUs, though.
> 
> We should still export cpu distances though because the distance between
> cpus on the same node may not be equal. e.g. consider a node with multiple
> cpu sockets, each socket with a hyperthreaded (or dual core) cpu.

Well, I'm not sure that just because a CPU has two hyperthread units in
the same core that those HT units have a different distance or latency
to memory...?  The fact that it is a HT unit and not a physical core has
implications to the scheduler, but I thought that the 2 siblings looked
identical to userspace, no?  If 2 CPUs in the same node are on the same
bus, then in all likelihood they have the same "distance".


> Once again however, it depends on the definition of distance. For nodes,
> we've established it's the ACPI SLIT (relative distance to memory). For
> cpus, should it be distance to memory? Distance to cache? Registers? Or
> what?
> 
> -- Mark

That's the real issue.  We need to agree upon a meaningful definition of
CPU-to-CPU "distance".  As Jesse mentioned in a follow-up, we can all
agree on what Node-to-Node "distance" means, but there doesn't appear to
be much consensus on what CPU "distance" means.

-Matt


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-09 23:58             ` Matthew Dobson
@ 2004-11-10  5:05               ` Mark Goodwin
  -1 siblings, 0 replies; 58+ messages in thread
From: Mark Goodwin @ 2004-11-10  5:05 UTC (permalink / raw)
  To: Matthew Dobson
  Cc: Erich Focht, Jack Steiner, Takayoshi Kochi, linux-ia64, LKML


On Tue, 9 Nov 2004, Matthew Dobson wrote:
> On Tue, 2004-11-09 at 12:34, Mark Goodwin wrote:
>> Once again however, it depends on the definition of distance. For nodes,
>> we've established it's the ACPI SLIT (relative distance to memory). For
>> cpus, should it be distance to memory? Distance to cache? Registers? Or
>> what?
>>
> That's the real issue.  We need to agree upon a meaningful definition of
> CPU-to-CPU "distance".  As Jesse mentioned in a follow-up, we can all
> agree on what Node-to-Node "distance" means, but there doesn't appear to
> be much consensus on what CPU "distance" means.

How about we define cpu-distance to be "relative distance to the
lowest level cache on another CPU". On a system that has nodes with
multiple sockets (each supporting multiple cores or HT "CPUs" sharing
some level of cache), when the scheduler needs to migrate a task it would
first choose a CPU sharing the same cache, then a CPU on the same node,
then an off-node CPU (i.e. falling back to node distance).
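
A rough sketch of that preference order (purely illustrative; this is
not the real scheduler code, and the per-cpu cache/node tables are
assumptions of the sketch):

/* same lowest-level cache first, then same node, then off-node
 * (where node distance would pick the nearest remote cpu; here
 * simply the first remote cpu found)
 */
int pick_migration_target(int src, int ncpus,
			  const int *cache_id, const int *node_id)
{
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		if (cpu != src && cache_id[cpu] == cache_id[src])
			return cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		if (cpu != src && node_id[cpu] == node_id[src])
			return cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		if (cpu != src)
			return cpu;

	return src;
}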

Of course, I have no idea if that's anything like an optimal or desirable
task migration policy. Probably depends on cache-trashiness of the task
being migrated.

-- Mark

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-10  5:05               ` Mark Goodwin
@ 2004-11-10 18:45                 ` Erich Focht
  -1 siblings, 0 replies; 58+ messages in thread
From: Erich Focht @ 2004-11-10 18:45 UTC (permalink / raw)
  To: Mark Goodwin
  Cc: Matthew Dobson, Jack Steiner, Takayoshi Kochi, linux-ia64, LKML

On Wednesday 10 November 2004 06:05, Mark Goodwin wrote:
> 
> On Tue, 9 Nov 2004, Matthew Dobson wrote:
> > On Tue, 2004-11-09 at 12:34, Mark Goodwin wrote:
> >> Once again however, it depends on the definition of distance. For nodes,
> >> we've established it's the ACPI SLIT (relative distance to memory). For
> >> cpus, should it be distance to memory? Distance to cache? Registers? Or
> >> what?
> >>
> > That's the real issue.  We need to agree upon a meaningful definition of
> > CPU-to-CPU "distance".  As Jesse mentioned in a follow-up, we can all
> > agree on what Node-to-Node "distance" means, but there doesn't appear to
> > be much consensus on what CPU "distance" means.
> 
> How about we define cpu-distance to be "relative distance to the
> lowest level cache on another CPU".

Several definitions are possible; this is really a source of
confusion. Any of these can be reconstructed if one has access to the
constituents: node-to-node latency (SLIT) and cache-to-cache
latencies. The latter aren't available and would anyhow be better
placed in something like /proc/cpuinfo or similar. They are CPU or
package specific and have nothing to do with NUMA.

> On a system that has nodes with multiple sockets (each supporting
> multiple cores or HT "CPUs" sharing some level of cache), when the
> scheduler needs to migrate a task it would first choose a CPU
> sharing the same cache, then a CPU on the same node, then an
> off-node CPU (i.e. falling back to node distance).

This should be done by correctly setting up the sched domains. It's
not a question of exporting useless or redundant information to user
space.

The need for some (any) cpu-to-cpu metrics initially brought up by
Jack seemed mainly motivated by existing user space tools for
constructing cpusets (maybe in PBS). I think it is a tolerable effort
to introduce in user space an inlined function or macro doing
something like
   cpu_metric(i,j) := node_metric(cpu_node(i),cpu_node(j))
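
In C that is no more than the following (node_metric() and cpu_node()
stand for whatever the user space NUMA library provides; the names are
just placeholders):

static inline int cpu_metric(int i, int j)
{
	return node_metric(cpu_node(i), cpu_node(j));
}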

It keeps the kernel free of misleading information which might just
slightly make cpusets construction more comfortable. In user space you
have the full freedom to enhance your metrics when getting more
details about the next generation cpus.

Regards,
Erich


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-10 18:45                 ` Erich Focht
@ 2004-11-10 22:09                   ` Matthew Dobson
  -1 siblings, 0 replies; 58+ messages in thread
From: Matthew Dobson @ 2004-11-10 22:09 UTC (permalink / raw)
  To: Erich Focht; +Cc: Mark Goodwin, Jack Steiner, Takayoshi Kochi, linux-ia64, LKML

On Wed, 2004-11-10 at 10:45, Erich Focht wrote:
> On Wednesday 10 November 2004 06:05, Mark Goodwin wrote:
> > On a system that has nodes with multiple sockets (each supporting
> > multiple cores or HT "CPUs" sharing some level of cache), when the
> > scheduler needs to migrate a task it would first choose a CPU
> > sharing the same cache, then a CPU on the same node, then an
> > off-node CPU (i.e. falling back to node distance).
> 
> This should be done by correctly setting up the sched domains. It's
> not a question of exporting useless or redundant information to user
> space.
> 
> The need for some (any) cpu-to-cpu metrics initially brought up by
> Jack seemed mainly motivated by existing user space tools for
> constructing cpusets (maybe in PBS). I think it is a tolerable effort
> to introduce in user space an inlined function or macro doing
> something like
>    cpu_metric(i,j) := node_metric(cpu_node(i),cpu_node(j))
> 
> It keeps the kernel free of misleading information which might just
> slightly make cpusets construction more comfortable. In user space you
> have the full freedom to enhance your metrics when getting more
> details about the next generation cpus.

Good point, Erich.  I don't think there is any desperate need for
CPU-to-CPU distances to be exported to userspace right now.  If that is
incorrect and someone really needs a particular distance metric to be
exported by the kernel, we can look into that and export the required
info.  For now I think the Node-to-Node distance information is enough. 
-Matt


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-03 20:56 Externalize SLIT table Jack Steiner
@ 2004-11-18 16:39 ` Jack Steiner
  0 siblings, 0 replies; 58+ messages in thread
From: Jack Steiner @ 2004-11-18 16:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ia64

(Resend of mail sent Nov 10, 2004 - as far as I can tell, it went nowhere)


On Wed, Nov 10, 2004 at 04:05:43PM +1100, Mark Goodwin wrote:
>
> On Tue, 9 Nov 2004, Matthew Dobson wrote:
> >On Tue, 2004-11-09 at 12:34, Mark Goodwin wrote:
> >>Once again however, it depends on the definition of distance. For nodes,
> >>we've established it's the ACPI SLIT (relative distance to memory). For
> >>cpus, should it be distance to memory? Distance to cache? Registers? Or
> >>what?
> >>
> >That's the real issue.  We need to agree upon a meaningful definition of   
> >CPU-to-CPU "distance".  As Jesse mentioned in a follow-up, we can all
> >agree on what Node-to-Node "distance" means, but there doesn't appear to
> >be much consensus on what CPU "distance" means.
>
> How about we define cpu-distance to be "relative distance to the
> lowest level cache on another CPU". On a system that has nodes with
> multiple sockets (each supporting multiple cores or HT "CPUs" sharing
> some level of cache), when the scheduler needs to migrate a task it would
> first choose a CPU sharing the same cache, then a CPU on the same node,
> then an off-node CPU (i.e. falling back to node distance).

I think I like your definition better than the one I originally proposed (cpu
distance was distance between the local memories of the cpus).

But how do we determine the distance between the caches?


> 
> Of course, I have no idea if that's anything like an optimal or desirable
> task migration policy. Probably depends on cache-trashiness of the task
> being migrated.
> 
> -- Mark



-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-23 17:32           ` Jack Steiner
@ 2004-11-23 19:06             ` Andi Kleen
  0 siblings, 0 replies; 58+ messages in thread
From: Andi Kleen @ 2004-11-23 19:06 UTC (permalink / raw)
  To: Jack Steiner; +Cc: Andi Kleen, linux-kernel

On Tue, Nov 23, 2004 at 11:32:09AM -0600, Jack Steiner wrote:
> (Sorry for the delay in posting this. Our mail server was
> dropping mail ....)

Looks good. Thanks. I actually came up with my own patch now
(which ended up quite similar), but yours looks slightly better.

-Andi

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
  2004-11-06  6:30         ` Andi Kleen
@ 2004-11-23 17:32           ` Jack Steiner
  2004-11-23 19:06             ` Andi Kleen
  0 siblings, 1 reply; 58+ messages in thread
From: Jack Steiner @ 2004-11-23 17:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

(Sorry for the delay in posting this. Our mail server was
dropping mail ....)


Here is an updated patch to externalize the SLIT information. I think I have
incorporated all the comments that were posted previously.

For example:

        # cd /sys/devices/system
        # find .
        ./node
        ./node/node5
        ./node/node5/distance
        ./node/node5/numastat
        ./node/node5/meminfo
        ./node/node5/cpumap

        # cat ./node/node0/distance
        10 20 64 42 42 22

        # cat node/*/distance
        10 20 64 42 42 22
        20 10 42 22 64 84
        64 42 10 20 22 42
        42 22 20 10 42 62
        42 64 22 42 10 20
        22 84 42 62 20 10


Does this look ok???




Signed-off-by: Jack Steiner <steiner@sgi.com>

Add SLIT (inter node distance) information to sysfs. 



Index: linux/drivers/base/node.c
===================================================================
--- linux.orig/drivers/base/node.c	2004-11-05 08:34:42.461312000 -0600
+++ linux/drivers/base/node.c	2004-11-05 15:56:23.345662000 -0600
@@ -111,6 +111,24 @@ static ssize_t node_read_numastat(struct
 }
 static SYSDEV_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
 
+static ssize_t node_read_distance(struct sys_device * dev, char * buf)
+{
+	int nid = dev->id;
+	int len = 0;
+	int i;
+
+	/* buf currently PAGE_SIZE, need ~4 chars per node */
+	BUILD_BUG_ON(NR_NODES*4 > PAGE_SIZE/2);
+
+	for (i = 0; i < numnodes; i++)
+		len += sprintf(buf + len, "%s%d", i ? " " : "", node_distance(nid, i));
+		
+	len += sprintf(buf + len, "\n");
+	return len;
+}
+static SYSDEV_ATTR(distance, S_IRUGO, node_read_distance, NULL);
+
+
 /*
  * register_node - Setup a driverfs device for a node.
  * @num - Node number to use when creating the device.
@@ -129,6 +147,7 @@ int __init register_node(struct node *no
 		sysdev_create_file(&node->sysdev, &attr_cpumap);
 		sysdev_create_file(&node->sysdev, &attr_meminfo);
 		sysdev_create_file(&node->sysdev, &attr_numastat);
+		sysdev_create_file(&node->sysdev, &attr_distance);
 	}
 	return error;
 }
Index: linux/include/asm-i386/topology.h
===================================================================
--- linux.orig/include/asm-i386/topology.h	2004-11-05 08:34:53.713053000 -0600
+++ linux/include/asm-i386/topology.h	2004-11-23 09:59:43.574062951 -0600
@@ -66,9 +66,6 @@ static inline cpumask_t pcibus_to_cpumas
 	return node_to_cpumask(mp_bus_id_to_node[bus]);
 }
 
-/* Node-to-Node distance */
-#define node_distance(from, to) ((from) != (to))
-
 /* sched_domains SD_NODE_INIT for NUMAQ machines */
 #define SD_NODE_INIT (struct sched_domain) {		\
 	.span			= CPU_MASK_NONE,	\
Index: linux/include/linux/topology.h
===================================================================
--- linux.orig/include/linux/topology.h	2004-11-05 08:34:57.492932000 -0600
+++ linux/include/linux/topology.h	2004-11-23 10:03:26.700821978 -0600
@@ -55,7 +55,10 @@ static inline int __next_node_with_cpus(
 	for (node = 0; node < numnodes; node = __next_node_with_cpus(node))
 
 #ifndef node_distance
-#define node_distance(from,to)	((from) != (to))
+/* Conform to ACPI 2.0 SLIT distance definitions */
+#define LOCAL_DISTANCE		10
+#define REMOTE_DISTANCE		20
+#define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
 #endif
 #ifndef PENALTY_FOR_NODE_WITH_CPUS
 #define PENALTY_FOR_NODE_WITH_CPUS	(1)
-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Externalize SLIT table
       [not found]       ` <20041105160808.GA26719@sgi.com.suse.lists.linux.kernel>
@ 2004-11-06  6:30         ` Andi Kleen
  2004-11-23 17:32           ` Jack Steiner
  0 siblings, 1 reply; 58+ messages in thread
From: Andi Kleen @ 2004-11-06  6:30 UTC (permalink / raw)
  To: Jack Steiner; +Cc: linux-kernel

Jack Steiner <steiner@sgi.com> writes:
>  
> +static ssize_t node_read_distance(struct sys_device * dev, char * buf)
> +{
> +	int nid = dev->id;
> +	int len = 0;
> +	int i;
> +
> +	for (i = 0; i < numnodes; i++)
> +		len += sprintf(buf + len, "%s%d", i ? " " : "", node_distance(nid, i));


One problem is that most architectures define node_distance currently
as nid != i. This would give 0 on them for the identity mapping and 10 
on IA64 which uses the SLIT values. Not good for a portable interface.
I would suggest to at least change them to return 10 for a zero node distance.

Also in general I would prefer if you could move all the SLIT parsing
into drivers/acpi/numa.c. Then the other ACPI architectures don't need to copy
the basically identical code from ia64.

-Andi

^ permalink raw reply	[flat|nested] 58+ messages in thread
