* [PATCH 1/2] NUMA emulation x86_64: numa=fake parameter for custom nodes distance
@ 2011-11-18 11:55 Petr Holasek
  2011-11-18 11:55 ` [PATCH 2/2] NUMA emulation x86_64: Documentation changes in boot-options.txt Petr Holasek
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Petr Holasek @ 2011-11-18 11:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton
  Cc: linux-kernel, x86, Anton Arapov, Petr Holasek

By default, when NUMA emulation is turned on, the node distance table
uses physical distances, so for 4 nodes emulated on 1 physical node the
table is

node   0   1   2   3
0:  10  10  10  10
1:  10  10  10  10
2:  10  10  10  10
3:  10  10  10  10

This patch adds a new [distance] argument to

numa=fake=<number/size of nodes>[,distance]

When the distance argument is used, it sets a linear distance between
nodes, like this:

    __distance__
___|___     ____|___     ________     ________
|       |   |        |   |        |   |        |
| node1 |---| node 2 |---| node 3 |---| node 4 |
|_______|   |________|   |________|   |________|
|                        |             |
|                        |             |
|____distance * 2________|             |
|                                      |
|____________distance * 3______________|

This feature might be useful for testing some numa awareness features in
both user and kernel spaces.

Signed-off-by: Petr Holasek <pholasek@redhat.com>
---
 arch/x86/mm/numa_emulation.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index d0ed086..1824972 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -309,6 +309,8 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 	u8 *phys_dist = NULL;
 	size_t phys_size = numa_dist_cnt * numa_dist_cnt * sizeof(phys_dist[0]);
 	int max_emu_nid, dfl_phys_nid;
+	unsigned long dist_level;
+	char *c;
 	int i, j, ret;
 
 	if (!emu_cmdline)
@@ -404,6 +406,17 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 		if (emu_nid_to_phys[i] == NUMA_NO_NODE)
 			emu_nid_to_phys[i] = dfl_phys_nid;
 
+	/* load distance level parameter */
+	dist_level = -1;
+	c = strchr(emu_cmdline, ',');
+	if (c) {
+		c++;
+		ret = kstrtoul(c, 10, &dist_level);
+		if (ret < 0 || dist_level < LOCAL_DISTANCE ||
+				dist_level * max_emu_nid > ULONG_MAX)
+			dist_level = -1;
+	}
+
 	/* transform distance table */
 	numa_reset_distance();
 	for (i = 0; i < max_emu_nid + 1; i++) {
@@ -418,6 +431,9 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 			else
 				dist = phys_dist[physi * numa_dist_cnt + physj];
 
+			if (dist_level != -1 && i != j)
+				dist = abs(i - j) * dist_level;
+
 			numa_set_distance(i, j, dist);
 		}
 	}
-- 
1.7.6.4
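
To make the rule in the patch concrete, here is a minimal userspace sketch of the table it would produce. It is not the kernel code: the function name `fill_linear_distances` and the plain `int` table are illustrative assumptions, and the diagonal is assumed to stay at LOCAL_DISTANCE (10), as in the single-physical-node case from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

#define LOCAL_DISTANCE 10	/* distance of a node to itself, as in the table above */

/*
 * Fill an nr_nodes x nr_nodes row-major table using the rule from the
 * patch: distance(i, j) = |i - j| * dist_level for i != j.
 * Assumption: the diagonal stays at LOCAL_DISTANCE.
 */
static void fill_linear_distances(int *table, int nr_nodes, int dist_level)
{
	assert(table != NULL && nr_nodes > 0);

	for (int i = 0; i < nr_nodes; i++)
		for (int j = 0; j < nr_nodes; j++)
			table[i * nr_nodes + j] = (i == j) ?
				LOCAL_DISTANCE : abs(i - j) * dist_level;
}
```

Under those assumptions, four nodes with a distance of 20 would give:

node   0   1   2   3
0:  10  20  40  60
1:  20  10  20  40
2:  40  20  10  20
3:  60  40  20  10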



* [PATCH 2/2] NUMA emulation x86_64: Documentation changes in boot-options.txt
  2011-11-18 11:55 [PATCH 1/2] NUMA emulation x86_64: numa=fake parameter for custom nodes distance Petr Holasek
@ 2011-11-18 11:55 ` Petr Holasek
  2011-11-18 19:53 ` [PATCH 1/2] NUMA emulation x86_64: numa=fake parameter for custom nodes distance Andrew Morton
  2011-11-20  2:06 ` [PATCH 1/2] " David Rientjes
  2 siblings, 0 replies; 8+ messages in thread
From: Petr Holasek @ 2011-11-18 11:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton
  Cc: linux-kernel, x86, Anton Arapov, Petr Holasek

Signed-off-by: Petr Holasek <pholasek@redhat.com>
---
 Documentation/x86/x86_64/boot-options.txt |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index c54b4f5..33c0c10 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -166,13 +166,15 @@ NUMA
 
   numa=noacpi   Don't parse the SRAT table for NUMA setup
 
-  numa=fake=<size>[MG]
+  numa=fake=<size>[MG][,distance]
 		If given as a memory unit, fills all system RAM with nodes of
 		size interleaved over physical nodes.
+		Optional distance sets linear distance between emulated nodes.
 
-  numa=fake=<N>
+  numa=fake=<N>[,distance]
 		If given as an integer, fills all system RAM with N fake nodes
 		interleaved over physical nodes.
+		Optional distance sets linear distance between emulated nodes.
 
 ACPI
 
-- 
1.7.6.4



* Re: [PATCH 1/2] NUMA emulation x86_64: numa=fake parameter for custom nodes distance
  2011-11-18 11:55 [PATCH 1/2] NUMA emulation x86_64: numa=fake parameter for custom nodes distance Petr Holasek
  2011-11-18 11:55 ` [PATCH 2/2] NUMA emulation x86_64: Documentation changes in boot-options.txt Petr Holasek
@ 2011-11-18 19:53 ` Andrew Morton
  2011-11-19  0:31   ` Petr Holasek
  2011-11-20  2:06 ` [PATCH 1/2] " David Rientjes
  2 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2011-11-18 19:53 UTC (permalink / raw)
  To: Petr Holasek
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86,
	Anton Arapov

On Fri, 18 Nov 2011 12:55:07 +0100
Petr Holasek <pholasek@redhat.com> wrote:

> By default, when NUMA emulation is turned on, the node distance table
> uses physical distances, so for 4 nodes emulated on 1 physical node the
> table is
> 
> node   0   1   2   3
> 0:  10  10  10  10
> 1:  10  10  10  10
> 2:  10  10  10  10
> 3:  10  10  10  10
> 
> This patch adds a new [distance] argument to
> 
> numa=fake=<number/size of nodes>[,distance]
> 
> When the distance argument is used, it sets a linear distance between
> nodes, like this:
> 
>     __distance__
> ___|___     ____|___     ________     ________
> |       |   |        |   |        |   |        |
> | node1 |---| node 2 |---| node 3 |---| node 4 |
> |_______|   |________|   |________|   |________|
> |                        |             |
> |                        |             |
> |____distance * 2________|             |
> |                                      |
> |____________distance * 3______________|
> 
> This feature might be useful for testing some numa awareness features in
> both user and kernel spaces.
> 

"might" is a red flag.  We don't merge things which might be useful!

*Is* it useful?  If so then please tell us why and explain how it might
be useful to others.

> @@ -404,6 +406,17 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>  		if (emu_nid_to_phys[i] == NUMA_NO_NODE)
>  			emu_nid_to_phys[i] = dfl_phys_nid;
>  
> +	/* load distance level parameter */
> +	dist_level = -1;
> +	c = strchr(emu_cmdline, ',');
> +	if (c) {
> +		c++;
> +		ret = kstrtoul(c, 10, &dist_level);
> +		if (ret < 0 || dist_level < LOCAL_DISTANCE ||
> +				dist_level * max_emu_nid > ULONG_MAX)
> +			dist_level = -1;

If this happens, the user goofed and we should tell them, with a printk.


[patch 2/2] adds the documentation for the feature and should be
included in the same patch as the implementation.
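
As an illustration of the review comment about reporting bad input, here is a hedged userspace sketch of parsing the optional ",distance" suffix with a warning on failure. It is not the patch's code: `parse_dist_level` and `MAX_DISTANCE` are hypothetical names, `strtoul()` and `fprintf()` stand in for the kernel's `kstrtoul()` and `printk()`, and the upper bound of 255 is taken from the maximum distance mentioned later in this thread. Note also that the `dist_level * max_emu_nid > ULONG_MAX` test in the quoted hunk can never be true for an unsigned long product, so the sketch bounds the value by division instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LOCAL_DISTANCE 10	/* smallest distance accepted by the patch */
#define MAX_DISTANCE   255	/* assumption: largest representable distance */

/*
 * Parse the optional ",distance" suffix of a numa=fake= style string.
 * Returns the distance, 0 if no suffix is present, or -1 (after printing
 * a warning) if the suffix is malformed or out of range.
 */
static long parse_dist_level(const char *cmdline, int max_emu_nid)
{
	const char *c;
	char *end;
	unsigned long val, limit;

	assert(cmdline != NULL && max_emu_nid >= 0);

	c = strchr(cmdline, ',');
	if (!c)
		return 0;
	c++;

	errno = 0;
	val = strtoul(c, &end, 10);
	if (errno || end == c || *end != '\0') {
		fprintf(stderr, "numa=fake: invalid distance '%s'\n", c);
		return -1;
	}

	/* The largest table entry is val * max_emu_nid; keep it <= 255. */
	limit = max_emu_nid > 0 ?
		MAX_DISTANCE / (unsigned long)max_emu_nid : MAX_DISTANCE;
	if (val < LOCAL_DISTANCE || val > limit) {
		fprintf(stderr, "numa=fake: distance %lu out of range\n", val);
		return -1;
	}
	return (long)val;
}
```

With four fake nodes (max_emu_nid of 3), "4,20" would be accepted, while "4,5" and "4,100" would each be rejected with a message.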



* Re: NUMA emulation x86_64: numa=fake parameter for custom nodes distance
  2011-11-18 19:53 ` [PATCH 1/2] NUMA emulation x86_64: numa=fake parameter for custom nodes distance Andrew Morton
@ 2011-11-19  0:31   ` Petr Holasek
  2011-11-20  2:09     ` David Rientjes
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Holasek @ 2011-11-19  0:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86,
	Anton Arapov

On Fri, 18 Nov 2011, Andrew Morton wrote:

> Date: Fri, 18 Nov 2011 11:53:36 -0800
> From: Andrew Morton <akpm@linux-foundation.org>
> To: Petr Holasek <pholasek@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
>  "H. Peter Anvin" <hpa@zytor.com>, linux-kernel@vger.kernel.org,
>  x86@kernel.org, Anton Arapov <anton@redhat.com>
> Subject: Re: [PATCH 1/2] NUMA emulation x86_64: numa=fake parameter for
>  custom nodes distance
> 
> On Fri, 18 Nov 2011 12:55:07 +0100
> Petr Holasek <pholasek@redhat.com> wrote:
> 
> > By default, when NUMA emulation is turned on, the node distance table
> > uses physical distances, so for 4 nodes emulated on 1 physical node the
> > table is
> > 
> > node   0   1   2   3
> > 0:  10  10  10  10
> > 1:  10  10  10  10
> > 2:  10  10  10  10
> > 3:  10  10  10  10
> > 
> > This patch adds a new [distance] argument to
> > 
> > numa=fake=<number/size of nodes>[,distance]
> > 
> > When the distance argument is used, it sets a linear distance between
> > nodes, like this:
> > 
> >     __distance__
> > ___|___     ____|___     ________     ________
> > |       |   |        |   |        |   |        |
> > | node1 |---| node 2 |---| node 3 |---| node 4 |
> > |_______|   |________|   |________|   |________|
> > |                        |             |
> > |                        |             |
> > |____distance * 2________|             |
> > |                                      |
> > |____________distance * 3______________|
> > 
> > This feature might be useful for testing some numa awareness features in
> > both user and kernel spaces.
> > 
> 
> "might" is a red flag.  We don't merge things which might be useful!
> 
> *Is* it useful?  If so then please tell us why and explain how it might
> be useful to others.

A lot of developers still have no access to large NUMA machines, and the
possibility of NUMA emulation could get more of them thinking about the
NUMA awareness of their apps/kernel code.

> 
> > @@ -404,6 +406,17 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
> >  		if (emu_nid_to_phys[i] == NUMA_NO_NODE)
> >  			emu_nid_to_phys[i] = dfl_phys_nid;
> >  
> > +	/* load distance level parameter */
> > +	dist_level = -1;
> > +	c = strchr(emu_cmdline, ',');
> > +	if (c) {
> > +		c++;
> > +		ret = kstrtoul(c, 10, &dist_level);
> > +		if (ret < 0 || dist_level < LOCAL_DISTANCE ||
> > +				dist_level * max_emu_nid > ULONG_MAX)
> > +			dist_level = -1;
> 
> If this happens, the user goofed and we should tell them, with a printk.
> 
> 
> [patch 2/2] adds the documentation for the feature and should be
> included in the same patch as the implementation.

Apologies, I'll send a v2 of the patch with the printk() and the
documentation all in one, if necessary.

thanks,
Petr H

> 


* Re: [PATCH 1/2] NUMA emulation x86_64: numa=fake parameter for custom nodes distance
  2011-11-18 11:55 [PATCH 1/2] NUMA emulation x86_64: numa=fake parameter for custom nodes distance Petr Holasek
  2011-11-18 11:55 ` [PATCH 2/2] NUMA emulation x86_64: Documentation changes in boot-options.txt Petr Holasek
  2011-11-18 19:53 ` [PATCH 1/2] NUMA emulation x86_64: numa=fake parameter for custom nodes distance Andrew Morton
@ 2011-11-20  2:06 ` David Rientjes
  2 siblings, 0 replies; 8+ messages in thread
From: David Rientjes @ 2011-11-20  2:06 UTC (permalink / raw)
  To: Petr Holasek
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	linux-kernel, x86, Anton Arapov

On Fri, 18 Nov 2011, Petr Holasek wrote:

> By default, when NUMA emulation is turned on, the node distance table
> uses physical distances, so for 4 nodes emulated on 1 physical node the
> table is
> 
> node   0   1   2   3
> 0:  10  10  10  10
> 1:  10  10  10  10
> 2:  10  10  10  10
> 3:  10  10  10  10
> 

That should only be true if you're booting on a system with one physical 
node and an SRAT, otherwise the distance between fake nodes should be 
representative of their physical distance.  For example, if you boot
with numa=fake=4 on a symmetrical two-node box, you should get
something like

	10 10 20 20
	10 10 20 20
	20 20 10 10
	20 20 10 10

It's done like this intentionally so you can test NUMA without having many 
nodes.  What you're doing is changing the distance even though there is no 
actual difference in latency on the hardware so it's an incorrect 
representation.
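
The default behaviour described above can be sketched as follows: each fake node inherits the distances of the physical node backing it. This is a simplified userspace illustration, not the kernel implementation; the function name `emulate_distances` and the mapping of fake nodes 0-1 to physical node 0 and fake nodes 2-3 to physical node 1 are assumptions chosen to reproduce the example table.

```c
#include <assert.h>

/*
 * Derive the emulated distance table from the physical one: each fake
 * node inherits the distances of the physical node that backs it.
 * Both tables are row-major; emu_to_phys[i] is the physical node
 * backing fake node i.
 */
static void emulate_distances(const int *phys_dist, int nr_phys,
			      const int *emu_to_phys, int nr_emu,
			      int *emu_dist)
{
	assert(phys_dist && emu_to_phys && emu_dist);

	for (int i = 0; i < nr_emu; i++) {
		for (int j = 0; j < nr_emu; j++) {
			int physi = emu_to_phys[i];
			int physj = emu_to_phys[j];

			assert(physi >= 0 && physi < nr_phys);
			assert(physj >= 0 && physj < nr_phys);
			emu_dist[i * nr_emu + j] =
				phys_dist[physi * nr_phys + physj];
		}
	}
}
```

Calling it with a physical table of {10, 20, 20, 10} and the mapping {0, 0, 1, 1} yields the 4x4 table shown above.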

> This patch adds a new [distance] argument to
> 
> numa=fake=<number/size of nodes>[,distance]
> 
> When the distance argument is used, it sets a linear distance between
> nodes, like this:
> 
>     __distance__
> ___|___     ____|___     ________     ________
> |       |   |        |   |        |   |        |
> | node1 |---| node 2 |---| node 3 |---| node 4 |
> |_______|   |________|   |________|   |________|
> |                        |             |
> |                        |             |
> |____distance * 2________|             |
> |                                      |
> |____________distance * 3______________|
> 
> This feature might be useful for testing some numa awareness features in
> both user and kernel spaces.
> 

I don't see any use case for this other than testing if code can actually 
order nodes correctly or not.  The distances that you're now adding are, 
by definition, incorrect since they aren't the same as exported by the 
true SLIT (which is what happens by default now) so nothing other than 
functional testing of node ordering is achieved with this patch.

So nack on this approach.

What you could do, however, and would be generally useful even outside of 
NUMA emulation, is to add fake SLIT functionality so that you can define 
it yourself on the command line.  You could use that either with or 
without NUMA emulation if you know the physical SLIT is incorrect in some 
way.  Then, you get the same functionality as your patch here by using it 
in combination with numa=fake and the added bonus is that you don't need 
any of the "distance * 2" or "distance * 3" limitations.
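
One way the fake SLIT idea could look is sketched below. Everything about the format is an assumption made for illustration: the thread does not define a syntax, so this sketch takes a comma-separated, row-major list of nr_nodes^2 decimal distances, requires LOCAL_DISTANCE (10) on the diagonal, and accepts values from 10 to 255. The name `parse_fake_slit` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define LOCAL_DISTANCE 10	/* required on the diagonal (assumption) */
#define MAX_DISTANCE   255	/* largest distance, per the thread */

/*
 * Parse a hypothetical "d00,d01,...,dNN" string (row-major, nr_nodes^2
 * entries) into table. Returns 0 on success, or -1 on malformed input,
 * a wrong entry count, an out-of-range value, or a diagonal entry that
 * is not LOCAL_DISTANCE.
 */
static int parse_fake_slit(const char *str, int nr_nodes, unsigned char *table)
{
	const char *p = str;
	int total;

	assert(str && table && nr_nodes > 0);
	total = nr_nodes * nr_nodes;

	for (int idx = 0; idx < total; idx++) {
		char *end;
		unsigned long val;

		/* Reject signs and whitespace that strtoul() would accept. */
		if (!isdigit((unsigned char)*p))
			return -1;
		errno = 0;
		val = strtoul(p, &end, 10);
		if (errno || val < LOCAL_DISTANCE || val > MAX_DISTANCE)
			return -1;
		if (idx / nr_nodes == idx % nr_nodes && val != LOCAL_DISTANCE)
			return -1;
		table[idx] = (unsigned char)val;

		if (idx < total - 1) {
			if (*end != ',')
				return -1;
			p = end + 1;
		} else if (*end != '\0') {
			return -1;
		}
	}
	return 0;
}
```

Under this assumed format, "10,20,20,10" would describe a two-node table, and it could be combined with numa=fake to get arbitrary, non-linear distances.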


* Re: NUMA emulation x86_64: numa=fake parameter for custom nodes distance
  2011-11-19  0:31   ` Petr Holasek
@ 2011-11-20  2:09     ` David Rientjes
  2011-11-21 20:41       ` Petr Holasek
  0 siblings, 1 reply; 8+ messages in thread
From: David Rientjes @ 2011-11-20  2:09 UTC (permalink / raw)
  To: Petr Holasek
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, x86, Anton Arapov

On Sat, 19 Nov 2011, Petr Holasek wrote:

> A lot of developers still have no access to large NUMA machines, and the
> possibility of NUMA emulation could get more of them thinking about the
> NUMA awareness of their apps/kernel code.
> 

That's a bogus argument: numa=fake already allows you to construct as
large a NUMA box as you want in a faked environment.  The distances
have nothing to do with that.

The distances you're adding here are, by definition, incorrect because they
don't respect the actual distances between physical nodes that numa=fake
already uses.  If you're using numa=fake on a UMA machine, then the
performance of the kernel will be just that: you won't actually see any
introduced latency between fake nodes just by changing the distance.  So
you're completely invalidating what internode distances actually mean.

I'd much rather see an option to fake the SLIT that could do all of this
without limitation and would make it possible to debug issues in the future.


* Re: NUMA emulation x86_64: numa=fake parameter for custom nodes distance
  2011-11-20  2:09     ` David Rientjes
@ 2011-11-21 20:41       ` Petr Holasek
  2011-11-21 22:24         ` David Rientjes
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Holasek @ 2011-11-21 20:41 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, x86, Anton Arapov

On Sat, 19 Nov 2011, David Rientjes wrote:

> Date: Sat, 19 Nov 2011 18:09:59 -0800 (PST)
> From: David Rientjes <rientjes@google.com>
> To: Petr Holasek <pholasek@redhat.com>
> cc: Andrew Morton <akpm@linux-foundation.org>, Thomas Gleixner
>  <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin"
>  <hpa@zytor.com>, linux-kernel@vger.kernel.org, x86@kernel.org, Anton
>  Arapov <anton@redhat.com>
> Subject: Re: NUMA emulation x86_64: numa=fake parameter for custom nodes
>  distance
> 
> On Sat, 19 Nov 2011, Petr Holasek wrote:
> 
> > A lot of developers still have no access to large NUMA machines, and the
> > possibility of NUMA emulation could get more of them thinking about the
> > NUMA awareness of their apps/kernel code.
> > 
> 
> That's a bogus argument: numa=fake already allows you to construct as
> large a NUMA box as you want in a faked environment.  The distances
> have nothing to do with that.
> 
> The distances you're adding here are, by definition, incorrect because they
> don't respect the actual distances between physical nodes that numa=fake
> already uses.  If you're using numa=fake on a UMA machine, then the
> performance of the kernel will be just that: you won't actually see any
> introduced latency between fake nodes just by changing the distance.  So
> you're completely invalidating what internode distances actually mean.
> 
> I'd much rather see an option to fake the SLIT that could do all of this
> without limitation and would make it possible to debug issues in the future.

This patch was designed as nothing more than a helper for debugging/testing
purposes, e.g. when it is useful to export more values than just
LOCAL_DISTANCE. That is why it disregards the existing distances between
physical nodes.

Faking the SLIT is a really good point; if this patch is eventually
rejected, I will rework it in that manner.


* Re: NUMA emulation x86_64: numa=fake parameter for custom nodes distance
  2011-11-21 20:41       ` Petr Holasek
@ 2011-11-21 22:24         ` David Rientjes
  0 siblings, 0 replies; 8+ messages in thread
From: David Rientjes @ 2011-11-21 22:24 UTC (permalink / raw)
  To: Petr Holasek
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, x86, Anton Arapov

On Mon, 21 Nov 2011, Petr Holasek wrote:

> This patch was designed as nothing more than a helper for debugging/testing
> purposes, e.g. when it is useful to export more values than just
> LOCAL_DISTANCE. That is why it disregards the existing distances between
> physical nodes.
> 

I understand, but like I said: the only debugging and testing it would be
useful for is node ordering.  The actual latency of memory accesses is
not going to be representative of the new distances, which will lead to
confusion since they're wrong.  It's also pretty limited even in that
regard because all nodes are now spaced by the same distance, so they're
just spread out linearly instead of actually representing a real NUMA
architecture.

> Faking the SLIT is a really good point; if this patch is eventually
> rejected, I will rework it in that manner.
> 

That has applicability even outside of debugging: you could override your
own machine's SLIT if you know it's bogus.  The way it's defined is very
lengthy, however, and would require (4 * nr_nodes^2) characters at maximum,
since the max distance is three characters, 255 (unreachable node), and
you'd need to separate them by one character, a comma.  That's 256 chars
for eight nodes!
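
The size estimate above can be written out as a small helper. This is only a worked version of the arithmetic in the message; the name `slit_cmdline_max_chars` is hypothetical, and it counts one separator per entry, as the estimate does.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Upper bound on the command-line length of a full distance table,
 * following the estimate above: up to three digits per entry (max 255)
 * plus one separator character, for nr_nodes^2 entries.
 */
static size_t slit_cmdline_max_chars(size_t nr_nodes)
{
	const size_t chars_per_entry = 3 + 1;	/* "255" plus a comma */

	assert(nr_nodes > 0);
	return chars_per_entry * nr_nodes * nr_nodes;
}
```

For eight nodes this gives 4 * 8 * 8 = 256 characters, matching the figure in the message, and the bound grows quadratically with the node count.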

