* [PATCH v2] Fix fake numa on ppc
@ 2009-09-02 6:09 ` Ankita Garg
0 siblings, 0 replies; 12+ messages in thread
From: Ankita Garg @ 2009-09-02 6:09 UTC (permalink / raw)
To: LKML, linuxppc-dev, Benjamin Herrenschmidt, Balbir Singh
Cc: Vaidyanathan Srinivasan, ankita
Hi,
Below is a patch to fix a couple of issues with fake numa node creation
on ppc:
1) Presently, fake nodes could be created such that real numa node
boundaries are not respected. So a node could have lmbs that belong to
different real nodes.
2) The cpu association is broken. On a JS22 blade for example, which is
a 2-node numa machine, I get the following:
# cat /proc/cmdline
root=/dev/sda6 numa=fake=2G,4G,,6G,8G,10G,12G,14G,16G
# cat /sys/devices/system/node/node0/cpulist
0-3
# cat /sys/devices/system/node/node1/cpulist
4-7
# cat /sys/devices/system/node/node4/cpulist
#
So, though the cpus 4-7 should have been associated with node4, they
still belong to node1. The patch works by recording a real numa node
boundary and incrementing the fake node count. At the same time, a
mapping is stored from the real numa node to the first fake node that
gets created on it.
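To make the mechanism concrete, here is a minimal user-space sketch of the
boundary handling and the cpu lookup described above. It is a simplified
model, not the kernel code: struct fake_numa_state, assign_fake_nid(),
cpu_fake_nid() and MODEL_MAX_NODES are hypothetical names, the fake
boundaries are passed in as an array instead of being parsed from the
command line, and block addresses are in arbitrary units.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NODES 16 /* stand-in for MAX_NUMNODES; the value is an assumption */

struct fake_numa_state {
	unsigned int fake_nid;       /* fake node currently being filled */
	unsigned int prev_real_nid;  /* real node of the previous memory block */
	size_t next_boundary;        /* index of the next unconsumed fake boundary */
	unsigned int real_to_fake[MODEL_MAX_NODES]; /* real node -> first fake node on it */
};

/*
 * Return the fake node id for a memory block that ends at block_end and
 * lives on real_nid. Blocks must be passed in ascending address order.
 * Unlike the patch, this model checks for a real-node change even before
 * the first fake node has been created.
 */
static unsigned int assign_fake_nid(struct fake_numa_state *s,
				    unsigned int real_nid,
				    unsigned long long block_end,
				    const unsigned long long *boundaries,
				    size_t nr_boundaries)
{
	assert(real_nid < MODEL_MAX_NODES);

	/* Crossed into the next real node: start a new fake node there. */
	if (real_nid != s->prev_real_nid) {
		s->fake_nid++;
		s->real_to_fake[real_nid] = s->fake_nid;
		s->prev_real_nid = real_nid;
		return s->fake_nid;
	}

	/* Block extends past the requested boundary: start a new fake node. */
	if (s->next_boundary < nr_boundaries &&
	    block_end > boundaries[s->next_boundary]) {
		s->next_boundary++;
		s->fake_nid++;
	}
	return s->fake_nid;
}

/* Map a cpu's real node to the first fake node created on that real node. */
static unsigned int cpu_fake_nid(const struct fake_numa_state *s,
				 unsigned int real_nid)
{
	assert(real_nid < MODEL_MAX_NODES);
	if (real_nid && s->real_to_fake[real_nid] > 0)
		return s->real_to_fake[real_nid];
	return real_nid;
}
```

With blocks ending at 1..8, real node 0 owning the first three blocks, and
fake boundaries at 2, 4, 6 and 8, the model hands out fake ids
0, 0, 1, 2, 3, 3, 4, 4. No fake node spans both real nodes, and cpus on
real node 1 resolve to fake node 2.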
Tested the patch with the following commandlines:
numa=fake=2G,4G,6G,8G,10G,12G,14G,16G
numa=fake=3G,6G,10G,16G
numa=fake=4G
numa=fake=
For testing if the fake nodes respect the real node boundaries, I added
some debug printks in the node creation path. Without the patch, for the
commandline numa=fake=2G,4G,6G,8G,10G,12G,14G,16G, this is what I got:
fake id: 1 nid: 0
fake id: 1 nid: 0
...
fake id: 2 nid: 0
fake id: 2 nid: 0
...
fake id: 2 nid: 0
created new fake_node with id 3
fake id: 3 nid: 0
fake id: 3 nid: 0
...
fake id: 3 nid: 0
fake id: 3 nid: 0
fake id: 3 nid: 1
fake id: 3 nid: 1
...
created new fake_node with id 4
fake id: 4 nid: 1
fake id: 4 nid: 1
...
and so on. So, fake node 3 spans real nodes 0 and 1. Also,
# cat /sys/devices/system/node/node3/meminfo
Node 0 MemTotal: 2097152 kB
...
# cat /sys/devices/system/node/node4/meminfo
Node 0 MemTotal: 2097152 kB
...
With the patch, I get:
fake id: 1 nid: 0
fake id: 1 nid: 0
...
fake id: 2 nid: 0
fake id: 2 nid: 0
...
fake id: 2 nid: 0
created new fake_node with id 3
fake id: 3 nid: 0
fake id: 3 nid: 0
...
fake id: 3 nid: 0
fake id: 3 nid: 0
created new fake_node with id 4
fake id: 4 nid: 1
fake id: 4 nid: 1
...
and so on. With the patch, the fake node sizes differ slightly from
those specified by the user, because a fake node now ends wherever a
real node boundary falls.
# cat /sys/devices/system/node/node3/meminfo
Node 3 MemTotal: 1638400 kB
...
# cat /sys/devices/system/node/node4/meminfo
Node 4 MemTotal: 458752 kB
...
CPU association was tested as mentioned in the previous mail:
Without the patch,
# cat /proc/cmdline
root=/dev/sda6 numa=fake=2G,4G,,6G,8G,10G,12G,14G,16G
# cat /sys/devices/system/node/node0/cpulist
0-3
# cat /sys/devices/system/node/node1/cpulist
4-7
# cat /sys/devices/system/node/node4/cpulist
#
With the patch,
# cat /proc/cmdline
root=/dev/sda6 numa=fake=2G,4G,,6G,8G,10G,12G,14G,16G
# cat /sys/devices/system/node/node0/cpulist
0-3
# cat /sys/devices/system/node/node1/cpulist
# cat /sys/devices/system/node/node4/cpulist
4-7
Signed-off-by: Ankita Garg <ankita@in.ibm.com>
Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Index: linux-2.6.31-rc5/arch/powerpc/mm/numa.c
===================================================================
--- linux-2.6.31-rc5.orig/arch/powerpc/mm/numa.c
+++ linux-2.6.31-rc5/arch/powerpc/mm/numa.c
@@ -26,6 +26,13 @@
 #include <asm/smp.h>
 static int numa_enabled = 1;
+static int fake_enabled = 1;
+
+/*
+ * The array maps a real numa node to the first fake node that gets
+ * created on it
+ */
+int fake_numa_node_mapping[MAX_NUMNODES];
 static char *cmdline __initdata;
@@ -49,14 +56,29 @@ static int __cpuinit fake_numa_create_ne
 	unsigned long long mem;
 	char *p = cmdline;
 	static unsigned int fake_nid;
+	static unsigned int prev_nid = 0;
 	static unsigned long long curr_boundary;
 	/*
 	 * Modify node id, iff we started creating NUMA nodes
 	 * We want to continue from where we left of the last time
 	 */
-	if (fake_nid)
+	if (fake_nid) {
+		/*
+		 * Moved over to the next real numa node, increment fake
+		 * node number and store the mapping of the real node to
+		 * the fake node
+		 */
+		if (prev_nid != *nid) {
+			fake_nid++;
+			fake_numa_node_mapping[*nid] = fake_nid;
+			prev_nid = *nid;
+			*nid = fake_nid;
+			return 0;
+		}
 		*nid = fake_nid;
+	}
+
 	/*
 	 * In case there are no more arguments to parse, the
 	 * node_id should be the same as the last fake node id
@@ -440,7 +462,7 @@ static int of_drconf_to_nid_single(struc
  */
 static int __cpuinit numa_setup_cpu(unsigned long lcpu)
 {
-	int nid = 0;
+	int nid = 0, new_nid;
 	struct device_node *cpu = of_get_cpu_node(lcpu, NULL);
 	if (!cpu) {
@@ -450,8 +472,15 @@ static int __cpuinit numa_setup_cpu(unsi
 	nid = of_node_to_nid_single(cpu);
+	if (fake_enabled && nid) {
+		new_nid = fake_numa_node_mapping[nid];
+		if (new_nid > 0)
+			nid = new_nid;
+	}
+
 	if (nid < 0 || !node_online(nid))
 		nid = any_online_node(NODE_MASK_ALL);
+
 out:
 	map_cpu_to_node(lcpu, nid);
@@ -1005,8 +1034,12 @@ static int __init early_numa(char *p)
 		numa_debug = 1;
 	p = strstr(p, "fake=");
-	if (p)
+	if (p) {
 		cmdline = p + strlen("fake=");
+		if (numa_enabled) {
+			fake_enabled = 1;
+		}
+	}
 	return 0;
 }
--
Regards,
Ankita Garg (ankita@in.ibm.com)
Linux Technology Center
IBM India Systems & Technology Labs,
Bangalore, India
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] Fix fake numa on ppc
2009-09-02 6:09 ` Ankita Garg
@ 2009-09-02 6:37 ` David Rientjes
-1 siblings, 0 replies; 12+ messages in thread
From: David Rientjes @ 2009-09-02 6:37 UTC (permalink / raw)
To: Ankita Garg
Cc: LKML, linuxppc-dev, Benjamin Herrenschmidt, Balbir Singh,
Vaidyanathan Srinivasan
On Wed, 2 Sep 2009, Ankita Garg wrote:
> Hi,
>
> Below is a patch to fix a couple of issues with fake numa node creation
> on ppc:
>
> 1) Presently, fake nodes could be created such that real numa node
> boundaries are not respected. So a node could have lmbs that belong to
> different real nodes.
>
On x86_64, we can use numa=off to completely disable NUMA so that all
memory and all cpus are mapped to a single node 0. That's an extreme
example of the above and is totally permissible.
> 2) The cpu association is broken. On a JS22 blade for example, which is
> a 2-node numa machine, I get the following:
>
> # cat /proc/cmdline
> root=/dev/sda6 numa=fake=2G,4G,,6G,8G,10G,12G,14G,16G
> # cat /sys/devices/system/node/node0/cpulist
> 0-3
> # cat /sys/devices/system/node/node1/cpulist
> 4-7
> # cat /sys/devices/system/node/node4/cpulist
>
> #
>
This doesn't show what the true NUMA topology of the machine is, could you
please post the output of
$ cat /sys/devices/system/node/node*/cpulist
$ cat /sys/devices/system/node/node*/distance
$ ls -d /sys/devices/system/node/node*/cpu[0-8]
from a normal boot without any numa=fake?
> So, though the cpus 4-7 should have been associated with node4, they
> still belong to node1. The patch works by recording a real numa node
> boundary and incrementing the fake node count. At the same time, a
> mapping is stored from the real numa node to the first fake node that
> gets created on it.
>
If there are multiple fake nodes on a real physical node, all cpus in that
node should appear in the cpulist for each fake node for which it has
local distance.
* Re: [PATCH v2] Fix fake numa on ppc
2009-09-02 6:37 ` David Rientjes
@ 2009-09-02 8:03 ` Ankita Garg
-1 siblings, 0 replies; 12+ messages in thread
From: Ankita Garg @ 2009-09-02 8:03 UTC (permalink / raw)
To: David Rientjes
Cc: LKML, linuxppc-dev, Benjamin Herrenschmidt, Balbir Singh,
Vaidyanathan Srinivasan
Hi David,
On Tue, Sep 01, 2009 at 11:37:05PM -0700, David Rientjes wrote:
> On Wed, 2 Sep 2009, Ankita Garg wrote:
>
> > Hi,
> >
> > Below is a patch to fix a couple of issues with fake numa node creation
> > on ppc:
> >
> > 1) Presently, fake nodes could be created such that real numa node
> > boundaries are not respected. So a node could have lmbs that belong to
> > different real nodes.
> >
>
> On x86_64, we can use numa=off to completely disable NUMA so that all
> memory and all cpus are mapped to a single node 0. That's an extreme
> example of the above and is totally permissible.
>
> > 2) The cpu association is broken. On a JS22 blade for example, which is
> > a 2-node numa machine, I get the following:
> >
> > # cat /proc/cmdline
> > root=/dev/sda6 numa=fake=2G,4G,,6G,8G,10G,12G,14G,16G
> > # cat /sys/devices/system/node/node0/cpulist
> > 0-3
> > # cat /sys/devices/system/node/node1/cpulist
> > 4-7
> > # cat /sys/devices/system/node/node4/cpulist
> >
> > #
> >
>
> This doesn't show what the true NUMA topology of the machine is, could you
> please post the output of
>
> $ cat /sys/devices/system/node/node*/cpulist
> $ cat /sys/devices/system/node/node*/distance
> $ ls -d /sys/devices/system/node/node*/cpu[0-8]
>
> from a normal boot without any numa=fake?
>
Here's the output you requested:
# ls /sys/devices/system/node/
has_cpu has_normal_memory node0 node1 online possible
# cat /sys/devices/system/node/node*/cpulist
0-3
4-7
# cat /sys/devices/system/node/node*/distance
10 20
20 10
# ls -d /sys/devices/system/node/node*/cpu[0-8]
/sys/devices/system/node/node0/cpu0  /sys/devices/system/node/node0/cpu3  /sys/devices/system/node/node1/cpu6
/sys/devices/system/node/node0/cpu1  /sys/devices/system/node/node1/cpu4  /sys/devices/system/node/node1/cpu7
/sys/devices/system/node/node0/cpu2  /sys/devices/system/node/node1/cpu5
> > So, though the cpus 4-7 should have been associated with node4, they
> > still belong to node1. The patch works by recording a real numa node
> > boundary and incrementing the fake node count. At the same time, a
> > mapping is stored from the real numa node to the first fake node that
> > gets created on it.
> >
>
> If there are multiple fake nodes on a real physical node, all cpus in that
> node should appear in the cpulist for each fake node for which it has
> local distance.
Currently, the behavior of fake numa is not so on x86 as well? Below is
a sample output from a single node x86 system booted with numa=fake=8:
# cat node0/cpulist
# cat node1/cpulist
...
# cat node6/cpulist
# cat node7/cpulist
0-7
Presently, just fixing the cpu association issue with ppc, as explained
in my previous mail.
--
Regards,
Ankita Garg (ankita@in.ibm.com)
Linux Technology Center
IBM India Systems & Technology Labs,
Bangalore, India
* Re: [PATCH v2] Fix fake numa on ppc
2009-09-02 8:03 ` Ankita Garg
@ 2009-09-02 19:36 ` David Rientjes
-1 siblings, 0 replies; 12+ messages in thread
From: David Rientjes @ 2009-09-02 19:36 UTC (permalink / raw)
To: Ankita Garg
Cc: LKML, linuxppc-dev, Benjamin Herrenschmidt, Balbir Singh,
Vaidyanathan Srinivasan
On Wed, 2 Sep 2009, Ankita Garg wrote:
> Currently, the behavior of fake numa is not so on x86 as well? Below is
> a sample output from a single node x86 system booted with numa=fake=8:
>
> # cat node0/cpulist
>
> # cat node1/cpulist
>
> ...
> # cat node6/cpulist
>
> # cat node7/cpulist
> 0-7
>
> Presently, just fixing the cpu association issue with ppc, as explained
> in my previous mail.
>
Right, I'm proposing an alternate mapping scheme (which we've used for
years) for both platforms such that a cpu is bound (and is set in
cpumask_of_node()) to each fake node with which it has physical affinity.
That is the only way for zonelist ordering in node order, task migration
from offlined cpus, correct sched domains, etc. I can propose a patchset
for x86_64 to do exactly this if there aren't any objections and I hope
you'll help do ppc.
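A rough sketch of the mapping proposed here, assuming each fake node
records the real node it was carved from. This is only an illustration of
the idea, not code from any posted patchset: build_fake_cpumasks() and its
arrays are hypothetical, and a 64-bit word stands in for the kernel's
cpumask type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * For every fake node, set the bit of each cpu whose real node is the
 * real node that fake node was created on. A cpu therefore appears in
 * the mask of every fake node it has physical affinity with.
 *
 * cpu_real_nid[c]  : real node of cpu c
 * fake_real_nid[f] : real node that fake node f was carved from
 * fake_cpumask[f]  : output, one bit per cpu (model supports up to 64 cpus)
 */
static void build_fake_cpumasks(const unsigned int *cpu_real_nid, size_t nr_cpus,
				const unsigned int *fake_real_nid, size_t nr_fake,
				uint64_t *fake_cpumask)
{
	assert(nr_cpus <= 64);

	for (size_t f = 0; f < nr_fake; f++) {
		fake_cpumask[f] = 0;
		for (size_t c = 0; c < nr_cpus; c++) {
			if (cpu_real_nid[c] == fake_real_nid[f])
				fake_cpumask[f] |= UINT64_C(1) << c;
		}
	}
}
```

On a two-node machine like the JS22 above (cpus 0-3 on real node 0, cpus
4-7 on real node 1) with two fake nodes per real node, fake nodes 0 and 1
would both list cpus 0-3, and fake nodes 2 and 3 would both list cpus 4-7.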
* Re: [PATCH v2] Fix fake numa on ppc
2009-09-02 19:36 ` David Rientjes
@ 2009-09-02 19:56 ` Balbir Singh
-1 siblings, 0 replies; 12+ messages in thread
From: Balbir Singh @ 2009-09-02 19:56 UTC (permalink / raw)
To: David Rientjes
Cc: Ankita Garg, LKML, linuxppc-dev, Benjamin Herrenschmidt,
Vaidyanathan Srinivasan
On Thu, Sep 3, 2009 at 1:06 AM, David Rientjes <rientjes@google.com> wrote:
> On Wed, 2 Sep 2009, Ankita Garg wrote:
>
>> Currently, the behavior of fake numa is not so on x86 as well? Below is
>> a sample output from a single node x86 system booted with numa=fake=8:
>>
>> # cat node0/cpulist
>>
>> # cat node1/cpulist
>>
>> ...
>> # cat node6/cpulist
>>
>> # cat node7/cpulist
>> 0-7
>>
>> Presently, just fixing the cpu association issue with ppc, as explained
>> in my previous mail.
>>
>
> Right, I'm proposing an alternate mapping scheme (which we've used for
> years) for both platforms such that a cpu is bound (and is set in
> cpumask_of_node()) to each fake node with which it has physical affinity.
> That is the only way for zonelist ordering in node order, task migration
> from offlined cpus, correct sched domains, etc. I can propose a patchset
> for x86_64 to do exactly this if there aren't any objections and I hope
> you'll help do ppc.
Sounds interesting, I'd definitely be interested in seeing your
proposal, but I would think of that as additional development on top
of this patch.
Balbir Singh.
* Re: [PATCH v2] Fix fake numa on ppc
2009-09-02 19:56 ` Balbir Singh
@ 2009-09-02 20:09 ` David Rientjes
-1 siblings, 0 replies; 12+ messages in thread
From: David Rientjes @ 2009-09-02 20:09 UTC (permalink / raw)
To: Balbir Singh
Cc: Ankita Garg, LKML, linuxppc-dev, Benjamin Herrenschmidt,
Vaidyanathan Srinivasan
On Thu, 3 Sep 2009, Balbir Singh wrote:
> > Right, I'm proposing an alternate mapping scheme (which we've used for
> > years) for both platforms such that a cpu is bound (and is set in
> > cpumask_of_node()) to each fake node with which it has physical affinity.
> > That is the only way for zonelist ordering in node order, task migration
> > from offlined cpus, correct sched domains, etc. I can propose a patchset
> > for x86_64 to do exactly this if there aren't any objections and I hope
> > you'll help do ppc.
>
> Sounds interesting, I'd definitely be interested in seeing your
> proposal, but I would think of that as additional development on top
> of this patch
>
Absolutely. I'm not familiar with numa=fake on ppc, but if cpus are being
bound to nodes with which they don't have affinity, it definitely warrants
a fix such as this (although the initial value for fake_enabled looks
wrong and fake_numa_node_mapping[] can be __cpuinitdata). I'll cc you,
Ben, and Ankita on the x86_64 patches. Thanks.
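One reading of the remark about the initial value is that the flag should
start out cleared and only be set once "fake=" is actually seen on the
command line. A minimal user-space sketch under that assumption follows;
struct numa_opts and parse_numa_option() are hypothetical names that only
model the option parsing, they are not the kernel's early_numa().

```c
#include <assert.h>
#include <string.h>

struct numa_opts {
	int numa_enabled;          /* cleared by "off" */
	int fake_enabled;          /* starts at 0; set only when "fake=" is seen */
	const char *fake_cmdline;  /* text following "fake=", or NULL */
};

/* Parse the value of a numa= option, e.g. "fake=2G,4G" or "off". */
static void parse_numa_option(struct numa_opts *o, const char *p)
{
	const char *f;

	assert(o != NULL && p != NULL);

	if (strstr(p, "off"))
		o->numa_enabled = 0;

	f = strstr(p, "fake=");
	if (f) {
		o->fake_cmdline = f + strlen("fake=");
		/* Fake nodes only make sense while NUMA itself is enabled. */
		if (o->numa_enabled)
			o->fake_enabled = 1;
	}
}
```

Starting from { .numa_enabled = 1 }, a boot without "fake=" leaves
fake_enabled at 0, so the cpu lookup would skip the real-to-fake mapping
entirely.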