* [PATCH 0/3][RFC] NUMA: add host side pinning
@ 2010-06-23 21:09 Andre Przywara
  2010-06-23 21:09 ` [PATCH 1/3] NUMA: add Linux libnuma detection Andre Przywara
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Andre Przywara @ 2010-06-23 21:09 UTC (permalink / raw)
  To: kvm; +Cc: anthony, agraf

Hi,

these three patches add basic NUMA pinning to KVM. According to a
user-provided assignment, parts of the guest's memory will be bound to
different host nodes. This should increase performance in large virtual
machines and on loaded hosts.
These patches are quite basic (but they work), and I am sending them as
an RFC to get some feedback before implementing things in vain.

To use it you need to provide a guest NUMA configuration; this can be
as simple as "-numa node -numa node" to give the guest two nodes. You
then pin these nodes to different host nodes with a separate command-line
option: "-numa pin,nodeid=0,host=0 -numa pin,nodeid=1,host=2".
This separation of host and guest config sounds a bit complicated, but
it was demanded the last time I submitted a similar version.
I refrained from binding the vCPUs to physical CPUs for now, but this
can be added later with a "cpubind" option to "-numa pin,". It could also
be done from a management application by using sched_setaffinity().
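(For illustration, a management application could pin a vCPU thread
roughly like this; a minimal sketch, assuming the vCPU thread ID has
been obtained from QEMU somehow, all names made up:

    #define _GNU_SOURCE
    #include <sched.h>

    /* Pin the vCPU thread with the given TID to one physical CPU. */
    static int pin_vcpu_thread(pid_t vcpu_tid, int host_cpu)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(host_cpu, &set);
        return sched_setaffinity(vcpu_tid, sizeof(set), &set);
    }
)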

Please note that this currently targets qemu-kvm, although I am not
up to date regarding the current status of upstream QEMU's true SMP
capabilities. The final patch will be made against upstream QEMU anyway.
Also, this is currently for Linux hosts (are any other KVM hosts alive?)
and for PC guests only. I think both limitations can be lifted easily if
someone requests it (and gives me a pointer to further information).

Please comment on the approach in general and the implementation.

Thanks and Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany




* [PATCH 1/3] NUMA: add Linux libnuma detection
  2010-06-23 21:09 [PATCH 0/3][RFC] NUMA: add host side pinning Andre Przywara
@ 2010-06-23 21:09 ` Andre Przywara
  2010-06-23 21:09 ` [PATCH 2/3] NUMA: add parsing of host NUMA pin option Andre Przywara
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: Andre Przywara @ 2010-06-23 21:09 UTC (permalink / raw)
  To: kvm; +Cc: anthony, agraf, Andre Przywara

Add detection of libnuma (mostly contained in the numactl package)
to the configure script. Currently this is Linux only, but can be
extended later should the need for other interfaces come up.
It can be enabled or disabled on the command line; the default is to
use it if available.
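
Example usage (illustrative):
    ./configure --enable-numa     # require libnuma, abort if missing
    ./configure --disable-numa    # build without host NUMA support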

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
 configure |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 08883e7..3e2dc5b 100755
--- a/configure
+++ b/configure
@@ -281,6 +281,7 @@ vnc_sasl=""
 xen=""
 linux_aio=""
 vhost_net=""
+numa=""
 
 gprof="no"
 debug_tcg="no"
@@ -721,6 +722,10 @@ for opt do
   ;;
   --enable-vhost-net) vhost_net="yes"
   ;;
+  --disable-numa) numa="no"
+  ;;
+  --enable-numa) numa="yes"
+  ;;
   --*dir)
   ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
@@ -905,6 +910,8 @@ echo "  --enable-docs            enable documentation build"
 echo "  --disable-docs           disable documentation build"
 echo "  --disable-vhost-net      disable vhost-net acceleration support"
 echo "  --enable-vhost-net       enable vhost-net acceleration support"
+echo "  --disable-numa           disable host Linux NUMA support"
+echo "  --enable-numa            enable host Linux NUMA support"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -1962,6 +1969,28 @@ if compile_prog "" "" ; then
   signalfd=yes
 fi
 
+##########################################
+# libnuma probe
+
+if test "$numa" != "no" ; then
+  numa_requested="$numa"
+  cat > $TMPC << EOF
+#include <numa.h>
+int main(void) { return numa_available(); }
+EOF
+
+  if compile_prog "" "-lnuma" ; then
+    numa=yes
+    libs_softmmu="-lnuma $libs_softmmu"
+  else
+    if test "$numa_requested" = "yes" ; then
+      feature_not_found "linux NUMA (install numactl?)"
+    fi
+    numa=no
+  fi
+fi
+
+
 # check if eventfd is supported
 eventfd=no
 cat > $TMPC << EOF
@@ -2245,6 +2274,7 @@ echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
 echo "uuid support      $uuid"
 echo "vhost-net support $vhost_net"
+echo "NUMA host support $numa"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2468,6 +2498,9 @@ if test $cpu_emulation = "yes"; then
 else
   echo "CONFIG_NO_CPU_EMULATION=y" >> $config_host_mak
 fi
+if test "$numa" = "yes"; then
+  echo "CONFIG_NUMA=y" >> $config_host_mak
+fi
 
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
-- 
1.6.4




* [PATCH 2/3] NUMA: add parsing of host NUMA pin option
  2010-06-23 21:09 [PATCH 0/3][RFC] NUMA: add host side pinning Andre Przywara
  2010-06-23 21:09 ` [PATCH 1/3] NUMA: add Linux libnuma detection Andre Przywara
@ 2010-06-23 21:09 ` Andre Przywara
  2010-06-23 21:09 ` [PATCH 3/3] NUMA: realize NUMA memory pinning Andre Przywara
  2010-06-23 22:21 ` [PATCH 0/3][RFC] NUMA: add host side pinning Anthony Liguori
  3 siblings, 0 replies; 20+ messages in thread
From: Andre Przywara @ 2010-06-23 21:09 UTC (permalink / raw)
  To: kvm; +Cc: anthony, agraf, Andre Przywara

Introduce another variant of QEMU's -numa option to allow host node
pinning. This was separated from the guest-visible configuration to
make it cleaner to use, especially for management applications.
The syntax is -numa pin,nodeid=n,host=m to assign guest node n to
host node m.
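
For example (illustrative), a two-node guest with each node pinned to
a different host node:

    qemu -numa node,mem=512M,cpus=0 -numa node,mem=512M,cpus=1 \
         -numa pin,nodeid=0,host=0 -numa pin,nodeid=1,host=2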

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
 sysemu.h |    1 +
 vl.c     |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/sysemu.h b/sysemu.h
index 6018d97..1b3f77b 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -139,6 +139,7 @@ extern long hpagesize;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern uint64_t node_cpumask[MAX_NODES];
+extern int node_pin[MAX_NODES];
 
 #define MAX_OPTION_ROMS 16
 extern const char *option_rom[MAX_OPTION_ROMS];
diff --git a/vl.c b/vl.c
index 0ee963c..02e0bed 100644
--- a/vl.c
+++ b/vl.c
@@ -234,6 +234,7 @@ int boot_menu;
 int nb_numa_nodes;
 uint64_t node_mem[MAX_NODES];
 uint64_t node_cpumask[MAX_NODES];
+int node_pin[MAX_NODES];
 
 static QEMUTimer *nographic_timer;
 
@@ -771,6 +772,22 @@ static void numa_add(const char *optarg)
             node_cpumask[nodenr] = value;
         }
         nb_numa_nodes++;
+    } else if (!strcmp(option, "pin")) {
+        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
+            fprintf(stderr, "error: need nodeid for -numa pin,...\n");
+            exit(1);
+        } else {
+            nodenr = strtoull(option, NULL, 10);
+            if (nodenr >= nb_numa_nodes) {
+                fprintf(stderr, "nodeid exceeds specified NUMA nodes\n");
+                exit(1);
+            }
+        }
+        if (get_param_value(option, 128, "host", optarg) == 0) {
+            node_pin[nodenr] = -1;
+        } else {
+            node_pin[nodenr] = strtoull(option, NULL, 10);
+        }
     }
     return;
 }
@@ -1873,6 +1890,7 @@ int main(int argc, char **argv, char **envp)
     for (i = 0; i < MAX_NODES; i++) {
         node_mem[i] = 0;
         node_cpumask[i] = 0;
+        node_pin[i] = -1;
     }
 
     assigned_devices_index = 0;
-- 
1.6.4




* [PATCH 3/3] NUMA: realize NUMA memory pinning
  2010-06-23 21:09 [PATCH 0/3][RFC] NUMA: add host side pinning Andre Przywara
  2010-06-23 21:09 ` [PATCH 1/3] NUMA: add Linux libnuma detection Andre Przywara
  2010-06-23 21:09 ` [PATCH 2/3] NUMA: add parsing of host NUMA pin option Andre Przywara
@ 2010-06-23 21:09 ` Andre Przywara
  2010-06-23 22:21 ` [PATCH 0/3][RFC] NUMA: add host side pinning Anthony Liguori
  3 siblings, 0 replies; 20+ messages in thread
From: Andre Przywara @ 2010-06-23 21:09 UTC (permalink / raw)
  To: kvm; +Cc: anthony, agraf, Andre Przywara

According to the user-provided assignment, bind the respective parts
of the guest's memory to the given host nodes. This uses Linux's
libnuma interface to realize the pinning right after the allocation.
Failures are not fatal, but they produce a warning.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
 hw/pc.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 1f61609..b6d4d7a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -41,6 +41,11 @@
 #include "device-assignment.h"
 #include "kvm.h"
 
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#endif
+
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
 
@@ -874,6 +879,49 @@ void pc_cpus_init(const char *cpu_model)
     }
 }
 
+static void bind_numa(ram_addr_t ram_addr, ram_addr_t border_4g,
+                      int below_4g)
+{
+#ifdef CONFIG_NUMA
+    int i, skip;
+    char* ram_ptr;
+    nodemask_t nodemask;
+    ram_addr_t len, ram_offset;
+
+    ram_ptr = qemu_get_ram_ptr(ram_addr);
+
+    ram_offset = 0;
+    skip = !below_4g;
+    for (i = 0; i < nb_numa_nodes; i++) {
+        len = node_mem[i];
+        if (ram_offset <= border_4g && ram_offset + len > border_4g) {
+            len = border_4g - ram_offset;
+            if (skip) {
+                ram_offset = 0;
+                len = node_mem[i] - len;
+                skip = 0;
+            }
+        }
+        if (skip && ram_offset + len <= border_4g) {
+            ram_offset += len;
+            continue;
+        }
+        if (!skip && node_pin[i] >= 0) {
+            nodemask_zero(&nodemask);
+            nodemask_set_compat(&nodemask, node_pin[i]);
+            if (mbind(ram_ptr + ram_offset, len, MPOL_BIND,
+                      nodemask.n, NUMA_NUM_NODES, 0)) {
+                perror("mbind");
+            }
+        }
+        ram_offset += len;
+        if (below_4g && ram_offset >= border_4g)
+            return;
+    }
+#endif
+    return;
+}
+
 void pc_memory_init(ram_addr_t ram_size,
                     const char *kernel_filename,
                     const char *kernel_cmdline,
@@ -906,6 +954,8 @@ void pc_memory_init(ram_addr_t ram_size,
                  below_4g_mem_size - 0x100000,
                  ram_addr + 0x100000);
 
+    bind_numa(ram_addr, below_4g_mem_size, 1);
+
     /* above 4giga memory allocation */
     if (above_4g_mem_size > 0) {
 #if TARGET_PHYS_ADDR_BITS == 32
@@ -915,6 +965,7 @@ void pc_memory_init(ram_addr_t ram_size,
         cpu_register_physical_memory(0x100000000ULL,
                                      above_4g_mem_size,
                                      ram_addr);
+        bind_numa(ram_addr, below_4g_mem_size, 0);
 #endif
     }
 
-- 
1.6.4




* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-23 21:09 [PATCH 0/3][RFC] NUMA: add host side pinning Andre Przywara
                   ` (2 preceding siblings ...)
  2010-06-23 21:09 ` [PATCH 3/3] NUMA: realize NUMA memory pinning Andre Przywara
@ 2010-06-23 22:21 ` Anthony Liguori
  2010-06-23 22:29   ` Alexander Graf
                     ` (2 more replies)
  3 siblings, 3 replies; 20+ messages in thread
From: Anthony Liguori @ 2010-06-23 22:21 UTC (permalink / raw)
  To: Andre Przywara; +Cc: kvm, agraf

On 06/23/2010 04:09 PM, Andre Przywara wrote:
> Hi,
>
> these three patches add basic NUMA pinning to KVM. According to a
> user-provided assignment, parts of the guest's memory will be bound to
> different host nodes. This should increase performance in large virtual
> machines and on loaded hosts.
> [...]
>
> Please comment on the approach in general and the implementation.
>

If we extended the integrated -mem-path support with -numa such that a
different path could be used for each NUMA node (and we let an explicit
file be specified instead of just a directory), then if I understand
correctly, we could use numactl without any specific integration in
qemu.  Does this sound correct?

IOW:

qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem 
-numa node,mem=2G,nodeid=1,cpus=1-2,memfile=/dev/shm/node1.mem

It's then possible to say:

numactl --file /dev/shm/node0.mem --interleave=0,1
numactl --file /dev/shm/node1.mem --membind=2

I think this approach is nicer because it gives the user a lot more 
flexibility without having us chase other tools like numactl.  For 
instance, your patches only support pinning and not interleaving.

Regards,

Anthony Liguori




* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-23 22:21 ` [PATCH 0/3][RFC] NUMA: add host side pinning Anthony Liguori
@ 2010-06-23 22:29   ` Alexander Graf
  2010-06-24 10:58     ` Andre Przywara
  2010-06-24  6:44   ` Andre Przywara
  2010-06-24 13:14   ` Andi Kleen
  2 siblings, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2010-06-23 22:29 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Andre Przywara, kvm


On 24.06.2010, at 00:21, Anthony Liguori wrote:

> On 06/23/2010 04:09 PM, Andre Przywara wrote:
>> Hi,
>> 
>> these three patches add basic NUMA pinning to KVM. [...]
>> 
>> Please comment on the approach in general and the implementation.
>> 
> 
> If we extended the integrated -mem-path support with -numa such that a different path could be used for each NUMA node (and we let an explicit file be specified instead of just a directory), then if I understand correctly, we could use numactl without any specific integration in qemu.  Does this sound correct?
> 
> IOW:
> 
> qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem -numa node,mem=2G,nodeid=1,cpus=1-2,memfile=/dev/shm/node1.mem
> 
> It's then possible to say:
> 
> numactl --file /dev/shm/node0.mem --interleave=0,1
> numactl --file /dev/shm/node1.mem --membind=2
> 
> I think this approach is nicer because it gives the user a lot more flexibility without having us chase other tools like numactl.  For instance, your patches only support pinning and not interleaving.

Interesting idea.

So who would create the /dev/shm/nodeXX files? I can imagine starting numactl before qemu, even though that's cumbersome. I don't think it's feasible to start numactl after qemu is running. That'd involve way too much magic that I'd prefer qemu to call numactl itself.

Alex



* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-23 22:21 ` [PATCH 0/3][RFC] NUMA: add host side pinning Anthony Liguori
  2010-06-23 22:29   ` Alexander Graf
@ 2010-06-24  6:44   ` Andre Przywara
  2010-06-24 13:14   ` Andi Kleen
  2 siblings, 0 replies; 20+ messages in thread
From: Andre Przywara @ 2010-06-24  6:44 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm, agraf

Anthony Liguori wrote:
> On 06/23/2010 04:09 PM, Andre Przywara wrote:
>> Hi,
>>
>> these three patches add basic NUMA pinning to KVM. [...]
>>
>> Please comment on the approach in general and the implementation.
>>    
> 
> If we extended the integrated -mem-path support with -numa such that a
> different path could be used for each NUMA node (and we let an explicit
> file be specified instead of just a directory), then if I understand
> correctly, we could use numactl without any specific integration in
> qemu.  Does this sound correct?
In general, yes. But I consider the whole hugetlbfs approach broken.
Since 2.6.32 or so you can use MAP_HUGETLB together with MAP_ANONYMOUS
in mmap() to avoid hugetlbfs altogether, and I bet the future will bring
transparent hugepages anyway (RHEL6 already has them).
I am not sure whether you want to keep the -memfile option and extend it
with some pseudo-compat glue (faked directory names to be interpreted by
QEMU) to keep it working in the future. But in these cases the external
numactl approach would not work anymore.

> IOW:
> 
> qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem 
> -numa node,mem=2G,nodeid=1,cpus=1-2,memfile=/dev/shm/node1.mem
> 
> It's then possible to say:
> 
> numactl --file /dev/shm/node0.mem --interleave=0,1
> numactl --file /dev/shm/node1.mem --membind=2
> 
> I think this approach is nicer because it gives the user a lot more 
> flexibility without having us chase other tools like numactl.  For 
> instance, your patches only support pinning and not interleaving.
That's right. I put it on the list ;-)

Thanks for the good hint on the huge pages issue, as this is not
properly handled in the current implementation. I will think about a
proper way to handle this, but I would still opt for an (at least
partially) QEMU-integrated solution.
Still open for discussion, though, as I see your point about avoiding
a duplicate NUMA implementation between numactl and QEMU.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 488-3567-12



* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-23 22:29   ` Alexander Graf
@ 2010-06-24 10:58     ` Andre Przywara
  2010-06-24 11:12       ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Andre Przywara @ 2010-06-24 10:58 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Anthony Liguori, kvm

Alexander Graf wrote:
> On 24.06.2010, at 00:21, Anthony Liguori wrote:
> 
>> On 06/23/2010 04:09 PM, Andre Przywara wrote:
>>> Hi,
>>>
>>> these three patches add basic NUMA pinning to KVM. [...]
>>>
>>> Please comment on the approach in general and the implementation.
>>>   
>> If we extended the integrated -mem-path support with -numa such that a different path could be used for each NUMA node (and we let an explicit file be specified instead of just a directory), then if I understand correctly, we could use numactl without any specific integration in qemu.  Does this sound correct?
>>
>> IOW:
>>
>> qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem -numa node,mem=2G,nodeid=1,cpus=1-2,memfile=/dev/shm/node1.mem
>>
>> It's then possible to say:
>>
>> numactl --file /dev/shm/node0.mem --interleave=0,1
>> numactl --file /dev/shm/node1.mem --membind=2
>>
>> I think this approach is nicer because it gives the user a lot more flexibility without having us chase other tools like numactl.  For instance, your patches only support pinning and not interleaving.
> 
> Interesting idea.
> 
> So who would create the /dev/shm/nodeXX files?
Currently it is QEMU. It creates a somewhat unique filename, opens and 
unlinks it. The difference would be to name the file after the option 
and to not unlink it.
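
(The pattern is roughly the following; a sketch only, not the actual
QEMU code, and the filename template is made up:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    /* Create a file in dir, unlink it immediately and keep only the
       mapping; skipping the unlink would keep it visible for numactl. */
    static void *alloc_file_backed(const char *dir, size_t size)
    {
        char path[4096];
        void *ptr;
        int fd;

        snprintf(path, sizeof(path), "%s/qemu_mem.XXXXXX", dir);
        fd = mkstemp(path);
        if (fd < 0) {
            return NULL;
        }
        unlink(path);
        if (ftruncate(fd, size) < 0) {
            close(fd);
            return NULL;
        }
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        return ptr == MAP_FAILED ? NULL : ptr;
    }
)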

 > I can imagine starting numactl before qemu, even though that's
 > cumbersome. I don't think it's feasible to start numactl after
 > qemu is running. That'd involve way too much magic that I'd prefer
 > qemu to call numactl itself.
With the current code the files would not exist before QEMU allocates
RAM, and after that QEMU could already have touched pages before numactl
sets the policy.
To avoid this I'd like to see the pinning done from within QEMU. I am
not sure whether calling numactl via system() and friends is OK; I'd
prefer to run the syscalls directly (like in patch 3/3) and pull the
necessary options into the -numa pin,... command line. We could mimic
numactl's syntax here.
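(A hypothetical extended syntax, just to illustrate the direction:
"-numa pin,nodeid=0,host=0-1,policy=interleave".)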

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12



* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-24 10:58     ` Andre Przywara
@ 2010-06-24 11:12       ` Avi Kivity
  2010-06-24 11:34         ` Andre Przywara
  2010-06-28 16:17         ` Anthony Liguori
  0 siblings, 2 replies; 20+ messages in thread
From: Avi Kivity @ 2010-06-24 11:12 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Alexander Graf, Anthony Liguori, kvm

On 06/24/2010 01:58 PM, Andre Przywara wrote:
>> So who would create the /dev/shm/nodeXX files?
>
> Currently it is QEMU. It creates a somewhat unique filename, opens and 
> unlinks it. The difference would be to name the file after the option 
> and to not unlink it.
>
> > I can imagine starting numactl before qemu, even though that's
> > cumbersome. I don't think it's feasible to start numactl after
> > qemu is running. That'd involve way too much magic that I'd prefer
> > qemu to call numactl itself.
> With the current code the files would not exist before QEMU allocates
> RAM, and after that QEMU could already have touched pages before
> numactl sets the policy.

Non-anonymous memory doesn't work well with ksm and transparent 
hugepages.  Is it possible to use anonymous memory rather than file backed?

> To avoid this I'd like to see the pinning done from within QEMU. I am 
> not sure whether calling numactl via system() and friends is OK; I'd
> prefer to run the syscalls directly (like in patch 3/3) and pull the 
> necessary options into the -numa pin,... command line. We could mimic 
> numactl's syntax here.

Definitely not use system(), but IIRC numactl has a library interface?


-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-24 11:12       ` Avi Kivity
@ 2010-06-24 11:34         ` Andre Przywara
  2010-06-24 11:42           ` Avi Kivity
  2010-06-25 11:00           ` Jes Sorensen
  2010-06-28 16:17         ` Anthony Liguori
  1 sibling, 2 replies; 20+ messages in thread
From: Andre Przywara @ 2010-06-24 11:34 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, Anthony Liguori, kvm

Avi Kivity wrote:
> On 06/24/2010 01:58 PM, Andre Przywara wrote:
>>> So who would create the /dev/shm/nodeXX files?
>> Currently it is QEMU. It creates a somewhat unique filename, opens and 
>> unlinks it. The difference would be to name the file after the option 
>> and to not unlink it.
>>
>>> I can imagine starting numactl before qemu, even though that's
>>> cumbersome. I don't think it's feasible to start numactl after
>>> qemu is running. That'd involve way too much magic that I'd prefer
>>> qemu to call numactl itself.
>> With the current code the files would not exist before QEMU allocates
>> RAM, and after that QEMU could already have touched pages before
>> numactl sets the policy.
> 
> Non-anonymous memory doesn't work well with ksm and transparent 
> hugepages.  Is it possible to use anonymous memory rather than file backed?
I'd prefer non-file backed, too. But that is how the current huge pages 
implementation is done. We could use MAP_HUGETLB and declare NUMA _and_ 
huge pages as 2.6.32+ only. Unfortunately I didn't find an easy way to 
detect the presence of the MAP_HUGETLB flag. If the kernel does not 
support it, it seems that mmap silently ignores it and uses 4KB pages 
instead.
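
(A fallback along the following lines would at least cover kernels where
the mmap() fails instead of silently ignoring the flag; a sketch only,
with MAP_HUGETLB's value taken from the x86 kernel headers:

    #include <sys/mman.h>

    #ifndef MAP_HUGETLB
    #define MAP_HUGETLB 0x40000     /* asm-generic/mman.h, 2.6.32+ */
    #endif

    /* Try anonymous huge-page-backed memory first, fall back to normal
       anonymous pages. size must be a multiple of the huge page size
       for the first mmap() to succeed. */
    static void *alloc_anon_ram(size_t size)
    {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        }
        return p;
    }
)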

>> To avoid this I'd like to see the pinning done from within QEMU. I am 
>> not sure whether calling numactl via system() and friends is OK; I'd
>> prefer to run the syscalls directly (like in patch 3/3) and pull the 
>> necessary options into the -numa pin,... command line. We could mimic 
>> numactl's syntax here.
> 
> Definitely not use system(), but IIRC numactl has a library interface?
Right, that is what I include in patch 3/3 and use. I got the impression 
Anthony wanted to avoid reimplementing parts of numactl, especially 
enabling the full flexibility of the command line interface (like 
specifying nodes, policies and interleaving).
I want QEMU to use the library and pull the necessary options into the 
-numa pin,... parsing, even if this means duplicating numactl functionality.
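
(For reference, the old-style libnuma calls in question look roughly
like this; a sketch only, patch 3/3 currently calls the lower-level
mbind() itself:

    #include <numa.h>

    /* Bind an mmap'ed range to one host node; numa_available() must
       have returned >= 0 beforehand. */
    static void pin_to_node(void *start, size_t len, int host_node)
    {
        numa_tonode_memory(start, len, host_node);
    }

    /* Interleave a range across all allowed nodes. */
    static void interleave_range(void *start, size_t len)
    {
        numa_interleave_memory(start, len, &numa_all_nodes);
    }
)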

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12



* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-24 11:34         ` Andre Przywara
@ 2010-06-24 11:42           ` Avi Kivity
  2010-06-28 16:20             ` Anthony Liguori
  2010-06-25 11:00           ` Jes Sorensen
  1 sibling, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2010-06-24 11:42 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Alexander Graf, Anthony Liguori, kvm

On 06/24/2010 02:34 PM, Andre Przywara wrote:
>> Non-anonymous memory doesn't work well with ksm and transparent 
>> hugepages.  Is it possible to use anonymous memory rather than file 
>> backed?
>
> I'd prefer non-file backed, too. But that is how the current huge 
> pages implementation is done. We could use MAP_HUGETLB and declare 
> NUMA _and_ huge pages as 2.6.32+ only. Unfortunately I didn't find an 
> easy way to detect the presence of the MAP_HUGETLB flag. If the kernel 
> does not support it, it seems that mmap silently ignores it and uses 
> 4KB pages instead.

That sucks; unfortunately it is normal practice.  However it is a soft
failure: everything works, just a bit slower.  So it's probably acceptable.

>>> To avoid this I'd like to see the pinning done from within QEMU. I 
>>> am not sure whether calling numactl via system() and friends is OK;
>>> I'd prefer to run the syscalls directly (like in patch 3/3) and pull 
>>> the necessary options into the -numa pin,... command line. We could 
>>> mimic numactl's syntax here.
>>
>> Definitely not use system(), but IIRC numactl has a library interface?
> Right, that is what I include in patch 3/3 and use. I got the 
> impression Anthony wanted to avoid reimplementing parts of numactl, 
> especially enabling the full flexibility of the command line interface 
> (like specifying nodes, policies and interleaving).
> I want QEMU to use the library and pull the necessary options into the 
> -numa pin,... parsing, even if this means duplicating numactl 
> functionality.
>

I agree with that.  It's a lot easier to use a single tool than to try 
to integrate things yourself, the unix tradition of grep | sort | uniq 
-c | sort -n notwithstanding.  Especially when one of the tools is qemu.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-23 22:21 ` [PATCH 0/3][RFC] NUMA: add host side pinning Anthony Liguori
  2010-06-23 22:29   ` Alexander Graf
  2010-06-24  6:44   ` Andre Przywara
@ 2010-06-24 13:14   ` Andi Kleen
  2 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2010-06-24 13:14 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Andre Przywara, kvm, agraf

Anthony Liguori <anthony@codemonkey.ws> writes:
>
> If we extended the integrated -mem-path support with -numa such that
> a different path could be used for each NUMA node (and we let an
> explicit file be specified instead of just a directory), then if I
> understand correctly, we could use numactl without any specific
> integration in qemu.  Does this sound correct?

It's a bit tricky to coordinate because numactl policy only helps
before first fault (unless you want to migrate, but that has more
overhead), and if you run numactl in parallel with qemu
you never know who faults first. So you would need another step of
precreating the files before starting qemu.

Another issue with using tmpfs this way is that you first need to resize
it to be larger than 0.5*RAM.  So more configuration hassle.
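(E.g. something like "mount -o remount,size=20g /dev/shm"; the size is
just an example.)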

Overall it would be rather a lot of steps this way. I guess
most people would put it into a wrapper, but why not have
that wrapper in qemu directly? Supporting interleave too
would be rather straightforward.

Also, a lot of things you could do with numactl on shm can
also be done after the fact with cpusets.
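(Roughly like this, illustrative only; assumes the cpuset filesystem is
not yet mounted, node and CPU numbers are made up:

    mount -t cpuset none /dev/cpuset
    mkdir /dev/cpuset/qemu0
    echo 0-3 > /dev/cpuset/qemu0/cpus    # CPUs the VM may run on
    echo 2 > /dev/cpuset/qemu0/mems      # memory nodes it may use
    echo $QEMU_PID > /dev/cpuset/qemu0/tasks
)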

-Andi 

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-24 11:34         ` Andre Przywara
  2010-06-24 11:42           ` Avi Kivity
@ 2010-06-25 11:00           ` Jes Sorensen
  2010-06-25 11:06             ` Andre Przywara
  1 sibling, 1 reply; 20+ messages in thread
From: Jes Sorensen @ 2010-06-25 11:00 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Avi Kivity, Alexander Graf, Anthony Liguori, kvm

On 06/24/10 13:34, Andre Przywara wrote:
> Avi Kivity wrote:
>> On 06/24/2010 01:58 PM, Andre Przywara wrote:
>> Non-anonymous memory doesn't work well with ksm and transparent
>> hugepages.  Is it possible to use anonymous memory rather than file
>> backed?
> I'd prefer non-file backed, too. But that is how the current huge pages
> implementation is done. We could use MAP_HUGETLB and declare NUMA _and_
> huge pages as 2.6.32+ only. Unfortunately I didn't find an easy way to
> detect the presence of the MAP_HUGETLB flag. If the kernel does not
> support it, it seems that mmap silently ignores it and uses 4KB pages
> instead.

I'm a bit behind on the mailing list, but I think this looks very promising.

I really think it makes more sense to make QEMU aware of the NUMA setup
as well, rather than relying on numactl to do the work outside.

One thing you need to consider is what happens with migration once a
user specifies -numa. IMHO it is acceptable to simply disable migration
for the given guest.

Cheers,
Jes

PS: Are you planning on submitting anything to Linux Plumbers Conference
about this? :)


* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-25 11:00           ` Jes Sorensen
@ 2010-06-25 11:06             ` Andre Przywara
  2010-06-25 11:37               ` Jes Sorensen
  0 siblings, 1 reply; 20+ messages in thread
From: Andre Przywara @ 2010-06-25 11:06 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: Avi Kivity, Alexander Graf, Anthony Liguori, kvm

Jes Sorensen wrote:
> On 06/24/10 13:34, Andre Przywara wrote:
>> Avi Kivity wrote:
>>> On 06/24/2010 01:58 PM, Andre Przywara wrote:
>>> Non-anonymous memory doesn't work well with ksm and transparent
>>> hugepages.  Is it possible to use anonymous memory rather than file
>>> backed?
>> I'd prefer non-file backed, too. But that is how the current huge pages
>> implementation is done. We could use MAP_HUGETLB and declare NUMA _and_
>> huge pages as 2.6.32+ only. Unfortunately I didn't find an easy way to
>> detect the presence of the MAP_HUGETLB flag. If the kernel does not
>> support it, it seems that mmap silently ignores it and uses 4KB pages
>> instead.
> 
> I'm a bit behind on the mailing list, but I think this looks very promising.
> 
> I really think it makes more sense to make QEMU aware of the NUMA setup
> as well, rather than relying on numactl to do the work outside.
> 
> One thing you need to consider is what happens with migration once a
> user specifies -numa. IMHO it is acceptable to simply disable migration
> for the given guest.
Is that really a problem? You create the guest on the target with a NUMA
setup specific to the target machine. That could mean that you pin
multiple guest nodes to the same host node, but that shouldn't break
anything, right? The guest part can (and should!) be migrated along
with all the other device state. I think this is still missing from the
current implementation.

> 
> Cheers,
> Jes
> 
> PS: Are you planning on submitting anything to Linux Plumbers Conference
> about this? :)
Yes, I was planning to submit a proposal, as I saw NUMA mentioned in the 
topics list. AFAIK the deadline is July 19th, right? That gives me 
another week after my vacation (for which I leave in a few minutes).

Regards,
Andre.


-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712



* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-25 11:06             ` Andre Przywara
@ 2010-06-25 11:37               ` Jes Sorensen
  0 siblings, 0 replies; 20+ messages in thread
From: Jes Sorensen @ 2010-06-25 11:37 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Avi Kivity, Alexander Graf, Anthony Liguori, kvm

On 06/25/10 13:06, Andre Przywara wrote:
> Jes Sorensen wrote:
>> On 06/24/10 13:34, Andre Przywara wrote:
>> I really think it makes more sense to make QEMU aware of the NUMA setup
>> as well, rather than relying on numactl to do the work outside.
>>
>> One thing you need to consider is what happens with migration once a
>> user specifies -numa. IMHO it is acceptable to simply disable migration
>> for the given guest.
> Is that really a problem? You create the guest on the target with a NUMA
> setup specific to the target machine. That could mean that you pin
> multiple guest nodes to the same host node, but that shouldn't break
> something, right? The guest part can (and should be!) migrated along
> with all the other device state. I think this is still missing from the
> current implementation.

It may be hard to guarantee the memory layout on the target machine; it
may have a completely different topology. The NUMA bindings ought to go
into the state and be checked against the target machine's state, but
for instance you could be trying to bind things to nodes 7-8 on the first
host while the migration target only has 2 nodes, but plenty of memory.
Or you use more nodes on the first host than you have on the second. It's
a very complicated matrix to try to match.

>> PS: Are you planning on submitting anything to Linux Plumbers Conference
>> about this? :)
> Yes, I was planning to submit a proposal, as I saw NUMA mentioned in the
> topics list. AFAIK the deadline is July 19th, right? That gives me
> another week after my vacation (for which I leave in a few minutes).

Excellent! Yes, it should still be July 19th.

Enjoy your vacation!

Jes


* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-24 11:12       ` Avi Kivity
  2010-06-24 11:34         ` Andre Przywara
@ 2010-06-28 16:17         ` Anthony Liguori
  2010-06-29  9:48           ` Avi Kivity
  1 sibling, 1 reply; 20+ messages in thread
From: Anthony Liguori @ 2010-06-28 16:17 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Andre Przywara, Alexander Graf, kvm

On 06/24/2010 06:12 AM, Avi Kivity wrote:
> On 06/24/2010 01:58 PM, Andre Przywara wrote:
>> [...]
>
> Non-anonymous memory doesn't work well with ksm and transparent 
> hugepages.  Is it possible to use anonymous memory rather than file 
> backed?

You aren't going to be doing NUMA pinning and KSM AFAICT.

Regards,

Anthony Liguori

>> To avoid this I'd like to see the pinning done from within QEMU. I am 
>> not sure whether calling numactl via system() and friends is OK; I'd
>> prefer to run the syscalls directly (like in patch 3/3) and pull the 
>> necessary options into the -numa pin,... command line. We could mimic 
>> numactl's syntax here.
>
> Definitely not use system(), but IIRC numactl has a library interface?
>
>



* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-24 11:42           ` Avi Kivity
@ 2010-06-28 16:20             ` Anthony Liguori
  2010-06-28 16:26               ` Alexander Graf
  2010-06-29  9:46               ` Avi Kivity
  0 siblings, 2 replies; 20+ messages in thread
From: Anthony Liguori @ 2010-06-28 16:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Andre Przywara, Alexander Graf, kvm

On 06/24/2010 06:42 AM, Avi Kivity wrote:
> On 06/24/2010 02:34 PM, Andre Przywara wrote:
> [...]
>
> I agree with that.  It's a lot easier to use a single tool than to try 
> to integrate things yourself, the unix tradition of grep | sort | uniq 
> -c | sort -n notwithstanding.  Especially when one of the tools is qemu.

I couldn't disagree more here.  This is why we don't support CPU pinning
and instead provide PID information for each VCPU thread.

The folks that want to use pinning are not novice users.  They are not
going to be happy unless they can make full use of existing tools.  That
means replicating all of numactl's functionality (which is not what the
current patches do) or enabling numactl to be used with a guest.
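
(E.g., hypothetically, with the QEMU PID and a vCPU thread ID in hand:

    ps -Lo tid,comm -p $QEMU_PID    # list the threads, find the vCPUs
    taskset -p -c 2 $VCPU_TID       # pin one vCPU thread to host CPU 2
)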

Regards,

Anthony Liguori




* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-28 16:20             ` Anthony Liguori
@ 2010-06-28 16:26               ` Alexander Graf
  2010-06-29  9:46               ` Avi Kivity
  1 sibling, 0 replies; 20+ messages in thread
From: Alexander Graf @ 2010-06-28 16:26 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, Andre Przywara, kvm

Anthony Liguori wrote:
> On 06/24/2010 06:42 AM, Avi Kivity wrote:
>> [...]
>>
>> I agree with that.  It's a lot easier to use a single tool than to
>> try to integrate things yourself, the unix tradition of grep | sort |
>> uniq -c | sort -n notwithstanding.  Especially when one of the tools
>> is qemu.
>
> I couldn't disagree more here.  This is why we don't support CPU pinning
> and instead provide PID information for each VCPU thread.
>
> The folks that want to use pinning are not novice users.  They are not
> going to be happy unless they can make full use of existing tools.
> That means replicating all of numactl's functionality (which is not
> what the current patches do) or enabling numactl to be used with a guest.

So how about some QMP plumbing that would allow numactl to create the
VMs at defined ranges? You'd basically get numactl --run-qemu --
qemu-kvm -blah -foo

Alex



* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-28 16:20             ` Anthony Liguori
  2010-06-28 16:26               ` Alexander Graf
@ 2010-06-29  9:46               ` Avi Kivity
  1 sibling, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2010-06-29  9:46 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Andre Przywara, Alexander Graf, kvm

On 06/28/2010 07:20 PM, Anthony Liguori wrote:
>> [...]
>>
>> I agree with that.  It's a lot easier to use a single tool than to 
>> try to integrate things yourself, the unix tradition of grep | sort | 
>> uniq -c | sort -n notwithstanding.  Especially when one of the tools 
>> is qemu.
>
>
> I couldn't disagree more here.  This is why we don't support CPU pinning
> and instead provide PID information for each VCPU thread.

Good point.  That also allows setting priority, etc.

>
> The folks that want to use pinning are not novice users.  They are not
> going to be happy unless they can make full use of existing tools.
> That means replicating all of numactl's functionality (which is not
> what the current patches do) or enabling numactl to be used with a guest.
>

Yeah.  Unfortunately, that also forces us to use non-anonymous memory.  
So it isn't just where to put the functionality, it also has side effects.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/3][RFC] NUMA: add host side pinning
  2010-06-28 16:17         ` Anthony Liguori
@ 2010-06-29  9:48           ` Avi Kivity
  0 siblings, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2010-06-29  9:48 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Andre Przywara, Alexander Graf, kvm

On 06/28/2010 07:17 PM, Anthony Liguori wrote:
> On 06/24/2010 06:12 AM, Avi Kivity wrote:
>> [...]
>>
>> Non-anonymous memory doesn't work well with ksm and transparent 
>> hugepages.  Is it possible to use anonymous memory rather than file 
>> backed?
>
> You aren't going to be doing NUMA pinning and KSM AFAICT.

What about transparent hugepages?


Conceptually, all of this belongs in the scheduler, so whatever we do 
ends up a poorly integrated hack.

-- 
error compiling committee.c: too many arguments to function


