* [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes
@ 2013-07-04  9:53 Wanlong Gao
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option Wanlong Gao
                   ` (12 more replies)
  0 siblings, 13 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

As you know, QEMU cannot direct its memory allocation now, and this may cause
cross-node access performance regressions in the guest.
Worse, if PCI passthrough is used, the directly attached device uses DMA
transfers between the device and the qemu process, and all pages of the
guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    =>kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
           => kvm_pin_pages()

So, with a directly attached device, every guest page's refcount is raised
by one, and page migration will not work; AutoNUMA will not work either.

So, we should set the guest nodes' memory allocation policy before
the pages are actually mapped.

With this patch set, we are able to set the memory policy of guest nodes
like the following:

 -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
 -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=1

The supported format is "mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N}".
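The mem-hostnode grammar can be illustrated with a small standalone sketch (hypothetical helper name, using a plain uint64_t in place of QEMU's bitmap; the series' real parser is added to vl.c in patch 4/10):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HOST_NODES 64

/* Parse "[+|!]{all|N|N-N}" into a 64-bit host-node mask.
 * '!' complements the selected set, '+' marks the node numbers as
 * relative (reported via *relative).  Returns 0 on success, -1 on
 * a malformed string or an out-of-range node. */
static int parse_hostnode(const char *s, uint64_t *mask, int *relative)
{
    uint64_t bits = 0;
    int complement = 0;
    char *end;

    *relative = 0;
    if (*s == '!') { complement = 1; s++; }
    if (*s == '+') { *relative = 1; s++; }

    if (!strcmp(s, "all")) {
        bits = ~UINT64_C(0);
    } else {
        unsigned long lo = strtoul(s, &end, 10);
        unsigned long hi = lo;
        if (end == s) return -1;          /* no digits at all */
        if (*end == '-') {
            const char *p = end + 1;
            hi = strtoul(p, &end, 10);
            if (end == p || *end != '\0') return -1;
        } else if (*end != '\0') {
            return -1;
        }
        if (hi < lo || hi >= MAX_HOST_NODES) return -1;
        for (unsigned long i = lo; i <= hi; i++) {
            bits |= UINT64_C(1) << i;
        }
    }
    *mask = complement ? ~bits : bits;
    return 0;
}
```

So "mem-hostnode=0-1" selects host nodes 0 and 1, "mem-hostnode=!1" selects everything but node 1, and "+" only tags the set as relative without changing it.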

Patch 8/10 adds a QMP command "set-mpol" to set the memory policy of each
guest node:
    set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1

Patch 9/10 adds a monitor command "set-mpol" whose format is like:
    set-mpol 0 mem-policy=membind,mem-hostnode=0-1

With patch 10/10, we can get the current memory policy of each guest node
using the monitor command "info numa", for example:

    (qemu) info numa
    2 nodes
    node 0 cpus: 0
    node 0 size: 1024 MB
    node 0 mempolicy: membind=0,1
    node 1 cpus: 1
    node 1 size: 1024 MB
    node 1 mempolicy: interleave=1


V1->V2:
    change to use QemuOpts in numa options (Paolo)
    handle Error in mpol parser (Paolo)
    change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo)
V2->V3:
    also handle Error in cpus parser (5/10)
    split out common parser from cpus and hostnode parser (Bandan 6/10)
V3->V4:
    rebase to request for comments


Bandan Das (1):
  NUMA: Support multiple CPU ranges on -numa option

Wanlong Gao (9):
  NUMA: Add numa_info structure to contain numa nodes info
  NUMA: Add Linux libnuma detection
  NUMA: parse guest numa nodes memory policy
  NUMA: handle Error in cpus, mpol and hostnode parser
  NUMA: split out the common range parser
  NUMA: set guest numa nodes memory policy
  NUMA: add qmp command set-mpol to set memory policy for NUMA node
  NUMA: add hmp command set-mpol
  NUMA: show host memory policy info in info numa command

 configure               |  32 ++++++
 cpus.c                  | 143 +++++++++++++++++++++++-
 hmp-commands.hx         |  16 +++
 hmp.c                   |  35 ++++++
 hmp.h                   |   1 +
 hw/i386/pc.c            |   4 +-
 hw/net/eepro100.c       |   1 -
 include/sysemu/sysemu.h |  20 +++-
 monitor.c               |  44 +++++++-
 qapi-schema.json        |  15 +++
 qemu-options.hx         |   3 +-
 qmp-commands.hx         |  35 ++++++
 vl.c                    | 285 +++++++++++++++++++++++++++++++++++-------------
 13 files changed, 553 insertions(+), 81 deletions(-)

-- 
1.8.3.1.448.gfb7dfaa

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
@ 2013-07-04  9:53 ` Wanlong Gao
  2013-07-05 18:41   ` Eduardo Habkost
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 02/10] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

From: Bandan Das <bsd@redhat.com>

This allows the "cpus" property to be used multiple times
to specify multiple cpus or cpu ranges to the -numa option:

-numa node,cpus=1,cpus=2,cpus=4
or
-numa node,cpus=1-3,cpus=5
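A minimal sketch of the accumulation this enables (hypothetical names; the real code walks the QemuOpts list with qemu_opt_foreach and sets bits in the node's CPU bitmap):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Accumulate several "cpus" values ("N" or "N-M") into one mask,
 * the way repeated cpus= suboptions add up on a single -numa node.
 * Returns 0 on success, -1 on a malformed value. */
static int cpus_to_mask(const char *const *vals, int n, uint64_t *mask)
{
    *mask = 0;
    for (int i = 0; i < n; i++) {
        char *end;
        unsigned long lo = strtoul(vals[i], &end, 10), hi = lo;
        if (end == vals[i]) return -1;      /* no digits */
        if (*end == '-') {
            const char *p = end + 1;
            hi = strtoul(p, &end, 10);
            if (end == p) return -1;        /* "N-" with no end value */
        }
        if (*end != '\0' || hi < lo || hi >= 64) return -1;
        for (unsigned long c = lo; c <= hi; c++) {
            *mask |= UINT64_C(1) << c;      /* each value ORs into the mask */
        }
    }
    return 0;
}
```

With this, "cpus=1,cpus=2-3,cpus=5" yields one CPU set containing 1, 2, 3 and 5.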

Note that after this patch, the default suffix of "-numa node,mem=N"
will no longer be "M". So we must append the suffix "M", as in
"-numa node,mem=NM", when assigning "N MB" of node memory.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 qemu-options.hx |   3 +-
 vl.c            | 108 ++++++++++++++++++++++++++++++++++----------------------
 2 files changed, 67 insertions(+), 44 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 137a39b..449cf36 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -100,7 +100,8 @@ STEXI
 @item -numa @var{opts}
 @findex -numa
 Simulate a multi node NUMA system. If mem and cpus are omitted, resources
-are split equally.
+are split equally. The "-cpus" property may be specified multiple times
+to denote multiple cpus or cpu ranges.
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
diff --git a/vl.c b/vl.c
index 6d9fd7d..6f2e17a 100644
--- a/vl.c
+++ b/vl.c
@@ -516,6 +516,32 @@ static QemuOptsList qemu_realtime_opts = {
     },
 };
 
+static QemuOptsList qemu_numa_opts = {
+    .name = "numa",
+    .implied_opt_name = "type",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head),
+    .desc = {
+        {
+            .name = "type",
+            .type = QEMU_OPT_STRING,
+            .help = "node type"
+        },{
+            .name = "nodeid",
+            .type = QEMU_OPT_NUMBER,
+            .help = "node ID"
+        },{
+            .name = "mem",
+            .type = QEMU_OPT_SIZE,
+            .help = "memory size"
+        },{
+            .name = "cpus",
+            .type = QEMU_OPT_STRING,
+            .help = "cpu number or range"
+        },
+        { /* end of list */ }
+    },
+};
+
 const char *qemu_get_vm_name(void)
 {
     return qemu_name;
@@ -1349,56 +1375,37 @@ error:
     exit(1);
 }
 
-static void numa_add(const char *optarg)
+
+static int numa_add_cpus(const char *name, const char *value, void *opaque)
 {
-    char option[128];
-    char *endptr;
-    unsigned long long nodenr;
+    int *nodenr = opaque;
 
-    optarg = get_opt_name(option, 128, optarg, ',');
-    if (*optarg == ',') {
-        optarg++;
+    if (!strcmp(name, "cpu")) {
+        numa_node_parse_cpus(*nodenr, value);
     }
-    if (!strcmp(option, "node")) {
-
-        if (nb_numa_nodes >= MAX_NODES) {
-            fprintf(stderr, "qemu: too many NUMA nodes\n");
-            exit(1);
-        }
+    return 0;
+}
 
-        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
-            nodenr = nb_numa_nodes;
-        } else {
-            if (parse_uint_full(option, &nodenr, 10) < 0) {
-                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
-                exit(1);
-            }
-        }
+static int numa_init_func(QemuOpts *opts, void *opaque)
+{
+    uint64_t nodenr, mem_size;
 
-        if (nodenr >= MAX_NODES) {
-            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
-            exit(1);
-        }
+    nodenr = qemu_opt_get_number(opts, "nodeid", nb_numa_nodes++);
 
-        if (get_param_value(option, 128, "mem", optarg) == 0) {
-            node_mem[nodenr] = 0;
-        } else {
-            int64_t sval;
-            sval = strtosz(option, &endptr);
-            if (sval < 0 || *endptr) {
-                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
-                exit(1);
-            }
-            node_mem[nodenr] = sval;
-        }
-        if (get_param_value(option, 128, "cpus", optarg) != 0) {
-            numa_node_parse_cpus(nodenr, option);
-        }
-        nb_numa_nodes++;
-    } else {
-        fprintf(stderr, "Invalid -numa option: %s\n", option);
+    if (nodenr >= MAX_NODES) {
+        fprintf(stderr, "qemu: Max number of NUMA nodes reached : %d\n",
+                (int)nodenr);
         exit(1);
     }
+
+    mem_size = qemu_opt_get_size(opts, "mem", 0);
+    node_mem[nodenr] = mem_size;
+
+    if (qemu_opt_foreach(opts, numa_add_cpus, &nodenr, 1) < 0) {
+        return -1;
+    }
+
+    return 0;
 }
 
 static QemuOptsList qemu_smp_opts = {
@@ -2933,6 +2940,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_object_opts);
     qemu_add_opts(&qemu_tpmdev_opts);
     qemu_add_opts(&qemu_realtime_opts);
+    qemu_add_opts(&qemu_numa_opts);
 
     runstate_init();
 
@@ -3119,7 +3127,16 @@ int main(int argc, char **argv, char **envp)
                 }
                 break;
             case QEMU_OPTION_numa:
-                numa_add(optarg);
+                olist = qemu_find_opts("numa");
+                opts = qemu_opts_parse(olist, optarg, 1);
+                if (!opts) {
+                    exit(1);
+                }
+                optarg = qemu_opt_get(opts, "type");
+                if (!optarg || strcmp(optarg, "node")) {
+                    fprintf(stderr, "qemu: Incorrect format for numa option\n");
+                    exit(1);
+                }
                 break;
             case QEMU_OPTION_display:
                 display_type = select_display(optarg);
@@ -4195,6 +4212,11 @@ int main(int argc, char **argv, char **envp)
 
     register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, NULL);
 
+    if (qemu_opts_foreach(qemu_find_opts("numa"), numa_init_func,
+                          NULL, 1) != 0) {
+        exit(1);
+    }
+
     if (nb_numa_nodes > 0) {
         int i;
 
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH V4 02/10] NUMA: Add numa_info structure to contain numa nodes info
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option Wanlong Gao
@ 2013-07-04  9:53 ` Wanlong Gao
  2013-07-05 19:32   ` Eduardo Habkost
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 03/10] NUMA: Add Linux libnuma detection Wanlong Gao
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

Add the numa_info structure to contain the numa nodes' memory and
VCPU information, and to hold the per-node host memory policies
added later in this series.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 cpus.c                  |  2 +-
 hw/i386/pc.c            |  4 ++--
 hw/net/eepro100.c       |  1 -
 include/sysemu/sysemu.h |  8 ++++++--
 monitor.c               |  2 +-
 vl.c                    | 24 ++++++++++++------------
 6 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/cpus.c b/cpus.c
index 20958e5..496d5ce 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1180,7 +1180,7 @@ void set_numa_modes(void)
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
         cpu = ENV_GET_CPU(env);
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
                 cpu->numa_node = i;
             }
         }
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 78f92e2..78b5a72 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -650,14 +650,14 @@ static FWCfgState *bochs_bios_init(void)
         unsigned int apic_id = x86_cpu_apic_id_from_index(i);
         assert(apic_id < apic_id_limit);
         for (j = 0; j < nb_numa_nodes; j++) {
-            if (test_bit(i, node_cpumask[j])) {
+            if (test_bit(i, numa_info[j].node_cpu)) {
                 numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
                 break;
             }
         }
     }
     for (i = 0; i < nb_numa_nodes; i++) {
-        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
+        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(numa_info[i].node_mem);
     }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
                      (1 + apic_id_limit + nb_numa_nodes) *
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index dc99ea6..478c688 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -105,7 +105,6 @@
 #define PCI_IO_SIZE             64
 #define PCI_FLASH_SIZE          (128 * KiB)
 
-#define BIT(n) (1 << (n))
 #define BITS(n, m) (((0xffffffffU << (31 - n)) >> (31 - n + m)) << m)
 
 /* The SCB accepts the following controls for the Tx and Rx units: */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 2fb71af..70fd2ed 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -9,6 +9,7 @@
 #include "qapi-types.h"
 #include "qemu/notify.h"
 #include "qemu/main-loop.h"
+#include "qemu/bitmap.h"
 
 /* vl.c */
 
@@ -130,8 +131,11 @@ extern QEMUClock *rtc_clock;
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
 extern int nb_numa_nodes;
-extern uint64_t node_mem[MAX_NODES];
-extern unsigned long *node_cpumask[MAX_NODES];
+struct node_info {
+    uint64_t node_mem;
+    DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+};
+extern struct node_info numa_info[MAX_NODES];
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/monitor.c b/monitor.c
index 9be515c..93ac045 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1820,7 +1820,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
         }
         monitor_printf(mon, "\n");
         monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-            node_mem[i] >> 20);
+            numa_info[i].node_mem >> 20);
     }
 }
 
diff --git a/vl.c b/vl.c
index 6f2e17a..5207b8e 100644
--- a/vl.c
+++ b/vl.c
@@ -250,8 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
     QTAILQ_HEAD_INITIALIZER(fw_boot_order);
 
 int nb_numa_nodes;
-uint64_t node_mem[MAX_NODES];
-unsigned long *node_cpumask[MAX_NODES];
+struct node_info numa_info[MAX_NODES];
 
 uint8_t qemu_uuid[16];
 
@@ -1367,7 +1366,7 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus)
         goto error;
     }
 
-    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
+    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
     return;
 
 error:
@@ -1399,7 +1398,7 @@ static int numa_init_func(QemuOpts *opts, void *opaque)
     }
 
     mem_size = qemu_opt_get_size(opts, "mem", 0);
-    node_mem[nodenr] = mem_size;
+    numa_info[nodenr].node_mem = mem_size;
 
     if (qemu_opt_foreach(opts, numa_add_cpus, &nodenr, 1) < 0) {
         return -1;
@@ -2961,8 +2960,8 @@ int main(int argc, char **argv, char **envp)
     translation = BIOS_ATA_TRANSLATION_AUTO;
 
     for (i = 0; i < MAX_NODES; i++) {
-        node_mem[i] = 0;
-        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
+        numa_info[i].node_mem = 0;
+        bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
     }
 
     nb_numa_nodes = 0;
@@ -4228,7 +4227,7 @@ int main(int argc, char **argv, char **envp)
          * and distribute the available memory equally across all nodes
          */
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0)
+            if (numa_info[i].node_mem != 0)
                 break;
         }
         if (i == nb_numa_nodes) {
@@ -4238,14 +4237,15 @@ int main(int argc, char **argv, char **envp)
              * the final node gets the rest.
              */
             for (i = 0; i < nb_numa_nodes - 1; i++) {
-                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
-                usedmem += node_mem[i];
+                numa_info[i].node_mem = (ram_size / nb_numa_nodes) &
+                                        ~((1 << 23UL) - 1);
+                usedmem += numa_info[i].node_mem;
             }
-            node_mem[i] = ram_size - usedmem;
+            numa_info[i].node_mem = ram_size - usedmem;
         }
 
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
+            if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
                 break;
             }
         }
@@ -4255,7 +4255,7 @@ int main(int argc, char **argv, char **envp)
          */
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                set_bit(i, node_cpumask[i % nb_numa_nodes]);
+                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
             }
         }
     }
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH V4 03/10] NUMA: Add Linux libnuma detection
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option Wanlong Gao
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 02/10] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
@ 2013-07-04  9:53 ` Wanlong Gao
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 04/10] NUMA: parse guest numa nodes memory policy Wanlong Gao
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

Add detection of libnuma (mostly contained in the numactl package)
to the configure script. It can be enabled or disabled on the command
line; the default is to use it if available.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 configure | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/configure b/configure
index 0e0adde..9d3b4ce 100755
--- a/configure
+++ b/configure
@@ -242,6 +242,7 @@ gtk=""
 gtkabi="2.0"
 tpm="no"
 libssh2=""
+numa=""
 
 # parse CC options first
 for opt do
@@ -944,6 +945,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --disable-numa) numa="no"
+  ;;
+  --enable-numa) numa="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1158,6 +1163,8 @@ echo "  --gcov=GCOV              use specified gcov [$gcov_tool]"
 echo "  --enable-tpm             enable TPM support"
 echo "  --disable-libssh2        disable ssh block device support"
 echo "  --enable-libssh2         enable ssh block device support"
+echo "  --disable-numa           disable libnuma support"
+echo "  --enable-numa            enable libnuma support"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2389,6 +2396,27 @@ EOF
 fi
 
 ##########################################
+# libnuma probe
+
+if test "$numa" != "no" ; then
+  numa=no
+  cat > $TMPC << EOF
+#include <numa.h>
+int main(void) { return numa_available(); }
+EOF
+
+  if compile_prog "" "-lnuma" ; then
+    numa=yes
+    libs_softmmu="-lnuma $libs_softmmu"
+  else
+    if test "$numa" = "yes" ; then
+      feature_not_found "linux NUMA (install numactl?)"
+    fi
+    numa=no
+  fi
+fi
+
+##########################################
 # linux-aio probe
 
 if test "$linux_aio" != "no" ; then
@@ -3557,6 +3585,7 @@ echo "TPM support       $tpm"
 echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging     $qom_cast_debug"
+echo "NUMA host support $numa"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3590,6 +3619,9 @@ echo "extra_cflags=$EXTRA_CFLAGS" >> $config_host_mak
 echo "extra_ldflags=$EXTRA_LDFLAGS" >> $config_host_mak
 echo "qemu_localedir=$qemu_localedir" >> $config_host_mak
 echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
+if test "$numa" = "yes"; then
+  echo "CONFIG_NUMA=y" >> $config_host_mak
+fi
 
 echo "ARCH=$ARCH" >> $config_host_mak
 
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH V4 04/10] NUMA: parse guest numa nodes memory policy
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (2 preceding siblings ...)
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 03/10] NUMA: Add Linux libnuma detection Wanlong Gao
@ 2013-07-04  9:53 ` Wanlong Gao
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 05/10] NUMA: handle Error in cpus, mpol and hostnode parser Wanlong Gao
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

The memory policy setting format is:
mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N}
We add this setting as a suboption of "-numa"; the memory policy
can then be set as follows:
 -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
 -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=!1
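The parsed policy is recorded in a per-node flags word (see the NODE_HOST_* defines in the sysemu.h hunk of this patch). A standalone sketch of that encoding, with the defines copied from the patch and a hypothetical accessor:

```c
#include <assert.h>

/* Encoding introduced by this patch: the two low bits select the
 * policy, bit 2 flags a relative host-node set ("+" prefix). */
#define NODE_HOST_NONE        0x00
#define NODE_HOST_BIND        0x01
#define NODE_HOST_INTERLEAVE  0x02
#define NODE_HOST_PREFERRED   0x03
#define NODE_HOST_POLICY_MASK 0x03
#define NODE_HOST_RELATIVE    0x04

/* Hypothetical accessor: strip the relative bit to get the policy. */
static unsigned policy_of(unsigned flags)
{
    return flags & NODE_HOST_POLICY_MASK;
}
```

Because the policy values occupy a 2-bit field rather than independent bits, "mem-policy=membind,mem-hostnode=+0-1" can be stored as NODE_HOST_BIND | NODE_HOST_RELATIVE in a single flags word.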

Reviewed-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |   8 ++++
 vl.c                    | 110 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 118 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 70fd2ed..993b8e0 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -130,10 +130,18 @@ extern QEMUClock *rtc_clock;
 
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
+#define NODE_HOST_NONE        0x00
+#define NODE_HOST_BIND        0x01
+#define NODE_HOST_INTERLEAVE  0x02
+#define NODE_HOST_PREFERRED   0x03
+#define NODE_HOST_POLICY_MASK 0x03
+#define NODE_HOST_RELATIVE    0x04
 extern int nb_numa_nodes;
 struct node_info {
     uint64_t node_mem;
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+    unsigned int flags;
 };
 extern struct node_info numa_info[MAX_NODES];
 
diff --git a/vl.c b/vl.c
index 5207b8e..495b3a8 100644
--- a/vl.c
+++ b/vl.c
@@ -536,6 +536,14 @@ static QemuOptsList qemu_numa_opts = {
             .name = "cpus",
             .type = QEMU_OPT_STRING,
             .help = "cpu number or range"
+        },{
+            .name = "mem-policy",
+            .type = QEMU_OPT_STRING,
+            .help = "memory policy"
+        },{
+            .name = "mem-hostnode",
+            .type = QEMU_OPT_STRING,
+            .help = "host node number or range for memory policy"
         },
         { /* end of list */ }
     },
@@ -1374,6 +1382,79 @@ error:
     exit(1);
 }
 
+static void numa_node_parse_mpol(int nodenr, const char *mpol)
+{
+    if (!mpol) {
+        return;
+    }
+
+    if (!strcmp(mpol, "interleave")) {
+        numa_info[nodenr].flags |= NODE_HOST_INTERLEAVE;
+    } else if (!strcmp(mpol, "preferred")) {
+        numa_info[nodenr].flags |= NODE_HOST_PREFERRED;
+    } else if (!strcmp(mpol, "membind")) {
+        numa_info[nodenr].flags |= NODE_HOST_BIND;
+    } else {
+        fprintf(stderr, "qemu: Invalid memory policy: %s\n", mpol);
+    }
+}
+
+static void numa_node_parse_hostnode(int nodenr, const char *hostnode)
+{
+    unsigned long long value, endvalue;
+    char *endptr;
+    bool clear = false;
+    unsigned long *bm = numa_info[nodenr].host_mem;
+
+    if (hostnode[0] == '!') {
+        clear = true;
+        bitmap_fill(bm, MAX_CPUMASK_BITS);
+        hostnode++;
+    }
+    if (hostnode[0] == '+') {
+        numa_info[nodenr].flags |= NODE_HOST_RELATIVE;
+        hostnode++;
+    }
+
+    if (!strcmp(hostnode, "all")) {
+        bitmap_fill(bm, MAX_CPUMASK_BITS);
+        return;
+    }
+
+    if (parse_uint(hostnode, &value, &endptr, 10) < 0)
+        goto error;
+    if (*endptr == '-') {
+        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
+            goto error;
+        }
+    } else if (*endptr == '\0') {
+        endvalue = value;
+    } else {
+        goto error;
+    }
+
+    if (endvalue >= MAX_CPUMASK_BITS) {
+        endvalue = MAX_CPUMASK_BITS - 1;
+        fprintf(stderr,
+            "qemu: NUMA: A max of %d host nodes are supported\n",
+             MAX_CPUMASK_BITS);
+    }
+
+    if (endvalue < value) {
+        goto error;
+    }
+
+    if (clear)
+        bitmap_clear(bm, value, endvalue - value + 1);
+    else
+        bitmap_set(bm, value, endvalue - value + 1);
+
+    return;
+
+error:
+    fprintf(stderr, "qemu: Invalid host NUMA nodes range: %s\n", hostnode);
+    return;
+}
 
 static int numa_add_cpus(const char *name, const char *value, void *opaque)
 {
@@ -1385,6 +1466,25 @@ static int numa_add_cpus(const char *name, const char *value, void *opaque)
     return 0;
 }
 
+static int numa_add_mpol(const char *name, const char *value, void *opaque)
+{
+    int *nodenr = opaque;
+
+    if (!strcmp(name, "mem-policy")) {
+        numa_node_parse_mpol(*nodenr, value);
+    }
+    return 0;
+}
+
+static int numa_add_hostnode(const char *name, const char *value, void *opaque)
+{
+    int *nodenr = opaque;
+    if (!strcmp(name, "mem-hostnode")) {
+        numa_node_parse_hostnode(*nodenr, value);
+    }
+    return 0;
+}
+
 static int numa_init_func(QemuOpts *opts, void *opaque)
 {
     uint64_t nodenr, mem_size;
@@ -1404,6 +1504,14 @@ static int numa_init_func(QemuOpts *opts, void *opaque)
         return -1;
     }
 
+    if (qemu_opt_foreach(opts, numa_add_mpol, &nodenr, 1) < 0) {
+        return -1;
+    }
+
+    if (qemu_opt_foreach(opts, numa_add_hostnode, &nodenr, 1) < 0) {
+        return -1;
+    }
+
     return 0;
 }
 
@@ -2962,6 +3070,8 @@ int main(int argc, char **argv, char **envp)
     for (i = 0; i < MAX_NODES; i++) {
         numa_info[i].node_mem = 0;
         bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
+        bitmap_zero(numa_info[i].host_mem, MAX_CPUMASK_BITS);
+        numa_info[i].flags = NODE_HOST_NONE;
     }
 
     nb_numa_nodes = 0;
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH V4 05/10] NUMA: handle Error in cpus, mpol and hostnode parser
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (3 preceding siblings ...)
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 04/10] NUMA: parse guest numa nodes memory policy Wanlong Gao
@ 2013-07-04  9:53 ` Wanlong Gao
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 06/10] NUMA: split out the common range parser Wanlong Gao
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

As Paolo pointed out, handling Error in the mpol and hostnode parsers
makes them easier to reuse, for example for memory hotplug in the future.
The Error handling will also be used later by the set-mpol QMP command.
Also handle Error in the cpus parser, to be consistent with the others.
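The conversion follows QEMU's usual Error ** out-parameter pattern. A library-free sketch of the idea (a stand-in Error type and stub functions, not QEMU's real qapi/error.h API):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for QEMU's Error object: the parser reports failure through
 * an out-parameter instead of calling exit(1) itself, so each caller
 * (command line, QMP, future hotplug code) decides how to react. */
typedef struct Error { char msg[128]; } Error;

static void error_setg_stub(Error **errp, const char *msg)
{
    if (errp) {
        *errp = malloc(sizeof(**errp));
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

/* Stub policy parser: sets *errp on an unknown policy name. */
static int parse_mpol_stub(const char *mpol, Error **errp)
{
    if (strcmp(mpol, "membind") && strcmp(mpol, "interleave") &&
        strcmp(mpol, "preferred")) {
        error_setg_stub(errp, "Invalid memory policy");
        return -1;
    }
    return 0;
}
```

A command-line caller can still print the message and exit, while a QMP caller can propagate the same Error back to the client.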

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |  4 ++++
 vl.c                    | 42 ++++++++++++++++++++++++++++++++----------
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 993b8e0..0f135fe 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -144,6 +144,10 @@ struct node_info {
     unsigned int flags;
 };
 extern struct node_info numa_info[MAX_NODES];
+extern void numa_node_parse_mpol(int nodenr, const char *hostnode,
+                                 Error **errp);
+extern void numa_node_parse_hostnode(int nodenr, const char *hostnode,
+                                     Error **errp);
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/vl.c b/vl.c
index 495b3a8..38e0d3d 100644
--- a/vl.c
+++ b/vl.c
@@ -1338,7 +1338,7 @@ char *get_boot_devices_list(size_t *size)
     return list;
 }
 
-static void numa_node_parse_cpus(int nodenr, const char *cpus)
+static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp)
 {
     char *endptr;
     unsigned long long value, endvalue;
@@ -1378,13 +1378,14 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus)
     return;
 
 error:
-    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
-    exit(1);
+    error_setg(errp, "Invalid NUMA CPU range: %s\n", cpus);
+    return;
 }
 
-static void numa_node_parse_mpol(int nodenr, const char *mpol)
+void numa_node_parse_mpol(int nodenr, const char *mpol, Error **errp)
 {
     if (!mpol) {
+        error_setg(errp, "Should specify memory policy");
         return;
     }
 
@@ -1395,11 +1396,11 @@ static void numa_node_parse_mpol(int nodenr, const char *mpol)
     } else if (!strcmp(mpol, "membind")) {
         numa_info[nodenr].flags |= NODE_HOST_BIND;
     } else {
-        fprintf(stderr, "qemu: Invalid memory policy: %s\n", mpol);
+        error_setg(errp, "Invalid memory policy: %s", mpol);
     }
 }
 
-static void numa_node_parse_hostnode(int nodenr, const char *hostnode)
+void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
 {
     unsigned long long value, endvalue;
     char *endptr;
@@ -1452,16 +1453,22 @@ static void numa_node_parse_hostnode(int nodenr, const char *hostnode)
     return;
 
 error:
-    fprintf(stderr, "qemu: Invalid host NUMA nodes range: %s\n", hostnode);
+    error_setg(errp, "Invalid host NUMA nodes range: %s", hostnode);
     return;
 }
 
 static int numa_add_cpus(const char *name, const char *value, void *opaque)
 {
     int *nodenr = opaque;
+    Error *err = NULL;
 
     if (!strcmp(name, "cpu")) {
-        numa_node_parse_cpus(*nodenr, value);
+        numa_node_parse_cpus(*nodenr, value, &err);
+    }
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        return -1;
     }
     return 0;
 }
@@ -1469,19 +1476,34 @@ static int numa_add_cpus(const char *name, const char *value, void *opaque)
 static int numa_add_mpol(const char *name, const char *value, void *opaque)
 {
     int *nodenr = opaque;
+    Error *err = NULL;
 
     if (!strcmp(name, "mem-policy")) {
-        numa_node_parse_mpol(*nodenr, value);
+        numa_node_parse_mpol(*nodenr, value, &err);
+    }
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        return -1;
     }
+
     return 0;
 }
 
 static int numa_add_hostnode(const char *name, const char *value, void *opaque)
 {
     int *nodenr = opaque;
+    Error *err = NULL;
+
     if (!strcmp(name, "mem-hostnode")) {
-        numa_node_parse_hostnode(*nodenr, value);
+        numa_node_parse_hostnode(*nodenr, value, &err);
     }
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        return -1;
+    }
+
     return 0;
 }
 
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH V4 06/10] NUMA: split out the common range parser
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (4 preceding siblings ...)
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 05/10] NUMA: handle Error in cpus, mpol and hostnode parser Wanlong Gao
@ 2013-07-04  9:53 ` Wanlong Gao
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 07/10] NUMA: set guest numa nodes memory policy Wanlong Gao
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

Since the cpus parser and the hostnode parser share a common
range-parsing part, split it out into a common range parser to avoid
duplicated code.
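
The extracted helper accepts either a single value "N" or a range "N-M". Below is a minimal standalone sketch of that logic, using plain strtoull() instead of QEMU's parse_uint()/parse_uint_full() helpers (so, unlike the real code, it does not reject a leading minus sign, and it omits the MAX_CPUMASK_BITS clamping):

```c
#include <errno.h>
#include <stdlib.h>

/* Parse "N" or "N-M" into [*value, *endvalue].
 * Returns 0 on success, -1 on malformed input or an inverted range. */
int parse_range(const char *str,
                unsigned long long *value,
                unsigned long long *endvalue)
{
    char *endptr;

    errno = 0;
    *value = strtoull(str, &endptr, 10);
    if (errno || endptr == str) {
        return -1;                      /* no digits at all */
    }

    if (*endptr == '-') {
        const char *second = endptr + 1;
        errno = 0;
        *endvalue = strtoull(second, &endptr, 10);
        if (errno || endptr == second || *endptr != '\0') {
            return -1;                  /* bad or trailing garbage after "-" */
        }
    } else if (*endptr == '\0') {
        *endvalue = *value;             /* single value: range of one */
    } else {
        return -1;                      /* junk after the first number */
    }

    return *endvalue < *value ? -1 : 0; /* reject inverted ranges */
}
```

The same helper then serves both the cpus and the hostnode option, which is exactly the deduplication this patch performs.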

Reviewed-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 vl.c | 89 ++++++++++++++++++++++++++++----------------------------------------
 1 file changed, 37 insertions(+), 52 deletions(-)

diff --git a/vl.c b/vl.c
index 38e0d3d..6e86dcf 100644
--- a/vl.c
+++ b/vl.c
@@ -1338,47 +1338,55 @@ char *get_boot_devices_list(size_t *size)
     return list;
 }
 
-static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp)
+static int numa_node_parse_common(const char *str,
+                                  unsigned long long *value,
+                                  unsigned long long *endvalue)
 {
     char *endptr;
-    unsigned long long value, endvalue;
-
-    /* Empty CPU range strings will be considered valid, they will simply
-     * not set any bit in the CPU bitmap.
-     */
-    if (!*cpus) {
-        return;
+    if (parse_uint(str, value, &endptr, 10) < 0) {
+        return -1;
     }
 
-    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
-        goto error;
-    }
     if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
+        if (parse_uint_full(endptr + 1, endvalue, 10) < 0) {
+           return -1;
         }
     } else if (*endptr == '\0') {
-        endvalue = value;
+        *endvalue = *value;
     } else {
-        goto error;
+        return -1;
     }
 
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d VCPUs are supported\n",
-             MAX_CPUMASK_BITS);
+    if (*endvalue >= MAX_CPUMASK_BITS) {
+        *endvalue = MAX_CPUMASK_BITS - 1;
+        fprintf(stderr, "qemu: NUMA: A max number %d is supported\n",
+                MAX_CPUMASK_BITS);
     }
 
-    if (endvalue < value) {
-        goto error;
+    if (*endvalue < *value) {
+        return -1;
     }
 
-    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
-    return;
+    return 0;
+}
 
-error:
-    error_setg(errp, "Invalid NUMA CPU range: %s\n", cpus);
+static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp)
+{
+    unsigned long long value, endvalue;
+
+    /* Empty CPU range strings will be considered valid, they will simply
+     * not set any bit in the CPU bitmap.
+     */
+    if (!*cpus) {
+        return;
+    }
+
+    if (numa_node_parse_common(cpus, &value, &endvalue) < 0) {
+        error_setg(errp, "Invalid NUMA CPU range: %s", cpus);
+        return;
+    }
+
+    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
     return;
 }
 
@@ -1403,7 +1411,6 @@ void numa_node_parse_mpol(int nodenr, const char *mpol, Error **errp)
 void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
 {
     unsigned long long value, endvalue;
-    char *endptr;
     bool clear = false;
     unsigned long *bm = numa_info[nodenr].host_mem;
 
@@ -1422,27 +1429,9 @@ void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
         return;
     }
 
-    if (parse_uint(hostnode, &value, &endptr, 10) < 0)
-        goto error;
-    if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
-        }
-    } else if (*endptr == '\0') {
-        endvalue = value;
-    } else {
-        goto error;
-    }
-
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d host nodes are supported\n",
-             MAX_CPUMASK_BITS);
-    }
-
-    if (endvalue < value) {
-        goto error;
+    if (numa_node_parse_common(hostnode, &value, &endvalue) < 0) {
+        error_setg(errp, "Invalid host NUMA nodes range: %s", hostnode);
+        return;
     }
 
     if (clear)
@@ -1451,10 +1440,6 @@ void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
         bitmap_set(bm, value, endvalue - value + 1);
 
     return;
-
-error:
-    error_setg(errp, "Invalid host NUMA nodes range: %s", hostnode);
-    return;
 }
 
 static int numa_add_cpus(const char *name, const char *value, void *opaque)
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH V4 07/10] NUMA: set guest numa nodes memory policy
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (5 preceding siblings ...)
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 06/10] NUMA: split out the common range parser Wanlong Gao
@ 2013-07-04  9:53 ` Wanlong Gao
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node Wanlong Gao
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

Set the guest NUMA nodes' memory policies using the mbind(2)
system call, node by node.
After this patch, we are able to set guest node memory policies
through the QEMU options; this aims to solve the guest cross-node
memory access performance issue.
And as you all know, if PCI passthrough is used, a direct-attached
device uses DMA transfers between the device and the qemu process.
All pages of the guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    =>kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
           => kvm_pin_pages()

So, with a direct-attached device, the page count of every guest page
is incremented, and page migration will not work. Neither will AutoNUMA.

So, we should set the guest nodes' memory allocation policies before
the pages are really mapped.
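
The policy selection in this patch boils down to mapping per-node flags onto an mbind(2) mode word. The following self-contained sketch shows that mapping; the MPOL_* values mirror the Linux UAPI (and the fallback defines in the patch), while the HOST_* flags are illustrative stand-ins for the patch's NODE_HOST_* bits:

```c
/* Kernel policy modes and flags, matching the Linux UAPI <numaif.h>. */
#define MPOL_DEFAULT    0
#define MPOL_PREFERRED  1
#define MPOL_BIND       2
#define MPOL_INTERLEAVE 3
#define MPOL_F_RELATIVE_NODES (1 << 14)
#define MPOL_F_STATIC_NODES   (1 << 15)

/* Hypothetical per-node flags, standing in for the patch's NODE_HOST_*. */
enum {
    HOST_NONE        = 0,
    HOST_BIND        = 1,
    HOST_INTERLEAVE  = 2,
    HOST_PREFERRED   = 3,
    HOST_POLICY_MASK = 3,
    HOST_RELATIVE    = 4,
};

/* Translate guest-node flags into an mbind() mode word. */
int flags_to_mbind_mode(unsigned flags)
{
    int mode;

    switch (flags & HOST_POLICY_MASK) {
    case HOST_BIND:       mode = MPOL_BIND;       break;
    case HOST_INTERLEAVE: mode = MPOL_INTERLEAVE; break;
    case HOST_PREFERRED:  mode = MPOL_PREFERRED;  break;
    default:              return MPOL_DEFAULT;    /* no flag bits needed */
    }

    /* Exactly one of the "static" or "relative" numbering flags is set. */
    mode |= (flags & HOST_RELATIVE) ? MPOL_F_RELATIVE_NODES
                                    : MPOL_F_STATIC_NODES;
    return mode;
}
```

The resulting mode word is what set_node_mpol() passes to mbind() for the node's RAM range, together with the host-node bitmap (padded by one extra node to work around the kernel's off-by-one described in the patch's comment).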

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 cpus.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/cpus.c b/cpus.c
index 496d5ce..7240de7 100644
--- a/cpus.c
+++ b/cpus.c
@@ -60,6 +60,15 @@
 
 #endif /* CONFIG_LINUX */
 
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#ifndef MPOL_F_RELATIVE_NODES
+#define MPOL_F_RELATIVE_NODES (1 << 14)
+#define MPOL_F_STATIC_NODES   (1 << 15)
+#endif
+#endif
+
 static CPUArchState *next_cpu;
 
 static bool cpu_thread_is_idle(CPUState *cpu)
@@ -1171,6 +1180,75 @@ static void tcg_exec_all(void)
     exit_request = 0;
 }
 
+#ifdef CONFIG_NUMA
+static int node_parse_bind_mode(unsigned int nodeid)
+{
+    int bind_mode;
+
+    switch (numa_info[nodeid].flags & NODE_HOST_POLICY_MASK) {
+    case NODE_HOST_BIND:
+        bind_mode = MPOL_BIND;
+        break;
+    case NODE_HOST_INTERLEAVE:
+        bind_mode = MPOL_INTERLEAVE;
+        break;
+    case NODE_HOST_PREFERRED:
+        bind_mode = MPOL_PREFERRED;
+        break;
+    default:
+        bind_mode = MPOL_DEFAULT;
+        return bind_mode;
+    }
+
+    bind_mode |= (numa_info[nodeid].flags & NODE_HOST_RELATIVE) ?
+        MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
+
+    return bind_mode;
+}
+#endif
+
+static int set_node_mpol(unsigned int nodeid)
+{
+#ifdef CONFIG_NUMA
+    void *ram_ptr;
+    RAMBlock *block;
+    ram_addr_t len, ram_offset = 0;
+    int bind_mode;
+    int i;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (!strcmp(block->mr->name, "pc.ram")) {
+            break;
+        }
+    }
+
+    if (block->host == NULL)
+        return -1;
+
+    ram_ptr = block->host;
+    for (i = 0; i < nodeid; i++) {
+        len = numa_info[i].node_mem;
+        ram_offset += len;
+    }
+
+    len = numa_info[i].node_mem;
+    bind_mode = node_parse_bind_mode(i);
+
+    /* This is a workaround for a long standing bug in Linux'
+     * mbind implementation, which cuts off the last specified
+     * node. To stay compatible should this bug be fixed, we
+     * specify one more node and zero this one out.
+     */
+    clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
+    if (mbind(ram_ptr + ram_offset, len, bind_mode,
+        numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
+            perror("mbind");
+            return -1;
+    }
+#endif
+    return 0;
+}
+
 void set_numa_modes(void)
 {
     CPUArchState *env;
@@ -1185,6 +1263,15 @@ void set_numa_modes(void)
             }
         }
     }
+
+#ifdef CONFIG_NUMA
+    for (i = 0; i < nb_numa_nodes; i++) {
+        if (set_node_mpol(i) == -1) {
+            fprintf(stderr,
+                    "qemu: can't set host memory policy for node%d\n", i);
+        }
+    }
+#endif
 }
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (6 preceding siblings ...)
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 07/10] NUMA: set guest numa nodes memory policy Wanlong Gao
@ 2013-07-04  9:53 ` Wanlong Gao
  2013-07-08 18:25   ` Luiz Capitulino
  2013-07-08 19:16   ` Eric Blake
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 09/10] NUMA: add hmp command set-mpol Wanlong Gao
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

This QMP command makes it possible to set a node's memory policy
through the QMP protocol. The qmp-shell command looks like:
    set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1
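
The handler follows a snapshot/attempt/rollback pattern: it saves the node's current policy state, applies the new one, and restores the old state if anything fails. A distilled sketch of that pattern, with simplified state and illustrative names (the real code snapshots a DECLARE_BITMAP() mask plus the node flags):

```c
typedef struct {
    unsigned flags;
    unsigned long host_mem;   /* stands in for the host-node bitmap */
} NodeInfo;

/* Hypothetical apply step; here it fails when the node mask is empty. */
static int apply_policy(NodeInfo *node)
{
    return node->host_mem == 0 ? -1 : 0;
}

/* Set a new policy transactionally: on failure the node is unchanged. */
int set_node_policy(NodeInfo *node, unsigned flags, unsigned long mask)
{
    NodeInfo saved = *node;       /* snapshot for rollback */

    node->flags = flags;
    node->host_mem = mask;

    if (apply_policy(node) < 0) {
        *node = saved;            /* roll back to the previous state */
        return -1;
    }
    return 0;
}
```

This is why qmp_set_mpol() copies host_mem and flags up front and restores them on the error path: a rejected QMP request must leave the node's effective policy exactly as it was.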

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 cpus.c           | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 15 +++++++++++++++
 qmp-commands.hx  | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 104 insertions(+)

diff --git a/cpus.c b/cpus.c
index 7240de7..ff42b9d 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1417,3 +1417,57 @@ void qmp_inject_nmi(Error **errp)
     error_set(errp, QERR_UNSUPPORTED);
 #endif
 }
+
+void qmp_set_mpol(int64_t nodeid, bool has_mpol, const char *mpol,
+                  bool has_hostnode, const char *hostnode, Error **errp)
+{
+    unsigned int flags;
+    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+
+    if (nodeid >= nb_numa_nodes) {
+        error_setg(errp, "Only has '%d' NUMA nodes", nb_numa_nodes);
+        return;
+    }
+
+    bitmap_copy(host_mem, numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+    flags = numa_info[nodeid].flags;
+
+    numa_info[nodeid].flags = NODE_HOST_NONE;
+    bitmap_zero(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+
+    if (!has_mpol) {
+        if (set_node_mpol(nodeid) == -1) {
+            error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
+            goto error;
+        }
+        return;
+    }
+
+    numa_node_parse_mpol(nodeid, mpol, errp);
+    if (error_is_set(errp)) {
+        goto error;
+    }
+
+    if (!has_hostnode) {
+        bitmap_fill(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+    }
+
+    if (hostnode) {
+        numa_node_parse_hostnode(nodeid, hostnode, errp);
+        if (error_is_set(errp)) {
+            goto error;
+        }
+    }
+
+    if (set_node_mpol(nodeid) == -1) {
+        error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
+        goto error;
+    }
+
+    return;
+
+error:
+    bitmap_copy(numa_info[nodeid].host_mem, host_mem, MAX_CPUMASK_BITS);
+    numa_info[nodeid].flags = flags;
+    return;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index 5c32528..0870da2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3712,3 +3712,18 @@
             '*cpuid-input-ecx': 'int',
             'cpuid-register': 'X86CPURegister32',
             'features': 'int' } }
+
+# @set-mpol:
+#
+# Set the host memory binding policy for a guest NUMA node.
+#
+# @nodeid: The node ID of guest NUMA node to set memory policy to.
+#
+# @mem-policy: The memory policy string to set.
+#
+# @mem-hostnode: The host node or node range for memory policy.
+#
+# Since: 1.6.0
+##
+{ 'command': 'set-mpol', 'data': {'nodeid': 'int', '*mem-policy': 'str',
+                                  '*mem-hostnode': 'str'} }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 362f0e1..ccab51b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3043,3 +3043,38 @@ Example:
 <- { "return": {} }
 
 EQMP
+
+    {
+        .name      = "set-mpol",
+        .args_type = "nodeid:i,mem-policy:s?,mem-hostnode:s?",
+        .help      = "Set the host memory binding policy for guest NUMA node",
+        .mhandler.cmd_new = qmp_marshal_input_set_mpol,
+    },
+
+SQMP
+set-mpol
+--------
+
+Set the host memory binding policy for guest NUMA node
+
+Arguments:
+
+- "nodeid": The nodeid of guest NUMA node to set memory policy to.
+            (json-int)
+- "mem-policy": The memory policy string to set.
+                (json-string, optional)
+- "mem-hostnode": The host node or node range for the memory policy.
+                  (json-string, optional)
+
+Example:
+
+-> { "execute": "set-mpol", "arguments": { "nodeid": 0, "mem-policy": "membind",
+                                           "mem-hostnode": "0-1" }}
+<- { "return": {} }
+
+Notes:
+    1. If "mem-policy" is not set, the memory policy of this "nodeid" will be set
+       to "default".
+    2. If "mem-hostnode" is not set, the node mask of this "mpol" will be set
+       to "all".
+EQMP
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH V4 09/10] NUMA: add hmp command set-mpol
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (7 preceding siblings ...)
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node Wanlong Gao
@ 2013-07-04  9:53 ` Wanlong Gao
  2013-07-08 18:32   ` Luiz Capitulino
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 10/10] NUMA: show host memory policy info in info numa command Wanlong Gao
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

Add the HMP command set-mpol to set the host memory policy for a
guest NUMA node. With it we can also set a node's memory policy
from the monitor, like:
    (qemu) set-mpol 0 mem-policy=membind,mem-hostnode=0-1

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hmp-commands.hx | 16 ++++++++++++++++
 hmp.c           | 35 +++++++++++++++++++++++++++++++++++
 hmp.h           |  1 +
 3 files changed, 52 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 915b0d1..417b69f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1567,6 +1567,22 @@ Executes a qemu-io command on the given block device.
 ETEXI
 
     {
+        .name       = "set-mpol",
+        .args_type  = "nodeid:i,args:s?",
+        .params     = "nodeid [args]",
+        .help       = "set host memory policy for a guest NUMA node",
+        .mhandler.cmd = hmp_set_mpol,
+    },
+
+STEXI
+@item set-mpol @var{nodeid} @var{args}
+@findex set-mpol
+
+Set host memory policy for a guest NUMA node
+
+ETEXI
+
+    {
         .name       = "info",
         .args_type  = "item:s?",
         .params     = "[subcommand]",
diff --git a/hmp.c b/hmp.c
index 2daed43..57a5730 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1482,3 +1482,38 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 
     hmp_handle_error(mon, &err);
 }
+
+void hmp_set_mpol(Monitor *mon, const QDict *qdict)
+{
+    Error *local_err = NULL;
+    bool has_mpol = true;
+    bool has_hostnode = true;
+    const char *mpol = NULL;
+    const char *hostnode = NULL;
+    QemuOpts *opts;
+
+    uint64_t nodeid = qdict_get_int(qdict, "nodeid");
+    const char *args = qdict_get_try_str(qdict, "args");
+
+    if (args == NULL) {
+        has_mpol = false;
+        has_hostnode = false;
+    } else {
+        opts = qemu_opts_parse(qemu_find_opts("numa"), args, 1);
+        if (opts == NULL) {
+            error_setg(&local_err, "Parsing memory policy args failed");
+        } else {
+            mpol = qemu_opt_get(opts, "mem-policy");
+            if (mpol == NULL) {
+                has_mpol = false;
+            }
+            hostnode = qemu_opt_get(opts, "mem-hostnode");
+            if (hostnode == NULL) {
+                has_hostnode = false;
+            }
+        }
+    }
+
+    qmp_set_mpol(nodeid, has_mpol, mpol, has_hostnode, hostnode, &local_err);
+    hmp_handle_error(mon, &local_err);
+}
diff --git a/hmp.h b/hmp.h
index 56d2e92..81f631b 100644
--- a/hmp.h
+++ b/hmp.h
@@ -86,5 +86,6 @@ void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict);
 void hmp_chardev_add(Monitor *mon, const QDict *qdict);
 void hmp_chardev_remove(Monitor *mon, const QDict *qdict);
 void hmp_qemu_io(Monitor *mon, const QDict *qdict);
+void hmp_set_mpol(Monitor *mon, const QDict *qdict);
 
 #endif
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH V4 10/10] NUMA: show host memory policy info in info numa command
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (8 preceding siblings ...)
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 09/10] NUMA: add hmp command set-mpol Wanlong Gao
@ 2013-07-04  9:53 ` Wanlong Gao
  2013-07-05 18:49   ` Eduardo Habkost
  2013-07-08 18:36   ` Luiz Capitulino
  2013-07-04 19:49 ` [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-04  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lcapitulino, bsd, y-goto, pbonzini, afaerber,
	gaowanlong

Show the host memory policy of each node in the "info numa" monitor
command. After this patch, "info numa" will show information like the
following if host NUMA support is enabled:

    (qemu) info numa
    2 nodes
    node 0 cpus: 0
    node 0 size: 1024 MB
    node 0 mempolicy: membind=0,1
    node 1 cpus: 1
    node 1 size: 1024 MB
    node 1 mempolicy: interleave=1
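
The "mempolicy" line is produced by walking the node's host-memory bitmap and printing the set bits as a comma-separated list. Stripped of the monitor plumbing, the formatting loop can be sketched like this, using a plain unsigned long in place of QEMU's bitmap helpers (find_first_bit()/find_next_bit()):

```c
#include <stdio.h>
#include <string.h>

/* Format the set bits of `mask` (scanning up to `maxbits`) into buf as
 * "a,b,c". Returns the number of characters written. */
int format_nodes(unsigned long mask, int maxbits, char *buf, size_t len)
{
    int n = 0, first = 1;

    if (len) {
        buf[0] = '\0';        /* empty mask yields an empty string */
    }
    for (int bit = 0; bit < maxbits; bit++) {
        if (mask & (1UL << bit)) {
            n += snprintf(buf + n, len - n, first ? "%d" : ",%d", bit);
            first = 0;
        }
    }
    return n;
}
```

So a node bound to host nodes 0 and 1 prints as "membind=0,1", matching the sample output above; the real code additionally stops at numa_max_node() so it never prints nodes the host does not have.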

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 monitor.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/monitor.c b/monitor.c
index 93ac045..a40415d 100644
--- a/monitor.c
+++ b/monitor.c
@@ -74,6 +74,11 @@
 #endif
 #include "hw/lm32/lm32_pic.h"
 
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#endif
+
 //#define DEBUG
 //#define DEBUG_COMPLETION
 
@@ -1808,6 +1813,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
     int i;
     CPUArchState *env;
     CPUState *cpu;
+    unsigned long first, next;
 
     monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
     for (i = 0; i < nb_numa_nodes; i++) {
@@ -1821,6 +1827,42 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "\n");
         monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
             numa_info[i].node_mem >> 20);
+
+#ifdef CONFIG_NUMA
+        monitor_printf(mon, "node %d mempolicy: ", i);
+        switch (numa_info[i].flags & NODE_HOST_POLICY_MASK) {
+        case NODE_HOST_BIND:
+            monitor_printf(mon, "membind=");
+            break;
+        case NODE_HOST_INTERLEAVE:
+            monitor_printf(mon, "interleave=");
+            break;
+        case NODE_HOST_PREFERRED:
+            monitor_printf(mon, "preferred=");
+            break;
+        default:
+            monitor_printf(mon, "default\n");
+            continue;
+        }
+
+        if (numa_info[i].flags & NODE_HOST_RELATIVE)
+            monitor_printf(mon, "+");
+
+        next = first = find_first_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS);
+        monitor_printf(mon, "%lu", first);
+        do {
+            if (next == numa_max_node())
+                break;
+            next = find_next_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS,
+                                 next + 1);
+            if (next > numa_max_node() || next == MAX_CPUMASK_BITS)
+                break;
+
+            monitor_printf(mon, ",%lu", next);
+        } while (true);
+
+        monitor_printf(mon, "\n");
+#endif
     }
 }
 
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (9 preceding siblings ...)
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 10/10] NUMA: show host memory policy info in info numa command Wanlong Gao
@ 2013-07-04 19:49 ` Paolo Bonzini
  2013-07-04 21:15   ` Laszlo Ersek
  2013-07-05  0:54   ` Wanlong Gao
  2013-07-05 19:18 ` Eduardo Habkost
  2013-07-11 10:32 ` Peter Huang(Peng)
  12 siblings, 2 replies; 38+ messages in thread
From: Paolo Bonzini @ 2013-07-04 19:49 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto,
	Laszlo Ersek, afaerber

On 04/07/2013 11:53, Wanlong Gao wrote:
> As you know, QEMU can't direct it's memory allocation now, this may cause
> guest cross node access performance regression.
> And, the worse thing is that if PCI-passthrough is used,
> direct-attached-device uses DMA transfer between device and qemu process.
> All pages of the guest will be pinned by get_user_pages().
> 
> KVM_ASSIGN_PCI_DEVICE ioctl
>   kvm_vm_ioctl_assign_device()
>     =>kvm_assign_device()
>       => kvm_iommu_map_memslots()
>         => kvm_iommu_map_pages()
>            => kvm_pin_pages()
> 
> So, with direct-attached-device, all guest page's page count will be +1 and
> any page migration will not work. AutoNUMA won't too.
> 
> So, we should set the guest nodes memory allocation policy before
> the pages are really mapped.
> 
> According to this patch set, we are able to set guest nodes memory policy
> like following:
> 
>  -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
>  -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=1

Did you see my suggestion to use instead something like this:

    -numa node,nodeid=0,cpus=0 -numa node,nodeid=1,cpus=1 \
    -numa mem,nodeid=0,size=1G,policy=membind,hostnode=0-1
    -numa mem,nodeid=1,size=2G,policy=interleave,hostnode=1

With an eye to when we'll support memory hotplug, I think it is better.
 It is not hard to implement it using the OptsVisitor; see
14aa0c2de045a6c2fcfadf38c04434fd15909455 for an example of a complex
schema described with OptsVistor.

Paolo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes
  2013-07-04 19:49 ` [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
@ 2013-07-04 21:15   ` Laszlo Ersek
  2013-07-05  0:55     ` Wanlong Gao
  2013-07-05  0:54   ` Wanlong Gao
  1 sibling, 1 reply; 38+ messages in thread
From: Laszlo Ersek @ 2013-07-04 21:15 UTC (permalink / raw)
  To: Paolo Bonzini, Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto, afaerber

On 07/04/13 21:49, Paolo Bonzini wrote:
> On 04/07/2013 11:53, Wanlong Gao wrote:
>> As you know, QEMU can't direct it's memory allocation now, this may cause
>> guest cross node access performance regression.
>> And, the worse thing is that if PCI-passthrough is used,
>> direct-attached-device uses DMA transfer between device and qemu process.
>> All pages of the guest will be pinned by get_user_pages().
>>
>> KVM_ASSIGN_PCI_DEVICE ioctl
>>   kvm_vm_ioctl_assign_device()
>>     =>kvm_assign_device()
>>       => kvm_iommu_map_memslots()
>>         => kvm_iommu_map_pages()
>>            => kvm_pin_pages()
>>
>> So, with direct-attached-device, all guest page's page count will be +1 and
>> any page migration will not work. AutoNUMA won't too.
>>
>> So, we should set the guest nodes memory allocation policy before
>> the pages are really mapped.
>>
>> According to this patch set, we are able to set guest nodes memory policy
>> like following:
>>
>>  -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
>>  -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=1
> 
> Did you see my suggestion to use instead something like this:
> 
>     -numa node,nodeid=0,cpus=0 -numa node,nodeid=1,cpus=1 \
>     -numa mem,nodeid=0,size=1G,policy=membind,hostnode=0-1
>     -numa mem,nodeid=1,size=2G,policy=interleave,hostnode=1
> 
> With an eye to when we'll support memory hotplug, I think it is better.
>  It is not hard to implement it using the OptsVisitor; see
> 14aa0c2de045a6c2fcfadf38c04434fd15909455 for an example of a complex
> schema described with OptsVistor.

See also the commit msg of its grandparent, eb7ee2cb, for general notes.
The containing series is d195325b^..1a0c0958.

A more recent (and simpler) use is the 8ccbad5c^..0c764a9d sub-series.

Thanks for the reference, Paolo.

Laszlo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes
  2013-07-04 19:49 ` [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
  2013-07-04 21:15   ` Laszlo Ersek
@ 2013-07-05  0:54   ` Wanlong Gao
  1 sibling, 0 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-05  0:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto,
	Laszlo Ersek, afaerber, Wanlong Gao

On 07/05/2013 03:49 AM, Paolo Bonzini wrote:
> On 04/07/2013 11:53, Wanlong Gao wrote:
>> As you know, QEMU can't direct it's memory allocation now, this may cause
>> guest cross node access performance regression.
>> And, the worse thing is that if PCI-passthrough is used,
>> direct-attached-device uses DMA transfer between device and qemu process.
>> All pages of the guest will be pinned by get_user_pages().
>>
>> KVM_ASSIGN_PCI_DEVICE ioctl
>>   kvm_vm_ioctl_assign_device()
>>     =>kvm_assign_device()
>>       => kvm_iommu_map_memslots()
>>         => kvm_iommu_map_pages()
>>            => kvm_pin_pages()
>>
>> So, with direct-attached-device, all guest page's page count will be +1 and
>> any page migration will not work. AutoNUMA won't too.
>>
>> So, we should set the guest nodes memory allocation policy before
>> the pages are really mapped.
>>
>> According to this patch set, we are able to set guest nodes memory policy
>> like following:
>>
>>  -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
>>  -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=1
> 
> Did you see my suggestion to use instead something like this:
> 
>     -numa node,nodeid=0,cpus=0 -numa node,nodeid=1,cpus=1 \
>     -numa mem,nodeid=0,size=1G,policy=membind,hostnode=0-1
>     -numa mem,nodeid=1,size=2G,policy=interleave,hostnode=1
> 
> With an eye to when we'll support memory hotplug, I think it is better.
>  It is not hard to implement it using the OptsVisitor; see
> 14aa0c2de045a6c2fcfadf38c04434fd15909455 for an example of a complex
> schema described with OptsVistor.

OK, got it, thank you.

Wanlong Gao

> 
> Paolo
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes
  2013-07-04 21:15   ` Laszlo Ersek
@ 2013-07-05  0:55     ` Wanlong Gao
  0 siblings, 0 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-05  0:55 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto,
	Paolo Bonzini, afaerber, Wanlong Gao

On 07/05/2013 05:15 AM, Laszlo Ersek wrote:
> On 07/04/13 21:49, Paolo Bonzini wrote:
>> On 04/07/2013 11:53, Wanlong Gao wrote:
>>> As you know, QEMU can't direct it's memory allocation now, this may cause
>>> guest cross node access performance regression.
>>> And, the worse thing is that if PCI-passthrough is used,
>>> direct-attached-device uses DMA transfer between device and qemu process.
>>> All pages of the guest will be pinned by get_user_pages().
>>>
>>> KVM_ASSIGN_PCI_DEVICE ioctl
>>>   kvm_vm_ioctl_assign_device()
>>>     =>kvm_assign_device()
>>>       => kvm_iommu_map_memslots()
>>>         => kvm_iommu_map_pages()
>>>            => kvm_pin_pages()
>>>
>>> So, with direct-attached-device, all guest page's page count will be +1 and
>>> any page migration will not work. AutoNUMA won't too.
>>>
>>> So, we should set the guest nodes memory allocation policy before
>>> the pages are really mapped.
>>>
>>> According to this patch set, we are able to set guest nodes memory policy
>>> like following:
>>>
>>>  -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
>>>  -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=1
>>
>> Did you see my suggestion to use instead something like this:
>>
>>     -numa node,nodeid=0,cpus=0 -numa node,nodeid=1,cpus=1 \
>>     -numa mem,nodeid=0,size=1G,policy=membind,hostnode=0-1
>>     -numa mem,nodeid=1,size=2G,policy=interleave,hostnode=1
>>
>> With an eye to when we'll support memory hotplug, I think it is better.
>>  It is not hard to implement it using the OptsVisitor; see
>> 14aa0c2de045a6c2fcfadf38c04434fd15909455 for an example of a complex
>> schema described with OptsVistor.
> 
> See also the commit msg of its grandparent, eb7ee2cb, for general notes.
> The containing series is d195325b^..1a0c0958.
> 
> A more recent (and simpler) use is the 8ccbad5c^..0c764a9d sub-series.

Thank you for the references, Laszlo; they're very helpful.

Wanlong Gao

> 
> Thanks for the reference, Paolo.
> 
> Laszlo
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option Wanlong Gao
@ 2013-07-05 18:41   ` Eduardo Habkost
  2013-07-08 19:02     ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Eduardo Habkost @ 2013-07-05 18:41 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, qemu-devel, lcapitulino, bsd, pbonzini, y-goto, afaerber

On Thu, Jul 04, 2013 at 05:53:08PM +0800, Wanlong Gao wrote:
> From: Bandan Das <bsd@redhat.com>
> 
> This allows us to use the "cpus" property multiple times
> to specify multiple cpu (ranges) to the -numa option :
> 
> -numa node,cpus=1,cpus=2,cpus=4
> or
> -numa node,cpus=1-3,cpus=5
> 
> Note that after this patch, the default suffix of "-numa node,mem=N"
> will no longer be "M". So we must add the suffix "M", as in "-numa node,mem=NM",
> when assigning "N MB" of node memory size.

Such an incompatible change is not acceptable, as it would break
existing configurations. libvirt doesn't specify any suffix and expects
it to always mean "MB".

> 
> Signed-off-by: Bandan Das <bsd@redhat.com>
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  qemu-options.hx |   3 +-
>  vl.c            | 108 ++++++++++++++++++++++++++++++++++----------------------
>  2 files changed, 67 insertions(+), 44 deletions(-)
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 137a39b..449cf36 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -100,7 +100,8 @@ STEXI
>  @item -numa @var{opts}
>  @findex -numa
>  Simulate a multi node NUMA system. If mem and cpus are omitted, resources
> -are split equally.
> +are split equally. The "-cpus" property may be specified multiple times

The option is not named "-cpus", but just "cpus". And I believe it is
normally called an "option", not a "property".

> +to denote multiple cpus or cpu ranges.
>  ETEXI
>  
>  DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
> diff --git a/vl.c b/vl.c
> index 6d9fd7d..6f2e17a 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -516,6 +516,32 @@ static QemuOptsList qemu_realtime_opts = {
>      },
>  };
>  
> +static QemuOptsList qemu_numa_opts = {
> +    .name = "numa",
> +    .implied_opt_name = "type",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head),
> +    .desc = {
> +        {
> +            .name = "type",
> +            .type = QEMU_OPT_STRING,
> +            .help = "node type"
> +        },{
> +            .name = "nodeid",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "node ID"
> +        },{
> +            .name = "mem",
> +            .type = QEMU_OPT_SIZE,
> +            .help = "memory size"
> +        },{
> +            .name = "cpus",
> +            .type = QEMU_OPT_STRING,
> +            .help = "cpu number or range"
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +
>  const char *qemu_get_vm_name(void)
>  {
>      return qemu_name;
> @@ -1349,56 +1375,37 @@ error:
>      exit(1);
>  }
>  
> -static void numa_add(const char *optarg)
> +
> +static int numa_add_cpus(const char *name, const char *value, void *opaque)
>  {
> -    char option[128];
> -    char *endptr;
> -    unsigned long long nodenr;
> +    int *nodenr = opaque;
>  
> -    optarg = get_opt_name(option, 128, optarg, ',');
> -    if (*optarg == ',') {
> -        optarg++;
> +    if (!strcmp(name, "cpu")) {
> +        numa_node_parse_cpus(*nodenr, value);
>      }
> -    if (!strcmp(option, "node")) {
> -
> -        if (nb_numa_nodes >= MAX_NODES) {
> -            fprintf(stderr, "qemu: too many NUMA nodes\n");
> -            exit(1);
> -        }
> +    return 0;
> +}
>  
> -        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
> -            nodenr = nb_numa_nodes;
> -        } else {
> -            if (parse_uint_full(option, &nodenr, 10) < 0) {
> -                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
> -                exit(1);
> -            }
> -        }
> +static int numa_init_func(QemuOpts *opts, void *opaque)
> +{
> +    uint64_t nodenr, mem_size;
>  
> -        if (nodenr >= MAX_NODES) {
> -            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
> -            exit(1);
> -        }
> +    nodenr = qemu_opt_get_number(opts, "nodeid", nb_numa_nodes++);
>  
> -        if (get_param_value(option, 128, "mem", optarg) == 0) {
> -            node_mem[nodenr] = 0;
> -        } else {
> -            int64_t sval;
> -            sval = strtosz(option, &endptr);
> -            if (sval < 0 || *endptr) {
> -                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
> -                exit(1);
> -            }
> -            node_mem[nodenr] = sval;
> -        }
> -        if (get_param_value(option, 128, "cpus", optarg) != 0) {
> -            numa_node_parse_cpus(nodenr, option);
> -        }
> -        nb_numa_nodes++;
> -    } else {
> -        fprintf(stderr, "Invalid -numa option: %s\n", option);
> +    if (nodenr >= MAX_NODES) {
> +        fprintf(stderr, "qemu: Max number of NUMA nodes reached : %d\n",
> +                (int)nodenr);
>          exit(1);
>      }
> +
> +    mem_size = qemu_opt_get_size(opts, "mem", 0);
> +    node_mem[nodenr] = mem_size;
> +
> +    if (qemu_opt_foreach(opts, numa_add_cpus, &nodenr, 1) < 0) {
> +        return -1;
> +    }
> +
> +    return 0;
>  }
>  
>  static QemuOptsList qemu_smp_opts = {
> @@ -2933,6 +2940,7 @@ int main(int argc, char **argv, char **envp)
>      qemu_add_opts(&qemu_object_opts);
>      qemu_add_opts(&qemu_tpmdev_opts);
>      qemu_add_opts(&qemu_realtime_opts);
> +    qemu_add_opts(&qemu_numa_opts);
>  
>      runstate_init();
>  
> @@ -3119,7 +3127,16 @@ int main(int argc, char **argv, char **envp)
>                  }
>                  break;
>              case QEMU_OPTION_numa:
> -                numa_add(optarg);
> +                olist = qemu_find_opts("numa");
> +                opts = qemu_opts_parse(olist, optarg, 1);
> +                if (!opts) {
> +                    exit(1);
> +                }
> +                optarg = qemu_opt_get(opts, "type");
> +                if (!optarg || strcmp(optarg, "node")) {
> +                    fprintf(stderr, "qemu: Incorrect format for numa option\n");

Why not do this inside numa_init_func()?

> +                    exit(1);
> +                }
>                  break;
>              case QEMU_OPTION_display:
>                  display_type = select_display(optarg);
> @@ -4195,6 +4212,11 @@ int main(int argc, char **argv, char **envp)
>  
>      register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, NULL);
>  
> +    if (qemu_opts_foreach(qemu_find_opts("numa"), numa_init_func,
> +                          NULL, 1) != 0) {
> +        exit(1);
> +    }
> +
>      if (nb_numa_nodes > 0) {
>          int i;
>  
> -- 
> 1.8.3.2.634.g7a3187e
> 
> 

-- 
Eduardo


* Re: [Qemu-devel] [PATCH V4 10/10] NUMA: show host memory policy info in info numa command
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 10/10] NUMA: show host memory policy info in info numa command Wanlong Gao
@ 2013-07-05 18:49   ` Eduardo Habkost
  2013-07-08 18:36   ` Luiz Capitulino
  1 sibling, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2013-07-05 18:49 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, qemu-devel, lcapitulino, bsd, pbonzini, y-goto, afaerber

On Thu, Jul 04, 2013 at 05:53:17PM +0800, Wanlong Gao wrote:
> Show host memory policy of nodes in the info numa monitor command.
> After this patch, the monitor command "info numa" will show the
> information like following if the host numa support is enabled:
> 
>     (qemu) info numa
>     2 nodes
>     node 0 cpus: 0
>     node 0 size: 1024 MB
>     node 0 mempolicy: membind=0,1
>     node 1 cpus: 1
>     node 1 size: 1024 MB
>     node 1 mempolicy: interleave=1
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  monitor.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/monitor.c b/monitor.c
> index 93ac045..a40415d 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -74,6 +74,11 @@
>  #endif
>  #include "hw/lm32/lm32_pic.h"
>  
> +#ifdef CONFIG_NUMA
> +#include <numa.h>
> +#include <numaif.h>
> +#endif
> +
>  //#define DEBUG
>  //#define DEBUG_COMPLETION
>  
> @@ -1808,6 +1813,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
>      int i;
>      CPUArchState *env;
>      CPUState *cpu;
> +    unsigned long first, next;

This breaks compilation with --enable-werror and CONFIG_NUMA disabled:

/home/ehabkost/rh/proj/virt/qemu/monitor.c: In function ‘do_info_numa’:
/home/ehabkost/rh/proj/virt/qemu/monitor.c:1816:26: error: unused variable ‘next’ [-Werror=unused-variable]
/home/ehabkost/rh/proj/virt/qemu/monitor.c:1816:19: error: unused variable ‘first’ [-Werror=unused-variable]
cc1: all warnings being treated as errors
make[1]: *** [monitor.o] Error 1

>  
>      monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
>      for (i = 0; i < nb_numa_nodes; i++) {
> @@ -1821,6 +1827,42 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, "\n");
>          monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
>              numa_info[i].node_mem >> 20);
> +
> +#ifdef CONFIG_NUMA
> +        monitor_printf(mon, "node %d mempolicy: ", i);
> +        switch (numa_info[i].flags & NODE_HOST_POLICY_MASK) {
> +        case NODE_HOST_BIND:
> +            monitor_printf(mon, "membind=");
> +            break;
> +        case NODE_HOST_INTERLEAVE:
> +            monitor_printf(mon, "interleave=");
> +            break;
> +        case NODE_HOST_PREFERRED:
> +            monitor_printf(mon, "preferred=");
> +            break;
> +        default:
> +            monitor_printf(mon, "default\n");
> +            continue;
> +        }
> +
> +        if (numa_info[i].flags & NODE_HOST_RELATIVE)
> +            monitor_printf(mon, "+");
> +
> +        next = first = find_first_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS);
> +        monitor_printf(mon, "%lu", first);
> +        do {
> +            if (next == numa_max_node())
> +                break;
> +            next = find_next_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS,
> +                                 next + 1);
> +            if (next > numa_max_node() || next == MAX_CPUMASK_BITS)
> +                break;
> +
> +            monitor_printf(mon, ",%lu", next);
> +        } while (true);
> +
> +        monitor_printf(mon, "\n");
> +#endif
>      }
>  }
>  
> -- 
> 1.8.3.2.634.g7a3187e
> 
> 

-- 
Eduardo


* Re: [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (10 preceding siblings ...)
  2013-07-04 19:49 ` [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
@ 2013-07-05 19:18 ` Eduardo Habkost
  2013-07-11 10:32 ` Peter Huang(Peng)
  12 siblings, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2013-07-05 19:18 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, qemu-devel, lcapitulino, bsd, pbonzini, y-goto, afaerber

On Thu, Jul 04, 2013 at 05:53:07PM +0800, Wanlong Gao wrote:
[...]
> Bandan Das (1):
>   NUMA: Support multiple CPU ranges on -numa option
> 
> Wanlong Gao (9):
>   NUMA: Add numa_info structure to contain numa nodes info
>   NUMA: Add Linux libnuma detection
>   NUMA: parse guest numa nodes memory policy
>   NUMA: handle Error in cpus, mpol and hostnode parser
>   NUMA: split out the common range parser
>   NUMA: set guest numa nodes memory policy
>   NUMA: add qmp command set-mpol to set memory policy for NUMA node
>   NUMA: add hmp command set-mpol
>   NUMA: show host memory policy info in info numa command

checkpatch.pl issues:


total: 0 errors, 0 warnings, 155 lines checked

/tmp/numav4-fp/0001-NUMA-Support-multiple-CPU-ranges-on-numa-option.patch has no obvious style problems and is ready for submission.
total: 0 errors, 0 warnings, 129 lines checked

/tmp/numav4-fp/0002-NUMA-Add-numa_info-structure-to-contain-numa-nodes-i.patch has no obvious style problems and is ready for submission.
total: 0 errors, 0 warnings, 68 lines checked

/tmp/numav4-fp/0003-NUMA-Add-Linux-libnuma-detection.patch has no obvious style problems and is ready for submission.
WARNING: braces {} are necessary for all arms of this statement
#106: FILE: vl.c:1424:
+    if (parse_uint(hostnode, &value, &endptr, 10) < 0)
[...]

WARNING: braces {} are necessary for all arms of this statement
#129: FILE: vl.c:1447:
+    if (clear)
[...]
+    else
[...]

total: 0 errors, 2 warnings, 158 lines checked

/tmp/numav4-fp/0004-NUMA-parse-guest-numa-nodes-memory-policy.patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
total: 0 errors, 0 warnings, 108 lines checked

/tmp/numav4-fp/0005-NUMA-handle-Error-in-cpus-mpol-and-hostnode-parser.patch has no obvious style problems and is ready for submission.
WARNING: suspect code indent for conditional statements (8, 11)
#47: FILE: vl.c:1351:
+        if (parse_uint_full(endptr + 1, endvalue, 10) < 0) {
+           return -1;

total: 0 errors, 1 warnings, 128 lines checked

/tmp/numav4-fp/0006-NUMA-split-out-the-common-range-parser.patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
WARNING: braces {} are necessary for all arms of this statement
#100: FILE: cpus.c:1225:
+    if (block->host == NULL)
[...]

total: 0 errors, 1 warnings, 105 lines checked

/tmp/numav4-fp/0007-NUMA-set-guest-numa-nodes-memory-policy.patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
total: 0 errors, 0 warnings, 113 lines checked

/tmp/numav4-fp/0008-NUMA-add-qmp-command-set-mpol-to-set-memory-policy-f.patch has no obvious style problems and is ready for submission.
total: 0 errors, 0 warnings, 66 lines checked

/tmp/numav4-fp/0009-NUMA-add-hmp-command-set-mpol.patch has no obvious style problems and is ready for submission.
WARNING: braces {} are necessary for all arms of this statement
#70: FILE: monitor.c:1848:
+        if (numa_info[i].flags & NODE_HOST_RELATIVE)
[...]

WARNING: braces {} are necessary for all arms of this statement
#76: FILE: monitor.c:1854:
+            if (next == numa_max_node())
[...]

WARNING: braces {} are necessary for all arms of this statement
#80: FILE: monitor.c:1858:
+            if (next > numa_max_node() || next == MAX_CPUMASK_BITS)
[...]

total: 0 errors, 3 warnings, 60 lines checked

/tmp/numav4-fp/0010-NUMA-show-host-memory-policy-info-in-info-numa-comma.patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

-- 
Eduardo


* Re: [Qemu-devel] [PATCH V4 02/10] NUMA: Add numa_info structure to contain numa nodes info
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 02/10] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
@ 2013-07-05 19:32   ` Eduardo Habkost
  2013-07-05 20:09     ` Andreas Färber
  0 siblings, 1 reply; 38+ messages in thread
From: Eduardo Habkost @ 2013-07-05 19:32 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, qemu-devel, lcapitulino, bsd, pbonzini, y-goto, afaerber

On Thu, Jul 04, 2013 at 05:53:09PM +0800, Wanlong Gao wrote:
> Add the numa_info structure to contain the numa nodes memory,
> VCPUs information and the future added numa nodes host memory
> policies.
> 
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

> ---
>  cpus.c                  |  2 +-
>  hw/i386/pc.c            |  4 ++--
>  hw/net/eepro100.c       |  1 -
>  include/sysemu/sysemu.h |  8 ++++++--
>  monitor.c               |  2 +-
>  vl.c                    | 24 ++++++++++++------------
>  6 files changed, 22 insertions(+), 19 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 20958e5..496d5ce 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1180,7 +1180,7 @@ void set_numa_modes(void)
>      for (env = first_cpu; env != NULL; env = env->next_cpu) {
>          cpu = ENV_GET_CPU(env);
>          for (i = 0; i < nb_numa_nodes; i++) {
> -            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
> +            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
>                  cpu->numa_node = i;
>              }
>          }
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 78f92e2..78b5a72 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -650,14 +650,14 @@ static FWCfgState *bochs_bios_init(void)
>          unsigned int apic_id = x86_cpu_apic_id_from_index(i);
>          assert(apic_id < apic_id_limit);
>          for (j = 0; j < nb_numa_nodes; j++) {
> -            if (test_bit(i, node_cpumask[j])) {
> +            if (test_bit(i, numa_info[j].node_cpu)) {
>                  numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
>                  break;
>              }
>          }
>      }
>      for (i = 0; i < nb_numa_nodes; i++) {
> -        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
> +        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(numa_info[i].node_mem);
>      }
>      fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
>                       (1 + apic_id_limit + nb_numa_nodes) *
> diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
> index dc99ea6..478c688 100644
> --- a/hw/net/eepro100.c
> +++ b/hw/net/eepro100.c
> @@ -105,7 +105,6 @@
>  #define PCI_IO_SIZE             64
>  #define PCI_FLASH_SIZE          (128 * KiB)
>  
> -#define BIT(n) (1 << (n))
>  #define BITS(n, m) (((0xffffffffU << (31 - n)) >> (31 - n + m)) << m)
>  
>  /* The SCB accepts the following controls for the Tx and Rx units: */
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 2fb71af..70fd2ed 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -9,6 +9,7 @@
>  #include "qapi-types.h"
>  #include "qemu/notify.h"
>  #include "qemu/main-loop.h"
> +#include "qemu/bitmap.h"
>  
>  /* vl.c */
>  
> @@ -130,8 +131,11 @@ extern QEMUClock *rtc_clock;
>  #define MAX_NODES 64
>  #define MAX_CPUMASK_BITS 255
>  extern int nb_numa_nodes;
> -extern uint64_t node_mem[MAX_NODES];
> -extern unsigned long *node_cpumask[MAX_NODES];
> +struct node_info {
> +    uint64_t node_mem;
> +    DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
> +};
> +extern struct node_info numa_info[MAX_NODES];
>  
>  #define MAX_OPTION_ROMS 16
>  typedef struct QEMUOptionRom {
> diff --git a/monitor.c b/monitor.c
> index 9be515c..93ac045 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -1820,7 +1820,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
>          }
>          monitor_printf(mon, "\n");
>          monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
> -            node_mem[i] >> 20);
> +            numa_info[i].node_mem >> 20);
>      }
>  }
>  
> diff --git a/vl.c b/vl.c
> index 6f2e17a..5207b8e 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -250,8 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
>      QTAILQ_HEAD_INITIALIZER(fw_boot_order);
>  
>  int nb_numa_nodes;
> -uint64_t node_mem[MAX_NODES];
> -unsigned long *node_cpumask[MAX_NODES];
> +struct node_info numa_info[MAX_NODES];
>  
>  uint8_t qemu_uuid[16];
>  
> @@ -1367,7 +1366,7 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus)
>          goto error;
>      }
>  
> -    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
> +    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
>      return;
>  
>  error:
> @@ -1399,7 +1398,7 @@ static int numa_init_func(QemuOpts *opts, void *opaque)
>      }
>  
>      mem_size = qemu_opt_get_size(opts, "mem", 0);
> -    node_mem[nodenr] = mem_size;
> +    numa_info[nodenr].node_mem = mem_size;
>  
>      if (qemu_opt_foreach(opts, numa_add_cpus, &nodenr, 1) < 0) {
>          return -1;
> @@ -2961,8 +2960,8 @@ int main(int argc, char **argv, char **envp)
>      translation = BIOS_ATA_TRANSLATION_AUTO;
>  
>      for (i = 0; i < MAX_NODES; i++) {
> -        node_mem[i] = 0;
> -        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
> +        numa_info[i].node_mem = 0;
> +        bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
>      }
>  
>      nb_numa_nodes = 0;
> @@ -4228,7 +4227,7 @@ int main(int argc, char **argv, char **envp)
>           * and distribute the available memory equally across all nodes
>           */
>          for (i = 0; i < nb_numa_nodes; i++) {
> -            if (node_mem[i] != 0)
> +            if (numa_info[i].node_mem != 0)
>                  break;
>          }
>          if (i == nb_numa_nodes) {
> @@ -4238,14 +4237,15 @@ int main(int argc, char **argv, char **envp)
>               * the final node gets the rest.
>               */
>              for (i = 0; i < nb_numa_nodes - 1; i++) {
> -                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
> -                usedmem += node_mem[i];
> +                numa_info[i].node_mem = (ram_size / nb_numa_nodes) &
> +                                        ~((1 << 23UL) - 1);
> +                usedmem += numa_info[i].node_mem;
>              }
> -            node_mem[i] = ram_size - usedmem;
> +            numa_info[i].node_mem = ram_size - usedmem;
>          }
>  
>          for (i = 0; i < nb_numa_nodes; i++) {
> -            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
> +            if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
>                  break;
>              }
>          }
> @@ -4255,7 +4255,7 @@ int main(int argc, char **argv, char **envp)
>           */
>          if (i == nb_numa_nodes) {
>              for (i = 0; i < max_cpus; i++) {
> -                set_bit(i, node_cpumask[i % nb_numa_nodes]);
> +                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
>              }
>          }
>      }
> -- 
> 1.8.3.2.634.g7a3187e
> 
> 

-- 
Eduardo


* Re: [Qemu-devel] [PATCH V4 02/10] NUMA: Add numa_info structure to contain numa nodes info
  2013-07-05 19:32   ` Eduardo Habkost
@ 2013-07-05 20:09     ` Andreas Färber
  0 siblings, 0 replies; 38+ messages in thread
From: Andreas Färber @ 2013-07-05 20:09 UTC (permalink / raw)
  To: Wanlong Gao, Paolo Bonzini
  Cc: aliguori, Eduardo Habkost, qemu-devel, lcapitulino, bsd, y-goto

Am 05.07.2013 21:32, schrieb Eduardo Habkost:
> On Thu, Jul 04, 2013 at 05:53:09PM +0800, Wanlong Gao wrote:
>> Add the numa_info structure to contain the numa nodes memory,
>> VCPUs information and the future added numa nodes host memory
>> policies.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> 
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> 
>> ---
>>  cpus.c                  |  2 +-
>>  hw/i386/pc.c            |  4 ++--
>>  hw/net/eepro100.c       |  1 -
>>  include/sysemu/sysemu.h |  8 ++++++--
>>  monitor.c               |  2 +-
>>  vl.c                    | 24 ++++++++++++------------
>>  6 files changed, 22 insertions(+), 19 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index 20958e5..496d5ce 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1180,7 +1180,7 @@ void set_numa_modes(void)
>>      for (env = first_cpu; env != NULL; env = env->next_cpu) {
>>          cpu = ENV_GET_CPU(env);
>>          for (i = 0; i < nb_numa_nodes; i++) {
>> -            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
>> +            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
>>                  cpu->numa_node = i;
>>              }
>>          }
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 78f92e2..78b5a72 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -650,14 +650,14 @@ static FWCfgState *bochs_bios_init(void)
>>          unsigned int apic_id = x86_cpu_apic_id_from_index(i);
>>          assert(apic_id < apic_id_limit);
>>          for (j = 0; j < nb_numa_nodes; j++) {
>> -            if (test_bit(i, node_cpumask[j])) {
>> +            if (test_bit(i, numa_info[j].node_cpu)) {
>>                  numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
>>                  break;
>>              }
>>          }
>>      }
>>      for (i = 0; i < nb_numa_nodes; i++) {
>> -        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
>> +        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(numa_info[i].node_mem);
>>      }
>>      fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
>>                       (1 + apic_id_limit + nb_numa_nodes) *
>> diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
>> index dc99ea6..478c688 100644
>> --- a/hw/net/eepro100.c
>> +++ b/hw/net/eepro100.c
>> @@ -105,7 +105,6 @@
>>  #define PCI_IO_SIZE             64
>>  #define PCI_FLASH_SIZE          (128 * KiB)
>>  
>> -#define BIT(n) (1 << (n))
>>  #define BITS(n, m) (((0xffffffffU << (31 - n)) >> (31 - n + m)) << m)
>>  
>>  /* The SCB accepts the following controls for the Tx and Rx units: */
>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>> index 2fb71af..70fd2ed 100644
>> --- a/include/sysemu/sysemu.h
>> +++ b/include/sysemu/sysemu.h
>> @@ -9,6 +9,7 @@
>>  #include "qapi-types.h"
>>  #include "qemu/notify.h"
>>  #include "qemu/main-loop.h"
>> +#include "qemu/bitmap.h"
>>  
>>  /* vl.c */
>>  
>> @@ -130,8 +131,11 @@ extern QEMUClock *rtc_clock;
>>  #define MAX_NODES 64
>>  #define MAX_CPUMASK_BITS 255
>>  extern int nb_numa_nodes;
>> -extern uint64_t node_mem[MAX_NODES];
>> -extern unsigned long *node_cpumask[MAX_NODES];
>> +struct node_info {

NodeInfo

>> +    uint64_t node_mem;
>> +    DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
>> +};

Please add a typedef and use that everywhere below.

>> +extern struct node_info numa_info[MAX_NODES];

I wonder if those structs should be QOM Objects instead, so that we can
use link<> properties from CPUState. I think Paolo suggested something
in that direction?

Regards,
Andreas

>>  
>>  #define MAX_OPTION_ROMS 16
>>  typedef struct QEMUOptionRom {
>> diff --git a/monitor.c b/monitor.c
>> index 9be515c..93ac045 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -1820,7 +1820,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
>>          }
>>          monitor_printf(mon, "\n");
>>          monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
>> -            node_mem[i] >> 20);
>> +            numa_info[i].node_mem >> 20);
>>      }
>>  }
>>  
>> diff --git a/vl.c b/vl.c
>> index 6f2e17a..5207b8e 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -250,8 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
>>      QTAILQ_HEAD_INITIALIZER(fw_boot_order);
>>  
>>  int nb_numa_nodes;
>> -uint64_t node_mem[MAX_NODES];
>> -unsigned long *node_cpumask[MAX_NODES];
>> +struct node_info numa_info[MAX_NODES];
>>  
>>  uint8_t qemu_uuid[16];
>>  
>> @@ -1367,7 +1366,7 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus)
>>          goto error;
>>      }
>>  
>> -    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
>> +    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
>>      return;
>>  
>>  error:
>> @@ -1399,7 +1398,7 @@ static int numa_init_func(QemuOpts *opts, void *opaque)
>>      }
>>  
>>      mem_size = qemu_opt_get_size(opts, "mem", 0);
>> -    node_mem[nodenr] = mem_size;
>> +    numa_info[nodenr].node_mem = mem_size;
>>  
>>      if (qemu_opt_foreach(opts, numa_add_cpus, &nodenr, 1) < 0) {
>>          return -1;
>> @@ -2961,8 +2960,8 @@ int main(int argc, char **argv, char **envp)
>>      translation = BIOS_ATA_TRANSLATION_AUTO;
>>  
>>      for (i = 0; i < MAX_NODES; i++) {
>> -        node_mem[i] = 0;
>> -        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
>> +        numa_info[i].node_mem = 0;
>> +        bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
>>      }
>>  
>>      nb_numa_nodes = 0;
>> @@ -4228,7 +4227,7 @@ int main(int argc, char **argv, char **envp)
>>           * and distribute the available memory equally across all nodes
>>           */
>>          for (i = 0; i < nb_numa_nodes; i++) {
>> -            if (node_mem[i] != 0)
>> +            if (numa_info[i].node_mem != 0)
>>                  break;
>>          }
>>          if (i == nb_numa_nodes) {
>> @@ -4238,14 +4237,15 @@ int main(int argc, char **argv, char **envp)
>>               * the final node gets the rest.
>>               */
>>              for (i = 0; i < nb_numa_nodes - 1; i++) {
>> -                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
>> -                usedmem += node_mem[i];
>> +                numa_info[i].node_mem = (ram_size / nb_numa_nodes) &
>> +                                        ~((1 << 23UL) - 1);
>> +                usedmem += numa_info[i].node_mem;
>>              }
>> -            node_mem[i] = ram_size - usedmem;
>> +            numa_info[i].node_mem = ram_size - usedmem;
>>          }
>>  
>>          for (i = 0; i < nb_numa_nodes; i++) {
>> -            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
>> +            if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
>>                  break;
>>              }
>>          }
>> @@ -4255,7 +4255,7 @@ int main(int argc, char **argv, char **envp)
>>           */
>>          if (i == nb_numa_nodes) {
>>              for (i = 0; i < max_cpus; i++) {
>> -                set_bit(i, node_cpumask[i % nb_numa_nodes]);
>> +                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
>>              }
>>          }
>>      }
>> -- 
>> 1.8.3.2.634.g7a3187e
>>
>>
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


* Re: [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node Wanlong Gao
@ 2013-07-08 18:25   ` Luiz Capitulino
  2013-07-08 18:34     ` Luiz Capitulino
  2013-07-15 11:18     ` Wanlong Gao
  2013-07-08 19:16   ` Eric Blake
  1 sibling, 2 replies; 38+ messages in thread
From: Luiz Capitulino @ 2013-07-08 18:25 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, bsd, y-goto, pbonzini, afaerber

On Thu, 4 Jul 2013 17:53:15 +0800
Wanlong Gao <gaowanlong@cn.fujitsu.com> wrote:

> The QMP command let it be able to set node's memory policy
> through the QMP protocol. The qmp-shell command is like:
>     set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  cpus.c           | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json | 15 +++++++++++++++
>  qmp-commands.hx  | 35 +++++++++++++++++++++++++++++++++++
>  3 files changed, 104 insertions(+)
> 
> diff --git a/cpus.c b/cpus.c
> index 7240de7..ff42b9d 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1417,3 +1417,57 @@ void qmp_inject_nmi(Error **errp)
>      error_set(errp, QERR_UNSUPPORTED);
>  #endif
>  }
> +
> +void qmp_set_mpol(int64_t nodeid, bool has_mpol, const char *mpol,
> +                  bool has_hostnode, const char *hostnode, Error **errp)
> +{
> +    unsigned int flags;
> +    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
> +
> +    if (nodeid >= nb_numa_nodes) {
> +        error_setg(errp, "Only has '%d' NUMA nodes", nb_numa_nodes);
> +        return;
> +    }
> +
> +    bitmap_copy(host_mem, numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
> +    flags = numa_info[nodeid].flags;
> +
> +    numa_info[nodeid].flags = NODE_HOST_NONE;
> +    bitmap_zero(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
> +
> +    if (!has_mpol) {
> +        if (set_node_mpol(nodeid) == -1) {
> +            error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
> +            goto error;
> +        }
> +        return;
> +    }
> +
> +    numa_node_parse_mpol(nodeid, mpol, errp);
> +    if (error_is_set(errp)) {
> +        goto error;
> +    }
> +
> +    if (!has_hostnode) {
> +        bitmap_fill(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
> +    }
> +
> +    if (hostnode) {
> +        numa_node_parse_hostnode(nodeid, hostnode, errp);
> +        if (error_is_set(errp)) {
> +            goto error;
> +        }
> +    }
> +
> +    if (set_node_mpol(nodeid) == -1) {
> +        error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
> +        goto error;
> +    }
> +
> +    return;
> +
> +error:
> +    bitmap_copy(numa_info[nodeid].host_mem, host_mem, MAX_CPUMASK_BITS);
> +    numa_info[nodeid].flags = flags;
> +    return;
> +}
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 5c32528..0870da2 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -3712,3 +3712,18 @@
>              '*cpuid-input-ecx': 'int',
>              'cpuid-register': 'X86CPURegister32',
>              'features': 'int' } }
> +
> +# @set-mpol:
> +#
> +# Set the host memory binding policy for guest NUMA node.
> +#
> +# @nodeid: The node ID of guest NUMA node to set memory policy to.
> +#
> +# @mem-policy: The memory policy string to set.

Shouldn't this be an enum? Also, optional members have a leading '#optional'
string and if a default value is used it should be documented.

> +#
> +# @mem-hostnode: The host node or node range for memory policy.

It doesn't seem appropriate to use a string here. Maybe we could
use a list with only two values (like [0,2] for 0-2), or maybe a
list of nodes if that makes sense (like [0,1,2]).
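
Sketching both suggestions together, the declaration might become something
like the following (hypothetical; the enum name and its value list are
illustrative, not part of the patch):

```json
# Hypothetical reworked declaration -- names are illustrative only.
{ 'enum': 'NumaNodePolicy',
  'data': [ 'default', 'membind', 'interleave', 'preferred' ] }

{ 'command': 'set-mpol',
  'data': { 'nodeid': 'int',
            '*mem-policy': 'NumaNodePolicy',
            '*mem-hostnode': ['int'] } }
```

With an enum, invalid policy strings are rejected by the generated
marshaller instead of by hand-written parsing code.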

> +#
> +# Since: 1.6.0
> +##
> +{ 'command': 'set-mpol', 'data': {'nodeid': 'int', '*mem-policy': 'str',
> +                                  '*mem-hostnode': 'str'} }
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 362f0e1..ccab51b 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -3043,3 +3043,38 @@ Example:
>  <- { "return": {} }
>  
>  EQMP
> +
> +    {
> +        .name      = "set-mpol",
> +        .args_type = "nodeid:i,mem-policy:s?,mem-hostnode:s?",
> +        .help      = "Set the host memory binding policy for guest NUMA node",
> +        .mhandler.cmd_new = qmp_marshal_input_set_mpol,
> +    },
> +
> +SQMP
> +set-mpol
> +------
> +
> +Set the host memory binding policy for guest NUMA node
> +
> +Arguments:
> +
> +- "nodeid": The nodeid of guest NUMA node to set memory policy to.
> +            (json-int)
> +- "mem-policy": The memory policy string to set.
> +                (json-string, optional)
> +- "mem-hostnode": The host nodes contained to mpol.
> +                  (json-string, optional)
> +
> +Example:
> +
> +-> { "execute": "set-mpol", "arguments": { "nodeid": 0, "mem-policy": "membind",
> +                                           "mem-hostnode": "0-1" }}
> +<- { "return": {} }
> +
> +Notes:
> +    1. If "mem-policy" is not set, the memory policy of this "nodeid" will be set
> +       to "default".
> +    2. If "mem-hostnode" is not set, the node mask of this "mpol" will be set
> +       to "all".
> +EQMP

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH V4 09/10] NUMA: add hmp command set-mpol
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 09/10] NUMA: add hmp command set-mpol Wanlong Gao
@ 2013-07-08 18:32   ` Luiz Capitulino
  0 siblings, 0 replies; 38+ messages in thread
From: Luiz Capitulino @ 2013-07-08 18:32 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, bsd, y-goto, pbonzini, afaerber

On Thu, 4 Jul 2013 17:53:16 +0800
Wanlong Gao <gaowanlong@cn.fujitsu.com> wrote:

> Add hmp command set-mpol to set host memory policy for a guest
> NUMA node. Then we can also set node's memory policy using
> the monitor command like:
>     (qemu) set-mpol 0 mem-policy=membind,mem-hostnode=0-1
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  hmp-commands.hx | 16 ++++++++++++++++
>  hmp.c           | 35 +++++++++++++++++++++++++++++++++++
>  hmp.h           |  1 +
>  3 files changed, 52 insertions(+)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 915b0d1..417b69f 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1567,6 +1567,22 @@ Executes a qemu-io command on the given block device.
>  ETEXI
>  
>      {
> +        .name       = "set-mpol",
> +        .args_type  = "nodeid:i,args:s?",
> +        .params     = "nodeid [args]",
> +        .help       = "set host memory policy for a guest NUMA node",
> +        .mhandler.cmd = hmp_set_mpol,
> +    },
> +
> +STEXI
> +@item set-mpol @var{nodeid} @var{args}
> +@findex set-mpol
> +
> +Set host memory policy for a guest NUMA node
> +
> +ETEXI
> +
> +    {
>          .name       = "info",
>          .args_type  = "item:s?",
>          .params     = "[subcommand]",
> diff --git a/hmp.c b/hmp.c
> index 2daed43..57a5730 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1482,3 +1482,38 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
>  
>      hmp_handle_error(mon, &err);
>  }
> +
> +void hmp_set_mpol(Monitor *mon, const QDict *qdict)
> +{
> +    Error *local_err = NULL;
> +    bool has_mpol = true;
> +    bool has_hostnode = true;
> +    const char *mpol = NULL;
> +    const char *hostnode = NULL;
> +    QemuOpts *opts;
> +
> +    uint64_t nodeid = qdict_get_int(qdict, "nodeid");
> +    const char *args = qdict_get_try_str(qdict, "args");
> +
> +    if (args == NULL) {
> +        has_mpol = false;
> +        has_hostnode = false;
> +    } else {
> +        opts = qemu_opts_parse(qemu_find_opts("numa"), args, 1);
> +        if (opts == NULL) {
> +            error_setg(&local_err, "Parsing memory policy args failed");

You're still going to call qmp_set_mpol() if this fails. You can replace
the error_setg() call with monitor_printf() and return early.

> +        } else {
> +            mpol = qemu_opt_get(opts, "mem-policy");
> +            if (mpol == NULL) {
> +                has_mpol = false;
> +            }
> +            hostnode = qemu_opt_get(opts, "mem-hostnode");
> +            if (hostnode == NULL) {
> +                has_hostnode = false;
> +            }
> +        }
> +    }
> +
> +    qmp_set_mpol(nodeid, has_mpol, mpol, has_hostnode, hostnode, &local_err);
> +    hmp_handle_error(mon, &local_err);
> +}
> diff --git a/hmp.h b/hmp.h
> index 56d2e92..81f631b 100644
> --- a/hmp.h
> +++ b/hmp.h
> @@ -86,5 +86,6 @@ void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict);
>  void hmp_chardev_add(Monitor *mon, const QDict *qdict);
>  void hmp_chardev_remove(Monitor *mon, const QDict *qdict);
>  void hmp_qemu_io(Monitor *mon, const QDict *qdict);
> +void hmp_set_mpol(Monitor *mon, const QDict *qdict);
>  
>  #endif


* Re: [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node
  2013-07-08 18:25   ` Luiz Capitulino
@ 2013-07-08 18:34     ` Luiz Capitulino
  2013-07-08 18:50       ` Andreas Färber
  2013-07-15 11:18     ` Wanlong Gao
  1 sibling, 1 reply; 38+ messages in thread
From: Luiz Capitulino @ 2013-07-08 18:34 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, bsd, y-goto, pbonzini, afaerber

On Mon, 8 Jul 2013 14:25:14 -0400
Luiz Capitulino <lcapitulino@redhat.com> wrote:

> On Thu, 4 Jul 2013 17:53:15 +0800
> Wanlong Gao <gaowanlong@cn.fujitsu.com> wrote:
> 
> > The QMP command let it be able to set node's memory policy
> > through the QMP protocol. The qmp-shell command is like:
> >     set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1
> > 
> > Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> > ---
> >  cpus.c           | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  qapi-schema.json | 15 +++++++++++++++
> >  qmp-commands.hx  | 35 +++++++++++++++++++++++++++++++++++
> >  3 files changed, 104 insertions(+)
> > 
> > diff --git a/cpus.c b/cpus.c
> > index 7240de7..ff42b9d 100644
> > --- a/cpus.c
> > +++ b/cpus.c
> > @@ -1417,3 +1417,57 @@ void qmp_inject_nmi(Error **errp)
> >      error_set(errp, QERR_UNSUPPORTED);
> >  #endif
> >  }
> > +
> > +void qmp_set_mpol(int64_t nodeid, bool has_mpol, const char *mpol,
> > +                  bool has_hostnode, const char *hostnode, Error **errp)
> > +{
> > +    unsigned int flags;
> > +    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
> > +
> > +    if (nodeid >= nb_numa_nodes) {
> > +        error_setg(errp, "Only has '%d' NUMA nodes", nb_numa_nodes);
> > +        return;
> > +    }
> > +
> > +    bitmap_copy(host_mem, numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
> > +    flags = numa_info[nodeid].flags;
> > +
> > +    numa_info[nodeid].flags = NODE_HOST_NONE;
> > +    bitmap_zero(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
> > +
> > +    if (!has_mpol) {
> > +        if (set_node_mpol(nodeid) == -1) {
> > +            error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
> > +            goto error;
> > +        }
> > +        return;
> > +    }
> > +
> > +    numa_node_parse_mpol(nodeid, mpol, errp);
> > +    if (error_is_set(errp)) {
> > +        goto error;
> > +    }
> > +
> > +    if (!has_hostnode) {
> > +        bitmap_fill(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
> > +    }
> > +
> > +    if (hostnode) {
> > +        numa_node_parse_hostnode(nodeid, hostnode, errp);
> > +        if (error_is_set(errp)) {
> > +            goto error;
> > +        }
> > +    }
> > +
> > +    if (set_node_mpol(nodeid) == -1) {
> > +        error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
> > +        goto error;
> > +    }
> > +
> > +    return;
> > +
> > +error:
> > +    bitmap_copy(numa_info[nodeid].host_mem, host_mem, MAX_CPUMASK_BITS);
> > +    numa_info[nodeid].flags = flags;
> > +    return;
> > +}
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 5c32528..0870da2 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -3712,3 +3712,18 @@
> >              '*cpuid-input-ecx': 'int',
> >              'cpuid-register': 'X86CPURegister32',
> >              'features': 'int' } }
> > +
> > +# @set-mpol:
> > +#
> > +# Set the host memory binding policy for guest NUMA node.
> > +#
> > +# @nodeid: The node ID of guest NUMA node to set memory policy to.
> > +#
> > +# @mem-policy: The memory policy string to set.
> 
> Shouldn't this be an enum? Also, optional members have a leading '#optional'
> string and if a default value is used it should be documented.
> 
> > +#
> > +# @mem-hostnode: The host node or node range for memory policy.
> 
> It doesn't seem appropriate to use a string here. Maybe we could
> use a list with only two values (like [0,2] for 0-2), or maybe a
> list of nodes if that makes sense (like [0,1,2]).

I forgot to bike-shed on the naming. I'd suggest the following:

 - s/set-mpol/set-numa-policy
 - s/mem-policy/policy
 - s/mem-hostnode/host-nodes

> 
> > +#
> > +# Since: 1.6.0
> > +##
> > +{ 'command': 'set-mpol', 'data': {'nodeid': 'int', '*mem-policy': 'str',
> > +                                  '*mem-hostnode': 'str'} }
> > diff --git a/qmp-commands.hx b/qmp-commands.hx
> > index 362f0e1..ccab51b 100644
> > --- a/qmp-commands.hx
> > +++ b/qmp-commands.hx
> > @@ -3043,3 +3043,38 @@ Example:
> >  <- { "return": {} }
> >  
> >  EQMP
> > +
> > +    {
> > +        .name      = "set-mpol",
> > +        .args_type = "nodeid:i,mem-policy:s?,mem-hostnode:s?",
> > +        .help      = "Set the host memory binding policy for guest NUMA node",
> > +        .mhandler.cmd_new = qmp_marshal_input_set_mpol,
> > +    },
> > +
> > +SQMP
> > +set-mpol
> > +------
> > +
> > +Set the host memory binding policy for guest NUMA node
> > +
> > +Arguments:
> > +
> > +- "nodeid": The nodeid of guest NUMA node to set memory policy to.
> > +            (json-int)
> > +- "mem-policy": The memory policy string to set.
> > +                (json-string, optional)
> > +- "mem-hostnode": The host nodes contained to mpol.
> > +                  (json-string, optional)
> > +
> > +Example:
> > +
> > +-> { "execute": "set-mpol", "arguments": { "nodeid": 0, "mem-policy": "membind",
> > +                                           "mem-hostnode": "0-1" }}
> > +<- { "return": {} }
> > +
> > +Notes:
> > +    1. If "mem-policy" is not set, the memory policy of this "nodeid" will be set
> > +       to "default".
> > +    2. If "mem-hostnode" is not set, the node mask of this "mpol" will be set
> > +       to "all".
> > +EQMP
> 


* Re: [Qemu-devel] [PATCH V4 10/10] NUMA: show host memory policy info in info numa command
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 10/10] NUMA: show host memory policy info in info numa command Wanlong Gao
  2013-07-05 18:49   ` Eduardo Habkost
@ 2013-07-08 18:36   ` Luiz Capitulino
  1 sibling, 0 replies; 38+ messages in thread
From: Luiz Capitulino @ 2013-07-08 18:36 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, bsd, y-goto, pbonzini, afaerber

On Thu, 4 Jul 2013 17:53:17 +0800
Wanlong Gao <gaowanlong@cn.fujitsu.com> wrote:

> Show host memory policy of nodes in the info numa monitor command.
> After this patch, the monitor command "info numa" will show the
> information like following if the host numa support is enabled:

As you're adding a QMP command to set the policy, wouldn't it make
sense to convert info numa to QMP so that we also have query-numa?
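
A query counterpart could be declared along these lines in
qapi-schema.json (purely illustrative sketch; no such command exists in
this series):

```json
# Illustrative sketch only -- not part of the series.
{ 'type': 'NUMANodeInfo',
  'data': { 'nodeid': 'int', 'cpus': ['int'], 'memory': 'int',
            '*mem-policy': 'str', '*mem-hostnode': ['int'] } }

{ 'command': 'query-numa', 'returns': ['NUMANodeInfo'] }
```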

> 
>     (qemu) info numa
>     2 nodes
>     node 0 cpus: 0
>     node 0 size: 1024 MB
>     node 0 mempolicy: membind=0,1
>     node 1 cpus: 1
>     node 1 size: 1024 MB
>     node 1 mempolicy: interleave=1
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  monitor.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/monitor.c b/monitor.c
> index 93ac045..a40415d 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -74,6 +74,11 @@
>  #endif
>  #include "hw/lm32/lm32_pic.h"
>  
> +#ifdef CONFIG_NUMA
> +#include <numa.h>
> +#include <numaif.h>
> +#endif
> +
>  //#define DEBUG
>  //#define DEBUG_COMPLETION
>  
> @@ -1808,6 +1813,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
>      int i;
>      CPUArchState *env;
>      CPUState *cpu;
> +    unsigned long first, next;
>  
>      monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
>      for (i = 0; i < nb_numa_nodes; i++) {
> @@ -1821,6 +1827,42 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, "\n");
>          monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
>              numa_info[i].node_mem >> 20);
> +
> +#ifdef CONFIG_NUMA
> +        monitor_printf(mon, "node %d mempolicy: ", i);
> +        switch (numa_info[i].flags & NODE_HOST_POLICY_MASK) {
> +        case NODE_HOST_BIND:
> +            monitor_printf(mon, "membind=");
> +            break;
> +        case NODE_HOST_INTERLEAVE:
> +            monitor_printf(mon, "interleave=");
> +            break;
> +        case NODE_HOST_PREFERRED:
> +            monitor_printf(mon, "preferred=");
> +            break;
> +        default:
> +            monitor_printf(mon, "default\n");
> +            continue;
> +        }
> +
> +        if (numa_info[i].flags & NODE_HOST_RELATIVE)
> +            monitor_printf(mon, "+");
> +
> +        next = first = find_first_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS);
> +        monitor_printf(mon, "%lu", first);
> +        do {
> +            if (next == numa_max_node())
> +                break;
> +            next = find_next_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS,
> +                                 next + 1);
> +            if (next > numa_max_node() || next == MAX_CPUMASK_BITS)
> +                break;
> +
> +            monitor_printf(mon, ",%lu", next);
> +        } while (true);
> +
> +        monitor_printf(mon, "\n");
> +#endif
>      }
>  }
>  


* Re: [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node
  2013-07-08 18:34     ` Luiz Capitulino
@ 2013-07-08 18:50       ` Andreas Färber
  2013-07-08 19:03         ` Luiz Capitulino
  0 siblings, 1 reply; 38+ messages in thread
From: Andreas Färber @ 2013-07-08 18:50 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: aliguori, ehabkost, qemu-devel, bsd, y-goto, pbonzini, Wanlong Gao

Am 08.07.2013 20:34, schrieb Luiz Capitulino:
> I forgot to bike-shed on the naming. I'd suggest the following:
> 
>  - s/set-mpol/set-numa-policy
>  - s/mem-policy/policy
>  - s/mem-hostnode/host-nodes

I had suggested s/set-mpol/set-memory-policy/g on the previous round
(but didn't get any kind of response).

http://patchwork.ozlabs.org/patch/253699/

I'm fine with anything readable and following our conventions though.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


* Re: [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-07-05 18:41   ` Eduardo Habkost
@ 2013-07-08 19:02     ` Eric Blake
  2013-07-08 19:25       ` Eduardo Habkost
                         ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Eric Blake @ 2013-07-08 19:02 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: aliguori, qemu-devel, lcapitulino, bsd, y-goto, pbonzini,
	afaerber, Wanlong Gao

On 07/05/2013 12:41 PM, Eduardo Habkost wrote:
> On Thu, Jul 04, 2013 at 05:53:08PM +0800, Wanlong Gao wrote:
>> From: Bandan Das <bsd@redhat.com>
>>
>> This allows us to use the "cpus" property multiple times
>> to specify multiple cpu (ranges) to the -numa option :
>>
>> -numa node,cpus=1,cpus=2,cpus=4
>> or
>> -numa node,cpus=1-3,cpus=5
>>
>> Note that after this patch, the default suffix of "-numa node,mem=N"
>> will no longer be "M". So we must add the suffix "M" like "-numa node,mem=NM"
>> when assigning "N MB" of node memory size.
> 
> Such an incompatible change is not acceptable, as it would break
> existing configurations. libvirt doesn't specify any suffix and expects
> it to always mean "MB".

Newer libvirt can be taught to append 'M' when it detects it is talking
to newer qemu.  While you have a point that it is annoying to force
users to upgrade to a newer libvirt merely because they upgraded qemu,
the libvirt point of view is that the following are supported:

old libvirt -> old qemu
new libvirt -> old qemu
new libvirt -> new qemu

but that this combination is always best effort and not required to work:

old libvirt -> new qemu

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node
  2013-07-08 18:50       ` Andreas Färber
@ 2013-07-08 19:03         ` Luiz Capitulino
  0 siblings, 0 replies; 38+ messages in thread
From: Luiz Capitulino @ 2013-07-08 19:03 UTC (permalink / raw)
  To: Andreas Färber
  Cc: aliguori, ehabkost, qemu-devel, bsd, y-goto, pbonzini, Wanlong Gao

On Mon, 08 Jul 2013 20:50:46 +0200
Andreas Färber <afaerber@suse.de> wrote:

> Am 08.07.2013 20:34, schrieb Luiz Capitulino:
> > I forgot to bike-shed on the naming. I'd suggest the following:
> > 
> >  - s/set-mpol/set-numa-policy
> >  - s/mem-policy/policy
> >  - s/mem-hostnode/host-nodes
> 
> I had suggested s/set-mpol/set-memory-policy/g on the previous round
> (but didn't get any kind of response).
> 
> http://patchwork.ozlabs.org/patch/253699/
> 
> I'm fine with anything readable and following our conventions though.

I'm fine with your suggestion too.


* Re: [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node
  2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node Wanlong Gao
  2013-07-08 18:25   ` Luiz Capitulino
@ 2013-07-08 19:16   ` Eric Blake
  1 sibling, 0 replies; 38+ messages in thread
From: Eric Blake @ 2013-07-08 19:16 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto,
	pbonzini, afaerber

On 07/04/2013 03:53 AM, Wanlong Gao wrote:
> The QMP command let it be able to set node's memory policy

s/let it be able/allows users/

> through the QMP protocol. The qmp-shell command is like:
>     set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---

Just an interface review:

> +++ b/qapi-schema.json
> @@ -3712,3 +3712,18 @@
>              '*cpuid-input-ecx': 'int',
>              'cpuid-register': 'X86CPURegister32',
>              'features': 'int' } }
> +
> +# @set-mpol:

I agree with other requests in this thread to make the name closer to
English words (set-memory-policy).  Also, I hate write-only interfaces;
what is the corresponding query-* command that lets me learn the policy
that is currently in effect?  I'm expecting that this series either
modifies an existing command or adds a new query command as the
counterpart to this set command.

> +#
> +# Set the host memory binding policy for guest NUMA node.
> +#
> +# @nodeid: The node ID of guest NUMA node to set memory policy to.
> +#
> +# @mem-policy: The memory policy string to set.
> +#
> +# @mem-hostnode: The host node or node range for memory policy.
> +#
> +# Since: 1.6.0
> +##
> +{ 'command': 'set-mpol', 'data': {'nodeid': 'int', '*mem-policy': 'str',
> +                                  '*mem-hostnode': 'str'} }

Make mem-policy an enum, not an open-coded string.  Also, make
mem-hostnode an array of nodes - a general rule of thumb is that if the
receiver (here, qemu) has to do further parsing of the data (such as
scraping out integer vs. dash to decide if it is one node or a range),
then the JSON was too high-level.  Using ['int'] instead of 'str' will
let the data be available already parsed into an integer list by the
visitor code, so you aren't having to write your own ad hoc parser on
the receiving end.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-07-08 19:02     ` Eric Blake
@ 2013-07-08 19:25       ` Eduardo Habkost
  2013-07-08 19:25       ` Anthony Liguori
  2013-07-14 11:34       ` Paolo Bonzini
  2 siblings, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2013-07-08 19:25 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, qemu-devel, lcapitulino, bsd, y-goto, pbonzini,
	afaerber, Wanlong Gao

On Mon, Jul 08, 2013 at 01:02:41PM -0600, Eric Blake wrote:
> On 07/05/2013 12:41 PM, Eduardo Habkost wrote:
> > On Thu, Jul 04, 2013 at 05:53:08PM +0800, Wanlong Gao wrote:
> >> From: Bandan Das <bsd@redhat.com>
> >>
> >> This allows us to use the "cpus" property multiple times
> >> to specify multiple cpu (ranges) to the -numa option :
> >>
> >> -numa node,cpus=1,cpus=2,cpus=4
> >> or
> >> -numa node,cpus=1-3,cpus=5
> >>
> >> Note that after this patch, the default suffix of "-numa node,mem=N"
> >> will no longer be "M". So we must add the suffix "M" like "-numa node,mem=NM"
> >> when assigning "N MB" of node memory size.
> > 
> > Such an incompatible change is not acceptable, as it would break
> > existing configurations. libvirt doesn't specify any suffix and expects
> > it to always mean "MB".
> 
> Newer libvirt can be taught to append 'M' when it detects it is talking
> to newer qemu.  While you have a point that it is annoying to force
> users to upgrade to a newer libvirt merely because they upgraded qemu,
> the libvirt point of view is that the following are supported:
> 
> old libvirt -> old qemu
> new libvirt -> old qemu
> new libvirt -> new qemu
> 
> but that this combination is always best effort and not required to work:
> 
> old libvirt -> new qemu

I assume the rules above apply only if "new libvirt" gets released
before "new qemu", right? Otherwise people won't be able to use latest
released libvirt with latest released qemu until "new libvirt" gets
released.

-- 
Eduardo


* Re: [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-07-08 19:02     ` Eric Blake
  2013-07-08 19:25       ` Eduardo Habkost
@ 2013-07-08 19:25       ` Anthony Liguori
  2013-07-09  3:28         ` Wanlong Gao
  2013-07-14 11:34       ` Paolo Bonzini
  2 siblings, 1 reply; 38+ messages in thread
From: Anthony Liguori @ 2013-07-08 19:25 UTC (permalink / raw)
  To: Eric Blake, Eduardo Habkost
  Cc: qemu-devel, lcapitulino, bsd, pbonzini, y-goto, afaerber, Wanlong Gao

Eric Blake <eblake@redhat.com> writes:

> On 07/05/2013 12:41 PM, Eduardo Habkost wrote:
>> On Thu, Jul 04, 2013 at 05:53:08PM +0800, Wanlong Gao wrote:
>>> From: Bandan Das <bsd@redhat.com>
>>>
>>> This allows us to use the "cpus" property multiple times
>>> to specify multiple cpu (ranges) to the -numa option :
>>>
>>> -numa node,cpus=1,cpus=2,cpus=4
>>> or
>>> -numa node,cpus=1-3,cpus=5
>>>
>>> Note that after this patch, the default suffix of "-numa node,mem=N"
>>> will no longer be "M". So we must add the suffix "M" like "-numa node,mem=NM"
>>> when assigning "N MB" of node memory size.
>> 
>> Such an incompatible change is not acceptable, as it would break
>> existing configurations. libvirt doesn't specify any suffix and expects
>> it to always mean "MB".
>
> Newer libvirt can be taught to append 'M' when it detects it is talking
> to newer qemu.  While you have a point that it is annoying to force
> users to upgrade to a newer libvirt merely because they upgraded qemu,
> the libvirt point of view is that the following are supported:
>
> old libvirt -> old qemu
> new libvirt -> old qemu
> new libvirt -> new qemu
>
> but that this combination is always best effort and not required to work:
>
> old libvirt -> new qemu

That's fine for libvirt, but we don't break command line compatibility
in QEMU.  So this patch needs to change.

Regards,

Anthony Liguori

>
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-07-08 19:25       ` Anthony Liguori
@ 2013-07-09  3:28         ` Wanlong Gao
  2013-07-09  3:34           ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Wanlong Gao @ 2013-07-09  3:28 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Eduardo Habkost, qemu-devel, lcapitulino, bsd, y-goto, pbonzini,
	afaerber, Wanlong Gao

On 07/09/2013 03:25 AM, Anthony Liguori wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> On 07/05/2013 12:41 PM, Eduardo Habkost wrote:
>>> On Thu, Jul 04, 2013 at 05:53:08PM +0800, Wanlong Gao wrote:
>>>> From: Bandan Das <bsd@redhat.com>
>>>>
>>>> This allows us to use the "cpus" property multiple times
>>>> to specify multiple cpu (ranges) to the -numa option :
>>>>
>>>> -numa node,cpus=1,cpus=2,cpus=4
>>>> or
>>>> -numa node,cpus=1-3,cpus=5
>>>>
>>>> Note that after this patch, the default suffix of "-numa node,mem=N"
>>>> will no longer be "M". So we must add the suffix "M" like "-numa node,mem=NM"
>>>> when assigning "N MB" of node memory size.
>>>
>>> Such an incompatible change is not acceptable, as it would break
>>> existing configurations. libvirt doesn't specify any suffix and expects
>>> it to always mean "MB".
>>
>> Newer libvirt can be taught to append 'M' when it detects it is talking
>> to newer qemu.  While you have a point that it is annoying to force
>> users to upgrade to a newer libvirt merely because they upgraded qemu,
>> the libvirt point of view is that the following are supported:
>>
>> old libvirt -> old qemu
>> new libvirt -> old qemu
>> new libvirt -> new qemu
>>
>> but that this combination is always best effort and not required to work:
>>
>> old libvirt -> new qemu
> 
> That's fine for libvirt, but we don't break command line compatibility
> in QEMU.  So this patch needs to change.

But if we follow Paolo's suggestion like:
    -numa node,nodeid=0,cpus=0 -numa node,nodeid=1,cpus=1 \
    -numa mem,nodeid=0,size=1G,policy=membind,hostnode=0-1
    -numa mem,nodeid=1,size=2G,policy=interleave,hostnode=1

We already break the command line compatibility.
Why not change it to be a real "size" option without a default suffix?

Thanks,
Wanlong Gao

> 
> Regards,
> 
> Anthony Liguori
> 
>>
>> -- 
>> Eric Blake   eblake redhat com    +1-919-301-3266
>> Libvirt virtualization library http://libvirt.org
> 
> 


* Re: [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-07-09  3:28         ` Wanlong Gao
@ 2013-07-09  3:34           ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2013-07-09  3:34 UTC (permalink / raw)
  To: gaowanlong
  Cc: Anthony Liguori, Eduardo Habkost, qemu-devel, lcapitulino, bsd,
	y-goto, pbonzini, afaerber

On 07/08/2013 09:28 PM, Wanlong Gao wrote:

>>>>>
>>>>> Note that after this patch, the default suffix of "-numa node,mem=N"
>>>>> will no longer be "M". So we must add the suffix "M" like "-numa node,mem=NM"
>>>>> when assigning "N MB" of node memory size.
>>>>

> 
> But if we follow Paolo's suggestion like:
>     -numa node,nodeid=0,cpus=0 -numa node,nodeid=1,cpus=1 \
>     -numa mem,nodeid=0,size=1G,policy=membind,hostnode=0-1
>     -numa mem,nodeid=1,size=2G,policy=interleave,hostnode=1
> 
> We already break the command line compatibility.

New command options can have whatever syntax makes sense.  The worry
here is that if you use the old command line without any new options,
the old syntax must behave the same.

> Why not change it to be a real "size" option without a default suffix?

Because users are annoyed when a command line that worked with qemu 1.5
needlessly fails to work with qemu 1.6.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes
  2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (11 preceding siblings ...)
  2013-07-05 19:18 ` Eduardo Habkost
@ 2013-07-11 10:32 ` Peter Huang(Peng)
  2013-07-11 13:10   ` Eduardo Habkost
  12 siblings, 1 reply; 38+ messages in thread
From: Peter Huang(Peng) @ 2013-07-11 10:32 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, pbonzini,
	y-goto, afaerber

Hi, Wanlong

From the patch description below, it seems that QEMU NUMA only supports
CPU/memory node binding. As we know, binding is not the common usage, since
VM migration may happen or load balancing would be disabled.
So, do we have any plan for generating a virtual NUMA topology automatically?

For example, if we create a 16-vCPU VM on a physical box with four 8-core
nodes, we could automatically place it across two physical nodes, rather
than bind it.

On 2013-07-04 17:53, Wanlong Gao wrote:
> As you know, QEMU can't direct it's memory allocation now, this may cause
> guest cross node access performance regression.
> And, the worse thing is that if PCI-passthrough is used,
> direct-attached-device uses DMA transfer between device and qemu process.
> All pages of the guest will be pinned by get_user_pages().
>
> KVM_ASSIGN_PCI_DEVICE ioctl
>   kvm_vm_ioctl_assign_device()
>     =>kvm_assign_device()
>       => kvm_iommu_map_memslots()
>         => kvm_iommu_map_pages()
>            => kvm_pin_pages()
>
> So, with a direct-attached device, every guest page's reference count is
> incremented, and page migration will not work; neither will AutoNUMA.
>
> So, we should set the guest nodes' memory allocation policy before the
> pages are actually mapped.
>
> With this patch set, we are able to set the guest nodes' memory policy
> as follows:
>
>  -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
>  -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=1
>
> The supported format is "mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N}".
>
> Patch 8/10 adds a QMP command "set-mpol" to set the memory policy for each
> guest node:
>     set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1
>
> Patch 9/10 adds a monitor command "set-mpol" whose format is:
>     set-mpol 0 mem-policy=membind,mem-hostnode=0-1
>
> With patch 10/10, we can get the current memory policy of each guest node
> using the monitor command "info numa", for example:
>
>     (qemu) info numa
>     2 nodes
>     node 0 cpus: 0
>     node 0 size: 1024 MB
>     node 0 mempolicy: membind=0,1
>     node 1 cpus: 1
>     node 1 size: 1024 MB
>     node 1 mempolicy: interleave=1
>
>
> V1->V2:
>     change to use QemuOpts in numa options (Paolo)
>     handle Error in mpol parser (Paolo)
>     change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo)
> V2->V3:
>     also handle Error in cpus parser (5/10)
>     split out common parser from cpus and hostnode parser (Bandan 6/10)
> V3->V4:
>     rebased; resent to request comments
>
>
> Bandan Das (1):
>   NUMA: Support multiple CPU ranges on -numa option
>
> Wanlong Gao (9):
>   NUMA: Add numa_info structure to contain numa nodes info
>   NUMA: Add Linux libnuma detection
>   NUMA: parse guest numa nodes memory policy
>   NUMA: handle Error in cpus, mpol and hostnode parser
>   NUMA: split out the common range parser
>   NUMA: set guest numa nodes memory policy
>   NUMA: add qmp command set-mpol to set memory policy for NUMA node
>   NUMA: add hmp command set-mpol
>   NUMA: show host memory policy info in info numa command
>
>  configure               |  32 ++++++
>  cpus.c                  | 143 +++++++++++++++++++++++-
>  hmp-commands.hx         |  16 +++
>  hmp.c                   |  35 ++++++
>  hmp.h                   |   1 +
>  hw/i386/pc.c            |   4 +-
>  hw/net/eepro100.c       |   1 -
>  include/sysemu/sysemu.h |  20 +++-
>  monitor.c               |  44 +++++++-
>  qapi-schema.json        |  15 +++
>  qemu-options.hx         |   3 +-
>  qmp-commands.hx         |  35 ++++++
>  vl.c                    | 285 +++++++++++++++++++++++++++++++++++-------------
>  13 files changed, 553 insertions(+), 81 deletions(-)
>


* Re: [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes
  2013-07-11 10:32 ` Peter Huang(Peng)
@ 2013-07-11 13:10   ` Eduardo Habkost
  0 siblings, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2013-07-11 13:10 UTC (permalink / raw)
  To: Peter Huang(Peng)
  Cc: aliguori, qemu-devel, lcapitulino, bsd, pbonzini, y-goto,
	afaerber, Wanlong Gao

On Thu, Jul 11, 2013 at 06:32:48PM +0800, Peter Huang(Peng) wrote:
> Hi, Wanlong,
> 
> From the patch description below, it seems that QEMU NUMA only supports
> CPU/memory node binding. As we know, binding is not the common usage,
> since VM migration may happen or load balancing would be disabled.
> So, do we have any plan for generating a virtual NUMA topology automatically?
> 
> For example, if we create a 16-vCPU VM on a physical box with four 8-core
> nodes, we could automatically place it on two physical nodes, rather than
> by binding.

Do you mean automatically generating the NUMA topology configuration for
the VM, or automatically migrating between physical nodes?

The guest-visible NUMA topology is part of the VM configuration. If
automatically creating a config optimized for a specific host is
desired, that's a task for other tools (like libvirt, or tools built on
top of libvirt).

If you are talking about automatic migration of guest RAM and VCPUs,
eventually the kernel may be able to do it efficiently, but we don't
even have performance numbers for manually-tuned static binding setups
to compare with (because static binding is not possible yet).


> 
> [...]

-- 
Eduardo


* Re: [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-07-08 19:02     ` Eric Blake
  2013-07-08 19:25       ` Eduardo Habkost
  2013-07-08 19:25       ` Anthony Liguori
@ 2013-07-14 11:34       ` Paolo Bonzini
  2013-07-15 21:33         ` Eric Blake
  2 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2013-07-14 11:34 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, Eduardo Habkost, qemu-devel, lcapitulino, bsd, y-goto,
	afaerber, Wanlong Gao

On 08/07/2013 21:02, Eric Blake wrote:
> On 07/05/2013 12:41 PM, Eduardo Habkost wrote:
>> On Thu, Jul 04, 2013 at 05:53:08PM +0800, Wanlong Gao wrote:
>>> From: Bandan Das <bsd@redhat.com>
>>>
>>> This allows us to use the "cpus" property multiple times
>>> to specify multiple cpu (ranges) to the -numa option :
>>>
>>> -numa node,cpus=1,cpus=2,cpus=4
>>> or
>>> -numa node,cpus=1-3,cpus=5
>>>
>>> Note that after this patch, the default suffix of "-numa node,mem=N"
>>> will no longer be "M". So we must add the suffix "M", as in "-numa node,mem=NM",
>>> when assigning "N MB" of node memory size.
>>
>> Such an incompatible change is not acceptable, as it would break
>> existing configurations. libvirt doesn't specify any suffix and expects
>> it to always mean "MB".
> 
> Newer libvirt can be taught to append 'M' when it detects it is talking
> to newer qemu.  While you have a point that it is annoying to force
> users to upgrade to a newer libvirt merely because they upgraded qemu,
> the libvirt point of view is that the following are supported:
> 
> old libvirt -> old qemu
> new libvirt -> old qemu
> new libvirt -> new qemu
> 
> but that this combination is always best effort and not required to work:
> 
> old libvirt -> new qemu

I don't think this is the case, unless you're talking of *very* old
libvirt (e.g. pre-QMP).

Paolo


* Re: [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node
  2013-07-08 18:25   ` Luiz Capitulino
  2013-07-08 18:34     ` Luiz Capitulino
@ 2013-07-15 11:18     ` Wanlong Gao
  1 sibling, 0 replies; 38+ messages in thread
From: Wanlong Gao @ 2013-07-15 11:18 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: aliguori, ehabkost, qemu-devel, bsd, y-goto, pbonzini, afaerber,
	Wanlong Gao

On 07/09/2013 02:25 AM, Luiz Capitulino wrote:
>> +
>> > +# @set-mpol:
>> > +#
>> > +# Set the host memory binding policy for guest NUMA node.
>> > +#
>> > +# @nodeid: The node ID of guest NUMA node to set memory policy to.
>> > +#
>> > +# @mem-policy: The memory policy string to set.
> Shouldn't this be an enum? Also, optional members have a leading '#optional'
> string and if a default value is used it should be documented.

Thank you, will fix in V5.

> 
>> > +#
>> > +# @mem-hostnode: The host node or node range for memory policy.
> It doesn't seem appropriate to use a string here. Maybe we could
> use a list with only two values (like [0,2] for 0-2), or maybe a
> list of nodes if that makes sense (like [0,1,2]).

I wonder, if we don't use a string, how would we support "+" and "!"?

Thanks,
Wanlong Gao

 


* Re: [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-07-14 11:34       ` Paolo Bonzini
@ 2013-07-15 21:33         ` Eric Blake
  2013-07-16  6:24           ` Paolo Bonzini
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2013-07-15 21:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aliguori, Eduardo Habkost, qemu-devel, lcapitulino, bsd, y-goto,
	afaerber, Wanlong Gao


On 07/14/2013 05:34 AM, Paolo Bonzini wrote:

>>> Such an incompatible change is not acceptable, as it would break
>>> existing configurations. libvirt doesn't specify any suffix and expects
>>> it to always mean "MB".
>>
>> Newer libvirt can be taught to append 'M' when it detects it is talking
>> to newer qemu.  While you have a point that it is annoying to force
>> users to upgrade to a newer libvirt merely because they upgraded qemu,
>> the libvirt point of view is that the following are supported:
>>
>> old libvirt -> old qemu
>> new libvirt -> old qemu
>> new libvirt -> new qemu
>>
>> but that this combination is always best effort and not required to work:
>>
>> old libvirt -> new qemu
> 
> I don't think this is the case, unless you're talking of *very* old
> libvirt (e.g. pre-QMP).

As a counter-example, I can recall a case where a qemu release that used
just two digits (was that 1.2?) broke operation under an older libvirt that
assumed versions would always be three digits; it definitely
occurred after 0.15.x, which is the point at which libvirt started
favoring QMP.  That is, we had a case in Fedora where, if you upgraded
qemu, you HAD to also update libvirt to keep your guests running.

But yes, the goal of having command line compatibility, so that any
application using the same command line it always uses will get the same
guest, regardless of a qemu upgrade in the meantime, should be our
default mode of operation, even if newer apps should prefer newer
(better) command line interfaces.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-07-15 21:33         ` Eric Blake
@ 2013-07-16  6:24           ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2013-07-16  6:24 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, Eduardo Habkost, qemu-devel, lcapitulino, bsd, y-goto,
	afaerber, Wanlong Gao


On 15/07/2013 23:33, Eric Blake wrote:
>>>>> Newer libvirt can be taught to append 'M' when it detects
>>>>> it is talking to newer qemu.  While you have a point that
>>>>> it is annoying to force users to upgrade to a newer libvirt
>>>>> merely because they upgraded qemu, the libvirt point of
>>>>> view is that the following are supported:
>>>>> 
>>>>> old libvirt -> old qemu
>>>>> new libvirt -> old qemu
>>>>> new libvirt -> new qemu
>>>>> 
>>>>> but that this combination is always best effort and not
>>>>> required to work:
>>>>> 
>>>>> old libvirt -> new qemu
>>> 
>>> I don't think this is the case, unless you're talking of *very*
>>> old libvirt (e.g. pre-QMP).
> As a counter-example, I can recall a case where a qemu release that
> used just two digits (was that 1.2?) broke operation under older
> libvirt that assumed versions would always be three digits; but it
> definitely occurred after 0.15.x which is the point at which
> libvirt started favoring QMP.  That is, we had a case in Fedora
> where if you upgraded qemu, you HAD to also update libvirt to be
> able to keep your guests running.

Right, I remember that now.  So, better: "we have some interfaces
which are considered API, and old libvirt -> new QEMU should not break
for things that use those interfaces".  QMP and the command line are
definitely among them.

The case you mentioned was about -help, if I remember correctly, which
was indeed quite brittle (like HMP).

> But yes, the goal of having command line compatibility, so that
> any application using the same command line it always uses will get
> the same guest, regardless of a qemu upgrade in the meantime,
> should be our default mode of operation, even if newer apps should
> prefer newer (better) command line interfaces.

Yes, the command line *is* part of the API.

Paolo

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2013-07-16  6:24 UTC | newest]

Thread overview: 38+ messages
2013-07-04  9:53 [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option Wanlong Gao
2013-07-05 18:41   ` Eduardo Habkost
2013-07-08 19:02     ` Eric Blake
2013-07-08 19:25       ` Eduardo Habkost
2013-07-08 19:25       ` Anthony Liguori
2013-07-09  3:28         ` Wanlong Gao
2013-07-09  3:34           ` Eric Blake
2013-07-14 11:34       ` Paolo Bonzini
2013-07-15 21:33         ` Eric Blake
2013-07-16  6:24           ` Paolo Bonzini
2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 02/10] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
2013-07-05 19:32   ` Eduardo Habkost
2013-07-05 20:09     ` Andreas Färber
2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 03/10] NUMA: Add Linux libnuma detection Wanlong Gao
2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 04/10] NUMA: parse guest numa nodes memory policy Wanlong Gao
2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 05/10] NUMA: handle Error in cpus, mpol and hostnode parser Wanlong Gao
2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 06/10] NUMA: split out the common range parser Wanlong Gao
2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 07/10] NUMA: set guest numa nodes memory policy Wanlong Gao
2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node Wanlong Gao
2013-07-08 18:25   ` Luiz Capitulino
2013-07-08 18:34     ` Luiz Capitulino
2013-07-08 18:50       ` Andreas Färber
2013-07-08 19:03         ` Luiz Capitulino
2013-07-15 11:18     ` Wanlong Gao
2013-07-08 19:16   ` Eric Blake
2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 09/10] NUMA: add hmp command set-mpol Wanlong Gao
2013-07-08 18:32   ` Luiz Capitulino
2013-07-04  9:53 ` [Qemu-devel] [PATCH V4 10/10] NUMA: show host memory policy info in info numa command Wanlong Gao
2013-07-05 18:49   ` Eduardo Habkost
2013-07-08 18:36   ` Luiz Capitulino
2013-07-04 19:49 ` [Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
2013-07-04 21:15   ` Laszlo Ersek
2013-07-05  0:55     ` Wanlong Gao
2013-07-05  0:54   ` Wanlong Gao
2013-07-05 19:18 ` Eduardo Habkost
2013-07-11 10:32 ` Peter Huang(Peng)
2013-07-11 13:10   ` Eduardo Habkost
