* [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes
@ 2013-06-24  7:11 Wanlong Gao
  2013-06-24  7:11 ` [Qemu-devel] [PATCH V3 01/10] NUMA: Support multiple CPU ranges on -numa option Wanlong Gao
                   ` (10 more replies)
  0 siblings, 11 replies; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

As you know, QEMU currently cannot direct its memory allocation, which may
cause a performance regression from cross-node memory access in the guest.
Worse, if PCI passthrough is used, the direct-attached device uses DMA
transfers between the device and the QEMU process, so all pages of the
guest are pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    =>kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
           => kvm_pin_pages()

So, with a direct-attached device, the page count of every guest page is
raised by one and page migration will not work. AutoNUMA will not work either.

Therefore, we should set the memory allocation policy of the guest nodes
before the pages are actually mapped.
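
As a rough illustration of why the ordering matters, the sketch below
reserves a region, applies a NUMA policy to it, and only then touches the
pages. It assumes a Linux host and calls mbind(2) through a raw syscall.
The names node_range_mask, set_region_policy and alloc_node_ram are
hypothetical and are not taken from this series, which may apply the
policy differently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/syscall.h>
#endif

/* Numeric values of MPOL_PREFERRED, MPOL_BIND and MPOL_INTERLEAVE as
 * defined in <linux/mempolicy.h>; redefined here (hypothetical names)
 * so the sketch does not depend on kernel headers. */
enum {
    SKETCH_MPOL_PREFERRED  = 1,
    SKETCH_MPOL_BIND       = 2,
    SKETCH_MPOL_INTERLEAVE = 3
};

/* Build a mask with bits first..last set (host node numbers). */
static unsigned long node_range_mask(unsigned first, unsigned last)
{
    unsigned long mask = 0;

    for (unsigned n = first; n <= last && n < 8 * sizeof(mask); n++) {
        mask |= 1UL << n;
    }
    return mask;
}

/* Apply a policy to [addr, addr + len); returns 0 or a negative errno. */
static int set_region_policy(void *addr, size_t len, int mode,
                             unsigned long nodemask)
{
#if defined(__linux__) && defined(SYS_mbind)
    /* maxnode is passed as the number of mask bits plus one. */
    long r = syscall(SYS_mbind, addr, (unsigned long)len, (long)mode,
                     &nodemask, (unsigned long)(8 * sizeof(nodemask) + 1),
                     0L);
    return r == 0 ? 0 : -errno;
#else
    (void)addr; (void)len; (void)mode; (void)nodemask;
    return -ENOSYS;
#endif
}

/* Reserve memory, set its policy, and only then fault the pages in.
 * The result of the policy call is reported through *policy_err. */
static void *alloc_node_ram(size_t len, int mode, unsigned long nodemask,
                            int *policy_err)
{
    void *p;

    assert(len > 0 && policy_err != NULL);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    *policy_err = set_region_policy(p, len, mode, nodemask);
    memset(p, 0, len);   /* first touch: pages are allocated here */
    return p;
}
```

For example, alloc_node_ram(len, SKETCH_MPOL_BIND, node_range_mask(0, 1), &err)
would ask the kernel to place the region on host nodes 0 and 1. The policy
call can fail (for instance on a kernel without NUMA support or in a
restricted container), so err should be checked by the caller.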

With this patch set, we can set the memory policy of the guest nodes
as follows:

 -numa node,nodeid=0,mem=1024M,cpus=0,mem-policy=membind,mem-hostnode=0-1
 -numa node,nodeid=1,mem=1024M,cpus=1,mem-policy=interleave,mem-hostnode=1

The supported format is "mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N}".
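
To make the mem-hostnode grammar concrete, here is a minimal standalone
sketch of a parser for "[+|!]{all|N|N-M}". It is not the parser from patch
04/10: it assumes at most 64 host nodes held in one 64-bit word, accepts
only one of the two prefixes, treats "!all" as the empty set, and rejects
out-of-range values instead of clamping them. The struct and function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_HOST_NODES 64   /* assumption: one 64-bit word */

struct hostnode_spec {
    uint64_t nodes;    /* bit n set => host node n is selected */
    bool relative;     /* '+' prefix seen (cf. NODE_HOST_RELATIVE) */
};

/* Parse "[+|!]{all|N|N-M}"; returns 0 on success, -1 if malformed. */
static int parse_hostnode_spec(const char *s, struct hostnode_spec *out)
{
    bool negate = false;
    char *end;
    unsigned long long first, last;
    uint64_t width, range;

    assert(s != NULL && out != NULL);
    out->nodes = 0;
    out->relative = false;

    if (*s == '+') {
        out->relative = true;
        s++;
    } else if (*s == '!') {
        negate = true;
        s++;
    }

    if (strcmp(s, "all") == 0) {
        out->nodes = negate ? 0 : UINT64_MAX;
        return 0;
    }

    /* strtoull() would accept signs and whitespace, so check for a digit. */
    if (*s < '0' || *s > '9') {
        return -1;
    }
    first = strtoull(s, &end, 10);
    if (*end == '-') {
        s = end + 1;
        if (*s < '0' || *s > '9') {
            return -1;
        }
        last = strtoull(s, &end, 10);
    } else {
        last = first;
    }
    if (*end != '\0' || last < first || last >= SKETCH_MAX_HOST_NODES) {
        return -1;
    }

    /* Set bits first..last, taking care not to shift by 64. */
    width = last - first + 1;
    range = (width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1) << first;
    out->nodes = negate ? ~range : range;
    return 0;
}
```

With this sketch, "0-1" selects nodes 0 and 1, "!1" selects every node
except node 1, and "+2" selects node 2 with the relative flag set.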

Patch 8/10 adds a QMP command "set-mpol" to set the memory policy of a
guest node:
    set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1

Patch 9/10 adds a monitor command "set-mpol" with the following format:
    set-mpol 0 mem-policy=membind,mem-hostnode=0-1

With patch 10/10, the current memory policy of each guest node can be
queried with the monitor command "info numa", for example:

    (qemu) info numa
    2 nodes
    node 0 cpus: 0
    node 0 size: 1024 MB
    node 0 mempolicy: membind=0,1
    node 1 cpus: 1
    node 1 size: 1024 MB
    node 1 mempolicy: interleave=1
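
The "mempolicy" lines above list the selected host nodes as a
comma-separated set. A minimal sketch of how such a string could be
produced from a 64-bit node mask follows. format_mempolicy is a
hypothetical helper, not the code from patch 10/10, and it assumes the
mask fits in one 64-bit word.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write "<policy>=<n>,<n>,..." into buf. Returns the number of
 * characters written, or -1 if the result does not fit in buf. */
static int format_mempolicy(char *buf, size_t size, const char *policy,
                            uint64_t nodes)
{
    const char *sep = "";
    int len;

    assert(buf != NULL && policy != NULL);
    len = snprintf(buf, size, "%s=", policy);
    if (len < 0 || (size_t)len >= size) {
        return -1;
    }
    for (unsigned n = 0; n < 64; n++) {
        int w;

        if (!(nodes & (UINT64_C(1) << n))) {
            continue;
        }
        w = snprintf(buf + len, size - len, "%s%u", sep, n);
        if (w < 0 || (size_t)w >= size - len) {
            return -1;
        }
        len += w;
        sep = ",";
    }
    return len;
}
```

Called with the policy name "membind" and the mask 0x3, this writes
"membind=0,1", which matches the node 0 line in the example output.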


V1->V2:
    use QemuOpts for the numa options (Paolo)
    handle Error in the mpol parser (Paolo)
    change the QMP command format to the form mem-policy=membind,mem-hostnode=0-1 (Paolo)
V2->V3:
    also handle Error in the cpus parser (5/10)
    split out the common parser from the cpus and hostnode parsers (Bandan 6/10)


Bandan Das (1):
  NUMA: Support multiple CPU ranges on -numa option

Wanlong Gao (9):
  NUMA: Add numa_info structure to contain numa nodes info
  NUMA: Add Linux libnuma detection
  NUMA: parse guest numa nodes memory policy
  NUMA: handle Error in cpus, mpol and hostnode parser
  NUMA: split out the common range parser
  NUMA: set guest numa nodes memory policy
  NUMA: add qmp command set-mpol to set memory policy for NUMA node
  NUMA: add hmp command set-mpol
  NUMA: show host memory policy info in info numa command

 configure               |  32 ++++++
 cpus.c                  | 143 +++++++++++++++++++++++-
 hmp-commands.hx         |  16 +++
 hmp.c                   |  35 ++++++
 hmp.h                   |   1 +
 hw/i386/pc.c            |   4 +-
 hw/net/eepro100.c       |   1 -
 include/sysemu/sysemu.h |  20 +++-
 monitor.c               |  44 +++++++-
 qapi-schema.json        |  15 +++
 qemu-options.hx         |   3 +-
 qmp-commands.hx         |  35 ++++++
 vl.c                    | 285 +++++++++++++++++++++++++++++++++++-------------
 13 files changed, 553 insertions(+), 81 deletions(-)

-- 
1.8.3.1.448.gfb7dfaa


* [Qemu-devel] [PATCH V3 01/10] NUMA: Support multiple CPU ranges on -numa option
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
@ 2013-06-24  7:11 ` Wanlong Gao
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 02/10] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

From: Bandan Das <bsd@redhat.com>

This allows the "cpus" property to be given multiple times, so that
multiple CPUs or CPU ranges can be specified for the -numa option:

-numa node,cpus=1,cpus=2,cpus=4
or
-numa node,cpus=1-3,cpus=5

Note that after this patch, the default suffix of "-numa node,mem=N"
is no longer "M". The suffix "M" must therefore be given explicitly, as in
"-numa node,mem=NM", when assigning "N MB" of memory to a node.
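
To illustrate the effect of the suffix change, the sketch below parses a
size string in which a bare number is taken as bytes. It is a simplified
stand-in, not QEMU's own size parser: it only accepts decimal integers
with an optional K, M or G suffix, and parse_size_bytes is a hypothetical
name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "<number>[K|M|G]" into bytes; a bare number means bytes.
 * Returns 0 on success, -1 on a malformed or overflowing value. */
static int parse_size_bytes(const char *s, uint64_t *out)
{
    char *end;
    unsigned shift = 0;
    unsigned long long v;

    assert(s != NULL && out != NULL);
    if (*s < '0' || *s > '9') {
        return -1;
    }
    v = strtoull(s, &end, 10);
    switch (*end) {
    case '\0':
        break;
    case 'K': case 'k':
        shift = 10; end++;
        break;
    case 'M': case 'm':
        shift = 20; end++;
        break;
    case 'G': case 'g':
        shift = 30; end++;
        break;
    default:
        return -1;
    }
    if (*end != '\0' || v > (UINT64_MAX >> shift)) {
        return -1;
    }
    *out = (uint64_t)v << shift;
    return 0;
}
```

Under this reading, "1024M" yields 1073741824 bytes (1024 MB), while a
bare "1024" yields only 1024 bytes, which rounds down to 0 MB.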

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 qemu-options.hx |   3 +-
 vl.c            | 108 ++++++++++++++++++++++++++++++++++----------------------
 2 files changed, 67 insertions(+), 44 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 8355f9b..767e601 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -100,7 +100,8 @@ STEXI
 @item -numa @var{opts}
 @findex -numa
 Simulate a multi node NUMA system. If mem and cpus are omitted, resources
-are split equally.
+are split equally. The "cpus" property may be specified multiple times
+to denote multiple cpus or cpu ranges.
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
diff --git a/vl.c b/vl.c
index 767e020..a1e5ce9 100644
--- a/vl.c
+++ b/vl.c
@@ -516,6 +516,32 @@ static QemuOptsList qemu_realtime_opts = {
     },
 };
 
+static QemuOptsList qemu_numa_opts = {
+    .name = "numa",
+    .implied_opt_name = "type",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head),
+    .desc = {
+        {
+            .name = "type",
+            .type = QEMU_OPT_STRING,
+            .help = "node type"
+        },{
+            .name = "nodeid",
+            .type = QEMU_OPT_NUMBER,
+            .help = "node ID"
+        },{
+            .name = "mem",
+            .type = QEMU_OPT_SIZE,
+            .help = "memory size"
+        },{
+            .name = "cpus",
+            .type = QEMU_OPT_STRING,
+            .help = "cpu number or range"
+        },
+        { /* end of list */ }
+    },
+};
+
 const char *qemu_get_vm_name(void)
 {
     return qemu_name;
@@ -1349,56 +1375,37 @@ error:
     exit(1);
 }
 
-static void numa_add(const char *optarg)
+
+static int numa_add_cpus(const char *name, const char *value, void *opaque)
 {
-    char option[128];
-    char *endptr;
-    unsigned long long nodenr;
+    int *nodenr = opaque;
 
-    optarg = get_opt_name(option, 128, optarg, ',');
-    if (*optarg == ',') {
-        optarg++;
+    if (!strcmp(name, "cpu")) {
+        numa_node_parse_cpus(*nodenr, value);
     }
-    if (!strcmp(option, "node")) {
-
-        if (nb_numa_nodes >= MAX_NODES) {
-            fprintf(stderr, "qemu: too many NUMA nodes\n");
-            exit(1);
-        }
+    return 0;
+}
 
-        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
-            nodenr = nb_numa_nodes;
-        } else {
-            if (parse_uint_full(option, &nodenr, 10) < 0) {
-                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
-                exit(1);
-            }
-        }
+static int numa_init_func(QemuOpts *opts, void *opaque)
+{
+    uint64_t nodenr, mem_size;
 
-        if (nodenr >= MAX_NODES) {
-            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
-            exit(1);
-        }
+    nodenr = qemu_opt_get_number(opts, "nodeid", nb_numa_nodes++);
 
-        if (get_param_value(option, 128, "mem", optarg) == 0) {
-            node_mem[nodenr] = 0;
-        } else {
-            int64_t sval;
-            sval = strtosz(option, &endptr);
-            if (sval < 0 || *endptr) {
-                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
-                exit(1);
-            }
-            node_mem[nodenr] = sval;
-        }
-        if (get_param_value(option, 128, "cpus", optarg) != 0) {
-            numa_node_parse_cpus(nodenr, option);
-        }
-        nb_numa_nodes++;
-    } else {
-        fprintf(stderr, "Invalid -numa option: %s\n", option);
+    if (nodenr >= MAX_NODES) {
+        fprintf(stderr, "qemu: Max number of NUMA nodes reached : %d\n",
+                (int)nodenr);
         exit(1);
     }
+
+    mem_size = qemu_opt_get_size(opts, "mem", 0);
+    node_mem[nodenr] = mem_size;
+
+    if (qemu_opt_foreach(opts, numa_add_cpus, &nodenr, 1) < 0) {
+        return -1;
+    }
+
+    return 0;
 }
 
 static void smp_parse(const char *optarg)
@@ -2901,6 +2908,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_object_opts);
     qemu_add_opts(&qemu_tpmdev_opts);
     qemu_add_opts(&qemu_realtime_opts);
+    qemu_add_opts(&qemu_numa_opts);
 
     runstate_init();
 
@@ -3087,7 +3095,16 @@ int main(int argc, char **argv, char **envp)
                 }
                 break;
             case QEMU_OPTION_numa:
-                numa_add(optarg);
+                olist = qemu_find_opts("numa");
+                opts = qemu_opts_parse(olist, optarg, 1);
+                if (!opts) {
+                    exit(1);
+                }
+                optarg = qemu_opt_get(opts, "type");
+                if (!optarg || strcmp(optarg, "node")) {
+                    fprintf(stderr, "qemu: Incorrect format for numa option\n");
+                    exit(1);
+                }
                 break;
             case QEMU_OPTION_display:
                 display_type = select_display(optarg);
@@ -4180,6 +4197,11 @@ int main(int argc, char **argv, char **envp)
 
     register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, NULL);
 
+    if (qemu_opts_foreach(qemu_find_opts("numa"), numa_init_func,
+                          NULL, 1) != 0) {
+        exit(1);
+    }
+
     if (nb_numa_nodes > 0) {
         int i;
 
-- 
1.8.3.1.448.gfb7dfaa


* [Qemu-devel] [PATCH V3 02/10] NUMA: Add numa_info structure to contain numa nodes info
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  2013-06-24  7:11 ` [Qemu-devel] [PATCH V3 01/10] NUMA: Support multiple CPU ranges on -numa option Wanlong Gao
@ 2013-06-24  7:12 ` Wanlong Gao
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 03/10] NUMA: Add Linux libnuma detection Wanlong Gao
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

Add the numa_info structure to hold the memory size and VCPU
information of each NUMA node, as well as the host memory policies
that are added later in this series.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 cpus.c                  |  2 +-
 hw/i386/pc.c            |  4 ++--
 hw/net/eepro100.c       |  1 -
 include/sysemu/sysemu.h |  8 ++++++--
 monitor.c               |  2 +-
 vl.c                    | 24 ++++++++++++------------
 6 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/cpus.c b/cpus.c
index c8bc8ad..e123d3f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1195,7 +1195,7 @@ void set_numa_modes(void)
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
         cpu = ENV_GET_CPU(env);
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
                 cpu->numa_node = i;
             }
         }
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5e8f143..32d039a 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -650,14 +650,14 @@ static FWCfgState *bochs_bios_init(void)
         unsigned int apic_id = x86_cpu_apic_id_from_index(i);
         assert(apic_id < apic_id_limit);
         for (j = 0; j < nb_numa_nodes; j++) {
-            if (test_bit(i, node_cpumask[j])) {
+            if (test_bit(i, numa_info[j].node_cpu)) {
                 numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
                 break;
             }
         }
     }
     for (i = 0; i < nb_numa_nodes; i++) {
-        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
+        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(numa_info[i].node_mem);
     }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
                      (1 + apic_id_limit + nb_numa_nodes) *
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index dc99ea6..478c688 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -105,7 +105,6 @@
 #define PCI_IO_SIZE             64
 #define PCI_FLASH_SIZE          (128 * KiB)
 
-#define BIT(n) (1 << (n))
 #define BITS(n, m) (((0xffffffffU << (31 - n)) >> (31 - n + m)) << m)
 
 /* The SCB accepts the following controls for the Tx and Rx units: */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 2fb71af..70fd2ed 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -9,6 +9,7 @@
 #include "qapi-types.h"
 #include "qemu/notify.h"
 #include "qemu/main-loop.h"
+#include "qemu/bitmap.h"
 
 /* vl.c */
 
@@ -130,8 +131,11 @@ extern QEMUClock *rtc_clock;
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
 extern int nb_numa_nodes;
-extern uint64_t node_mem[MAX_NODES];
-extern unsigned long *node_cpumask[MAX_NODES];
+struct node_info {
+    uint64_t node_mem;
+    DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+};
+extern struct node_info numa_info[MAX_NODES];
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/monitor.c b/monitor.c
index 70ae8f5..61dbebb 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1819,7 +1819,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
         }
         monitor_printf(mon, "\n");
         monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-            node_mem[i] >> 20);
+            numa_info[i].node_mem >> 20);
     }
 }
 
diff --git a/vl.c b/vl.c
index a1e5ce9..357137b 100644
--- a/vl.c
+++ b/vl.c
@@ -250,8 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
     QTAILQ_HEAD_INITIALIZER(fw_boot_order);
 
 int nb_numa_nodes;
-uint64_t node_mem[MAX_NODES];
-unsigned long *node_cpumask[MAX_NODES];
+struct node_info numa_info[MAX_NODES];
 
 uint8_t qemu_uuid[16];
 
@@ -1367,7 +1366,7 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus)
         goto error;
     }
 
-    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
+    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
     return;
 
 error:
@@ -1399,7 +1398,7 @@ static int numa_init_func(QemuOpts *opts, void *opaque)
     }
 
     mem_size = qemu_opt_get_size(opts, "mem", 0);
-    node_mem[nodenr] = mem_size;
+    numa_info[nodenr].node_mem = mem_size;
 
     if (qemu_opt_foreach(opts, numa_add_cpus, &nodenr, 1) < 0) {
         return -1;
@@ -2929,8 +2928,8 @@ int main(int argc, char **argv, char **envp)
     translation = BIOS_ATA_TRANSLATION_AUTO;
 
     for (i = 0; i < MAX_NODES; i++) {
-        node_mem[i] = 0;
-        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
+        numa_info[i].node_mem = 0;
+        bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
     }
 
     nb_numa_nodes = 0;
@@ -4213,7 +4212,7 @@ int main(int argc, char **argv, char **envp)
          * and distribute the available memory equally across all nodes
          */
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0)
+            if (numa_info[i].node_mem != 0)
                 break;
         }
         if (i == nb_numa_nodes) {
@@ -4223,14 +4222,15 @@ int main(int argc, char **argv, char **envp)
              * the final node gets the rest.
              */
             for (i = 0; i < nb_numa_nodes - 1; i++) {
-                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
-                usedmem += node_mem[i];
+                numa_info[i].node_mem = (ram_size / nb_numa_nodes) &
+                                        ~((1 << 23UL) - 1);
+                usedmem += numa_info[i].node_mem;
             }
-            node_mem[i] = ram_size - usedmem;
+            numa_info[i].node_mem = ram_size - usedmem;
         }
 
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
+            if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
                 break;
             }
         }
@@ -4240,7 +4240,7 @@ int main(int argc, char **argv, char **envp)
          */
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                set_bit(i, node_cpumask[i % nb_numa_nodes]);
+                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
             }
         }
     }
-- 
1.8.3.1.448.gfb7dfaa


* [Qemu-devel] [PATCH V3 03/10] NUMA: Add Linux libnuma detection
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  2013-06-24  7:11 ` [Qemu-devel] [PATCH V3 01/10] NUMA: Support multiple CPU ranges on -numa option Wanlong Gao
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 02/10] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
@ 2013-06-24  7:12 ` Wanlong Gao
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 04/10] NUMA: parse guest numa nodes memory policy Wanlong Gao
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

Add detection of libnuma (mostly contained in the numactl package)
to the configure script. It can be enabled or disabled on the command
line; the default is to use it if available.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 configure | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/configure b/configure
index ad32f87..2d2b177 100755
--- a/configure
+++ b/configure
@@ -242,6 +242,7 @@ gtk=""
 gtkabi="2.0"
 tpm="no"
 libssh2=""
+numa=""
 
 # parse CC options first
 for opt do
@@ -944,6 +945,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --disable-numa) numa="no"
+  ;;
+  --enable-numa) numa="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1158,6 +1163,8 @@ echo "  --gcov=GCOV              use specified gcov [$gcov_tool]"
 echo "  --enable-tpm             enable TPM support"
 echo "  --disable-libssh2        disable ssh block device support"
 echo "  --enable-libssh2         enable ssh block device support"
+echo "  --disable-numa           disable libnuma support"
+echo "  --enable-numa            enable libnuma support"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2389,6 +2396,27 @@ EOF
 fi
 
 ##########################################
+# libnuma probe
+
+if test "$numa" != "no" ; then
+  numa=no
+  cat > $TMPC << EOF
+#include <numa.h>
+int main(void) { return numa_available(); }
+EOF
+
+  if compile_prog "" "-lnuma" ; then
+    numa=yes
+    libs_softmmu="-lnuma $libs_softmmu"
+  else
+    if test "$numa" = "yes" ; then
+      feature_not_found "linux NUMA (install numactl?)"
+    fi
+    numa=no
+  fi
+fi
+
+##########################################
 # linux-aio probe
 
 if test "$linux_aio" != "no" ; then
@@ -3556,6 +3584,7 @@ echo "TPM support       $tpm"
 echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging     $qom_cast_debug"
+echo "NUMA host support $numa"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3589,6 +3618,9 @@ echo "extra_cflags=$EXTRA_CFLAGS" >> $config_host_mak
 echo "extra_ldflags=$EXTRA_LDFLAGS" >> $config_host_mak
 echo "qemu_localedir=$qemu_localedir" >> $config_host_mak
 echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
+if test "$numa" = "yes"; then
+  echo "CONFIG_NUMA=y" >> $config_host_mak
+fi
 
 echo "ARCH=$ARCH" >> $config_host_mak
 
-- 
1.8.3.1.448.gfb7dfaa


* [Qemu-devel] [PATCH V3 04/10] NUMA: parse guest numa nodes memory policy
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (2 preceding siblings ...)
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 03/10] NUMA: Add Linux libnuma detection Wanlong Gao
@ 2013-06-24  7:12 ` Wanlong Gao
  2013-06-24 19:09   ` Bandan Das
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 05/10] NUMA: handle Error in cpus, mpol and hostnode parser Wanlong Gao
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

The memory policy setting has the following format:
mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N}
This setting is added as a suboption of "-numa", so the memory
policy can be set as follows:
 -numa node,nodeid=0,mem=1024M,cpus=0,mem-policy=membind,mem-hostnode=0-1
 -numa node,nodeid=1,mem=1024M,cpus=1,mem-policy=interleave,mem-hostnode=!1

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |   8 ++++
 vl.c                    | 110 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 118 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 70fd2ed..993b8e0 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -130,10 +130,18 @@ extern QEMUClock *rtc_clock;
 
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
+#define NODE_HOST_NONE        0x00
+#define NODE_HOST_BIND        0x01
+#define NODE_HOST_INTERLEAVE  0x02
+#define NODE_HOST_PREFERRED   0x03
+#define NODE_HOST_POLICY_MASK 0x03
+#define NODE_HOST_RELATIVE    0x04
 extern int nb_numa_nodes;
 struct node_info {
     uint64_t node_mem;
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+    unsigned int flags;
 };
 extern struct node_info numa_info[MAX_NODES];
 
diff --git a/vl.c b/vl.c
index 357137b..4dbf5cc 100644
--- a/vl.c
+++ b/vl.c
@@ -536,6 +536,14 @@ static QemuOptsList qemu_numa_opts = {
             .name = "cpus",
             .type = QEMU_OPT_STRING,
             .help = "cpu number or range"
+        },{
+            .name = "mem-policy",
+            .type = QEMU_OPT_STRING,
+            .help = "memory policy"
+        },{
+            .name = "mem-hostnode",
+            .type = QEMU_OPT_STRING,
+            .help = "host node number or range for memory policy"
         },
         { /* end of list */ }
     },
@@ -1374,6 +1382,79 @@ error:
     exit(1);
 }
 
+static void numa_node_parse_mpol(int nodenr, const char *mpol)
+{
+    if (!mpol) {
+        return;
+    }
+
+    if (!strcmp(mpol, "interleave")) {
+        numa_info[nodenr].flags |= NODE_HOST_INTERLEAVE;
+    } else if (!strcmp(mpol, "preferred")) {
+        numa_info[nodenr].flags |= NODE_HOST_PREFERRED;
+    } else if (!strcmp(mpol, "membind")) {
+        numa_info[nodenr].flags |= NODE_HOST_BIND;
+    } else {
+        fprintf(stderr, "qemu: Invalid memory policy: %s\n", mpol);
+    }
+}
+
+static void numa_node_parse_hostnode(int nodenr, const char *hostnode)
+{
+    unsigned long long value, endvalue;
+    char *endptr;
+    bool clear = false;
+    unsigned long *bm = numa_info[nodenr].host_mem;
+
+    if (hostnode[0] == '!') {
+        clear = true;
+        bitmap_fill(bm, MAX_CPUMASK_BITS);
+        hostnode++;
+    }
+    if (hostnode[0] == '+') {
+        numa_info[nodenr].flags |= NODE_HOST_RELATIVE;
+        hostnode++;
+    }
+
+    if (!strcmp(hostnode, "all")) {
+        bitmap_fill(bm, MAX_CPUMASK_BITS);
+        return;
+    }
+
+    if (parse_uint(hostnode, &value, &endptr, 10) < 0)
+        goto error;
+    if (*endptr == '-') {
+        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
+            goto error;
+        }
+    } else if (*endptr == '\0') {
+        endvalue = value;
+    } else {
+        goto error;
+    }
+
+    if (endvalue >= MAX_CPUMASK_BITS) {
+        endvalue = MAX_CPUMASK_BITS - 1;
+        fprintf(stderr,
+            "qemu: NUMA: A max of %d host nodes are supported\n",
+             MAX_CPUMASK_BITS);
+    }
+
+    if (endvalue < value) {
+        goto error;
+    }
+
+    if (clear)
+        bitmap_clear(bm, value, endvalue - value + 1);
+    else
+        bitmap_set(bm, value, endvalue - value + 1);
+
+    return;
+
+error:
+    fprintf(stderr, "qemu: Invalid host NUMA nodes range: %s\n", hostnode);
+    return;
+}
 
 static int numa_add_cpus(const char *name, const char *value, void *opaque)
 {
@@ -1385,6 +1466,25 @@ static int numa_add_cpus(const char *name, const char *value, void *opaque)
     return 0;
 }
 
+static int numa_add_mpol(const char *name, const char *value, void *opaque)
+{
+    int *nodenr = opaque;
+
+    if (!strcmp(name, "mem-policy")) {
+        numa_node_parse_mpol(*nodenr, value);
+    }
+    return 0;
+}
+
+static int numa_add_hostnode(const char *name, const char *value, void *opaque)
+{
+    int *nodenr = opaque;
+    if (!strcmp(name, "mem-hostnode")) {
+        numa_node_parse_hostnode(*nodenr, value);
+    }
+    return 0;
+}
+
 static int numa_init_func(QemuOpts *opts, void *opaque)
 {
     uint64_t nodenr, mem_size;
@@ -1404,6 +1504,14 @@ static int numa_init_func(QemuOpts *opts, void *opaque)
         return -1;
     }
 
+    if (qemu_opt_foreach(opts, numa_add_mpol, &nodenr, 1) < 0) {
+        return -1;
+    }
+
+    if (qemu_opt_foreach(opts, numa_add_hostnode, &nodenr, 1) < 0) {
+        return -1;
+    }
+
     return 0;
 }
 
@@ -2930,6 +3038,8 @@ int main(int argc, char **argv, char **envp)
     for (i = 0; i < MAX_NODES; i++) {
         numa_info[i].node_mem = 0;
         bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
+        bitmap_zero(numa_info[i].host_mem, MAX_CPUMASK_BITS);
+        numa_info[i].flags = NODE_HOST_NONE;
     }
 
     nb_numa_nodes = 0;
-- 
1.8.3.1.448.gfb7dfaa


* [Qemu-devel] [PATCH V3 05/10] NUMA: handle Error in cpus, mpol and hostnode parser
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (3 preceding siblings ...)
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 04/10] NUMA: parse guest numa nodes memory policy Wanlong Gao
@ 2013-06-24  7:12 ` Wanlong Gao
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 06/10] NUMA: split out the common range parser Wanlong Gao
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

As Paolo pointed out, handling Error in the mpol and hostnode parsers
makes them easier to reuse, for example in memory hotplug in the future.
This will also be used later by the set-mpol QMP command.
Also handle Error in the cpus parser to be consistent with the others.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |  4 ++++
 vl.c                    | 42 ++++++++++++++++++++++++++++++++----------
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 993b8e0..0f135fe 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -144,6 +144,10 @@ struct node_info {
     unsigned int flags;
 };
 extern struct node_info numa_info[MAX_NODES];
+extern void numa_node_parse_mpol(int nodenr, const char *hostnode,
+                                 Error **errp);
+extern void numa_node_parse_hostnode(int nodenr, const char *hostnode,
+                                     Error **errp);
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/vl.c b/vl.c
index 4dbf5cc..79c39b9 100644
--- a/vl.c
+++ b/vl.c
@@ -1338,7 +1338,7 @@ char *get_boot_devices_list(size_t *size)
     return list;
 }
 
-static void numa_node_parse_cpus(int nodenr, const char *cpus)
+static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp)
 {
     char *endptr;
     unsigned long long value, endvalue;
@@ -1378,13 +1378,14 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus)
     return;
 
 error:
-    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
-    exit(1);
+    error_setg(errp, "Invalid NUMA CPU range: %s\n", cpus);
+    return;
 }
 
-static void numa_node_parse_mpol(int nodenr, const char *mpol)
+void numa_node_parse_mpol(int nodenr, const char *mpol, Error **errp)
 {
     if (!mpol) {
+        error_setg(errp, "Should specify memory policy");
         return;
     }
 
@@ -1395,11 +1396,11 @@ static void numa_node_parse_mpol(int nodenr, const char *mpol)
     } else if (!strcmp(mpol, "membind")) {
         numa_info[nodenr].flags |= NODE_HOST_BIND;
     } else {
-        fprintf(stderr, "qemu: Invalid memory policy: %s\n", mpol);
+        error_setg(errp, "Invalid memory policy: %s", mpol);
     }
 }
 
-static void numa_node_parse_hostnode(int nodenr, const char *hostnode)
+void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
 {
     unsigned long long value, endvalue;
     char *endptr;
@@ -1452,16 +1453,22 @@ static void numa_node_parse_hostnode(int nodenr, const char *hostnode)
     return;
 
 error:
-    fprintf(stderr, "qemu: Invalid host NUMA nodes range: %s\n", hostnode);
+    error_setg(errp, "Invalid host NUMA nodes range: %s", hostnode);
     return;
 }
 
 static int numa_add_cpus(const char *name, const char *value, void *opaque)
 {
     int *nodenr = opaque;
+    Error *err = NULL;
 
     if (!strcmp(name, "cpu")) {
-        numa_node_parse_cpus(*nodenr, value);
+        numa_node_parse_cpus(*nodenr, value, &err);
+    }
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        return -1;
     }
     return 0;
 }
@@ -1469,19 +1476,34 @@ static int numa_add_cpus(const char *name, const char *value, void *opaque)
 static int numa_add_mpol(const char *name, const char *value, void *opaque)
 {
     int *nodenr = opaque;
+    Error *err = NULL;
 
     if (!strcmp(name, "mem-policy")) {
-        numa_node_parse_mpol(*nodenr, value);
+        numa_node_parse_mpol(*nodenr, value, &err);
+    }
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        return -1;
     }
+
     return 0;
 }
 
 static int numa_add_hostnode(const char *name, const char *value, void *opaque)
 {
     int *nodenr = opaque;
+    Error *err = NULL;
+
     if (!strcmp(name, "mem-hostnode")) {
-        numa_node_parse_hostnode(*nodenr, value);
+        numa_node_parse_hostnode(*nodenr, value, &err);
     }
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        return -1;
+    }
+
     return 0;
 }
 
-- 
1.8.3.1.448.gfb7dfaa


* [Qemu-devel] [PATCH V3 06/10] NUMA: split out the common range parser
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (4 preceding siblings ...)
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 05/10] NUMA: handle Error in cpus, mpol and hostnode parser Wanlong Gao
@ 2013-06-24  7:12 ` Wanlong Gao
  2013-06-24 19:15   ` Bandan Das
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 07/10] NUMA: set guest numa nodes memory policy Wanlong Gao
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

Since the cpus parser and the hostnode parser share the same range-parsing
logic, split that logic out into a common range parser to avoid duplicated
code.
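
For readers who want to see the shared logic in isolation, here is a minimal
standalone sketch of a "N" or "N-M" range parser. It is not the patch's code:
it uses strtoull() instead of QEMU's parse_uint() helpers, and the names
sketch_parse_range and SKETCH_MAX_BITS are hypothetical stand-ins.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical limit standing in for MAX_CPUMASK_BITS. */
#define SKETCH_MAX_BITS 255

/*
 * Parse "N" or "N-M" into [*first, *last].
 * Returns 0 on success, -1 on malformed input or a reversed range.
 */
static int sketch_parse_range(const char *str,
                              unsigned long long *first,
                              unsigned long long *last)
{
    char *end;

    /* Reject empty strings, signs and leading whitespace up front. */
    if (*str < '0' || *str > '9') {
        return -1;
    }
    errno = 0;
    *first = strtoull(str, &end, 10);
    if (errno) {
        return -1;
    }

    if (*end == '-') {
        const char *p = end + 1;

        if (*p < '0' || *p > '9') {
            return -1;
        }
        errno = 0;
        *last = strtoull(p, &end, 10);
        if (errno || *end != '\0') {
            return -1;
        }
    } else if (*end == '\0') {
        /* A single number is a range of length one. */
        *last = *first;
    } else {
        return -1;
    }

    /* Clamp the upper bound first, then reject reversed ranges. */
    if (*last >= SKETCH_MAX_BITS) {
        *last = SKETCH_MAX_BITS - 1;
    }
    if (*last < *first) {
        return -1;
    }
    return 0;
}
```

As in the patch, the upper bound is clamped before the ordering check, so a
start value beyond the limit ends up rejected as a reversed range.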

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 vl.c | 89 ++++++++++++++++++++++++++++----------------------------------------
 1 file changed, 37 insertions(+), 52 deletions(-)

diff --git a/vl.c b/vl.c
index 79c39b9..c9d25bd 100644
--- a/vl.c
+++ b/vl.c
@@ -1338,47 +1338,55 @@ char *get_boot_devices_list(size_t *size)
     return list;
 }
 
-static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp)
+static int numa_node_parse_common(const char *str,
+                                  unsigned long long *value,
+                                  unsigned long long *endvalue)
 {
     char *endptr;
-    unsigned long long value, endvalue;
-
-    /* Empty CPU range strings will be considered valid, they will simply
-     * not set any bit in the CPU bitmap.
-     */
-    if (!*cpus) {
-        return;
+    if (parse_uint(str, value, &endptr, 10) < 0) {
+        return -1;
     }
 
-    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
-        goto error;
-    }
     if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
+        if (parse_uint_full(endptr + 1, endvalue, 10) < 0) {
+            return -1;
         }
     } else if (*endptr == '\0') {
-        endvalue = value;
+        *endvalue = *value;
     } else {
-        goto error;
+        return -1;
     }
 
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d VCPUs are supported\n",
-             MAX_CPUMASK_BITS);
+    if (*endvalue >= MAX_CPUMASK_BITS) {
+        *endvalue = MAX_CPUMASK_BITS - 1;
+        fprintf(stderr, "qemu: NUMA: A max number %d is supported\n",
+                MAX_CPUMASK_BITS);
     }
 
-    if (endvalue < value) {
-        goto error;
+    if (*endvalue < *value) {
+        return -1;
     }
 
-    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
-    return;
+    return 0;
+}
 
-error:
-    error_setg(errp, "Invalid NUMA CPU range: %s\n", cpus);
+static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp)
+{
+    unsigned long long value, endvalue;
+
+    /* Empty CPU range strings will be considered valid, they will simply
+     * not set any bit in the CPU bitmap.
+     */
+    if (!*cpus) {
+        return;
+    }
+
+    if (numa_node_parse_common(cpus, &value, &endvalue) < 0) {
+        error_setg(errp, "Invalid NUMA CPU range: %s", cpus);
+        return;
+    }
+
+    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
     return;
 }
 
@@ -1403,7 +1411,6 @@ void numa_node_parse_mpol(int nodenr, const char *mpol, Error **errp)
 void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
 {
     unsigned long long value, endvalue;
-    char *endptr;
     bool clear = false;
     unsigned long *bm = numa_info[nodenr].host_mem;
 
@@ -1422,27 +1429,9 @@ void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
         return;
     }
 
-    if (parse_uint(hostnode, &value, &endptr, 10) < 0)
-        goto error;
-    if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
-        }
-    } else if (*endptr == '\0') {
-        endvalue = value;
-    } else {
-        goto error;
-    }
-
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d host nodes are supported\n",
-             MAX_CPUMASK_BITS);
-    }
-
-    if (endvalue < value) {
-        goto error;
+    if (numa_node_parse_common(hostnode, &value, &endvalue) < 0) {
+        error_setg(errp, "Invalid host NUMA nodes range: %s", hostnode);
+        return;
     }
 
     if (clear)
@@ -1451,10 +1440,6 @@ void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
         bitmap_set(bm, value, endvalue - value + 1);
 
     return;
-
-error:
-    error_setg(errp, "Invalid host NUMA nodes range: %s", hostnode);
-    return;
 }
 
 static int numa_add_cpus(const char *name, const char *value, void *opaque)
-- 
1.8.3.1.448.gfb7dfaa

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH V3 07/10] NUMA: set guest numa nodes memory policy
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (5 preceding siblings ...)
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 06/10] NUMA: split out the common range parser Wanlong Gao
@ 2013-06-24  7:12 ` Wanlong Gao
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node Wanlong Gao
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

Set the memory policy of each guest NUMA node with the mbind(2)
system call, node by node.
After this patch, guest node memory policies can be set through
QEMU options; this aims to solve the performance issue caused by
guest cross-node memory access.
In addition, if PCI passthrough is used, the directly attached device
performs DMA transfers between the device and the QEMU process, so
all pages of the guest are pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    =>kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
           => kvm_pin_pages()

So, with a directly attached device, the page count of every guest page is
incremented by one and page migration does not work. AutoNUMA does not work
either.

Therefore, the memory allocation policies of the guest nodes should be set
before the pages are actually mapped.
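
The "node by node" binding in this patch assumes that the guest nodes occupy
consecutive slices of one contiguous RAM block, so the start of node N is the
sum of the sizes of all lower-numbered nodes. A minimal sketch of that offset
computation follows; struct sketch_node and sketch_node_offset are
hypothetical names for illustration, not QEMU code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-node size field of numa_info[]. */
struct sketch_node {
    uint64_t node_mem;   /* size of this guest node in bytes */
};

/*
 * Byte offset of guest node `nodeid` inside one contiguous RAM block:
 * the sum of the sizes of all lower-numbered nodes.
 */
static uint64_t sketch_node_offset(const struct sketch_node *nodes,
                                   size_t nodeid)
{
    uint64_t off = 0;

    for (size_t i = 0; i < nodeid; i++) {
        off += nodes[i].node_mem;
    }
    return off;
}
```

The patch passes the RAM base plus this offset, together with the node's own
size, to mbind(2).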

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 cpus.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/cpus.c b/cpus.c
index e123d3f..677ee15 100644
--- a/cpus.c
+++ b/cpus.c
@@ -60,6 +60,15 @@
 
 #endif /* CONFIG_LINUX */
 
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#ifndef MPOL_F_RELATIVE_NODES
+#define MPOL_F_RELATIVE_NODES (1 << 14)
+#define MPOL_F_STATIC_NODES   (1 << 15)
+#endif
+#endif
+
 static CPUArchState *next_cpu;
 
 static bool cpu_thread_is_idle(CPUArchState *env)
@@ -1186,6 +1195,75 @@ static void tcg_exec_all(void)
     exit_request = 0;
 }
 
+#ifdef CONFIG_NUMA
+static int node_parse_bind_mode(unsigned int nodeid)
+{
+    int bind_mode;
+
+    switch (numa_info[nodeid].flags & NODE_HOST_POLICY_MASK) {
+    case NODE_HOST_BIND:
+        bind_mode = MPOL_BIND;
+        break;
+    case NODE_HOST_INTERLEAVE:
+        bind_mode = MPOL_INTERLEAVE;
+        break;
+    case NODE_HOST_PREFERRED:
+        bind_mode = MPOL_PREFERRED;
+        break;
+    default:
+        bind_mode = MPOL_DEFAULT;
+        return bind_mode;
+    }
+
+    bind_mode |= (numa_info[nodeid].flags & NODE_HOST_RELATIVE) ?
+        MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
+
+    return bind_mode;
+}
+#endif
+
+static int set_node_mpol(unsigned int nodeid)
+{
+#ifdef CONFIG_NUMA
+    void *ram_ptr;
+    RAMBlock *block;
+    ram_addr_t len, ram_offset = 0;
+    int bind_mode;
+    int i;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (!strcmp(block->mr->name, "pc.ram")) {
+            break;
+        }
+    }
+
+    if (block == NULL || block->host == NULL)
+        return -1;
+
+    ram_ptr = block->host;
+    for (i = 0; i < nodeid; i++) {
+        len = numa_info[i].node_mem;
+        ram_offset += len;
+    }
+
+    len = numa_info[i].node_mem;
+    bind_mode = node_parse_bind_mode(i);
+
+    /* This is a workaround for a long standing bug in Linux'
+     * mbind implementation, which cuts off the last specified
+     * node. To stay compatible should this bug be fixed, we
+     * specify one more node and zero this one out.
+     */
+    clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
+    if (mbind(ram_ptr + ram_offset, len, bind_mode,
+        numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
+            perror("mbind");
+            return -1;
+    }
+#endif
+    return 0;
+}
+
 void set_numa_modes(void)
 {
     CPUArchState *env;
@@ -1200,6 +1278,15 @@ void set_numa_modes(void)
             }
         }
     }
+
+#ifdef CONFIG_NUMA
+    for (i = 0; i < nb_numa_nodes; i++) {
+        if (set_node_mpol(i) == -1) {
+            fprintf(stderr,
+                    "qemu: can't set host memory policy for node%d\n", i);
+        }
+    }
+#endif
 }
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
-- 
1.8.3.1.448.gfb7dfaa

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH V3 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (6 preceding siblings ...)
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 07/10] NUMA: set guest numa nodes memory policy Wanlong Gao
@ 2013-06-24  7:12 ` Wanlong Gao
  2013-07-04  5:19   ` Andreas Färber
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 09/10] NUMA: add hmp command set-mpol Wanlong Gao
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

This QMP command makes it possible to set a node's memory policy
through the QMP protocol. With qmp-shell, the command looks like:
    set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1
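
Both policy arguments are optional. As the notes in the qmp-commands.hx hunk
of this patch state, an omitted "mem-policy" resets the node to "default",
and an omitted "mem-hostnode" means "all". A minimal sketch of that
defaulting rule, with hypothetical names (sketch_request, sketch_resolve)
that do not exist in QEMU:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: the effective policy and host node string. */
struct sketch_request {
    const char *policy;
    const char *hostnode;   /* NULL when no node mask applies */
};

/* Apply the defaulting rules for the optional set-mpol arguments. */
static struct sketch_request sketch_resolve(bool has_policy,
                                            const char *policy,
                                            bool has_hostnode,
                                            const char *hostnode)
{
    struct sketch_request r;

    if (!has_policy) {
        /* No policy given: fall back to "default", ignore any node mask. */
        r.policy = "default";
        r.hostnode = NULL;
        return r;
    }

    r.policy = policy;
    /* A policy without a host node list applies to all host nodes. */
    r.hostnode = has_hostnode ? hostnode : "all";
    return r;
}
```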

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 cpus.c           | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 15 +++++++++++++++
 qmp-commands.hx  | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 104 insertions(+)

diff --git a/cpus.c b/cpus.c
index 677ee15..9c2706c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1432,3 +1432,57 @@ void qmp_inject_nmi(Error **errp)
     error_set(errp, QERR_UNSUPPORTED);
 #endif
 }
+
+void qmp_set_mpol(int64_t nodeid, bool has_mpol, const char *mpol,
+                  bool has_hostnode, const char *hostnode, Error **errp)
+{
+    unsigned int flags;
+    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+
+    if (nodeid >= nb_numa_nodes) {
+        error_setg(errp, "Only has '%d' NUMA nodes", nb_numa_nodes);
+        return;
+    }
+
+    bitmap_copy(host_mem, numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+    flags = numa_info[nodeid].flags;
+
+    numa_info[nodeid].flags = NODE_HOST_NONE;
+    bitmap_zero(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+
+    if (!has_mpol) {
+        if (set_node_mpol(nodeid) == -1) {
+            error_setg(errp, "Failed to set memory policy for node%" PRId64, nodeid);
+            goto error;
+        }
+        return;
+    }
+
+    numa_node_parse_mpol(nodeid, mpol, errp);
+    if (error_is_set(errp)) {
+        goto error;
+    }
+
+    if (!has_hostnode) {
+        bitmap_fill(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+    }
+
+    if (hostnode) {
+        numa_node_parse_hostnode(nodeid, hostnode, errp);
+        if (error_is_set(errp)) {
+            goto error;
+        }
+    }
+
+    if (set_node_mpol(nodeid) == -1) {
+        error_setg(errp, "Failed to set memory policy for node%" PRId64, nodeid);
+        goto error;
+    }
+
+    return;
+
+error:
+    bitmap_copy(numa_info[nodeid].host_mem, host_mem, MAX_CPUMASK_BITS);
+    numa_info[nodeid].flags = flags;
+    return;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index a80ee40..cedcbe1 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3608,3 +3608,18 @@
             '*cpuid-input-ecx': 'int',
             'cpuid-register': 'X86CPURegister32',
             'features': 'int' } }
+
+# @set-mpol:
+#
+# Set the host memory binding policy for guest NUMA node.
+#
+# @nodeid: The node ID of guest NUMA node to set memory policy to.
+#
+# @mem-policy: The memory policy string to set.
+#
+# @mem-hostnode: The host node or node range for memory policy.
+#
+# Since: 1.6.0
+##
+{ 'command': 'set-mpol', 'data': {'nodeid': 'int', '*mem-policy': 'str',
+                                  '*mem-hostnode': 'str'} }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 8cea5e5..7bb5038 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2997,3 +2997,38 @@ Example:
 <- { "return": {} }
 
 EQMP
+
+    {
+        .name      = "set-mpol",
+        .args_type = "nodeid:i,mem-policy:s?,mem-hostnode:s?",
+        .help      = "Set the host memory binding policy for guest NUMA node",
+        .mhandler.cmd_new = qmp_marshal_input_set_mpol,
+    },
+
+SQMP
+set-mpol
+--------
+
+Set the host memory binding policy for guest NUMA node
+
+Arguments:
+
+- "nodeid": The nodeid of guest NUMA node to set memory policy to.
+            (json-int)
+- "mem-policy": The memory policy string to set.
+                (json-string, optional)
+- "mem-hostnode": The host node or node range the memory policy applies to.
+                  (json-string, optional)
+
+Example:
+
+-> { "execute": "set-mpol", "arguments": { "nodeid": 0, "mem-policy": "membind",
+                                           "mem-hostnode": "0-1" }}
+<- { "return": {} }
+
+Notes:
+    1. If "mem-policy" is not set, the memory policy of this "nodeid" will be set
+       to "default".
+    2. If "mem-hostnode" is not set, the node mask of this "mpol" will be set
+       to "all".
+EQMP
-- 
1.8.3.1.448.gfb7dfaa

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH V3 09/10] NUMA: add hmp command set-mpol
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (7 preceding siblings ...)
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node Wanlong Gao
@ 2013-06-24  7:12 ` Wanlong Gao
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 10/10] NUMA: show host memory policy info in info numa command Wanlong Gao
  2013-07-04  0:57 ` [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  10 siblings, 0 replies; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

Add the HMP command set-mpol to set the host memory policy for a guest
NUMA node. A node's memory policy can then also be set from the
monitor, for example:
    (qemu) set-mpol 0 mem-policy=membind,mem-hostnode=0-1
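
The second argument is a comma-separated list of key=value pairs. The patch
hands it to QEMU's option parser; purely as an illustration of the format,
here is a minimal standalone lookup. The helper sketch_get_opt is
hypothetical and does not attempt any escaping or quoting rules that a real
option parser may support.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of `key` from a "k1=v1,k2=v2" string into `out`.
 * Returns 0 if the key is found and fits, -1 otherwise.
 */
static int sketch_get_opt(const char *args, const char *key,
                          char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = args;

    while (p && *p) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        /* Match "key=" at the start of this comma-separated token. */
        if (len > klen && !strncmp(p, key, klen) && p[klen] == '=') {
            size_t vlen = len - klen - 1;

            if (vlen >= outlen) {
                return -1;
            }
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p = comma ? comma + 1 : NULL;
    }
    return -1;
}
```

A key that is absent from the string corresponds to the has_mpol = false or
has_hostnode = false cases in hmp_set_mpol().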

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hmp-commands.hx | 16 ++++++++++++++++
 hmp.c           | 35 +++++++++++++++++++++++++++++++++++
 hmp.h           |  1 +
 3 files changed, 52 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 915b0d1..417b69f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1567,6 +1567,22 @@ Executes a qemu-io command on the given block device.
 ETEXI
 
     {
+        .name       = "set-mpol",
+        .args_type  = "nodeid:i,args:s?",
+        .params     = "nodeid [args]",
+        .help       = "set host memory policy for a guest NUMA node",
+        .mhandler.cmd = hmp_set_mpol,
+    },
+
+STEXI
+@item set-mpol @var{nodeid} @var{args}
+@findex set-mpol
+
+Set host memory policy for a guest NUMA node
+
+ETEXI
+
+    {
         .name       = "info",
         .args_type  = "item:s?",
         .params     = "[subcommand]",
diff --git a/hmp.c b/hmp.c
index 494a9aa..81bddb1 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1464,3 +1464,38 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 
     hmp_handle_error(mon, &err);
 }
+
+void hmp_set_mpol(Monitor *mon, const QDict *qdict)
+{
+    Error *local_err = NULL;
+    bool has_mpol = true;
+    bool has_hostnode = true;
+    const char *mpol = NULL;
+    const char *hostnode = NULL;
+    QemuOpts *opts;
+
+    uint64_t nodeid = qdict_get_int(qdict, "nodeid");
+    const char *args = qdict_get_try_str(qdict, "args");
+
+    if (args == NULL) {
+        has_mpol = false;
+        has_hostnode = false;
+    } else {
+        opts = qemu_opts_parse(qemu_find_opts("numa"), args, 1);
+        if (opts == NULL) {
+            error_setg(&local_err, "Parsing memory policy args failed");
+        } else {
+            mpol = qemu_opt_get(opts, "mem-policy");
+            if (mpol == NULL) {
+                has_mpol = false;
+            }
+            hostnode = qemu_opt_get(opts, "mem-hostnode");
+            if (hostnode == NULL) {
+                has_hostnode = false;
+            }
+        }
+    }
+
+    qmp_set_mpol(nodeid, has_mpol, mpol, has_hostnode, hostnode, &local_err);
+    hmp_handle_error(mon, &local_err);
+}
diff --git a/hmp.h b/hmp.h
index 56d2e92..81f631b 100644
--- a/hmp.h
+++ b/hmp.h
@@ -86,5 +86,6 @@ void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict);
 void hmp_chardev_add(Monitor *mon, const QDict *qdict);
 void hmp_chardev_remove(Monitor *mon, const QDict *qdict);
 void hmp_qemu_io(Monitor *mon, const QDict *qdict);
+void hmp_set_mpol(Monitor *mon, const QDict *qdict);
 
 #endif
-- 
1.8.3.1.448.gfb7dfaa

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH V3 10/10] NUMA: show host memory policy info in info numa command
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (8 preceding siblings ...)
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 09/10] NUMA: add hmp command set-mpol Wanlong Gao
@ 2013-06-24  7:12 ` Wanlong Gao
  2013-07-04  0:57 ` [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  10 siblings, 0 replies; 15+ messages in thread
From: Wanlong Gao @ 2013-06-24  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, pbonzini, y-goto, afaerber, gaowanlong

Show the host memory policy of each node in the "info numa" monitor command.
After this patch, "info numa" shows output like the following if host NUMA
support is enabled:

    (qemu) info numa
    2 nodes
    node 0 cpus: 0
    node 0 size: 1024 MB
    node 0 mempolicy: membind=0,1
    node 1 cpus: 1
    node 1 size: 1024 MB
    node 1 mempolicy: interleave=1
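
The node list after "membind=" or "interleave=" is the set of bits in the
node's host_mem bitmap, printed as comma-separated node numbers up to the
highest host node. A minimal sketch of that formatting, assuming a mask that
fits in one unsigned long (the patch uses a multi-word bitmap) and a
hypothetical helper name:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write the set bits of `mask`, from bit 0 up to and including `max_node`,
 * into `out` as "a,b,c". Returns the number of nodes written, or -1 if the
 * buffer is too small.
 */
static int sketch_format_nodes(unsigned long mask, unsigned int max_node,
                               char *out, size_t outlen)
{
    size_t used = 0;
    int count = 0;

    if (outlen == 0) {
        return -1;
    }
    out[0] = '\0';

    for (unsigned int n = 0; n <= max_node && n < sizeof(mask) * 8; n++) {
        int w;

        if (!(mask & (1UL << n))) {
            continue;
        }
        /* Prefix every entry except the first with a comma. */
        w = snprintf(out + used, outlen - used, "%s%u",
                     count ? "," : "", n);
        if (w < 0 || (size_t)w >= outlen - used) {
            return -1;
        }
        used += (size_t)w;
        count++;
    }
    return count;
}
```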

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 monitor.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/monitor.c b/monitor.c
index 61dbebb..b6e93e5 100644
--- a/monitor.c
+++ b/monitor.c
@@ -74,6 +74,11 @@
 #endif
 #include "hw/lm32/lm32_pic.h"
 
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#endif
+
 //#define DEBUG
 //#define DEBUG_COMPLETION
 
@@ -1807,6 +1812,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
     int i;
     CPUArchState *env;
     CPUState *cpu;
+    unsigned long first, next;
 
     monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
     for (i = 0; i < nb_numa_nodes; i++) {
@@ -1820,6 +1826,42 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "\n");
         monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
             numa_info[i].node_mem >> 20);
+
+#ifdef CONFIG_NUMA
+        monitor_printf(mon, "node %d mempolicy: ", i);
+        switch (numa_info[i].flags & NODE_HOST_POLICY_MASK) {
+        case NODE_HOST_BIND:
+            monitor_printf(mon, "membind=");
+            break;
+        case NODE_HOST_INTERLEAVE:
+            monitor_printf(mon, "interleave=");
+            break;
+        case NODE_HOST_PREFERRED:
+            monitor_printf(mon, "preferred=");
+            break;
+        default:
+            monitor_printf(mon, "default\n");
+            continue;
+        }
+
+        if (numa_info[i].flags & NODE_HOST_RELATIVE)
+            monitor_printf(mon, "+");
+
+        next = first = find_first_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS);
+        monitor_printf(mon, "%lu", first);
+        do {
+            if (next == numa_max_node())
+                break;
+            next = find_next_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS,
+                                 next + 1);
+            if (next > numa_max_node() || next == MAX_CPUMASK_BITS)
+                break;
+
+            monitor_printf(mon, ",%lu", next);
+        } while (true);
+
+        monitor_printf(mon, "\n");
+#endif
     }
 }
 
-- 
1.8.3.1.448.gfb7dfaa

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH V3 04/10] NUMA: parse guest numa nodes memory policy
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 04/10] NUMA: parse guest numa nodes memory policy Wanlong Gao
@ 2013-06-24 19:09   ` Bandan Das
  0 siblings, 0 replies; 15+ messages in thread
From: Bandan Das @ 2013-06-24 19:09 UTC (permalink / raw)
  To: Wanlong Gao; +Cc: aliguori, ehabkost, qemu-devel, y-goto, pbonzini, afaerber

Wanlong Gao <gaowanlong@cn.fujitsu.com> writes:

> The memory policy setting format is like:
> mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N}
> And we are adding this setting as a suboption of "-numa",
> the memory policy then can be set like following:
>  -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
>  -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=!1
>
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---

Reviewed-by: Bandan Das <bsd@redhat.com>

>  include/sysemu/sysemu.h |   8 ++++
>  vl.c                    | 110 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 118 insertions(+)
>
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 70fd2ed..993b8e0 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -130,10 +130,18 @@ extern QEMUClock *rtc_clock;
>  
>  #define MAX_NODES 64
>  #define MAX_CPUMASK_BITS 255
> +#define NODE_HOST_NONE        0x00
> +#define NODE_HOST_BIND        0x01
> +#define NODE_HOST_INTERLEAVE  0x02
> +#define NODE_HOST_PREFERRED   0x03
> +#define NODE_HOST_POLICY_MASK 0x03
> +#define NODE_HOST_RELATIVE    0x04
>  extern int nb_numa_nodes;
>  struct node_info {
>      uint64_t node_mem;
>      DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
> +    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
> +    unsigned int flags;
>  };
>  extern struct node_info numa_info[MAX_NODES];
>  
> diff --git a/vl.c b/vl.c
> index 357137b..4dbf5cc 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -536,6 +536,14 @@ static QemuOptsList qemu_numa_opts = {
>              .name = "cpus",
>              .type = QEMU_OPT_STRING,
>              .help = "cpu number or range"
> +        },{
> +            .name = "mem-policy",
> +            .type = QEMU_OPT_STRING,
> +            .help = "memory policy"
> +        },{
> +            .name = "mem-hostnode",
> +            .type = QEMU_OPT_STRING,
> +            .help = "host node number or range for memory policy"
>          },
>          { /* end of list */ }
>      },
> @@ -1374,6 +1382,79 @@ error:
>      exit(1);
>  }
>  
> +static void numa_node_parse_mpol(int nodenr, const char *mpol)
> +{
> +    if (!mpol) {
> +        return;
> +    }
> +
> +    if (!strcmp(mpol, "interleave")) {
> +        numa_info[nodenr].flags |= NODE_HOST_INTERLEAVE;
> +    } else if (!strcmp(mpol, "preferred")) {
> +        numa_info[nodenr].flags |= NODE_HOST_PREFERRED;
> +    } else if (!strcmp(mpol, "membind")) {
> +        numa_info[nodenr].flags |= NODE_HOST_BIND;
> +    } else {
> +        fprintf(stderr, "qemu: Invalid memory policy: %s\n", mpol);
> +    }
> +}
> +
> +static void numa_node_parse_hostnode(int nodenr, const char *hostnode)
> +{
> +    unsigned long long value, endvalue;
> +    char *endptr;
> +    bool clear = false;
> +    unsigned long *bm = numa_info[nodenr].host_mem;
> +
> +    if (hostnode[0] == '!') {
> +        clear = true;
> +        bitmap_fill(bm, MAX_CPUMASK_BITS);
> +        hostnode++;
> +    }
> +    if (hostnode[0] == '+') {
> +        numa_info[nodenr].flags |= NODE_HOST_RELATIVE;
> +        hostnode++;
> +    }
> +
> +    if (!strcmp(hostnode, "all")) {
> +        bitmap_fill(bm, MAX_CPUMASK_BITS);
> +        return;
> +    }
> +
> +    if (parse_uint(hostnode, &value, &endptr, 10) < 0)
> +        goto error;
> +    if (*endptr == '-') {
> +        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
> +            goto error;
> +        }
> +    } else if (*endptr == '\0') {
> +        endvalue = value;
> +    } else {
> +        goto error;
> +    }
> +
> +    if (endvalue >= MAX_CPUMASK_BITS) {
> +        endvalue = MAX_CPUMASK_BITS - 1;
> +        fprintf(stderr,
> +            "qemu: NUMA: A max of %d host nodes are supported\n",
> +             MAX_CPUMASK_BITS);
> +    }
> +
> +    if (endvalue < value) {
> +        goto error;
> +    }
> +
> +    if (clear)
> +        bitmap_clear(bm, value, endvalue - value + 1);
> +    else
> +        bitmap_set(bm, value, endvalue - value + 1);
> +
> +    return;
> +
> +error:
> +    fprintf(stderr, "qemu: Invalid host NUMA nodes range: %s\n", hostnode);
> +    return;
> +}
>  
>  static int numa_add_cpus(const char *name, const char *value, void *opaque)
>  {
> @@ -1385,6 +1466,25 @@ static int numa_add_cpus(const char *name, const char *value, void *opaque)
>      return 0;
>  }
>  
> +static int numa_add_mpol(const char *name, const char *value, void *opaque)
> +{
> +    int *nodenr = opaque;
> +
> +    if (!strcmp(name, "mem-policy")) {
> +        numa_node_parse_mpol(*nodenr, value);
> +    }
> +    return 0;
> +}
> +
> +static int numa_add_hostnode(const char *name, const char *value, void *opaque)
> +{
> +    int *nodenr = opaque;
> +    if (!strcmp(name, "mem-hostnode")) {
> +        numa_node_parse_hostnode(*nodenr, value);
> +    }
> +    return 0;
> +}
> +
>  static int numa_init_func(QemuOpts *opts, void *opaque)
>  {
>      uint64_t nodenr, mem_size;
> @@ -1404,6 +1504,14 @@ static int numa_init_func(QemuOpts *opts, void *opaque)
>          return -1;
>      }
>  
> +    if (qemu_opt_foreach(opts, numa_add_mpol, &nodenr, 1) < 0) {
> +        return -1;
> +    }
> +
> +    if (qemu_opt_foreach(opts, numa_add_hostnode, &nodenr, 1) < 0) {
> +        return -1;
> +    }
> +
>      return 0;
>  }
>  
> @@ -2930,6 +3038,8 @@ int main(int argc, char **argv, char **envp)
>      for (i = 0; i < MAX_NODES; i++) {
>          numa_info[i].node_mem = 0;
>          bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
> +        bitmap_zero(numa_info[i].host_mem, MAX_CPUMASK_BITS);
> +        numa_info[i].flags = NODE_HOST_NONE;
>      }
>  
>      nb_numa_nodes = 0;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH V3 06/10] NUMA: split out the common range parser
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 06/10] NUMA: split out the common range parser Wanlong Gao
@ 2013-06-24 19:15   ` Bandan Das
  0 siblings, 0 replies; 15+ messages in thread
From: Bandan Das @ 2013-06-24 19:15 UTC (permalink / raw)
  To: Wanlong Gao; +Cc: aliguori, ehabkost, qemu-devel, y-goto, pbonzini, afaerber

Wanlong Gao <gaowanlong@cn.fujitsu.com> writes:

> Since cpus parser and hostnode parser have the common range parser
> part, split it out to the common range parser to avoid the duplicate
> code.
>
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  vl.c | 89 ++++++++++++++++++++++++++++----------------------------------------
>  1 file changed, 37 insertions(+), 52 deletions(-)

For the patch order, I was thinking along the lines of introducing the common
function first followed by the ones that use it, but I guess this is fine too :)

Thanks for taking care of this!

Reviewed-by: Bandan Das <bsd@redhat.com>

> diff --git a/vl.c b/vl.c
> index 79c39b9..c9d25bd 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1338,47 +1338,55 @@ char *get_boot_devices_list(size_t *size)
>      return list;
>  }
>  
> -static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp)
> +static int numa_node_parse_common(const char *str,
> +                                  unsigned long long *value,
> +                                  unsigned long long *endvalue)
>  {
>      char *endptr;
> -    unsigned long long value, endvalue;
> -
> -    /* Empty CPU range strings will be considered valid, they will simply
> -     * not set any bit in the CPU bitmap.
> -     */
> -    if (!*cpus) {
> -        return;
> +    if (parse_uint(str, value, &endptr, 10) < 0) {
> +        return -1;
>      }
>  
> -    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
> -        goto error;
> -    }
>      if (*endptr == '-') {
> -        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
> -            goto error;
> +        if (parse_uint_full(endptr + 1, endvalue, 10) < 0) {
> +           return -1;
>          }
>      } else if (*endptr == '\0') {
> -        endvalue = value;
> +        *endvalue = *value;
>      } else {
> -        goto error;
> +        return -1;
>      }
>  
> -    if (endvalue >= MAX_CPUMASK_BITS) {
> -        endvalue = MAX_CPUMASK_BITS - 1;
> -        fprintf(stderr,
> -            "qemu: NUMA: A max of %d VCPUs are supported\n",
> -             MAX_CPUMASK_BITS);
> +    if (*endvalue >= MAX_CPUMASK_BITS) {
> +        *endvalue = MAX_CPUMASK_BITS - 1;
> +        fprintf(stderr, "qemu: NUMA: A max number %d is supported\n",
> +                MAX_CPUMASK_BITS);
>      }
>  
> -    if (endvalue < value) {
> -        goto error;
> +    if (*endvalue < *value) {
> +        return -1;
>      }
>  
> -    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
> -    return;
> +    return 0;
> +}
>  
> -error:
> -    error_setg(errp, "Invalid NUMA CPU range: %s\n", cpus);
> +static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp)
> +{
> +    unsigned long long value, endvalue;
> +
> +    /* Empty CPU range strings will be considered valid, they will simply
> +     * not set any bit in the CPU bitmap.
> +     */
> +    if (!*cpus) {
> +        return;
> +    }
> +
> +    if (numa_node_parse_common(cpus, &value, &endvalue) < 0) {
> +        error_setg(errp, "Invalid NUMA CPU range: %s", cpus);
> +        return;
> +    }
> +
> +    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
>      return;
>  }
>  
> @@ -1403,7 +1411,6 @@ void numa_node_parse_mpol(int nodenr, const char *mpol, Error **errp)
>  void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
>  {
>      unsigned long long value, endvalue;
> -    char *endptr;
>      bool clear = false;
>      unsigned long *bm = numa_info[nodenr].host_mem;
>  
> @@ -1422,27 +1429,9 @@ void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
>          return;
>      }
>  
> -    if (parse_uint(hostnode, &value, &endptr, 10) < 0)
> -        goto error;
> -    if (*endptr == '-') {
> -        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
> -            goto error;
> -        }
> -    } else if (*endptr == '\0') {
> -        endvalue = value;
> -    } else {
> -        goto error;
> -    }
> -
> -    if (endvalue >= MAX_CPUMASK_BITS) {
> -        endvalue = MAX_CPUMASK_BITS - 1;
> -        fprintf(stderr,
> -            "qemu: NUMA: A max of %d host nodes are supported\n",
> -             MAX_CPUMASK_BITS);
> -    }
> -
> -    if (endvalue < value) {
> -        goto error;
> +    if (numa_node_parse_common(hostnode, &value, &endvalue) < 0) {
> +        error_setg(errp, "Invalid host NUMA nodes range: %s", hostnode);
> +        return;
>      }
>  
>      if (clear)
> @@ -1451,10 +1440,6 @@ void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
>          bitmap_set(bm, value, endvalue - value + 1);
>  
>      return;
> -
> -error:
> -    error_setg(errp, "Invalid host NUMA nodes range: %s", hostnode);
> -    return;
>  }
>  
>  static int numa_add_cpus(const char *name, const char *value, void *opaque)
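
For readers following the refactoring above, here is a minimal standalone sketch of the kind of shared "N" / "N-M" range parser the patch factors out. It is not the patch's code: range_parse() and RANGE_MAX_BITS are hypothetical names standing in for numa_node_parse_common() and MAX_CPUMASK_BITS, the limit of 255 is an assumed value, and strtoull() replaces QEMU's parse_uint() helpers so the example builds without the QEMU tree.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical limit; stands in for MAX_CPUMASK_BITS (value assumed). */
#define RANGE_MAX_BITS 255

/*
 * Parse "N" or "N-M" into *value and *endvalue.
 * Returns 0 on success, -1 on malformed input or a reversed range.
 * An end value past the limit is clamped with a warning, mirroring
 * the behaviour shown in the quoted diff.
 */
static int range_parse(const char *str, unsigned long long *value,
                       unsigned long long *endvalue)
{
    char *endptr;

    /* strtoull() would accept whitespace and signs; reject them here. */
    if (*str < '0' || *str > '9') {
        return -1;
    }
    errno = 0;
    *value = strtoull(str, &endptr, 10);
    if (errno != 0) {
        return -1;
    }

    if (*endptr == '-') {
        const char *end_str = endptr + 1;

        if (*end_str < '0' || *end_str > '9') {
            return -1;
        }
        errno = 0;
        *endvalue = strtoull(end_str, &endptr, 10);
        if (errno != 0 || *endptr != '\0') {
            return -1;
        }
    } else if (*endptr == '\0') {
        *endvalue = *value;          /* single number: a range of one */
    } else {
        return -1;                   /* trailing garbage */
    }

    if (*endvalue >= RANGE_MAX_BITS) {
        *endvalue = RANGE_MAX_BITS - 1;
        fprintf(stderr, "range: a maximum of %d entries is supported\n",
                RANGE_MAX_BITS);
    }

    if (*endvalue < *value) {
        return -1;
    }

    return 0;
}
```

A caller would then do what both numa_node_parse_cpus() and numa_node_parse_hostnode() do in the diff: report its own error message when the helper returns a negative value, and otherwise set (or clear) endvalue - value + 1 bits starting at value.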


* Re: [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes
  2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (9 preceding siblings ...)
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 10/10] NUMA: show host memory policy info in info numa command Wanlong Gao
@ 2013-07-04  0:57 ` Wanlong Gao
  10 siblings, 0 replies; 15+ messages in thread
From: Wanlong Gao @ 2013-07-04  0:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, bsd, y-goto, pbonzini, afaerber, Wanlong Gao

Hi,

who can pick this set up?

Thanks,
Wanlong Gao

> As you know, QEMU currently can't direct its memory allocation, which may
> cause a performance regression from guest cross-node access.
> Worse, if PCI passthrough is used, the directly attached device uses DMA
> transfers between the device and the qemu process, so
> all pages of the guest will be pinned by get_user_pages().
> 
> KVM_ASSIGN_PCI_DEVICE ioctl
>   kvm_vm_ioctl_assign_device()
>     =>kvm_assign_device()
>       => kvm_iommu_map_memslots()
>         => kvm_iommu_map_pages()
>            => kvm_pin_pages()
> 
> So, with a directly attached device, the page count of every guest page is
> incremented by 1 and page migration will not work. AutoNUMA won't work either.
> 
> So, we should set the memory allocation policy of the guest nodes before
> the pages are actually mapped.
> 
> With this patch set, we are able to set the memory policy of guest nodes
> as follows:
> 
>  -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1
>  -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=1
> 
> The supported format is "mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N}".
> 
> Patch 8/10 adds a QMP command "set-mpol" to set the memory policy for each
> guest node:
>     set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1
> 
> Patch 9/10 adds a monitor command "set-mpol" whose format is:
>     set-mpol 0 mem-policy=membind,mem-hostnode=0-1
> 
> And with patch 10/10, we can get the current memory policy of each guest node
> using monitor command "info numa", for example:
> 
>     (qemu) info numa
>     2 nodes
>     node 0 cpus: 0
>     node 0 size: 1024 MB
>     node 0 mempolicy: membind=0,1
>     node 1 cpus: 1
>     node 1 size: 1024 MB
>     node 1 mempolicy: interleave=1
> 
> 
> V1->V2:
>     change to use QemuOpts in numa options (Paolo)
>     handle Error in mpol parser (Paolo)
>     change qmp command format to be like mem-policy=membind,mem-hostnode=0-1 (Paolo)
> V2->V3:
>     also handle Error in cpus parser (5/10)
>     split out common parser from cpus and hostnode parser (Bandan 6/10)
> 
> 
> Bandan Das (1):
>   NUMA: Support multiple CPU ranges on -numa option
> 
> Wanlong Gao (9):
>   NUMA: Add numa_info structure to contain numa nodes info
>   NUMA: Add Linux libnuma detection
>   NUMA: parse guest numa nodes memory policy
>   NUMA: handle Error in cpus, mpol and hostnode parser
>   NUMA: split out the common range parser
>   NUMA: set guest numa nodes memory policy
>   NUMA: add qmp command set-mpol to set memory policy for NUMA node
>   NUMA: add hmp command set-mpol
>   NUMA: show host memory policy info in info numa command
> 
>  configure               |  32 ++++++
>  cpus.c                  | 143 +++++++++++++++++++++++-
>  hmp-commands.hx         |  16 +++
>  hmp.c                   |  35 ++++++
>  hmp.h                   |   1 +
>  hw/i386/pc.c            |   4 +-
>  hw/net/eepro100.c       |   1 -
>  include/sysemu/sysemu.h |  20 +++-
>  monitor.c               |  44 +++++++-
>  qapi-schema.json        |  15 +++
>  qemu-options.hx         |   3 +-
>  qmp-commands.hx         |  35 ++++++
>  vl.c                    | 285 +++++++++++++++++++++++++++++++++++-------------
>  13 files changed, 553 insertions(+), 81 deletions(-)
> 


* Re: [Qemu-devel] [PATCH V3 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node
  2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node Wanlong Gao
@ 2013-07-04  5:19   ` Andreas Färber
  0 siblings, 0 replies; 15+ messages in thread
From: Andreas Färber @ 2013-07-04  5:19 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, Luiz Capitulino, bsd, y-goto, pbonzini

On 24.06.2013 09:12, Wanlong Gao wrote:
> This QMP command makes it possible to set a node's memory policy
> through the QMP protocol. The qmp-shell command looks like:
>     set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>

Can we make that a little more self-documenting please? Suggest
set-memory-policy. I don't see any Reviewed-by yet, so it's a bit early
for picking it up. ;) CC'ing Erik and Luiz.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Thread overview: 15+ messages
2013-06-24  7:11 [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
2013-06-24  7:11 ` [Qemu-devel] [PATCH V3 01/10] NUMA: Support multiple CPU ranges on -numa option Wanlong Gao
2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 02/10] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 03/10] NUMA: Add Linux libnuma detection Wanlong Gao
2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 04/10] NUMA: parse guest numa nodes memory policy Wanlong Gao
2013-06-24 19:09   ` Bandan Das
2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 05/10] NUMA: handle Error in cpus, mpol and hostnode parser Wanlong Gao
2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 06/10] NUMA: split out the common range parser Wanlong Gao
2013-06-24 19:15   ` Bandan Das
2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 07/10] NUMA: set guest numa nodes memory policy Wanlong Gao
2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node Wanlong Gao
2013-07-04  5:19   ` Andreas Färber
2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 09/10] NUMA: add hmp command set-mpol Wanlong Gao
2013-06-24  7:12 ` [Qemu-devel] [PATCH V3 10/10] NUMA: show host memory policy info in info numa command Wanlong Gao
2013-07-04  0:57 ` [Qemu-devel] [PATCH V3 00/10] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
