All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes
@ 2013-07-17  9:29 Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions Wanlong Gao
                   ` (11 more replies)
  0 siblings, 12 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

As you know, QEMU can't direct it's memory allocation now, this may cause
guest cross node access performance regression.
And, the worse thing is that if PCI-passthrough is used,
direct-attached-device uses DMA transfer between device and qemu process.
All pages of the guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    =>kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
           => kvm_pin_pages()

So, with direct-attached-device, all guest page's page count will be +1 and
any page migration will not work. AutoNUMA won't too.

So, we should set the guest nodes memory allocation policy before
the pages are really mapped.

According to this patch set, we are able to set guest nodes memory policy
like following:

 -numa node,nodeid=0,cpus=0, \
 -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
 -numa node,nodeid=1,cpus=1 \
 -numa mem,size=1024M,policy=interleave,host-nodes=1

This supports "policy={membind|interleave|preferred},host-nodes=[+|!]{all|N-N}" like format.

And patch 9/12 adds a QMP command "set-mem-policy" to set the memory policy
for every guest nodes:
    set-mem-policy nodeid=0 policy=membind host-nodes=0-1

And patch 10/12 adds a monitor command "set-mem-policy" whose format like:
    set-mem-policy 0 policy=membind,host-nodes=0-1

And patch 11/12 adds a QMP command "query-numa" to show numa info through
this API.

And patch 12/12 converts the "info numa" monitor command to use this
QMP command "query-numa".


V1->V2:
    change to use QemuOpts in numa options (Paolo)
    handle Error in mpol parser (Paolo)
    change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo)
V2->V3:
    also handle Error in cpus parser (5/10)
    split out common parser from cpus and hostnode parser (Bandan 6/10)
V3-V4:
    rebase to request for comments
V4->V5:
    use OptVisitor and split -numa option (Paolo)
     - s/set-mpol/set-mem-policy (Andreas)
     - s/mem-policy/policy
     - s/mem-hostnode/host-nodes
    fix hmp command process after error (Luiz)
    add qmp command query-numa and convert info numa to it (Luiz)


Wanlong Gao (12):
  NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  NUMA: split -numa option
  NUMA: move numa related code to numa.c
  NUMA: Add numa_info structure to contain numa nodes info
  NUMA: Add Linux libnuma detection
  NUMA: parse guest numa nodes memory policy
  NUMA: split out the common range parser
  NUMA: set guest numa nodes memory policy
  NUMA: add qmp command set-mem-policy to set memory policy for NUMA
    node
  NUMA: add hmp command set-mem-policy
  NUMA: add qmp command query-numa
  NUMA: convert hmp command info_numa to use qmp command query_numa

 Makefile.target         |   2 +-
 configure               |  32 +++
 cpus.c                  |  14 --
 hmp-commands.hx         |  16 ++
 hmp.c                   |  70 +++++++
 hmp.h                   |   2 +
 hw/i386/pc.c            |   4 +-
 hw/net/eepro100.c       |   1 -
 include/sysemu/cpus.h   |   1 -
 include/sysemu/sysemu.h |  20 +-
 monitor.c               |  21 +-
 numa.c                  | 513 ++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json        | 103 ++++++++++
 qemu-options.hx         |   6 +-
 qmp-commands.hx         |  84 ++++++++
 vl.c                    | 157 ++-------------
 16 files changed, 861 insertions(+), 185 deletions(-)
 create mode 100644 numa.c

-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17 10:35   ` Laszlo Ersek
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 02/12] NUMA: split -numa option Wanlong Gao
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 qapi-schema.json | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/qapi-schema.json b/qapi-schema.json
index 7b9fef1..f753a35 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3679,3 +3679,47 @@
             '*cpuid-input-ecx': 'int',
             'cpuid-register': 'X86CPURegister32',
             'features': 'int' } }
+
+##
+# @NumaOptions
+#
+# A discriminated record of NUMA options.
+#
+# Since 1.6
+##
+{ 'union': 'NumaOptions',
+  'data': {
+    'node':	'NumaNodeOptions',
+    'mem':	'NumaMemOptions' }}
+
+##
+# @NumaNodeOptions
+#
+# Create a guest NUMA node.
+#
+# @nodeid: #optional NUMA node ID
+#
+# @cpus: #optional VCPUs belong to this node
+#
+# Since: 1.6
+##
+{ 'type': 'NumaNodeOptions',
+  'data': {
+   '*nodeid':		'int',
+   '*cpus':		'str' }}
+
+##
+# @NumaMemOptions
+#
+# Set memory information of guest NUMA node.
+#
+# @nodeid: #optional NUMA node ID
+#
+# @size: #optional memory size of this node
+#
+# Since 1.6
+##
+{ 'type': 'NumaMemOptions',
+  'data': {
+   '*nodeid':		'int',
+   '*size':		'size' }}
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 02/12] NUMA: split -numa option
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17 11:00   ` Laszlo Ersek
  2013-07-17 11:13   ` Paolo Bonzini
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 03/12] NUMA: move numa related code to numa.c Wanlong Gao
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

Change -numa option like following as Paolo suggested:
    -numa node,nodeid=0,cpus=0-1 \
    -numa mem,nodeid=0,size=1G

This new option will make later coming memory hotplug better.
And this new option is implemented using OptsVisitor.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 Makefile.target         |   2 +-
 include/sysemu/sysemu.h |   3 +
 numa.c                  | 164 ++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx         |   6 +-
 vl.c                    | 107 +++----------------------------
 5 files changed, 182 insertions(+), 100 deletions(-)
 create mode 100644 numa.c

diff --git a/Makefile.target b/Makefile.target
index 9a49852..7e1fddf 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -113,7 +113,7 @@ endif #CONFIG_BSD_USER
 #########################################################
 # System emulator target
 ifdef CONFIG_SOFTMMU
-obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o
+obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
 obj-y += qtest.o
 obj-y += hw/
 obj-$(CONFIG_FDT) += device_tree.o
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 3caeb66..cf8e6e5 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -132,6 +132,9 @@ extern QEMUClock *rtc_clock;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern unsigned long *node_cpumask[MAX_NODES];
+extern QemuOptsList qemu_numa_opts;
+int numa_init_func(QemuOpts *opts, void *opaque);
+void set_numa_nodes(void);
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/numa.c b/numa.c
new file mode 100644
index 0000000..da68c4b
--- /dev/null
+++ b/numa.c
@@ -0,0 +1,164 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2013 Fujitsu Ltd.
+ * Author: Wanlong Gao <gaowanlong@cn.fujitsu.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "sysemu/sysemu.h"
+#include "qemu/bitmap.h"
+#include "qapi-visit.h"
+#include "qapi/opts-visitor.h"
+#include "qapi/dealloc-visitor.h"
+
+QemuOptsList qemu_numa_opts = {
+    .name = "numa",
+    .implied_opt_name = "type",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head),
+    .desc = { { 0 } } /* validated with OptsVisitor */
+};
+
+static int numa_node_parse_cpus(int nodenr, const char *cpus)
+{
+    char *endptr;
+    unsigned long long value, endvalue;
+
+    /* Empty CPU range strings will be considered valid, they will simply
+     * not set any bit in the CPU bitmap.
+     */
+    if (!*cpus) {
+        return 0;
+    }
+
+    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
+        goto error;
+    }
+    if (*endptr == '-') {
+        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
+            goto error;
+        }
+    } else if (*endptr == '\0') {
+        endvalue = value;
+    } else {
+        goto error;
+    }
+
+    if (endvalue >= MAX_CPUMASK_BITS) {
+        endvalue = MAX_CPUMASK_BITS - 1;
+        fprintf(stderr,
+            "qemu: NUMA: A max of %d VCPUs are supported\n",
+             MAX_CPUMASK_BITS);
+    }
+
+    if (endvalue < value) {
+        goto error;
+    }
+
+    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
+    return 0;
+
+error:
+    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
+    return -1;
+}
+
+static int numa_node_parse(NumaNodeOptions *opts)
+{
+    uint64_t nodenr;
+    const char *cpus = NULL;
+
+    nodenr = opts->nodeid;
+    if (nodenr >= MAX_NODES) {
+        fprintf(stderr, "qemu: Max number of NUMA nodes reached: %d\n",
+                (int)nodenr);
+        return -1;
+    }
+
+    cpus = opts->cpus;
+    return numa_node_parse_cpus(nodenr, cpus);
+}
+
+static int numa_mem_parse(NumaMemOptions *opts)
+{
+    uint64_t nodenr, mem_size;
+
+    nodenr = opts->nodeid;
+    if (nodenr >= MAX_NODES) {
+        fprintf(stderr, "qemu: Max number of NUMA nodes reached: %d\n",
+                (int)nodenr);
+        return -1;
+    }
+
+    mem_size = opts->size;
+    node_mem[nodenr] = mem_size;
+
+    return 0;
+}
+
+int numa_init_func(QemuOpts *opts, void *opaque)
+{
+    NumaOptions *object = NULL;
+    Error *err = NULL;
+    int ret = 0;
+
+    {
+        OptsVisitor *ov = opts_visitor_new(opts);
+        visit_type_NumaOptions(opts_get_visitor(ov), &object, NULL, &err);
+        opts_visitor_cleanup(ov);
+    }
+
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        ret = -1;
+        goto error;
+    }
+
+    switch (object->kind) {
+    case NUMA_OPTIONS_KIND_NODE:
+        if (nb_numa_nodes >= MAX_NODES) {
+            fprintf(stderr, "qemu: too many NUMA nodes\n");
+            ret = -1;
+            goto error;
+        }
+        nb_numa_nodes++;
+        ret = numa_node_parse(object->node);
+        break;
+    case NUMA_OPTIONS_KIND_MEM:
+        ret = numa_mem_parse(object->mem);
+        break;
+    default:
+        fprintf(stderr, "qemu: Invalid NUMA options type.\n");
+        ret = -1;
+        goto error;
+    }
+
+error:
+    if (object) {
+        QapiDeallocVisitor *dv = qapi_dealloc_visitor_new();
+        visit_type_NumaOptions(qapi_dealloc_get_visitor(dv),
+                               &object, NULL, NULL);
+        qapi_dealloc_visitor_cleanup(dv);
+    }
+
+    return ret;
+}
+
diff --git a/qemu-options.hx b/qemu-options.hx
index 4e98b4f..7ec4486 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -95,11 +95,13 @@ specifies the maximum number of hotpluggable CPUs.
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-    "-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
+    "-numa node[,nodeid=node][,cpus=cpu[-cpu]]\n"
+    "-numa mem[,nodeid=node][,size=size]\n"
+    , QEMU_ARCH_ALL)
 STEXI
 @item -numa @var{opts}
 @findex -numa
-Simulate a multi node NUMA system. If mem and cpus are omitted, resources
+Simulate a multi node NUMA system. If @var{size} and @var{cpus} are omitted, resources
 are split equally.
 ETEXI
 
diff --git a/vl.c b/vl.c
index 25b8f2f..d3e6d8c 100644
--- a/vl.c
+++ b/vl.c
@@ -1330,102 +1330,6 @@ char *get_boot_devices_list(size_t *size)
     return list;
 }
 
-static void numa_node_parse_cpus(int nodenr, const char *cpus)
-{
-    char *endptr;
-    unsigned long long value, endvalue;
-
-    /* Empty CPU range strings will be considered valid, they will simply
-     * not set any bit in the CPU bitmap.
-     */
-    if (!*cpus) {
-        return;
-    }
-
-    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
-        goto error;
-    }
-    if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
-        }
-    } else if (*endptr == '\0') {
-        endvalue = value;
-    } else {
-        goto error;
-    }
-
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d VCPUs are supported\n",
-             MAX_CPUMASK_BITS);
-    }
-
-    if (endvalue < value) {
-        goto error;
-    }
-
-    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
-    return;
-
-error:
-    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
-    exit(1);
-}
-
-static void numa_add(const char *optarg)
-{
-    char option[128];
-    char *endptr;
-    unsigned long long nodenr;
-
-    optarg = get_opt_name(option, 128, optarg, ',');
-    if (*optarg == ',') {
-        optarg++;
-    }
-    if (!strcmp(option, "node")) {
-
-        if (nb_numa_nodes >= MAX_NODES) {
-            fprintf(stderr, "qemu: too many NUMA nodes\n");
-            exit(1);
-        }
-
-        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
-            nodenr = nb_numa_nodes;
-        } else {
-            if (parse_uint_full(option, &nodenr, 10) < 0) {
-                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
-                exit(1);
-            }
-        }
-
-        if (nodenr >= MAX_NODES) {
-            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
-            exit(1);
-        }
-
-        if (get_param_value(option, 128, "mem", optarg) == 0) {
-            node_mem[nodenr] = 0;
-        } else {
-            int64_t sval;
-            sval = strtosz(option, &endptr);
-            if (sval < 0 || *endptr) {
-                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
-                exit(1);
-            }
-            node_mem[nodenr] = sval;
-        }
-        if (get_param_value(option, 128, "cpus", optarg) != 0) {
-            numa_node_parse_cpus(nodenr, option);
-        }
-        nb_numa_nodes++;
-    } else {
-        fprintf(stderr, "Invalid -numa option: %s\n", option);
-        exit(1);
-    }
-}
-
 static QemuOptsList qemu_smp_opts = {
     .name = "smp-opts",
     .implied_opt_name = "cpus",
@@ -2961,6 +2865,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_tpmdev_opts);
     qemu_add_opts(&qemu_realtime_opts);
     qemu_add_opts(&qemu_msg_opts);
+    qemu_add_opts(&qemu_numa_opts);
 
     runstate_init();
 
@@ -3147,7 +3052,10 @@ int main(int argc, char **argv, char **envp)
                 }
                 break;
             case QEMU_OPTION_numa:
-                numa_add(optarg);
+                opts = qemu_opts_parse(qemu_find_opts("numa"), optarg, 1);
+                if (!opts) {
+                    exit(1);
+                }
                 break;
             case QEMU_OPTION_display:
                 display_type = select_display(optarg);
@@ -4226,6 +4134,11 @@ int main(int argc, char **argv, char **envp)
 
     register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, NULL);
 
+    if (qemu_opts_foreach(qemu_find_opts("numa"), numa_init_func,
+                          NULL, 1) != 0) {
+        exit(1);
+    }
+
     if (nb_numa_nodes > 0) {
         int i;
 
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 03/12] NUMA: move numa related code to numa.c
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 02/12] NUMA: split -numa option Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 04/12] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 cpus.c                  | 14 ------------
 include/sysemu/cpus.h   |  1 -
 include/sysemu/sysemu.h |  1 +
 numa.c                  | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
 vl.c                    | 43 +----------------------------------
 5 files changed, 61 insertions(+), 57 deletions(-)

diff --git a/cpus.c b/cpus.c
index 8062cdd..f45fb5d 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1198,20 +1198,6 @@ static void tcg_exec_all(void)
     exit_request = 0;
 }
 
-void set_numa_modes(void)
-{
-    CPUState *cpu;
-    int i;
-
-    for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
-                cpu->numa_node = i;
-            }
-        }
-    }
-}
-
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
 {
     /* XXX: implement xxx_cpu_list for targets that still miss it */
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 6502488..4f79081 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -23,7 +23,6 @@ extern int smp_threads;
 #define smp_threads 1
 #endif
 
-void set_numa_modes(void);
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index cf8e6e5..7acfad0 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -135,6 +135,7 @@ extern unsigned long *node_cpumask[MAX_NODES];
 extern QemuOptsList qemu_numa_opts;
 int numa_init_func(QemuOpts *opts, void *opaque);
 void set_numa_nodes(void);
+void set_numa_modes(void);
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/numa.c b/numa.c
index da68c4b..fca5e6f 100644
--- a/numa.c
+++ b/numa.c
@@ -162,3 +162,62 @@ error:
     return ret;
 }
 
+void set_numa_nodes(void)
+{
+    if (nb_numa_nodes > 0) {
+        int i;
+
+        if (nb_numa_nodes > MAX_NODES) {
+            nb_numa_nodes = MAX_NODES;
+        }
+
+        /* If no memory size if given for any node, assume the default case
+         * and distribute the available memory equally across all nodes
+         */
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (node_mem[i] != 0)
+                break;
+        }
+        if (i == nb_numa_nodes) {
+            uint64_t usedmem = 0;
+
+            /* On Linux, the each node's border has to be 8MB aligned,
+             * the final node gets the rest.
+             */
+            for (i = 0; i < nb_numa_nodes - 1; i++) {
+                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
+                usedmem += node_mem[i];
+            }
+            node_mem[i] = ram_size - usedmem;
+        }
+
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
+                break;
+            }
+        }
+        /* assigning the VCPUs round-robin is easier to implement, guest OSes
+         * must cope with this anyway, because there are BIOSes out there in
+         * real machines which also use this scheme.
+         */
+        if (i == nb_numa_nodes) {
+            for (i = 0; i < max_cpus; i++) {
+                set_bit(i, node_cpumask[i % nb_numa_nodes]);
+            }
+        }
+    }
+}
+
+void set_numa_modes(void)
+{
+    CPUState *cpu;
+    int i;
+
+    for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+                cpu->numa_node = i;
+            }
+        }
+    }
+}
diff --git a/vl.c b/vl.c
index d3e6d8c..7cea925 100644
--- a/vl.c
+++ b/vl.c
@@ -4139,48 +4139,7 @@ int main(int argc, char **argv, char **envp)
         exit(1);
     }
 
-    if (nb_numa_nodes > 0) {
-        int i;
-
-        if (nb_numa_nodes > MAX_NODES) {
-            nb_numa_nodes = MAX_NODES;
-        }
-
-        /* If no memory size if given for any node, assume the default case
-         * and distribute the available memory equally across all nodes
-         */
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0)
-                break;
-        }
-        if (i == nb_numa_nodes) {
-            uint64_t usedmem = 0;
-
-            /* On Linux, the each node's border has to be 8MB aligned,
-             * the final node gets the rest.
-             */
-            for (i = 0; i < nb_numa_nodes - 1; i++) {
-                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
-                usedmem += node_mem[i];
-            }
-            node_mem[i] = ram_size - usedmem;
-        }
-
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
-                break;
-            }
-        }
-        /* assigning the VCPUs round-robin is easier to implement, guest OSes
-         * must cope with this anyway, because there are BIOSes out there in
-         * real machines which also use this scheme.
-         */
-        if (i == nb_numa_nodes) {
-            for (i = 0; i < max_cpus; i++) {
-                set_bit(i, node_cpumask[i % nb_numa_nodes]);
-            }
-        }
-    }
+    set_numa_nodes();
 
     if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
         exit(1);
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 04/12] NUMA: Add numa_info structure to contain numa nodes info
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (2 preceding siblings ...)
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 03/12] NUMA: move numa related code to numa.c Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 05/12] NUMA: Add Linux libnuma detection Wanlong Gao
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

Add the numa_info structure to contain the numa nodes memory,
VCPUs information and the future added numa nodes host memory
policies.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hw/i386/pc.c            |  4 ++--
 hw/net/eepro100.c       |  1 -
 include/sysemu/sysemu.h |  8 ++++++--
 monitor.c               |  2 +-
 numa.c                  | 18 +++++++++---------
 vl.c                    |  7 +++----
 6 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c5d8570..d518876 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -654,14 +654,14 @@ static FWCfgState *bochs_bios_init(void)
         unsigned int apic_id = x86_cpu_apic_id_from_index(i);
         assert(apic_id < apic_id_limit);
         for (j = 0; j < nb_numa_nodes; j++) {
-            if (test_bit(i, node_cpumask[j])) {
+            if (test_bit(i, numa_info[j].node_cpu)) {
                 numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
                 break;
             }
         }
     }
     for (i = 0; i < nb_numa_nodes; i++) {
-        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
+        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(numa_info[i].node_mem);
     }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
                      (1 + apic_id_limit + nb_numa_nodes) *
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index e0befb2..3dc4937 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -105,7 +105,6 @@
 #define PCI_IO_SIZE             64
 #define PCI_FLASH_SIZE          (128 * KiB)
 
-#define BIT(n) (1 << (n))
 #define BITS(n, m) (((0xffffffffU << (31 - n)) >> (31 - n + m)) << m)
 
 /* The SCB accepts the following controls for the Tx and Rx units: */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 7acfad0..28fe305 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -9,6 +9,7 @@
 #include "qapi-types.h"
 #include "qemu/notify.h"
 #include "qemu/main-loop.h"
+#include "qemu/bitmap.h"
 
 /* vl.c */
 
@@ -130,8 +131,11 @@ extern QEMUClock *rtc_clock;
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
 extern int nb_numa_nodes;
-extern uint64_t node_mem[MAX_NODES];
-extern unsigned long *node_cpumask[MAX_NODES];
+typedef struct node_info {
+    uint64_t node_mem;
+    DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+} NodeInfo;
+extern NodeInfo numa_info[MAX_NODES];
 extern QemuOptsList qemu_numa_opts;
 int numa_init_func(QemuOpts *opts, void *opaque);
 void set_numa_nodes(void);
diff --git a/monitor.c b/monitor.c
index 2ba7876..566709a 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1818,7 +1818,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
         }
         monitor_printf(mon, "\n");
         monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-            node_mem[i] >> 20);
+            numa_info[i].node_mem >> 20);
     }
 }
 
diff --git a/numa.c b/numa.c
index fca5e6f..766e111 100644
--- a/numa.c
+++ b/numa.c
@@ -72,7 +72,7 @@ static int numa_node_parse_cpus(int nodenr, const char *cpus)
         goto error;
     }
 
-    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
+    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
     return 0;
 
 error:
@@ -108,7 +108,7 @@ static int numa_mem_parse(NumaMemOptions *opts)
     }
 
     mem_size = opts->size;
-    node_mem[nodenr] = mem_size;
+    numa_info[nodenr].node_mem = mem_size;
 
     return 0;
 }
@@ -175,7 +175,7 @@ void set_numa_nodes(void)
          * and distribute the available memory equally across all nodes
          */
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0)
+            if (numa_info[i].node_mem != 0)
                 break;
         }
         if (i == nb_numa_nodes) {
@@ -185,14 +185,14 @@ void set_numa_nodes(void)
              * the final node gets the rest.
              */
             for (i = 0; i < nb_numa_nodes - 1; i++) {
-                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
-                usedmem += node_mem[i];
+                numa_info[i].node_mem = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
+                usedmem += numa_info[i].node_mem;
             }
-            node_mem[i] = ram_size - usedmem;
+            numa_info[i].node_mem = ram_size - usedmem;
         }
 
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
+            if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
                 break;
             }
         }
@@ -202,7 +202,7 @@ void set_numa_nodes(void)
          */
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                set_bit(i, node_cpumask[i % nb_numa_nodes]);
+                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
             }
         }
     }
@@ -215,7 +215,7 @@ void set_numa_modes(void)
 
     for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
                 cpu->numa_node = i;
             }
         }
diff --git a/vl.c b/vl.c
index 7cea925..5fdba97 100644
--- a/vl.c
+++ b/vl.c
@@ -250,8 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
     QTAILQ_HEAD_INITIALIZER(fw_boot_order);
 
 int nb_numa_nodes;
-uint64_t node_mem[MAX_NODES];
-unsigned long *node_cpumask[MAX_NODES];
+NodeInfo numa_info[MAX_NODES];
 
 uint8_t qemu_uuid[16];
 
@@ -2886,8 +2885,8 @@ int main(int argc, char **argv, char **envp)
     translation = BIOS_ATA_TRANSLATION_AUTO;
 
     for (i = 0; i < MAX_NODES; i++) {
-        node_mem[i] = 0;
-        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
+        numa_info[i].node_mem = 0;
+        bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
     }
 
     nb_numa_nodes = 0;
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 05/12] NUMA: Add Linux libnuma detection
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (3 preceding siblings ...)
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 04/12] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 06/12] NUMA: parse guest numa nodes memory policy Wanlong Gao
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

Add detection of libnuma (mostly contained in the numactl package)
to the configure script. Can be enabled or disabled on the command line,
default is use if available.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 configure | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/configure b/configure
index cb0f870..7fb14aa 100755
--- a/configure
+++ b/configure
@@ -242,6 +242,7 @@ gtk=""
 gtkabi="2.0"
 tpm="no"
 libssh2=""
+numa=""
 
 # parse CC options first
 for opt do
@@ -944,6 +945,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --disable-numa) numa="no"
+  ;;
+  --enable-numa) numa="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1158,6 +1163,8 @@ echo "  --gcov=GCOV              use specified gcov [$gcov_tool]"
 echo "  --enable-tpm             enable TPM support"
 echo "  --disable-libssh2        disable ssh block device support"
 echo "  --enable-libssh2         enable ssh block device support"
+echo "  --disable-numa           disable libnuma support"
+echo "  --enable-numa            enable libnuma support"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2389,6 +2396,27 @@ EOF
 fi
 
 ##########################################
+# libnuma probe
+
+if test "$numa" != "no" ; then
+  numa=no
+  cat > $TMPC << EOF
+#include <numa.h>
+int main(void) { return numa_available(); }
+EOF
+
+  if compile_prog "" "-lnuma" ; then
+    numa=yes
+    libs_softmmu="-lnuma $libs_softmmu"
+  else
+    if test "$numa" = "yes" ; then
+      feature_not_found "linux NUMA (install numactl?)"
+    fi
+    numa=no
+  fi
+fi
+
+##########################################
 # linux-aio probe
 
 if test "$linux_aio" != "no" ; then
@@ -3587,6 +3615,7 @@ echo "TPM support       $tpm"
 echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging     $qom_cast_debug"
+echo "NUMA host support $numa"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3620,6 +3649,9 @@ echo "extra_cflags=$EXTRA_CFLAGS" >> $config_host_mak
 echo "extra_ldflags=$EXTRA_LDFLAGS" >> $config_host_mak
 echo "qemu_localedir=$qemu_localedir" >> $config_host_mak
 echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
+if test "$numa" = "yes"; then
+  echo "CONFIG_NUMA=y" >> $config_host_mak
+fi
 
 echo "ARCH=$ARCH" >> $config_host_mak
 
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 06/12] NUMA: parse guest numa nodes memory policy
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (4 preceding siblings ...)
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 05/12] NUMA: Add Linux libnuma detection Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17 12:31   ` Eric Blake
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 07/12] NUMA: split out the common range parser Wanlong Gao
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

The memory policy setting format is like:
    policy={membind|interleave|preferred},host-node=[+|!]{all|N-N}
And we are adding this setting as a suboption of "-numa mem,",
the memory policy then can be set like following:
    -numa node,nodeid=0,cpus=0 \
    -numa node,nodeid=1,cpus=1 \
    -numa mem,nodeid=0,size=1G,policy=membind,host-nodes=0-1 \
    -numa mem,nodeid=1,size=1G,policy=interleave,host-nodes=!1

Reviewed-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |  8 +++++
 numa.c                  | 83 +++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json        |  8 ++++-
 vl.c                    |  2 ++
 4 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 28fe305..af17c02 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -130,10 +130,18 @@ extern QEMUClock *rtc_clock;
 
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
+#define NODE_HOST_NONE        0x00
+#define NODE_HOST_BIND        0x01
+#define NODE_HOST_INTERLEAVE  0x02
+#define NODE_HOST_PREFERRED   0x03
+#define NODE_HOST_POLICY_MASK 0x03
+#define NODE_HOST_RELATIVE    0x04
 extern int nb_numa_nodes;
 typedef struct node_info {
     uint64_t node_mem;
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+    unsigned int flags;
 } NodeInfo;
 extern NodeInfo numa_info[MAX_NODES];
 extern QemuOptsList qemu_numa_opts;
diff --git a/numa.c b/numa.c
index 766e111..a2eceb1 100644
--- a/numa.c
+++ b/numa.c
@@ -96,6 +96,79 @@ static int numa_node_parse(NumaNodeOptions *opts)
     return numa_node_parse_cpus(nodenr, cpus);
 }
 
+static int numa_mem_parse_policy(int nodenr, const char *policy)
+{
+    if (!strcmp(policy, "interleave")) {
+        numa_info[nodenr].flags |= NODE_HOST_INTERLEAVE;
+    } else if (!strcmp(policy, "preferred")) {
+        numa_info[nodenr].flags |= NODE_HOST_PREFERRED;
+    } else if (!strcmp(policy, "membind")) {
+        numa_info[nodenr].flags |= NODE_HOST_BIND;
+    } else {
+        fprintf(stderr, "qemu: Invalid memory policy: %s\n", policy);
+        return -1;
+    }
+
+    return 0;
+}
+
+static int numa_mem_parse_hostnodes(int nodenr, const char *hostnodes)
+{
+    unsigned long long value, endvalue;
+    char *endptr;
+    bool clear = false;
+    unsigned long *bm = numa_info[nodenr].host_mem;
+
+    if (hostnodes[0] == '!') {
+        clear = true;
+        bitmap_fill(bm, MAX_CPUMASK_BITS);
+        hostnodes++;
+    }
+    if (hostnodes[0] == '+') {
+        numa_info[nodenr].flags |= NODE_HOST_RELATIVE;
+        hostnodes++;
+    }
+
+    if (!strcmp(hostnodes, "all")) {
+        bitmap_fill(bm, MAX_CPUMASK_BITS);
+        return 0;
+    }
+
+    if (parse_uint(hostnodes, &value, &endptr, 10) < 0)
+        goto error;
+    if (*endptr == '-') {
+        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
+            goto error;
+        }
+    } else if (*endptr == '\0') {
+        endvalue = value;
+    } else {
+        goto error;
+    }
+
+    if (endvalue >= MAX_CPUMASK_BITS) {
+        endvalue = MAX_CPUMASK_BITS - 1;
+        fprintf(stderr,
+            "qemu: NUMA: A max of %d host nodes are supported\n",
+             MAX_CPUMASK_BITS);
+    }
+
+    if (endvalue < value) {
+        goto error;
+    }
+
+    if (clear)
+        bitmap_clear(bm, value, endvalue - value + 1);
+    else
+        bitmap_set(bm, value, endvalue - value + 1);
+
+    return 0;
+
+error:
+    fprintf(stderr, "qemu: Invalid host NUMA nodes range: %s\n", hostnodes);
+    return -1;
+}
+
 static int numa_mem_parse(NumaMemOptions *opts)
 {
     uint64_t nodenr, mem_size;
@@ -110,6 +183,16 @@ static int numa_mem_parse(NumaMemOptions *opts)
     mem_size = opts->size;
     numa_info[nodenr].node_mem = mem_size;
 
+    const char *policy = opts->policy;
+    if (numa_mem_parse_policy(nodenr, policy) == -1) {
+        return -1;
+    }
+
+    const char *hostnodes = opts->host_nodes;
+    if (numa_mem_parse_hostnodes(nodenr, hostnodes) == -1) {
+        return -1;
+    }
+
     return 0;
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index f753a35..49eac70 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3717,9 +3717,15 @@
 #
 # @size: #optional memory size of this node
 #
+# @policy: #optional memory policy of this node
+#
+# @host-nodes: #optional host nodes for its memory policy
+#
 # Since 1.6
 ##
 { 'type': 'NumaMemOptions',
   'data': {
    '*nodeid':		'int',
-   '*size':		'size' }}
+   '*size':		'size',
+   '*policy':		'str',
+   '*host-nodes':	'str' }}
diff --git a/vl.c b/vl.c
index 5fdba97..dc8131c 100644
--- a/vl.c
+++ b/vl.c
@@ -2887,6 +2887,8 @@ int main(int argc, char **argv, char **envp)
     for (i = 0; i < MAX_NODES; i++) {
         numa_info[i].node_mem = 0;
         bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
+        bitmap_zero(numa_info[i].host_mem, MAX_CPUMASK_BITS);
+        numa_info[i].flags = NODE_HOST_NONE;
     }
 
     nb_numa_nodes = 0;
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 07/12] NUMA: split out the common range parser
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (5 preceding siblings ...)
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 06/12] NUMA: parse guest numa nodes memory policy Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 08/12] NUMA: set guest numa nodes memory policy Wanlong Gao
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

Since cpus parser and hostnode parser have the common range parser
part, split it out to the common range parser to avoid the duplicate
code.

Reviewed-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 numa.c | 86 ++++++++++++++++++++++++++++--------------------------------------
 1 file changed, 37 insertions(+), 49 deletions(-)

diff --git a/numa.c b/numa.c
index a2eceb1..736acbb 100644
--- a/numa.c
+++ b/numa.c
@@ -36,48 +36,59 @@ QemuOptsList qemu_numa_opts = {
     .desc = { { 0 } } /* validated with OptsVisitor */
 };
 
-static int numa_node_parse_cpus(int nodenr, const char *cpus)
+static int numa_range_parse(const char *str,
+                            unsigned long long *value,
+                            unsigned long long *endvalue)
 {
     char *endptr;
-    unsigned long long value, endvalue;
 
-    /* Empty CPU range strings will be considered valid, they will simply
-     * not set any bit in the CPU bitmap.
-     */
-    if (!*cpus) {
-        return 0;
+    if (parse_uint(str, value, &endptr, 10) < 0) {
+        return -1;
     }
 
-    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
-        goto error;
-    }
     if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
+        if (parse_uint_full(endptr + 1, endvalue, 10) < 0) {
+            return -1;
         }
     } else if (*endptr == '\0') {
-        endvalue = value;
+        *endvalue = *value;
     } else {
-        goto error;
+        return -1;
     }
 
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
+    if (*endvalue >= MAX_CPUMASK_BITS) {
+        *endvalue = MAX_CPUMASK_BITS - 1;
         fprintf(stderr,
-            "qemu: NUMA: A max of %d VCPUs are supported\n",
+            "qemu: NUMA: A max of %d VCPUs/Nodes are supported\n",
              MAX_CPUMASK_BITS);
     }
 
-    if (endvalue < value) {
-        goto error;
+    if (*endvalue < *value) {
+        return -1;
     }
 
-    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
     return 0;
+}
 
-error:
-    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
-    return -1;
+
+static int numa_node_parse_cpus(int nodenr, const char *cpus)
+{
+    unsigned long long value, endvalue;
+
+    /* Empty CPU range strings will be considered valid, they will simply
+     * not set any bit in the CPU bitmap.
+     */
+    if (!*cpus) {
+        return 0;
+    }
+
+    if (numa_range_parse(cpus, &value, &endvalue) < 0) {
+        fprintf(stderr, "Invalid NUMA CPU range: %s", cpus);
+        return -1;
+    }
+
+    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
+    return 0;
 }
 
 static int numa_node_parse(NumaNodeOptions *opts)
@@ -115,7 +126,6 @@ static int numa_mem_parse_policy(int nodenr, const char *policy)
 static int numa_mem_parse_hostnodes(int nodenr, const char *hostnodes)
 {
     unsigned long long value, endvalue;
-    char *endptr;
     bool clear = false;
     unsigned long *bm = numa_info[nodenr].host_mem;
 
@@ -134,27 +144,9 @@ static int numa_mem_parse_hostnodes(int nodenr, const char *hostnodes)
         return 0;
     }
 
-    if (parse_uint(hostnodes, &value, &endptr, 10) < 0)
-        goto error;
-    if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
-        }
-    } else if (*endptr == '\0') {
-        endvalue = value;
-    } else {
-        goto error;
-    }
-
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d host nodes are supported\n",
-             MAX_CPUMASK_BITS);
-    }
-
-    if (endvalue < value) {
-        goto error;
+    if (numa_range_parse(hostnodes, &value, &endvalue) < 0) {
+        fprintf(stderr, "Invalid host NUMA nodes range: %s", hostnodes);
+        return -1;
     }
 
     if (clear)
@@ -163,10 +155,6 @@ static int numa_mem_parse_hostnodes(int nodenr, const char *hostnodes)
         bitmap_set(bm, value, endvalue - value + 1);
 
     return 0;
-
-error:
-    fprintf(stderr, "qemu: Invalid host NUMA nodes range: %s\n", hostnodes);
-    return -1;
 }
 
 static int numa_mem_parse(NumaMemOptions *opts)
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 08/12] NUMA: set guest numa nodes memory policy
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (6 preceding siblings ...)
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 07/12] NUMA: split out the common range parser Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 09/12] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node Wanlong Gao
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

Set the guest numa nodes memory policies using the mbind(2)
system call node by node.
After this patch, we are able to set guest nodes memory policies
through the QEMU options, this arms to solve the guest cross
nodes memory access performance issue.
And as you all know, if PCI-passthrough is used,
direct-attached-device uses DMA transfer between device and qemu process.
All pages of the guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    =>kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
           => kvm_pin_pages()

So, with direct-attached-device, all guest page's page count will be +1 and
any page migration will not work. AutoNUMA won't too.

So, we should set the guest nodes memory allocation policies before
the pages are really mapped.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 numa.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/numa.c b/numa.c
index 736acbb..24c886b 100644
--- a/numa.c
+++ b/numa.c
@@ -28,6 +28,16 @@
 #include "qapi-visit.h"
 #include "qapi/opts-visitor.h"
 #include "qapi/dealloc-visitor.h"
+#include "exec/memory.h"
+
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#ifndef MPOL_F_RELATIVE_NODES
+#define MPOL_F_RELATIVE_NODES (1 << 14)
+#define MPOL_F_STATIC_NODES   (1 << 15)
+#endif
+#endif
 
 QemuOptsList qemu_numa_opts = {
     .name = "numa",
@@ -279,6 +289,75 @@ void set_numa_nodes(void)
     }
 }
 
+#ifdef CONFIG_NUMA
+static int node_parse_bind_mode(unsigned int nodeid)
+{
+    int bind_mode;
+
+    switch (numa_info[nodeid].flags & NODE_HOST_POLICY_MASK) {
+    case NODE_HOST_BIND:
+        bind_mode = MPOL_BIND;
+        break;
+    case NODE_HOST_INTERLEAVE:
+        bind_mode = MPOL_INTERLEAVE;
+        break;
+    case NODE_HOST_PREFERRED:
+        bind_mode = MPOL_PREFERRED;
+        break;
+    default:
+        bind_mode = MPOL_DEFAULT;
+        return bind_mode;
+    }
+
+    bind_mode |= (numa_info[nodeid].flags & NODE_HOST_RELATIVE) ?
+        MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
+
+    return bind_mode;
+}
+#endif
+
+static int set_node_mem_policy(unsigned int nodeid)
+{
+#ifdef CONFIG_NUMA
+    void *ram_ptr;
+    RAMBlock *block;
+    ram_addr_t len, ram_offset = 0;
+    int bind_mode;
+    int i;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (!strcmp(block->mr->name, "pc.ram")) {
+            break;
+        }
+    }
+
+    if (block->host == NULL)
+        return -1;
+
+    ram_ptr = block->host;
+    for (i = 0; i < nodeid; i++) {
+        len = numa_info[i].node_mem;
+        ram_offset += len;
+    }
+
+    len = numa_info[i].node_mem;
+    bind_mode = node_parse_bind_mode(i);
+
+    /* This is a workaround for a long standing bug in Linux'
+     * mbind implementation, which cuts off the last specified
+     * node. To stay compatible should this bug be fixed, we
+     * specify one more node and zero this one out.
+     */
+    clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
+    if (mbind(ram_ptr + ram_offset, len, bind_mode,
+        numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
+            perror("mbind");
+            return -1;
+    }
+#endif
+    return 0;
+}
+
 void set_numa_modes(void)
 {
     CPUState *cpu;
@@ -291,4 +370,11 @@ void set_numa_modes(void)
             }
         }
     }
+
+    for (i = 0; i < nb_numa_nodes; i++) {
+        if (set_node_mem_policy(i) == -1) {
+            fprintf(stderr,
+                    "qemu: can't set host memory policy for node%d\n", i);
+        }
+    }
 }
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 09/12] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (7 preceding siblings ...)
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 08/12] NUMA: set guest numa nodes memory policy Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17 12:36   ` Eric Blake
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 10/12] NUMA: add hmp command set-mem-policy Wanlong Gao
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

This QMP command allows user set guest node's memory policy
through the QMP protocol. The qmp-shell command is like:
    set-mem-policy nodeid=0 policy=membind host-nodes=0-1

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 numa.c           | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 19 +++++++++++++++++++
 qmp-commands.hx  | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 109 insertions(+)

diff --git a/numa.c b/numa.c
index 24c886b..3befc56 100644
--- a/numa.c
+++ b/numa.c
@@ -29,6 +29,7 @@
 #include "qapi/opts-visitor.h"
 #include "qapi/dealloc-visitor.h"
 #include "exec/memory.h"
+#include "qmp-commands.h"
 
 #ifdef CONFIG_NUMA
 #include <numa.h>
@@ -378,3 +379,57 @@ void set_numa_modes(void)
         }
     }
 }
+
+void qmp_set_mem_policy(int64_t nodeid, bool has_policy, const char *policy,
+                        bool has_host_nodes, const char *host_nodes, Error **errp)
+{
+    unsigned int flags;
+    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+
+    if (nodeid >= nb_numa_nodes) {
+        error_setg(errp, "Only has '%d' NUMA nodes", nb_numa_nodes);
+        return;
+    }
+
+    bitmap_copy(host_mem, numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+    flags = numa_info[nodeid].flags;
+
+    numa_info[nodeid].flags = NODE_HOST_NONE;
+    bitmap_zero(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+
+    if (!has_policy) {
+        if (set_node_mem_policy(nodeid) == -1) {
+            error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
+            goto error;
+        }
+        return;
+    }
+
+    if (numa_mem_parse_policy(nodeid, policy) == -1) {
+        error_setg(errp, "Failed to parse memory policy: %s", policy);
+        goto error;
+    }
+
+    if (!has_host_nodes) {
+        bitmap_fill(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+    }
+
+    if (host_nodes) {
+        if (numa_mem_parse_hostnodes(nodeid, host_nodes) == -1) {
+            error_setg(errp, "Failed to parse host nodes range %s", host_nodes);
+            goto error;
+        }
+    }
+
+    if (set_node_mem_policy(nodeid) == -1) {
+        error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
+        goto error;
+    }
+
+    return;
+
+error:
+    bitmap_copy(numa_info[nodeid].host_mem, host_mem, MAX_CPUMASK_BITS);
+    numa_info[nodeid].flags = flags;
+    return;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index 49eac70..fd4c615 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3729,3 +3729,22 @@
    '*size':		'size',
    '*policy':		'str',
    '*host-nodes':	'str' }}
+
+##
+# @set-mem-policy:
+#
+# Set the host memory binding policy for guest NUMA node.
+#
+# @nodeid: The node ID of guest NUMA node to set memory policy to.
+#
+# @policy: #optional The memory policy to be set (default 'default').
+#
+# @host-nodes: #optional The host nodes range for memory policy.
+#
+# Returns: Nothing on success
+#
+# Since: 1.6
+##
+{ 'command': 'set-mem-policy',
+  'data': {'nodeid': 'int', '*policy': 'str',
+           '*host-nodes': 'str'} }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e075df4..aac88e0 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3047,3 +3047,38 @@ Example:
 <- { "return": {} }
 
 EQMP
+
+    {
+        .name      = "set-mem-policy",
+        .args_type = "nodeid:i,policy:s?,host-nodes:s?",
+        .help      = "Set the host memory binding policy for guest NUMA node",
+        .mhandler.cmd_new = qmp_marshal_input_set_mem_policy,
+    },
+
+SQMP
+set-mem-policy
+------
+
+Set the host memory binding policy for guest NUMA node
+
+Arguments:
+
+- "nodeid": The nodeid of guest NUMA node to set memory policy to.
+            (json-int)
+- "policy": The memory policy string to set.
+            (json-string, optional)
+- "host-nodes": The host nodes contained to this memory policy.
+                (json-string, optional)
+
+Example:
+
+-> { "execute": "set-mem-policy", "arguments": { "nodeid": 0, "policy": "membind",
+                                                 "host-nodes": "0-1" }}
+<- { "return": {} }
+
+Notes:
+    1. If "policy" is not set, the memory policy of this "nodeid" will be set
+       to "default".
+    2. If "host-nodes" is not set, the node mask of this "policy" will be set
+       to "all".
+EQMP
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 10/12] NUMA: add hmp command set-mem-policy
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (8 preceding siblings ...)
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 09/12] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 11/12] NUMA: add qmp command query-numa Wanlong Gao
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 12/12] NUMA: convert hmp command info_numa to use qmp command query_numa Wanlong Gao
  11 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

Add hmp command set-mem-policy to set host memory policy for a guest
NUMA node. Then we can also set node's memory policy using
the monitor command like:
    (qemu) set-mem-policy 0 policy=membind,host-nodes=0-1

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hmp-commands.hx | 16 ++++++++++++++++
 hmp.c           | 36 ++++++++++++++++++++++++++++++++++++
 hmp.h           |  1 +
 3 files changed, 53 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8c6b91a..fe3a26f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1587,6 +1587,22 @@ Executes a qemu-io command on the given block device.
 ETEXI
 
     {
+        .name       = "set-mem-policy",
+        .args_type  = "nodeid:i,args:s?",
+        .params     = "nodeid [args]",
+        .help       = "set host memory policy for a guest NUMA node",
+        .mhandler.cmd = hmp_set_mem_policy,
+    },
+
+STEXI
+@item set-mem-policy @var{nodeid} @var{args}
+@findex set-mem-policy
+
+Set host memory policy for a guest NUMA node
+
+ETEXI
+
+    {
         .name       = "info",
         .args_type  = "item:s?",
         .params     = "[subcommand]",
diff --git a/hmp.c b/hmp.c
index dc4d8d4..7798339 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1510,3 +1510,39 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 
     hmp_handle_error(mon, &err);
 }
+
+void hmp_set_mem_policy(Monitor *mon, const QDict *qdict)
+{
+    Error *local_err = NULL;
+    bool has_policy = true;
+    bool has_host_nodes = true;
+    const char *policy = NULL;
+    const char *host_nodes = NULL;
+    QemuOpts *opts;
+
+    uint64_t nodeid = qdict_get_int(qdict, "nodeid");
+    const char *args = qdict_get_try_str(qdict, "args");
+
+    if (args == NULL) {
+        has_policy = false;
+        has_host_nodes = false;
+    } else {
+        opts = qemu_opts_parse(qemu_find_opts("numa"), args, 1);
+        if (opts == NULL) {
+            monitor_printf(mon, "Parsing memory policy args failed\n");
+            return;
+        } else {
+            policy = qemu_opt_get(opts, "policy");
+            if (policy == NULL) {
+                has_policy = false;
+            }
+            host_nodes = qemu_opt_get(opts, "hostnode");
+            if (host_nodes == NULL) {
+                has_host_nodes = false;
+            }
+        }
+    }
+
+    qmp_set_mem_policy(nodeid, has_policy, policy, has_host_nodes, host_nodes, &local_err);
+    hmp_handle_error(mon, &local_err);
+}
diff --git a/hmp.h b/hmp.h
index 6c3bdcd..ae09525 100644
--- a/hmp.h
+++ b/hmp.h
@@ -87,5 +87,6 @@ void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict);
 void hmp_chardev_add(Monitor *mon, const QDict *qdict);
 void hmp_chardev_remove(Monitor *mon, const QDict *qdict);
 void hmp_qemu_io(Monitor *mon, const QDict *qdict);
+void hmp_set_mem_policy(Monitor *mon, const QDict *qdict);
 
 #endif
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 11/12] NUMA: add qmp command query-numa
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (9 preceding siblings ...)
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 10/12] NUMA: add hmp command set-mem-policy Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  2013-07-17 12:41   ` Eric Blake
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 12/12] NUMA: convert hmp command info_numa to use qmp command query_numa Wanlong Gao
  11 siblings, 1 reply; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

Add qmp command query-numa to show guest NUMA information.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 numa.c           | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 34 ++++++++++++++++++++++++
 qmp-commands.hx  | 49 +++++++++++++++++++++++++++++++++++
 3 files changed, 161 insertions(+)

diff --git a/numa.c b/numa.c
index 3befc56..21fc7b9 100644
--- a/numa.c
+++ b/numa.c
@@ -433,3 +433,81 @@ error:
     numa_info[nodeid].flags = flags;
     return;
 }
+
+NUMAInfoList *qmp_query_numa(Error **errp)
+{
+    NUMAInfoList *head = NULL, *cur_item = NULL;
+    CPUState *cpu;
+    int i;
+
+    for (i = 0; i < nb_numa_nodes; i++) {
+        NUMAInfoList *info;
+        intList *cur_cpu_item = NULL;
+        info = g_malloc0(sizeof(*info));
+        info->value = g_malloc0(sizeof(*info->value));
+        info->value->nodeid = i;
+        for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
+            if (cpu->numa_node == i) {
+                intList *node_cpu = g_malloc0(sizeof(*node_cpu));
+                node_cpu->value = cpu->cpu_index;
+
+                if (!cur_cpu_item) {
+                    info->value->cpus = cur_cpu_item = node_cpu;
+                } else {
+                    cur_cpu_item->next = node_cpu;
+                    cur_cpu_item = node_cpu;
+                }
+            }
+        }
+        info->value->memory = numa_info[i].node_mem >> 20;
+
+#ifdef CONFIG_NUMA
+        switch (numa_info[i].flags & NODE_HOST_POLICY_MASK) {
+        case NODE_HOST_BIND:
+            info->value->policy = g_strdup("membind");
+            break;
+        case NODE_HOST_INTERLEAVE:
+            info->value->policy = g_strdup("interleave");
+            break;
+        case NODE_HOST_PREFERRED:
+            info->value->policy = g_strdup("preferred");
+            break;
+        default:
+            info->value->policy = g_strdup("defalut");
+            break;
+        }
+
+        if (numa_info[i].flags & NODE_HOST_RELATIVE) {
+            info->value->relative = true;
+        }
+
+        unsigned long first, next;
+        next = first = find_first_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS);
+        intList *cur_node_item = g_malloc0(sizeof(*cur_node_item));
+        cur_node_item->value = first;
+        info->value->host_nodes = cur_node_item;
+        do {
+            if (next == numa_max_node())
+                break;
+            next = find_next_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS,
+                                 next + 1);
+            if (next > numa_max_node() || next == MAX_CPUMASK_BITS)
+                break;
+
+            intList *host_node = g_malloc0(sizeof(*host_node));
+            host_node->value = next;
+            cur_node_item->next = host_node;
+            cur_node_item = host_node;
+        } while (true);
+#endif
+
+        if (!cur_item) {
+            head = cur_item = info;
+        } else {
+            cur_item->next = info;
+            cur_item = info;
+        }
+    }
+
+    return head;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index fd4c615..085ba75 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3748,3 +3748,37 @@
 { 'command': 'set-mem-policy',
   'data': {'nodeid': 'int', '*policy': 'str',
            '*host-nodes': 'str'} }
+##
+# @NUMAInfo:
+#
+# Information about guest NUMA nodes
+#
+# @nodeid: NUMA node ID
+#
+# @cpus: VCPUs contained to this node
+#
+# @memory: memory size of this node
+#
+# @policy: memory policy of this node
+#
+# @relative: if host nodes is relative for memory policy
+#
+# @host-nodes: host nodes for its memory policy
+#
+# Since: 1.6
+#
+##
+{ 'type': 'NUMAInfo',
+  'data': {'nodeid': 'int', 'cpus': ['int'], 'memory': 'int',
+           'policy': 'str', 'relative': 'bool', 'host-nodes': ['int'] }}
+
+##
+# @query-numa:
+#
+# Returns a list of information about each guest node.
+#
+# Returns: a list of @NUMAInfo for each guest node
+#
+# Since: 1.6
+##
+{ 'command': 'query-numa', 'returns': ['NUMAInfo'] }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index aac88e0..998fb00 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3082,3 +3082,52 @@ Notes:
     2. If "host-nodes" is not set, the node mask of this "policy" will be set
        to "all".
 EQMP
+
+    {
+        .name = "query-numa",
+        .args_type = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_numa,
+    },
+
+SQMP
+query-numa
+---------
+
+Show NUMA information.
+
+Return a json-array. Each NUMA node is represented by a json-object,
+which contains:
+
+- "nodeid": NUMA node ID (json-int)
+- "cpus": a json-arry of contained VCPUs
+- "memory": amount of memory in each node in MB (json-int)
+- "policy": memory policy of this node (json-string)
+- "relative": if host nodes is relative for its memory policy (json-bool)
+- "host-nodes": a json-array of host nodes for its memory policy
+
+Arguments:
+
+Example:
+
+-> { "excute": "query-numa" }
+<- { "return":[
+        {
+            "nodeid": 0,
+            "cpus": [0, 1],
+            "memory": 512,
+            "policy": "membind",
+            "relative": false,
+            "host-nodes": [0, 1]
+        },
+        {
+            "nodeid": 1,
+            "cpus": [2, 3],
+            "memory": 512,
+            "policy": "interleave",
+            "relative": false,
+            "host-nodes": [1]
+        }
+     ]
+   }
+
+EQMP
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH V5 12/12] NUMA: convert hmp command info_numa to use qmp command query_numa
  2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (10 preceding siblings ...)
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 11/12] NUMA: add qmp command query-numa Wanlong Gao
@ 2013-07-17  9:29 ` Wanlong Gao
  11 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	y-goto, pbonzini, afaerber, gaowanlong

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hmp.c     | 34 ++++++++++++++++++++++++++++++++++
 hmp.h     |  1 +
 monitor.c | 21 +--------------------
 3 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/hmp.c b/hmp.c
index 7798339..66c2fef 100644
--- a/hmp.c
+++ b/hmp.c
@@ -24,6 +24,7 @@
 #include "ui/console.h"
 #include "block/qapi.h"
 #include "qemu-io.h"
+#include "sysemu/sysemu.h"
 
 static void hmp_handle_error(Monitor *mon, Error **errp)
 {
@@ -1546,3 +1547,36 @@ void hmp_set_mem_policy(Monitor *mon, const QDict *qdict)
     qmp_set_mem_policy(nodeid, has_policy, policy, has_host_nodes, host_nodes, &local_err);
     hmp_handle_error(mon, &local_err);
 }
+
+void hmp_info_numa(Monitor *mon, const QDict *qdict)
+{
+    NUMAInfoList *node_list, *node;
+    intList *head;
+    int nodeid;
+
+    node_list = qmp_query_numa(NULL);
+
+    monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
+    for (node = node_list; node; node = node->next) {
+        nodeid = node->value->nodeid;
+        monitor_printf(mon, "node %d cpus:", nodeid);
+        head = node->value->cpus;
+        for (head = node->value->cpus; head != NULL; head = head->next) {
+            monitor_printf(mon, " %d", (int)head->value);
+        }
+        monitor_printf(mon, "\n");
+        monitor_printf(mon, "node %d size: %" PRId64 " MB\n",
+                       nodeid, node->value->memory);
+        monitor_printf(mon, "node %d policy: %s\n",
+                       nodeid, node->value->policy ? : " ");
+        monitor_printf(mon, "node %d relative: %s\n", nodeid,
+                       node->value->relative ? "true" : "false");
+        monitor_printf(mon, "node %d host-nodes:", nodeid);
+        for (head = node->value->host_nodes; head != NULL; head = head->next) {
+            monitor_printf(mon, " %d", (int)head->value);
+        }
+        monitor_printf(mon, "\n");
+    }
+
+    qapi_free_NUMAInfoList(node_list);
+}
diff --git a/hmp.h b/hmp.h
index ae09525..56a5efd 100644
--- a/hmp.h
+++ b/hmp.h
@@ -37,6 +37,7 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict);
 void hmp_info_pci(Monitor *mon, const QDict *qdict);
 void hmp_info_block_jobs(Monitor *mon, const QDict *qdict);
 void hmp_info_tpm(Monitor *mon, const QDict *qdict);
+void hmp_info_numa(Monitor *mon, const QDict *qdict);
 void hmp_quit(Monitor *mon, const QDict *qdict);
 void hmp_stop(Monitor *mon, const QDict *qdict);
 void hmp_system_reset(Monitor *mon, const QDict *qdict);
diff --git a/monitor.c b/monitor.c
index 566709a..87133fc 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1803,25 +1803,6 @@ static void do_info_mtree(Monitor *mon, const QDict *qdict)
     mtree_info((fprintf_function)monitor_printf, mon);
 }
 
-static void do_info_numa(Monitor *mon, const QDict *qdict)
-{
-    int i;
-    CPUState *cpu;
-
-    monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
-    for (i = 0; i < nb_numa_nodes; i++) {
-        monitor_printf(mon, "node %d cpus:", i);
-        for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
-            if (cpu->numa_node == i) {
-                monitor_printf(mon, " %d", cpu->cpu_index);
-            }
-        }
-        monitor_printf(mon, "\n");
-        monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-            numa_info[i].node_mem >> 20);
-    }
-}
-
 #ifdef CONFIG_PROFILER
 
 int64_t qemu_time;
@@ -2589,7 +2570,7 @@ static mon_cmd_t info_cmds[] = {
         .args_type  = "",
         .params     = "",
         .help       = "show NUMA information",
-        .mhandler.cmd = do_info_numa,
+        .mhandler.cmd = hmp_info_numa,
     },
     {
         .name       = "usb",
-- 
1.8.3.2.634.g7a3187e

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions Wanlong Gao
@ 2013-07-17 10:35   ` Laszlo Ersek
  2013-07-17 11:11     ` Paolo Bonzini
  2013-07-17 12:24     ` Eric Blake
  0 siblings, 2 replies; 34+ messages in thread
From: Laszlo Ersek @ 2013-07-17 10:35 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, peter.huangpeng, lcapitulino,
	bsd, pbonzini, y-goto, afaerber

comments below

On 07/17/13 11:29, Wanlong Gao wrote:
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  qapi-schema.json | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 7b9fef1..f753a35 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -3679,3 +3679,47 @@
>              '*cpuid-input-ecx': 'int',
>              'cpuid-register': 'X86CPURegister32',
>              'features': 'int' } }
> +
> +##
> +# @NumaOptions
> +#
> +# A discriminated record of NUMA options.
> +#
> +# Since 1.6
> +##
> +{ 'union': 'NumaOptions',
> +  'data': {
> +    'node':	'NumaNodeOptions',
> +    'mem':	'NumaMemOptions' }}
> +
> +##
> +# @NumaNodeOptions
> +#
> +# Create a guest NUMA node.
> +#
> +# @nodeid: #optional NUMA node ID
> +#
> +# @cpus: #optional VCPUs belong to this node
> +#
> +# Since: 1.6
> +##
> +{ 'type': 'NumaNodeOptions',
> +  'data': {
> +   '*nodeid':		'int',
> +   '*cpus':		'str' }}
> +

Should we document the format for "cpus" here too?

> +##
> +# @NumaMemOptions
> +#
> +# Set memory information of guest NUMA node.
> +#
> +# @nodeid: #optional NUMA node ID
> +#
> +# @size: #optional memory size of this node
> +#
> +# Since 1.6
> +##
> +{ 'type': 'NumaMemOptions',
> +  'data': {
> +   '*nodeid':		'int',
> +   '*size':		'size' }}
> 

Looks good in general but I'm not sure if hardware tabs are allowed (or
usual) in this file.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 02/12] NUMA: split -numa option
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 02/12] NUMA: split -numa option Wanlong Gao
@ 2013-07-17 11:00   ` Laszlo Ersek
  2013-07-17 11:14     ` Paolo Bonzini
  2013-07-17 11:13   ` Paolo Bonzini
  1 sibling, 1 reply; 34+ messages in thread
From: Laszlo Ersek @ 2013-07-17 11:00 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, peter.huangpeng, lcapitulino,
	bsd, pbonzini, y-goto, afaerber

I'm reviewing this with respect to opts-visitor usage:

On 07/17/13 11:29, Wanlong Gao wrote:
> Change -numa option like following as Paolo suggested:
>     -numa node,nodeid=0,cpus=0-1 \
>     -numa mem,nodeid=0,size=1G
> 
> This new option will make later coming memory hotplug better.
> And this new option is implemented using OptsVisitor.
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  Makefile.target         |   2 +-
>  include/sysemu/sysemu.h |   3 +
>  numa.c                  | 164 ++++++++++++++++++++++++++++++++++++++++++++++++
>  qemu-options.hx         |   6 +-
>  vl.c                    | 107 +++----------------------------
>  5 files changed, 182 insertions(+), 100 deletions(-)
>  create mode 100644 numa.c
> 
> diff --git a/Makefile.target b/Makefile.target
> index 9a49852..7e1fddf 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -113,7 +113,7 @@ endif #CONFIG_BSD_USER
>  #########################################################
>  # System emulator target
>  ifdef CONFIG_SOFTMMU
> -obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o
> +obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
>  obj-y += qtest.o
>  obj-y += hw/
>  obj-$(CONFIG_FDT) += device_tree.o
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 3caeb66..cf8e6e5 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -132,6 +132,9 @@ extern QEMUClock *rtc_clock;
>  extern int nb_numa_nodes;
>  extern uint64_t node_mem[MAX_NODES];
>  extern unsigned long *node_cpumask[MAX_NODES];
> +extern QemuOptsList qemu_numa_opts;
> +int numa_init_func(QemuOpts *opts, void *opaque);
> +void set_numa_nodes(void);
>  
>  #define MAX_OPTION_ROMS 16
>  typedef struct QEMUOptionRom {
> diff --git a/numa.c b/numa.c
> new file mode 100644
> index 0000000..da68c4b
> --- /dev/null
> +++ b/numa.c
> @@ -0,0 +1,164 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2013 Fujitsu Ltd.
> + * Author: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "sysemu/sysemu.h"
> +#include "qemu/bitmap.h"
> +#include "qapi-visit.h"
> +#include "qapi/opts-visitor.h"
> +#include "qapi/dealloc-visitor.h"
> +
> +QemuOptsList qemu_numa_opts = {
> +    .name = "numa",
> +    .implied_opt_name = "type",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head),
> +    .desc = { { 0 } } /* validated with OptsVisitor */
> +};

Looks OK.

> +
> +static int numa_node_parse_cpus(int nodenr, const char *cpus)
> +{
> +    char *endptr;
> +    unsigned long long value, endvalue;
> +
> +    /* Empty CPU range strings will be considered valid, they will simply
> +     * not set any bit in the CPU bitmap.
> +     */
> +    if (!*cpus) {
> +        return 0;
> +    }

I think this can crash, can't it? All option arguments (struct members)
you introduce in 01/12 are optional. For scalar types (boolean, int etc)
you can usually simply check the member itself (field XXXX will have
value 0 if has_XXXX is false), but strings, sub-structs etc. are
generated as pointers, and are NULL if has_XXXX is false.

IOW numa_node_parse() may pass in a NULL "cpus" if ",cpus=..." is
completely absent.

> +
> +    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
> +        goto error;
> +    }
> +    if (*endptr == '-') {
> +        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
> +            goto error;
> +        }
> +    } else if (*endptr == '\0') {
> +        endvalue = value;
> +    } else {
> +        goto error;
> +    }
> +
> +    if (endvalue >= MAX_CPUMASK_BITS) {
> +        endvalue = MAX_CPUMASK_BITS - 1;
> +        fprintf(stderr,
> +            "qemu: NUMA: A max of %d VCPUs are supported\n",
> +             MAX_CPUMASK_BITS);
> +    }
> +
> +    if (endvalue < value) {
> +        goto error;
> +    }
> +
> +    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
> +    return 0;
> +
> +error:
> +    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
> +    return -1;
> +}

The function looks OK to me otherwise.

BTW are discontiguous CPU maps unsupported on purpose?

Hm, I think that's actually supported, the user can pass

  -numa node,nodeid=1,cpus=1-2 -numa node,nodeid=1,cpus=5-6

and the bitmask will be the union.

If that's the case, opts-visitor would support the following notation
too (you'd have to turn "cpus" into a list of strings):

  -numa node,nodeid=1,cpus=1-2,cpus=5-6

But I'm not sure if that's the intent and/or if it would be worth the churn.

> +
> +static int numa_node_parse(NumaNodeOptions *opts)
> +{
> +    uint64_t nodenr;
> +    const char *cpus = NULL;
> +
> +    nodenr = opts->nodeid;
> +    if (nodenr >= MAX_NODES) {
> +        fprintf(stderr, "qemu: Max number of NUMA nodes reached: %d\n",
> +                (int)nodenr);
> +        return -1;
> +    }
> +
> +    cpus = opts->cpus;
> +    return numa_node_parse_cpus(nodenr, cpus);
> +}

"nodeid" can be negative (has type 'int' in the json, corresponding to
int64_t in generated C code). I guess casting it to uint64_t here
catches it in practice, but you could also change the type to something
unsigned in the json.

Also, what should happen if nodeid was not specified? (Nodenr will have
value 0.) If that's intended, maybe it should be documented in the json.

> +
> +static int numa_mem_parse(NumaMemOptions *opts)
> +{
> +    uint64_t nodenr, mem_size;
> +
> +    nodenr = opts->nodeid;
> +    if (nodenr >= MAX_NODES) {
> +        fprintf(stderr, "qemu: Max number of NUMA nodes reached: %d\n",
> +                (int)nodenr);

PRIu64 and no casting would be more fortunate I believe (or, if you
change the nodeid type in the json to uint32 for example, that format
specified).

> +        return -1;
> +    }
> +
> +    mem_size = opts->size;
> +    node_mem[nodenr] = mem_size;
> +
> +    return 0;
> +}
> +
> +int numa_init_func(QemuOpts *opts, void *opaque)
> +{
> +    NumaOptions *object = NULL;
> +    Error *err = NULL;
> +    int ret = 0;
> +
> +    {
> +        OptsVisitor *ov = opts_visitor_new(opts);
> +        visit_type_NumaOptions(opts_get_visitor(ov), &object, NULL, &err);
> +        opts_visitor_cleanup(ov);
> +    }
> +
> +    if (error_is_set(&err)) {
> +        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
> +        error_free(err);
> +        ret = -1;
> +        goto error;
> +    }
> +
> +    switch (object->kind) {
> +    case NUMA_OPTIONS_KIND_NODE:
> +        if (nb_numa_nodes >= MAX_NODES) {
> +            fprintf(stderr, "qemu: too many NUMA nodes\n");
> +            ret = -1;
> +            goto error;
> +        }
> +        nb_numa_nodes++;
> +        ret = numa_node_parse(object->node);
> +        break;

This assumes that nodeid values are unique.

> +    case NUMA_OPTIONS_KIND_MEM:
> +        ret = numa_mem_parse(object->mem);
> +        break;
> +    default:
> +        fprintf(stderr, "qemu: Invalid NUMA options type.\n");
> +        ret = -1;
> +        goto error;
> +    }

Small nit: this last "goto error" is useless.

> +
> +error:
> +    if (object) {
> +        QapiDeallocVisitor *dv = qapi_dealloc_visitor_new();
> +        visit_type_NumaOptions(qapi_dealloc_get_visitor(dv),
> +                               &object, NULL, NULL);
> +        qapi_dealloc_visitor_cleanup(dv);
> +    }
> +
> +    return ret;
> +}
> +

I think the general logic is OK.

> diff --git a/qemu-options.hx b/qemu-options.hx
> index 4e98b4f..7ec4486 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -95,11 +95,13 @@ specifies the maximum number of hotpluggable CPUs.
>  ETEXI
>  
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> -    "-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
> +    "-numa node[,nodeid=node][,cpus=cpu[-cpu]]\n"
> +    "-numa mem[,nodeid=node][,size=size]\n"
> +    , QEMU_ARCH_ALL)
>  STEXI
>  @item -numa @var{opts}
>  @findex -numa
> -Simulate a multi node NUMA system. If mem and cpus are omitted, resources
> +Simulate a multi node NUMA system. If @var{size} and @var{cpus} are omitted, resources
>  are split equally.
>  ETEXI
>  
> diff --git a/vl.c b/vl.c
> index 25b8f2f..d3e6d8c 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1330,102 +1330,6 @@ char *get_boot_devices_list(size_t *size)
>      return list;
>  }
>  
> -static void numa_node_parse_cpus(int nodenr, const char *cpus)
> -{
> -    char *endptr;
> -    unsigned long long value, endvalue;
> -
> -    /* Empty CPU range strings will be considered valid, they will simply
> -     * not set any bit in the CPU bitmap.
> -     */
> -    if (!*cpus) {
> -        return;
> -    }
> -
> -    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
> -        goto error;
> -    }
> -    if (*endptr == '-') {
> -        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
> -            goto error;
> -        }
> -    } else if (*endptr == '\0') {
> -        endvalue = value;
> -    } else {
> -        goto error;
> -    }
> -
> -    if (endvalue >= MAX_CPUMASK_BITS) {
> -        endvalue = MAX_CPUMASK_BITS - 1;
> -        fprintf(stderr,
> -            "qemu: NUMA: A max of %d VCPUs are supported\n",
> -             MAX_CPUMASK_BITS);
> -    }
> -
> -    if (endvalue < value) {
> -        goto error;
> -    }
> -
> -    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
> -    return;
> -
> -error:
> -    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
> -    exit(1);
> -}
> -
> -static void numa_add(const char *optarg)
> -{
> -    char option[128];
> -    char *endptr;
> -    unsigned long long nodenr;
> -
> -    optarg = get_opt_name(option, 128, optarg, ',');
> -    if (*optarg == ',') {
> -        optarg++;
> -    }
> -    if (!strcmp(option, "node")) {
> -
> -        if (nb_numa_nodes >= MAX_NODES) {
> -            fprintf(stderr, "qemu: too many NUMA nodes\n");
> -            exit(1);
> -        }
> -
> -        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
> -            nodenr = nb_numa_nodes;
> -        } else {
> -            if (parse_uint_full(option, &nodenr, 10) < 0) {
> -                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
> -                exit(1);
> -            }
> -        }
> -
> -        if (nodenr >= MAX_NODES) {
> -            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
> -            exit(1);
> -        }
> -
> -        if (get_param_value(option, 128, "mem", optarg) == 0) {
> -            node_mem[nodenr] = 0;
> -        } else {
> -            int64_t sval;
> -            sval = strtosz(option, &endptr);
> -            if (sval < 0 || *endptr) {
> -                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
> -                exit(1);
> -            }
> -            node_mem[nodenr] = sval;
> -        }
> -        if (get_param_value(option, 128, "cpus", optarg) != 0) {
> -            numa_node_parse_cpus(nodenr, option);
> -        }
> -        nb_numa_nodes++;
> -    } else {
> -        fprintf(stderr, "Invalid -numa option: %s\n", option);
> -        exit(1);
> -    }
> -}
> -
>  static QemuOptsList qemu_smp_opts = {
>      .name = "smp-opts",
>      .implied_opt_name = "cpus",
> @@ -2961,6 +2865,7 @@ int main(int argc, char **argv, char **envp)
>      qemu_add_opts(&qemu_tpmdev_opts);
>      qemu_add_opts(&qemu_realtime_opts);
>      qemu_add_opts(&qemu_msg_opts);
> +    qemu_add_opts(&qemu_numa_opts);
>  
>      runstate_init();
>  
> @@ -3147,7 +3052,10 @@ int main(int argc, char **argv, char **envp)
>                  }
>                  break;
>              case QEMU_OPTION_numa:
> -                numa_add(optarg);
> +                opts = qemu_opts_parse(qemu_find_opts("numa"), optarg, 1);
> +                if (!opts) {
> +                    exit(1);
> +                }
>                  break;
>              case QEMU_OPTION_display:
>                  display_type = select_display(optarg);
> @@ -4226,6 +4134,11 @@ int main(int argc, char **argv, char **envp)
>  
>      register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, NULL);
>  
> +    if (qemu_opts_foreach(qemu_find_opts("numa"), numa_init_func,
> +                          NULL, 1) != 0) {
> +        exit(1);
> +    }
> +
>      if (nb_numa_nodes > 0) {
>          int i;
>  
> 

Please feel free to ignore any of the above notes if you disagree with them.

Would it be OK with you if I skipped the rest of the series? :)

Thanks,
Laszlo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 10:35   ` Laszlo Ersek
@ 2013-07-17 11:11     ` Paolo Bonzini
  2013-07-17 13:16       ` Wanlong Gao
  2013-07-17 12:24     ` Eric Blake
  1 sibling, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2013-07-17 11:11 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: aliguori, ehabkost, qemu-devel, peter.huangpeng, lcapitulino,
	bsd, y-goto, afaerber, Wanlong Gao

Il 17/07/2013 12:35, Laszlo Ersek ha scritto:
> comments below
> 
> On 07/17/13 11:29, Wanlong Gao wrote:
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>> ---
>>  qapi-schema.json | 44 ++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 44 insertions(+)
>>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 7b9fef1..f753a35 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -3679,3 +3679,47 @@
>>              '*cpuid-input-ecx': 'int',
>>              'cpuid-register': 'X86CPURegister32',
>>              'features': 'int' } }
>> +
>> +##
>> +# @NumaOptions
>> +#
>> +# A discriminated record of NUMA options.
>> +#
>> +# Since 1.6
>> +##
>> +{ 'union': 'NumaOptions',
>> +  'data': {
>> +    'node':	'NumaNodeOptions',
>> +    'mem':	'NumaMemOptions' }}
>> +
>> +##
>> +# @NumaNodeOptions
>> +#
>> +# Create a guest NUMA node.
>> +#
>> +# @nodeid: #optional NUMA node ID
>> +#
>> +# @cpus: #optional VCPUs belong to this node
>> +#
>> +# Since: 1.6
>> +##
>> +{ 'type': 'NumaNodeOptions',
>> +  'data': {
>> +   '*nodeid':		'int',
>> +   '*cpus':		'str' }}
>> +
> 
> Should we document the format for "cpus" here too?

I think so---good catch.

>> +##
>> +# @NumaMemOptions
>> +#
>> +# Set memory information of guest NUMA node.
>> +#
>> +# @nodeid: #optional NUMA node ID
>> +#
>> +# @size: #optional memory size of this node
>> +#
>> +# Since 1.6
>> +##
>> +{ 'type': 'NumaMemOptions',
>> +  'data': {
>> +   '*nodeid':		'int',
>> +   '*size':		'size' }}
>>
> 
> Looks good in general but I'm not sure if hardware tabs are allowed (or
> usual) in this file.

Definitely not usual---in fact not used at all, so they're probably not
allowed too.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 02/12] NUMA: split -numa option
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 02/12] NUMA: split -numa option Wanlong Gao
  2013-07-17 11:00   ` Laszlo Ersek
@ 2013-07-17 11:13   ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2013-07-17 11:13 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, lersek, qemu-devel, lcapitulino, bsd, y-goto,
	peter.huangpeng, afaerber

Il 17/07/2013 11:29, Wanlong Gao ha scritto:
> Change -numa option like following as Paolo suggested:
>     -numa node,nodeid=0,cpus=0-1 \
>     -numa mem,nodeid=0,size=1G
> 
> This new option will make later coming memory hotplug better.
> And this new option is implemented using OptsVisitor.

It shouldn't be changed.  The old "mem" suboption should remain valid
(as legacy), and it should be equivalent to a separate "-numa mem"
argument.  Never, ever break existing command lines.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 02/12] NUMA: split -numa option
  2013-07-17 11:00   ` Laszlo Ersek
@ 2013-07-17 11:14     ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2013-07-17 11:14 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: aliguori, ehabkost, qemu-devel, peter.huangpeng, lcapitulino,
	bsd, y-goto, afaerber, Wanlong Gao

Il 17/07/2013 13:00, Laszlo Ersek ha scritto:
> Hm, I think that's actually supported, the user can pass
> 
>   -numa node,nodeid=1,cpus=1-2 -numa node,nodeid=1,cpus=5-6
> 
> and the bitmask will be the union.
> 
> If that's the case, opts-visitor would support the following notation
> too (you'd have to turn "cpus" into a list of strings):
> 
>   -numa node,nodeid=1,cpus=1-2,cpus=5-6
> 
> But I'm not sure if that's the intent and/or if it would be worth the churn.

It is the intent, and it would be worth the churn. :)

Excellent review!

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 10:35   ` Laszlo Ersek
  2013-07-17 11:11     ` Paolo Bonzini
@ 2013-07-17 12:24     ` Eric Blake
  2013-07-17 13:57       ` Laszlo Ersek
  1 sibling, 1 reply; 34+ messages in thread
From: Eric Blake @ 2013-07-17 12:24 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: aliguori, ehabkost, qemu-devel, peter.huangpeng, lcapitulino,
	bsd, y-goto, pbonzini, afaerber, Wanlong Gao

[-- Attachment #1: Type: text/plain, Size: 1199 bytes --]

On 07/17/2013 04:35 AM, Laszlo Ersek wrote:

>> +# @cpus: #optional VCPUs belong to this node
>> +#
>> +# Since: 1.6
>> +##
>> +{ 'type': 'NumaNodeOptions',
>> +  'data': {
>> +   '*nodeid':		'int',
>> +   '*cpus':		'str' }}
>> +
> 
> Should we document the format for "cpus" here too?

Not only that, but is this even the right representation?  The fact that
you are requiring the receiver to further parse this string means you
probably represented it at the wrong level in JSON.  That is, a JSON
string "1,2,4" requires post-processing to turn it into 3 processor ids,
while a JSON array [1, 2, 4] does not, so you should probably consider
'*cpus':['int'] as your preferred datatype.

>> +# Since 1.6
>> +##
>> +{ 'type': 'NumaMemOptions',
>> +  'data': {
>> +   '*nodeid':		'int',
>> +   '*size':		'size' }}
>>
> 
> Looks good in general but I'm not sure if hardware tabs are allowed (or
> usual) in this file.

Drop the tabs.  Also, this missed soft freeze for 1.6, so you will
probably end up using Since 1.7 by the time it actually gets accepted.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 06/12] NUMA: parse guest numa nodes memory policy
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 06/12] NUMA: parse guest numa nodes memory policy Wanlong Gao
@ 2013-07-17 12:31   ` Eric Blake
  2013-07-17 13:12     ` Wanlong Gao
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2013-07-17 12:31 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto,
	pbonzini, peter.huangpeng, lersek, afaerber

[-- Attachment #1: Type: text/plain, Size: 1691 bytes --]

On 07/17/2013 03:29 AM, Wanlong Gao wrote:
> The memory policy setting format is like:
>     policy={membind|interleave|preferred},host-node=[+|!]{all|N-N}
> And we are adding this setting as a suboption of "-numa mem,",
> the memory policy then can be set like following:
>     -numa node,nodeid=0,cpus=0 \
>     -numa node,nodeid=1,cpus=1 \
>     -numa mem,nodeid=0,size=1G,policy=membind,host-nodes=0-1 \
>     -numa mem,nodeid=1,size=1G,policy=interleave,host-nodes=!1
> 
> Reviewed-by: Bandan Das <bsd@redhat.com>
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---

> +++ b/qapi-schema.json
> @@ -3717,9 +3717,15 @@
>  #
>  # @size: #optional memory size of this node
>  #
> +# @policy: #optional memory policy of this node
> +#
> +# @host-nodes: #optional host nodes for its memory policy
> +#
>  # Since 1.6
>  ##
>  { 'type': 'NumaMemOptions',
>    'data': {
>     '*nodeid':		'int',
> -   '*size':		'size' }}
> +   '*size':		'size',
> +   '*policy':		'str',

What are the valid values for 'policy'?  If it is a finite set, please
make an 'enum' type that lists the valid values, and make this
'*policy':'NumaMemPolicy' rather than a free-form 'str'.

> +   '*host-nodes':	'str' }}

Missing documentation on how this 'str' is formatted, and same concerns
as in 1/12 about whether it is the right JSON representation, or whether
you have crammed too much information into a single string that now
requires post-processing.  Why is an array not a better choice?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 09/12] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 09/12] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node Wanlong Gao
@ 2013-07-17 12:36   ` Eric Blake
  2013-07-17 13:22     ` Wanlong Gao
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2013-07-17 12:36 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto,
	pbonzini, peter.huangpeng, lersek, afaerber

[-- Attachment #1: Type: text/plain, Size: 1490 bytes --]

On 07/17/2013 03:29 AM, Wanlong Gao wrote:
> This QMP command allows user set guest node's memory policy
> through the QMP protocol. The qmp-shell command is like:
>     set-mem-policy nodeid=0 policy=membind host-nodes=0-1
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  numa.c           | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json | 19 +++++++++++++++++++
>  qmp-commands.hx  | 35 +++++++++++++++++++++++++++++++++++
>  3 files changed, 109 insertions(+)
> 

> +++ b/qapi-schema.json
> @@ -3729,3 +3729,22 @@
>     '*size':		'size',
>     '*policy':		'str',
>     '*host-nodes':	'str' }}
> +
> +##
> +# @set-mem-policy:
> +#
> +# Set the host memory binding policy for guest NUMA node.
> +#
> +# @nodeid: The node ID of guest NUMA node to set memory policy to.
> +#
> +# @policy: #optional The memory policy to be set (default 'default').

Needs to be an enum type, not an open-coded 'str'.  Probably the same
enum type as you add for patch 6/12.

> +#
> +# @host-nodes: #optional The host nodes range for memory policy.

Again, how is this string interpreted; and should it be an array?

> +#
> +# Returns: Nothing on success
> +#
> +# Since: 1.6
> +##
> +{ 'command': 'set-mem-policy',
> +  'data': {'nodeid': 'int', '*policy': 'str',
> +           '*host-nodes': 'str'} }

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 11/12] NUMA: add qmp command query-numa
  2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 11/12] NUMA: add qmp command query-numa Wanlong Gao
@ 2013-07-17 12:41   ` Eric Blake
  2013-07-17 13:24     ` Wanlong Gao
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2013-07-17 12:41 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto,
	pbonzini, peter.huangpeng, lersek, afaerber

[-- Attachment #1: Type: text/plain, Size: 1990 bytes --]

On 07/17/2013 03:29 AM, Wanlong Gao wrote:
> Add qmp command query-numa to show guest NUMA information.
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  numa.c           | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json | 34 ++++++++++++++++++++++++
>  qmp-commands.hx  | 49 +++++++++++++++++++++++++++++++++++
>  3 files changed, 161 insertions(+)

Focusing on just the interface...

> +++ b/qapi-schema.json
> @@ -3748,3 +3748,37 @@
>  { 'command': 'set-mem-policy',
>    'data': {'nodeid': 'int', '*policy': 'str',
>             '*host-nodes': 'str'} }
> +##
> +# @NUMAInfo:
> +#
> +# Information about guest NUMA nodes
> +#
> +# @nodeid: NUMA node ID
> +#
> +# @cpus: VCPUs contained to this node

s/to/in/

> +#
> +# @memory: memory size of this node

In what unit? Preferably bytes, please (although your example listed
512, which is awfully small).  HMP can round to nearest k or M, but QMP
should be precise.

> +#
> +# @policy: memory policy of this node
> +#
> +# @relative: if host nodes is relative for memory policy

s/is/are/

> +#
> +# @host-nodes: host nodes for its memory policy
> +#
> +# Since: 1.6
> +#
> +##
> +{ 'type': 'NUMAInfo',
> +  'data': {'nodeid': 'int', 'cpus': ['int'], 'memory': 'int',
> +           'policy': 'str', 'relative': 'bool', 'host-nodes': ['int'] }}

'policy' needs to be an 'enum' type (the same one as I have requested
you to use in your other patches).

> +
> +##
> +# @query-numa:
> +#
> +# Returns a list of information about each guest node.
> +#
> +# Returns: a list of @NUMAInfo for each guest node
> +#
> +# Since: 1.6
> +##
> +{ 'command': 'query-numa', 'returns': ['NUMAInfo'] }

Seems like a reasonable command.  As in 1/12, you may end up using
'Since: 1.7' by the time this patch is actually taken.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 06/12] NUMA: parse guest numa nodes memory policy
  2013-07-17 12:31   ` Eric Blake
@ 2013-07-17 13:12     ` Wanlong Gao
  0 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17 13:12 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto,
	pbonzini, peter.huangpeng, lersek, afaerber, Wanlong Gao

On 07/17/2013 08:31 PM, Eric Blake wrote:
> On 07/17/2013 03:29 AM, Wanlong Gao wrote:
>> The memory policy setting format is like:
>>     policy={membind|interleave|preferred},host-node=[+|!]{all|N-N}
>> And we are adding this setting as a suboption of "-numa mem,",
>> the memory policy then can be set like following:
>>     -numa node,nodeid=0,cpus=0 \
>>     -numa node,nodeid=1,cpus=1 \
>>     -numa mem,nodeid=0,size=1G,policy=membind,host-nodes=0-1 \
>>     -numa mem,nodeid=1,size=1G,policy=interleave,host-nodes=!1
>>
>> Reviewed-by: Bandan Das <bsd@redhat.com>
>> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>> ---
> 
>> +++ b/qapi-schema.json
>> @@ -3717,9 +3717,15 @@
>>  #
>>  # @size: #optional memory size of this node
>>  #
>> +# @policy: #optional memory policy of this node
>> +#
>> +# @host-nodes: #optional host nodes for its memory policy
>> +#
>>  # Since 1.6
>>  ##
>>  { 'type': 'NumaMemOptions',
>>    'data': {
>>     '*nodeid':		'int',
>> -   '*size':		'size' }}
>> +   '*size':		'size',
>> +   '*policy':		'str',
> 
> What are the valid values for 'policy'?  If it is a finite set, please
> make an 'enum' type that lists the valid values, and make this
> '*policy':'NumaMemPolicy' rather than a free-form 'str'.

OK, will follow this. Luiz also suggested like this.

> 
>> +   '*host-nodes':	'str' }}
> 
> Missing documentation on how this 'str' is formatted, and same concerns
> as in 1/12 about whether it is the right JSON representation, or whether
> you have crammed too much information into a single string that now
> requires post-processing.  Why is an array not a better choice?

Will try to use array here, thank you for your suggestion.

Thanks,
Wanlong Gao

> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 11:11     ` Paolo Bonzini
@ 2013-07-17 13:16       ` Wanlong Gao
  0 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17 13:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aliguori, ehabkost, qemu-devel, peter.huangpeng, lcapitulino,
	bsd, y-goto, Laszlo Ersek, afaerber, Wanlong Gao

On 07/17/2013 07:11 PM, Paolo Bonzini wrote:
> Il 17/07/2013 12:35, Laszlo Ersek ha scritto:
>> comments below
>>
>> On 07/17/13 11:29, Wanlong Gao wrote:
>>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>>> ---
>>>  qapi-schema.json | 44 ++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 44 insertions(+)
>>>
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index 7b9fef1..f753a35 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -3679,3 +3679,47 @@
>>>              '*cpuid-input-ecx': 'int',
>>>              'cpuid-register': 'X86CPURegister32',
>>>              'features': 'int' } }
>>> +
>>> +##
>>> +# @NumaOptions
>>> +#
>>> +# A discriminated record of NUMA options.
>>> +#
>>> +# Since 1.6
>>> +##
>>> +{ 'union': 'NumaOptions',
>>> +  'data': {
>>> +    'node':	'NumaNodeOptions',
>>> +    'mem':	'NumaMemOptions' }}
>>> +
>>> +##
>>> +# @NumaNodeOptions
>>> +#
>>> +# Create a guest NUMA node.
>>> +#
>>> +# @nodeid: #optional NUMA node ID
>>> +#
>>> +# @cpus: #optional VCPUs belong to this node
>>> +#
>>> +# Since: 1.6
>>> +##
>>> +{ 'type': 'NumaNodeOptions',
>>> +  'data': {
>>> +   '*nodeid':		'int',
>>> +   '*cpus':		'str' }}
>>> +
>>
>> Should we document the format for "cpus" here too?
> 
> I think so---good catch.

Got it, thank you.

> 
>>> +##
>>> +# @NumaMemOptions
>>> +#
>>> +# Set memory information of guest NUMA node.
>>> +#
>>> +# @nodeid: #optional NUMA node ID
>>> +#
>>> +# @size: #optional memory size of this node
>>> +#
>>> +# Since 1.6
>>> +##
>>> +{ 'type': 'NumaMemOptions',
>>> +  'data': {
>>> +   '*nodeid':		'int',
>>> +   '*size':		'size' }}
>>>
>>
>> Looks good in general but I'm not sure if hardware tabs are allowed (or
>> usual) in this file.
> 
> Definitely not usual---in fact not used at all, so they're probably not
> allowed too.

Got it, thank you.

Wanlong Gao

> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 09/12] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node
  2013-07-17 12:36   ` Eric Blake
@ 2013-07-17 13:22     ` Wanlong Gao
  0 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17 13:22 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto,
	pbonzini, peter.huangpeng, lersek, afaerber, Wanlong Gao

On 07/17/2013 08:36 PM, Eric Blake wrote:
> On 07/17/2013 03:29 AM, Wanlong Gao wrote:
>> This QMP command allows user set guest node's memory policy
>> through the QMP protocol. The qmp-shell command is like:
>>     set-mem-policy nodeid=0 policy=membind host-nodes=0-1
>>
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>> ---
>>  numa.c           | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  qapi-schema.json | 19 +++++++++++++++++++
>>  qmp-commands.hx  | 35 +++++++++++++++++++++++++++++++++++
>>  3 files changed, 109 insertions(+)
>>
> 
>> +++ b/qapi-schema.json
>> @@ -3729,3 +3729,22 @@
>>     '*size':		'size',
>>     '*policy':		'str',
>>     '*host-nodes':	'str' }}
>> +
>> +##
>> +# @set-mem-policy:
>> +#
>> +# Set the host memory binding policy for guest NUMA node.
>> +#
>> +# @nodeid: The node ID of guest NUMA node to set memory policy to.
>> +#
>> +# @policy: #optional The memory policy to be set (default 'default').
> 
> Needs to be an enum type, not an open-coded 'str'.  Probably the same
> enum type as you add for patch 6/12.

OK.

> 
>> +#
>> +# @host-nodes: #optional The host nodes range for memory policy.
> 
> Again, how is this string interpreted; and should it be an array?

OK, will take care of this. Thanks for your review.

Wanlong Gao

> 
>> +#
>> +# Returns: Nothing on success
>> +#
>> +# Since: 1.6
>> +##
>> +{ 'command': 'set-mem-policy',
>> +  'data': {'nodeid': 'int', '*policy': 'str',
>> +           '*host-nodes': 'str'} }
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 11/12] NUMA: add qmp command query-numa
  2013-07-17 12:41   ` Eric Blake
@ 2013-07-17 13:24     ` Wanlong Gao
  0 siblings, 0 replies; 34+ messages in thread
From: Wanlong Gao @ 2013-07-17 13:24 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, ehabkost, qemu-devel, lcapitulino, bsd, y-goto,
	pbonzini, peter.huangpeng, lersek, afaerber, Wanlong Gao

On 07/17/2013 08:41 PM, Eric Blake wrote:
> On 07/17/2013 03:29 AM, Wanlong Gao wrote:
>> Add qmp command query-numa to show guest NUMA information.
>>
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>> ---
>>  numa.c           | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  qapi-schema.json | 34 ++++++++++++++++++++++++
>>  qmp-commands.hx  | 49 +++++++++++++++++++++++++++++++++++
>>  3 files changed, 161 insertions(+)
> 
> Focusing on just the interface...
> 
>> +++ b/qapi-schema.json
>> @@ -3748,3 +3748,37 @@
>>  { 'command': 'set-mem-policy',
>>    'data': {'nodeid': 'int', '*policy': 'str',
>>             '*host-nodes': 'str'} }
>> +##
>> +# @NUMAInfo:
>> +#
>> +# Information about guest NUMA nodes
>> +#
>> +# @nodeid: NUMA node ID
>> +#
>> +# @cpus: VCPUs contained to this node
> 
> s/to/in/

OK.

> 
>> +#
>> +# @memory: memory size of this node
> 
> In what unit? Preferably bytes, please (although your example listed
> 512, which is awfully small).  HMP can round to nearest k or M, but QMP
> should be precise.
> 
>> +#
>> +# @policy: memory policy of this node
>> +#
>> +# @relative: if host nodes is relative for memory policy
> 
> s/is/are/

OK.

> 
>> +#
>> +# @host-nodes: host nodes for its memory policy
>> +#
>> +# Since: 1.6
>> +#
>> +##
>> +{ 'type': 'NUMAInfo',
>> +  'data': {'nodeid': 'int', 'cpus': ['int'], 'memory': 'int',
>> +           'policy': 'str', 'relative': 'bool', 'host-nodes': ['int'] }}
> 
> 'policy' needs to be an 'enum' type (the same one as I have requested
> you to use in your other patches).

Yeah, got it.

> 
>> +
>> +##
>> +# @query-numa:
>> +#
>> +# Returns a list of information about each guest node.
>> +#
>> +# Returns: a list of @NUMAInfo for each guest node
>> +#
>> +# Since: 1.6
>> +##
>> +{ 'command': 'query-numa', 'returns': ['NUMAInfo'] }
> 
> Seems like a reasonable command.  As in 1/12, you may end up using
> 'Since: 1.7' by the time this patch is actually taken.

Yeah, got it. Thank you very much for your review.

Wanlong Gao

> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 12:24     ` Eric Blake
@ 2013-07-17 13:57       ` Laszlo Ersek
  2013-07-17 14:20         ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Laszlo Ersek @ 2013-07-17 13:57 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, ehabkost, qemu-devel, peter.huangpeng, lcapitulino,
	bsd, y-goto, pbonzini, afaerber, Wanlong Gao

On 07/17/13 14:24, Eric Blake wrote:
> On 07/17/2013 04:35 AM, Laszlo Ersek wrote:
> 
>>> +# @cpus: #optional VCPUs belong to this node
>>> +#
>>> +# Since: 1.6
>>> +##
>>> +{ 'type': 'NumaNodeOptions',
>>> +  'data': {
>>> +   '*nodeid':		'int',
>>> +   '*cpus':		'str' }}
>>> +
>>
>> Should we document the format for "cpus" here too?
> 
> Not only that, but is this even the right representation?  The fact that
> you are requiring the receiver to further parse this string means you
> probably represented it at the wrong level in JSON.  That is, a JSON
> string "1,2,4" requires post-processing to turn it into 3 processor ids,
> while a JSON array [1, 2, 4] does not, so you should probably consider
> '*cpus':['int'] as your preferred datatype.

opts-visitor can handle lists of simple scalar types. Ie. it can do
-numa node,nodeid=3,cpus=3-4,cpus=9-10. It can't save the parsing of
intervals (eg. 3-4).

This is of course not to say that the interface should be limited by
what opts-visitor can do; just that opts-visitor may not be appropriate
for (or solve completely the needs of) very intricate options.

Laszlo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 13:57       ` Laszlo Ersek
@ 2013-07-17 14:20         ` Paolo Bonzini
  2013-07-17 14:33           ` Laszlo Ersek
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2013-07-17 14:20 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: aliguori, ehabkost, peter.huangpeng, qemu-devel, bsd, y-goto,
	lcapitulino, afaerber, Wanlong Gao

Il 17/07/2013 15:57, Laszlo Ersek ha scritto:
>> > Not only that, but is this even the right representation?  The fact that
>> > you are requiring the receiver to further parse this string means you
>> > probably represented it at the wrong level in JSON.  That is, a JSON
>> > string "1,2,4" requires post-processing to turn it into 3 processor ids,
>> > while a JSON array [1, 2, 4] does not, so you should probably consider
>> > '*cpus':['int'] as your preferred datatype.
> opts-visitor can handle lists of simple scalar types. Ie. it can do
> -numa node,nodeid=3,cpus=3-4,cpus=9-10. It can't save the parsing of
> intervals (eg. 3-4).

Saving the parsing of intervals is not necessary for this use case.  So
if we can make it '*cpus':['int'], we should.

But is it the opts-visitor "can handle" lists of integers, or does code
have to be written?  If the latter, can you whip up a prototype?

Paolo

> This is of course not to say that the interface should be limited by
> what opts-visitor can do; just that opts-visitor may not be appropriate
> for (or solve completely the needs of) very intricate options.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 14:20         ` Paolo Bonzini
@ 2013-07-17 14:33           ` Laszlo Ersek
  2013-07-17 14:44             ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Laszlo Ersek @ 2013-07-17 14:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aliguori, ehabkost, peter.huangpeng, qemu-devel, bsd, y-goto,
	lcapitulino, afaerber, Wanlong Gao

On 07/17/13 16:20, Paolo Bonzini wrote:
> Il 17/07/2013 15:57, Laszlo Ersek ha scritto:
>>>> Not only that, but is this even the right representation?  The fact that
>>>> you are requiring the receiver to further parse this string means you
>>>> probably represented it at the wrong level in JSON.  That is, a JSON
>>>> string "1,2,4" requires post-processing to turn it into 3 processor ids,
>>>> while a JSON array [1, 2, 4] does not, so you should probably consider
>>>> '*cpus':['int'] as your preferred datatype.
>> opts-visitor can handle lists of simple scalar types. Ie. it can do
>> -numa node,nodeid=3,cpus=3-4,cpus=9-10. It can't save the parsing of
>> intervals (eg. 3-4).
> 
> Saving the parsing of intervals is not necessary for this use case.  So
> if we can make it '*cpus':['int'], we should.
> 
> But is it the opts-visitor "can handle" lists of integers, or does code
> have to be written?  If the latter, can you whip up a prototype?

No extra code needs to be written. The current use case is
NetdevUserOptions.{dnssearch,hostfwd,guestfwd}; see commit 094f15c5, and
(by Klaus Stengel) commit 63d2960b.

Laszlo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 14:33           ` Laszlo Ersek
@ 2013-07-17 14:44             ` Paolo Bonzini
  2013-07-17 15:24               ` Laszlo Ersek
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2013-07-17 14:44 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: aliguori, ehabkost, peter.huangpeng, qemu-devel, bsd, y-goto,
	lcapitulino, afaerber, Wanlong Gao

Il 17/07/2013 16:33, Laszlo Ersek ha scritto:
>>> >> opts-visitor can handle lists of simple scalar types. Ie. it can do
>>> >> -numa node,nodeid=3,cpus=3-4,cpus=9-10. It can't save the parsing of
>>> >> intervals (eg. 3-4).
>> > 
>> > Saving the parsing of intervals is not necessary for this use case.  So
>> > if we can make it '*cpus':['int'], we should.
>> > 
>> > But is it the opts-visitor "can handle" lists of integers, or does code
>> > have to be written?  If the latter, can you whip up a prototype?
> No extra code needs to be written. The current use case is
> NetdevUserOptions.{dnssearch,hostfwd,guestfwd}; see commit 094f15c5, and
> (by Klaus Stengel) commit 63d2960b.

This is to handle lists, but want about converting

  cpus=3-4,cpus=9-10

to

  'cpus': [3,4,9,10]

?

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 14:44             ` Paolo Bonzini
@ 2013-07-17 15:24               ` Laszlo Ersek
  2013-07-17 15:26                 ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Laszlo Ersek @ 2013-07-17 15:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aliguori, ehabkost, peter.huangpeng, qemu-devel, bsd, y-goto,
	lcapitulino, afaerber, Wanlong Gao

On 07/17/13 16:44, Paolo Bonzini wrote:
> Il 17/07/2013 16:33, Laszlo Ersek ha scritto:
>>>>>> opts-visitor can handle lists of simple scalar types. Ie. it can do
>>>>>> -numa node,nodeid=3,cpus=3-4,cpus=9-10. It can't save the parsing of
>>>>>> intervals (eg. 3-4).
>>>>
>>>> Saving the parsing of intervals is not necessary for this use case.  So
>>>> if we can make it '*cpus':['int'], we should.
>>>>
>>>> But is it the opts-visitor "can handle" lists of integers, or does code
>>>> have to be written?  If the latter, can you whip up a prototype?
>> No extra code needs to be written. The current use case is
>> NetdevUserOptions.{dnssearch,hostfwd,guestfwd}; see commit 094f15c5, and
>> (by Klaus Stengel) commit 63d2960b.
> 
> This is to handle lists, but want about converting
> 
>   cpus=3-4,cpus=9-10
> 
> to
> 
>   'cpus': [3,4,9,10]

Oh, that. :) That does need extra code. Something along the lines of:

(a), in the JSON, reuse the existing String wrapper type, and make
"cpus" an optional list of String[s]:

{ 'type': 'NumaNodeOptions',
  'data': {
   '*nodeid':		'uint16',
   '*cpus':		['String'] }}

(b) in the code, traverse the StringList like net_init_slirp_configs()
or slirp_dnssearch() does. Parse each element as an interval, set bit
ranges / report errors.

    static int numa_node_parse_cpu_range(int nodeid,
                                         const char *cpu_range)
    {
        /* what numa_node_parse_cpus() does in 02/12 */
    }

    static int numa_node_parse(const NumaNodeOptions *opts)
    {
        const StringList *cpu_range;

        /* not sure how to handle the (!opts->has_nodeid) case; let's
         * assume we have a nodeid here */

        if (opts->nodeid >= MAX_NODES) {
            fprintf(stderr,
                    "NUMA nodeid %d reaches / exceeds maximum %d\n",
                    opts->nodeid, MAX_NODES);
            return -1;
        }

        for (cpu_range = opts->cpus;
             cpu_range != NULL;
             cpu_range = cpu_range->next) {
            int ret;

            ret = numa_node_parse_cpu_range(opts->nodeid,
                                            cpu_range->value->str);
            if (ret < 0) {
                return ret;
            }
        }
        return 0;
    }

Did you mean something like this by prototype?

Thanks,
Laszlo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 15:24               ` Laszlo Ersek
@ 2013-07-17 15:26                 ` Paolo Bonzini
  2013-07-17 15:45                   ` Laszlo Ersek
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2013-07-17 15:26 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: aliguori, ehabkost, peter.huangpeng, qemu-devel, bsd, y-goto,
	lcapitulino, afaerber, Wanlong Gao

Il 17/07/2013 17:24, Laszlo Ersek ha scritto:
> On 07/17/13 16:44, Paolo Bonzini wrote:
>> Il 17/07/2013 16:33, Laszlo Ersek ha scritto:
>>>>>>> opts-visitor can handle lists of simple scalar types. Ie. it can do
>>>>>>> -numa node,nodeid=3,cpus=3-4,cpus=9-10. It can't save the parsing of
>>>>>>> intervals (eg. 3-4).
>>>>>
>>>>> Saving the parsing of intervals is not necessary for this use case.  So
>>>>> if we can make it '*cpus':['int'], we should.
>>>>>
>>>>> But is it the opts-visitor "can handle" lists of integers, or does code
>>>>> have to be written?  If the latter, can you whip up a prototype?
>>> No extra code needs to be written. The current use case is
>>> NetdevUserOptions.{dnssearch,hostfwd,guestfwd}; see commit 094f15c5, and
>>> (by Klaus Stengel) commit 63d2960b.
>>
>> This is to handle lists, but want about converting
>>
>>   cpus=3-4,cpus=9-10
>>
>> to
>>
>>   'cpus': [3,4,9,10]
> 
> Oh, that. :) That does need extra code. Something along the lines of:
> 
> (a), in the JSON, reuse the existing String wrapper type, and make
> "cpus" an optional list of String[s]:
> 
> { 'type': 'NumaNodeOptions',
>   'data': {
>    '*nodeid':		'uint16',
>    '*cpus':		['String'] }}
> 
> (b) in the code, traverse the StringList like net_init_slirp_configs()
> or slirp_dnssearch() does. Parse each element as an interval, set bit
> ranges / report errors.
> 
>     static int numa_node_parse_cpu_range(int nodeid,
>                                          const char *cpu_range)
>     {
>         /* what numa_node_parse_cpus() does in 02/12 */
>     }
> 
>     static int numa_node_parse(const NumaNodeOptions *opts)
>     {
>         const StringList *cpu_range;
> 
>         /* not sure how to handle the (!opts->has_nodeid) case; let's
>          * assume we have a nodeid here */
> 
>         if (opts->nodeid >= MAX_NODES) {
>             fprintf(stderr,
>                     "NUMA nodeid %d reaches / exceeds maximum %d\n",
>                     opts->nodeid, MAX_NODES);
>             return -1;
>         }
> 
>         for (cpu_range = opts->cpus;
>              cpu_range != NULL;
>              cpu_range = cpu_range->next) {
>             int ret;
> 
>             ret = numa_node_parse_cpu_range(opts->nodeid,
>                                             cpu_range->value->str);
>             if (ret < 0) {
>                 return ret;
>             }
>         }
>         return 0;
>     }
> 
> Did you mean something like this by prototype?

Yes, though I guess Wanlong could do this by himself.  A more
interesting prototype is "how to add code to OptsVisitor that parses
intervals when it sees ['int']", and this where you can help the most.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 15:26                 ` Paolo Bonzini
@ 2013-07-17 15:45                   ` Laszlo Ersek
  2013-07-17 15:54                     ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Laszlo Ersek @ 2013-07-17 15:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aliguori, ehabkost, peter.huangpeng, qemu-devel, bsd, y-goto,
	lcapitulino, afaerber, Wanlong Gao

On 07/17/13 17:26, Paolo Bonzini wrote:
> Il 17/07/2013 17:24, Laszlo Ersek ha scritto:
>> On 07/17/13 16:44, Paolo Bonzini wrote:
>>> Il 17/07/2013 16:33, Laszlo Ersek ha scritto:
>>>>>>>> opts-visitor can handle lists of simple scalar types. Ie. it can do
>>>>>>>> -numa node,nodeid=3,cpus=3-4,cpus=9-10. It can't save the parsing of
>>>>>>>> intervals (eg. 3-4).
>>>>>>
>>>>>> Saving the parsing of intervals is not necessary for this use case.  So
>>>>>> if we can make it '*cpus':['int'], we should.
>>>>>>
>>>>>> But is it the opts-visitor "can handle" lists of integers, or does code
>>>>>> have to be written?  If the latter, can you whip up a prototype?
>>>> No extra code needs to be written. The current use case is
>>>> NetdevUserOptions.{dnssearch,hostfwd,guestfwd}; see commit 094f15c5, and
>>>> (by Klaus Stengel) commit 63d2960b.
>>>
>>> This is to handle lists, but want about converting
>>>
>>>   cpus=3-4,cpus=9-10
>>>
>>> to
>>>
>>>   'cpus': [3,4,9,10]
>>
>> Oh, that. :) That does need extra code. Something along the lines of:
>>
>> (a), in the JSON, reuse the existing String wrapper type, and make
>> "cpus" an optional list of String[s]:
>>
>> { 'type': 'NumaNodeOptions',
>>   'data': {
>>    '*nodeid':		'uint16',
>>    '*cpus':		['String'] }}
>>
>> (b) in the code, traverse the StringList like net_init_slirp_configs()
>> or slirp_dnssearch() does. Parse each element as an interval, set bit
>> ranges / report errors.
>>
>>     static int numa_node_parse_cpu_range(int nodeid,
>>                                          const char *cpu_range)
>>     {
>>         /* what numa_node_parse_cpus() does in 02/12 */
>>     }
>>
>>     static int numa_node_parse(const NumaNodeOptions *opts)
>>     {
>>         const StringList *cpu_range;
>>
>>         /* not sure how to handle the (!opts->has_nodeid) case; let's
>>          * assume we have a nodeid here */
>>
>>         if (opts->nodeid >= MAX_NODES) {
>>             fprintf(stderr,
>>                     "NUMA nodeid %d reaches / exceeds maximum %d\n",
>>                     opts->nodeid, MAX_NODES);
>>             return -1;
>>         }
>>
>>         for (cpu_range = opts->cpus;
>>              cpu_range != NULL;
>>              cpu_range = cpu_range->next) {
>>             int ret;
>>
>>             ret = numa_node_parse_cpu_range(opts->nodeid,
>>                                             cpu_range->value->str);
>>             if (ret < 0) {
>>                 return ret;
>>             }
>>         }
>>         return 0;
>>     }
>>
>> Did you mean something like this by prototype?
> 
> Yes, though I guess Wanlong could do this by himself.  A more
> interesting prototype is "how to add code to OptsVisitor that parses
> intervals when it sees ['int']", and this where you can help the most.

Do you want each element of the range present in the flat list, or just
the boundaries? (The above example, ie [3..4]U[9..10] doesn't
distinguish between these two.)

If the list contains the boundaries only, that's more frugal but
requires smarter code. If the list contains all elements, then big
ranges will result in huge lists (which are then easy to handle piecewise).

Do you also want a<=b checking for [a,b]? (That would imply "inclusive"
on both sides and not allow empty sets easily.)

This is going to be a huge hack, but I can already express the condition
"I'm in a list and looking for the next element as int" in the code. So
maybe I could force some more state into OptsVisitor (specifically
opts_type_int()/opts_type_uint64() and lookup_scalar()) and fake extra
elements.

Better: I could pop "a-b" off "ov->repeated_opts", return "a" (after
parsing), and push back "b". Then "b" would not differ from the current
"individual int" case, and I wouldn't have to add extra state to
maintain between calls.

You just torpedoed planned review efforts for today / tomorrow :)

Laszlo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-07-17 15:45                   ` Laszlo Ersek
@ 2013-07-17 15:54                     ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2013-07-17 15:54 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: aliguori, ehabkost, peter.huangpeng, qemu-devel, bsd, y-goto,
	lcapitulino, afaerber, Wanlong Gao

Il 17/07/2013 17:45, Laszlo Ersek ha scritto:
>> Yes, though I guess Wanlong could do this by himself.  A more
>> interesting prototype is "how to add code to OptsVisitor that parses
>> intervals when it sees ['int']", and this where you can help the most.
> 
> Do you want each element of the range present in the flat list, or just
> the boundaries? (The above example, ie [3..4]U[9..10] doesn't
> distinguish between these two.)

It should be a real list.

> If the list contains the boundaries only, that's more frugal but
> requires smarter code. If the list contains all elements, then big
> ranges will result in huge lists (which are then easy to handle piecewise).

Huge lists wouldn't be a problem, I think.

> Do you also want a<=b checking for [a,b]? (That would imply "inclusive"
> on both sides and not allow empty sets easily.)
> 
> This is going to be a huge hack, but I can already express the condition
> "I'm in a list and looking for the next element as int" in the code. So
> maybe I could force some more state into OptsVisitor (specifically
> opts_type_int()/opts_type_uint64() and lookup_scalar()) and fake extra
> elements.

Yeah, I guessed something like that.

> Better: I could pop "a-b" off "ov->repeated_opts", return "a" (after
> parsing), and push back "b". Then "b" would not differ from the current
> "individual int" case, and I wouldn't have to add extra state to
> maintain between calls.
> 
> You just torpedoed planned review efforts for today / tomorrow :)

:)

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2013-07-17 15:55 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-17  9:29 [Qemu-devel] [PATCH V5 00/12] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 01/12] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions Wanlong Gao
2013-07-17 10:35   ` Laszlo Ersek
2013-07-17 11:11     ` Paolo Bonzini
2013-07-17 13:16       ` Wanlong Gao
2013-07-17 12:24     ` Eric Blake
2013-07-17 13:57       ` Laszlo Ersek
2013-07-17 14:20         ` Paolo Bonzini
2013-07-17 14:33           ` Laszlo Ersek
2013-07-17 14:44             ` Paolo Bonzini
2013-07-17 15:24               ` Laszlo Ersek
2013-07-17 15:26                 ` Paolo Bonzini
2013-07-17 15:45                   ` Laszlo Ersek
2013-07-17 15:54                     ` Paolo Bonzini
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 02/12] NUMA: split -numa option Wanlong Gao
2013-07-17 11:00   ` Laszlo Ersek
2013-07-17 11:14     ` Paolo Bonzini
2013-07-17 11:13   ` Paolo Bonzini
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 03/12] NUMA: move numa related code to numa.c Wanlong Gao
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 04/12] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 05/12] NUMA: Add Linux libnuma detection Wanlong Gao
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 06/12] NUMA: parse guest numa nodes memory policy Wanlong Gao
2013-07-17 12:31   ` Eric Blake
2013-07-17 13:12     ` Wanlong Gao
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 07/12] NUMA: split out the common range parser Wanlong Gao
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 08/12] NUMA: set guest numa nodes memory policy Wanlong Gao
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 09/12] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node Wanlong Gao
2013-07-17 12:36   ` Eric Blake
2013-07-17 13:22     ` Wanlong Gao
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 10/12] NUMA: add hmp command set-mem-policy Wanlong Gao
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 11/12] NUMA: add qmp command query-numa Wanlong Gao
2013-07-17 12:41   ` Eric Blake
2013-07-17 13:24     ` Wanlong Gao
2013-07-17  9:29 ` [Qemu-devel] [PATCH V5 12/12] NUMA: convert hmp command info_numa to use qmp command query_numa Wanlong Gao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.