* [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements
@ 2014-06-09 10:25 Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 01/29] NUMA: move numa related code to new file numa.c Hu Tao
                   ` (29 more replies)
  0 siblings, 30 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

Note: this series is based on MST's pci tree.

Changes since v3.2:

- rebased onto the latest git tree to resolve several conflicts
- error_is_set() is no longer used
- QEMUMachineInitArgs is no longer used
- use the mbind flag MPOL_MF_STRICT to catch memory allocations that don't
  follow the policy
- some documentation and error message fixes


Hu Tao (8):
  hostmem: separate allocation from UserCreatable complete method
  hostmem: add properties for NUMA memory policy
  Introduce signed range.
  qapi: make string input visitor parse int list
  qapi: make string output visitor parse int list
  qom: introduce object_property_get_enum and
    object_property_get_uint16List
  qmp: add query-memdev
  hmp: add info memdev

Luiz Capitulino (1):
  man: improve -numa doc

Paolo Bonzini (14):
  vl: redo -object parsing
  qmp: improve error reporting for -object and object-add
  pc: pass MachineState to pc_memory_init
  numa: introduce memory_region_allocate_system_memory
  numa: add -numa node,memdev= option
  memory: reorganize file-based allocation
  memory: move mem_path handling to memory_region_allocate_system_memory
  memory: add error propagation to file-based RAM allocation
  memory: move preallocation code out of exec.c
  memory: move RAM_PREALLOC_MASK to exec.c, rename
  hostmem: add file-based HostMemoryBackend
  hostmem: add merge and dump properties
  hostmem: allow preallocation of any memory region
  hostmem: add property to map memory with MAP_SHARED

Wanlong Gao (6):
  NUMA: move numa related code to new file numa.c
  NUMA: check if the total numa memory size is equal to ram_size
  NUMA: Add numa_info structure to contain numa nodes info
  NUMA: convert -numa option to use OptsVisitor
  NUMA: expand MAX_NODES from 64 to 128
  configure: add Linux libnuma detection

 Makefile.target                    |   2 +-
 backends/Makefile.objs             |   1 +
 backends/hostmem-file.c            | 134 ++++++++++++++
 backends/hostmem-ram.c             |   7 +-
 backends/hostmem.c                 | 300 +++++++++++++++++++++++++++++--
 configure                          |  32 ++++
 cpus.c                             |  14 --
 exec.c                             | 211 +++++++++++-----------
 hmp.c                              |  36 ++++
 hmp.h                              |   1 +
 hw/i386/pc.c                       |  37 ++--
 hw/i386/pc_piix.c                  |   8 +-
 hw/i386/pc_q35.c                   |   4 +-
 hw/ppc/spapr.c                     |  11 +-
 include/exec/cpu-all.h             |   8 -
 include/exec/cpu-common.h          |   2 +
 include/exec/memory.h              |  33 ++++
 include/exec/ram_addr.h            |   4 +
 include/hw/boards.h                |   6 +-
 include/hw/i386/pc.h               |   7 +-
 include/qemu/osdep.h               |  12 ++
 include/qemu/range.h               | 124 +++++++++++++
 include/qom/object.h               |  28 +++
 include/sysemu/cpus.h              |   1 -
 include/sysemu/hostmem.h           |   8 +
 include/sysemu/sysemu.h            |  18 +-
 memory.c                           |  29 +++
 monitor.c                          |   9 +-
 numa.c                             | 354 +++++++++++++++++++++++++++++++++++++
 qapi-schema.json                   |  91 ++++++++++
 qapi/string-input-visitor.c        | 181 ++++++++++++++++++-
 qapi/string-output-visitor.c       | 230 ++++++++++++++++++++++--
 qemu-options.hx                    |  16 +-
 qmp-commands.hx                    |  32 ++++
 qmp.c                              |   3 +-
 qom/object.c                       |  35 ++++
 tests/test-string-input-visitor.c  |  39 ++++
 tests/test-string-output-visitor.c |  34 ++++
 util/oslib-posix.c                 |  73 ++++++++
 vl.c                               | 216 +++++-----------------
 40 files changed, 2014 insertions(+), 377 deletions(-)
 create mode 100644 backends/hostmem-file.c
 create mode 100644 numa.c

-- 
1.9.3


* [Qemu-devel] [PATCH v4 01/29] NUMA: move numa related code to new file numa.c
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 02/29] NUMA: check if the total numa memory size is equal to ram_size Hu Tao
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andre Przywara, Eduardo Habkost, Michael S. Tsirkin, Blue Swirl,
	Igor Mammedov, Paolo Bonzini, Yasunori Goto, Wanlong Gao

From: Wanlong Gao <gaowanlong@cn.fujitsu.com>

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 Makefile.target           |   2 +-
 cpus.c                    |  14 ----
 include/exec/cpu-all.h    |   2 -
 include/exec/cpu-common.h |   2 +
 include/sysemu/cpus.h     |   1 -
 include/sysemu/sysemu.h   |   3 +
 numa.c                    | 185 ++++++++++++++++++++++++++++++++++++++++++++++
 vl.c                      | 139 +---------------------------------
 8 files changed, 192 insertions(+), 156 deletions(-)
 create mode 100644 numa.c

diff --git a/Makefile.target b/Makefile.target
index 9986047..dd815bb 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -112,7 +112,7 @@ endif #CONFIG_BSD_USER
 #########################################################
 # System emulator target
 ifdef CONFIG_SOFTMMU
-obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o
+obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
 obj-y += qtest.o
 obj-y += hw/
 obj-$(CONFIG_FDT) += device_tree.o
diff --git a/cpus.c b/cpus.c
index dd7ac13..ce668b7 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1312,20 +1312,6 @@ static void tcg_exec_all(void)
     exit_request = 0;
 }
 
-void set_numa_modes(void)
-{
-    CPUState *cpu;
-    int i;
-
-    CPU_FOREACH(cpu) {
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
-                cpu->numa_node = i;
-            }
-        }
-    }
-}
-
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
 {
     /* XXX: implement xxx_cpu_list for targets that still miss it */
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index e8363d7..ed28f1e 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -297,8 +297,6 @@ CPUArchState *cpu_copy(CPUArchState *env);
 
 /* memory API */
 
-extern ram_addr_t ram_size;
-
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
 #define RAM_PREALLOC_MASK   (1 << 0)
 
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index a21b65a..e8c7970 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -45,6 +45,8 @@ typedef uintptr_t ram_addr_t;
 #  define RAM_ADDR_FMT "%" PRIxPTR
 #endif
 
+extern ram_addr_t ram_size;
+
 /* memory API */
 
 typedef void CPUWriteMemoryFunc(void *opaque, hwaddr addr, uint32_t value);
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 6502488..4f79081 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -23,7 +23,6 @@ extern int smp_threads;
 #define smp_threads 1
 #endif
 
-void set_numa_modes(void);
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index ba5c7f8..565c8f6 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -144,6 +144,9 @@ extern QEMUClockType rtc_clock;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern unsigned long *node_cpumask[MAX_NODES];
+void numa_add(const char *optarg);
+void set_numa_nodes(void);
+void set_numa_modes(void);
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/numa.c b/numa.c
new file mode 100644
index 0000000..c419deb
--- /dev/null
+++ b/numa.c
@@ -0,0 +1,185 @@
+/*
+ * NUMA parameter parsing
+ *
+ * Copyright (c) 2014 Fujitsu Ltd.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "sysemu/sysemu.h"
+#include "exec/cpu-common.h"
+#include "qemu/bitmap.h"
+#include "qom/cpu.h"
+
+static void numa_node_parse_cpus(int nodenr, const char *cpus)
+{
+    char *endptr;
+    unsigned long long value, endvalue;
+
+    /* Empty CPU range strings will be considered valid, they will simply
+     * not set any bit in the CPU bitmap.
+     */
+    if (!*cpus) {
+        return;
+    }
+
+    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
+        goto error;
+    }
+    if (*endptr == '-') {
+        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
+            goto error;
+        }
+    } else if (*endptr == '\0') {
+        endvalue = value;
+    } else {
+        goto error;
+    }
+
+    if (endvalue >= MAX_CPUMASK_BITS) {
+        endvalue = MAX_CPUMASK_BITS - 1;
+        fprintf(stderr,
+            "qemu: NUMA: A max of %d VCPUs are supported\n",
+             MAX_CPUMASK_BITS);
+    }
+
+    if (endvalue < value) {
+        goto error;
+    }
+
+    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
+    return;
+
+error:
+    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
+    exit(1);
+}
+
+void numa_add(const char *optarg)
+{
+    char option[128];
+    char *endptr;
+    unsigned long long nodenr;
+
+    optarg = get_opt_name(option, 128, optarg, ',');
+    if (*optarg == ',') {
+        optarg++;
+    }
+    if (!strcmp(option, "node")) {
+
+        if (nb_numa_nodes >= MAX_NODES) {
+            fprintf(stderr, "qemu: too many NUMA nodes\n");
+            exit(1);
+        }
+
+        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
+            nodenr = nb_numa_nodes;
+        } else {
+            if (parse_uint_full(option, &nodenr, 10) < 0) {
+                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
+                exit(1);
+            }
+        }
+
+        if (nodenr >= MAX_NODES) {
+            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
+            exit(1);
+        }
+
+        if (get_param_value(option, 128, "mem", optarg) == 0) {
+            node_mem[nodenr] = 0;
+        } else {
+            int64_t sval;
+            sval = strtosz(option, &endptr);
+            if (sval < 0 || *endptr) {
+                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
+                exit(1);
+            }
+            node_mem[nodenr] = sval;
+        }
+        if (get_param_value(option, 128, "cpus", optarg) != 0) {
+            numa_node_parse_cpus(nodenr, option);
+        }
+        nb_numa_nodes++;
+    } else {
+        fprintf(stderr, "Invalid -numa option: %s\n", option);
+        exit(1);
+    }
+}
+
+void set_numa_nodes(void)
+{
+    if (nb_numa_nodes > 0) {
+        int i;
+
+        if (nb_numa_nodes > MAX_NODES) {
+            nb_numa_nodes = MAX_NODES;
+        }
+
+        /* If no memory size if given for any node, assume the default case
+         * and distribute the available memory equally across all nodes
+         */
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (node_mem[i] != 0) {
+                break;
+            }
+        }
+        if (i == nb_numa_nodes) {
+            uint64_t usedmem = 0;
+
+            /* On Linux, the each node's border has to be 8MB aligned,
+             * the final node gets the rest.
+             */
+            for (i = 0; i < nb_numa_nodes - 1; i++) {
+                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
+                usedmem += node_mem[i];
+            }
+            node_mem[i] = ram_size - usedmem;
+        }
+
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
+                break;
+            }
+        }
+        /* assigning the VCPUs round-robin is easier to implement, guest OSes
+         * must cope with this anyway, because there are BIOSes out there in
+         * real machines which also use this scheme.
+         */
+        if (i == nb_numa_nodes) {
+            for (i = 0; i < max_cpus; i++) {
+                set_bit(i, node_cpumask[i % nb_numa_nodes]);
+            }
+        }
+    }
+}
+
+void set_numa_modes(void)
+{
+    CPUState *cpu;
+    int i;
+
+    CPU_FOREACH(cpu) {
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+                cpu->numa_node = i;
+            }
+        }
+    }
+}
diff --git a/vl.c b/vl.c
index 5e77a27..eacbcc7 100644
--- a/vl.c
+++ b/vl.c
@@ -1275,102 +1275,6 @@ char *get_boot_devices_list(size_t *size, bool ignore_suffixes)
     return list;
 }
 
-static void numa_node_parse_cpus(int nodenr, const char *cpus)
-{
-    char *endptr;
-    unsigned long long value, endvalue;
-
-    /* Empty CPU range strings will be considered valid, they will simply
-     * not set any bit in the CPU bitmap.
-     */
-    if (!*cpus) {
-        return;
-    }
-
-    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
-        goto error;
-    }
-    if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
-        }
-    } else if (*endptr == '\0') {
-        endvalue = value;
-    } else {
-        goto error;
-    }
-
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d VCPUs are supported\n",
-             MAX_CPUMASK_BITS);
-    }
-
-    if (endvalue < value) {
-        goto error;
-    }
-
-    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
-    return;
-
-error:
-    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
-    exit(1);
-}
-
-static void numa_add(const char *optarg)
-{
-    char option[128];
-    char *endptr;
-    unsigned long long nodenr;
-
-    optarg = get_opt_name(option, 128, optarg, ',');
-    if (*optarg == ',') {
-        optarg++;
-    }
-    if (!strcmp(option, "node")) {
-
-        if (nb_numa_nodes >= MAX_NODES) {
-            fprintf(stderr, "qemu: too many NUMA nodes\n");
-            exit(1);
-        }
-
-        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
-            nodenr = nb_numa_nodes;
-        } else {
-            if (parse_uint_full(option, &nodenr, 10) < 0) {
-                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
-                exit(1);
-            }
-        }
-
-        if (nodenr >= MAX_NODES) {
-            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
-            exit(1);
-        }
-
-        if (get_param_value(option, 128, "mem", optarg) == 0) {
-            node_mem[nodenr] = 0;
-        } else {
-            int64_t sval;
-            sval = strtosz(option, &endptr);
-            if (sval < 0 || *endptr) {
-                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
-                exit(1);
-            }
-            node_mem[nodenr] = sval;
-        }
-        if (get_param_value(option, 128, "cpus", optarg) != 0) {
-            numa_node_parse_cpus(nodenr, option);
-        }
-        nb_numa_nodes++;
-    } else {
-        fprintf(stderr, "Invalid -numa option: %s\n", option);
-        exit(1);
-    }
-}
-
 static QemuOptsList qemu_smp_opts = {
     .name = "smp-opts",
     .implied_opt_name = "cpus",
@@ -4400,48 +4304,7 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
     default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
-    if (nb_numa_nodes > 0) {
-        int i;
-
-        if (nb_numa_nodes > MAX_NODES) {
-            nb_numa_nodes = MAX_NODES;
-        }
-
-        /* If no memory size if given for any node, assume the default case
-         * and distribute the available memory equally across all nodes
-         */
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0)
-                break;
-        }
-        if (i == nb_numa_nodes) {
-            uint64_t usedmem = 0;
-
-            /* On Linux, the each node's border has to be 8MB aligned,
-             * the final node gets the rest.
-             */
-            for (i = 0; i < nb_numa_nodes - 1; i++) {
-                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
-                usedmem += node_mem[i];
-            }
-            node_mem[i] = ram_size - usedmem;
-        }
-
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
-                break;
-            }
-        }
-        /* assigning the VCPUs round-robin is easier to implement, guest OSes
-         * must cope with this anyway, because there are BIOSes out there in
-         * real machines which also use this scheme.
-         */
-        if (i == nb_numa_nodes) {
-            for (i = 0; i < max_cpus; i++) {
-                set_bit(i, node_cpumask[i % nb_numa_nodes]);
-            }
-        }
-    }
+    set_numa_nodes();
 
     if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
         exit(1);
-- 
1.9.3
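
Note (illustrative only, not part of the patch): the CPU list handled by
numa_node_parse_cpus() above is either a single decimal index or an
inclusive "first-last" range. The sketch below shows that grammar using only
the C standard library. parse_cpu_range() is a hypothetical name; it uses
strtoull() where the patch uses QEMU's parse_uint() helpers, it does not
clamp to MAX_CPUMASK_BITS, and it leaves the "empty string is valid" case to
the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: accept "N" or "N-M" (decimal, M >= N) and store
 * the inclusive bounds. Returns false on malformed input. */
static bool parse_cpu_range(const char *s, unsigned long long *first,
                            unsigned long long *last)
{
    char *end;

    assert(s != NULL && first != NULL && last != NULL);

    /* strtoull() would skip blanks and accept a sign; reject those. */
    if (!isdigit((unsigned char)*s)) {
        return false;
    }
    errno = 0;
    *first = strtoull(s, &end, 10);
    if (errno != 0) {
        return false;               /* out of range */
    }
    if (*end == '\0') {
        *last = *first;             /* single CPU index */
        return true;
    }
    if (*end != '-' || !isdigit((unsigned char)end[1])) {
        return false;               /* trailing junk or missing end value */
    }
    errno = 0;
    *last = strtoull(end + 1, &end, 10);
    if (errno != 0 || *end != '\0') {
        return false;
    }
    return *last >= *first;         /* a reversed range is an error */
}
```

With this sketch, "0-3" yields the bounds 0 and 3, "7" yields 7 and 7, and
"3-1", "1-" and "1,2" are rejected.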


* [Qemu-devel] [PATCH v4 02/29] NUMA: check if the total numa memory size is equal to ram_size
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 01/29] NUMA: move numa related code to new file numa.c Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 23:02   ` Eric Blake
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 03/29] NUMA: Add numa_info structure to contain numa nodes info Hu Tao
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, Igor Mammedov,
	Paolo Bonzini, Yasunori Goto, Wanlong Gao

From: Wanlong Gao <gaowanlong@cn.fujitsu.com>

If the total memory assigned to the NUMA nodes is not equal to the
assigned RAM size, QEMU writes wrong data to the ACPI table. The
guest then ignores the bad table and places all memory in a single
node. Check the total up front to ensure that only correct data is
written to the ACPI table.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 numa.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/numa.c b/numa.c
index c419deb..b106531 100644
--- a/numa.c
+++ b/numa.c
@@ -26,6 +26,8 @@
 #include "exec/cpu-common.h"
 #include "qemu/bitmap.h"
 #include "qom/cpu.h"
+#include "qemu/error-report.h"
+#include "include/exec/cpu-common.h" /* for RAM_ADDR_FMT */
 
 static void numa_node_parse_cpus(int nodenr, const char *cpus)
 {
@@ -126,6 +128,7 @@ void numa_add(const char *optarg)
 void set_numa_nodes(void)
 {
     if (nb_numa_nodes > 0) {
+        uint64_t numa_total;
         int i;
 
         if (nb_numa_nodes > MAX_NODES) {
@@ -153,6 +156,17 @@ void set_numa_nodes(void)
             node_mem[i] = ram_size - usedmem;
         }
 
+        numa_total = 0;
+        for (i = 0; i < nb_numa_nodes; i++) {
+            numa_total += node_mem[i];
+        }
+        if (numa_total != ram_size) {
+            error_report("qemu: total memory size for NUMA nodes (%" PRIu64 ")"
+                         " should equal to RAM size (" RAM_ADDR_FMT ")\n",
+                         numa_total, ram_size);
+            exit(1);
+        }
+
         for (i = 0; i < nb_numa_nodes; i++) {
             if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
                 break;
-- 
1.9.3
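
Note (illustrative only, not part of the patch): the sketch below restates,
in standalone C, the default memory split from set_numa_nodes() together
with the new total-size check. It suggests why the default split passes the
check by construction: the last node receives whatever the 8 MB-aligned
nodes left over. The sketch_* names and SKETCH_NODE_ALIGN are hypothetical,
and plain arrays stand in for QEMU's globals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NODE_ALIGN ((uint64_t)1 << 23)   /* 8 MB */

/* Split ram_size across nb_nodes entries: every node but the last is
 * rounded down to an 8 MB boundary, the last node gets the remainder. */
static void sketch_split_ram(uint64_t ram_size, uint64_t *node_mem,
                             size_t nb_nodes)
{
    uint64_t used = 0;
    size_t i;

    assert(node_mem != NULL && nb_nodes > 0);
    for (i = 0; i + 1 < nb_nodes; i++) {
        node_mem[i] = (ram_size / nb_nodes) & ~(SKETCH_NODE_ALIGN - 1);
        used += node_mem[i];
    }
    node_mem[i] = ram_size - used;
}

/* The check added by this patch: per-node sizes must sum to ram_size. */
static bool sketch_total_matches(const uint64_t *node_mem, size_t nb_nodes,
                                 uint64_t ram_size)
{
    uint64_t total = 0;
    size_t i;

    assert(node_mem != NULL || nb_nodes == 0);
    for (i = 0; i < nb_nodes; i++) {
        total += node_mem[i];
    }
    return total == ram_size;
}
```

For 100 MB of RAM and three nodes, the split gives 32 MB, 32 MB and 36 MB,
which sums to 100 MB. User-supplied sizes such as 512 MB + 256 MB against
1024 MB of RAM fail the check, which is the case the patch turns into a
startup error.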


* [Qemu-devel] [PATCH v4 03/29] NUMA: Add numa_info structure to contain numa nodes info
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 01/29] NUMA: move numa related code to new file numa.c Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 02/29] NUMA: check if the total numa memory size is equal to ram_size Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor Hu Tao
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andre Przywara, Eduardo Habkost, Michael S. Tsirkin,
	Igor Mammedov, Paolo Bonzini, Yasunori Goto, Wanlong Gao

From: Wanlong Gao <gaowanlong@cn.fujitsu.com>

Add the numa_info structure to hold each NUMA node's memory size
and VCPU information, and, in later patches, the nodes' host memory
policies.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
[Fix hw/ppc/spapr.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hw/i386/pc.c            | 12 ++++++++----
 hw/ppc/spapr.c          | 11 ++++++-----
 include/sysemu/sysemu.h |  8 ++++++--
 monitor.c               |  2 +-
 numa.c                  | 23 ++++++++++++-----------
 vl.c                    |  7 +++----
 6 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7cdba10..2c75ecc 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -705,14 +705,14 @@ static FWCfgState *bochs_bios_init(void)
         unsigned int apic_id = x86_cpu_apic_id_from_index(i);
         assert(apic_id < apic_id_limit);
         for (j = 0; j < nb_numa_nodes; j++) {
-            if (test_bit(i, node_cpumask[j])) {
+            if (test_bit(i, numa_info[j].node_cpu)) {
                 numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
                 break;
             }
         }
     }
     for (i = 0; i < nb_numa_nodes; i++) {
-        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
+        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(numa_info[i].node_mem);
     }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
                      (1 + apic_id_limit + nb_numa_nodes) *
@@ -1126,8 +1126,12 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
     guest_info->apic_id_limit = pc_apic_id_limit(max_cpus);
     guest_info->apic_xrupt_override = kvm_allows_irq0_override();
     guest_info->numa_nodes = nb_numa_nodes;
-    guest_info->node_mem = g_memdup(node_mem, guest_info->numa_nodes *
+    guest_info->node_mem = g_malloc0(guest_info->numa_nodes *
                                     sizeof *guest_info->node_mem);
+    for (i = 0; i < nb_numa_nodes; i++) {
+        guest_info->node_mem[i] = numa_info[i].node_mem;
+    }
+
     guest_info->node_cpu = g_malloc0(guest_info->apic_id_limit *
                                      sizeof *guest_info->node_cpu);
 
@@ -1135,7 +1139,7 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
         unsigned int apic_id = x86_cpu_apic_id_from_index(i);
         assert(apic_id < guest_info->apic_id_limit);
         for (j = 0; j < nb_numa_nodes; j++) {
-            if (test_bit(i, node_cpumask[j])) {
+            if (test_bit(i, numa_info[j].node_cpu)) {
                 guest_info->node_cpu[apic_id] = j;
                 break;
             }
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 57e9578..d252a60 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -538,8 +538,8 @@ static int spapr_populate_memory(sPAPREnvironment *spapr, void *fdt)
     int i, off;
 
     /* memory node(s) */
-    if (nb_numa_nodes > 1 && node_mem[0] < ram_size) {
-        node0_size = node_mem[0];
+    if (nb_numa_nodes > 1 && numa_info[0].node_mem < ram_size) {
+        node0_size = numa_info[0].node_mem;
     } else {
         node0_size = ram_size;
     }
@@ -577,7 +577,7 @@ static int spapr_populate_memory(sPAPREnvironment *spapr, void *fdt)
         if (mem_start >= ram_size) {
             node_size = 0;
         } else {
-            node_size = node_mem[i];
+            node_size = numa_info[i].node_mem;
             if (node_size > ram_size - mem_start) {
                 node_size = ram_size - mem_start;
             }
@@ -722,7 +722,8 @@ static void spapr_reset_htab(sPAPREnvironment *spapr)
 
     /* Update the RMA size if necessary */
     if (spapr->vrma_adjust) {
-        hwaddr node0_size = (nb_numa_nodes > 1) ? node_mem[0] : ram_size;
+        hwaddr node0_size = (nb_numa_nodes > 1) ?
+            numa_info[0].node_mem : ram_size;
         spapr->rma_size = kvmppc_rma_size(node0_size, spapr->htab_shift);
     }
 }
@@ -1155,7 +1156,7 @@ static void ppc_spapr_init(MachineState *machine)
     MemoryRegion *sysmem = get_system_memory();
     MemoryRegion *ram = g_new(MemoryRegion, 1);
     hwaddr rma_alloc_size;
-    hwaddr node0_size = (nb_numa_nodes > 1) ? node_mem[0] : ram_size;
+    hwaddr node0_size = (nb_numa_nodes > 1) ? numa_info[0].node_mem : ram_size;
     uint32_t initrd_base = 0;
     long kernel_size = 0, initrd_size = 0;
     long load_limit, rtas_limit, fw_size;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 565c8f6..3a9308b 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -9,6 +9,7 @@
 #include "qapi-types.h"
 #include "qemu/notify.h"
 #include "qemu/main-loop.h"
+#include "qemu/bitmap.h"
 
 /* vl.c */
 
@@ -142,8 +143,11 @@ extern QEMUClockType rtc_clock;
 #define MAX_CPUMASK_BITS 255
 
 extern int nb_numa_nodes;
-extern uint64_t node_mem[MAX_NODES];
-extern unsigned long *node_cpumask[MAX_NODES];
+typedef struct node_info {
+    uint64_t node_mem;
+    DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+} NodeInfo;
+extern NodeInfo numa_info[MAX_NODES];
 void numa_add(const char *optarg);
 void set_numa_nodes(void);
 void set_numa_modes(void);
diff --git a/monitor.c b/monitor.c
index 0565816..a2a4466 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2006,7 +2006,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
         }
         monitor_printf(mon, "\n");
         monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-            node_mem[i] >> 20);
+            numa_info[i].node_mem >> 20);
     }
 }
 
diff --git a/numa.c b/numa.c
index b106531..fb9bffc 100644
--- a/numa.c
+++ b/numa.c
@@ -65,7 +65,7 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus)
         goto error;
     }
 
-    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
+    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
     return;
 
 error:
@@ -105,7 +105,7 @@ void numa_add(const char *optarg)
         }
 
         if (get_param_value(option, 128, "mem", optarg) == 0) {
-            node_mem[nodenr] = 0;
+            numa_info[nodenr].node_mem = 0;
         } else {
             int64_t sval;
             sval = strtosz(option, &endptr);
@@ -113,7 +113,7 @@ void numa_add(const char *optarg)
                 fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
                 exit(1);
             }
-            node_mem[nodenr] = sval;
+            numa_info[nodenr].node_mem = sval;
         }
         if (get_param_value(option, 128, "cpus", optarg) != 0) {
             numa_node_parse_cpus(nodenr, option);
@@ -139,7 +139,7 @@ void set_numa_nodes(void)
          * and distribute the available memory equally across all nodes
          */
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0) {
+            if (numa_info[i].node_mem != 0) {
                 break;
             }
         }
@@ -150,15 +150,16 @@ void set_numa_nodes(void)
              * the final node gets the rest.
              */
             for (i = 0; i < nb_numa_nodes - 1; i++) {
-                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
-                usedmem += node_mem[i];
+                numa_info[i].node_mem = (ram_size / nb_numa_nodes) &
+                                        ~((1 << 23UL) - 1);
+                usedmem += numa_info[i].node_mem;
             }
-            node_mem[i] = ram_size - usedmem;
+            numa_info[i].node_mem = ram_size - usedmem;
         }
 
         numa_total = 0;
         for (i = 0; i < nb_numa_nodes; i++) {
-            numa_total += node_mem[i];
+            numa_total += numa_info[i].node_mem;
         }
         if (numa_total != ram_size) {
             error_report("qemu: total memory size for NUMA nodes (%" PRIu64 ")"
@@ -168,7 +169,7 @@ void set_numa_nodes(void)
         }
 
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
+            if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
                 break;
             }
         }
@@ -178,7 +179,7 @@ void set_numa_nodes(void)
          */
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                set_bit(i, node_cpumask[i % nb_numa_nodes]);
+                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
             }
         }
     }
@@ -191,7 +192,7 @@ void set_numa_modes(void)
 
     CPU_FOREACH(cpu) {
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
                 cpu->numa_node = i;
             }
         }
diff --git a/vl.c b/vl.c
index eacbcc7..a38e6dd 100644
--- a/vl.c
+++ b/vl.c
@@ -195,8 +195,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
     QTAILQ_HEAD_INITIALIZER(fw_boot_order);
 
 int nb_numa_nodes;
-uint64_t node_mem[MAX_NODES];
-unsigned long *node_cpumask[MAX_NODES];
+NodeInfo numa_info[MAX_NODES];
 
 uint8_t qemu_uuid[16];
 bool qemu_uuid_set;
@@ -2959,8 +2958,8 @@ int main(int argc, char **argv, char **envp)
     translation = BIOS_ATA_TRANSLATION_AUTO;
 
     for (i = 0; i < MAX_NODES; i++) {
-        node_mem[i] = 0;
-        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
+        numa_info[i].node_mem = 0;
+        bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
     }
 
     nb_numa_nodes = 0;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (2 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 03/29] NUMA: Add numa_info structure to contain numa nodes info Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-16 14:08   ` Eduardo Habkost
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 05/29] NUMA: expand MAX_NODES from 64 to 128 Hu Tao
                   ` (25 subsequent siblings)
  29 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, Igor Mammedov,
	Paolo Bonzini, Yasunori Goto, Wanlong Gao

From: Wanlong Gao <gaowanlong@cn.fujitsu.com>

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
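Reader's note: the "legacy suffix-less format" fix-up in
numa_node_parse() further down can be pictured with the minimal sketch
below. It is not QEMU code. fixup_legacy_mem() is a hypothetical helper,
and it assumes (as the comment in the patch implies) that the 'size'
type yields a byte count, so a bare number such as mem=512 has to be
scaled to MiB by hand. Unlike the patch, the sketch also guards against
an empty string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * mem_str is the raw text given to mem=, parsed is the value a
 * byte-based size parser produced for it.
 */
static uint64_t fixup_legacy_mem(const char *mem_str, uint64_t parsed)
{
    size_t len;

    assert(mem_str != NULL);
    len = strlen(mem_str);

    /* A trailing digit means no unit suffix was given: treat as MiB. */
    if (len > 0 && isdigit((unsigned char)mem_str[len - 1])) {
        parsed <<= 20;
    }
    return parsed;
}
```

With this helper, "512" (parsed as 512 bytes) and "512M" (parsed as
536870912 bytes) both end up as 536870912 bytes.
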
 include/sysemu/sysemu.h |   3 +-
 numa.c                  | 145 +++++++++++++++++++++++-------------------------
 qapi-schema.json        |  32 +++++++++++
 vl.c                    |  11 +++-
 4 files changed, 114 insertions(+), 77 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 3a9308b..4102be3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -148,9 +148,10 @@ typedef struct node_info {
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
 } NodeInfo;
 extern NodeInfo numa_info[MAX_NODES];
-void numa_add(const char *optarg);
 void set_numa_nodes(void);
 void set_numa_modes(void);
+extern QemuOptsList qemu_numa_opts;
+int numa_init_func(QemuOpts *opts, void *opaque);
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/numa.c b/numa.c
index fb9bffc..43573c5 100644
--- a/numa.c
+++ b/numa.c
@@ -28,101 +28,96 @@
 #include "qom/cpu.h"
 #include "qemu/error-report.h"
 #include "include/exec/cpu-common.h" /* for RAM_ADDR_FMT */
-
-static void numa_node_parse_cpus(int nodenr, const char *cpus)
+#include "qapi-visit.h"
+#include "qapi/opts-visitor.h"
+#include "qapi/dealloc-visitor.h"
+#include "qapi/qmp/qerror.h"
+
+QemuOptsList qemu_numa_opts = {
+    .name = "numa",
+    .implied_opt_name = "type",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head),
+    .desc = { { 0 } } /* validated with OptsVisitor */
+};
+
+static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
 {
-    char *endptr;
-    unsigned long long value, endvalue;
+    uint16_t nodenr;
+    uint16List *cpus = NULL;
 
-    /* Empty CPU range strings will be considered valid, they will simply
-     * not set any bit in the CPU bitmap.
-     */
-    if (!*cpus) {
-        return;
-    }
-
-    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
-        goto error;
-    }
-    if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
-        }
-    } else if (*endptr == '\0') {
-        endvalue = value;
+    if (node->has_nodeid) {
+        nodenr = node->nodeid;
     } else {
-        goto error;
+        nodenr = nb_numa_nodes;
     }
 
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d VCPUs are supported\n",
-             MAX_CPUMASK_BITS);
+    if (nodenr >= MAX_NODES) {
+        error_setg(errp, "Max number of NUMA nodes reached: %"
+                   PRIu16, nodenr);
+        return;
     }
 
-    if (endvalue < value) {
-        goto error;
+    for (cpus = node->cpus; cpus; cpus = cpus->next) {
+        if (cpus->value >= MAX_CPUMASK_BITS) {
+            error_setg(errp, "CPU number %" PRIu16 " is bigger than %d",
+                       cpus->value, MAX_CPUMASK_BITS - 1);
+            return;
+        }
+        bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
     }
 
-    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
-    return;
-
-error:
-    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
-    exit(1);
+    if (node->has_mem) {
+        uint64_t mem_size = node->mem;
+        const char *mem_str = qemu_opt_get(opts, "mem");
+        /* Fix up legacy suffix-less format */
+        if (g_ascii_isdigit(mem_str[strlen(mem_str) - 1])) {
+            mem_size <<= 20;
+        }
+        numa_info[nodenr].node_mem = mem_size;
+    }
 }
 
-void numa_add(const char *optarg)
+int numa_init_func(QemuOpts *opts, void *opaque)
 {
-    char option[128];
-    char *endptr;
-    unsigned long long nodenr;
+    NumaOptions *object = NULL;
+    Error *err = NULL;
 
-    optarg = get_opt_name(option, 128, optarg, ',');
-    if (*optarg == ',') {
-        optarg++;
+    {
+        OptsVisitor *ov = opts_visitor_new(opts);
+        visit_type_NumaOptions(opts_get_visitor(ov), &object, NULL, &err);
+        opts_visitor_cleanup(ov);
     }
-    if (!strcmp(option, "node")) {
 
-        if (nb_numa_nodes >= MAX_NODES) {
-            fprintf(stderr, "qemu: too many NUMA nodes\n");
-            exit(1);
-        }
+    if (err) {
+        goto error;
+    }
 
-        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
-            nodenr = nb_numa_nodes;
-        } else {
-            if (parse_uint_full(option, &nodenr, 10) < 0) {
-                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
-                exit(1);
-            }
+    switch (object->kind) {
+    case NUMA_OPTIONS_KIND_NODE:
+        numa_node_parse(object->node, opts, &err);
+        if (err) {
+            goto error;
         }
+        nb_numa_nodes++;
+        break;
+    default:
+        abort();
+    }
 
-        if (nodenr >= MAX_NODES) {
-            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
-            exit(1);
-        }
+    return 0;
 
-        if (get_param_value(option, 128, "mem", optarg) == 0) {
-            numa_info[nodenr].node_mem = 0;
-        } else {
-            int64_t sval;
-            sval = strtosz(option, &endptr);
-            if (sval < 0 || *endptr) {
-                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
-                exit(1);
-            }
-            numa_info[nodenr].node_mem = sval;
-        }
-        if (get_param_value(option, 128, "cpus", optarg) != 0) {
-            numa_node_parse_cpus(nodenr, option);
-        }
-        nb_numa_nodes++;
-    } else {
-        fprintf(stderr, "Invalid -numa option: %s\n", option);
-        exit(1);
+error:
+    qerror_report_err(err);
+    error_free(err);
+
+    if (object) {
+        QapiDeallocVisitor *dv = qapi_dealloc_visitor_new();
+        visit_type_NumaOptions(qapi_dealloc_get_visitor(dv),
+                               &object, NULL, NULL);
+        qapi_dealloc_visitor_cleanup(dv);
     }
+
+    return -1;
 }
 
 void set_numa_nodes(void)
diff --git a/qapi-schema.json b/qapi-schema.json
index 7bc33ea..8ce01cb 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4722,3 +4722,35 @@
               'btn'     : 'InputBtnEvent',
               'rel'     : 'InputMoveEvent',
               'abs'     : 'InputMoveEvent' } }
+
+##
+# @NumaOptions
+#
+# A discriminated record of NUMA options. (for OptsVisitor)
+#
+# Since: 2.1
+##
+{ 'union': 'NumaOptions',
+  'data': {
+    'node': 'NumaNodeOptions' }}
+
+##
+# @NumaNodeOptions
+#
+# Create a guest NUMA node. (for OptsVisitor)
+#
+# @nodeid: #optional NUMA node ID (assigned incrementally from 0 if omitted)
+#
+# @cpus: #optional VCPUs belonging to this node (assign VCPUs round-robin
+#         if omitted)
+#
+# @mem: #optional memory size of this node (equally divide total memory among
+#        nodes if omitted)
+#
+# Since: 2.1
+##
+{ 'type': 'NumaNodeOptions',
+  'data': {
+   '*nodeid': 'uint16',
+   '*cpus':   ['uint16'],
+   '*mem':    'size' }}
diff --git a/vl.c b/vl.c
index a38e6dd..730bd1e 100644
--- a/vl.c
+++ b/vl.c
@@ -2938,6 +2938,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_realtime_opts);
     qemu_add_opts(&qemu_msg_opts);
     qemu_add_opts(&qemu_name_opts);
+    qemu_add_opts(&qemu_numa_opts);
 
     runstate_init();
 
@@ -3133,7 +3134,10 @@ int main(int argc, char **argv, char **envp)
                 }
                 break;
             case QEMU_OPTION_numa:
-                numa_add(optarg);
+                opts = qemu_opts_parse(qemu_find_opts("numa"), optarg, 1);
+                if (!opts) {
+                    exit(1);
+                }
                 break;
             case QEMU_OPTION_display:
                 display_type = select_display(optarg);
@@ -4303,6 +4307,11 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
     default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
+    if (qemu_opts_foreach(qemu_find_opts("numa"), numa_init_func,
+                          NULL, 1) != 0) {
+        exit(1);
+    }
+
     set_numa_nodes();
 
     if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 05/29] NUMA: expand MAX_NODES from 64 to 128
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (3 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 06/29] man: improve -numa doc Hu Tao
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, Igor Mammedov,
	Paolo Bonzini, Yasunori Goto, Wanlong Gao

From: Wanlong Gao <gaowanlong@cn.fujitsu.com>

libnuma chose 128 for MAX_NODES, so we follow libnuma here.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 include/sysemu/sysemu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 4102be3..423d49e 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -133,7 +133,7 @@ extern size_t boot_splash_filedata_size;
 extern uint8_t qemu_extra_params_fw[2];
 extern QEMUClockType rtc_clock;
 
-#define MAX_NODES 64
+#define MAX_NODES 128
 
 /* The following shall be true for all CPUs:
  *   cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 06/29] man: improve -numa doc
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (4 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 05/29] NUMA: expand MAX_NODES from 64 to 128 Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 07/29] vl: redo -object parsing Hu Tao
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, Luiz Capitulino,
	Igor Mammedov, Paolo Bonzini, Yasunori Goto

From: Luiz Capitulino <lcapitulino@redhat.com>

The -numa option documentation in qemu's manpage lacks the command-line
options and some information regarding how it relates to options -m and
-smp. This commit fills in the missing text.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
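Reader's note: the point that -numa only partitions what -m and -smp
already provide can be sketched as below. This is an illustration, not
the code in numa.c. SketchNode and sketch_assign_defaults() are made-up
names, the CPU mask is a plain 64-bit word, and the even split ignores
any rounding the real code may apply. It mirrors two details visible
elsewhere in this series: the last node receives whatever RAM is left
over, and vCPUs are handed out round-robin.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t mem;       /* bytes of guest RAM assigned to the node */
    uint64_t cpu_mask;  /* bit n set: vCPU n belongs to the node */
} SketchNode;

/* Split ram_size and max_cpus (both fixed elsewhere) across nb_nodes. */
static void sketch_assign_defaults(SketchNode *nodes, int nb_nodes,
                                   int max_cpus, uint64_t ram_size)
{
    uint64_t used = 0;
    int i;

    assert(nb_nodes >= 1 && max_cpus >= 0 && max_cpus <= 64);

    /* Every node but the last gets an equal share. */
    for (i = 0; i < nb_nodes - 1; i++) {
        nodes[i].mem = ram_size / nb_nodes;
        nodes[i].cpu_mask = 0;
        used += nodes[i].mem;
    }
    /* The last node takes the remainder, so the total equals ram_size. */
    nodes[i].mem = ram_size - used;
    nodes[i].cpu_mask = 0;

    /* vCPU i goes to node i % nb_nodes. */
    for (i = 0; i < max_cpus; i++) {
        nodes[i % nb_nodes].cpu_mask |= UINT64_C(1) << i;
    }
}

/* Sum of per-node memory; expected to match the -m value. */
static uint64_t sketch_total_mem(const SketchNode *nodes, int nb_nodes)
{
    uint64_t total = 0;
    int i;

    for (i = 0; i < nb_nodes; i++) {
        total += nodes[i].mem;
    }
    return total;
}
```

Nothing in the sketch creates RAM or vCPUs: both totals are inputs, and
the function only decides which node each piece belongs to.
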
 qemu-options.hx | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 5a4eff9..d3cd2ce 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -97,10 +97,14 @@ ETEXI
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
 STEXI
-@item -numa @var{opts}
+@item -numa node[,mem=@var{size}][,cpus=@var{cpu[-cpu]}][,nodeid=@var{node}]
 @findex -numa
-Simulate a multi node NUMA system. If mem and cpus are omitted, resources
-are split equally.
+Simulate a multi node NUMA system. If @samp{mem}
+and @samp{cpus} are omitted, resources are split equally. Also, note
+that the @option{-numa} option doesn't allocate any of the specified
+resources. That is, it just assigns existing resources to NUMA nodes. This
+means that one still has to use the @option{-m}, @option{-smp} options
+to allocate RAM and vCPUs respectively.
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 07/29] vl: redo -object parsing
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (5 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 06/29] man: improve -numa doc Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 08/29] qmp: improve error reporting for -object and object-add Hu Tao
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Follow the lines of the HMP implementation, using OptsVisitor
to parse the options.  This gives access to OptsVisitor's
rich parsing of integer lists.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
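Reader's note: as a reminder of what an integer-list value such as
cpus=0-3 stands for, here is a small stand-alone sketch. It is not
OptsVisitor code. sketch_parse_cpu_range() is a hypothetical function
that accepts a single "first[-last]" item (the cpu[-cpu] syntax from the
-numa help text) and records it in a 64-bit mask. How OptsVisitor itself
represents and validates lists is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CPU_BITS 64

/*
 * Parse "N" or "N-M" and set the matching bits in *mask.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int sketch_parse_cpu_range(const char *s, uint64_t *mask)
{
    char *end;
    unsigned long first, last, i;

    assert(s != NULL && mask != NULL);

    /* strtoul() would accept signs and whitespace; reject them here. */
    if (*s < '0' || *s > '9') {
        return -1;
    }
    errno = 0;
    first = strtoul(s, &end, 10);
    if (errno) {
        return -1;
    }

    if (*end == '-') {
        const char *p = end + 1;

        if (*p < '0' || *p > '9') {
            return -1;
        }
        last = strtoul(p, &end, 10);
        if (errno) {
            return -1;
        }
    } else {
        last = first;
    }

    if (*end != '\0' || last < first || last >= SKETCH_CPU_BITS) {
        return -1;
    }

    for (i = first; i <= last; i++) {
        *mask |= UINT64_C(1) << i;
    }
    return 0;
}
```

Calling it once per list item accumulates the union of all ranges in
the same mask, which is the effect a list of integers has on a bitmap
property.
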
 vl.c | 65 ++++++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 36 insertions(+), 29 deletions(-)

diff --git a/vl.c b/vl.c
index 730bd1e..1617013 100644
--- a/vl.c
+++ b/vl.c
@@ -116,7 +116,7 @@ int main(int argc, char **argv)
 
 #include "ui/qemu-spice.h"
 #include "qapi/string-input-visitor.h"
-#include "qom/object_interfaces.h"
+#include "qapi/opts-visitor.h"
 
 #define DEFAULT_RAM_SIZE 128
 
@@ -2822,44 +2822,51 @@ static int object_set_property(const char *name, const char *value, void *opaque
 
 static int object_create(QemuOpts *opts, void *opaque)
 {
-    const char *type = qemu_opt_get(opts, "qom-type");
-    const char *id = qemu_opts_id(opts);
-    Error *local_err = NULL;
-    Object *obj;
-
-    g_assert(type != NULL);
-
-    if (id == NULL) {
-        qerror_report(QERR_MISSING_PARAMETER, "id");
-        return -1;
+    Error *err = NULL;
+    char *type = NULL;
+    char *id = NULL;
+    void *dummy = NULL;
+    OptsVisitor *ov;
+    QDict *pdict;
+
+    ov = opts_visitor_new(opts);
+    pdict = qemu_opts_to_qdict(opts, NULL);
+
+    visit_start_struct(opts_get_visitor(ov), &dummy, NULL, NULL, 0, &err);
+    if (err) {
+        goto out;
     }
 
-    obj = object_new(type);
-    if (qemu_opt_foreach(opts, object_set_property, obj, 1) < 0) {
-        object_unref(obj);
-        return -1;
+    qdict_del(pdict, "qom-type");
+    visit_type_str(opts_get_visitor(ov), &type, "qom-type", &err);
+    if (err) {
+        goto out;
     }
 
-    if (!object_dynamic_cast(obj, TYPE_USER_CREATABLE)) {
-        error_setg(&local_err, "object '%s' isn't supported by -object",
-                   id);
+    qdict_del(pdict, "id");
+    visit_type_str(opts_get_visitor(ov), &id, "id", &err);
+    if (err) {
         goto out;
     }
 
-    object_property_add_child(container_get(object_get_root(), "/objects"),
-                              id, obj, &local_err);
-
-    user_creatable_complete(obj, &local_err);
-    if (local_err) {
-        object_property_del(container_get(object_get_root(), "/objects"),
-                            id, &error_abort);
+    object_add(type, id, pdict, opts_get_visitor(ov), &err);
+    if (err) {
         goto out;
     }
+    visit_end_struct(opts_get_visitor(ov), &err);
+    if (err) {
+        qmp_object_del(id, NULL);
+    }
+
 out:
-    object_unref(obj);
-    if (local_err) {
-        qerror_report_err(local_err);
-        error_free(local_err);
+    opts_visitor_cleanup(ov);
+
+    QDECREF(pdict);
+    g_free(id);
+    g_free(type);
+    g_free(dummy);
+    if (err) {
+        qerror_report_err(err);
         return -1;
     }
     return 0;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 08/29] qmp: improve error reporting for -object and object-add
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (6 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 07/29] vl: redo -object parsing Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 15:57   ` Igor Mammedov
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 09/29] pc: pass MachineState to pc_memory_init Hu Tao
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Use QERR_INVALID_PARAMETER_VALUE for consistency.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
---
 qmp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qmp.c b/qmp.c
index b722dbe..cef60fb 100644
--- a/qmp.c
+++ b/qmp.c
@@ -540,7 +540,8 @@ void object_add(const char *type, const char *id, const QDict *qdict,
 
     klass = object_class_by_name(type);
     if (!klass) {
-        error_setg(errp, "invalid class name");
+        error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+                  "qom-type", "a valid class name");
         return;
     }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 09/29] pc: pass MachineState to pc_memory_init
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (7 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 08/29] qmp: improve error reporting for -object and object-add Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 13:14   ` Igor Mammedov
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 10/29] numa: introduce memory_region_allocate_system_memory Hu Tao
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hw/i386/pc.c         | 23 +++++++++++------------
 hw/i386/pc_piix.c    |  8 +++-----
 hw/i386/pc_q35.c     |  4 +---
 include/hw/i386/pc.h |  7 +++----
 4 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2c75ecc..9860e3f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1194,10 +1194,8 @@ void pc_acpi_init(const char *default_dsdt)
     }
 }
 
-FWCfgState *pc_memory_init(MemoryRegion *system_memory,
-                           const char *kernel_filename,
-                           const char *kernel_cmdline,
-                           const char *initrd_filename,
+FWCfgState *pc_memory_init(MachineState *machine,
+                           MemoryRegion *system_memory,
                            ram_addr_t below_4g_mem_size,
                            ram_addr_t above_4g_mem_size,
                            MemoryRegion *rom_memory,
@@ -1208,18 +1206,18 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
     MemoryRegion *ram, *option_rom_mr;
     MemoryRegion *ram_below_4g, *ram_above_4g;
     FWCfgState *fw_cfg;
-    ram_addr_t ram_size = below_4g_mem_size + above_4g_mem_size;
-    MachineState *machine = MACHINE(qdev_get_machine());
     PCMachineState *pcms = PC_MACHINE(machine);
 
-    linux_boot = (kernel_filename != NULL);
+    assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
+
+    linux_boot = (machine->kernel_filename != NULL);
 
     /* Allocate RAM.  We allocate it as a single memory region and use
      * aliases to address portions of it, mostly for backwards compatibility
      * with older qemus that used qemu_ram_alloc().
      */
     ram = g_malloc(sizeof(*ram));
-    memory_region_init_ram(ram, NULL, "pc.ram", ram_size);
+    memory_region_init_ram(ram, NULL, "pc.ram", machine->ram_size);
     vmstate_register_ram_global(ram);
     *ram_memory = ram;
     ram_below_4g = g_malloc(sizeof(*ram_below_4g));
@@ -1238,7 +1236,7 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
 
     if (!guest_info->has_reserved_memory &&
         (machine->ram_slots ||
-         (machine->maxram_size > ram_size))) {
+         (machine->maxram_size > machine->ram_size))) {
         MachineClass *mc = MACHINE_GET_CLASS(machine);
 
         error_report("\"-memory 'slots|maxmem'\" is not supported by: %s",
@@ -1248,9 +1246,9 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
 
     /* initialize hotplug memory address space */
     if (guest_info->has_reserved_memory &&
-        (ram_size < machine->maxram_size)) {
+        (machine->ram_size < machine->maxram_size)) {
         ram_addr_t hotplug_mem_size =
-            machine->maxram_size - ram_size;
+            machine->maxram_size - machine->ram_size;
 
         if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
             error_report("unsupported amount of memory slots: %"PRIu64,
@@ -1295,7 +1293,8 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
     }
 
     if (linux_boot) {
-        load_linux(fw_cfg, kernel_filename, initrd_filename, kernel_cmdline, below_4g_mem_size);
+        load_linux(fw_cfg, machine->kernel_filename, machine->initrd_filename,
+                   machine->kernel_cmdline, below_4g_mem_size);
     }
 
     for (i = 0; i < nb_option_roms; i++) {
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index a13e8d6..3e7524b 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -156,11 +156,9 @@ static void pc_init1(MachineState *machine,
 
     /* allocate ram and load rom/bios */
     if (!xen_enabled()) {
-        fw_cfg = pc_memory_init(system_memory,
-                       machine->kernel_filename, machine->kernel_cmdline,
-                       machine->initrd_filename,
-                       below_4g_mem_size, above_4g_mem_size,
-                       rom_memory, &ram_memory, guest_info);
+        fw_cfg = pc_memory_init(machine, system_memory,
+                                below_4g_mem_size, above_4g_mem_size,
+                                rom_memory, &ram_memory, guest_info);
     }
 
     gsi_state = g_malloc0(sizeof(*gsi_state));
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 629eb2d..aa71332 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -143,9 +143,7 @@ static void pc_q35_init(MachineState *machine)
 
     /* allocate ram and load rom/bios */
     if (!xen_enabled()) {
-        pc_memory_init(get_system_memory(),
-                       machine->kernel_filename, machine->kernel_cmdline,
-                       machine->initrd_filename,
+        pc_memory_init(machine, get_system_memory(),
                        below_4g_mem_size, above_4g_mem_size,
                        rom_memory, &ram_memory, guest_info);
     }
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index fe9e18b..f337d54 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -3,6 +3,7 @@
 
 #include "qemu-common.h"
 #include "exec/memory.h"
+#include "hw/boards.h"
 #include "hw/isa/isa.h"
 #include "hw/block/fdc.h"
 #include "net/net.h"
@@ -183,10 +184,8 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
 void pc_pci_as_mapping_init(Object *owner, MemoryRegion *system_memory,
                             MemoryRegion *pci_address_space);
 
-FWCfgState *pc_memory_init(MemoryRegion *system_memory,
-                           const char *kernel_filename,
-                           const char *kernel_cmdline,
-                           const char *initrd_filename,
+FWCfgState *pc_memory_init(MachineState *machine,
+                           MemoryRegion *system_memory,
                            ram_addr_t below_4g_mem_size,
                            ram_addr_t above_4g_mem_size,
                            MemoryRegion *rom_memory,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 10/29] numa: introduce memory_region_allocate_system_memory
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (8 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 09/29] pc: pass MachineState to pc_memory_init Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 11/29] hostmem: separate allocation from UserCreatable complete method Hu Tao
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hw/i386/pc.c            | 4 ++--
 include/hw/boards.h     | 6 +++++-
 include/sysemu/sysemu.h | 1 +
 numa.c                  | 9 +++++++++
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 9860e3f..37344ce 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1217,8 +1217,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
      * with older qemus that used qemu_ram_alloc().
      */
     ram = g_malloc(sizeof(*ram));
-    memory_region_init_ram(ram, NULL, "pc.ram", machine->ram_size);
-    vmstate_register_ram_global(ram);
+    memory_region_allocate_system_memory(ram, NULL, "pc.ram",
+                                         machine->ram_size);
     *ram_memory = ram;
     ram_below_4g = g_malloc(sizeof(*ram_below_4g));
     memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 429ac43..605a970 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -43,9 +43,13 @@ struct QEMUMachine {
     const char *hw_version;
 };
 
-#define TYPE_MACHINE_SUFFIX "-machine"
+void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
+                                          const char *name,
+                                          uint64_t ram_size);
+
 int qemu_register_machine(QEMUMachine *m);
 
+#define TYPE_MACHINE_SUFFIX "-machine"
 #define TYPE_MACHINE "machine"
 #undef MACHINE  /* BSD defines it and QEMU does not use it */
 #define MACHINE(obj) \
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 423d49e..caf88dd 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -10,6 +10,7 @@
 #include "qemu/notify.h"
 #include "qemu/main-loop.h"
 #include "qemu/bitmap.h"
+#include "qom/object.h"
 
 /* vl.c */
 
diff --git a/numa.c b/numa.c
index 43573c5..efdebf4 100644
--- a/numa.c
+++ b/numa.c
@@ -32,6 +32,7 @@
 #include "qapi/opts-visitor.h"
 #include "qapi/dealloc-visitor.h"
 #include "qapi/qmp/qerror.h"
+#include "hw/boards.h"
 
 QemuOptsList qemu_numa_opts = {
     .name = "numa",
@@ -193,3 +194,11 @@ void set_numa_modes(void)
         }
     }
 }
+
+void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
+                                          const char *name,
+                                          uint64_t ram_size)
+{
+    memory_region_init_ram(mr, owner, name, ram_size);
+    vmstate_register_ram_global(mr);
+}
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 11/29] hostmem: separate allocation from UserCreatable complete method
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (9 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 10/29] numa: introduce memory_region_allocate_system_memory Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:47   ` Igor Mammedov
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 12/29] numa: add -numa node,memdev= option Hu Tao
                   ` (18 subsequent siblings)
  29 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

This allows the superclass to set various policies on the memory
region that the subclass creates. Drops hostmem-ram's complete method
accordingly.

While at it, s/hostmemory/host_memory/ in hostmem.c to keep names
consistent.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
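Reader's note: the split described above is the usual "base class
drives, subclass fills in a hook" arrangement. The sketch below shows
the shape in plain C without QOM. All names (SketchBackend,
sketch_complete, sketch_ram_alloc, policy_applied) are invented for the
illustration, errors are a plain int instead of Error **, and the
"policy" step is only a placeholder for what later patches may add.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchBackend SketchBackend;

/* Per-class hook table: subclasses only provide alloc(). */
typedef struct {
    void (*alloc)(SketchBackend *b, int *err);
} SketchBackendClass;

struct SketchBackend {
    const SketchBackendClass *klass;
    size_t size;
    int allocated;       /* set by the subclass hook */
    int policy_applied;  /* set by the common complete step */
};

/* Subclass part: allocate, or fail if no size was configured. */
static void sketch_ram_alloc(SketchBackend *b, int *err)
{
    if (!b->size) {
        *err = 1;
        return;
    }
    b->allocated = 1;
}

/* Superclass part: call the hook, then act on what it created. */
static void sketch_complete(SketchBackend *b, int *err)
{
    assert(b != NULL && b->klass != NULL && err != NULL);

    if (b->klass->alloc) {
        b->klass->alloc(b, err);
        if (*err) {
            return;
        }
    }
    /* Placeholder for common policies applied to the new region. */
    b->policy_applied = b->allocated;
}
```

Because the common step runs after the hook returns, it can act on the
region no matter which subclass allocated it.
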
 backends/hostmem-ram.c   |  7 +++----
 backends/hostmem.c       | 40 ++++++++++++++++++++++++++++++----------
 include/sysemu/hostmem.h |  2 ++
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index bba2ebc..d9a8290 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -16,9 +16,8 @@
 
 
 static void
-ram_backend_memory_init(UserCreatable *uc, Error **errp)
+ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
-    HostMemoryBackend *backend = MEMORY_BACKEND(uc);
     char *path;
 
     if (!backend->size) {
@@ -35,9 +34,9 @@ ram_backend_memory_init(UserCreatable *uc, Error **errp)
 static void
 ram_backend_class_init(ObjectClass *oc, void *data)
 {
-    UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
 
-    ucc->complete = ram_backend_memory_init;
+    bc->alloc = ram_backend_memory_alloc;
 }
 
 static const TypeInfo ram_backend_info = {
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 2f578ac..cc57c13 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -17,7 +17,7 @@
 #include "qom/object_interfaces.h"
 
 static void
-hostmemory_backend_get_size(Object *obj, Visitor *v, void *opaque,
+host_memory_backend_get_size(Object *obj, Visitor *v, void *opaque,
                             const char *name, Error **errp)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -27,7 +27,7 @@ hostmemory_backend_get_size(Object *obj, Visitor *v, void *opaque,
 }
 
 static void
-hostmemory_backend_set_size(Object *obj, Visitor *v, void *opaque,
+host_memory_backend_set_size(Object *obj, Visitor *v, void *opaque,
                             const char *name, Error **errp)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -53,14 +53,14 @@ out:
     error_propagate(errp, local_err);
 }
 
-static void hostmemory_backend_init(Object *obj)
+static void host_memory_backend_init(Object *obj)
 {
     object_property_add(obj, "size", "int",
-                        hostmemory_backend_get_size,
-                        hostmemory_backend_set_size, NULL, NULL, NULL);
+                        host_memory_backend_get_size,
+                        host_memory_backend_set_size, NULL, NULL, NULL);
 }
 
-static void hostmemory_backend_finalize(Object *obj)
+static void host_memory_backend_finalize(Object *obj)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
 
@@ -75,14 +75,34 @@ host_memory_backend_get_memory(HostMemoryBackend *backend, Error **errp)
     return memory_region_size(&backend->mr) ? &backend->mr : NULL;
 }
 
-static const TypeInfo hostmemory_backend_info = {
+static void
+host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(uc);
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
+
+    if (bc->alloc) {
+        bc->alloc(backend, errp);
+    }
+}
+
+static void
+host_memory_backend_class_init(ObjectClass *oc, void *data)
+{
+    UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+
+    ucc->complete = host_memory_backend_memory_complete;
+}
+
+static const TypeInfo host_memory_backend_info = {
     .name = TYPE_MEMORY_BACKEND,
     .parent = TYPE_OBJECT,
     .abstract = true,
     .class_size = sizeof(HostMemoryBackendClass),
+    .class_init = host_memory_backend_class_init,
     .instance_size = sizeof(HostMemoryBackend),
-    .instance_init = hostmemory_backend_init,
-    .instance_finalize = hostmemory_backend_finalize,
+    .instance_init = host_memory_backend_init,
+    .instance_finalize = host_memory_backend_finalize,
     .interfaces = (InterfaceInfo[]) {
         { TYPE_USER_CREATABLE },
         { }
@@ -91,7 +111,7 @@ static const TypeInfo hostmemory_backend_info = {
 
 static void register_types(void)
 {
-    type_register_static(&hostmemory_backend_info);
+    type_register_static(&host_memory_backend_info);
 }
 
 type_init(register_types);
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 4fc081e..923f672 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -34,6 +34,8 @@ typedef struct HostMemoryBackendClass HostMemoryBackendClass;
  */
 struct HostMemoryBackendClass {
     ObjectClass parent_class;
+
+    void (*alloc)(HostMemoryBackend *backend, Error **errp);
 };
 
 /**
-- 
1.9.3


* [Qemu-devel] [PATCH v4 12/29] numa: add -numa node,memdev= option
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (10 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 11/29] hostmem: separate allocation from UserCreatable complete method Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 17:22   ` [Qemu-devel] [PATCH v4 12/29] numa: add -numa node, memdev= option Eric Blake
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 13/29] memory: reorganize file-based allocation Hu Tao
                   ` (17 subsequent siblings)
  29 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

This option provides the infrastructure for binding guest NUMA nodes
to host NUMA nodes.  For example:

 -object memory-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
 -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
 -object memory-ram,size=1024M,policy=interleave,host-nodes=1-3,id=ram-node1 \
 -numa node,nodeid=1,cpus=1,memdev=ram-node1

The option replaces "-numa node,mem=".
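
The "all nodes or no nodes" rule for memdev= can be sketched in
isolation as below.  This is a minimal sketch, not the patch's code:
struct node_opts and check_node_opts() are hypothetical stand-ins for
NumaNodeOptions and the checks in numa_node_parse(), and errors are
returned as plain strings rather than through Error **.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tri-state: -1 = no node parsed yet, 0 = nodes use mem=, 1 = memdev=. */
static int have_memdevs = -1;

/* Hypothetical stand-in for the parsed options of one "-numa node". */
struct node_opts {
    bool has_mem;
    bool has_memdev;
};

/* Returns NULL if the node is acceptable, else a static error string. */
static const char *check_node_opts(const struct node_opts *node)
{
    assert(node != NULL);

    if (node->has_mem && node->has_memdev) {
        return "cannot specify both mem= and memdev=";
    }

    /* The first node parsed decides which style every later node must use. */
    if (have_memdevs == -1) {
        have_memdevs = node->has_memdev;
    }
    if ((int)node->has_memdev != have_memdevs) {
        return "memdev option must be specified for either all or no nodes";
    }
    return NULL;
}
```

In this sketch the first node fixes the mode, so a later node that
switches from memdev= to mem= (or the reverse) is rejected.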

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |  1 +
 numa.c                  | 62 +++++++++++++++++++++++++++++++++++++++++++++++--
 qapi-schema.json        | 11 ++++++---
 qemu-options.hx         | 12 ++++++----
 4 files changed, 77 insertions(+), 9 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index caf88dd..1e141e3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -147,6 +147,7 @@ extern int nb_numa_nodes;
 typedef struct node_info {
     uint64_t node_mem;
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+    struct HostMemoryBackend *node_memdev;
 } NodeInfo;
 extern NodeInfo numa_info[MAX_NODES];
 void set_numa_nodes(void);
diff --git a/numa.c b/numa.c
index efdebf4..ce43e69 100644
--- a/numa.c
+++ b/numa.c
@@ -33,6 +33,7 @@
 #include "qapi/dealloc-visitor.h"
 #include "qapi/qmp/qerror.h"
 #include "hw/boards.h"
+#include "sysemu/hostmem.h"
 
 QemuOptsList qemu_numa_opts = {
     .name = "numa",
@@ -41,6 +42,8 @@ QemuOptsList qemu_numa_opts = {
     .desc = { { 0 } } /* validated with OptsVisitor */
 };
 
+static int have_memdevs = -1;
+
 static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
 {
     uint16_t nodenr;
@@ -67,6 +70,20 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
         bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
     }
 
+    if (node->has_mem && node->has_memdev) {
+        error_setg(errp, "qemu: cannot specify both mem= and memdev=\n");
+        return;
+    }
+
+    if (have_memdevs == -1) {
+        have_memdevs = node->has_memdev;
+    }
+    if (node->has_memdev != have_memdevs) {
+        error_setg(errp, "qemu: memdev option must be specified for either "
+                   "all or no nodes\n");
+        return;
+    }
+
     if (node->has_mem) {
         uint64_t mem_size = node->mem;
         const char *mem_str = qemu_opt_get(opts, "mem");
@@ -76,6 +93,18 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
         }
         numa_info[nodenr].node_mem = mem_size;
     }
+    if (node->has_memdev) {
+        Object *o;
+        o = object_resolve_path_type(node->memdev, TYPE_MEMORY_BACKEND, NULL);
+        if (!o) {
+            error_setg(errp, "memdev=%s is ambiguous", node->memdev);
+            return;
+        }
+
+        object_ref(o);
+        numa_info[nodenr].node_mem = object_property_get_int(o, "size", NULL);
+        numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
+    }
 }
 
 int numa_init_func(QemuOpts *opts, void *opaque)
@@ -195,10 +224,39 @@ void set_numa_modes(void)
     }
 }
 
+static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
+                                           const char *name,
+                                           uint64_t ram_size)
+{
+    memory_region_init_ram(mr, owner, name, ram_size);
+    vmstate_register_ram_global(mr);
+}
+
 void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
                                           const char *name,
                                           uint64_t ram_size)
 {
-    memory_region_init_ram(mr, owner, name, ram_size);
-    vmstate_register_ram_global(mr);
+    uint64_t addr = 0;
+    int i;
+
+    if (nb_numa_nodes == 0 || !have_memdevs) {
+        allocate_system_memory_nonnuma(mr, owner, name, ram_size);
+        return;
+    }
+
+    memory_region_init(mr, owner, name, ram_size);
+    for (i = 0; i < nb_numa_nodes; i++) {
+        Error *local_err = NULL;
+        uint64_t size = numa_info[i].node_mem;
+        HostMemoryBackend *backend = numa_info[i].node_memdev;
+        MemoryRegion *seg = host_memory_backend_get_memory(backend, &local_err);
+        if (local_err) {
+            qerror_report_err(local_err);
+            exit(1);
+        }
+
+        memory_region_add_subregion(mr, addr, seg);
+        vmstate_register_ram_global(seg);
+        addr += size;
+    }
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 8ce01cb..d5ab066 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4744,8 +4744,12 @@
 # @cpus: #optional VCPUs belonging to this node (assign VCPUS round-robin
 #         if omitted)
 #
-# @mem: #optional memory size of this node (equally divide total memory among
-#        nodes if omitted)
+# @mem: #optional memory size of this node; mutually exclusive with @memdev.
+#       Equally divide total memory among nodes if both @mem and @memdev are
+#       omitted.
+#
+# @memdev: #optional memory backend object.  If specified for one node,
+#          it must be specified for all nodes.
 #
 # Since: 2.1
 ##
@@ -4753,4 +4757,5 @@
   'data': {
    '*nodeid': 'uint16',
    '*cpus':   ['uint16'],
-   '*mem':    'size' }}
+   '*mem':    'size',
+   '*memdev': 'str' }}
diff --git a/qemu-options.hx b/qemu-options.hx
index d3cd2ce..e448d33 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -95,16 +95,20 @@ specifies the maximum number of hotpluggable CPUs.
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-    "-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
+    "-numa node[,mem=size][,memdev=id][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
 STEXI
-@item -numa node[,mem=@var{size}][,cpus=@var{cpu[-cpu]}][,nodeid=@var{node}]
+@item -numa node[,mem=@var{size}][,memdev=@var{id}][,cpus=@var{cpu[-cpu]}][,nodeid=@var{node}]
 @findex -numa
-Simulate a multi node NUMA system. If @samp{mem}
+Simulate a multi node NUMA system. If @samp{mem}, @samp{memdev}
 and @samp{cpus} are omitted, resources are split equally. Also, note
 that the -@option{numa} option doesn't allocate any of the specified
 resources. That is, it just assigns existing resources to NUMA nodes. This
 means that one still has to use the @option{-m}, @option{-smp} options
-to allocate RAM and vCPUs respectively.
+to allocate RAM and vCPUs respectively, and possibly @option{-object}
+to specify the memory backend for the @samp{memdev} suboption.
+
+@samp{mem} and @samp{memdev} are mutually exclusive.  Furthermore, if one
+node uses @samp{memdev}, all of them have to use it.
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
-- 
1.9.3


* [Qemu-devel] [PATCH v4 13/29] memory: reorganize file-based allocation
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (11 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 12/29] numa: add -numa node,memdev= option Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 14/29] memory: move mem_path handling to memory_region_allocate_system_memory Hu Tao
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Split the internal interface in exec.c into a separate function, and
push the check on mem_path up to memory_region_init_ram.
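
The shape of the split can be illustrated with a toy model: each public
entry point fills in a descriptor, and a shared helper finishes the job,
allocating memory only when no host pointer was supplied.  This is a
sketch under stated assumptions, not QEMU code: struct block,
block_add(), block_new_from_ptr(), block_new() and the 4096-byte page
size are all hypothetical, and calloc() stands in for phys_mem_alloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, illustration only */

struct block {
    size_t length;
    void *host;
    int prealloc;   /* set when the caller supplied the memory */
    int owns_host;  /* set when block_add() allocated the memory */
};

static size_t page_align(size_t size)
{
    return (size + SKETCH_PAGE_SIZE - 1) & ~((size_t)SKETCH_PAGE_SIZE - 1);
}

/* Shared tail: allocate backing memory only if none was provided. */
static struct block *block_add(struct block *b)
{
    assert(b != NULL);
    if (!b->host) {
        b->host = calloc(1, b->length);
        if (!b->host) {
            free(b);
            return NULL;
        }
        b->owns_host = 1;
    }
    return b;
}

/* Entry point: fill in the descriptor, then hand off to the shared tail. */
static struct block *block_new_from_ptr(size_t size, void *host)
{
    struct block *b = calloc(1, sizeof(*b));
    if (!b) {
        return NULL;
    }
    b->length = page_align(size);
    b->host = host;
    b->prealloc = (host != NULL);
    return block_add(b);
}

static struct block *block_new(size_t size)
{
    return block_new_from_ptr(size, NULL);
}

static void block_free(struct block *b)
{
    if (b) {
        if (b->owns_host) {
            free(b->host);
        }
        free(b);
    }
}
```

A file-backed entry point would follow the same pattern: set host from
the mapping, then call the shared helper.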

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 exec.c                  | 105 +++++++++++++++++++++++++++++-------------------
 include/exec/cpu-all.h  |   3 --
 include/exec/ram_addr.h |   2 +
 include/sysemu/sysemu.h |   2 +
 memory.c                |   7 +++-
 5 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/exec.c b/exec.c
index 4e179a6..525fc04 100644
--- a/exec.c
+++ b/exec.c
@@ -1246,56 +1246,30 @@ static int memory_try_enable_merging(void *addr, size_t len)
     return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
 }
 
-ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
-                                   MemoryRegion *mr)
+static ram_addr_t ram_block_add(RAMBlock *new_block)
 {
-    RAMBlock *block, *new_block;
+    RAMBlock *block;
     ram_addr_t old_ram_size, new_ram_size;
 
     old_ram_size = last_ram_offset() >> TARGET_PAGE_BITS;
 
-    size = TARGET_PAGE_ALIGN(size);
-    new_block = g_malloc0(sizeof(*new_block));
-    new_block->fd = -1;
-
     /* This assumes the iothread lock is taken here too.  */
     qemu_mutex_lock_ramlist();
-    new_block->mr = mr;
-    new_block->offset = find_ram_offset(size);
-    if (host) {
-        new_block->host = host;
-        new_block->flags |= RAM_PREALLOC_MASK;
-    } else if (xen_enabled()) {
-        if (mem_path) {
-            fprintf(stderr, "-mem-path not supported with Xen\n");
-            exit(1);
-        }
-        xen_ram_alloc(new_block->offset, size, mr);
-    } else {
-        if (mem_path) {
-            if (phys_mem_alloc != qemu_anon_ram_alloc) {
-                /*
-                 * file_ram_alloc() needs to allocate just like
-                 * phys_mem_alloc, but we haven't bothered to provide
-                 * a hook there.
-                 */
-                fprintf(stderr,
-                        "-mem-path not supported with this accelerator\n");
-                exit(1);
-            }
-            new_block->host = file_ram_alloc(new_block, size, mem_path);
-        }
-        if (!new_block->host) {
-            new_block->host = phys_mem_alloc(size);
+    new_block->offset = find_ram_offset(new_block->length);
+
+    if (!new_block->host) {
+        if (xen_enabled()) {
+            xen_ram_alloc(new_block->offset, new_block->length, new_block->mr);
+        } else {
+            new_block->host = phys_mem_alloc(new_block->length);
             if (!new_block->host) {
                 fprintf(stderr, "Cannot set up guest memory '%s': %s\n",
                         new_block->mr->name, strerror(errno));
                 exit(1);
             }
-            memory_try_enable_merging(new_block->host, size);
+            memory_try_enable_merging(new_block->host, new_block->length);
         }
     }
-    new_block->length = size;
 
     /* Keep the list sorted from biggest to smallest block.  */
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
@@ -1323,18 +1297,65 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    old_ram_size, new_ram_size);
        }
     }
-    cpu_physical_memory_set_dirty_range(new_block->offset, size);
+    cpu_physical_memory_set_dirty_range(new_block->offset, new_block->length);
 
-    qemu_ram_setup_dump(new_block->host, size);
-    qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);
-    qemu_madvise(new_block->host, size, QEMU_MADV_DONTFORK);
+    qemu_ram_setup_dump(new_block->host, new_block->length);
+    qemu_madvise(new_block->host, new_block->length, QEMU_MADV_HUGEPAGE);
+    qemu_madvise(new_block->host, new_block->length, QEMU_MADV_DONTFORK);
 
-    if (kvm_enabled())
-        kvm_setup_guest_memory(new_block->host, size);
+    if (kvm_enabled()) {
+        kvm_setup_guest_memory(new_block->host, new_block->length);
+    }
 
     return new_block->offset;
 }
 
+ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
+                                    const char *mem_path)
+{
+    RAMBlock *new_block;
+
+    if (xen_enabled()) {
+        fprintf(stderr, "-mem-path not supported with Xen\n");
+        exit(1);
+    }
+
+    if (phys_mem_alloc != qemu_anon_ram_alloc) {
+        /*
+         * file_ram_alloc() needs to allocate just like
+         * phys_mem_alloc, but we haven't bothered to provide
+         * a hook there.
+         */
+        fprintf(stderr,
+                "-mem-path not supported with this accelerator\n");
+        exit(1);
+    }
+
+    size = TARGET_PAGE_ALIGN(size);
+    new_block = g_malloc0(sizeof(*new_block));
+    new_block->mr = mr;
+    new_block->length = size;
+    new_block->host = file_ram_alloc(new_block, size, mem_path);
+    return ram_block_add(new_block);
+}
+
+ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
+                                   MemoryRegion *mr)
+{
+    RAMBlock *new_block;
+
+    size = TARGET_PAGE_ALIGN(size);
+    new_block = g_malloc0(sizeof(*new_block));
+    new_block->mr = mr;
+    new_block->length = size;
+    new_block->fd = -1;
+    new_block->host = host;
+    if (host) {
+        new_block->flags |= RAM_PREALLOC_MASK;
+    }
+    return ram_block_add(new_block);
+}
+
 ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr)
 {
     return qemu_ram_alloc_from_ptr(size, NULL, mr);
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ed28f1e..eaddea6 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -325,9 +325,6 @@ typedef struct RAMList {
 } RAMList;
 extern RAMList ram_list;
 
-extern const char *mem_path;
-extern int mem_prealloc;
-
 /* Flags stored in the low bits of the TLB virtual address.  These are
    defined so that fast path ram access is all zeros.  */
 /* Zero if TLB entry is valid.  */
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 2edfa96..dedb258 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -22,6 +22,8 @@
 #ifndef CONFIG_USER_ONLY
 #include "hw/xen/xen.h"
 
+ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
+                                    const char *mem_path);
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr);
 ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 1e141e3..277230d 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -133,6 +133,8 @@ extern uint8_t *boot_splash_filedata;
 extern size_t boot_splash_filedata_size;
 extern uint8_t qemu_extra_params_fw[2];
 extern QEMUClockType rtc_clock;
+extern const char *mem_path;
+extern int mem_prealloc;
 
 #define MAX_NODES 128
 
diff --git a/memory.c b/memory.c
index 93afea7..43b90eb 100644
--- a/memory.c
+++ b/memory.c
@@ -23,6 +23,7 @@
 
 #include "exec/memory-internal.h"
 #include "exec/ram_addr.h"
+#include "sysemu/sysemu.h"
 
 //#define DEBUG_UNASSIGNED
 
@@ -1016,7 +1017,11 @@ void memory_region_init_ram(MemoryRegion *mr,
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_addr = qemu_ram_alloc(size, mr);
+    if (mem_path) {
+        mr->ram_addr = qemu_ram_alloc_from_file(size, mr, mem_path);
+    } else {
+        mr->ram_addr = qemu_ram_alloc(size, mr);
+    }
 }
 
 void memory_region_init_ram_ptr(MemoryRegion *mr,
-- 
1.9.3


* [Qemu-devel] [PATCH v4 14/29] memory: move mem_path handling to memory_region_allocate_system_memory
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (12 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 13/29] memory: reorganize file-based allocation Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 15/29] memory: add error propagation to file-based RAM allocation Hu Tao
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Like the previous patch did in exec.c, split memory_region_init_ram and
memory_region_init_ram_from_file, and push mem_path one step further up.
RAM regions other than system memory will now be backed by regular RAM.

Also, boards that do not use memory_region_allocate_system_memory will
not support -mem-path anymore.  This can be changed before the patches
are merged by migrating boards to use the function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 exec.c                | 10 ++--------
 include/exec/memory.h | 18 ++++++++++++++++++
 memory.c              | 21 ++++++++++++++++-----
 numa.c                | 11 ++++++++++-
 4 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/exec.c b/exec.c
index 525fc04..cbbb2a7 100644
--- a/exec.c
+++ b/exec.c
@@ -1129,14 +1129,6 @@ error:
     }
     return NULL;
 }
-#else
-static void *file_ram_alloc(RAMBlock *block,
-                            ram_addr_t memory,
-                            const char *path)
-{
-    fprintf(stderr, "-mem-path not supported on this host\n");
-    exit(1);
-}
 #endif
 
 static ram_addr_t find_ram_offset(ram_addr_t size)
@@ -1310,6 +1302,7 @@ static ram_addr_t ram_block_add(RAMBlock *new_block)
     return new_block->offset;
 }
 
+#ifdef __linux__
 ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
                                     const char *mem_path)
 {
@@ -1338,6 +1331,7 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
     new_block->host = file_ram_alloc(new_block, size, mem_path);
     return ram_block_add(new_block);
 }
+#endif
 
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index ab11c32..58c3fe4 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -311,6 +311,24 @@ void memory_region_init_ram(MemoryRegion *mr,
                             const char *name,
                             uint64_t size);
 
+#ifdef __linux__
+/**
+ * memory_region_init_ram_from_file:  Initialize RAM memory region with a
+ *                                    mmap-ed backend.
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @name: the name of the region.
+ * @size: size of the region.
+ * @path: the path in which to allocate the RAM.
+ */
+void memory_region_init_ram_from_file(MemoryRegion *mr,
+                                      struct Object *owner,
+                                      const char *name,
+                                      uint64_t size,
+                                      const char *path);
+#endif
+
 /**
  * memory_region_init_ram_ptr:  Initialize RAM memory region from a
  *                              user-provided pointer.  Accesses into the
diff --git a/memory.c b/memory.c
index 43b90eb..6192377 100644
--- a/memory.c
+++ b/memory.c
@@ -1017,13 +1017,24 @@ void memory_region_init_ram(MemoryRegion *mr,
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    if (mem_path) {
-        mr->ram_addr = qemu_ram_alloc_from_file(size, mr, mem_path);
-    } else {
-        mr->ram_addr = qemu_ram_alloc(size, mr);
-    }
+    mr->ram_addr = qemu_ram_alloc(size, mr);
 }
 
+#ifdef __linux__
+void memory_region_init_ram_from_file(MemoryRegion *mr,
+                                      struct Object *owner,
+                                      const char *name,
+                                      uint64_t size,
+                                      const char *path)
+{
+    memory_region_init(mr, owner, name, size);
+    mr->ram = true;
+    mr->terminates = true;
+    mr->destructor = memory_region_destructor_ram;
+    mr->ram_addr = qemu_ram_alloc_from_file(size, mr, path);
+}
+#endif
+
 void memory_region_init_ram_ptr(MemoryRegion *mr,
                                 Object *owner,
                                 const char *name,
diff --git a/numa.c b/numa.c
index ce43e69..7846ba8 100644
--- a/numa.c
+++ b/numa.c
@@ -228,7 +228,16 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
                                            const char *name,
                                            uint64_t ram_size)
 {
-    memory_region_init_ram(mr, owner, name, ram_size);
+    if (mem_path) {
+#ifdef __linux__
+        memory_region_init_ram_from_file(mr, owner, name, ram_size, mem_path);
+#else
+        fprintf(stderr, "-mem-path not supported on this host\n");
+        exit(1);
+#endif
+    } else {
+        memory_region_init_ram(mr, owner, name, ram_size);
+    }
     vmstate_register_ram_global(mr);
 }
 
-- 
1.9.3


* [Qemu-devel] [PATCH v4 15/29] memory: add error propagation to file-based RAM allocation
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (13 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 14/29] memory: move mem_path handling to memory_region_allocate_system_memory Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 16/29] memory: move preallocation code out of exec.c Hu Tao
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Right now, -mem-path will fall back to RAM-based allocation in some
cases.  This should never happen with "-object memory-file", so prepare
the code by adding correct error propagation.
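
The control flow this enables can be sketched as follows: the
file-based allocator reports failure to its caller instead of exiting,
and only the legacy -mem-path caller chooses to fall back.  This is a
simplified model, not the patch's code: struct err is a hypothetical
stand-in for QEMU's Error, the function names are invented, and an
empty path is used to simulate an allocation failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's Error object. */
struct err {
    char msg[128];
    bool set;
};

enum backing { BACKING_FILE, BACKING_ANON };

/* Reports failure through err and returns false; never exits. */
static bool alloc_from_file(const char *path, struct err *err)
{
    assert(path != NULL && err != NULL);
    if (path[0] == '\0') {      /* simulated failure, illustration only */
        snprintf(err->msg, sizeof(err->msg),
                 "unable to create backing store in '%s'", path);
        err->set = true;
        return false;
    }
    return true;
}

/* Legacy caller: report the error, then fall back to anonymous RAM. */
static enum backing alloc_system_memory(const char *mem_path, struct err *err)
{
    if (mem_path) {
        if (alloc_from_file(mem_path, err)) {
            return BACKING_FILE;
        }
        fprintf(stderr, "%s; falling back to anonymous RAM\n", err->msg);
    }
    return BACKING_ANON;
}
```

A stricter caller, such as a memory backend object, could instead treat
a false return as fatal and pass the error up unchanged.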

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 exec.c                  | 36 ++++++++++++++++++++++++------------
 include/exec/memory.h   |  5 ++++-
 include/exec/ram_addr.h |  2 +-
 memory.c                |  5 +++--
 numa.c                  | 13 ++++++++++++-
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/exec.c b/exec.c
index cbbb2a7..36301e2 100644
--- a/exec.c
+++ b/exec.c
@@ -1020,7 +1020,8 @@ static void sigbus_handler(int signal)
 
 static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
-                            const char *path)
+                            const char *path,
+                            Error **errp)
 {
     char *filename;
     char *sanitized_name;
@@ -1039,7 +1040,8 @@ static void *file_ram_alloc(RAMBlock *block,
     }
 
     if (kvm_enabled() && !kvm_has_sync_mmu()) {
-        fprintf(stderr, "host lacks kvm mmu notifiers, -mem-path unsupported\n");
+        error_setg(errp,
+                   "host lacks kvm mmu notifiers, -mem-path unsupported\n");
         goto error;
     }
 
@@ -1056,7 +1058,8 @@ static void *file_ram_alloc(RAMBlock *block,
 
     fd = mkstemp(filename);
     if (fd < 0) {
-        perror("unable to create backing store for hugepages");
+        error_setg_errno(errp, errno,
+                         "unable to create backing store for hugepages");
         g_free(filename);
         goto error;
     }
@@ -1071,12 +1074,14 @@ static void *file_ram_alloc(RAMBlock *block,
      * If anything goes wrong with it under other filesystems,
      * mmap will fail.
      */
-    if (ftruncate(fd, memory))
+    if (ftruncate(fd, memory)) {
         perror("ftruncate");
+    }
 
     area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
     if (area == MAP_FAILED) {
-        perror("file_ram_alloc: can't mmap RAM pages");
+        error_setg_errno(errp, errno,
+                         "unable to map backing store for hugepages");
         close(fd);
         goto error;
     }
@@ -1304,13 +1309,14 @@ static ram_addr_t ram_block_add(RAMBlock *new_block)
 
 #ifdef __linux__
 ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
-                                    const char *mem_path)
+                                    const char *mem_path,
+                                    Error **errp)
 {
     RAMBlock *new_block;
 
     if (xen_enabled()) {
-        fprintf(stderr, "-mem-path not supported with Xen\n");
-        exit(1);
+        error_setg(errp, "-mem-path not supported with Xen\n");
+        return -1;
     }
 
     if (phys_mem_alloc != qemu_anon_ram_alloc) {
@@ -1319,16 +1325,22 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
          * phys_mem_alloc, but we haven't bothered to provide
          * a hook there.
          */
-        fprintf(stderr,
-                "-mem-path not supported with this accelerator\n");
-        exit(1);
+        error_setg(errp,
+                   "-mem-path not supported with this accelerator\n");
+        return -1;
     }
 
     size = TARGET_PAGE_ALIGN(size);
     new_block = g_malloc0(sizeof(*new_block));
     new_block->mr = mr;
     new_block->length = size;
-    new_block->host = file_ram_alloc(new_block, size, mem_path);
+    new_block->host = file_ram_alloc(new_block, size,
+                                     mem_path, errp);
+    if (!new_block->host) {
+        g_free(new_block);
+        return -1;
+    }
+
     return ram_block_add(new_block);
 }
 #endif
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 58c3fe4..82d7781 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -31,6 +31,7 @@
 #include "qemu/queue.h"
 #include "qemu/int128.h"
 #include "qemu/notify.h"
+#include "qapi/error.h"
 
 #define MAX_PHYS_ADDR_SPACE_BITS 62
 #define MAX_PHYS_ADDR            (((hwaddr)1 << MAX_PHYS_ADDR_SPACE_BITS) - 1)
@@ -321,12 +322,14 @@ void memory_region_init_ram(MemoryRegion *mr,
  * @name: the name of the region.
  * @size: size of the region.
  * @path: the path in which to allocate the RAM.
+ * @errp: pointer to Error*, to store an error if it happens.
  */
 void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       struct Object *owner,
                                       const char *name,
                                       uint64_t size,
-                                      const char *path);
+                                      const char *path,
+                                      Error **errp);
 #endif
 
 /**
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index dedb258..f9518a6 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -23,7 +23,7 @@
 #include "hw/xen/xen.h"
 
 ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
-                                    const char *mem_path);
+                                    const char *mem_path, Error **errp);
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr);
 ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr);
diff --git a/memory.c b/memory.c
index 6192377..310729a 100644
--- a/memory.c
+++ b/memory.c
@@ -1025,13 +1025,14 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       struct Object *owner,
                                       const char *name,
                                       uint64_t size,
-                                      const char *path)
+                                      const char *path,
+                                      Error **errp)
 {
     memory_region_init(mr, owner, name, size);
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_addr = qemu_ram_alloc_from_file(size, mr, path);
+    mr->ram_addr = qemu_ram_alloc_from_file(size, mr, path, errp);
 }
 #endif
 
diff --git a/numa.c b/numa.c
index 7846ba8..039d401 100644
--- a/numa.c
+++ b/numa.c
@@ -230,7 +230,18 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
 {
     if (mem_path) {
 #ifdef __linux__
-        memory_region_init_ram_from_file(mr, owner, name, ram_size, mem_path);
+        Error *err = NULL;
+        memory_region_init_ram_from_file(mr, owner, name, ram_size,
+                                         mem_path, &err);
+
+        /* Legacy behavior: if allocation failed, fall back to
+         * regular RAM allocation.
+         */
+        if (!memory_region_size(mr)) {
+            qerror_report_err(err);
+            error_free(err);
+            memory_region_init_ram(mr, owner, name, ram_size);
+        }
 #else
         fprintf(stderr, "-mem-path not supported on this host\n");
         exit(1);
-- 
1.9.3


* [Qemu-devel] [PATCH v4 16/29] memory: move preallocation code out of exec.c
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (14 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 15/29] memory: add error propagation to file-based RAM allocation Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-18 19:14   ` Michael S. Tsirkin
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 17/29] memory: move RAM_PREALLOC_MASK to exec.c, rename Hu Tao
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Move the preallocation code from exec.c to util/oslib-posix.c as
os_mem_prealloc(), so that backends can use it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 exec.c               | 44 +------------------------------
 include/qemu/osdep.h |  2 ++
 util/oslib-posix.c   | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 76 insertions(+), 43 deletions(-)
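
Illustration only, not part of this patch: a minimal sketch of the
round-up and page-touch loop that os_mem_prealloc() performs, assuming
the page size is a power of two.  The names round_up_to_page() and
touch_pages() are hypothetical, and the SIGBUS handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round sz up to a multiple of pagesize (hypothetical helper name).
 * Only valid when pagesize is a power of two: -pagesize is then a mask
 * with the low log2(pagesize) bits cleared. */
static size_t round_up_to_page(size_t sz, size_t pagesize)
{
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
    return (sz + pagesize - 1) & -pagesize;
}

/* Write one zero byte at the start of every page that overlaps
 * [area, area + sz) and return the number of pages touched.  On a
 * fresh mapping the write forces the kernel to back the page now
 * instead of at first guest access. */
static size_t touch_pages(char *area, size_t sz, size_t pagesize)
{
    size_t npages = round_up_to_page(sz, pagesize) / pagesize;
    size_t i;

    for (i = 0; i < npages; i++) {
        memset(area + pagesize * i, 0, 1);
    }
    return npages;
}
```

With a 4096-byte page, touch_pages(area, 2 * 4096 + 1, 4096) touches
three pages, so the mapping has to cover the rounded-up length.  The
sketch counts with a size_t index, whereas the loop in the patch uses
an int.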

diff --git a/exec.c b/exec.c
index 36301e2..b640425 100644
--- a/exec.c
+++ b/exec.c
@@ -1011,13 +1011,6 @@ static long gethugepagesize(const char *path)
     return fs.f_bsize;
 }
 
-static sigjmp_buf sigjump;
-
-static void sigbus_handler(int signal)
-{
-    siglongjmp(sigjump, 1);
-}
-
 static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             const char *path,
@@ -1087,42 +1080,7 @@ static void *file_ram_alloc(RAMBlock *block,
     }
 
     if (mem_prealloc) {
-        int ret, i;
-        struct sigaction act, oldact;
-        sigset_t set, oldset;
-
-        memset(&act, 0, sizeof(act));
-        act.sa_handler = &sigbus_handler;
-        act.sa_flags = 0;
-
-        ret = sigaction(SIGBUS, &act, &oldact);
-        if (ret) {
-            perror("file_ram_alloc: failed to install signal handler");
-            exit(1);
-        }
-
-        /* unblock SIGBUS */
-        sigemptyset(&set);
-        sigaddset(&set, SIGBUS);
-        pthread_sigmask(SIG_UNBLOCK, &set, &oldset);
-
-        if (sigsetjmp(sigjump, 1)) {
-            fprintf(stderr, "file_ram_alloc: failed to preallocate pages\n");
-            exit(1);
-        }
-
-        /* MAP_POPULATE silently ignores failures */
-        for (i = 0; i < (memory/hpagesize); i++) {
-            memset(area + (hpagesize*i), 0, 1);
-        }
-
-        ret = sigaction(SIGBUS, &oldact, NULL);
-        if (ret) {
-            perror("file_ram_alloc: failed to reinstall signal handler");
-            exit(1);
-        }
-
-        pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+        os_mem_prealloc(fd, area, memory);
     }
 
     block->fd = fd;
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index ffb2966..9c1a119 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -251,4 +251,6 @@ void qemu_init_auxval(char **envp);
 
 void qemu_set_tty_echo(int fd, bool echo);
 
+void os_mem_prealloc(int fd, char *area, size_t sz);
+
 #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 8e9c770..1524ead 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -46,6 +46,7 @@ extern int daemon(int, int);
 #else
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
+#define HUGETLBFS_MAGIC       0x958458f6
 
 #include <termios.h>
 #include <unistd.h>
@@ -58,9 +59,12 @@ extern int daemon(int, int);
 #include "qemu/sockets.h"
 #include <sys/mman.h>
 #include <libgen.h>
+#include <setjmp.h>
+#include <sys/signal.h>
 
 #ifdef CONFIG_LINUX
 #include <sys/syscall.h>
+#include <sys/vfs.h>
 #endif
 
 #ifdef __FreeBSD__
@@ -332,3 +336,72 @@ char *qemu_get_exec_dir(void)
 {
     return g_strdup(exec_dir);
 }
+
+static sigjmp_buf sigjump;
+
+static void sigbus_handler(int signal)
+{
+    siglongjmp(sigjump, 1);
+}
+
+static size_t fd_getpagesize(int fd)
+{
+#ifdef CONFIG_LINUX
+    struct statfs fs;
+    int ret;
+
+    if (fd != -1) {
+        do {
+            ret = fstatfs(fd, &fs);
+        } while (ret != 0 && errno == EINTR);
+
+        if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
+            return fs.f_bsize;
+        }
+    }
+#endif
+
+    return getpagesize();
+}
+
+void os_mem_prealloc(int fd, char *area, size_t memory)
+{
+    int ret, i;
+    struct sigaction act, oldact;
+    sigset_t set, oldset;
+    size_t hpagesize = fd_getpagesize(fd);
+
+    memset(&act, 0, sizeof(act));
+    act.sa_handler = &sigbus_handler;
+    act.sa_flags = 0;
+
+    ret = sigaction(SIGBUS, &act, &oldact);
+    if (ret) {
+        perror("os_mem_prealloc: failed to install signal handler");
+        exit(1);
+    }
+
+    /* unblock SIGBUS */
+    sigemptyset(&set);
+    sigaddset(&set, SIGBUS);
+    pthread_sigmask(SIG_UNBLOCK, &set, &oldset);
+
+    if (sigsetjmp(sigjump, 1)) {
+        fprintf(stderr, "os_mem_prealloc: failed to preallocate pages\n");
+        exit(1);
+    }
+
+    /* MAP_POPULATE silently ignores failures */
+    memory = (memory + hpagesize - 1) & -hpagesize;
+    for (i = 0; i < (memory/hpagesize); i++) {
+        memset(area + (hpagesize*i), 0, 1);
+    }
+
+    ret = sigaction(SIGBUS, &oldact, NULL);
+    if (ret) {
+        perror("os_mem_prealloc: failed to reinstall signal handler");
+        exit(1);
+    }
+
+    pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+}
-- 
1.9.3


* [Qemu-devel] [PATCH v4 17/29] memory: move RAM_PREALLOC_MASK to exec.c, rename
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (15 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 16/29] memory: move preallocation code out of exec.c Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend Hu Tao
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Prepare for adding more flags.  No other flag name uses a "_MASK"
suffix, so drop it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 exec.c                 | 9 ++++++---
 include/exec/cpu-all.h | 3 ---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index b640425..739f0cf 100644
--- a/exec.c
+++ b/exec.c
@@ -70,6 +70,9 @@ AddressSpace address_space_memory;
 MemoryRegion io_mem_rom, io_mem_notdirty;
 static MemoryRegion io_mem_unassigned;
 
+/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
+#define RAM_PREALLOC   (1 << 0)
+
 #endif
 
 struct CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
@@ -1315,7 +1318,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     new_block->fd = -1;
     new_block->host = host;
     if (host) {
-        new_block->flags |= RAM_PREALLOC_MASK;
+        new_block->flags |= RAM_PREALLOC;
     }
     return ram_block_add(new_block);
 }
@@ -1354,7 +1357,7 @@ void qemu_ram_free(ram_addr_t addr)
             QTAILQ_REMOVE(&ram_list.blocks, block, next);
             ram_list.mru_block = NULL;
             ram_list.version++;
-            if (block->flags & RAM_PREALLOC_MASK) {
+            if (block->flags & RAM_PREALLOC) {
                 ;
             } else if (xen_enabled()) {
                 xen_invalidate_map_cache_entry(block->host);
@@ -1386,7 +1389,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
         offset = addr - block->offset;
         if (offset < block->length) {
             vaddr = block->host + offset;
-            if (block->flags & RAM_PREALLOC_MASK) {
+            if (block->flags & RAM_PREALLOC) {
                 ;
             } else if (xen_enabled()) {
                 abort();
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index eaddea6..f91581f 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -297,9 +297,6 @@ CPUArchState *cpu_copy(CPUArchState *env);
 
 /* memory API */
 
-/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
-#define RAM_PREALLOC_MASK   (1 << 0)
-
 typedef struct RAMBlock {
     struct MemoryRegion *mr;
     uint8_t *host;
-- 
1.9.3


* [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (16 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 17/29] memory: move RAM_PREALLOC_MASK to exec.c, rename Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 11:32   ` Igor Mammedov
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 19/29] hostmem: add merge and dump properties Hu Tao
                   ` (11 subsequent siblings)
  29 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 backends/Makefile.objs  |   1 +
 backends/hostmem-file.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)
 create mode 100644 backends/hostmem-file.c

diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 7fb7acd..506a46c 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -8,3 +8,4 @@ baum.o-cflags := $(SDL_CFLAGS)
 common-obj-$(CONFIG_TPM) += tpm.o
 
 common-obj-y += hostmem.o hostmem-ram.o
+common-obj-$(CONFIG_LINUX) += hostmem-file.o
diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
new file mode 100644
index 0000000..b8df933
--- /dev/null
+++ b/backends/hostmem-file.c
@@ -0,0 +1,107 @@
+/*
+ * QEMU Host Memory Backend for hugetlbfs
+ *
+ * Copyright (C) 2013 Red Hat Inc
+ *
+ * Authors:
+ *   Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "sysemu/hostmem.h"
+#include "qom/object_interfaces.h"
+
+/* hostmem-file.c */
+/**
+ * @TYPE_MEMORY_BACKEND_FILE:
+ * name of backend that uses mmap on a file descriptor
+ */
+#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
+
+#define MEMORY_BACKEND_FILE(obj) \
+    OBJECT_CHECK(HostMemoryBackendFile, (obj), TYPE_MEMORY_BACKEND_FILE)
+
+typedef struct HostMemoryBackendFile HostMemoryBackendFile;
+
+struct HostMemoryBackendFile {
+    HostMemoryBackend parent_obj;
+    char *mem_path;
+};
+
+static void
+file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
+
+    if (!backend->size) {
+        error_setg(errp, "can't create backend with size 0");
+        return;
+    }
+    if (!fb->mem_path) {
+        error_setg(errp, "mem-path property not set");
+        return;
+    }
+#ifndef CONFIG_LINUX
+    error_setg(errp, "-mem-path not supported on this host");
+#else
+    if (!memory_region_size(&backend->mr)) {
+        memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
+                                 object_get_canonical_path(OBJECT(backend)),
+                                 backend->size,
+                                 fb->mem_path, errp);
+    }
+#endif
+}
+
+static void
+file_backend_class_init(ObjectClass *oc, void *data)
+{
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+    bc->alloc = file_backend_memory_alloc;
+}
+
+static char *get_mem_path(Object *o, Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+
+    return g_strdup(fb->mem_path);
+}
+
+static void set_mem_path(Object *o, const char *str, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+
+    if (memory_region_size(&backend->mr)) {
+        error_setg(errp, "cannot change property value");
+        return;
+    }
+    if (fb->mem_path) {
+        g_free(fb->mem_path);
+    }
+    fb->mem_path = g_strdup(str);
+}
+
+static void
+file_backend_instance_init(Object *o)
+{
+    object_property_add_str(o, "mem-path", get_mem_path,
+                            set_mem_path, NULL);
+}
+
+static const TypeInfo file_backend_info = {
+    .name = TYPE_MEMORY_BACKEND_FILE,
+    .parent = TYPE_MEMORY_BACKEND,
+    .class_init = file_backend_class_init,
+    .instance_init = file_backend_instance_init,
+    .instance_size = sizeof(HostMemoryBackendFile),
+};
+
+static void register_types(void)
+{
+    type_register_static(&file_backend_info);
+}
+
+type_init(register_types);
-- 
1.9.3


* [Qemu-devel] [PATCH v4 19/29] hostmem: add merge and dump properties
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (17 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 20/29] hostmem: allow preallocation of any memory region Hu Tao
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 backends/hostmem.c       | 84 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/qemu/osdep.h     | 10 ++++++
 include/sysemu/hostmem.h |  1 +
 3 files changed, 94 insertions(+), 1 deletion(-)
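
Illustration only, not part of this patch: a minimal sketch of how a
boolean "dump" property can be mapped onto madvise(), assuming a host
that may or may not define MADV_DODUMP and MADV_DONTDUMP.  The
SKETCH_* macros, sketch_madvise() and set_dump() are hypothetical
names that stand in for the QEMU_MADV_* macros and qemu_madvise().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Marker for advice values the host headers do not provide. */
#define SKETCH_MADV_INVALID (-1)

#ifdef MADV_DONTDUMP
#define SKETCH_MADV_DONTDUMP MADV_DONTDUMP
#else
#define SKETCH_MADV_DONTDUMP SKETCH_MADV_INVALID
#endif
#ifdef MADV_DODUMP
#define SKETCH_MADV_DODUMP MADV_DODUMP
#else
#define SKETCH_MADV_DODUMP SKETCH_MADV_INVALID
#endif

/* Fail with EINVAL for unsupported advice instead of passing a bogus
 * value to the kernel; otherwise forward to madvise(). */
static int sketch_madvise(void *addr, size_t len, int advice)
{
    if (advice == SKETCH_MADV_INVALID) {
        errno = EINVAL;
        return -1;
    }
    return madvise(addr, len, advice);
}

/* Apply a "dump" boolean to an already mapped region. */
static int set_dump(void *addr, size_t len, int dump)
{
    return sketch_madvise(addr, len,
                          dump ? SKETCH_MADV_DODUMP : SKETCH_MADV_DONTDUMP);
}
```

A "merge" property would follow the same shape with MADV_MERGEABLE and
MADV_UNMERGEABLE.  Callers that care about the outcome should check
the return value, since the advice can be rejected by the host.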

diff --git a/backends/hostmem.c b/backends/hostmem.c
index cc57c13..fa306b4 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -53,8 +53,73 @@ out:
     error_propagate(errp, local_err);
 }
 
+static bool host_memory_backend_get_merge(Object *obj, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+    return backend->merge;
+}
+
+static void host_memory_backend_set_merge(Object *obj, bool value, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+    if (!memory_region_size(&backend->mr)) {
+        backend->merge = value;
+        return;
+    }
+
+    if (value != backend->merge) {
+        void *ptr = memory_region_get_ram_ptr(&backend->mr);
+        uint64_t sz = memory_region_size(&backend->mr);
+
+        qemu_madvise(ptr, sz,
+                     value ? QEMU_MADV_MERGEABLE : QEMU_MADV_UNMERGEABLE);
+        backend->merge = value;
+    }
+}
+
+static bool host_memory_backend_get_dump(Object *obj, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+    return backend->dump;
+}
+
+static void host_memory_backend_set_dump(Object *obj, bool value, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+    if (!memory_region_size(&backend->mr)) {
+        backend->dump = value;
+        return;
+    }
+
+    if (value != backend->dump) {
+        void *ptr = memory_region_get_ram_ptr(&backend->mr);
+        uint64_t sz = memory_region_size(&backend->mr);
+
+        qemu_madvise(ptr, sz,
+                     value ? QEMU_MADV_DODUMP : QEMU_MADV_DONTDUMP);
+        backend->dump = value;
+    }
+}
+
 static void host_memory_backend_init(Object *obj)
 {
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+    backend->merge = qemu_opt_get_bool(qemu_get_machine_opts(),
+                                       "mem-merge", true);
+    backend->dump = qemu_opt_get_bool(qemu_get_machine_opts(),
+                                      "dump-guest-core", true);
+
+    object_property_add_bool(obj, "merge",
+                        host_memory_backend_get_merge,
+                        host_memory_backend_set_merge, NULL);
+    object_property_add_bool(obj, "dump",
+                        host_memory_backend_get_dump,
+                        host_memory_backend_set_dump, NULL);
     object_property_add(obj, "size", "int",
                         host_memory_backend_get_size,
                         host_memory_backend_set_size, NULL, NULL, NULL);
@@ -80,9 +145,26 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(uc);
     HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
+    Error *local_err = NULL;
+    void *ptr;
+    uint64_t sz;
 
     if (bc->alloc) {
-        bc->alloc(backend, errp);
+        bc->alloc(backend, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+
+        ptr = memory_region_get_ram_ptr(&backend->mr);
+        sz = memory_region_size(&backend->mr);
+
+        if (backend->merge) {
+            qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
+        }
+        if (!backend->dump) {
+            qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
+        }
     }
 }
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 9c1a119..820c5d0 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -116,6 +116,16 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #else
 #define QEMU_MADV_MERGEABLE QEMU_MADV_INVALID
 #endif
+#ifdef MADV_UNMERGEABLE
+#define QEMU_MADV_UNMERGEABLE MADV_UNMERGEABLE
+#else
+#define QEMU_MADV_UNMERGEABLE QEMU_MADV_INVALID
+#endif
+#ifdef MADV_DODUMP
+#define QEMU_MADV_DODUMP MADV_DODUMP
+#else
+#define QEMU_MADV_DODUMP QEMU_MADV_INVALID
+#endif
 #ifdef MADV_DONTDUMP
 #define QEMU_MADV_DONTDUMP MADV_DONTDUMP
 #else
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 923f672..ede5ec9 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -52,6 +52,7 @@ struct HostMemoryBackend {
 
     /* protected */
     uint64_t size;
+    bool merge, dump;
 
     MemoryRegion mr;
 };
-- 
1.9.3


* [Qemu-devel] [PATCH v4 20/29] hostmem: allow preallocation of any memory region
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (18 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 19/29] hostmem: add merge and dump properties Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 12:28   ` Igor Mammedov
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 21/29] hostmem: add property to map memory with MAP_SHARED Hu Tao
                   ` (9 subsequent siblings)
  29 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

And allow preallocation of file-based memory even without -mem-prealloc.
Some care is necessary because -mem-prealloc does not allow disabling
preallocation for hostmem-file.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 backends/hostmem-file.c  |  3 +++
 backends/hostmem.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
 exec.c                   |  7 +++++++
 include/exec/memory.h    | 10 ++++++++++
 include/exec/ram_addr.h  |  1 +
 include/sysemu/hostmem.h |  1 +
 memory.c                 | 11 +++++++++++
 7 files changed, 75 insertions(+)
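
Illustration only, not part of this patch: a minimal sketch of the
interaction between the "prealloc" property and force_prealloc as the
getter and setter below implement it.  struct backend_sketch and the
touch_calls counter are hypothetical; the counter stands in for the
call to os_mem_prealloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct backend_sketch {
    bool prealloc;        /* value of the per-object property */
    bool force_prealloc;  /* set when -mem-prealloc already applies */
    size_t mapped_size;   /* 0 until the region has been allocated */
    unsigned touch_calls; /* counts simulated page-touch passes */
};

/* The property reads as true if either source asked for it. */
static bool get_prealloc(const struct backend_sketch *b)
{
    return b->prealloc || b->force_prealloc;
}

/* Returns false where the setter in the patch would set an error. */
static bool set_prealloc(struct backend_sketch *b, bool value)
{
    if (b->force_prealloc && value) {
        return false;
    }
    if (b->mapped_size == 0) {
        /* Not allocated yet: only record the request. */
        b->prealloc = value;
        return true;
    }
    if (value && !b->prealloc) {
        /* Already allocated: touch the pages once, right now. */
        b->touch_calls++;
        b->prealloc = true;
    }
    return true;
}
```

In this model, turning the property off after the region has been
allocated is accepted but has no effect, and the getter keeps
reporting true while force_prealloc is set.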

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index b8df933..d3a7ef3 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -9,7 +9,9 @@
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
  */
+#include "qemu-common.h"
 #include "sysemu/hostmem.h"
+#include "sysemu/sysemu.h"
 #include "qom/object_interfaces.h"
 
 /* hostmem-file.c */
@@ -46,6 +48,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     error_setg(errp, "-mem-path not supported on this host");
 #else
     if (!memory_region_size(&backend->mr)) {
+        backend->force_prealloc = mem_prealloc;
         memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
                                  object_get_canonical_path(OBJECT(backend)),
                                  backend->size,
diff --git a/backends/hostmem.c b/backends/hostmem.c
index fa306b4..e437275 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -105,6 +105,41 @@ static void host_memory_backend_set_dump(Object *obj, bool value, Error **errp)
     }
 }
 
+static bool host_memory_backend_get_prealloc(Object *obj, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+    return backend->prealloc || backend->force_prealloc;
+}
+
+static void host_memory_backend_set_prealloc(Object *obj, bool value,
+                                             Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+
+    if (backend->force_prealloc) {
+        if (value) {
+            error_setg(errp,
+                       "remove -mem-prealloc to use the prealloc property");
+            return;
+        }
+    }
+
+    if (!memory_region_size(&backend->mr)) {
+        backend->prealloc = value;
+        return;
+    }
+
+    if (value && !backend->prealloc) {
+        int fd = memory_region_get_fd(&backend->mr);
+        void *ptr = memory_region_get_ram_ptr(&backend->mr);
+        uint64_t sz = memory_region_size(&backend->mr);
+
+        os_mem_prealloc(fd, ptr, sz);
+        backend->prealloc = true;
+    }
+}
+
 static void host_memory_backend_init(Object *obj)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -113,6 +148,7 @@ static void host_memory_backend_init(Object *obj)
                                        "mem-merge", true);
     backend->dump = qemu_opt_get_bool(qemu_get_machine_opts(),
                                       "dump-guest-core", true);
+    backend->prealloc = mem_prealloc;
 
     object_property_add_bool(obj, "merge",
                         host_memory_backend_get_merge,
@@ -120,6 +156,9 @@ static void host_memory_backend_init(Object *obj)
     object_property_add_bool(obj, "dump",
                         host_memory_backend_get_dump,
                         host_memory_backend_set_dump, NULL);
+    object_property_add_bool(obj, "prealloc",
+                        host_memory_backend_get_prealloc,
+                        host_memory_backend_set_prealloc, NULL);
     object_property_add(obj, "size", "int",
                         host_memory_backend_get_size,
                         host_memory_backend_set_size, NULL, NULL, NULL);
@@ -165,6 +204,9 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         if (!backend->dump) {
             qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
         }
+        if (backend->prealloc) {
+            os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz);
+        }
     }
 }
 
diff --git a/exec.c b/exec.c
index 739f0cf..520d673 100644
--- a/exec.c
+++ b/exec.c
@@ -1432,6 +1432,13 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 }
 #endif /* !_WIN32 */
 
+int qemu_get_ram_fd(ram_addr_t addr)
+{
+    RAMBlock *block = qemu_get_ram_block(addr);
+
+    return block->fd;
+}
+
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
    With the exception of the softmmu code in this file, this should
    only be used for local memory (e.g. video ram) that the device owns,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 82d7781..36226f7 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -534,6 +534,16 @@ bool memory_region_is_logging(MemoryRegion *mr);
 bool memory_region_is_rom(MemoryRegion *mr);
 
 /**
+ * memory_region_get_fd: Get a file descriptor backing a RAM memory region.
+ *
+ * Returns a file descriptor backing a file-based RAM memory region,
+ * or -1 if the region is not a file-based RAM memory region.
+ *
+ * @mr: the RAM or alias memory region being queried.
+ */
+int memory_region_get_fd(MemoryRegion *mr);
+
+/**
  * memory_region_get_ram_ptr: Get a pointer into a RAM memory region.
  *
  * Returns a host pointer to a RAM memory region (created with
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f9518a6..d352f60 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -27,6 +27,7 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr);
 ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr);
+int qemu_get_ram_fd(ram_addr_t addr);
 void *qemu_get_ram_ptr(ram_addr_t addr);
 void qemu_ram_free(ram_addr_t addr);
 void qemu_ram_free_from_ptr(ram_addr_t addr);
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index ede5ec9..4cae673 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -53,6 +53,7 @@ struct HostMemoryBackend {
     /* protected */
     uint64_t size;
     bool merge, dump;
+    bool prealloc, force_prealloc;
 
     MemoryRegion mr;
 };
diff --git a/memory.c b/memory.c
index 310729a..bcef72b 100644
--- a/memory.c
+++ b/memory.c
@@ -1258,6 +1258,17 @@ void memory_region_reset_dirty(MemoryRegion *mr, hwaddr addr,
     cpu_physical_memory_reset_dirty(mr->ram_addr + addr, size, client);
 }
 
+int memory_region_get_fd(MemoryRegion *mr)
+{
+    if (mr->alias) {
+        return memory_region_get_fd(mr->alias);
+    }
+
+    assert(mr->terminates);
+
+    return qemu_get_ram_fd(mr->ram_addr & TARGET_PAGE_MASK);
+}
+
 void *memory_region_get_ram_ptr(MemoryRegion *mr)
 {
     if (mr->alias) {
-- 
1.9.3


* [Qemu-devel] [PATCH v4 21/29] hostmem: add property to map memory with MAP_SHARED
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (19 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 20/29] hostmem: allow preallocation of any memory region Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 22/29] configure: add Linux libnuma detection Hu Tao
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

From: Paolo Bonzini <pbonzini@redhat.com>

A new "share" property can be used with the "memory-backend-file"
backend to map memory with MAP_SHARED instead of MAP_PRIVATE.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 backends/hostmem-file.c | 26 +++++++++++++++++++++++++-
 exec.c                  | 18 ++++++++++--------
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h |  3 ++-
 memory.c                |  3 ++-
 numa.c                  |  2 +-
 6 files changed, 42 insertions(+), 12 deletions(-)
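
Illustration only, not part of this patch: a minimal sketch of what
the flag changes for a file-backed mapping, assuming a POSIX host
where tmpfile() works.  SKETCH_RAM_SHARED, mmap_flags_for() and
store_and_read_back() are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Same bit position as RAM_SHARED in the patch, under a sketch name. */
#define SKETCH_RAM_SHARED (1u << 1)

/* Pick the mmap() sharing mode from the block flags. */
static int mmap_flags_for(unsigned block_flags)
{
    return (block_flags & SKETCH_RAM_SHARED) ? MAP_SHARED : MAP_PRIVATE;
}

/* Map one page of a fresh temporary file, store 'value' through the
 * mapping, then read the first byte of the file back with pread().
 * Returns the byte found in the file, or -1 if any step failed. */
static int store_and_read_back(unsigned block_flags, unsigned char value)
{
    FILE *tmp = tmpfile();
    long len = sysconf(_SC_PAGESIZE);
    unsigned char seen = 0;
    unsigned char *area;
    int fd, result = -1;

    if (!tmp) {
        return -1;
    }
    fd = fileno(tmp);
    if (len > 0 && ftruncate(fd, len) == 0) {
        area = mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE,
                    mmap_flags_for(block_flags), fd, 0);
        if (area != MAP_FAILED) {
            area[0] = value;
            /* Push shared stores to the file; no effect on the file
             * contents for a private mapping. */
            msync(area, (size_t)len, MS_SYNC);
            if (pread(fd, &seen, 1, 0) == 1) {
                result = seen;
            }
            munmap(area, (size_t)len);
        }
    }
    fclose(tmp);
    return result;
}
```

With SKETCH_RAM_SHARED set, the stored byte is expected to show up in
the file, which is what lets another process that maps the same file
see guest memory.  Without it, the store stays in a private
copy-on-write page and the file still reads back as zero.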

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index d3a7ef3..47e12fa 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -28,6 +28,8 @@ typedef struct HostMemoryBackendFile HostMemoryBackendFile;
 
 struct HostMemoryBackendFile {
     HostMemoryBackend parent_obj;
+
+    bool share;
     char *mem_path;
 };
 
@@ -51,7 +53,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         backend->force_prealloc = mem_prealloc;
         memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
                                  object_get_canonical_path(OBJECT(backend)),
-                                 backend->size,
+                                 backend->size, fb->share,
                                  fb->mem_path, errp);
     }
 #endif
@@ -87,9 +89,31 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
     fb->mem_path = g_strdup(str);
 }
 
+static bool file_memory_backend_get_share(Object *o, Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+
+    return fb->share;
+}
+
+static void file_memory_backend_set_share(Object *o, bool value, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+
+    if (memory_region_size(&backend->mr)) {
+        error_setg(errp, "cannot change property value");
+        return;
+    }
+    fb->share = value;
+}
+
 static void
 file_backend_instance_init(Object *o)
 {
+    object_property_add_bool(o, "share",
+                        file_memory_backend_get_share,
+                        file_memory_backend_set_share, NULL);
     object_property_add_str(o, "mem-path", get_mem_path,
                             set_mem_path, NULL);
 }
diff --git a/exec.c b/exec.c
index 520d673..8705cc5 100644
--- a/exec.c
+++ b/exec.c
@@ -73,6 +73,9 @@ static MemoryRegion io_mem_unassigned;
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
 #define RAM_PREALLOC   (1 << 0)
 
+/* RAM is mmap-ed with MAP_SHARED */
+#define RAM_SHARED     (1 << 1)
+
 #endif
 
 struct CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
@@ -1074,7 +1077,9 @@ static void *file_ram_alloc(RAMBlock *block,
         perror("ftruncate");
     }
 
-    area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+    area = mmap(0, memory, PROT_READ | PROT_WRITE,
+                (block->flags & RAM_SHARED ? MAP_SHARED : MAP_PRIVATE),
+                fd, 0);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for hugepages");
@@ -1270,7 +1275,7 @@ static ram_addr_t ram_block_add(RAMBlock *new_block)
 
 #ifdef __linux__
 ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
-                                    const char *mem_path,
+                                    bool share, const char *mem_path,
                                     Error **errp)
 {
     RAMBlock *new_block;
@@ -1295,6 +1300,7 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
     new_block = g_malloc0(sizeof(*new_block));
     new_block->mr = mr;
     new_block->length = size;
+    new_block->flags = share ? RAM_SHARED : 0;
     new_block->host = file_ram_alloc(new_block, size,
                                      mem_path, errp);
     if (!new_block->host) {
@@ -1397,12 +1403,8 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
                 flags = MAP_FIXED;
                 munmap(vaddr, length);
                 if (block->fd >= 0) {
-#ifdef MAP_POPULATE
-                    flags |= mem_prealloc ? MAP_POPULATE | MAP_SHARED :
-                        MAP_PRIVATE;
-#else
-                    flags |= MAP_PRIVATE;
-#endif
+                    flags |= (block->flags & RAM_SHARED ?
+                              MAP_SHARED : MAP_PRIVATE);
                     area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
                                 flags, block->fd, offset);
                 } else {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 36226f7..f01f623 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -321,6 +321,7 @@ void memory_region_init_ram(MemoryRegion *mr,
  * @owner: the object that tracks the region's reference count
  * @name: the name of the region.
  * @size: size of the region.
+ * @share: %true if memory must be mmaped with the MAP_SHARED flag
  * @path: the path in which to allocate the RAM.
  * @errp: pointer to Error*, to store an error if it happens.
  */
@@ -328,6 +329,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       struct Object *owner,
                                       const char *name,
                                       uint64_t size,
+                                      bool share,
                                       const char *path,
                                       Error **errp);
 #endif
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index d352f60..1d4ac74 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -23,7 +23,8 @@
 #include "hw/xen/xen.h"
 
 ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
-                                    const char *mem_path, Error **errp);
+                                    bool share, const char *mem_path,
+                                    Error **errp);
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr);
 ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr);
diff --git a/memory.c b/memory.c
index bcef72b..203b097 100644
--- a/memory.c
+++ b/memory.c
@@ -1025,6 +1025,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       struct Object *owner,
                                       const char *name,
                                       uint64_t size,
+                                      bool share,
                                       const char *path,
                                       Error **errp)
 {
@@ -1032,7 +1033,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
     mr->ram = true;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_ram;
-    mr->ram_addr = qemu_ram_alloc_from_file(size, mr, path, errp);
+    mr->ram_addr = qemu_ram_alloc_from_file(size, mr, share, path, errp);
 }
 #endif
 
diff --git a/numa.c b/numa.c
index 039d401..1a83733 100644
--- a/numa.c
+++ b/numa.c
@@ -231,7 +231,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
     if (mem_path) {
 #ifdef __linux__
         Error *err = NULL;
-        memory_region_init_ram_from_file(mr, owner, name, ram_size,
+        memory_region_init_ram_from_file(mr, owner, name, ram_size, false,
                                          mem_path, &err);
 
         /* Legacy behavior: if allocation failed, fall back to
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 22/29] configure: add Linux libnuma detection
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (20 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 21/29] hostmem: add property to map memory with MAP_SHARED Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 23/29] hostmem: add properties for NUMA memory policy Hu Tao
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andre Przywara, Eduardo Habkost, Michael S. Tsirkin,
	Igor Mammedov, Paolo Bonzini, Yasunori Goto, Wanlong Gao

From: Wanlong Gao <gaowanlong@cn.fujitsu.com>

Add detection of libnuma (usually shipped in the numactl package)
to the configure script. It can be enabled or disabled on the command
line; by default it is used if available.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 configure | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/configure b/configure
index 0e516f9..e50d28b 100755
--- a/configure
+++ b/configure
@@ -325,6 +325,7 @@ tpm="no"
 libssh2=""
 vhdx=""
 quorum="no"
+numa=""
 
 # parse CC options first
 for opt do
@@ -1094,6 +1095,10 @@ for opt do
   ;;
   --enable-quorum) quorum="yes"
   ;;
+  --disable-numa) numa="no"
+  ;;
+  --enable-numa) numa="yes"
+  ;;
   *)
       echo "ERROR: unknown option $opt"
       echo "Try '$0 --help' for more information"
@@ -1360,6 +1365,8 @@ Advanced options (experts only):
   --enable-vhdx            enable support for the Microsoft VHDX image format
   --disable-quorum         disable quorum block filter support
   --enable-quorum          enable quorum block filter support
+  --disable-numa           disable libnuma support
+  --enable-numa            enable libnuma support
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -3135,6 +3142,26 @@ if compile_prog "" "" ; then
 fi
 
 ##########################################
+# libnuma probe
+
+if test "$numa" != "no" ; then
+  cat > $TMPC << EOF
+#include <numa.h>
+int main(void) { return numa_available(); }
+EOF
+
+  if compile_prog "" "-lnuma" ; then
+    numa=yes
+    libs_softmmu="-lnuma $libs_softmmu"
+  else
+    if test "$numa" = "yes" ; then
+      feature_not_found "numa" "install numactl devel"
+    fi
+    numa=no
+  fi
+fi
+
+##########################################
 # signalfd probe
 signalfd="no"
 cat > $TMPC << EOF
@@ -4208,6 +4235,7 @@ echo "vhdx              $vhdx"
 echo "Quorum            $quorum"
 echo "lzo support       $lzo"
 echo "snappy support    $snappy"
+echo "NUMA host support $numa"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -5173,6 +5201,10 @@ if [ "$dtc_internal" = "yes" ]; then
   echo "config-host.h: subdir-dtc" >> $config_host_mak
 fi
 
+if test "$numa" = "yes"; then
+  echo "CONFIG_NUMA=y" >> $config_host_mak
+fi
+
 # build tree in object directory in case the source is not in the current directory
 DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32 tests/libqos tests/qapi-schema tests/tcg/xtensa tests/qemu-iotests"
 DIRS="$DIRS fsdev"
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 23/29] hostmem: add properties for NUMA memory policy
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (21 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 22/29] configure: add Linux libnuma detection Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 24/29] Introduce signed range Hu Tao
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, Marcelo Tosatti,
	Igor Mammedov, Paolo Bonzini, Yasunori Goto

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
[Raise errors on setting properties if !CONFIG_NUMA.  Add BUILD_BUG_ON
 checks. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 backends/hostmem.c       | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
 include/sysemu/hostmem.h |   4 ++
 qapi-schema.json         |  20 +++++++
 3 files changed, 159 insertions(+), 1 deletion(-)
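
Note on the maxnode arithmetic in host_memory_backend_memory_complete():
the standalone sketch below is only an illustration of the calculation and
of the host-nodes/policy consistency check, not code from this patch. All
sketch_* names, SKETCH_MAX_NODES (assumed to be 128 here) and the two
policy constants are hypothetical stand-ins; the bit scan relies on the
convention, stated in the patch comment, that "no bit set" is reported as
the bitmap size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed value for illustration; the patch takes MAX_NODES from sysemu.h. */
#define SKETCH_MAX_NODES 128
#define SKETCH_BPL (sizeof(unsigned long) * CHAR_BIT)
/* Room for MAX_NODES + 1 bits, as the patch requires for mbind(). */
#define SKETCH_NLONGS ((SKETCH_MAX_NODES + 1 + SKETCH_BPL - 1) / SKETCH_BPL)

/* Hypothetical policy values mirroring default (0) and bind (2). */
enum { SKETCH_POLICY_DEFAULT = 0, SKETCH_POLICY_BIND = 2 };

/* Mark one host node in the bitmap. */
static void sketch_set_node(unsigned long *map, unsigned long node)
{
    assert(node < SKETCH_MAX_NODES);
    map[node / SKETCH_BPL] |= 1UL << (node % SKETCH_BPL);
}

/* Index of the highest set bit below nbits, or nbits if none is set. */
static unsigned long sketch_find_last_bit(const unsigned long *map,
                                          unsigned long nbits)
{
    unsigned long i = nbits;

    while (i-- > 0) {
        if (map[i / SKETCH_BPL] & (1UL << (i % SKETCH_BPL))) {
            return i;
        }
    }
    return nbits;
}

/* Same arithmetic as the patch: an empty bitmap yields 0, otherwise the
 * result is (highest node index + 1). */
static unsigned long sketch_maxnode(const unsigned long *map)
{
    unsigned long lastbit = sketch_find_last_bit(map, SKETCH_MAX_NODES);

    return (lastbit + 1) % (SKETCH_MAX_NODES + 1);
}

/* Returns NULL when host-nodes and policy are consistent, else a message. */
static const char *sketch_check(int policy, unsigned long maxnode)
{
    if (maxnode && policy == SKETCH_POLICY_DEFAULT) {
        return "host-nodes must be empty for policy default";
    }
    if (maxnode == 0 && policy != SKETCH_POLICY_DEFAULT) {
        return "host-nodes must be set for this policy";
    }
    return NULL;
}
```

With nodes 0 and 5 set, sketch_maxnode() gives 6; the patch then passes
maxnode + 1 to mbind() because of the cut-off behaviour described in its
comment.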

diff --git a/backends/hostmem.c b/backends/hostmem.c
index e437275..b7de5c7 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -10,12 +10,21 @@
  * See the COPYING file in the top-level directory.
  */
 #include "sysemu/hostmem.h"
-#include "sysemu/sysemu.h"
 #include "qapi/visitor.h"
+#include "qapi-types.h"
+#include "qapi-visit.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/config-file.h"
 #include "qom/object_interfaces.h"
 
+#ifdef CONFIG_NUMA
+#include <numaif.h>
+QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_DEFAULT != MPOL_DEFAULT);
+QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_PREFERRED != MPOL_PREFERRED);
+QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND);
+QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE);
+#endif
+
 static void
 host_memory_backend_get_size(Object *obj, Visitor *v, void *opaque,
                             const char *name, Error **errp)
@@ -53,6 +62,84 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void
+host_memory_backend_get_host_nodes(Object *obj, Visitor *v, void *opaque,
+                                   const char *name, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    uint16List *host_nodes = NULL;
+    uint16List **node = &host_nodes;
+    unsigned long value;
+
+    value = find_first_bit(backend->host_nodes, MAX_NODES);
+    if (value == MAX_NODES) {
+        return;
+    }
+
+    *node = g_malloc0(sizeof(**node));
+    (*node)->value = value;
+    node = &(*node)->next;
+
+    do {
+        value = find_next_bit(backend->host_nodes, MAX_NODES, value + 1);
+        if (value == MAX_NODES) {
+            break;
+        }
+
+        *node = g_malloc0(sizeof(**node));
+        (*node)->value = value;
+        node = &(*node)->next;
+    } while (true);
+
+    visit_type_uint16List(v, &host_nodes, name, errp);
+}
+
+static void
+host_memory_backend_set_host_nodes(Object *obj, Visitor *v, void *opaque,
+                                   const char *name, Error **errp)
+{
+#ifdef CONFIG_NUMA
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    uint16List *l = NULL;
+
+    visit_type_uint16List(v, &l, name, errp);
+
+    while (l) {
+        bitmap_set(backend->host_nodes, l->value, 1);
+        l = l->next;
+    }
+#else
+    error_setg(errp, "NUMA node binding is not supported by this QEMU");
+#endif
+}
+
+static void
+host_memory_backend_get_policy(Object *obj, Visitor *v, void *opaque,
+                               const char *name, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    int policy = backend->policy;
+
+    visit_type_enum(v, &policy, HostMemPolicy_lookup, NULL, name, errp);
+}
+
+static void
+host_memory_backend_set_policy(Object *obj, Visitor *v, void *opaque,
+                               const char *name, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    int policy;
+
+    visit_type_enum(v, &policy, HostMemPolicy_lookup, NULL, name, errp);
+    backend->policy = policy;
+
+#ifndef CONFIG_NUMA
+    if (policy != HOST_MEM_POLICY_DEFAULT) {
+        error_setg(errp, "NUMA policies are not supported by this QEMU");
+    }
+#endif
+}
+
 static bool host_memory_backend_get_merge(Object *obj, Error **errp)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -162,6 +249,12 @@ static void host_memory_backend_init(Object *obj)
     object_property_add(obj, "size", "int",
                         host_memory_backend_get_size,
                         host_memory_backend_set_size, NULL, NULL, NULL);
+    object_property_add(obj, "host-nodes", "int",
+                        host_memory_backend_get_host_nodes,
+                        host_memory_backend_set_host_nodes, NULL, NULL, NULL);
+    object_property_add(obj, "policy", "str",
+                        host_memory_backend_get_policy,
+                        host_memory_backend_set_policy, NULL, NULL, NULL);
 }
 
 static void host_memory_backend_finalize(Object *obj)
@@ -204,6 +297,47 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         if (!backend->dump) {
             qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
         }
+#ifdef CONFIG_NUMA
+        unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
+        /* lastbit == MAX_NODES means maxnode = 0 */
+        unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
+        /* Ensure the policy won't be ignored in case memory is preallocated
+         * before mbind(). Note: MPOL_MF_STRICT is ignored on hugepages, so
+         * this doesn't catch the hugepage case. */
+        unsigned flags = MPOL_MF_STRICT;
+
+        /* check for invalid host-nodes and policies and give more verbose
+         * error messages than mbind(). */
+        if (maxnode && backend->policy == MPOL_DEFAULT) {
+            error_setg(errp, "host-nodes must be empty for policy default,"
+                       " or you should explicitly specify a policy other"
+                       " than default");
+            return;
+        } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
+            error_setg(errp, "host-nodes must be set for policy %s",
+                       HostMemPolicy_lookup[backend->policy]);
+            return;
+        }
+
+        /* We can have up to MAX_NODES nodes, but we need to pass maxnode+1
+         * as argument to mbind() due to an old Linux bug (feature?) which
+         * cuts off the last specified node. This means backend->host_nodes
+         * must have MAX_NODES+1 bits available.
+         */
+        assert(sizeof(backend->host_nodes) >=
+               BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
+        assert(maxnode <= MAX_NODES);
+        if (mbind(ptr, sz, backend->policy,
+                  maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
+            error_setg_errno(errp, errno,
+                             "cannot bind memory to host NUMA nodes");
+            return;
+        }
+#endif
+        /* Preallocate memory after the NUMA policy has been instantiated.
+         * This is necessary to guarantee memory is allocated with
+         * specified NUMA policy in place.
+         */
         if (backend->prealloc) {
             os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz);
         }
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 4cae673..1ce4394 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -12,10 +12,12 @@
 #ifndef QEMU_RAM_H
 #define QEMU_RAM_H
 
+#include "sysemu/sysemu.h" /* for MAX_NODES */
 #include "qom/object.h"
 #include "qapi/error.h"
 #include "exec/memory.h"
 #include "qemu/option.h"
+#include "qemu/bitmap.h"
 
 #define TYPE_MEMORY_BACKEND "memory-backend"
 #define MEMORY_BACKEND(obj) \
@@ -54,6 +56,8 @@ struct HostMemoryBackend {
     uint64_t size;
     bool merge, dump;
     bool prealloc, force_prealloc;
+    DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
+    HostMemPolicy policy;
 
     MemoryRegion mr;
 };
diff --git a/qapi-schema.json b/qapi-schema.json
index d5ab066..0898c00 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4759,3 +4759,23 @@
    '*cpus':   ['uint16'],
    '*mem':    'size',
    '*memdev': 'str' }}
+
+##
+# @HostMemPolicy
+#
+# Host memory policy types
+#
+# @default: restore default policy, remove any nondefault policy
+#
+# @preferred: set the preferred host nodes for allocation
+#
+# @bind: a strict policy that restricts memory allocation to the
+#        host nodes specified
+#
+# @interleave: memory allocations are interleaved across the set
+#              of host nodes specified
+#
+# Since 2.1
+##
+{ 'enum': 'HostMemPolicy',
+  'data': [ 'default', 'preferred', 'bind', 'interleave' ] }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 24/29] Introduce signed range.
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (22 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 23/29] hostmem: add properties for NUMA memory policy Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:42   ` Peter Maydell
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 25/29] qapi: make string input visitor parse int list Hu Tao
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 include/qemu/range.h | 124 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)
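
Note on s_range_overflow(): it computes start + length - 1 and only then
compares the result with start. In ISO C, signed overflow is undefined
unless the build uses something like -fwrapv, so the check may be worth
expressing without overflowing. The sketch below is one possible way to do
that, not part of this patch; the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the range is unusable: length is not positive, or the last
 * element start + length - 1 would exceed INT64_MAX. No intermediate
 * expression can overflow, because length - 1 is in [0, INT64_MAX - 1]
 * whenever the subtraction is evaluated. */
static bool sketch_range_invalid(int64_t start, int64_t length)
{
    if (length <= 0) {
        return true;
    }
    return start > INT64_MAX - (length - 1);
}

/* Inclusive last element of a range that has already been validated. */
static int64_t sketch_range_end(int64_t start, int64_t length)
{
    assert(!sketch_range_invalid(start, length));
    return start + (length - 1);
}
```

As in the patch, a zero-length range counts as invalid here, which keeps
the "end >= start" invariant for every range that reaches the list.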

diff --git a/include/qemu/range.h b/include/qemu/range.h
index aae9720..8879f8a 100644
--- a/include/qemu/range.h
+++ b/include/qemu/range.h
@@ -3,6 +3,7 @@
 
 #include <inttypes.h>
 #include <qemu/typedefs.h>
+#include "qemu/queue.h"
 
 /*
  * Operations on 64 bit address ranges.
@@ -60,4 +61,127 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
     return !(last2 < first1 || last1 < first2);
 }
 
+typedef struct SignedRangeList SignedRangeList;
+
+typedef struct SignedRange {
+    int64_t start;
+    int64_t length;
+
+    QTAILQ_ENTRY(SignedRange) entry;
+} SignedRange;
+
+QTAILQ_HEAD(SignedRangeList, SignedRange);
+
+static inline int64_t s_range_end(int64_t start, int64_t length)
+{
+    return start + length - 1;
+}
+
+/* non-positive length or overflow */
+static inline bool s_range_overflow(int64_t start, int64_t length)
+{
+    return s_range_end(start, length) < start;
+}
+
+static inline SignedRange *s_range_new(int64_t start, int64_t length)
+{
+    SignedRange *range = NULL;
+
+    if (s_range_overflow(start, length)) {
+        return NULL;
+    }
+
+    range = g_malloc0(sizeof(*range));
+    range->start = start;
+    range->length = length;
+
+    return range;
+}
+
+static inline void s_range_free(SignedRange *range)
+{
+    g_free(range);
+}
+
+static inline bool s_range_overlap(int64_t start1, int64_t length1,
+                                   int64_t start2, int64_t length2)
+{
+    return !((start1 + length1) < start2 || (start2 + length2) < start1);
+}
+
+static inline int s_range_join(SignedRange *range,
+                               int64_t start, int64_t length)
+{
+    if (s_range_overflow(start, length)) {
+        return -1;
+    }
+
+    if (s_range_overlap(range->start, range->length, start, length)) {
+        int64_t end = s_range_end(range->start, range->length);
+        if (end < s_range_end(start, length)) {
+            end = s_range_end(start, length);
+        }
+        if (range->start > start) {
+            range->start = start;
+        }
+        range->length = end - range->start + 1;
+        return 0;
+    }
+
+    return -1;
+}
+
+static inline int s_range_compare(int64_t start1, int64_t length1,
+                                  int64_t start2, int64_t length2)
+{
+    if (start1 == start2 && length1 == length2) {
+        return 0;
+    } else if (s_range_end(start1, length1) <
+               s_range_end(start2, length2)) {
+        return -1;
+    } else {
+        return 1;
+    }
+}
+
+/* Add range to list. Keep them sorted, and merge ranges whenever possible */
+static inline bool range_list_add(SignedRangeList *list,
+                                  int64_t start, int64_t length)
+{
+    SignedRange *r, *next, *new_range = NULL, *cur = NULL;
+
+    if (s_range_overflow(start, length)) {
+        return false;
+    }
+
+    QTAILQ_FOREACH_SAFE(r, list, entry, next) {
+        if (s_range_overlap(r->start, r->length, start, length)) {
+            s_range_join(r, start, length);
+            break;
+        } else if (s_range_compare(start, length, r->start, r->length) < 0) {
+            cur = r;
+            break;
+        }
+    }
+
+    if (!r) {
+        new_range = s_range_new(start, length);
+        QTAILQ_INSERT_TAIL(list, new_range, entry);
+    } else if (cur) {
+        new_range = s_range_new(start, length);
+        QTAILQ_INSERT_BEFORE(cur, new_range, entry);
+    } else {
+        SignedRange *next = QTAILQ_NEXT(r, entry);
+        while (next && s_range_overlap(r->start, r->length,
+                                       next->start, next->length)) {
+            s_range_join(r, next->start, next->length);
+            QTAILQ_REMOVE(list, next, entry);
+            s_range_free(next);
+            next = QTAILQ_NEXT(r, entry);
+        }
+    }
+
+    return true;
+}
+
 #endif
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 25/29] qapi: make string input visitor parse int list
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (23 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 24/29] Introduce signed range Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 26/29] qapi: make string output " Hu Tao
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 qapi/string-input-visitor.c       | 181 ++++++++++++++++++++++++++++++++++++--
 tests/test-string-input-visitor.c |  39 ++++++++
 2 files changed, 212 insertions(+), 8 deletions(-)
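
Note on the input syntax: the list string is a comma-separated sequence of
integers or "start-end" intervals, where either bound may itself be
negative (the test uses "-2-1"). The sketch below only illustrates that
tokenization with strtoll(); it is not the parse_str() from this patch.
It does not sort or merge (the patch leaves that to range_list_add()) and
it omits the 65536-element interval limit. SketchRange and
sketch_parse_ranges() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct SketchRange {
    int64_t start;
    int64_t end;    /* inclusive */
} SketchRange;

/* Parse "a,b-c,..." into out[]. Returns the number of ranges stored,
 * or -1 on a syntax error, a reversed interval, or too many ranges. */
static int sketch_parse_ranges(const char *str, SketchRange *out, size_t max)
{
    const char *p = str;
    size_t n = 0;

    assert(str != NULL && out != NULL);
    for (;;) {
        char *endptr;
        long long start, end;

        errno = 0;
        start = strtoll(p, &endptr, 0);
        if (errno != 0 || endptr == p) {
            return -1;
        }
        end = start;
        if (*endptr == '-') {
            /* A '-' right after a number introduces the upper bound. */
            p = endptr + 1;
            errno = 0;
            end = strtoll(p, &endptr, 0);
            if (errno != 0 || endptr == p || end < start) {
                return -1;
            }
        }
        if (n == max) {
            return -1;
        }
        out[n].start = start;
        out[n].end = end;
        n++;
        if (*endptr == '\0') {
            return (int)n;
        }
        if (*endptr != ',') {
            return -1;
        }
        p = endptr + 1;
    }
}
```

For the string used in test_visitor_in_intList() this yields seven raw
ranges; sorting and merging them is what produces the 13 values the test
expects.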

diff --git a/qapi/string-input-visitor.c b/qapi/string-input-visitor.c
index 5780944..85ac6a1 100644
--- a/qapi/string-input-visitor.c
+++ b/qapi/string-input-visitor.c
@@ -15,31 +15,182 @@
 #include "qapi/visitor-impl.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/option.h"
+#include "qemu/queue.h"
+#include "qemu/range.h"
+
 
 struct StringInputVisitor
 {
     Visitor visitor;
+
+    bool head;
+
+    SignedRangeList *ranges;
+    SignedRange *cur_range;
+    int64_t cur;
+
     const char *string;
 };
 
+static void parse_str(StringInputVisitor *siv, Error **errp)
+{
+    char *str = (char *) siv->string;
+    long long start, end;
+    SignedRange *r, *next;
+    char *endptr;
+
+    if (siv->ranges) {
+        return;
+    }
+
+    siv->ranges = g_malloc0(sizeof(*siv->ranges));
+    QTAILQ_INIT(siv->ranges);
+    errno = 0;
+    do {
+        start = strtoll(str, &endptr, 0);
+        if (errno == 0 && endptr > str && INT64_MIN <= start &&
+            start <= INT64_MAX) {
+            if (*endptr == '\0') {
+                if (!range_list_add(siv->ranges, start, 1)) {
+                    goto error;
+                }
+                str = NULL;
+            } else if (*endptr == '-') {
+                str = endptr + 1;
+                end = strtoll(str, &endptr, 0);
+                if (errno == 0 && endptr > str &&
+                    INT64_MIN <= end && end <= INT64_MAX && start <= end &&
+                    (start > INT64_MAX - 65536 ||
+                     end < start + 65536)) {
+                    if (*endptr == '\0') {
+                        if (!range_list_add(siv->ranges, start,
+                                            end - start + 1)) {
+                            goto error;
+                        }
+                        str = NULL;
+                    } else if (*endptr == ',') {
+                        str = endptr + 1;
+                        if (!range_list_add(siv->ranges, start,
+                                            end - start + 1)) {
+                            goto error;
+                        }
+                    } else {
+                        goto error;
+                    }
+                } else {
+                    goto error;
+                }
+            } else if (*endptr == ',') {
+                str = endptr + 1;
+                if (!range_list_add(siv->ranges, start, 1)) {
+                    goto error;
+                }
+            } else {
+                goto error;
+            }
+        } else {
+            goto error;
+        }
+    } while (str);
+
+    return;
+error:
+    if (siv->ranges) {
+        QTAILQ_FOREACH_SAFE(r, siv->ranges, entry, next) {
+            QTAILQ_REMOVE(siv->ranges, r, entry);
+            g_free(r);
+        }
+        g_free(siv->ranges);
+        siv->ranges = NULL;
+    }
+}
+
+static void
+start_list(Visitor *v, const char *name, Error **errp)
+{
+    StringInputVisitor *siv = DO_UPCAST(StringInputVisitor, visitor, v);
+
+    parse_str(siv, errp);
+
+    if (siv->ranges) {
+        siv->cur_range = QTAILQ_FIRST(siv->ranges);
+        if (siv->cur_range) {
+            siv->cur = siv->cur_range->start;
+        }
+    }
+}
+
+static GenericList *
+next_list(Visitor *v, GenericList **list, Error **errp)
+{
+    StringInputVisitor *siv = DO_UPCAST(StringInputVisitor, visitor, v);
+    GenericList **link;
+
+    if (!siv->ranges || !siv->cur_range) {
+        return NULL;
+    }
+
+    if (siv->cur < siv->cur_range->start ||
+        siv->cur >= (siv->cur_range->start + siv->cur_range->length)) {
+        siv->cur_range = QTAILQ_NEXT(siv->cur_range, entry);
+        if (siv->cur_range) {
+            siv->cur = siv->cur_range->start;
+        } else {
+            return NULL;
+        }
+    }
+
+    if (siv->head) {
+        link = list;
+        siv->head = false;
+    } else {
+        link = &(*list)->next;
+    }
+
+    *link = g_malloc0(sizeof **link);
+    return *link;
+}
+
+static void
+end_list(Visitor *v, Error **errp)
+{
+    StringInputVisitor *siv = DO_UPCAST(StringInputVisitor, visitor, v);
+    siv->head = true;
+}
+
 static void parse_type_int(Visitor *v, int64_t *obj, const char *name,
                            Error **errp)
 {
     StringInputVisitor *siv = DO_UPCAST(StringInputVisitor, visitor, v);
-    char *endp = (char *) siv->string;
-    long long val;
 
-    errno = 0;
-    if (siv->string) {
-        val = strtoll(siv->string, &endp, 0);
-    }
-    if (!siv->string || errno || endp == siv->string || *endp) {
+    if (!siv->string) {
         error_set(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : "null",
                   "integer");
         return;
     }
 
-    *obj = val;
+    parse_str(siv, errp);
+
+    if (!siv->ranges) {
+        goto error;
+    }
+
+    if (!siv->cur_range) {
+        siv->cur_range = QTAILQ_FIRST(siv->ranges);
+        if (siv->cur_range) {
+            siv->cur = siv->cur_range->start;
+        } else {
+            goto error;
+        }
+    }
+
+    *obj = siv->cur;
+    siv->cur++;
+    return;
+
+error:
+    error_set(errp, QERR_INVALID_PARAMETER_VALUE, name,
+              "an int64 value or range");
 }
 
 static void parse_type_size(Visitor *v, uint64_t *obj, const char *name,
@@ -140,6 +291,16 @@ Visitor *string_input_get_visitor(StringInputVisitor *v)
 
 void string_input_visitor_cleanup(StringInputVisitor *v)
 {
+    SignedRange *r, *next;
+
+    if (v->ranges) {
+        QTAILQ_FOREACH_SAFE(r, v->ranges, entry, next) {
+            QTAILQ_REMOVE(v->ranges, r, entry);
+            g_free(r);
+        }
+        g_free(v->ranges);
+    }
+
     g_free(v);
 }
 
@@ -155,8 +316,12 @@ StringInputVisitor *string_input_visitor_new(const char *str)
     v->visitor.type_bool = parse_type_bool;
     v->visitor.type_str = parse_type_str;
     v->visitor.type_number = parse_type_number;
+    v->visitor.start_list = start_list;
+    v->visitor.next_list = next_list;
+    v->visitor.end_list = end_list;
     v->visitor.optional = parse_optional;
 
     v->string = str;
+    v->head = true;
     return v;
 }
diff --git a/tests/test-string-input-visitor.c b/tests/test-string-input-visitor.c
index 877e737..b08a7db 100644
--- a/tests/test-string-input-visitor.c
+++ b/tests/test-string-input-visitor.c
@@ -64,6 +64,35 @@ static void test_visitor_in_int(TestInputVisitorData *data,
     g_assert_cmpint(res, ==, value);
 }
 
+static void test_visitor_in_intList(TestInputVisitorData *data,
+                                    const void *unused)
+{
+    int64_t value[] = {-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 20};
+    int16List *res = NULL, *tmp;
+    Error *errp = NULL;
+    Visitor *v;
+    int i = 0;
+
+    v = visitor_input_test_init(data, "1,2,-2-1,2-4,20,5-9,1-8");
+
+    visit_type_int16List(v, &res, NULL, &errp);
+    g_assert(errp == NULL);
+    tmp = res;
+    while (i < sizeof(value) / sizeof(value[0])) {
+        g_assert(tmp);
+        g_assert_cmpint(tmp->value, ==, value[i++]);
+        tmp = tmp->next;
+    }
+    g_assert(!tmp);
+
+    tmp = res;
+    while (tmp) {
+        res = res->next;
+        g_free(tmp);
+        tmp = res;
+    }
+}
+
 static void test_visitor_in_bool(TestInputVisitorData *data,
                                  const void *unused)
 {
@@ -170,6 +199,7 @@ static void test_visitor_in_fuzz(TestInputVisitorData *data,
                                  const void *unused)
 {
     int64_t ires;
+    intList *ilres;
     bool bres;
     double nres;
     char *sres;
@@ -193,6 +223,11 @@ static void test_visitor_in_fuzz(TestInputVisitorData *data,
 
         v = visitor_input_test_init(data, buf);
         visit_type_int(v, &ires, NULL, NULL);
+        visitor_input_teardown(data, NULL);
+
+        v = visitor_input_test_init(data, buf);
+        visit_type_intList(v, &ilres, NULL, NULL);
+        visitor_input_teardown(data, NULL);
 
         v = visitor_input_test_init(data, buf);
         visit_type_bool(v, &bres, NULL, NULL);
@@ -200,11 +235,13 @@ static void test_visitor_in_fuzz(TestInputVisitorData *data,
 
         v = visitor_input_test_init(data, buf);
         visit_type_number(v, &nres, NULL, NULL);
+        visitor_input_teardown(data, NULL);
 
         v = visitor_input_test_init(data, buf);
         sres = NULL;
         visit_type_str(v, &sres, NULL, NULL);
         g_free(sres);
+        visitor_input_teardown(data, NULL);
 
         v = visitor_input_test_init(data, buf);
         visit_type_EnumOne(v, &eres, NULL, NULL);
@@ -228,6 +265,8 @@ int main(int argc, char **argv)
 
     input_visitor_test_add("/string-visitor/input/int",
                            &in_visitor_data, test_visitor_in_int);
+    input_visitor_test_add("/string-visitor/input/intList",
+                           &in_visitor_data, test_visitor_in_intList);
     input_visitor_test_add("/string-visitor/input/bool",
                            &in_visitor_data, test_visitor_in_bool);
     input_visitor_test_add("/string-visitor/input/number",
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 26/29] qapi: make string output visitor parse int list
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (24 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 25/29] qapi: make string input visitor parse int list Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 27/29] qom: introduce object_property_get_enum and object_property_get_uint16List Hu Tao
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 qapi/string-output-visitor.c       | 230 +++++++++++++++++++++++++++++++++++--
 tests/test-string-output-visitor.c |  34 ++++++
 2 files changed, 254 insertions(+), 10 deletions(-)
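
Note on the output side: the idea is the inverse of the input visitor, that
is, runs of consecutive integers are collapsed into "start-end" intervals.
The sketch below shows that collapsing on an already sorted, duplicate-free
array. It is an illustration only and does not reproduce the visitor state
machine in this patch; the function name and the exact separators are
assumptions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format sorted, duplicate-free values as "a-b,c,..." into buf.
 * Returns the string length, or -1 if buf is too small. */
static int sketch_format_ranges(const int64_t *vals, size_t n,
                                char *buf, size_t buflen)
{
    size_t used = 0;
    size_t i = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    while (i < n) {
        size_t j = i;
        int w;

        /* Extend the run while the next value is exactly one larger;
         * the INT64_MAX test avoids overflowing vals[j] + 1. */
        while (j + 1 < n && vals[j] != INT64_MAX &&
               vals[j + 1] == vals[j] + 1) {
            j++;
        }

        if (j == i) {
            w = snprintf(buf + used, buflen - used, "%s%" PRId64,
                         used ? "," : "", vals[i]);
        } else {
            w = snprintf(buf + used, buflen - used,
                         "%s%" PRId64 "-%" PRId64,
                         used ? "," : "", vals[i], vals[j]);
        }
        if (w < 0 || (size_t)w >= buflen - used) {
            return -1;
        }
        used += (size_t)w;
        i = j + 1;
    }
    return (int)used;
}
```

Feeding it the 13 values from the input visitor test
(-2 through 9, then 20) gives the string "-2-9,20".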

diff --git a/qapi/string-output-visitor.c b/qapi/string-output-visitor.c
index fb1d2e8..ccebc7a 100644
--- a/qapi/string-output-visitor.c
+++ b/qapi/string-output-visitor.c
@@ -16,32 +16,173 @@
 #include "qapi/qmp/qerror.h"
 #include "qemu/host-utils.h"
 #include <math.h>
+#include "qemu/range.h"
+
+enum ListMode {
+    LM_NONE,             /* not traversing a list of repeated options */
+    LM_STARTED,          /* start_list() succeeded */
+
+    LM_IN_PROGRESS,      /* next_list() has been called.
+                          *
+                          * Generating the next list link will consume the most
+                          * recently parsed QemuOpt instance of the repeated
+                          * option.
+                          *
+                          * Parsing a value into the list link will examine the
+                          * next QemuOpt instance of the repeated option, and
+                          * possibly enter LM_SIGNED_INTERVAL or
+                          * LM_UNSIGNED_INTERVAL.
+                          */
+
+    LM_SIGNED_INTERVAL,  /* next_list() has been called.
+                          *
+                          * Generating the next list link will consume the most
+                          * recently stored element from the signed interval,
+                          * parsed from the most recent QemuOpt instance of the
+                          * repeated option. This may consume QemuOpt itself
+                          * and return to LM_IN_PROGRESS.
+                          *
+                          * Parsing a value into the list link will store the
+                          * next element of the signed interval.
+                          */
+
+    LM_UNSIGNED_INTERVAL,/* Same as above, only for an unsigned interval. */
+
+    LM_END
+};
+
+typedef enum ListMode ListMode;
 
 struct StringOutputVisitor
 {
     Visitor visitor;
     bool human;
-    char *string;
+    GString *string;
+    bool head;
+    ListMode list_mode;
+    union {
+        int64_t s;
+        uint64_t u;
+    } range_start, range_end;
+    SignedRangeList *ranges;
 };
 
 static void string_output_set(StringOutputVisitor *sov, char *string)
 {
-    g_free(sov->string);
-    sov->string = string;
+    if (sov->string) {
+        g_string_free(sov->string, true);
+    }
+    sov->string = g_string_new(string);
+    g_free(string);
+}
+
+static void string_output_append(StringOutputVisitor *sov, int64_t a)
+{
+    range_list_add(sov->ranges, a, 1);
+}
+
+static void string_output_append_range(StringOutputVisitor *sov,
+                                       int64_t s, int64_t e)
+{
+    range_list_add(sov->ranges, s, e);
 }
 
 static void print_type_int(Visitor *v, int64_t *obj, const char *name,
                            Error **errp)
 {
     StringOutputVisitor *sov = DO_UPCAST(StringOutputVisitor, visitor, v);
-    char *out;
+    SignedRange *r;
+
+    if (!sov->ranges) {
+        sov->ranges = g_malloc0(sizeof(*sov->ranges));
+        QTAILQ_INIT(sov->ranges);
+    }
+
+    switch (sov->list_mode) {
+    case LM_NONE:
+        string_output_append(sov, *obj);
+        break;
+
+    case LM_STARTED:
+        sov->range_start.s = *obj;
+        sov->range_end.s = *obj;
+        sov->list_mode = LM_IN_PROGRESS;
+        return;
+
+    case LM_IN_PROGRESS:
+        if (sov->range_end.s + 1 == *obj) {
+            sov->range_end.s++;
+        } else {
+            if (sov->range_start.s == sov->range_end.s) {
+                string_output_append(sov, sov->range_end.s);
+            } else {
+                assert(sov->range_start.s < sov->range_end.s);
+                string_output_append_range(sov, sov->range_start.s,
+                                           sov->range_end.s -
+                                           sov->range_start.s + 1);
+            }
+
+            sov->range_start.s = *obj;
+            sov->range_end.s = *obj;
+        }
+        return;
+
+    case LM_END:
+        if (sov->range_end.s + 1 == *obj) {
+            sov->range_end.s++;
+            assert(sov->range_start.s < sov->range_end.s);
+            string_output_append_range(sov, sov->range_start.s,
+                                       sov->range_end.s -
+                                       sov->range_start.s + 1);
+        } else {
+            if (sov->range_start.s == sov->range_end.s) {
+                string_output_append(sov, sov->range_end.s);
+            } else {
+                assert(sov->range_start.s < sov->range_end.s);
+
+                string_output_append_range(sov, sov->range_start.s,
+                                           sov->range_end.s -
+                                           sov->range_start.s + 1);
+            }
+            string_output_append(sov, *obj);
+        }
+        break;
+
+    default:
+        abort();
+    }
+
+    QTAILQ_FOREACH(r, sov->ranges, entry) {
+        if (r->length > 1) {
+            g_string_append_printf(sov->string, "%" PRId64 "-%" PRId64,
+                                   r->start,
+                                   s_range_end(r->start, r->length));
+        } else {
+            g_string_append_printf(sov->string, "%" PRId64,
+                                   r->start);
+        }
+        if (r->entry.tqe_next) {
+            g_string_append(sov->string, ",");
+        }
+    }
 
     if (sov->human) {
-        out = g_strdup_printf("%lld (%#llx)", (long long) *obj, (long long) *obj);
-    } else {
-        out = g_strdup_printf("%lld", (long long) *obj);
+        g_string_append(sov->string, " (");
+        QTAILQ_FOREACH(r, sov->ranges, entry) {
+            if (r->length > 1) {
+                g_string_append_printf(sov->string, "%" PRIx64 "-%" PRIx64,
+                                       r->start,
+                                       s_range_end(r->start, r->length));
+            } else {
+                g_string_append_printf(sov->string, "%" PRIx64,
+                                       r->start);
+            }
+            if (r->entry.tqe_next) {
+                g_string_append(sov->string, ",");
+            }
+        }
+        g_string_append(sov->string, ")");
     }
-    string_output_set(sov, out);
 }
 
 static void print_type_size(Visitor *v, uint64_t *obj, const char *name,
@@ -103,9 +244,61 @@ static void print_type_number(Visitor *v, double *obj, const char *name,
     string_output_set(sov, g_strdup_printf("%f", *obj));
 }
 
+static void
+start_list(Visitor *v, const char *name, Error **errp)
+{
+    StringOutputVisitor *sov = DO_UPCAST(StringOutputVisitor, visitor, v);
+
+    /* we can't traverse a list in a list */
+    assert(sov->list_mode == LM_NONE);
+    sov->list_mode = LM_STARTED;
+    sov->head = true;
+}
+
+static GenericList *
+next_list(Visitor *v, GenericList **list, Error **errp)
+{
+    StringOutputVisitor *sov = DO_UPCAST(StringOutputVisitor, visitor, v);
+    GenericList *ret = NULL;
+    if (*list) {
+        if (sov->head) {
+            ret = *list;
+        } else {
+            ret = (*list)->next;
+        }
+
+        if (sov->head) {
+            if (ret && ret->next == NULL) {
+                sov->list_mode = LM_NONE;
+            }
+            sov->head = false;
+        } else {
+            if (ret && ret->next == NULL) {
+                sov->list_mode = LM_END;
+            }
+        }
+    }
+
+    return ret;
+}
+
+static void
+end_list(Visitor *v, Error **errp)
+{
+    StringOutputVisitor *sov = DO_UPCAST(StringOutputVisitor, visitor, v);
+
+    assert(sov->list_mode == LM_STARTED ||
+           sov->list_mode == LM_END ||
+           sov->list_mode == LM_NONE ||
+           sov->list_mode == LM_IN_PROGRESS);
+    sov->list_mode = LM_NONE;
+    sov->head = true;
+
+}
+
 char *string_output_get_string(StringOutputVisitor *sov)
 {
-    char *string = sov->string;
+    char *string = g_string_free(sov->string, false);
     sov->string = NULL;
     return string;
 }
@@ -117,7 +310,20 @@ Visitor *string_output_get_visitor(StringOutputVisitor *sov)
 
 void string_output_visitor_cleanup(StringOutputVisitor *sov)
 {
-    g_free(sov->string);
+    SignedRange *r, *next;
+
+    if (sov->string) {
+        g_string_free(sov->string, true);
+    }
+
+    if (sov->ranges) {
+        QTAILQ_FOREACH_SAFE(r, sov->ranges, entry, next) {
+            QTAILQ_REMOVE(sov->ranges, r, entry);
+            s_range_free(r);
+        }
+        g_free(sov->ranges);
+    }
+
     g_free(sov);
 }
 
@@ -127,6 +333,7 @@ StringOutputVisitor *string_output_visitor_new(bool human)
 
     v = g_malloc0(sizeof(*v));
 
+    v->string = g_string_new(NULL);
     v->human = human;
     v->visitor.type_enum = output_type_enum;
     v->visitor.type_int = print_type_int;
@@ -134,6 +341,9 @@ StringOutputVisitor *string_output_visitor_new(bool human)
     v->visitor.type_bool = print_type_bool;
     v->visitor.type_str = print_type_str;
     v->visitor.type_number = print_type_number;
+    v->visitor.start_list = start_list;
+    v->visitor.next_list = next_list;
+    v->visitor.end_list = end_list;
 
     return v;
 }
diff --git a/tests/test-string-output-visitor.c b/tests/test-string-output-visitor.c
index 2af5a21..d2ad580 100644
--- a/tests/test-string-output-visitor.c
+++ b/tests/test-string-output-visitor.c
@@ -57,6 +57,38 @@ static void test_visitor_out_int(TestOutputVisitorData *data,
     g_free(str);
 }
 
+static void test_visitor_out_intList(TestOutputVisitorData *data,
+                                     const void *unused)
+{
+    int64_t value[] = {-10, -7, -2, -1, 0, 1, 9, 10, 16, 15, 14,
+        3, 4, 5, 6, 11, 12, 13, 21, 22, INT64_MAX - 1, INT64_MAX};
+    intList *list = NULL, **tmp = &list;
+    int i;
+    Error *errp = NULL;
+    char *str;
+
+    for (i = 0; i < sizeof(value) / sizeof(value[0]); i++) {
+        *tmp = g_malloc0(sizeof(**tmp));
+        (*tmp)->value = value[i];
+        tmp = &(*tmp)->next;
+    }
+
+    visit_type_intList(data->ov, &list, NULL, &errp);
+    g_assert(errp == NULL);
+
+    str = string_output_get_string(data->sov);
+    g_assert(str != NULL);
+    g_assert_cmpstr(str, ==,
+        "-10,-7,-2-1,3-6,9-16,21-22,9223372036854775806-9223372036854775807");
+    g_free(str);
+    while (list) {
+        intList *tmp2;
+        tmp2 = list->next;
+        g_free(list);
+        list = tmp2;
+    }
+}
+
 static void test_visitor_out_bool(TestOutputVisitorData *data,
                                   const void *unused)
 {
@@ -182,6 +214,8 @@ int main(int argc, char **argv)
                             &out_visitor_data, test_visitor_out_enum);
     output_visitor_test_add("/string-visitor/output/enum-errors",
                             &out_visitor_data, test_visitor_out_enum_errors);
+    output_visitor_test_add("/string-visitor/output/intList",
+                            &out_visitor_data, test_visitor_out_intList);
 
     g_test_run();
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 27/29] qom: introduce object_property_get_enum and object_property_get_uint16List
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (25 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 26/29] qapi: make string output " Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev Hu Tao
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
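
Note for readers: both helpers below read a property by round-tripping
through text. The string output visitor serializes the value, and the
string input visitor parses that string back into the requested type. As
a rough standalone illustration of the parsing half for a list such as
"0-1,3", here is a sketch in plain C. parse_uint16_ranges is a
hypothetical name and is not the visitor code from this series; the
accepted grammar (decimal values, "lo-hi" ranges, comma separators) is an
assumption based on the output format shown in patch 26.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of QEMU): expand "0-1,3" into {0, 1, 3}.
 * Returns the number of values written, or -1 on malformed input, on a
 * value above UINT16_MAX, on a descending range, or when more than max
 * values would be needed.  An empty string yields 0 values.
 */
static int parse_uint16_ranges(const char *s, uint16_t *out, size_t max)
{
    size_t n = 0;

    while (*s) {
        char *end;
        unsigned long lo, hi, v;

        /* strtoul() accepts whitespace and signs, so insist on a digit. */
        if (*s < '0' || *s > '9') {
            return -1;
        }
        errno = 0;
        lo = strtoul(s, &end, 10);
        if (errno || lo > UINT16_MAX) {
            return -1;
        }
        hi = lo;

        if (*end == '-') {
            s = end + 1;
            if (*s < '0' || *s > '9') {
                return -1;
            }
            errno = 0;
            hi = strtoul(s, &end, 10);
            if (errno || hi > UINT16_MAX || hi < lo) {
                return -1;
            }
        }

        for (v = lo; v <= hi; v++) {
            if (n == max) {
                return -1;
            }
            out[n++] = (uint16_t)v;
        }

        if (*end == ',') {
            s = end + 1;
            if (*s == '\0') {
                return -1;      /* trailing comma */
            }
        } else if (*end == '\0') {
            s = end;
        } else {
            return -1;
        }
    }

    return (int)n;
}
```

For example, parsing "0-1,3" into a four-element array returns 3 and
stores 0, 1, 3, while "5-2" and "1," are rejected with -1.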
 include/qom/object.h | 28 ++++++++++++++++++++++++++++
 qom/object.c         | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/include/qom/object.h b/include/qom/object.h
index a641dcd..b882ccc 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -917,6 +917,34 @@ int64_t object_property_get_int(Object *obj, const char *name,
                                 Error **errp);
 
 /**
+ * object_property_get_enum:
+ * @obj: the object
+ * @name: the name of the property
+ * @strings: strings corresponding to enums
+ * @errp: returns an error if this function fails
+ *
+ * Returns: the value of the property, converted to an integer, or
+ * undefined if an error occurs (including when the property value is not
+ * an enum).
+ */
+int object_property_get_enum(Object *obj, const char *name,
+                             const char *strings[], Error **errp);
+
+/**
+ * object_property_get_uint16List:
+ * @obj: the object
+ * @name: the name of the property
+ * @list: the returned int list
+ * @errp: returns an error if this function fails
+ *
+ * Returns: the value of the property, converted to integers, or
+ * undefined if an error occurs (including when the property value is not
+ * a list of integers).
+ */
+void object_property_get_uint16List(Object *obj, const char *name,
+                                    uint16List **list, Error **errp);
+
+/**
  * object_property_set:
  * @obj: the object
  * @v: the visitor that will be used to write the property value.  This should
diff --git a/qom/object.c b/qom/object.c
index e42b254..3876618 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -13,6 +13,7 @@
 #include "qom/object.h"
 #include "qemu-common.h"
 #include "qapi/visitor.h"
+#include "qapi-visit.h"
 #include "qapi/string-input-visitor.h"
 #include "qapi/string-output-visitor.h"
 #include "qapi/qmp/qerror.h"
@@ -938,6 +939,40 @@ int64_t object_property_get_int(Object *obj, const char *name,
     return retval;
 }
 
+int object_property_get_enum(Object *obj, const char *name,
+                             const char *strings[], Error **errp)
+{
+    StringOutputVisitor *sov;
+    StringInputVisitor *siv;
+    int ret;
+
+    sov = string_output_visitor_new(false);
+    object_property_get(obj, string_output_get_visitor(sov), name, errp);
+    siv = string_input_visitor_new(string_output_get_string(sov));
+    string_output_visitor_cleanup(sov);
+    visit_type_enum(string_input_get_visitor(siv),
+                    &ret, strings, NULL, name, errp);
+    string_input_visitor_cleanup(siv);
+
+    return ret;
+}
+
+void object_property_get_uint16List(Object *obj, const char *name,
+                                    uint16List **list, Error **errp)
+{
+    StringOutputVisitor *ov;
+    StringInputVisitor *iv;
+
+    ov = string_output_visitor_new(false);
+    object_property_get(obj, string_output_get_visitor(ov),
+                        name, errp);
+    iv = string_input_visitor_new(string_output_get_string(ov));
+    visit_type_uint16List(string_input_get_visitor(iv),
+                          list, NULL, errp);
+    string_output_visitor_cleanup(ov);
+    string_input_visitor_cleanup(iv);
+}
+
 void object_property_parse(Object *obj, const char *string,
                            const char *name, Error **errp)
 {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (26 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 27/29] qom: introduce object_property_get_enum and object_property_get_uint16List Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 12:36   ` Igor Mammedov
  2014-06-09 17:24   ` Eric Blake
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 29/29] hmp: add info memdev Hu Tao
  2014-06-09 10:30 ` [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Michael S. Tsirkin
  29 siblings, 2 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

Add the QMP command query-memdev to query information about
memory devices.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
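
Note for readers: qmp_query_memdev() below builds its result by
prepending each entry (m->next = list; list = m), so the returned list is
in reverse order of the NUMA node index. The test in patch 26 uses the
other common idiom, a tail pointer, which keeps insertion order. The
sketch below contrasts the two on a plain singly linked list; Node,
list_prepend, list_append and list_free are hypothetical names used only
for this illustration.

```c
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Prepend: O(1), but the list ends up in reverse insertion order. */
static Node *list_prepend(Node *head, int value)
{
    Node *n = calloc(1, sizeof(*n));

    if (!n) {
        abort();
    }
    n->value = value;
    n->next = head;
    return n;
}

/*
 * Append through a tail pointer: also O(1), and insertion order is kept.
 * tail points at the link to fill in; the new tail is returned.
 */
static Node **list_append(Node **tail, int value)
{
    Node *n = calloc(1, sizeof(*n));

    if (!n) {
        abort();
    }
    n->value = value;
    *tail = n;
    return &n->next;
}

static void list_free(Node *head)
{
    while (head) {
        Node *next = head->next;

        free(head);
        head = next;
    }
}
```

Inserting 0, 1, 2 with list_prepend gives 2, 1, 0; starting from
Node *head = NULL, **tail = &head and calling tail = list_append(tail, i)
gives 0, 1, 2. Whether callers of query-memdev care about the order is
not stated in this thread.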
 numa.c           | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 34 ++++++++++++++++++++++++++
 qmp-commands.hx  | 32 +++++++++++++++++++++++++
 3 files changed, 138 insertions(+)

diff --git a/numa.c b/numa.c
index 1a83733..4e2fdc4 100644
--- a/numa.c
+++ b/numa.c
@@ -31,9 +31,14 @@
 #include "qapi-visit.h"
 #include "qapi/opts-visitor.h"
 #include "qapi/dealloc-visitor.h"
+#include "qapi/qmp-output-visitor.h"
+#include "qapi/qmp-input-visitor.h"
+#include "qapi/string-output-visitor.h"
+#include "qapi/string-input-visitor.h"
 #include "qapi/qmp/qerror.h"
 #include "hw/boards.h"
 #include "sysemu/hostmem.h"
+#include "qmp-commands.h"
 
 QemuOptsList qemu_numa_opts = {
     .name = "numa",
@@ -280,3 +285,70 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
         addr += size;
     }
 }
+
+MemdevList *qmp_query_memdev(Error **errp)
+{
+    MemdevList *list = NULL, *m;
+    HostMemoryBackend *backend;
+    Error *err = NULL;
+    int i;
+
+    for (i = 0; i < nb_numa_nodes; i++) {
+        backend = numa_info[i].node_memdev;
+
+        m = g_malloc0(sizeof(*m));
+        m->value = g_malloc0(sizeof(*m->value));
+        m->value->size = object_property_get_int(OBJECT(backend), "size",
+                                                 &err);
+        if (err) {
+            goto error;
+        }
+
+        m->value->merge = object_property_get_bool(OBJECT(backend), "merge",
+                                                   &err);
+        if (err) {
+            goto error;
+        }
+
+        m->value->dump = object_property_get_bool(OBJECT(backend), "dump",
+                                                  &err);
+        if (err) {
+            goto error;
+        }
+
+        m->value->prealloc = object_property_get_bool(OBJECT(backend),
+                                                      "prealloc", &err);
+        if (err) {
+            goto error;
+        }
+
+        m->value->policy = object_property_get_enum(OBJECT(backend),
+                                                    "policy",
+                                                    HostMemPolicy_lookup,
+                                                    &err);
+        if (err) {
+            goto error;
+        }
+
+        object_property_get_uint16List(OBJECT(backend), "host-nodes",
+                                       &m->value->host_nodes, &err);
+        if (err) {
+            goto error;
+        }
+
+        m->next = list;
+        list = m;
+    }
+
+    return list;
+
+error:
+    while (list) {
+        m = list;
+        list = list->next;
+        g_free(m->value);
+        g_free(m);
+    }
+    qerror_report_err(err);
+    return NULL;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index 0898c00..f23c3f1 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4779,3 +4779,37 @@
 ##
 { 'enum': 'HostMemPolicy',
   'data': [ 'default', 'preferred', 'bind', 'interleave' ] }
+
+##
+# @Memdev:
+#
+# Information about a memory device
+#
+# @size: memory device size
+#
+# @host-nodes: host nodes for its memory policy
+#
+# @policy: memory policy of memory device
+#
+# Since: 2.1
+##
+
+{ 'type': 'Memdev',
+  'data': {
+    'size':       'size',
+    'merge':      'bool',
+    'dump':       'bool',
+    'prealloc':   'bool',
+    'host-nodes': ['uint16'],
+    'policy':     'HostMemPolicy' }}
+
+##
+# @query-memdev:
+#
+# Returns information for all memory devices.
+#
+# Returns: a list of @Memdev.
+#
+# Since: 2.1
+##
+{ 'command': 'query-memdev', 'returns': ['Memdev'] }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index d8aa4ed..ea8958f 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3572,3 +3572,35 @@ Example:
                    } } ] }
 
 EQMP
+
+    {
+        .name       = "query-memdev",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_memdev,
+    },
+
+SQMP
+query-memdev
+------------
+
+Show information about memory devices.
+
+
+Example (1):
+
+-> { "execute": "query-memdev" }
+<- { "return": [
+       {
+         "size": 536870912,
+         "host-nodes": [0, 1],
+         "policy": "bind"
+       },
+       {
+         "size": 536870912,
+         "host-nodes": [2, 3],
+         "policy": "preferred"
+       }
+     ]
+   }
+
+EQMP
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH v4 29/29] hmp: add info memdev
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (27 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev Hu Tao
@ 2014-06-09 10:25 ` Hu Tao
  2014-06-09 10:30 ` [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Michael S. Tsirkin
  29 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-09 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin,
	Eduardo Habkost, Igor Mammedov

This is the hmp counterpart of qmp query-memdev.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 hmp.c     | 36 ++++++++++++++++++++++++++++++++++++
 hmp.h     |  1 +
 monitor.c |  7 +++++++
 3 files changed, 44 insertions(+)

diff --git a/hmp.c b/hmp.c
index ccc35d4..4e80e24 100644
--- a/hmp.c
+++ b/hmp.c
@@ -22,6 +22,8 @@
 #include "qemu/sockets.h"
 #include "monitor/monitor.h"
 #include "qapi/opts-visitor.h"
+#include "qapi/string-output-visitor.h"
+#include "qapi-visit.h"
 #include "ui/console.h"
 #include "block/qapi.h"
 #include "qemu-io.h"
@@ -1676,3 +1678,37 @@ void hmp_object_del(Monitor *mon, const QDict *qdict)
     qmp_object_del(id, &err);
     hmp_handle_error(mon, &err);
 }
+
+void hmp_info_memdev(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    MemdevList *memdev_list = qmp_query_memdev(&err);
+    MemdevList *m = memdev_list;
+    StringOutputVisitor *ov;
+    int i = 0;
+
+
+    while (m) {
+        ov = string_output_visitor_new(false);
+        visit_type_uint16List(string_output_get_visitor(ov),
+                              &m->value->host_nodes, NULL, NULL);
+        monitor_printf(mon, "memory device %d\n", i);
+        monitor_printf(mon, "  size: %" PRId64 "\n", m->value->size);
+        monitor_printf(mon, "  merge: %s\n",
+                       m->value->merge ? "true" : "false");
+        monitor_printf(mon, "  dump: %s\n",
+                       m->value->dump ? "true" : "false");
+        monitor_printf(mon, "  prealloc: %s\n",
+                       m->value->prealloc ? "true" : "false");
+        monitor_printf(mon, "  policy: %s\n",
+                       HostMemPolicy_lookup[m->value->policy]);
+        monitor_printf(mon, "  host nodes: %s\n",
+                       string_output_get_string(ov));
+
+        string_output_visitor_cleanup(ov);
+        m = m->next;
+        i++;
+    }
+
+    monitor_printf(mon, "\n");
+}
diff --git a/hmp.h b/hmp.h
index aba59e9..17a91d3 100644
--- a/hmp.h
+++ b/hmp.h
@@ -103,5 +103,6 @@ void chardev_add_completion(ReadLineState *rs, int nb_args, const char *str);
 void set_link_completion(ReadLineState *rs, int nb_args, const char *str);
 void netdev_add_completion(ReadLineState *rs, int nb_args, const char *str);
 void netdev_del_completion(ReadLineState *rs, int nb_args, const char *str);
+void hmp_info_memdev(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/monitor.c b/monitor.c
index a2a4466..c270fd8 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2959,6 +2959,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.cmd = hmp_info_tpm,
     },
     {
+        .name       = "memdev",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show the memory device",
+        .mhandler.cmd = hmp_info_memdev,
+    },
+    {
         .name       = NULL,
     },
 };
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements
  2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
                   ` (28 preceding siblings ...)
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 29/29] hmp: add info memdev Hu Tao
@ 2014-06-09 10:30 ` Michael S. Tsirkin
  2014-06-09 11:40   ` Michael S. Tsirkin
  29 siblings, 1 reply; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-09 10:30 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Paolo Bonzini, Igor Mammedov, qemu-devel, Eduardo Habkost

On Mon, Jun 09, 2014 at 06:25:05PM +0800, Hu Tao wrote:
> note: this series is based on MST's pci tree.

No, please rebase on top of numa branch in my tree, not on
pci branch.
I applied a bunch of your patches there and don't want to
spend time going over them again.


> changes to v3.2:
> 
> - rebase to latest git tree since there are several conflicts
> - no error_is_set() now
> - no QEMUMachineInitArgs now
> - use mbind flag MPOL_MF_STRICT to catch memory allocations that don't
>   follow policy
> - some document & error message fix
> 
> 
> Hu Tao (8):
>   hostmem: separate allocation from UserCreatable complete method
>   hostmem: add properties for NUMA memory policy
>   Introduce signed range.
>   qapi: make string input visitor parse int list
>   qapi: make string output visitor parse int list
>   qom: introduce object_property_get_enum and
>     object_property_get_uint16List
>   qmp: add query-memdev
>   hmp: add info memdev
> 
> Luiz Capitulino (1):
>   man: improve -numa doc
> 
> Paolo Bonzini (14):
>   vl: redo -object parsing
>   qmp: improve error reporting for -object and object-add
>   pc: pass MachineState to pc_memory_init
>   numa: introduce memory_region_allocate_system_memory
>   numa: add -numa node,memdev= option
>   memory: reorganize file-based allocation
>   memory: move mem_path handling to memory_region_allocate_system_memory
>   memory: add error propagation to file-based RAM allocation
>   memory: move preallocation code out of exec.c
>   memory: move RAM_PREALLOC_MASK to exec.c, rename
>   hostmem: add file-based HostMemoryBackend
>   hostmem: add merge and dump properties
>   hostmem: allow preallocation of any memory region
>   hostmem: add property to map memory with MAP_SHARED
> 
> Wanlong Gao (6):
>   NUMA: move numa related code to new file numa.c
>   NUMA: check if the total numa memory size is equal to ram_size
>   NUMA: Add numa_info structure to contain numa nodes info
>   NUMA: convert -numa option to use OptsVisitor
>   NUMA: expand MAX_NODES from 64 to 128
>   configure: add Linux libnuma detection
> 
>  Makefile.target                    |   2 +-
>  backends/Makefile.objs             |   1 +
>  backends/hostmem-file.c            | 134 ++++++++++++++
>  backends/hostmem-ram.c             |   7 +-
>  backends/hostmem.c                 | 300 +++++++++++++++++++++++++++++--
>  configure                          |  32 ++++
>  cpus.c                             |  14 --
>  exec.c                             | 211 +++++++++++-----------
>  hmp.c                              |  36 ++++
>  hmp.h                              |   1 +
>  hw/i386/pc.c                       |  37 ++--
>  hw/i386/pc_piix.c                  |   8 +-
>  hw/i386/pc_q35.c                   |   4 +-
>  hw/ppc/spapr.c                     |  11 +-
>  include/exec/cpu-all.h             |   8 -
>  include/exec/cpu-common.h          |   2 +
>  include/exec/memory.h              |  33 ++++
>  include/exec/ram_addr.h            |   4 +
>  include/hw/boards.h                |   6 +-
>  include/hw/i386/pc.h               |   7 +-
>  include/qemu/osdep.h               |  12 ++
>  include/qemu/range.h               | 124 +++++++++++++
>  include/qom/object.h               |  28 +++
>  include/sysemu/cpus.h              |   1 -
>  include/sysemu/hostmem.h           |   8 +
>  include/sysemu/sysemu.h            |  18 +-
>  memory.c                           |  29 +++
>  monitor.c                          |   9 +-
>  numa.c                             | 354 +++++++++++++++++++++++++++++++++++++
>  qapi-schema.json                   |  91 ++++++++++
>  qapi/string-input-visitor.c        | 181 ++++++++++++++++++-
>  qapi/string-output-visitor.c       | 230 ++++++++++++++++++++++--
>  qemu-options.hx                    |  16 +-
>  qmp-commands.hx                    |  32 ++++
>  qmp.c                              |   3 +-
>  qom/object.c                       |  35 ++++
>  tests/test-string-input-visitor.c  |  39 ++++
>  tests/test-string-output-visitor.c |  34 ++++
>  util/oslib-posix.c                 |  73 ++++++++
>  vl.c                               | 216 +++++-----------------
>  40 files changed, 2014 insertions(+), 377 deletions(-)
>  create mode 100644 backends/hostmem-file.c
>  create mode 100644 numa.c
> 
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 24/29] Introduce signed range.
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 24/29] Introduce signed range Hu Tao
@ 2014-06-09 10:42   ` Peter Maydell
  2014-06-09 10:59     ` Michael S. Tsirkin
  0 siblings, 1 reply; 92+ messages in thread
From: Peter Maydell @ 2014-06-09 10:42 UTC (permalink / raw)
  To: Hu Tao
  Cc: Eduardo Habkost, Michael S. Tsirkin, QEMU Developers,
	Paolo Bonzini, Igor Mammedov, Yasunori Goto

On 9 June 2014 11:25, Hu Tao <hutao@cn.fujitsu.com> wrote:
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  include/qemu/range.h | 124 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 124 insertions(+)
>
> diff --git a/include/qemu/range.h b/include/qemu/range.h
> index aae9720..8879f8a 100644
> --- a/include/qemu/range.h
> +++ b/include/qemu/range.h
> @@ -3,6 +3,7 @@
>
>  #include <inttypes.h>
>  #include <qemu/typedefs.h>
> +#include "qemu/queue.h"
>
>  /*
>   * Operations on 64 bit address ranges.
> @@ -60,4 +61,127 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
>      return !(last2 < first1 || last1 < first2);
>  }
>
> +typedef struct SignedRangeList SignedRangeList;
> +
> +typedef struct SignedRange {
> +    int64_t start;
> +    int64_t length;
> +
> +    QTAILQ_ENTRY(SignedRange) entry;
> +} SignedRange;
> +
> +QTAILQ_HEAD(SignedRangeList, SignedRange);

This seems to be missing documentation about what the
semantics are and why we need it as well as the standard
Range. For instance, what does a SignedRange with a
negative length mean?

thanks
-- PMM
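
Note for readers: the quoted header does not answer this question. One
reading, inferred only from how patch 26 uses the type, is that a range
is stored as (start, length) with an inclusive last element of
start + length - 1. The callers there pass
range_end - range_start + 1 as the length, and the test expects the run
-2..1 to print as "-2-1". Under that reading a length of zero or less has
no defined meaning. The sketch below spells this out; signed_range_valid
and signed_range_last are hypothetical names, not functions from the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): a range is treated as
 * well-formed when it holds at least one element and its last element
 * still fits in int64_t.
 */
static bool signed_range_valid(int64_t start, int64_t length)
{
    if (length <= 0) {
        return false;
    }
    /* For start <= 0 the sum cannot overflow; otherwise check headroom. */
    return start <= 0 || length - 1 <= INT64_MAX - start;
}

/* Inclusive last element; only meaningful for a well-formed range. */
static int64_t signed_range_last(int64_t start, int64_t length)
{
    return start + (length - 1);
}
```

With these definitions, (-2, 4) is well-formed and ends at 1,
(INT64_MAX - 1, 2) ends at INT64_MAX, and (INT64_MAX, 2), (0, 0) and
(5, -1) are rejected.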

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 11/29] hostmem: separate allocation from UserCreatable complete method
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 11/29] hostmem: separate allocation from UserCreatable complete method Hu Tao
@ 2014-06-09 10:47   ` Igor Mammedov
  2014-06-10  1:55     ` Hu Tao
  0 siblings, 1 reply; 92+ messages in thread
From: Igor Mammedov @ 2014-06-09 10:47 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Mon, 9 Jun 2014 18:25:16 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> This allows the superclass to set various policies on the memory
> region that the subclass creates. Drops hostmem-ram's complete method
> accordingly.
> 
> While at file hostmem.c, s/hostmemory/host_memory/ to keep names
> consistent.
Nitpick: it would be better to split the s/hostmemory/host_memory/ rename
into a separate patch.

> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  backends/hostmem-ram.c   |  7 +++----
>  backends/hostmem.c       | 40 ++++++++++++++++++++++++++++++----------
>  include/sysemu/hostmem.h |  2 ++
>  3 files changed, 35 insertions(+), 14 deletions(-)
> 
> diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
> index bba2ebc..d9a8290 100644
> --- a/backends/hostmem-ram.c
> +++ b/backends/hostmem-ram.c
> @@ -16,9 +16,8 @@
>  
>  
>  static void
> -ram_backend_memory_init(UserCreatable *uc, Error **errp)
> +ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>  {
> -    HostMemoryBackend *backend = MEMORY_BACKEND(uc);
>      char *path;
>  
>      if (!backend->size) {
> @@ -35,9 +34,9 @@ ram_backend_memory_init(UserCreatable *uc, Error **errp)
>  static void
>  ram_backend_class_init(ObjectClass *oc, void *data)
>  {
> -    UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
> +    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
>  
> -    ucc->complete = ram_backend_memory_init;
> +    bc->alloc = ram_backend_memory_alloc;
>  }
>  
>  static const TypeInfo ram_backend_info = {
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 2f578ac..cc57c13 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -17,7 +17,7 @@
>  #include "qom/object_interfaces.h"
>  
>  static void
> -hostmemory_backend_get_size(Object *obj, Visitor *v, void *opaque,
> +host_memory_backend_get_size(Object *obj, Visitor *v, void *opaque,
>                              const char *name, Error **errp)
>  {
>      HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> @@ -27,7 +27,7 @@ hostmemory_backend_get_size(Object *obj, Visitor *v, void *opaque,
>  }
>  
>  static void
> -hostmemory_backend_set_size(Object *obj, Visitor *v, void *opaque,
> +host_memory_backend_set_size(Object *obj, Visitor *v, void *opaque,
>                              const char *name, Error **errp)
>  {
>      HostMemoryBackend *backend = MEMORY_BACKEND(obj);
> @@ -53,14 +53,14 @@ out:
>      error_propagate(errp, local_err);
>  }
>  
> -static void hostmemory_backend_init(Object *obj)
> +static void host_memory_backend_init(Object *obj)
>  {
>      object_property_add(obj, "size", "int",
> -                        hostmemory_backend_get_size,
> -                        hostmemory_backend_set_size, NULL, NULL, NULL);
> +                        host_memory_backend_get_size,
> +                        host_memory_backend_set_size, NULL, NULL, NULL);
>  }
>  
> -static void hostmemory_backend_finalize(Object *obj)
> +static void host_memory_backend_finalize(Object *obj)
>  {
>      HostMemoryBackend *backend = MEMORY_BACKEND(obj);
>  
> @@ -75,14 +75,34 @@ host_memory_backend_get_memory(HostMemoryBackend *backend, Error **errp)
>      return memory_region_size(&backend->mr) ? &backend->mr : NULL;
>  }
>  
> -static const TypeInfo hostmemory_backend_info = {
> +static void
> +host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(uc);
> +    HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
> +
> +    if (bc->alloc) {
> +        bc->alloc(backend, errp);
> +    }
> +}
> +
> +static void
> +host_memory_backend_class_init(ObjectClass *oc, void *data)
> +{
> +    UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
> +
> +    ucc->complete = host_memory_backend_memory_complete;
> +}
> +
> +static const TypeInfo host_memory_backend_info = {
>      .name = TYPE_MEMORY_BACKEND,
>      .parent = TYPE_OBJECT,
>      .abstract = true,
>      .class_size = sizeof(HostMemoryBackendClass),
> +    .class_init = host_memory_backend_class_init,
>      .instance_size = sizeof(HostMemoryBackend),
> -    .instance_init = hostmemory_backend_init,
> -    .instance_finalize = hostmemory_backend_finalize,
> +    .instance_init = host_memory_backend_init,
> +    .instance_finalize = host_memory_backend_finalize,
>      .interfaces = (InterfaceInfo[]) {
>          { TYPE_USER_CREATABLE },
>          { }
> @@ -91,7 +111,7 @@ static const TypeInfo hostmemory_backend_info = {
>  
>  static void register_types(void)
>  {
> -    type_register_static(&hostmemory_backend_info);
> +    type_register_static(&host_memory_backend_info);
>  }
>  
>  type_init(register_types);
> diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
> index 4fc081e..923f672 100644
> --- a/include/sysemu/hostmem.h
> +++ b/include/sysemu/hostmem.h
> @@ -34,6 +34,8 @@ typedef struct HostMemoryBackendClass HostMemoryBackendClass;
>   */
>  struct HostMemoryBackendClass {
>      ObjectClass parent_class;
> +
> +    void (*alloc)(HostMemoryBackend *backend, Error **errp);
>  };
>  
>  /**
> -- 
> 1.9.3
> 


-- 
Regards,
  Igor
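
The refactoring in this patch moves the UserCreatable complete callback into the abstract base type and lets each subclass supply only an optional alloc hook. Below is a minimal sketch of that dispatch pattern in plain C. The Backend and BackendClass types and the int error flag are hypothetical stand-ins chosen for illustration; they are not QEMU's HostMemoryBackend, HostMemoryBackendClass or Error types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the backend object and its class; not QEMU types. */
typedef struct Backend Backend;

typedef struct BackendClass {
    /* Optional hook: a subclass sets this to perform the actual allocation. */
    void (*alloc)(Backend *backend, int *err);
} BackendClass;

struct Backend {
    const BackendClass *klass;
    size_t size;
    int allocated;
};

/* Subclass hook, mirroring the zero-size check in ram_backend_memory_alloc(). */
static void ram_alloc(Backend *backend, int *err)
{
    if (!backend->size) {
        *err = 1;   /* stands in for setting an Error */
        return;
    }
    backend->allocated = 1;
}

/* Common completion step: call the hook only if the subclass provided one. */
static void backend_complete(Backend *backend, int *err)
{
    if (backend->klass->alloc) {
        backend->klass->alloc(backend, err);
    }
}

/* One class with a hook, one without (like an abstract base). */
static const BackendClass ram_class = { .alloc = ram_alloc };
static const BackendClass abstract_class = { .alloc = NULL };
```

With this shape, common work can later be added to backend_complete() once, after the hook returns, without touching each subclass.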


* Re: [Qemu-devel] [PATCH v4 24/29] Introduce signed range.
  2014-06-09 10:42   ` Peter Maydell
@ 2014-06-09 10:59     ` Michael S. Tsirkin
  2014-06-10  6:51       ` Hu Tao
  0 siblings, 1 reply; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-09 10:59 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Eduardo Habkost, Yasunori Goto, Hu Tao, QEMU Developers,
	Paolo Bonzini, Igor Mammedov

On Mon, Jun 09, 2014 at 11:42:14AM +0100, Peter Maydell wrote:
> On 9 June 2014 11:25, Hu Tao <hutao@cn.fujitsu.com> wrote:
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > ---
> >  include/qemu/range.h | 124 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 124 insertions(+)
> >
> > diff --git a/include/qemu/range.h b/include/qemu/range.h
> > index aae9720..8879f8a 100644
> > --- a/include/qemu/range.h
> > +++ b/include/qemu/range.h
> > @@ -3,6 +3,7 @@
> >
> >  #include <inttypes.h>
> >  #include <qemu/typedefs.h>
> > +#include "qemu/queue.h"
> >
> >  /*
> >   * Operations on 64 bit address ranges.
> > @@ -60,4 +61,127 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
> >      return !(last2 < first1 || last1 < first2);
> >  }
> >
> > +typedef struct SignedRangeList SignedRangeList;
> > +
> > +typedef struct SignedRange {
> > +    int64_t start;
> > +    int64_t length;
> > +
> > +    QTAILQ_ENTRY(SignedRange) entry;
> > +} SignedRange;
> > +
> > +QTAILQ_HEAD(SignedRangeList, SignedRange);
> 
> This seems to be missing documentation about what the
> semantics are and why we need it as well as the standard
> Range. For instance, what does a SignedRange with a
> negative length mean?
> 
> thanks
> -- PMM


Yes, I also don't care for list macros being mixed in with the structure.

Also, NUMA surely uses positive numbers? Why do you want
signed values?
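
A minimal sketch of one way the semantics asked about in this review could be pinned down, assuming a hypothetical SRange type (not the SignedRange from the patch, which also carries a QTAILQ entry). In this sketch a non-positive length is simply rejected and the inclusive last element must fit in int64_t; the overlap test mirrors ranges_overlap() in range.h. Whether the series intends these semantics is not stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type for illustration only. */
typedef struct {
    int64_t start;
    int64_t length;
} SRange;

/* Well-formed only if length is positive and start + length - 1 fits in int64_t. */
static bool srange_valid(const SRange *r)
{
    if (r->length <= 0) {
        return false;
    }
    /* Compare against the limit instead of performing an overflowing addition. */
    return r->start <= INT64_MAX - (r->length - 1);
}

/* Inclusive last element; only meaningful when srange_valid(r) holds. */
static int64_t srange_last(const SRange *r)
{
    return r->start + (r->length - 1);
}

/* Same shape as ranges_overlap(): two ranges overlap unless one ends before the other starts. */
static bool srange_overlap(const SRange *a, const SRange *b)
{
    return !(srange_last(b) < a->start || srange_last(a) < b->start);
}
```

Rejecting negative lengths up front keeps the meaning of "start" unambiguous, at the cost of the caller having to normalise its input first.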

-- 
MST


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend Hu Tao
@ 2014-06-09 11:32   ` Igor Mammedov
  2014-06-09 11:35     ` Michael S. Tsirkin
  2014-06-10  2:00     ` Hu Tao
  0 siblings, 2 replies; 92+ messages in thread
From: Igor Mammedov @ 2014-06-09 11:32 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Mon, 9 Jun 2014 18:25:23 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  backends/Makefile.objs  |   1 +
>  backends/hostmem-file.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 108 insertions(+)
>  create mode 100644 backends/hostmem-file.c
> 
> diff --git a/backends/Makefile.objs b/backends/Makefile.objs
> index 7fb7acd..506a46c 100644
> --- a/backends/Makefile.objs
> +++ b/backends/Makefile.objs
> @@ -8,3 +8,4 @@ baum.o-cflags := $(SDL_CFLAGS)
>  common-obj-$(CONFIG_TPM) += tpm.o
>  
>  common-obj-y += hostmem.o hostmem-ram.o
> +common-obj-$(CONFIG_LINUX) += hostmem-file.o
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> new file mode 100644
> index 0000000..b8df933
> --- /dev/null
> +++ b/backends/hostmem-file.c
> @@ -0,0 +1,107 @@
> +/*
> + * QEMU Host Memory Backend for hugetlbfs
> + *
> + * Copyright (C) 2013 Red Hat Inc
> + *
> + * Authors:
> + *   Paolo Bonzini <pbonzini@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "sysemu/hostmem.h"
> +#include "qom/object_interfaces.h"
> +
> +/* hostmem-file.c */
> +/**
> + * @TYPE_MEMORY_BACKEND_FILE:
> + * name of backend that uses mmap on a file descriptor
> + */
> +#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
How about naming it after what it really is, "memory-backend-hugepage"?
Later we could split it into a generic mmap-ed "memory-backend-file" superclass
and have the THP-specific code moved into this backend.

> +
> +#define MEMORY_BACKEND_FILE(obj) \
> +    OBJECT_CHECK(HostMemoryBackendFile, (obj), TYPE_MEMORY_BACKEND_FILE)
> +
> +typedef struct HostMemoryBackendFile HostMemoryBackendFile;
> +
> +struct HostMemoryBackendFile {
> +    HostMemoryBackend parent_obj;
> +    char *mem_path;
> +};
> +
> +static void
> +file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> +{
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
> +
> +    if (!backend->size) {
> +        error_setg(errp, "can't create backend with size 0");
> +        return;
> +    }
> +    if (!fb->mem_path) {
> +        error_setg(errp, "mem-path property not set");
> +        return;
> +    }
> +#ifndef CONFIG_LINUX
> +    error_setg(errp, "-mem-path not supported on this host");
Is it possible to not compile this backend on non-Linux hosts at all, instead
of using ifdefs?

> +#else
> +    if (!memory_region_size(&backend->mr)) {
> +        memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
> +                                 object_get_canonical_path(OBJECT(backend)),
> +                                 backend->size,
> +                                 fb->mem_path, errp);
> +    }
> +#endif
> +}
> +
> +static void
> +file_backend_class_init(ObjectClass *oc, void *data)
> +{
> +    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
> +
> +    bc->alloc = file_backend_memory_alloc;
> +}
> +
> +static char *get_mem_path(Object *o, Error **errp)
> +{
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> +
> +    return g_strdup(fb->mem_path);
> +}
> +
> +static void set_mem_path(Object *o, const char *str, Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> +
> +    if (memory_region_size(&backend->mr)) {
> +        error_setg(errp, "cannot change property value");
> +        return;
> +    }
> +    if (fb->mem_path) {
> +        g_free(fb->mem_path);
> +    }
> +    fb->mem_path = g_strdup(str);
> +}
> +
> +static void
> +file_backend_instance_init(Object *o)
> +{
> +    object_property_add_str(o, "mem-path", get_mem_path,
> +                            set_mem_path, NULL);
s/"mem-path"/"path"/


> +}
> +
> +static const TypeInfo file_backend_info = {
> +    .name = TYPE_MEMORY_BACKEND_FILE,
> +    .parent = TYPE_MEMORY_BACKEND,
> +    .class_init = file_backend_class_init,
> +    .instance_init = file_backend_instance_init,
> +    .instance_size = sizeof(HostMemoryBackendFile),
> +};
> +
> +static void register_types(void)
> +{
> +    type_register_static(&file_backend_info);
> +}
> +
> +type_init(register_types);
> -- 
> 1.9.3
> 


-- 
Regards,
  Igor


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-09 11:32   ` Igor Mammedov
@ 2014-06-09 11:35     ` Michael S. Tsirkin
  2014-06-09 12:06       ` Igor Mammedov
  2014-06-10  2:00     ` Hu Tao
  1 sibling, 1 reply; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-09 11:35 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Paolo Bonzini

On Mon, Jun 09, 2014 at 01:32:46PM +0200, Igor Mammedov wrote:
> On Mon, 9 Jun 2014 18:25:23 +0800
> Hu Tao <hutao@cn.fujitsu.com> wrote:
> 
> > From: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > ---
> >  backends/Makefile.objs  |   1 +
> >  backends/hostmem-file.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 108 insertions(+)
> >  create mode 100644 backends/hostmem-file.c
> > 
> > diff --git a/backends/Makefile.objs b/backends/Makefile.objs
> > index 7fb7acd..506a46c 100644
> > --- a/backends/Makefile.objs
> > +++ b/backends/Makefile.objs
> > @@ -8,3 +8,4 @@ baum.o-cflags := $(SDL_CFLAGS)
> >  common-obj-$(CONFIG_TPM) += tpm.o
> >  
> >  common-obj-y += hostmem.o hostmem-ram.o
> > +common-obj-$(CONFIG_LINUX) += hostmem-file.o
> > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > new file mode 100644
> > index 0000000..b8df933
> > --- /dev/null
> > +++ b/backends/hostmem-file.c
> > @@ -0,0 +1,107 @@
> > +/*
> > + * QEMU Host Memory Backend for hugetlbfs
> > + *
> > + * Copyright (C) 2013 Red Hat Inc
> > + *
> > + * Authors:
> > + *   Paolo Bonzini <pbonzini@redhat.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +#include "sysemu/hostmem.h"
> > +#include "qom/object_interfaces.h"
> > +
> > +/* hostmem-file.c */
> > +/**
> > + * @TYPE_MEMORY_BACKEND_FILE:
> > + * name of backend that uses mmap on a file descriptor
> > + */
> > +#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
> How about naming it after what it really is, "memory-backend-hugepage"?
> Later we could split it into a generic mmap-ed "memory-backend-file" superclass
> and have the THP-specific code moved into this backend.

What does this last sentence mean?

THP is transparent huge pages, right?



> > +
> > +#define MEMORY_BACKEND_FILE(obj) \
> > +    OBJECT_CHECK(HostMemoryBackendFile, (obj), TYPE_MEMORY_BACKEND_FILE)
> > +
> > +typedef struct HostMemoryBackendFile HostMemoryBackendFile;
> > +
> > +struct HostMemoryBackendFile {
> > +    HostMemoryBackend parent_obj;
> > +    char *mem_path;
> > +};
> > +
> > +static void
> > +file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> > +{
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
> > +
> > +    if (!backend->size) {
> > +        error_setg(errp, "can't create backend with size 0");
> > +        return;
> > +    }
> > +    if (!fb->mem_path) {
> > +        error_setg(errp, "mem-path property not set");
> > +        return;
> > +    }
> > +#ifndef CONFIG_LINUX
> > +    error_setg(errp, "-mem-path not supported on this host");
> Is it possible to not compile this backend on non-Linux hosts at all, instead
> of using ifdefs?
> 
> > +#else
> > +    if (!memory_region_size(&backend->mr)) {
> > +        memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
> > +                                 object_get_canonical_path(OBJECT(backend)),
> > +                                 backend->size,
> > +                                 fb->mem_path, errp);
> > +    }
> > +#endif
> > +}
> > +
> > +static void
> > +file_backend_class_init(ObjectClass *oc, void *data)
> > +{
> > +    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
> > +
> > +    bc->alloc = file_backend_memory_alloc;
> > +}
> > +
> > +static char *get_mem_path(Object *o, Error **errp)
> > +{
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> > +
> > +    return g_strdup(fb->mem_path);
> > +}
> > +
> > +static void set_mem_path(Object *o, const char *str, Error **errp)
> > +{
> > +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> > +
> > +    if (memory_region_size(&backend->mr)) {
> > +        error_setg(errp, "cannot change property value");
> > +        return;
> > +    }
> > +    if (fb->mem_path) {
> > +        g_free(fb->mem_path);
> > +    }
> > +    fb->mem_path = g_strdup(str);
> > +}
> > +
> > +static void
> > +file_backend_instance_init(Object *o)
> > +{
> > +    object_property_add_str(o, "mem-path", get_mem_path,
> > +                            set_mem_path, NULL);
> s/"mem-path"/"path"/
> 
> 
> > +}
> > +
> > +static const TypeInfo file_backend_info = {
> > +    .name = TYPE_MEMORY_BACKEND_FILE,
> > +    .parent = TYPE_MEMORY_BACKEND,
> > +    .class_init = file_backend_class_init,
> > +    .instance_init = file_backend_instance_init,
> > +    .instance_size = sizeof(HostMemoryBackendFile),
> > +};
> > +
> > +static void register_types(void)
> > +{
> > +    type_register_static(&file_backend_info);
> > +}
> > +
> > +type_init(register_types);
> > -- 
> > 1.9.3
> > 
> 
> 
> -- 
> Regards,
>   Igor


* Re: [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements
  2014-06-09 10:30 ` [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Michael S. Tsirkin
@ 2014-06-09 11:40   ` Michael S. Tsirkin
  2014-06-10  1:55     ` Hu Tao
  2014-06-10  9:51     ` Hu Tao
  0 siblings, 2 replies; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-09 11:40 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Paolo Bonzini, Igor Mammedov, qemu-devel, Eduardo Habkost

On Mon, Jun 09, 2014 at 01:30:05PM +0300, Michael S. Tsirkin wrote:
> On Mon, Jun 09, 2014 at 06:25:05PM +0800, Hu Tao wrote:
> > note: this series is based on MST's pci tree.
> 
> No, please rebase on top of numa branch in my tree, not on
> pci branch.
> I applied a bunch of your patches there and don't want
> to spend time going over them again.
> 

If you want me to drop some patches, please mention this
in the cover letter. But you don't need to keep
reposting everything.


> > changes to v3.2:
> > 
> > - rebase to latest git tree since there are several conflicts
> > - no error_is_set() now
> > - no QEMUMachineInitArgs now
> > - use mbind flag MPOL_MF_STRICT to catch memory allocations that don't
> >   follow policy
> > - some document & error message fix
> > 
> > 
> > Hu Tao (8):
> >   hostmem: separate allocation from UserCreatable complete method
> >   hostmem: add properties for NUMA memory policy
> >   Introduce signed range.
> >   qapi: make string input visitor parse int list
> >   qapi: make string output visitor parse int list
> >   qom: introduce object_property_get_enum and
> >     object_property_get_uint16List
> >   qmp: add query-memdev
> >   hmp: add info memdev
> > 
> > Luiz Capitulino (1):
> >   man: improve -numa doc
> > 
> > Paolo Bonzini (14):
> >   vl: redo -object parsing
> >   qmp: improve error reporting for -object and object-add
> >   pc: pass MachineState to pc_memory_init
> >   numa: introduce memory_region_allocate_system_memory
> >   numa: add -numa node,memdev= option
> >   memory: reorganize file-based allocation
> >   memory: move mem_path handling to memory_region_allocate_system_memory
> >   memory: add error propagation to file-based RAM allocation
> >   memory: move preallocation code out of exec.c
> >   memory: move RAM_PREALLOC_MASK to exec.c, rename
> >   hostmem: add file-based HostMemoryBackend
> >   hostmem: add merge and dump properties
> >   hostmem: allow preallocation of any memory region
> >   hostmem: add property to map memory with MAP_SHARED
> > 
> > Wanlong Gao (6):
> >   NUMA: move numa related code to new file numa.c
> >   NUMA: check if the total numa memory size is equal to ram_size
> >   NUMA: Add numa_info structure to contain numa nodes info
> >   NUMA: convert -numa option to use OptsVisitor
> >   NUMA: expand MAX_NODES from 64 to 128
> >   configure: add Linux libnuma detection
> > 
> >  Makefile.target                    |   2 +-
> >  backends/Makefile.objs             |   1 +
> >  backends/hostmem-file.c            | 134 ++++++++++++++
> >  backends/hostmem-ram.c             |   7 +-
> >  backends/hostmem.c                 | 300 +++++++++++++++++++++++++++++--
> >  configure                          |  32 ++++
> >  cpus.c                             |  14 --
> >  exec.c                             | 211 +++++++++++-----------
> >  hmp.c                              |  36 ++++
> >  hmp.h                              |   1 +
> >  hw/i386/pc.c                       |  37 ++--
> >  hw/i386/pc_piix.c                  |   8 +-
> >  hw/i386/pc_q35.c                   |   4 +-
> >  hw/ppc/spapr.c                     |  11 +-
> >  include/exec/cpu-all.h             |   8 -
> >  include/exec/cpu-common.h          |   2 +
> >  include/exec/memory.h              |  33 ++++
> >  include/exec/ram_addr.h            |   4 +
> >  include/hw/boards.h                |   6 +-
> >  include/hw/i386/pc.h               |   7 +-
> >  include/qemu/osdep.h               |  12 ++
> >  include/qemu/range.h               | 124 +++++++++++++
> >  include/qom/object.h               |  28 +++
> >  include/sysemu/cpus.h              |   1 -
> >  include/sysemu/hostmem.h           |   8 +
> >  include/sysemu/sysemu.h            |  18 +-
> >  memory.c                           |  29 +++
> >  monitor.c                          |   9 +-
> >  numa.c                             | 354 +++++++++++++++++++++++++++++++++++++
> >  qapi-schema.json                   |  91 ++++++++++
> >  qapi/string-input-visitor.c        | 181 ++++++++++++++++++-
> >  qapi/string-output-visitor.c       | 230 ++++++++++++++++++++++--
> >  qemu-options.hx                    |  16 +-
> >  qmp-commands.hx                    |  32 ++++
> >  qmp.c                              |   3 +-
> >  qom/object.c                       |  35 ++++
> >  tests/test-string-input-visitor.c  |  39 ++++
> >  tests/test-string-output-visitor.c |  34 ++++
> >  util/oslib-posix.c                 |  73 ++++++++
> >  vl.c                               | 216 +++++-----------------
> >  40 files changed, 2014 insertions(+), 377 deletions(-)
> >  create mode 100644 backends/hostmem-file.c
> >  create mode 100644 numa.c
> > 
> > -- 
> > 1.9.3


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-09 11:35     ` Michael S. Tsirkin
@ 2014-06-09 12:06       ` Igor Mammedov
  0 siblings, 0 replies; 92+ messages in thread
From: Igor Mammedov @ 2014-06-09 12:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Paolo Bonzini, qemu-devel, Eduardo Habkost, Hu Tao

On Mon, 9 Jun 2014 14:35:53 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Jun 09, 2014 at 01:32:46PM +0200, Igor Mammedov wrote:
> > On Mon, 9 Jun 2014 18:25:23 +0800
> > Hu Tao <hutao@cn.fujitsu.com> wrote:
> > 
> > > From: Paolo Bonzini <pbonzini@redhat.com>
> > > 
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > > ---
> > >  backends/Makefile.objs  |   1 +
> > >  backends/hostmem-file.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 108 insertions(+)
> > >  create mode 100644 backends/hostmem-file.c
> > > 
> > > diff --git a/backends/Makefile.objs b/backends/Makefile.objs
> > > index 7fb7acd..506a46c 100644
> > > --- a/backends/Makefile.objs
> > > +++ b/backends/Makefile.objs
> > > @@ -8,3 +8,4 @@ baum.o-cflags := $(SDL_CFLAGS)
> > >  common-obj-$(CONFIG_TPM) += tpm.o
> > >  
> > >  common-obj-y += hostmem.o hostmem-ram.o
> > > +common-obj-$(CONFIG_LINUX) += hostmem-file.o
> > > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > > new file mode 100644
> > > index 0000000..b8df933
> > > --- /dev/null
> > > +++ b/backends/hostmem-file.c
> > > @@ -0,0 +1,107 @@
> > > +/*
> > > + * QEMU Host Memory Backend for hugetlbfs
> > > + *
> > > + * Copyright (C) 2013 Red Hat Inc
> > > + *
> > > + * Authors:
> > > + *   Paolo Bonzini <pbonzini@redhat.com>
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > + * See the COPYING file in the top-level directory.
> > > + */
> > > +#include "sysemu/hostmem.h"
> > > +#include "qom/object_interfaces.h"
> > > +
> > > +/* hostmem-file.c */
> > > +/**
> > > + * @TYPE_MEMORY_BACKEND_FILE:
> > > + * name of backend that uses mmap on a file descriptor
> > > + */
> > > +#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
> > How about naming it after what it really is, "memory-backend-hugepage"?
> > Later we could split it into a generic mmap-ed "memory-backend-file" superclass
> > and have the THP-specific code moved into this backend.
> 
> What does this last sentence mean?
1. Currently file_ram_alloc() uses THP-specific code. I suggest keeping the name
   "memory-backend-file" free for now, so that if a generic file backend is
   ever needed, we can introduce it without causing confusion with the
   THP backend.
2. There is not much point in building the THP backend for every host. We can
   safely exclude it from non-Linux builds instead of building it and having
   it fail at runtime.

> 
> THP is transparent huge pages, right?
Yes.

> 
> 
> 
> > > +
> > > +#define MEMORY_BACKEND_FILE(obj) \
> > > +    OBJECT_CHECK(HostMemoryBackendFile, (obj), TYPE_MEMORY_BACKEND_FILE)
> > > +
> > > +typedef struct HostMemoryBackendFile HostMemoryBackendFile;
> > > +
> > > +struct HostMemoryBackendFile {
> > > +    HostMemoryBackend parent_obj;
> > > +    char *mem_path;
> > > +};
> > > +
> > > +static void
> > > +file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> > > +{
> > > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
> > > +
> > > +    if (!backend->size) {
> > > +        error_setg(errp, "can't create backend with size 0");
> > > +        return;
> > > +    }
> > > +    if (!fb->mem_path) {
> > > +        error_setg(errp, "mem-path property not set");
> > > +        return;
> > > +    }
> > > +#ifndef CONFIG_LINUX
> > > +    error_setg(errp, "-mem-path not supported on this host");
> > Is it possible to not compile this backend on non-Linux hosts at all, instead
> > of using ifdefs?
> > 
> > > +#else
> > > +    if (!memory_region_size(&backend->mr)) {
> > > +        memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
> > > +                                 object_get_canonical_path(OBJECT(backend)),
> > > +                                 backend->size,
> > > +                                 fb->mem_path, errp);
> > > +    }
> > > +#endif
> > > +}
> > > +
> > > +static void
> > > +file_backend_class_init(ObjectClass *oc, void *data)
> > > +{
> > > +    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
> > > +
> > > +    bc->alloc = file_backend_memory_alloc;
> > > +}
> > > +
> > > +static char *get_mem_path(Object *o, Error **errp)
> > > +{
> > > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> > > +
> > > +    return g_strdup(fb->mem_path);
> > > +}
> > > +
> > > +static void set_mem_path(Object *o, const char *str, Error **errp)
> > > +{
> > > +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> > > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> > > +
> > > +    if (memory_region_size(&backend->mr)) {
> > > +        error_setg(errp, "cannot change property value");
> > > +        return;
> > > +    }
> > > +    if (fb->mem_path) {
> > > +        g_free(fb->mem_path);
> > > +    }
> > > +    fb->mem_path = g_strdup(str);
> > > +}
> > > +
> > > +static void
> > > +file_backend_instance_init(Object *o)
> > > +{
> > > +    object_property_add_str(o, "mem-path", get_mem_path,
> > > +                            set_mem_path, NULL);
> > s/"mem-path"/"path"/
> > 
> > 
> > > +}
> > > +
> > > +static const TypeInfo file_backend_info = {
> > > +    .name = TYPE_MEMORY_BACKEND_FILE,
> > > +    .parent = TYPE_MEMORY_BACKEND,
> > > +    .class_init = file_backend_class_init,
> > > +    .instance_init = file_backend_instance_init,
> > > +    .instance_size = sizeof(HostMemoryBackendFile),
> > > +};
> > > +
> > > +static void register_types(void)
> > > +{
> > > +    type_register_static(&file_backend_info);
> > > +}
> > > +
> > > +type_init(register_types);
> > > -- 
> > > 1.9.3
> > > 
> > 
> > 
> > -- 
> > Regards,
> >   Igor
> 


-- 
Regards,
  Igor


* Re: [Qemu-devel] [PATCH v4 20/29] hostmem: allow preallocation of any memory region
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 20/29] hostmem: allow preallocation of any memory region Hu Tao
@ 2014-06-09 12:28   ` Igor Mammedov
  2014-06-09 12:32     ` Paolo Bonzini
  0 siblings, 1 reply; 92+ messages in thread
From: Igor Mammedov @ 2014-06-09 12:28 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Michael S. Tsirkin, qemu-devel, Eduardo Habkost,
	Paolo Bonzini

On Mon, 9 Jun 2014 18:25:25 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> And allow preallocation of file-based memory even without -mem-prealloc.
> Some care is necessary because -mem-prealloc does not allow disabling
> preallocation for hostmem-file.
Maybe the 'prealloc' property should belong to hostmem-file instead of the
abstract hostmem?

> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  backends/hostmem-file.c  |  3 +++
>  backends/hostmem.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
>  exec.c                   |  7 +++++++
>  include/exec/memory.h    | 10 ++++++++++
>  include/exec/ram_addr.h  |  1 +
>  include/sysemu/hostmem.h |  1 +
>  memory.c                 | 11 +++++++++++
>  7 files changed, 75 insertions(+)
> 
[...]

> @@ -165,6 +204,9 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>          if (!backend->dump) {
>              qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
>          }
> +        if (backend->prealloc) {
> +            os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz);
> +        }
Could it be done inside hostmem_file->alloc()?

>      }
>  }
>  
> diff --git a/exec.c b/exec.c
> index 739f0cf..520d673 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1432,6 +1432,13 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
>  }
>  #endif /* !_WIN32 */
>  
> +int qemu_get_ram_fd(ram_addr_t addr)
> +{
> +    RAMBlock *block = qemu_get_ram_block(addr);
> +
> +    return block->fd;
> +}
> +
>  /* Return a host pointer to ram allocated with qemu_ram_alloc.
>     With the exception of the softmmu code in this file, this should
>     only be used for local memory (e.g. video ram) that the device owns,
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 82d7781..36226f7 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -534,6 +534,16 @@ bool memory_region_is_logging(MemoryRegion *mr);
>  bool memory_region_is_rom(MemoryRegion *mr);
>  
>  /**
> + * memory_region_get_fd: Get a file descriptor backing a RAM memory region.
> + *
> + * Returns a file descriptor backing a file-based RAM memory region,
> + * or -1 if the region is not a file-based RAM memory region.
> + *
> + * @mr: the RAM or alias memory region being queried.
> + */
> +int memory_region_get_fd(MemoryRegion *mr);
> +
> +/**
>   * memory_region_get_ram_ptr: Get a pointer into a RAM memory region.
>   *
>   * Returns a host pointer to a RAM memory region (created with
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index f9518a6..d352f60 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -27,6 +27,7 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>  ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>                                     MemoryRegion *mr);
>  ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr);
> +int qemu_get_ram_fd(ram_addr_t addr);
>  void *qemu_get_ram_ptr(ram_addr_t addr);
>  void qemu_ram_free(ram_addr_t addr);
>  void qemu_ram_free_from_ptr(ram_addr_t addr);
> diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
> index ede5ec9..4cae673 100644
> --- a/include/sysemu/hostmem.h
> +++ b/include/sysemu/hostmem.h
> @@ -53,6 +53,7 @@ struct HostMemoryBackend {
>      /* protected */
>      uint64_t size;
>      bool merge, dump;
> +    bool prealloc, force_prealloc;
>  
>      MemoryRegion mr;
>  };
> diff --git a/memory.c b/memory.c
> index 310729a..bcef72b 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1258,6 +1258,17 @@ void memory_region_reset_dirty(MemoryRegion *mr, hwaddr addr,
>      cpu_physical_memory_reset_dirty(mr->ram_addr + addr, size, client);
>  }
>  
> +int memory_region_get_fd(MemoryRegion *mr)
> +{
> +    if (mr->alias) {
> +        return memory_region_get_fd(mr->alias);
> +    }
> +
> +    assert(mr->terminates);
> +
> +    return qemu_get_ram_fd(mr->ram_addr & TARGET_PAGE_MASK);
> +}
> +
>  void *memory_region_get_ram_ptr(MemoryRegion *mr)
>  {
>      if (mr->alias) {
> -- 
> 1.9.3
> 
> 


-- 
Regards,
  Igor
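
The memory_region_get_fd() hunk in this patch follows alias links until it reaches a terminating region and only then asks for the backing file descriptor. A minimal sketch of that lookup, assuming a hypothetical, much simplified Region type (not QEMU's MemoryRegion) in which an anonymous region stores -1 as its descriptor:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical simplified region: either an alias pointing at another
 * region, or a terminating RAM region that records its backing fd
 * (-1 when the memory is not file-backed).
 */
typedef struct Region {
    struct Region *alias;
    int fd;
} Region;

/* Resolve aliases recursively, then return the fd of the terminating region. */
static int region_get_fd(const Region *r)
{
    if (r->alias) {
        return region_get_fd(r->alias);
    }
    return r->fd;
}
```

A caller that wants to preallocate can therefore pass any alias of the backing region and still end up with the descriptor of the underlying RAM.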


* Re: [Qemu-devel] [PATCH v4 20/29] hostmem: allow preallocation of any memory region
  2014-06-09 12:28   ` Igor Mammedov
@ 2014-06-09 12:32     ` Paolo Bonzini
  0 siblings, 0 replies; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:32 UTC (permalink / raw)
  To: Igor Mammedov, Hu Tao
  Cc: Yasunori Goto, qemu-devel, Eduardo Habkost, Michael S. Tsirkin

On 09/06/2014 14:28, Igor Mammedov wrote:
> On Mon, 9 Jun 2014 18:25:25 +0800
> Hu Tao <hutao@cn.fujitsu.com> wrote:
>
>> From: Paolo Bonzini <pbonzini@redhat.com>
>>
>> And allow preallocation of file-based memory even without -mem-prealloc.
>> Some care is necessary because -mem-prealloc does not allow disabling
>> preallocation for hostmem-file.
> maybe 'prealloc' property should belong to hostmem-file instead of the
> abstract hostmem.

No, prealloc makes sense even in hostmem-ram (especially in combination 
with "-realtime mlock").
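
To illustrate why preallocation is not specific to file-backed memory, here is a minimal sketch that prefaults anonymous memory by touching one byte per page. The helper names touch_pages() and alloc_prefaulted() are hypothetical, and this is not QEMU's os_mem_prealloc(); it only shows the general page-touching idea on a POSIX host that provides MAP_ANONYMOUS.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write back one byte in every page so the kernel has to back each page
 * with real memory now rather than at first guest access.
 * Returns the number of pages touched.
 */
static size_t touch_pages(volatile unsigned char *area, size_t len,
                          size_t page_size)
{
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page_size) {
        area[off] = area[off];  /* contents unchanged, page faulted in */
        touched++;
    }
    return touched;
}

/* Map anonymous memory and prefault it. Returns NULL on failure. */
static void *alloc_prefaulted(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    if (page <= 0 || len == 0) {
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    touch_pages(p, len, (size_t)page);
    return p;
}
```

Nothing here depends on a file descriptor, which is consistent with keeping the property on the abstract backend rather than on the file-based one.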

Paolo

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev Hu Tao
@ 2014-06-09 12:36   ` Igor Mammedov
  2014-06-09 12:58     ` Paolo Bonzini
  2014-06-09 17:24   ` Eric Blake
  1 sibling, 1 reply; 92+ messages in thread
From: Igor Mammedov @ 2014-06-09 12:36 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Mon, 9 Jun 2014 18:25:33 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> Add qmp command query-memdev to query for information
> of memory devices
> 
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  numa.c           | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json | 34 ++++++++++++++++++++++++++
>  qmp-commands.hx  | 32 +++++++++++++++++++++++++
>  3 files changed, 138 insertions(+)
> 
> diff --git a/numa.c b/numa.c
> index 1a83733..4e2fdc4 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -31,9 +31,14 @@
>  #include "qapi-visit.h"
>  #include "qapi/opts-visitor.h"
>  #include "qapi/dealloc-visitor.h"
> +#include "qapi/qmp-output-visitor.h"
> +#include "qapi/qmp-input-visitor.h"
> +#include "qapi/string-output-visitor.h"
> +#include "qapi/string-input-visitor.h"
>  #include "qapi/qmp/qerror.h"
>  #include "hw/boards.h"
>  #include "sysemu/hostmem.h"
> +#include "qmp-commands.h"
>  
>  QemuOptsList qemu_numa_opts = {
>      .name = "numa",
> @@ -280,3 +285,70 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
>          addr += size;
>      }
>  }
> +
> +MemdevList *qmp_query_memdev(Error **errp)
> +{
> +    MemdevList *list = NULL, *m;
> +    HostMemoryBackend *backend;
> +    Error *err = NULL;
> +    int i;
> +
> +    for (i = 0; i < nb_numa_nodes; i++) {
> +        backend = numa_info[i].node_memdev;
> +
> +        m = g_malloc0(sizeof(*m));
> +        m->value = g_malloc0(sizeof(*m->value));
> +        m->value->size = object_property_get_int(OBJECT(backend), "size",
> +                                                 &err);
> +        if (err) {
> +            goto error;
> +        }
> +
> +        m->value->merge = object_property_get_bool(OBJECT(backend), "merge",
> +                                                   &err);
> +        if (err) {
> +            goto error;
> +        }
> +
> +        m->value->dump = object_property_get_bool(OBJECT(backend), "dump",
> +                                                  &err);
> +        if (err) {
> +            goto error;
> +        }
> +
> +        m->value->prealloc = object_property_get_bool(OBJECT(backend),
> +                                                      "prealloc", &err);
> +        if (err) {
> +            goto error;
> +        }
> +
> +        m->value->policy = object_property_get_enum(OBJECT(backend),
> +                                                    "policy",
> +                                                    HostMemPolicy_lookup,
> +                                                    &err);
> +        if (err) {
> +            goto error;
> +        }
> +
> +        object_property_get_uint16List(OBJECT(backend), "host-nodes",
> +                                       &m->value->host_nodes, &err);
> +        if (err) {
> +            goto error;
> +        }
> +
> +        m->next = list;
> +        list = m;
> +    }
> +
> +    return list;
> +
> +error:
> +    while (list) {
> +        m = list;
> +        list = list->next;
> +        g_free(m->value);
> +        g_free(m);
> +    }
> +    qerror_report_err(err);
> +    return NULL;
> +}
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 0898c00..f23c3f1 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -4779,3 +4779,37 @@
>  ##
>  { 'enum': 'HostMemPolicy',
>    'data': [ 'default', 'preferred', 'bind', 'interleave' ] }
> +
> +##
> +# @Memdev:
> +#
> +# Information of memory device
> +#
> +# @size: memory device size
> +#
> +# @host-nodes: host nodes for its memory policy
> +#
> +# @policy: memory policy of memory device
> +#
> +# Since: 2.1
> +##
> +
> +{ 'type': 'Memdev',
> +  'data': {
> +    'size':       'size',
> +    'merge':      'bool',
> +    'dump':       'bool',
> +    'prealloc':   'bool',
> +    'host-nodes': ['uint16'],
> +    'policy':     'HostMemPolicy' }}
> +
> +##
> +# @query-memdev:
> +#
> +# Returns information for all memory devices.
> +#
> +# Returns: a list of @Memdev.
> +#
> +# Since: 2.1
> +##
> +{ 'command': 'query-memdev', 'returns': ['Memdev'] }
Could we make it a union that returns MemdevRam + MemdevFile?

MemdevFile would have additional file-only properties.

> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index d8aa4ed..ea8958f 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -3572,3 +3572,35 @@ Example:
>                     } } ] }
>  
>  EQMP
> +
> +    {
> +        .name       = "query-memdev",
> +        .args_type  = "",
> +        .mhandler.cmd_new = qmp_marshal_input_query_memdev,
> +    },
> +
> +SQMP
> +query-memdev
> +------------
> +
> +Show memory devices information.
> +
> +
> +Example (1):
> +
> +-> { "execute": "query-memdev" }
> +<- { "return": [
> +       {
> +         "size": 536870912,
> +         "host-nodes": [0, 1],
> +         "policy": "bind"
> +       },
> +       {
> +         "size": 536870912,
> +         "host-nodes": [2, 3],
> +         "policy": "preferred"
> +       }
> +     ]
> +   }
> +
> +EQMP
> -- 
> 1.9.3
> 


-- 
Regards,
  Igor

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev
  2014-06-09 12:36   ` Igor Mammedov
@ 2014-06-09 12:58     ` Paolo Bonzini
  2014-06-09 13:32       ` Igor Mammedov
  0 siblings, 1 reply; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-09 12:58 UTC (permalink / raw)
  To: Igor Mammedov, Hu Tao
  Cc: Yasunori Goto, Michael S. Tsirkin, qemu-devel, Eduardo Habkost

Il 09/06/2014 14:36, Igor Mammedov ha scritto:
>> > +{ 'type': 'Memdev',
>> > +  'data': {
>> > +    'size':       'size',
>> > +    'merge':      'bool',
>> > +    'dump':       'bool',
>> > +    'prealloc':   'bool',
>> > +    'host-nodes': ['uint16'],
>> > +    'policy':     'HostMemPolicy' }}
>> > +
>> > +##
>> > +# @query-memdev:
>> > +#
>> > +# Returns information for all memory devices.
>> > +#
>> > +# Returns: a list of @Memdev.
>> > +#
>> > +# Since: 2.1
>> > +##
>> > +{ 'command': 'query-memdev', 'returns': ['Memdev'] }
> Could we make it union, that returns MemdevRam + MemdevFile
>
> MemdevFile will have additional file-only specific properties.
>

Which are the file-only properties (in the current definition of Memdev)?

Paolo

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/29] pc: pass MachineState to pc_memory_init
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 09/29] pc: pass MachineState to pc_memory_init Hu Tao
@ 2014-06-09 13:14   ` Igor Mammedov
  0 siblings, 0 replies; 92+ messages in thread
From: Igor Mammedov @ 2014-06-09 13:14 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Mon, 9 Jun 2014 18:25:14 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  hw/i386/pc.c         | 23 +++++++++++------------
>  hw/i386/pc_piix.c    |  8 +++-----
>  hw/i386/pc_q35.c     |  4 +---
>  include/hw/i386/pc.h |  7 +++----
>  4 files changed, 18 insertions(+), 24 deletions(-)

Reviewed-By: Igor Mammedov <imammedo@redhat.com>

> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 2c75ecc..9860e3f 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1194,10 +1194,8 @@ void pc_acpi_init(const char *default_dsdt)
>      }
>  }
>  
> -FWCfgState *pc_memory_init(MemoryRegion *system_memory,
> -                           const char *kernel_filename,
> -                           const char *kernel_cmdline,
> -                           const char *initrd_filename,
> +FWCfgState *pc_memory_init(MachineState *machine,
> +                           MemoryRegion *system_memory,
>                             ram_addr_t below_4g_mem_size,
>                             ram_addr_t above_4g_mem_size,
>                             MemoryRegion *rom_memory,
> @@ -1208,18 +1206,18 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
>      MemoryRegion *ram, *option_rom_mr;
>      MemoryRegion *ram_below_4g, *ram_above_4g;
>      FWCfgState *fw_cfg;
> -    ram_addr_t ram_size = below_4g_mem_size + above_4g_mem_size;
> -    MachineState *machine = MACHINE(qdev_get_machine());
>      PCMachineState *pcms = PC_MACHINE(machine);
>  
> -    linux_boot = (kernel_filename != NULL);
> +    assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size);
> +
> +    linux_boot = (machine->kernel_filename != NULL);
>  
>      /* Allocate RAM.  We allocate it as a single memory region and use
>       * aliases to address portions of it, mostly for backwards compatibility
>       * with older qemus that used qemu_ram_alloc().
>       */
>      ram = g_malloc(sizeof(*ram));
> -    memory_region_init_ram(ram, NULL, "pc.ram", ram_size);
> +    memory_region_init_ram(ram, NULL, "pc.ram", machine->ram_size);
>      vmstate_register_ram_global(ram);
>      *ram_memory = ram;
>      ram_below_4g = g_malloc(sizeof(*ram_below_4g));
> @@ -1238,7 +1236,7 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
>  
>      if (!guest_info->has_reserved_memory &&
>          (machine->ram_slots ||
> -         (machine->maxram_size > ram_size))) {
> +         (machine->maxram_size > machine->ram_size))) {
>          MachineClass *mc = MACHINE_GET_CLASS(machine);
>  
>          error_report("\"-memory 'slots|maxmem'\" is not supported by: %s",
> @@ -1248,9 +1246,9 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
>  
>      /* initialize hotplug memory address space */
>      if (guest_info->has_reserved_memory &&
> -        (ram_size < machine->maxram_size)) {
> +        (machine->ram_size < machine->maxram_size)) {
>          ram_addr_t hotplug_mem_size =
> -            machine->maxram_size - ram_size;
> +            machine->maxram_size - machine->ram_size;
>  
>          if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
>              error_report("unsupported amount of memory slots: %"PRIu64,
> @@ -1295,7 +1293,8 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
>      }
>  
>      if (linux_boot) {
> -        load_linux(fw_cfg, kernel_filename, initrd_filename, kernel_cmdline, below_4g_mem_size);
> +        load_linux(fw_cfg, machine->kernel_filename, machine->initrd_filename,
> +                   machine->kernel_cmdline, below_4g_mem_size);
>      }
>  
>      for (i = 0; i < nb_option_roms; i++) {
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index a13e8d6..3e7524b 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -156,11 +156,9 @@ static void pc_init1(MachineState *machine,
>  
>      /* allocate ram and load rom/bios */
>      if (!xen_enabled()) {
> -        fw_cfg = pc_memory_init(system_memory,
> -                       machine->kernel_filename, machine->kernel_cmdline,
> -                       machine->initrd_filename,
> -                       below_4g_mem_size, above_4g_mem_size,
> -                       rom_memory, &ram_memory, guest_info);
> +        fw_cfg = pc_memory_init(machine, system_memory,
> +                                below_4g_mem_size, above_4g_mem_size,
> +                                rom_memory, &ram_memory, guest_info);
>      }
>  
>      gsi_state = g_malloc0(sizeof(*gsi_state));
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 629eb2d..aa71332 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -143,9 +143,7 @@ static void pc_q35_init(MachineState *machine)
>  
>      /* allocate ram and load rom/bios */
>      if (!xen_enabled()) {
> -        pc_memory_init(get_system_memory(),
> -                       machine->kernel_filename, machine->kernel_cmdline,
> -                       machine->initrd_filename,
> +        pc_memory_init(machine, get_system_memory(),
>                         below_4g_mem_size, above_4g_mem_size,
>                         rom_memory, &ram_memory, guest_info);
>      }
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index fe9e18b..f337d54 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -3,6 +3,7 @@
>  
>  #include "qemu-common.h"
>  #include "exec/memory.h"
> +#include "hw/boards.h"
>  #include "hw/isa/isa.h"
>  #include "hw/block/fdc.h"
>  #include "net/net.h"
> @@ -183,10 +184,8 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
>  void pc_pci_as_mapping_init(Object *owner, MemoryRegion *system_memory,
>                              MemoryRegion *pci_address_space);
>  
> -FWCfgState *pc_memory_init(MemoryRegion *system_memory,
> -                           const char *kernel_filename,
> -                           const char *kernel_cmdline,
> -                           const char *initrd_filename,
> +FWCfgState *pc_memory_init(MachineState *machine,
> +                           MemoryRegion *system_memory,
>                             ram_addr_t below_4g_mem_size,
>                             ram_addr_t above_4g_mem_size,
>                             MemoryRegion *rom_memory,
> -- 
> 1.9.3
> 


-- 
Regards,
  Igor

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev
  2014-06-09 12:58     ` Paolo Bonzini
@ 2014-06-09 13:32       ` Igor Mammedov
  2014-06-09 13:40         ` Paolo Bonzini
  0 siblings, 1 reply; 92+ messages in thread
From: Igor Mammedov @ 2014-06-09 13:32 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Hu Tao, Michael S. Tsirkin, qemu-devel, Eduardo Habkost, Yasunori Goto

On Mon, 09 Jun 2014 14:58:39 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 09/06/2014 14:36, Igor Mammedov ha scritto:
> >> > +{ 'type': 'Memdev',
> >> > +  'data': {
> >> > +    'size':       'size',
> >> > +    'merge':      'bool',
> >> > +    'dump':       'bool',
> >> > +    'prealloc':   'bool',
> >> > +    'host-nodes': ['uint16'],
> >> > +    'policy':     'HostMemPolicy' }}
> >> > +
> >> > +##
> >> > +# @query-memdev:
> >> > +#
> >> > +# Returns information for all memory devices.
> >> > +#
> >> > +# Returns: a list of @Memdev.
> >> > +#
> >> > +# Since: 2.1
> >> > +##
> >> > +{ 'command': 'query-memdev', 'returns': ['Memdev'] }
> > Could we make it union, that returns MemdevRam + MemdevFile
> >
> > MemdevFile will have additional file-only specific properties.
> >
> 
> Which are the file-only properties (in the current definition of Memdev)?
None in the current definition, but for the file backend, exposing a 'path'
property might be useful. Alternatively, instead of a union we could add
'type' and optional 'path' fields to Memdev.

> 
> Paolo
> 


-- 
Regards,
  Igor

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev
  2014-06-09 13:32       ` Igor Mammedov
@ 2014-06-09 13:40         ` Paolo Bonzini
  2014-06-09 14:08           ` Igor Mammedov
  2014-06-09 17:15           ` Eric Blake
  0 siblings, 2 replies; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-09 13:40 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Hu Tao, Michael S. Tsirkin, qemu-devel, Eduardo Habkost, Yasunori Goto

Il 09/06/2014 15:32, Igor Mammedov ha scritto:
>>>>> > >> > +{ 'command': 'query-memdev', 'returns': ['Memdev'] }
>>> > > Could we make it union, that returns MemdevRam + MemdevFile
>>> > >
>>> > > MemdevFile will have additional file-only specific properties.
>>> > >
>> >
>> > Which are the file-only properties (in the current definition of Memdev)?
> in current none, but for file backend exposing 'path' property might be useful
> alternatively instead of union we could add 'type' and optional 'path' fields
> to Memdev
>

Yes, I agree.  I think the latest additions to QAPI actually let you do 
that with a QAPI union while keeping backwards-compatible output for 
other fields.  Ok to do this later?  It should be acceptable for soft 
freeze.

Paolo

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev
  2014-06-09 13:40         ` Paolo Bonzini
@ 2014-06-09 14:08           ` Igor Mammedov
  2014-06-09 17:15           ` Eric Blake
  1 sibling, 0 replies; 92+ messages in thread
From: Igor Mammedov @ 2014-06-09 14:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Hu Tao, Michael S. Tsirkin, qemu-devel, Eduardo Habkost, Yasunori Goto

On Mon, 09 Jun 2014 15:40:56 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 09/06/2014 15:32, Igor Mammedov ha scritto:
> >>>>> > >> > +{ 'command': 'query-memdev', 'returns': ['Memdev'] }
> >>> > > Could we make it union, that returns MemdevRam + MemdevFile
> >>> > >
> >>> > > MemdevFile will have additional file-only specific properties.
> >>> > >
> >> >
> >> > Which are the file-only properties (in the current definition of Memdev)?
> > in current none, but for file backend exposing 'path' property might be useful
> > alternatively instead of union we could add 'type' and optional 'path' fields
> > to Memdev
> >
> 
> Yes, I agree.  I think the latest additions to QAPI actually let you do 
> that with a QAPI union while keeping backwards-compatible output for 
> other fields.  Ok to do this later?  It should be acceptable for soft 
> freeze.
Sure.
Actually, all my comments could be addressed as follow-up patches before the
freeze; there is no point in respinning a huge series for more or less
cosmetic changes.

> 
> Paolo


-- 
Regards,
  Igor

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/29] qmp: improve error reporting for -object and object-add
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 08/29] qmp: improve error reporting for -object and object-add Hu Tao
@ 2014-06-09 15:57   ` Igor Mammedov
  2014-06-10  2:07     ` Hu Tao
  0 siblings, 1 reply; 92+ messages in thread
From: Igor Mammedov @ 2014-06-09 15:57 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Mon, 9 Jun 2014 18:25:13 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> Use QERR_INVALID_PARAMETER_VALUE for consistency.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> ---
>  qmp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/qmp.c b/qmp.c
> index b722dbe..cef60fb 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -540,7 +540,8 @@ void object_add(const char *type, const char *id, const QDict *qdict,
>  
>      klass = object_class_by_name(type);
>      if (!klass) {
> -        error_setg(errp, "invalid class name");
> +        error_set(errp, QERR_INVALID_PARAMETER_VALUE,
> +                  "qom-type", "a valid class name");
With the implicit "qom-type" on the CLI, it might not be clear to the user
which value is wrong. Perhaps the following would be better:

error_setg(errp, "Invalid object type name: %s", type);
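
To illustrate the point outside of QEMU, here is a minimal standalone sketch
of an error message that names the rejected value. The helper names
(known_type, check_object_type) and the list of accepted type names are
assumptions made up for this example; they are not QEMU's
object_class_by_name() or error_setg().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a class lookup; NOT QEMU's object_class_by_name(). */
static int known_type(const char *type)
{
    /* Illustrative list only. */
    static const char *const known[] = {
        "memory-backend-ram", "memory-backend-file"
    };
    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strcmp(type, known[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * Returns 0 if the type is known, -1 otherwise. On failure the message
 * written to buf includes the offending value, so the user can see what
 * was rejected even though "qom-type" never appeared on the command line.
 */
int check_object_type(const char *type, char *buf, size_t len)
{
    if (!known_type(type)) {
        snprintf(buf, len, "Invalid object type name: %s", type);
        return -1;
    }
    if (len > 0) {
        buf[0] = '\0';
    }
    return 0;
}
```

With this shape, a mistyped "-object memory-ram,..." would report the string
"memory-ram" itself rather than a parameter name the user never typed.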
 
>          return;
>      }
>  
> -- 
> 1.9.3
> 


-- 
Regards,
  Igor

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev
  2014-06-09 13:40         ` Paolo Bonzini
  2014-06-09 14:08           ` Igor Mammedov
@ 2014-06-09 17:15           ` Eric Blake
  1 sibling, 0 replies; 92+ messages in thread
From: Eric Blake @ 2014-06-09 17:15 UTC (permalink / raw)
  To: Paolo Bonzini, Igor Mammedov
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 1313 bytes --]

On 06/09/2014 07:40 AM, Paolo Bonzini wrote:
> Il 09/06/2014 15:32, Igor Mammedov ha scritto:
>>>>>> > >> > +{ 'command': 'query-memdev', 'returns': ['Memdev'] }
>>>> > > Could we make it union, that returns MemdevRam + MemdevFile
>>>> > >
>>>> > > MemdevFile will have additional file-only specific properties.
>>>> > >
>>> >
>>> > Which are the file-only properties (in the current definition of Memdev)?
>> in current none, but for file backend exposing 'path' property might be useful
>> alternatively instead of union we could add 'type' and optional 'path' fields
>> to Memdev
>>
> 
> Yes, I agree.  I think the latest additions to QAPI actually let you do
> that with a QAPI union while keeping backwards-compatible output for
> other fields.  Ok to do this later?  It should be acceptable for soft
> freeze.

Correct, use of a discriminated union can add a new 'type' parameter,
which in turn controls what other parameters are also present as a
group, all within the same dictionary passed over the wire, so it is a
back-compat friendly change to convert from a single struct to a QAPI
union, and can be deferred to the point where you need such a change.
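
As a rough C analogy (a sketch only, not the code the QAPI generator would
emit), the idea can be pictured as a tagged union whose common fields stay
where they were, with a 'type' discriminator selecting the extra members.
All names below (memdev_type, struct memdev, memdev_format) are hypothetical,
and only 'size' is shown among the common fields to keep it short.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical discriminator values for the two backends discussed. */
enum memdev_type { MEMDEV_TYPE_RAM, MEMDEV_TYPE_FILE };

struct memdev {
    enum memdev_type type;   /* new discriminator field */
    uint64_t size;           /* common field, present for every type */
    union {
        struct {
            const char *path;   /* only meaningful for MEMDEV_TYPE_FILE */
        } file;
    } u;
};

/*
 * Formats one entry as a single flat dictionary. The pre-existing "size"
 * key is emitted for every type, so a client that only reads "size" keeps
 * working; "path" appears only for the file type. No string escaping is
 * done on path; this is an illustration, not a JSON writer.
 */
int memdev_format(const struct memdev *m, char *buf, size_t len)
{
    if (m->type == MEMDEV_TYPE_FILE) {
        return snprintf(buf, len,
                        "{\"type\": \"file\", \"size\": %" PRIu64
                        ", \"path\": \"%s\"}",
                        m->size, m->u.file.path);
    }
    return snprintf(buf, len, "{\"type\": \"ram\", \"size\": %" PRIu64 "}",
                    m->size);
}
```

The point of the sketch is that both variants share one dictionary on the
wire, which is why the conversion can wait until a file-only field is needed.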

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 12/29] numa: add -numa node, memdev= option
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 12/29] numa: add -numa node,memdev= option Hu Tao
@ 2014-06-09 17:22   ` Eric Blake
  2014-06-10  2:23     ` Hu Tao
  0 siblings, 1 reply; 92+ messages in thread
From: Eric Blake @ 2014-06-09 17:22 UTC (permalink / raw)
  To: Hu Tao, qemu-devel
  Cc: Yasunori Goto, Igor Mammedov, Michael S. Tsirkin,
	Eduardo Habkost, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 2163 bytes --]

On 06/09/2014 04:25 AM, Hu Tao wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> This option provides the infrastructure for binding guest NUMA nodes
> to host NUMA nodes.  For example:
> 
>  -object memory-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
>  -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
>  -object memory-ram,size=1024M,policy=interleave,host-nodes=1-3,id=ram-node1 \
>  -numa node,nodeid=1,cpus=1,memdev=ram-node1
> 
> The option replaces "-numa node,mem=".
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  include/sysemu/sysemu.h |  1 +

> +# @mem: #optional memory size of this node; mutually exclusive with @memdev.
> +#       Equally divide total memory among nodes if both @mem and @memdev are
> +#       omitted.
> +#
> +# @memdev: #optional memory backend object.  If specified for one node,
> +#          it must be specified for all nodes.
>  #
>  # Since: 2.1
>  ##
> @@ -4753,4 +4757,5 @@
>    'data': {
>     '*nodeid': 'uint16',
>     '*cpus':   ['uint16'],
> -   '*mem':    'size' }}
> +   '*mem':    'size',
> +   '*memdev': 'str' }}

This looks okay.

> diff --git a/qemu-options.hx b/qemu-options.hx
> index d3cd2ce..e448d33 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -95,16 +95,20 @@ specifies the maximum number of hotpluggable CPUs.
>  ETEXI
>  
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> -    "-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
> +    "-numa node[,mem=size][,memdev=id][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)

But this implies both parameters can be used at once.  Is it worth
rewriting in two lines:

"-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n"
"-numa node[,memdev=id][,cpus=cpu[-cpu]][,nodeid=node]\n"

to make the exclusion clearer?
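
For reference, the rule the documentation states (mem and memdev are mutually
exclusive, and memdev must be given for all nodes if given for one) could be
checked along these lines. This is a standalone sketch under assumed names;
struct numa_node_opts and numa_opts_check are hypothetical and do not
correspond to the structures in numa.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one parsed "-numa node,..." option. */
struct numa_node_opts {
    bool has_mem;         /* mem= was given */
    uint64_t mem;         /* value of mem=, valid only if has_mem */
    const char *memdev;   /* id from memdev=, or NULL if not given */
};

/*
 * Returns NULL if the option set is consistent, otherwise a static string
 * describing the first problem found.
 */
const char *numa_opts_check(const struct numa_node_opts *nodes, size_t n)
{
    size_t with_memdev = 0;

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].has_mem && nodes[i].memdev != NULL) {
            return "mem and memdev are mutually exclusive";
        }
        if (nodes[i].memdev != NULL) {
            with_memdev++;
        }
    }
    /* memdev is all-or-nothing across nodes. */
    if (with_memdev != 0 && with_memdev != n) {
        return "memdev must be specified for all nodes or for none";
    }
    return NULL;
}
```

Splitting the usage string into two lines would document the same constraint
that a check like this enforces.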


> -to allocate RAM and vCPUs respectively.
> +to allocate RAM and vCPU srespectively, and possibly @option{-object}

s/vCPU srespectively/vCPUs respectively/

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev Hu Tao
  2014-06-09 12:36   ` Igor Mammedov
@ 2014-06-09 17:24   ` Eric Blake
  2014-06-10  2:25     ` Hu Tao
  1 sibling, 1 reply; 92+ messages in thread
From: Eric Blake @ 2014-06-09 17:24 UTC (permalink / raw)
  To: Hu Tao, qemu-devel
  Cc: Yasunori Goto, Igor Mammedov, Michael S. Tsirkin,
	Eduardo Habkost, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 1434 bytes --]

On 06/09/2014 04:25 AM, Hu Tao wrote:
> Add qmp command query-memdev to query for information
> of memory devices
> 
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  numa.c           | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json | 34 ++++++++++++++++++++++++++
>  qmp-commands.hx  | 32 +++++++++++++++++++++++++
>  3 files changed, 138 insertions(+)
> 

> +
> +##
> +# @Memdev:
> +#
> +# Information of memory device
> +#
> +# @size: memory device size
> +#
> +# @host-nodes: host nodes for its memory policy
> +#
> +# @policy: memory policy of memory device
> +#

You documented three parameters...

> +# Since: 2.1
> +##
> +
> +{ 'type': 'Memdev',
> +  'data': {
> +    'size':       'size',
> +    'merge':      'bool',
> +    'dump':       'bool',
> +    'prealloc':   'bool',
> +    'host-nodes': ['uint16'],
> +    'policy':     'HostMemPolicy' }}

...but implemented six, all listed as mandatory,...

> +Show memory devices information.
> +
> +
> +Example (1):
> +
> +-> { "execute": "query-memdev" }
> +<- { "return": [
> +       {
> +         "size": 536870912,
> +         "host-nodes": [0, 1],
> +         "policy": "bind"
> +       },

...and then only demonstrate 3 in the example.  Something's not quite right.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 02/29] NUMA: check if the total numa memory size is equal to ram_size
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 02/29] NUMA: check if the total numa memory size is equal to ram_size Hu Tao
@ 2014-06-09 23:02   ` Eric Blake
  2014-06-10  2:29     ` Hu Tao
  0 siblings, 1 reply; 92+ messages in thread
From: Eric Blake @ 2014-06-09 23:02 UTC (permalink / raw)
  To: Hu Tao, qemu-devel
  Cc: Eduardo Habkost, Michael S. Tsirkin, Paolo Bonzini,
	Igor Mammedov, Yasunori Goto, Wanlong Gao

[-- Attachment #1: Type: text/plain, Size: 1289 bytes --]

On 06/09/2014 04:25 AM, Hu Tao wrote:
> From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> 
> If the total number of the assigned numa nodes memory is not
> equal to the assigned ram size, it will write the wrong data
> to ACPI table, then the guest will ignore the wrong ACPI table
> and recognize all memory to one node. It's buggy, we should
> check it to ensure that we write the right data to ACPI table.
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> ---
>  numa.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 

> +        if (numa_total != ram_size) {
> +            error_report("qemu: total memory size for NUMA nodes (%" PRIu64 ")"
> +                         " should equal to RAM size (" RAM_ADDR_FMT ")\n",

error_report() should not include trailing \n

> +                         numa_total, ram_size);
> +            exit(1);

Not your fault that this file is full of exit(1), but it would be nice
to have a cleanup patch someday that uses EXIT_FAILURE.
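
A standalone sketch of the check with the message built without a trailing
newline, on the assumption that the reporting function appends the newline
itself (as I believe error_report() does). The function name
numa_check_total and its signature are made up for this example and are not
taken from the patch.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Sums the per-node memory sizes and compares the total with ram_size.
 * Returns 0 when they match. Otherwise writes a message WITHOUT a trailing
 * newline into msg and returns -1; the caller's reporting routine is
 * expected to terminate the line.
 */
int numa_check_total(const uint64_t *node_mem, size_t n, uint64_t ram_size,
                     char *msg, size_t len)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += node_mem[i];
    }
    if (total != ram_size) {
        snprintf(msg, len,
                 "total memory size for NUMA nodes (%" PRIu64 ")"
                 " should equal RAM size (%" PRIu64 ")",
                 total, ram_size);
        return -1;
    }
    return 0;
}
```

A caller could then report msg and leave with exit(EXIT_FAILURE) from
<stdlib.h> instead of a bare exit(1).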

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 11/29] hostmem: separate allocation from UserCreatable complete method
  2014-06-09 10:47   ` Igor Mammedov
@ 2014-06-10  1:55     ` Hu Tao
  0 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-10  1:55 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Mon, Jun 09, 2014 at 12:47:19PM +0200, Igor Mammedov wrote:
> On Mon, 9 Jun 2014 18:25:16 +0800
> Hu Tao <hutao@cn.fujitsu.com> wrote:
> 
> > This allows the superclass to set various policies on the memory
> > region that the subclass creates. Drops hostmem-ram's complete method
> > accordingly.
> > 
> > While at file hostmem.c, s/hostmemory/host_memory/ to keep names
> > consistant.
> nitpick, it would be better to split rename s/hostmemory/host_memory/ in
> separate patch

OK.

Hu

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements
  2014-06-09 11:40   ` Michael S. Tsirkin
@ 2014-06-10  1:55     ` Hu Tao
  2014-06-10  9:51     ` Hu Tao
  1 sibling, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-10  1:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Paolo Bonzini, Igor Mammedov, qemu-devel, Eduardo Habkost

On Mon, Jun 09, 2014 at 02:40:30PM +0300, Michael S. Tsirkin wrote:
> On Mon, Jun 09, 2014 at 01:30:05PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Jun 09, 2014 at 06:25:05PM +0800, Hu Tao wrote:
> > > note: this series is based on MST's pci tree.
> > 
> > No, please rebase on top of numa branch in my tree, not on
> > pci branch.
> > I applied a bunch of your there and don't want
> > spend time going over them again.
> > 
> 
> If you want me to drop some patches, pls mention this
> in the cover letter. But you don't need to keep
> reposting everything.
> 

Sure.

Hu

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-09 11:32   ` Igor Mammedov
  2014-06-09 11:35     ` Michael S. Tsirkin
@ 2014-06-10  2:00     ` Hu Tao
  2014-06-10  5:09       ` Paolo Bonzini
  1 sibling, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-10  2:00 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Mon, Jun 09, 2014 at 01:32:46PM +0200, Igor Mammedov wrote:
> On Mon, 9 Jun 2014 18:25:23 +0800
> Hu Tao <hutao@cn.fujitsu.com> wrote:
> 
> > From: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > ---
> >  backends/Makefile.objs  |   1 +
> >  backends/hostmem-file.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 108 insertions(+)
> >  create mode 100644 backends/hostmem-file.c
> > 
> > diff --git a/backends/Makefile.objs b/backends/Makefile.objs
> > index 7fb7acd..506a46c 100644
> > --- a/backends/Makefile.objs
> > +++ b/backends/Makefile.objs
> > @@ -8,3 +8,4 @@ baum.o-cflags := $(SDL_CFLAGS)
> >  common-obj-$(CONFIG_TPM) += tpm.o
> >  
> >  common-obj-y += hostmem.o hostmem-ram.o
> > +common-obj-$(CONFIG_LINUX) += hostmem-file.o
> > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > new file mode 100644
> > index 0000000..b8df933
> > --- /dev/null
> > +++ b/backends/hostmem-file.c
> > @@ -0,0 +1,107 @@
> > +/*
> > + * QEMU Host Memory Backend for hugetlbfs
> > + *
> > + * Copyright (C) 2013 Red Hat Inc
> > + *
> > + * Authors:
> > + *   Paolo Bonzini <pbonzini@redhat.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +#include "sysemu/hostmem.h"
> > +#include "qom/object_interfaces.h"
> > +
> > +/* hostmem-file.c */
> > +/**
> > + * @TYPE_MEMORY_BACKEND_FILE:
> > + * name of backend that uses mmap on a file descriptor
> > + */
> > +#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
> how about naming it after what it really is? "memory-backend-hugepage"
> Later we could split it into generic superclass mmap-ed "memory-backend-file"
> and have THP-specific code moved into this backend.

OK.

> 
> > +
> > +#define MEMORY_BACKEND_FILE(obj) \
> > +    OBJECT_CHECK(HostMemoryBackendFile, (obj), TYPE_MEMORY_BACKEND_FILE)
> > +
> > +typedef struct HostMemoryBackendFile HostMemoryBackendFile;
> > +
> > +struct HostMemoryBackendFile {
> > +    HostMemoryBackend parent_obj;
> > +    char *mem_path;
> > +};
> > +
> > +static void
> > +file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> > +{
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
> > +
> > +    if (!backend->size) {
> > +        error_setg(errp, "can't create backend with size 0");
> > +        return;
> > +    }
> > +    if (!fb->mem_path) {
> > +        error_setg(errp, "mem-path property not set");
> > +        return;
> > +    }
> > +#ifndef CONFIG_LINUX
> > +    error_setg(errp, "-mem-path not supported on this host");
> Is it possible to not compile this backend on non-Linux hosts at all, instead
> of using ifdefs?

Good idea!

> 
> > +#else
> > +    if (!memory_region_size(&backend->mr)) {
> > +        memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
> > +                                 object_get_canonical_path(OBJECT(backend)),
> > +                                 backend->size,
> > +                                 fb->mem_path, errp);
> > +    }
> > +#endif
> > +}
> > +
> > +static void
> > +file_backend_class_init(ObjectClass *oc, void *data)
> > +{
> > +    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
> > +
> > +    bc->alloc = file_backend_memory_alloc;
> > +}
> > +
> > +static char *get_mem_path(Object *o, Error **errp)
> > +{
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> > +
> > +    return g_strdup(fb->mem_path);
> > +}
> > +
> > +static void set_mem_path(Object *o, const char *str, Error **errp)
> > +{
> > +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> > +    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
> > +
> > +    if (memory_region_size(&backend->mr)) {
> > +        error_setg(errp, "cannot change property value");
> > +        return;
> > +    }
> > +    if (fb->mem_path) {
> > +        g_free(fb->mem_path);
> > +    }
> > +    fb->mem_path = g_strdup(str);
> > +}
> > +
> > +static void
> > +file_backend_instance_init(Object *o)
> > +{
> > +    object_property_add_str(o, "mem-path", get_mem_path,
> > +                            set_mem_path, NULL);
> s/"mem-path"/"path"/

OK.

> 
> 
> > +}
> > +
> > +static const TypeInfo file_backend_info = {
> > +    .name = TYPE_MEMORY_BACKEND_FILE,
> > +    .parent = TYPE_MEMORY_BACKEND,
> > +    .class_init = file_backend_class_init,
> > +    .instance_init = file_backend_instance_init,
> > +    .instance_size = sizeof(HostMemoryBackendFile),
> > +};
> > +
> > +static void register_types(void)
> > +{
> > +    type_register_static(&file_backend_info);
> > +}
> > +
> > +type_init(register_types);
> > -- 
> > 1.9.3
> > 
> 
> 
> -- 
> Regards,
>   Igor


* Re: [Qemu-devel] [PATCH v4 08/29] qmp: improve error reporting for -object and object-add
  2014-06-09 15:57   ` Igor Mammedov
@ 2014-06-10  2:07     ` Hu Tao
  0 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-10  2:07 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Mon, Jun 09, 2014 at 05:57:20PM +0200, Igor Mammedov wrote:
> On Mon, 9 Jun 2014 18:25:13 +0800
> Hu Tao <hutao@cn.fujitsu.com> wrote:
> 
> > From: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > Use QERR_INVALID_PARAMETER_VALUE for consistency.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> > ---
> >  qmp.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/qmp.c b/qmp.c
> > index b722dbe..cef60fb 100644
> > --- a/qmp.c
> > +++ b/qmp.c
> > @@ -540,7 +540,8 @@ void object_add(const char *type, const char *id, const QDict *qdict,
> >  
> >      klass = object_class_by_name(type);
> >      if (!klass) {
> > -        error_setg(errp, "invalid class name");
> > +        error_set(errp, QERR_INVALID_PARAMETER_VALUE,
> > +                  "qom-type", "a valid class name");
> With "qom-type" implicit on the CLI, it might not be clear to the user which
> value is wrong. Perhaps the following would be better:
> 
> error_setg(errp, "Invalid object type name: %s", type);

That is easier for users to understand. Thanks.

Hu


* Re: [Qemu-devel] [PATCH v4 12/29] numa: add -numa node, memdev= option
  2014-06-09 17:22   ` [Qemu-devel] [PATCH v4 12/29] numa: add -numa node, memdev= option Eric Blake
@ 2014-06-10  2:23     ` Hu Tao
  0 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-10  2:23 UTC (permalink / raw)
  To: Eric Blake
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Igor Mammedov, Yasunori Goto

On Mon, Jun 09, 2014 at 11:22:05AM -0600, Eric Blake wrote:
> On 06/09/2014 04:25 AM, Hu Tao wrote:
> > From: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > This option provides the infrastructure for binding guest NUMA nodes
> > to host NUMA nodes.  For example:
> > 
> >  -object memory-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
> >  -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
> >  -object memory-ram,size=1024M,policy=interleave,host-nodes=1-3,id=ram-node1 \
> >  -numa node,nodeid=1,cpus=1,memdev=ram-node1
> > 
> > The option replaces "-numa node,mem=".
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > ---
> >  include/sysemu/sysemu.h |  1 +
> 
> > +# @mem: #optional memory size of this node; mutually exclusive with @memdev.
> > +#       Equally divide total memory among nodes if both @mem and @memdev are
> > +#       omitted.
> > +#
> > +# @memdev: #optional memory backend object.  If specified for one node,
> > +#          it must be specified for all nodes.
> >  #
> >  # Since: 2.1
> >  ##
> > @@ -4753,4 +4757,5 @@
> >    'data': {
> >     '*nodeid': 'uint16',
> >     '*cpus':   ['uint16'],
> > -   '*mem':    'size' }}
> > +   '*mem':    'size',
> > +   '*memdev': 'str' }}
> 
> This looks okay.
> 
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index d3cd2ce..e448d33 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -95,16 +95,20 @@ specifies the maximum number of hotpluggable CPUs.
> >  ETEXI
> >  
> >  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> > -    "-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
> > +    "-numa node[,mem=size][,memdev=id][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
> 
> But this implies both parameters can be used at once.  Is it worth
> rewriting in two lines:
> 
> "-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n"
> "-numa node[,memdev=id][,cpus=cpu[-cpu]][,nodeid=node]\n"
> 
> to make the exclusion clearer?

OK.
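
For reference, a minimal sketch of the rule stated in the schema comment quoted
above: @mem and @memdev are mutually exclusive, and @memdev must be given for
all nodes or for none. NumaNodeOpts and numa_opts_check() are hypothetical
stand-ins for illustration, not the actual QEMU types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one parsed "-numa node" option set. */
typedef struct NumaNodeOpts {
    int has_mem;         /* non-zero if mem= was given */
    uint64_t mem;        /* node memory size in bytes, valid if has_mem */
    const char *memdev;  /* backend id, or NULL if memdev= was omitted */
} NumaNodeOpts;

/*
 * Returns NULL if the options are consistent, otherwise a static string
 * describing the first problem found.
 */
const char *numa_opts_check(const NumaNodeOpts *nodes, size_t n)
{
    size_t with_memdev = 0;

    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].has_mem && nodes[i].memdev) {
            return "mem and memdev are mutually exclusive";
        }
        if (nodes[i].memdev) {
            with_memdev++;
        }
    }
    /* memdev on some nodes but not on others is rejected. */
    if (with_memdev != 0 && with_memdev != n) {
        return "memdev must be specified for all nodes or for none";
    }
    return NULL;
}
```

Splitting the help text into two lines, as suggested, documents the first of
these two rules; the second one still has to be checked at run time.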

> 
> 
> > -to allocate RAM and vCPUs respectively.
> > +to allocate RAM and vCPU srespectively, and possibly @option{-object}
> 
> s/vCPU srespectively/vCPUs respectively/

:-P

> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


* Re: [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev
  2014-06-09 17:24   ` Eric Blake
@ 2014-06-10  2:25     ` Hu Tao
  0 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-10  2:25 UTC (permalink / raw)
  To: Eric Blake
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Igor Mammedov, Yasunori Goto

On Mon, Jun 09, 2014 at 11:24:51AM -0600, Eric Blake wrote:
> On 06/09/2014 04:25 AM, Hu Tao wrote:
> > Add qmp command query-memdev to query for information
> > of memory devices
> > 
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > ---
> >  numa.c           | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  qapi-schema.json | 34 ++++++++++++++++++++++++++
> >  qmp-commands.hx  | 32 +++++++++++++++++++++++++
> >  3 files changed, 138 insertions(+)
> > 
> 
> > +
> > +##
> > +# @Memdev:
> > +#
> > +# Information of memory device
> > +#
> > +# @size: memory device size
> > +#
> > +# @host-nodes: host nodes for its memory policy
> > +#
> > +# @policy: memory policy of memory device
> > +#
> 
> You documented three parameters...
> 
> > +# Since: 2.1
> > +##
> > +
> > +{ 'type': 'Memdev',
> > +  'data': {
> > +    'size':       'size',
> > +    'merge':      'bool',
> > +    'dump':       'bool',
> > +    'prealloc':   'bool',
> > +    'host-nodes': ['uint16'],
> > +    'policy':     'HostMemPolicy' }}
> 
> ...but implemented six, all listed as mandatory,...
> 
> > +Show memory devices information.
> > +
> > +
> > +Example (1):
> > +
> > +-> { "execute": "query-memdev" }
> > +<- { "return": [
> > +       {
> > +         "size": 536870912,
> > +         "host-nodes": [0, 1],
> > +         "policy": "bind"
> > +       },
> 
> ...and then only demonstrate 3 in the example.  Something's not quite right.

Thanks for catching this!

Hu


* Re: [Qemu-devel] [PATCH v4 02/29] NUMA: check if the total numa memory size is equal to ram_size
  2014-06-09 23:02   ` Eric Blake
@ 2014-06-10  2:29     ` Hu Tao
  2014-06-10  2:36       ` Eric Blake
  0 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-10  2:29 UTC (permalink / raw)
  To: Eric Blake
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Igor Mammedov, Yasunori Goto, Wanlong Gao

On Mon, Jun 09, 2014 at 05:02:51PM -0600, Eric Blake wrote:
> On 06/09/2014 04:25 AM, Hu Tao wrote:
> > From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> > 
> > If the total number of the assigned numa nodes memory is not
> > equal to the assigned ram size, it will write the wrong data
> > to ACPI table, then the guest will ignore the wrong ACPI table
> > and recognize all memory to one node. It's buggy, we should
> > check it to ensure that we write the right data to ACPI table.
> > 
> > Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> > Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > ---
> >  numa.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> 
> > +        if (numa_total != ram_size) {
> > +            error_report("qemu: total memory size for NUMA nodes (%" PRIu64 ")"
> > +                         " should equal to RAM size (" RAM_ADDR_FMT ")\n",
> 
> error_report() should not include trailing \n

Thanks.

> 
> > +                         numa_total, ram_size);
> > +            exit(1);
> 
> Not your fault that this file is full of exit(1), but it would be nice
> to have a cleanup patch someday that uses EXIT_FAILURE.
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


* Re: [Qemu-devel] [PATCH v4 02/29] NUMA: check if the total numa memory size is equal to ram_size
  2014-06-10  2:29     ` Hu Tao
@ 2014-06-10  2:36       ` Eric Blake
  2014-06-10  2:52         ` Hu Tao
  0 siblings, 1 reply; 92+ messages in thread
From: Eric Blake @ 2014-06-10  2:36 UTC (permalink / raw)
  To: Hu Tao
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Igor Mammedov, Yasunori Goto, Wanlong Gao


On 06/09/2014 08:29 PM, Hu Tao wrote:
> On Mon, Jun 09, 2014 at 05:02:51PM -0600, Eric Blake wrote:
>> On 06/09/2014 04:25 AM, Hu Tao wrote:
>>> From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>>>
>>> If the total number of the assigned numa nodes memory is not
>>> equal to the assigned ram size, it will write the wrong data
>>> to ACPI table, then the guest will ignore the wrong ACPI table
>>> and recognize all memory to one node. It's buggy, we should
>>> check it to ensure that we write the right data to ACPI table.
>>>
>>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
>>> ---
>>>  numa.c | 14 ++++++++++++++
>>>  1 file changed, 14 insertions(+)
>>>
>>
>>> +        if (numa_total != ram_size) {
>>> +            error_report("qemu: total memory size for NUMA nodes (%" PRIu64 ")"
>>> +                         " should equal to RAM size (" RAM_ADDR_FMT ")\n",
>>
>> error_report() should not include trailing \n
> 
> Thanks.

Sorry for not noticing earlier, but a couple more things to fix:

error_report already prefixes the error message, so s/qemu: //

The grammar is a bit awkward; how about:

"total memory for NUMA nodes (%" PRIu64 ") should equal RAM size ("
RAM_ADDR_FMT ")"

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH v4 02/29] NUMA: check if the total numa memory size is equal to ram_size
  2014-06-10  2:36       ` Eric Blake
@ 2014-06-10  2:52         ` Hu Tao
  0 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-10  2:52 UTC (permalink / raw)
  To: Eric Blake
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Igor Mammedov, Yasunori Goto, Wanlong Gao

On Mon, Jun 09, 2014 at 08:36:17PM -0600, Eric Blake wrote:
> On 06/09/2014 08:29 PM, Hu Tao wrote:
> > On Mon, Jun 09, 2014 at 05:02:51PM -0600, Eric Blake wrote:
> >> On 06/09/2014 04:25 AM, Hu Tao wrote:
> >>> From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> >>>
> >>> If the total number of the assigned numa nodes memory is not
> >>> equal to the assigned ram size, it will write the wrong data
> >>> to ACPI table, then the guest will ignore the wrong ACPI table
> >>> and recognize all memory to one node. It's buggy, we should
> >>> check it to ensure that we write the right data to ACPI table.
> >>>
> >>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> >>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> >>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> >>> ---
> >>>  numa.c | 14 ++++++++++++++
> >>>  1 file changed, 14 insertions(+)
> >>>
> >>
> >>> +        if (numa_total != ram_size) {
> >>> +            error_report("qemu: total memory size for NUMA nodes (%" PRIu64 ")"
> >>> +                         " should equal to RAM size (" RAM_ADDR_FMT ")\n",
> >>
> >> error_report() should not include trailing \n
> > 
> > Thanks.
> 
> Sorry for not noticing earlier, but a couple more things to fix:
> 
> error_report already prefixes the error message, so s/qemu: //
> 
> The grammar is a bit awkward; how about:
> 
> "total memory for NUMA nodes (%" PRIu64 ") should equal RAM size ("
> RAM_ADDR_FMT ")"

MST already suggested `equal RAM size'. I looked it up in a dictionary and
found an `equal to' example, but I was still wrong: `equal to' is the adjective
form; when `equal' is used as a verb, no `to' follows. Thanks.

Hu
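
For reference, a sketch of the check with the review comments applied: no
"qemu: " prefix and no trailing newline (both are said above to be handled by
error_report()), and "should equal" instead of "should equal to". This is a
standalone illustration: numa_check_total() is a made-up name, snprintf()
stands in for error_report(), and PRIu64 stands in for RAM_ADDR_FMT.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Sum the per-node sizes and compare the total against ram_size.
 * On mismatch, write the message into err and return -1.
 * On success, set err to the empty string and return 0.
 */
int numa_check_total(const uint64_t *node_mem, size_t nb_nodes,
                     uint64_t ram_size, char *err, size_t errlen)
{
    uint64_t numa_total = 0;

    assert(err != NULL && errlen > 0);
    for (size_t i = 0; i < nb_nodes; i++) {
        numa_total += node_mem[i];
    }
    if (numa_total != ram_size) {
        /* No prefix and no trailing newline in the message itself. */
        snprintf(err, errlen,
                 "total memory for NUMA nodes (%" PRIu64 ")"
                 " should equal RAM size (%" PRIu64 ")",
                 numa_total, ram_size);
        return -1;
    }
    err[0] = '\0';
    return 0;
}
```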


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10  2:00     ` Hu Tao
@ 2014-06-10  5:09       ` Paolo Bonzini
  2014-06-10  8:30         ` Hu Tao
  0 siblings, 1 reply; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-10  5:09 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Igor Mammedov, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost


> > > +#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
> > how about naming it after what it really is? "memory-backend-hugepage"
> > Later we could split it into generic superclass mmap-ed
> > "memory-backend-file" and have THP-specific code moved into this backend.
> 
> OK.

Actually I don't think there's anything hugepage-specific in this backend
(except perhaps passing a path instead of a filename).  It could be used
with a tmpfs backing storage like /dev/shm.
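
To illustrate, a minimal sketch of mapping guest RAM from a file created under
a directory such as /dev/shm or a hugetlbfs mount. Nothing in it depends on
huge pages. map_file_in_dir() and the file name template are made up for this
example; it is not the actual file_ram_alloc() code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an unlinked temporary file under dir, grow it to size bytes and
 * map it read/write.  Returns the mapping, or NULL on failure.
 */
void *map_file_in_dir(const char *dir, size_t size)
{
    char path[4096];
    void *p;
    int fd, len;

    assert(dir != NULL && size > 0);
    len = snprintf(path, sizeof(path), "%s/backend-sketch.XXXXXX", dir);
    if (len < 0 || (size_t)len >= sizeof(path)) {
        return NULL;
    }
    fd = mkstemp(path);
    if (fd < 0) {
        return NULL;
    }
    unlink(path);               /* the file lives on until it is unmapped */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping keeps the file alive */
    return p == MAP_FAILED ? NULL : p;
}
```

On a hugetlbfs mount the size would additionally have to be a multiple of the
huge page size; the sketch leaves that out.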

Paolo


* Re: [Qemu-devel] [PATCH v4 24/29] Introduce signed range.
  2014-06-09 10:59     ` Michael S. Tsirkin
@ 2014-06-10  6:51       ` Hu Tao
  2014-06-10  9:50         ` Michael S. Tsirkin
  0 siblings, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-10  6:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Eduardo Habkost, Yasunori Goto, QEMU Developers,
	Paolo Bonzini, Igor Mammedov

On Mon, Jun 09, 2014 at 01:59:04PM +0300, Michael S. Tsirkin wrote:
> On Mon, Jun 09, 2014 at 11:42:14AM +0100, Peter Maydell wrote:
> > On 9 June 2014 11:25, Hu Tao <hutao@cn.fujitsu.com> wrote:
> > > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > > ---
> > >  include/qemu/range.h | 124 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 124 insertions(+)
> > >
> > > diff --git a/include/qemu/range.h b/include/qemu/range.h
> > > index aae9720..8879f8a 100644
> > > --- a/include/qemu/range.h
> > > +++ b/include/qemu/range.h
> > > @@ -3,6 +3,7 @@
> > >
> > >  #include <inttypes.h>
> > >  #include <qemu/typedefs.h>
> > > +#include "qemu/queue.h"
> > >
> > >  /*
> > >   * Operations on 64 bit address ranges.
> > > @@ -60,4 +61,127 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
> > >      return !(last2 < first1 || last1 < first2);
> > >  }
> > >
> > > +typedef struct SignedRangeList SignedRangeList;
> > > +
> > > +typedef struct SignedRange {
> > > +    int64_t start;
> > > +    int64_t length;
> > > +
> > > +    QTAILQ_ENTRY(SignedRange) entry;
> > > +} SignedRange;
> > > +
> > > +QTAILQ_HEAD(SignedRangeList, SignedRange);
> > 
> > This seems to be missing documentation about what the
> > semantics are and why we need it as well as the standard
> > Range. For instance, what does a SignedRange with a
> > negative length mean?
> > 
> > thanks
> > -- PMM
> 
> 
> Yes I also don't care for list macros being mixed in with structure.
> 
> Also, numa surely uses positive numbers? why do you want
> signed values?

It's not purely for NUMA; it is also for parsing int lists, as in the string
input/output visitors.

Hu
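
For reference, a rough sketch of the kind of int list parsing meant here:
turning a string such as "0-3,8" into (start, length) ranges, with negative
values allowed. IntRange and parse_int_list() are hypothetical names for this
example only; it is not the visitor code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct IntRange {
    int64_t start;
    int64_t length;   /* number of values covered, always >= 1 */
} IntRange;

/*
 * Parse a list such as "0-3,8" or "-2-1" into ranges.
 * Returns the number of ranges written to out[], or -1 on a syntax error,
 * a descending range, or more than max ranges.
 * Overflow of the length for extreme ranges is not handled in this sketch.
 */
int parse_int_list(const char *s, IntRange *out, size_t max)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (*s) {
        char *end;
        long long lo, hi;

        errno = 0;
        lo = strtoll(s, &end, 10);
        if (end == s || errno) {
            return -1;
        }
        hi = lo;
        if (*end == '-') {          /* "lo-hi" form */
            s = end + 1;
            hi = strtoll(s, &end, 10);
            if (end == s || errno || hi < lo) {
                return -1;
            }
        }
        if (n == max) {
            return -1;
        }
        out[n].start = lo;
        out[n].length = hi - lo + 1;
        n++;
        if (*end == ',') {
            end++;
            if (*end == '\0') {
                return -1;          /* trailing comma */
            }
        } else if (*end != '\0') {
            return -1;              /* unexpected character */
        }
        s = end;
    }
    return (int)n;
}
```

The only place the signedness matters is the start value; the length is never
negative, which is one answer to the question about negative lengths.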


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10  5:09       ` Paolo Bonzini
@ 2014-06-10  8:30         ` Hu Tao
  2014-06-10  8:56           ` Paolo Bonzini
  2014-06-10  9:07           ` Igor Mammedov
  0 siblings, 2 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-10  8:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Yasunori Goto, Igor Mammedov, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Tue, Jun 10, 2014 at 01:09:32AM -0400, Paolo Bonzini wrote:
> 
> > > > +#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
> > > how about naming it after what it really is? "memory-backend-hugepage"
> > > Later we could split it into generic superclass mmap-ed
> > > "memory-backend-file" and have TPH specific code moved into this backend.
> > 
> > OK.
> 
> Actually I don't think there's anything hugepage-specific in this backend
> (except perhaps passing a path instead of a filename).  It could be used
> with a tmpfs backing storage like /dev/shm.

What's the point compared to memory-backend-ram?

Igor suggested memory-backend-file be compiled only for Linux. Does this mean
memory-backend-file should also be compiled for systems supporting tmpfs
or the like?

Regards,
Hu


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10  8:30         ` Hu Tao
@ 2014-06-10  8:56           ` Paolo Bonzini
  2014-06-10  9:21             ` Hu Tao
  2014-06-10  9:59             ` Michael S. Tsirkin
  2014-06-10  9:07           ` Igor Mammedov
  1 sibling, 2 replies; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-10  8:56 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Igor Mammedov, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

Il 10/06/2014 10:30, Hu Tao ha scritto:
> On Tue, Jun 10, 2014 at 01:09:32AM -0400, Paolo Bonzini wrote:
>>
>>>>> +#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
>>>> how about naming it after what it really is? "memory-backend-hugepage"
>>>> Later we could split it into generic superclass mmap-ed
>>>> "memory-backend-file" and have THP-specific code moved into this backend.
>>>
>>> OK.
>>
>> Actually I don't think there's anything hugepage-specific in this backend
>> (except perhaps passing a path instead of a filename).  It could be used
>> with a tmpfs backing storage like /dev/shm.
>
> What's the point compared to memory-backend-ram?

That you can use shared memory, for example together with vhost-user.
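
A small sketch of why the mapping flags matter for that use case: a write
through a MAP_SHARED mapping is visible through a second mapping of the same
file, which plays the role of the other process here, while a write through a
MAP_PRIVATE mapping is not. write_is_visible() and the /tmp template are
assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map one temporary file twice: once with the given flags (MAP_SHARED or
 * MAP_PRIVATE) for writing, and once read-only with MAP_SHARED as the
 * "other side".  Returns 1 if a write through the first mapping is seen
 * through the second, 0 if it is not, and -1 on setup failure.
 */
int write_is_visible(int flags)
{
    char path[] = "/tmp/shm-sketch.XXXXXX";
    const size_t size = 4096;
    unsigned char *writer, *reader;
    int fd, visible;

    assert(flags == MAP_SHARED || flags == MAP_PRIVATE);
    fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    writer = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, fd, 0);
    reader = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (writer == MAP_FAILED || reader == MAP_FAILED) {
        if (writer != MAP_FAILED) {
            munmap(writer, size);
        }
        if (reader != MAP_FAILED) {
            munmap(reader, size);
        }
        return -1;
    }
    writer[0] = 0x5a;                 /* private mappings copy on write */
    visible = (reader[0] == 0x5a);
    munmap(writer, size);
    munmap(reader, size);
    return visible;
}
```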

> Igor suggested memory-backend-file be compiled only for Linux. Does this mean
> memory-backend-file should also be compiled for systems supporting tmpfs
> or the like?

Yes, I think it should be compiled on all POSIX systems.  But it can be 
done later.

Paolo


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10  8:30         ` Hu Tao
  2014-06-10  8:56           ` Paolo Bonzini
@ 2014-06-10  9:07           ` Igor Mammedov
  2014-06-10  9:54             ` Michael S. Tsirkin
  1 sibling, 1 reply; 92+ messages in thread
From: Igor Mammedov @ 2014-06-10  9:07 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Paolo Bonzini, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Tue, 10 Jun 2014 16:30:06 +0800
Hu Tao <hutao@cn.fujitsu.com> wrote:

> On Tue, Jun 10, 2014 at 01:09:32AM -0400, Paolo Bonzini wrote:
> > 
> > > > > +#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
> > > > how about naming it after what it really is? "memory-backend-hugepage"
> > > > Later we could split it into generic superclass mmap-ed
> > > > "memory-backend-file" and have TPH specific code moved into this backend.
> > > 
> > > OK.
> > 
> > Actually I don't think there's anything hugepage-specific in this backend
> > (except perhaps passing a path instead of a filename).  It could be used
> > with a tmpfs backing storage like /dev/shm.
> 
> What's the point compared to memory-backend-ram?
> 
> Igor suggested memory-backend-file be compiled only for Linux. Does this mean
> memory-backend-file should also be compiled for systems supporting tmpfs
> or the like?
I was too hasty with this suggestion. Looking again at file_ram_alloc() behind
the scenes, for now it works only with THP (gethugepagesize()), but
it could be modified to run on non-Linux hosts as well and take /dev/shm or
just any file on the host as backing storage.


> 
> Regards,
> Hu


-- 
Regards,
  Igor


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10  8:56           ` Paolo Bonzini
@ 2014-06-10  9:21             ` Hu Tao
  2014-06-10  9:59             ` Michael S. Tsirkin
  1 sibling, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-10  9:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Yasunori Goto, Igor Mammedov, Michael S. Tsirkin, qemu-devel,
	Eduardo Habkost

On Tue, Jun 10, 2014 at 10:56:42AM +0200, Paolo Bonzini wrote:
> Il 10/06/2014 10:30, Hu Tao ha scritto:
> >On Tue, Jun 10, 2014 at 01:09:32AM -0400, Paolo Bonzini wrote:
> >>
> >>>>>+#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
> >>>>how about naming it after what it really is? "memory-backend-hugepage"
> >>>>Later we could split it into generic superclass mmap-ed
> >>>>"memory-backend-file" and have THP-specific code moved into this backend.
> >>>
> >>>OK.
> >>
> >>Actually I don't think there's anything hugepage-specific in this backend
> >>(except perhaps passing a path instead of a filename).  It could be used
> >>with a tmpfs backing storage like /dev/shm.
> >
> >What's the point compared to memory-backend-ram?
> 
> That you can use shared memory, for example together with vhost-user.
> 
> >Igor suggested memory-backend-file be compiled only for Linux. Does this mean
> >memory-backend-file should also be compiled for systems supporting tmpfs
> >or the like?
> 
> Yes, I think it should be compiled on all POSIX systems.  But it can
> be done later.

OK. I'll leave the patch as is.

Hu


* Re: [Qemu-devel] [PATCH v4 24/29] Introduce signed range.
  2014-06-10  6:51       ` Hu Tao
@ 2014-06-10  9:50         ` Michael S. Tsirkin
  0 siblings, 0 replies; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-10  9:50 UTC (permalink / raw)
  To: Hu Tao
  Cc: Peter Maydell, Eduardo Habkost, Yasunori Goto, QEMU Developers,
	Paolo Bonzini, Igor Mammedov

On Tue, Jun 10, 2014 at 02:51:49PM +0800, Hu Tao wrote:
> On Mon, Jun 09, 2014 at 01:59:04PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Jun 09, 2014 at 11:42:14AM +0100, Peter Maydell wrote:
> > > On 9 June 2014 11:25, Hu Tao <hutao@cn.fujitsu.com> wrote:
> > > > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> > > > ---
> > > >  include/qemu/range.h | 124 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 124 insertions(+)
> > > >
> > > > diff --git a/include/qemu/range.h b/include/qemu/range.h
> > > > index aae9720..8879f8a 100644
> > > > --- a/include/qemu/range.h
> > > > +++ b/include/qemu/range.h
> > > > @@ -3,6 +3,7 @@
> > > >
> > > >  #include <inttypes.h>
> > > >  #include <qemu/typedefs.h>
> > > > +#include "qemu/queue.h"
> > > >
> > > >  /*
> > > >   * Operations on 64 bit address ranges.
> > > > @@ -60,4 +61,127 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
> > > >      return !(last2 < first1 || last1 < first2);
> > > >  }
> > > >
> > > > +typedef struct SignedRangeList SignedRangeList;
> > > > +
> > > > +typedef struct SignedRange {
> > > > +    int64_t start;
> > > > +    int64_t length;
> > > > +
> > > > +    QTAILQ_ENTRY(SignedRange) entry;
> > > > +} SignedRange;
> > > > +
> > > > +QTAILQ_HEAD(SignedRangeList, SignedRange);
> > > 
> > > This seems to be missing documentation about what the
> > > semantics are and why we need it as well as the standard
> > > Range. For instance, what does a SignedRange with a
> > > negative length mean?
> > > 
> > > thanks
> > > -- PMM
> > 
> > 
> > Yes I also don't care for list macros being mixed in with structure.
> > 
> > Also, numa surely uses positive numbers? why do you want
> > signed values?
> 
> It's not purely for numa but for parsing int list like in string
> input/output visitor.
> 
> Hu

I doubt we need negative ranges anywhere.
Let's stick to the minimum functionality necessary for NUMA;
this patchset is big as is.
I would be happier if we could drop this patch completely and
use the existing unsigned range APIs. If you need a list struct,
add it within the file that uses it.

Having said that we can clean this up afterwards.
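
As a reminder of what the unsigned helpers look like, here is a standalone
sketch modelled on the ranges_overlap() context line quoted above. The names
range_last() and ranges_overlap_sketch() are made up so that the example does
not claim to be the exact range.h code.

```c
#include <assert.h>
#include <stdint.h>

/* Last value covered by a range starting at first; len must be non-zero. */
uint64_t range_last(uint64_t first, uint64_t len)
{
    assert(len > 0);
    return first + len - 1;
}

/* Non-zero if the two ranges share at least one value. */
int ranges_overlap_sketch(uint64_t first1, uint64_t len1,
                          uint64_t first2, uint64_t len2)
{
    uint64_t last1 = range_last(first1, len1);
    uint64_t last2 = range_last(first2, len2);

    /* Same test as the context line in the quoted diff. */
    return !(last2 < first1 || last1 < first2);
}
```

Working with the last value instead of the end keeps a range that reaches
UINT64_MAX representable without overflow.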

-- 
MST


* Re: [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements
  2014-06-09 11:40   ` Michael S. Tsirkin
  2014-06-10  1:55     ` Hu Tao
@ 2014-06-10  9:51     ` Hu Tao
  2014-06-10  9:56       ` Michael S. Tsirkin
  1 sibling, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-10  9:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Paolo Bonzini, Igor Mammedov, qemu-devel, Eduardo Habkost

On Mon, Jun 09, 2014 at 02:40:30PM +0300, Michael S. Tsirkin wrote:
> On Mon, Jun 09, 2014 at 01:30:05PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Jun 09, 2014 at 06:25:05PM +0800, Hu Tao wrote:
> > > note: this series is based on MST's pci tree.
> > 
> > No, please rebase on top of numa branch in my tree, not on
> > pci branch.
> > I applied a bunch of your there and don't want
> > spend time going over them again.
> > 
> 
> If you want me to drop some patches, pls mention this
> in the cover letter. But you don't need to keep
> reposting everything.

4 patches are to be dropped from your numa tree. Also, do you mind the patches
being reordered? I rebased v5 on commit 7a3af0813f9, and the patches keep the
same order as in v4. The patches with your Acked-by line are cherry-picked
from your numa tree without conflicts. Are you OK with this?

Hu


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10  9:07           ` Igor Mammedov
@ 2014-06-10  9:54             ` Michael S. Tsirkin
  0 siblings, 0 replies; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-10  9:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Paolo Bonzini

On Tue, Jun 10, 2014 at 11:07:35AM +0200, Igor Mammedov wrote:
> On Tue, 10 Jun 2014 16:30:06 +0800
> Hu Tao <hutao@cn.fujitsu.com> wrote:
> 
> > On Tue, Jun 10, 2014 at 01:09:32AM -0400, Paolo Bonzini wrote:
> > > 
> > > > > > +#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
> > > > > how about naming it after what it really is? "memory-backend-hugepage"
> > > > > Later we could split it into generic superclass mmap-ed
> > > > > "memory-backend-file" and have THP-specific code moved into this backend.
> > > > 
> > > > OK.
> > > 
> > > Actually I don't think there's anything hugepage-specific in this backend
> > > (except perhaps passing a path instead of a filename).  It could be used
> > > with a tmpfs backing storage like /dev/shm.
> > 
> > What's the point compared to memory-backend-ram?
> > 
> > Igor suggested memory-backend-file be compiled only for Linux. Does this mean
> > memory-backend-file should also be compiled for systems supporting tmpfs
> > or the like?
> I was too hasty with this suggestion, looking again at behind scenes
> file_ram_alloc(), for now it works only with THP

You mean Hugetlbfs I guess, not THP?

> /gethugepagesize()/ but
> it could be modified to run on non linux hosts as well and take /dev/shm or
> just any file on host as backing storage.

Yes, however there's a problem: on Linux, THP does not work with
non-anonymous memory at the moment.
So using this feature would slow everything down as you get more
TLB misses. That would be quite unexpected for users.
Requiring hugetlbfs follows the principle of least surprise.
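
One way such a requirement could be checked, sketched under the assumption of
a Linux host: statfs() the path and compare the filesystem magic against the
hugetlbfs value. backing_page_size() and the macro name are made up for this
example; the magic number is the one listed for hugetlbfs in <linux/magic.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/vfs.h>
#include <unistd.h>

/* hugetlbfs filesystem magic, copied here to avoid needing kernel headers. */
#define HUGETLBFS_MAGIC_SKETCH 0x958458f6u

/*
 * Returns the huge page size if path is on hugetlbfs, the normal page size
 * for any other filesystem, or -1 if statfs() fails.
 */
long backing_page_size(const char *path)
{
    struct statfs fs;

    assert(path != NULL);
    if (statfs(path, &fs) != 0) {
        return -1;
    }
    if ((uint32_t)fs.f_type == HUGETLBFS_MAGIC_SKETCH) {
        return (long)fs.f_bsize;   /* block size is the huge page size */
    }
    return sysconf(_SC_PAGESIZE);
}
```

A caller could refuse, or only warn, when the result equals the normal page
size; which of the two is right is exactly the policy question above.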

> 
> > 
> > Regards,
> > Hu
> 
> 
> -- 
> Regards,
>   Igor


* Re: [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements
  2014-06-10  9:51     ` Hu Tao
@ 2014-06-10  9:56       ` Michael S. Tsirkin
  2014-06-10  9:57         ` Hu Tao
  2014-06-10 10:19         ` Hu Tao
  0 siblings, 2 replies; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-10  9:56 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Paolo Bonzini, Igor Mammedov, qemu-devel, Eduardo Habkost

On Tue, Jun 10, 2014 at 05:51:48PM +0800, Hu Tao wrote:
> On Mon, Jun 09, 2014 at 02:40:30PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Jun 09, 2014 at 01:30:05PM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Jun 09, 2014 at 06:25:05PM +0800, Hu Tao wrote:
> > > > note: this series is based on MST's pci tree.
> > > 
> > > No, please rebase on top of numa branch in my tree, not on
> > > pci branch.
> > > > I applied a bunch of your patches there and don't want
> > > > to spend time going over them again.
> > > 
> > 
> > If you want me to drop some patches, pls mention this
> > in the cover letter. But you don't need to keep
> > reposting everything.
> 
> 4 patches dropped from your numa tree.

OK, which ones?

> And do you mind patches
> reordering? I rebased v5 on commit 7a3af0813f9, the patches keep in
> order as in v4. Those patches with your Acked-by line are cherry-picked
> from your numa tree without conflict. Are you OK with this?
> 
> Hu

I'd like to avoid reposting the same patches.
Can't you rebase on top of 0b85e768de158cf815003638df228ba1424c6c3b?
What's the problem?


-- 
MST


* Re: [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements
  2014-06-10  9:56       ` Michael S. Tsirkin
@ 2014-06-10  9:57         ` Hu Tao
  2014-06-10 10:19         ` Hu Tao
  1 sibling, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-10  9:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Paolo Bonzini, Igor Mammedov, qemu-devel, Eduardo Habkost

On Tue, Jun 10, 2014 at 12:56:56PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 10, 2014 at 05:51:48PM +0800, Hu Tao wrote:
> > On Mon, Jun 09, 2014 at 02:40:30PM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Jun 09, 2014 at 01:30:05PM +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Jun 09, 2014 at 06:25:05PM +0800, Hu Tao wrote:
> > > > > note: this series is based on MST's pci tree.
> > > > 
> > > > No, please rebase on top of numa branch in my tree, not on
> > > > pci branch.
> > > > > I applied a bunch of your patches there and don't want
> > > > > to spend time going over them again.
> > > > 
> > > 
> > > If you want me to drop some patches, pls mention this
> > > in the cover letter. But you don't need to keep
> > > reposting everything.
> > 
> > 4 patches dropped from your numa tree.
> 
> OK, which ones?
> 
> > And do you mind patches
> > reordering? I rebased v5 on commit 7a3af0813f9, the patches keep in
> > order as in v4. Those patches with your Acked-by line are cherry-picked
> > from your numa tree without conflict. Are you OK with this?
> > 
> > Hu
> 
> I'd like to avoid reposting same patches.
> Can't you rebase on top of 0b85e768de158cf815003638df228ba1424c6c3b?
> What's the problem?

Let me try.

Hu


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10  8:56           ` Paolo Bonzini
  2014-06-10  9:21             ` Hu Tao
@ 2014-06-10  9:59             ` Michael S. Tsirkin
  2014-06-10 11:12               ` Paolo Bonzini
  1 sibling, 1 reply; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-10  9:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Igor Mammedov

On Tue, Jun 10, 2014 at 10:56:42AM +0200, Paolo Bonzini wrote:
> Il 10/06/2014 10:30, Hu Tao ha scritto:
> >On Tue, Jun 10, 2014 at 01:09:32AM -0400, Paolo Bonzini wrote:
> >>
> >>>>>+#define TYPE_MEMORY_BACKEND_FILE "memory-backend-file"
> >>>>how about naming it after what it really is? "memory-backend-hugepage"
> >>>>Later we could split it into generic superclass mmap-ed
> >>>>"memory-backend-file" and have THP-specific code moved into this backend.
> >>>
> >>>OK.
> >>
> >>Actually I don't think there's anything hugepage-specific in this backend
> >>(except perhaps passing a path instead of a filename).  It could be used
> >>with a tmpfs backing storage like /dev/shm.
> >
> >What's the point compared to memory-backend-ram?
> 
> That you can use shared memory, for example together with vhost-user.

I don't think it's a good idea until THP supports shared memory.

> >Igor suggested memory-backend-file be compiled only for Linux. Does this mean
> >memory-backend-file should also be compiled for systems supporting tmpfs
> >or the like?
> 
> Yes, I think it should be compiled on all POSIX systems.  But it can be done
> later.
> 
> Paolo

For example when someone actually requests this :).
Which is unlikely to happen soon I think.

-- 
MST


* Re: [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements
  2014-06-10  9:56       ` Michael S. Tsirkin
  2014-06-10  9:57         ` Hu Tao
@ 2014-06-10 10:19         ` Hu Tao
  2014-06-10 10:27           ` Michael S. Tsirkin
  1 sibling, 1 reply; 92+ messages in thread
From: Hu Tao @ 2014-06-10 10:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Paolo Bonzini, Igor Mammedov, qemu-devel, Eduardo Habkost

On Tue, Jun 10, 2014 at 12:56:56PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 10, 2014 at 05:51:48PM +0800, Hu Tao wrote:
> > On Mon, Jun 09, 2014 at 02:40:30PM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Jun 09, 2014 at 01:30:05PM +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Jun 09, 2014 at 06:25:05PM +0800, Hu Tao wrote:
> > > > > note: this series is based on MST's pci tree.
> > > > 
> > > > No, please rebase on top of numa branch in my tree, not on
> > > > pci branch.
> > > > > I applied a bunch of your patches there and don't want
> > > > > to spend time going over them again.
> > > > 
> > > 
> > > If you want me to drop some patches, pls mention this
> > > in the cover letter. But you don't need to keep
> > > reposting everything.
> > 
> > 4 patches dropped from your numa tree.
> 
> OK, which ones?

b36ea0f3c7eae45b71bd238fe5a3ade58ee3b1f8  NUMA: check if the total numa memory size is equal to ram_size

bf40b6aa334d9cc92ca2be6db743f009f20206e7  qmp: improve error reporting for -object and object-add

bc95a49358d37b0079435ad1a54ac35e324b1ab5  numa: introduce memory_region_allocate_system_memory

e517f31225795983f7cbaedb93a07d930bd8f939  numa: add -numa node,memdev= option


These 4 patches were modified according to the comments on v4. They are in
the middle of your numa tree, so how can I rebase them?



> 
> > And do you mind patches
> > reordering? I rebased v5 on commit 7a3af0813f9, the patches keep in
> > order as in v4. Those patches with your Acked-by line are cherry-picked
> > from your numa tree without conflict. Are you OK with this?
> > 
> > Hu
> 
> I'd like to avoid reposting same patches.
> Can't you rebase on top of 0b85e768de158cf815003638df228ba1424c6c3b?
> What's the problem?
> 
> 
> -- 
> MST


* Re: [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements
  2014-06-10 10:19         ` Hu Tao
@ 2014-06-10 10:27           ` Michael S. Tsirkin
  2014-06-10 11:09             ` Hu Tao
  0 siblings, 1 reply; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-10 10:27 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Paolo Bonzini, Igor Mammedov, qemu-devel, Eduardo Habkost

On Tue, Jun 10, 2014 at 06:19:35PM +0800, Hu Tao wrote:
> On Tue, Jun 10, 2014 at 12:56:56PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 10, 2014 at 05:51:48PM +0800, Hu Tao wrote:
> > > On Mon, Jun 09, 2014 at 02:40:30PM +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Jun 09, 2014 at 01:30:05PM +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Jun 09, 2014 at 06:25:05PM +0800, Hu Tao wrote:
> > > > > > note: this series is based on MST's pci tree.
> > > > > 
> > > > > No, please rebase on top of numa branch in my tree, not on
> > > > > pci branch.
> > > > > > I applied a bunch of your patches there and don't want
> > > > > > to spend time going over them again.
> > > > > 
> > > > 
> > > > If you want me to drop some patches, pls mention this
> > > > in the cover letter. But you don't need to keep
> > > > reposting everything.
> > > 
> > > 4 patches dropped from your numa tree.
> > 
> > OK, which ones?
> 
> b36ea0f3c7eae45b71bd238fe5a3ade58ee3b1f8  NUMA: check if the total numa memory size is equal to ram_size
> 
> bf40b6aa334d9cc92ca2be6db743f009f20206e7  qmp: improve error reporting for -object and object-add
> 
> bc95a49358d37b0079435ad1a54ac35e324b1ab5  numa: introduce memory_region_allocate_system_memory
> 
> e517f31225795983f7cbaedb93a07d930bd8f939  numa: add -numa node,memdev= option
> 
> 
> These 4 patches were modified according to the comments on v4. They are in
> the middle of your numa tree, so how can I rebase them?

Ah so they are modified, not dropped.
Send patches on top as part of your series, with subjects like this:

fixup! NUMA: check if the total numa memory size is equal to ram_size

When I apply, the fixups automatically float up.

> 
> 
> > 
> > > And do you mind patches
> > > reordering? I rebased v5 on commit 7a3af0813f9, the patches keep in
> > > order as in v4. Those patches with your Acked-by line are cherry-picked
> > > from your numa tree without conflict. Are you OK with this?
> > > 
> > > Hu
> > 
> > I'd like to avoid reposting same patches.
> > Can't you rebase on top of 0b85e768de158cf815003638df228ba1424c6c3b?
> > What's the problem?
> > 
> > 
> > -- 
> > MST


* Re: [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements
  2014-06-10 10:27           ` Michael S. Tsirkin
@ 2014-06-10 11:09             ` Hu Tao
  0 siblings, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-10 11:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Paolo Bonzini, Igor Mammedov, qemu-devel, Eduardo Habkost

On Tue, Jun 10, 2014 at 01:27:07PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 10, 2014 at 06:19:35PM +0800, Hu Tao wrote:
> > On Tue, Jun 10, 2014 at 12:56:56PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 10, 2014 at 05:51:48PM +0800, Hu Tao wrote:
> > > > On Mon, Jun 09, 2014 at 02:40:30PM +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Jun 09, 2014 at 01:30:05PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Mon, Jun 09, 2014 at 06:25:05PM +0800, Hu Tao wrote:
> > > > > > > note: this series is based on MST's pci tree.
> > > > > > 
> > > > > > No, please rebase on top of numa branch in my tree, not on
> > > > > > pci branch.
> > > > > > > I applied a bunch of your patches there and don't want
> > > > > > > to spend time going over them again.
> > > > > > 
> > > > > 
> > > > > If you want me to drop some patches, pls mention this
> > > > > in the cover letter. But you don't need to keep
> > > > > reposting everything.
> > > > 
> > > > 4 patches dropped from your numa tree.
> > > 
> > > OK, which ones?
> > 
> > b36ea0f3c7eae45b71bd238fe5a3ade58ee3b1f8  NUMA: check if the total numa memory size is equal to ram_size
> > 
> > bf40b6aa334d9cc92ca2be6db743f009f20206e7  qmp: improve error reporting for -object and object-add
> > 
> > bc95a49358d37b0079435ad1a54ac35e324b1ab5  numa: introduce memory_region_allocate_system_memory

Sorry, this one only changed because of patch reordering; it is otherwise
untouched. So there are 3 fixups.

> > 
> > e517f31225795983f7cbaedb93a07d930bd8f939  numa: add -numa node,memdev= option
> > 
> > 
> > These 4 patches were modified according to the comments on v4. They are in
> > the middle of your numa tree, so how can I rebase them?
> 
> Ah so they are modified, not dropped.
> Send patches on top as part of your series, with subjects like this:
> 
> fixup! NUMA: check if the total numa memory size is equal to ram_size
> 
> When I apply, the fixups automatically float up.
> 
> > 
> > 
> > > 
> > > > And do you mind patches
> > > > reordering? I rebased v5 on commit 7a3af0813f9, the patches keep in
> > > > order as in v4. Those patches with your Acked-by line are cherry-picked
> > > > from your numa tree without conflict. Are you OK with this?
> > > > 
> > > > Hu
> > > 
> > > I'd like to avoid reposting same patches.
> > > Can't you rebase on top of 0b85e768de158cf815003638df228ba1424c6c3b?
> > > What's the problem?
> > > 
> > > 
> > > -- 
> > > MST


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10  9:59             ` Michael S. Tsirkin
@ 2014-06-10 11:12               ` Paolo Bonzini
  2014-06-10 11:23                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-10 11:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Igor Mammedov

Il 10/06/2014 11:59, Michael S. Tsirkin ha scritto:
> > >What's the point compared to memory-backend-ram?
> >
> > That you can use shared memory, for example together with vhost-user.
>
> I don't think it's a good idea until THP supports shared memory.

Why?  For example it would be useful for testing on machines that you 
don't have root for, and that do not have a hugetlbfs mount point.  For 
example you could run the test case from the vhost-user's patches.
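
As a rough illustration of what "shared memory" means here (a sketch only,
not QEMU's file_ram_alloc(); the helper name map_shared_backing and the
backing.XXXXXX template are made up for this example), a file under a tmpfs
directory such as /dev/shm can be mapped MAP_SHARED, and the descriptor can
then be handed to another process that maps the same pages:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create an unlinked temporary file under 'dir'
 * (for example /dev/shm), size it, and map it MAP_SHARED.
 * Returns the mapping, or NULL on failure.  On success *fd_out holds the
 * descriptor, which the caller may pass to a peer and must close.
 */
void *map_shared_backing(const char *dir, size_t size, int *fd_out)
{
    char path[4096];
    void *area;
    int fd;
    int n = snprintf(path, sizeof(path), "%s/backing.XXXXXX", dir);

    if (n < 0 || (size_t)n >= sizeof(path)) {
        return NULL;                /* directory name too long */
    }
    fd = mkstemp(path);
    if (fd < 0) {
        return NULL;
    }
    unlink(path);   /* the file lives on only through fd and the mapping */

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (area == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return area;
}
```

Nothing in this sketch needs root or a hugetlbfs mount; the price is that
the mapping is backed by normal-sized pages.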

THP is not a magic wand and you can get slowness from memory 
fragmentation at any time.  We should not limit ourselves due to kernel 
bugs.

Paolo


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10 11:12               ` Paolo Bonzini
@ 2014-06-10 11:23                 ` Michael S. Tsirkin
  2014-06-10 11:29                   ` Paolo Bonzini
  0 siblings, 1 reply; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-10 11:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Igor Mammedov

On Tue, Jun 10, 2014 at 01:12:04PM +0200, Paolo Bonzini wrote:
> Il 10/06/2014 11:59, Michael S. Tsirkin ha scritto:
> >> >What's the point compared to memory-backend-ram?
> >>
> >> That you can use shared memory, for example together with vhost-user.
> >
> >I don't think it's a good idea until THP supports shared memory.
> 
> Why?  For example it would be useful for testing on machines that you don't
> have root for, and that do not have a hugetlbfs mount point.  For example
> you could run the test case from the vhost-user's patches.

Sounds useful, I guess we could allow this when running under qtest.

> THP is not a magic wand and you can get slowness from memory fragmentation
> at any time.

Right but there's a difference between "can get slowness when memory
is overcommitted" and "will get slowness even on a mostly idle box".

> We should not limit ourselves due to kernel bugs.
> 
> Paolo

Why not?  Practically, people do have to run this on some kernel;
we should not use the kernel in a way that it can't support well.
Old Firefox doing a ton of fsync calls and slowing
the box to a crawl comes to mind as another example of this.

-- 
MST


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10 11:23                 ` Michael S. Tsirkin
@ 2014-06-10 11:29                   ` Paolo Bonzini
  2014-06-10 11:35                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-10 11:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Igor Mammedov

Il 10/06/2014 13:23, Michael S. Tsirkin ha scritto:
> On Tue, Jun 10, 2014 at 01:12:04PM +0200, Paolo Bonzini wrote:
>> Il 10/06/2014 11:59, Michael S. Tsirkin ha scritto:
>>>>> What's the point compared to memory-backend-ram?
>>>>
>>>> That you can use shared memory, for example together with vhost-user.
>>>
>>> I don't think it's a good idea until THP supports shared memory.
>>
>> Why?  For example it would be useful for testing on machines that you don't
>> have root for, and that do not have a hugetlbfs mount point.  For example
>> you could run the test case from the vhost-user's patches.
>
> Sounds useful, I guess we could allow this when running under qtest.

Or TCG, or Xen.  At this point, why single out KVM?

(Also, "--enable-kvm -mem-path /dev/shm" works on 2.0, and it would be a 
regression in 2.1).

>> THP is not a magic wand and you can get slowness from memory fragmentation
>> at any time.
>
> Right but there's a difference between "can get slowness when memory
> is overcommitted" and "will get slowness even on a mostly idle box".

I would like to see the slowness on a real-world benchmark though.  I 
suspect in most scenarios it would not matter.

>> We should not limit ourselves due to kernel bugs.
>
> Why not?  Practically people do have to run this on some kernel,
> we should not use kernel in a way that it can't support well.
> Old firefox doing a ton of fsync commands and slowing
> the box to a crawl comes to mind as another example of this.

Yes, and firefox doesn't say "no sorry can't do it" when running on such 
a kernel (which is much worse than more expensive TLB misses).

Paolo


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10 11:29                   ` Paolo Bonzini
@ 2014-06-10 11:35                     ` Michael S. Tsirkin
  2014-06-10 11:43                       ` Paolo Bonzini
  0 siblings, 1 reply; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-10 11:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Igor Mammedov

On Tue, Jun 10, 2014 at 01:29:06PM +0200, Paolo Bonzini wrote:
> Il 10/06/2014 13:23, Michael S. Tsirkin ha scritto:
> >On Tue, Jun 10, 2014 at 01:12:04PM +0200, Paolo Bonzini wrote:
> >>Il 10/06/2014 11:59, Michael S. Tsirkin ha scritto:
> >>>>>What's the point compared to memory-backend-ram?
> >>>>
> >>>>That you can use shared memory, for example together with vhost-user.
> >>>
> >>>I don't think it's a good idea until THP supports shared memory.
> >>
> >>Why?  For example it would be useful for testing on machines that you don't
> >>have root for, and that do not have a hugetlbfs mount point.  For example
> >>you could run the test case from the vhost-user's patches.
> >
> >Sounds useful, I guess we could allow this when running under qtest.
> 
> Or TCG, or Xen.  At this point, why single out KVM?
> 
> (Also, "--enable-kvm -mem-path /dev/shm" works on 2.0, and it would be a
> regression in 2.1).

It prints
        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);

Correct?
I guess I agree then, hopefully the warning is enough.
Maybe add an extra warning that performance will suffer.
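
For reference, the check behind that warning boils down to comparing the
filesystem magic returned by statfs() with the hugetlbfs one. A minimal
sketch, assuming Linux (the helper name path_is_hugetlbfs and the wording of
the second warning are made up here; the magic value is the one the kernel
headers export as HUGETLBFS_MAGIC in <linux/magic.h>):

```c
#include <stdio.h>
#include <sys/vfs.h>

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6
#endif

/*
 * Hypothetical helper.  Returns 1 if 'path' is on hugetlbfs, 0 if it is
 * on some other filesystem (after printing warnings), and -1 if statfs()
 * fails.
 */
int path_is_hugetlbfs(const char *path)
{
    struct statfs fs;

    if (statfs(path, &fs) != 0) {
        perror(path);
        return -1;
    }
    if ((unsigned long)fs.f_type != HUGETLBFS_MAGIC) {
        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
        /* Illustrative extra warning; the text is an assumption. */
        fprintf(stderr, "Warning: normal-sized pages will be used, "
                        "expect more TLB misses\n");
        return 0;
    }
    return 1;
}
```

This only warns and carries on, so a path like /dev/shm keeps working as
before.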

> >>THP is not a magic wand and you can get slowness from memory fragmentation
> >>at any time.
> >
> >Right but there's a difference between "can get slowness when memory
> >is overcommitted" and "will get slowness even on a mostly idle box".
> 
> I would like to see the slowness on a real-world benchmark though.  I
> suspect in most scenarios it would not matter.

Weird.  Things like kernel build time are known to be measurably
improved by using THP.

> >>We should not limit ourselves due to kernel bugs.
> >
> >Why not?  Practically people do have to run this on some kernel,
> >we should not use kernel in a way that it can't support well.
> >Old firefox doing a ton of fsync commands and slowing
> >the box to a crawl comes to mind as another example of this.
> 
> Yes, and firefox doesn't say "no sorry can't do it" when running on such a
> kernel (which is much worse than more expensive TLB misses).
> 
> Paolo

The kernel can't speed up fsync.  So IIRC, Firefox switched to using
renames instead of fsync. IMHO QEMU should do the same: look for a
mechanism that the kernel can support efficiently, instead of
insisting on using a feature that it can't.

-- 
MST


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10 11:35                     ` Michael S. Tsirkin
@ 2014-06-10 11:43                       ` Paolo Bonzini
  2014-06-10 11:48                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-10 11:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Igor Mammedov

Il 10/06/2014 13:35, Michael S. Tsirkin ha scritto:
>>>> Why?  For example it would be useful for testing on machines that you don't
>>>> have root for, and that do not have a hugetlbfs mount point.  For example
>>>> you could run the test case from the vhost-user's patches.
>>>
>>> Sounds useful, I guess we could allow this when running under qtest.
>>
>> Or TCG, or Xen.  At this point, why single out KVM?
>>
>> (Also, "--enable-kvm -mem-path /dev/shm" works on 2.0, and it would be a
>> regression in 2.1).
>
> It prints
>         fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
>
> Correct?

Yes.

> I guess I agree then, hopefully the warning is enough.
> Maybe add an extra warning that performance will suffer.
>
>>>> THP is not a magic wand and you can get slowness from memory fragmentation
>>>> at any time.
>>>
>>> Right but there's a difference between "can get slowness when memory
>>> is overcommitted" and "will get slowness even on a mostly idle box".
>>
>> I would like to see the slowness on a real-world benchmark though.  I
>> suspect in most scenarios it would not matter.
>
> Weird.  Things like kernel build time are known to be measurably
> improved by using THP.

Even measurable speedups in most scenarios would not matter.  I don't 
care if a kernel compile takes 2 minutes vs. 110 seconds (for a 10% 
speedup), even though it's great that THP can speed up such a common task.

Paolo


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10 11:43                       ` Paolo Bonzini
@ 2014-06-10 11:48                         ` Michael S. Tsirkin
  2014-06-10 11:51                           ` Paolo Bonzini
  0 siblings, 1 reply; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-10 11:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Igor Mammedov

On Tue, Jun 10, 2014 at 01:43:27PM +0200, Paolo Bonzini wrote:
> Il 10/06/2014 13:35, Michael S. Tsirkin ha scritto:
> >>>>Why?  For example it would be useful for testing on machines that you don't
> >>>>have root for, and that do not have a hugetlbfs mount point.  For example
> >>>>you could run the test case from the vhost-user's patches.
> >>>
> >>>Sounds useful, I guess we could allow this when running under qtest.
> >>
> >>Or TCG, or Xen.  At this point, why single out KVM?
> >>
> >>(Also, "--enable-kvm -mem-path /dev/shm" works on 2.0, and it would be a
> >>regression in 2.1).
> >
> >It prints
> >        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> >
> >Correct?
> 
> Yes.
> 
> >I guess I agree then, hopefully the warning is enough.
> >Maybe add an extra warning that performance will suffer.
> >
> >>>>THP is not a magic wand and you can get slowness from memory fragmentation
> >>>>at any time.
> >>>
> >>>Right but there's a difference between "can get slowness when memory
> >>>is overcommitted" and "will get slowness even on a mostly idle box".
> >>
> >>I would like to see the slowness on a real-world benchmark though.  I
> >>suspect in most scenarios it would not matter.
> >
> >Weird.  Things like kernel build time are known to be measurably
> >improved by using THP.
> 
> Even measurable speedups in most scenarios would not matter.  I don't care
> if a kernel compile takes 2 minutes vs. 110 seconds (for a 10% speedup),
> even though it's great that THP can speed up such a common task.
> 
> Paolo

True. But I am not sure why such a user would play with vhost-user at all.
That one seems to mostly be about using aggressive polling
to drive down guest to guest latency.

-- 
MST


* Re: [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend
  2014-06-10 11:48                         ` Michael S. Tsirkin
@ 2014-06-10 11:51                           ` Paolo Bonzini
  0 siblings, 0 replies; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-10 11:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Hu Tao, qemu-devel, Eduardo Habkost, Igor Mammedov

Il 10/06/2014 13:48, Michael S. Tsirkin ha scritto:
>> Even measurable speedups in most scenarios would not matter.  I don't care
>> if a kernel compile takes 2 minutes vs. 110 seconds (for a 10% speedup),
>> even though it's great that THP can speed up such a common task.
>
> True. But I am not sure why such a user would play with vhost-user at all.
> That one seems to mostly be about using aggressive polling
> to drive down guest to guest latency.

But then there is so much more you have to do to get the performance 
you're looking for, including using GB hugepages which needs hugetlbfs 
anyway.

Anyhow, since there is a warning and the behavior is the same as 2.0 the 
question is moot, I think.  Renaming memory-backend-file to 
memory-backend-hugetlbfs would suggest that there is a regression, which 
is not the case.

Paolo


* Re: [Qemu-devel] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor Hu Tao
@ 2014-06-16 14:08   ` Eduardo Habkost
  2014-06-16 14:16     ` Paolo Bonzini
  0 siblings, 1 reply; 92+ messages in thread
From: Eduardo Habkost @ 2014-06-16 14:08 UTC (permalink / raw)
  To: Hu Tao
  Cc: Michael S. Tsirkin, libvir-list, qemu-devel, Paolo Bonzini,
	Igor Mammedov, Martin Kletzander, Yasunori Goto, Wanlong Gao

On Mon, Jun 09, 2014 at 06:25:09PM +0800, Hu Tao wrote:
> From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> Tested-by: Eduardo Habkost <ehabkost@redhat.com>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>

So, this is when the ability to use multiple "cpus" ranges on -numa is
finally enabled, right?

Is there some capability probing mechanism that can be used by
management to detect the new feature?

-- 
Eduardo


* Re: [Qemu-devel] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor
  2014-06-16 14:08   ` Eduardo Habkost
@ 2014-06-16 14:16     ` Paolo Bonzini
  2014-06-16 14:23       ` [Qemu-devel] [libvirt] " Eric Blake
  0 siblings, 1 reply; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-16 14:16 UTC (permalink / raw)
  To: Eduardo Habkost, Hu Tao
  Cc: Michael S. Tsirkin, libvir-list, qemu-devel, Igor Mammedov,
	Martin Kletzander, Yasunori Goto, Wanlong Gao

Il 16/06/2014 16:08, Eduardo Habkost ha scritto:
> Is there some capability probing mechanism that can be used by
> management to detect the new feature?

No, there isn't.

Paolo


* Re: [Qemu-devel] [libvirt] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor
  2014-06-16 14:16     ` Paolo Bonzini
@ 2014-06-16 14:23       ` Eric Blake
  2014-06-16 14:32         ` Paolo Bonzini
  0 siblings, 1 reply; 92+ messages in thread
From: Eric Blake @ 2014-06-16 14:23 UTC (permalink / raw)
  To: Paolo Bonzini, Eduardo Habkost, Hu Tao
  Cc: libvir-list, Igor Mammedov, Martin Kletzander, qemu-devel,
	Michael S. Tsirkin


On 06/16/2014 08:16 AM, Paolo Bonzini wrote:
> Il 16/06/2014 16:08, Eduardo Habkost ha scritto:
>> Is there some capability probing mechanism that can be used by
>> management to detect the new feature?
> 
> No, there isn't.

Not even query-command-line-options?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [libvirt] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor
  2014-06-16 14:23       ` [Qemu-devel] [libvirt] " Eric Blake
@ 2014-06-16 14:32         ` Paolo Bonzini
  2014-06-16 15:39           ` Eduardo Habkost
  0 siblings, 1 reply; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-16 14:32 UTC (permalink / raw)
  To: Eric Blake, Eduardo Habkost, Hu Tao
  Cc: libvir-list, Igor Mammedov, Martin Kletzander, qemu-devel,
	Michael S. Tsirkin

Il 16/06/2014 16:23, Eric Blake ha scritto:
> On 06/16/2014 08:16 AM, Paolo Bonzini wrote:
>> Il 16/06/2014 16:08, Eduardo Habkost ha scritto:
>>> Is there some capability probing mechanism that can be used by
>>> management to detect the new feature?
>>
>> No, there isn't.
>
> Not even query-command-line-options?

No, this is a (backwards-compatible) extension to how the option is parsed.

I suppose with QAPI introspection you could check if the QAPI type for 
-numa is defined...

Paolo


* Re: [Qemu-devel] [libvirt] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor
  2014-06-16 14:32         ` Paolo Bonzini
@ 2014-06-16 15:39           ` Eduardo Habkost
  2014-06-16 15:46             ` Paolo Bonzini
  0 siblings, 1 reply; 92+ messages in thread
From: Eduardo Habkost @ 2014-06-16 15:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Michael S. Tsirkin, libvir-list, Hu Tao, qemu-devel,
	Martin Kletzander, Igor Mammedov

On Mon, Jun 16, 2014 at 04:32:09PM +0200, Paolo Bonzini wrote:
> Il 16/06/2014 16:23, Eric Blake ha scritto:
> >On 06/16/2014 08:16 AM, Paolo Bonzini wrote:
> >>Il 16/06/2014 16:08, Eduardo Habkost ha scritto:
> >>>Is there some capability probing mechanism that can be used by
> >>>management to detect the new feature?
> >>
> >>No, there isn't.
> >
> >Not even query-command-line-options?
> 
> No, this is a (backwards-compatible) extension to how the option is parsed.

But numa started appearing on query-command-line-options only after this
patch got applied. It is not the semantics I would expect from
query-command-line-options, but it is a hint that we have a QEMU version
with the new numa option handling code.

> 
> I suppose with QAPI introspection you could check if the QAPI type for -numa
> is defined...

Does this exist, or is it just an idea?

-- 
Eduardo


* Re: [Qemu-devel] [libvirt] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor
  2014-06-16 15:39           ` Eduardo Habkost
@ 2014-06-16 15:46             ` Paolo Bonzini
  2014-06-16 16:05               ` Eric Blake
  0 siblings, 1 reply; 92+ messages in thread
From: Paolo Bonzini @ 2014-06-16 15:46 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Michael S. Tsirkin, libvir-list, Hu Tao, qemu-devel,
	Martin Kletzander, Igor Mammedov

Il 16/06/2014 17:39, Eduardo Habkost ha scritto:
> > No, this is a (backwards-compatible) extension to how the option is parsed.
>
> But numa started appearing on query-command-line-options only after this
> patch got applied. It is not the semantics I would expect from
> query-command-line-options, but it is a hint that we have a QEMU version
> with the new numa option handling code.

Oh, then indeed it's possible.  I had forgotten that -numa was using ad 
hoc parsing code, not QemuOpts.

>> > I suppose with QAPI introspection you could check if the QAPI type for -numa
>> > is defined...
> Does this exist, or is it just an idea?

Amos was working on it.

Paolo


* Re: [Qemu-devel] [libvirt] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor
  2014-06-16 15:46             ` Paolo Bonzini
@ 2014-06-16 16:05               ` Eric Blake
  0 siblings, 0 replies; 92+ messages in thread
From: Eric Blake @ 2014-06-16 16:05 UTC (permalink / raw)
  To: Paolo Bonzini, Eduardo Habkost
  Cc: Michael S. Tsirkin, libvir-list, Hu Tao, qemu-devel,
	Martin Kletzander, Igor Mammedov

On 06/16/2014 09:46 AM, Paolo Bonzini wrote:
> On 16/06/2014 17:39, Eduardo Habkost wrote:
>> > No, this is a (backwards-compatible) extension to how the option is
>> parsed.
>>
>> But numa started appearing on query-command-line-options only after this
>> patch got applied. It is not the semantics I would expect from
>> query-command-line-options, but it is a hint that we have a QEMU version
>> with the new numa option handling code.
> 
> Oh, then indeed it's possible.  I had forgotten that -numa was using ad
> hoc parsing code, not QemuOpts.

Then we have our witness, even without waiting for introspection :)
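
A minimal sketch of such a probe, assuming the management application has
already pulled the "option" names out of a query-command-line-options reply
into a string array. The JSON handling is omitted, and has_cmdline_option()
is a hypothetical helper, not a QEMU or libvirt function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `name` is among the option names
 * reported by query-command-line-options.  `options` is assumed to hold
 * the "option" member of each entry in the reply.
 */
bool has_cmdline_option(const char *const *options, size_t n,
                        const char *name)
{
    assert(options != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(options[i], name) == 0) {
            return true;
        }
    }
    return false;
}
```

Finding "numa" in the list would be the hint that the QemuOpts-based -numa
handling is present. Its absence only means the option is not registered
with QemuOpts, not that -numa is unsupported.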

> 
>>> > I suppose with QAPI introspection you could check if the QAPI type
>>> for -numa
>>> > is defined...
>> Does this exist, or is it just an idea?
> 
> Amos was working on it.

Although given that soft freeze is tomorrow, it looks very unlikely that
we'll have it in 2.1.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH v4 16/29] memory: move preallocation code out of exec.c
  2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 16/29] memory: move preallocation code out of exec.c Hu Tao
@ 2014-06-18 19:14   ` Michael S. Tsirkin
  2014-06-19  0:43     ` Hu Tao
  2014-06-20  3:26     ` Hu Tao
  0 siblings, 2 replies; 92+ messages in thread
From: Michael S. Tsirkin @ 2014-06-18 19:14 UTC (permalink / raw)
  To: Hu Tao
  Cc: Yasunori Goto, Igor Mammedov, qemu-devel, Eduardo Habkost, Paolo Bonzini

On Mon, Jun 09, 2014 at 06:25:21PM +0800, Hu Tao wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> So that backends can use it.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>

OK, this breaks the mingw build because you are moving the
code to a POSIX-only file and using it unconditionally on all platforms.
Please set up a mingw build, fix up the pci branch, and send me the fix.
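
One common way to avoid this kind of break is to give every host a
definition of the function declared in the shared header. A minimal sketch,
not the fix that was applied to the tree: touch_pages() is a hypothetical
stand-in for os_mem_prealloc(), it leaves out the SIGBUS handling, and it
uses sysconf() instead of the hugetlbfs-aware page size lookup in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifndef _WIN32
#include <unistd.h>
#endif

/*
 * Hypothetical stand-in for os_mem_prealloc(): write one byte to each
 * page of [area, area + sz) so the pages get allocated.  Returns the
 * number of pages touched.
 */
size_t touch_pages(char *area, size_t sz)
{
    assert(area != NULL || sz == 0);
#ifdef _WIN32
    /* Stub so that callers still link on Windows hosts. */
    (void)area;
    (void)sz;
    return 0;
#else
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (sz + pagesize - 1) / pagesize;

    for (size_t i = 0; i < npages; i++) {
        /* Same touch as in the patch: zero the first byte of the page. */
        memset(area + i * pagesize, 0, 1);
    }
    return npages;
#endif
}
```

The alternative is to guard the call site with the same condition that
guards the definition, so that non-POSIX builds never reference the symbol.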

> ---
>  exec.c               | 44 +------------------------------
>  include/qemu/osdep.h |  2 ++
>  util/oslib-posix.c   | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 76 insertions(+), 43 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 36301e2..b640425 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1011,13 +1011,6 @@ static long gethugepagesize(const char *path)
>      return fs.f_bsize;
>  }
>  
> -static sigjmp_buf sigjump;
> -
> -static void sigbus_handler(int signal)
> -{
> -    siglongjmp(sigjump, 1);
> -}
> -
>  static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              const char *path,
> @@ -1087,42 +1080,7 @@ static void *file_ram_alloc(RAMBlock *block,
>      }
>  
>      if (mem_prealloc) {
> -        int ret, i;
> -        struct sigaction act, oldact;
> -        sigset_t set, oldset;
> -
> -        memset(&act, 0, sizeof(act));
> -        act.sa_handler = &sigbus_handler;
> -        act.sa_flags = 0;
> -
> -        ret = sigaction(SIGBUS, &act, &oldact);
> -        if (ret) {
> -            perror("file_ram_alloc: failed to install signal handler");
> -            exit(1);
> -        }
> -
> -        /* unblock SIGBUS */
> -        sigemptyset(&set);
> -        sigaddset(&set, SIGBUS);
> -        pthread_sigmask(SIG_UNBLOCK, &set, &oldset);
> -
> -        if (sigsetjmp(sigjump, 1)) {
> -            fprintf(stderr, "file_ram_alloc: failed to preallocate pages\n");
> -            exit(1);
> -        }
> -
> -        /* MAP_POPULATE silently ignores failures */
> -        for (i = 0; i < (memory/hpagesize); i++) {
> -            memset(area + (hpagesize*i), 0, 1);
> -        }
> -
> -        ret = sigaction(SIGBUS, &oldact, NULL);
> -        if (ret) {
> -            perror("file_ram_alloc: failed to reinstall signal handler");
> -            exit(1);
> -        }
> -
> -        pthread_sigmask(SIG_SETMASK, &oldset, NULL);
> +        os_mem_prealloc(fd, area, memory);
>      }
>  
>      block->fd = fd;
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index ffb2966..9c1a119 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -251,4 +251,6 @@ void qemu_init_auxval(char **envp);
>  
>  void qemu_set_tty_echo(int fd, bool echo);
>  
> +void os_mem_prealloc(int fd, char *area, size_t sz);
> +
>  #endif
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 8e9c770..1524ead 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -46,6 +46,7 @@ extern int daemon(int, int);
>  #else
>  #  define QEMU_VMALLOC_ALIGN getpagesize()
>  #endif
> +#define HUGETLBFS_MAGIC       0x958458f6
>  
>  #include <termios.h>
>  #include <unistd.h>
> @@ -58,9 +59,12 @@ extern int daemon(int, int);
>  #include "qemu/sockets.h"
>  #include <sys/mman.h>
>  #include <libgen.h>
> +#include <setjmp.h>
> +#include <sys/signal.h>
>  
>  #ifdef CONFIG_LINUX
>  #include <sys/syscall.h>
> +#include <sys/vfs.h>
>  #endif
>  
>  #ifdef __FreeBSD__
> @@ -332,3 +336,72 @@ char *qemu_get_exec_dir(void)
>  {
>      return g_strdup(exec_dir);
>  }
> +
> +static sigjmp_buf sigjump;
> +
> +static void sigbus_handler(int signal)
> +{
> +    siglongjmp(sigjump, 1);
> +}
> +
> +static size_t fd_getpagesize(int fd)
> +{
> +#ifdef CONFIG_LINUX
> +    struct statfs fs;
> +    int ret;
> +
> +    if (fd != -1) {
> +        do {
> +            ret = fstatfs(fd, &fs);
> +        } while (ret != 0 && errno == EINTR);
> +
> +        if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
> +            return fs.f_bsize;
> +        }
> +    }
> +#endif
> +
> +    return getpagesize();
> +}
> +
> +void os_mem_prealloc(int fd, char *area, size_t memory)
> +{
> +    int ret, i;
> +    struct sigaction act, oldact;
> +    sigset_t set, oldset;
> +    size_t hpagesize = fd_getpagesize(fd);
> +
> +    memset(&act, 0, sizeof(act));
> +    act.sa_handler = &sigbus_handler;
> +    act.sa_flags = 0;
> +
> +    ret = sigaction(SIGBUS, &act, &oldact);
> +    if (ret) {
> +        perror("os_mem_prealloc: failed to install signal handler");
> +        exit(1);
> +    }
> +
> +    /* unblock SIGBUS */
> +    sigemptyset(&set);
> +    sigaddset(&set, SIGBUS);
> +    pthread_sigmask(SIG_UNBLOCK, &set, &oldset);
> +
> +    if (sigsetjmp(sigjump, 1)) {
> +        fprintf(stderr, "os_mem_prealloc: failed to preallocate pages\n");
> +        exit(1);
> +    }
> +
> +    /* MAP_POPULATE silently ignores failures */
> +    memory = (memory + hpagesize - 1) & -hpagesize;
> +    for (i = 0; i < (memory/hpagesize); i++) {
> +        memset(area + (hpagesize*i), 0, 1);
> +    }
> +
> +    ret = sigaction(SIGBUS, &oldact, NULL);
> +    if (ret) {
> +        perror("os_mem_prealloc: failed to reinstall signal handler");
> +        exit(1);
> +    }
> +
> +    pthread_sigmask(SIG_SETMASK, &oldset, NULL);
> +}
> -- 
> 1.9.3
> 
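
For reference, the rounding step in os_mem_prealloc() above relies on the
page size being a power of two. A small sketch of that arithmetic, with
round_up_pow2() and page_count() as hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to a multiple of `align`, which must be a power of two. */
size_t round_up_pow2(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* For an unsigned align, -align equals ~(align - 1): it clears the low bits. */
    return (size + align - 1) & -align;
}

/* Number of pages a preallocation loop would have to touch. */
size_t page_count(size_t size, size_t pagesize)
{
    return round_up_pow2(size, pagesize) / pagesize;
}
```

For example, 5000 bytes with 4096-byte pages rounds up to 8192, so two pages
are touched. Keeping the count in a size_t rather than an int avoids
overflow for very large regions with small pages.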

* Re: [Qemu-devel] [PATCH v4 16/29] memory: move preallocation code out of exec.c
  2014-06-18 19:14   ` Michael S. Tsirkin
@ 2014-06-19  0:43     ` Hu Tao
  2014-06-20  3:26     ` Hu Tao
  1 sibling, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-19  0:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Igor Mammedov, qemu-devel, Eduardo Habkost, Paolo Bonzini

On Wed, Jun 18, 2014 at 10:14:23PM +0300, Michael S. Tsirkin wrote:
> On Mon, Jun 09, 2014 at 06:25:21PM +0800, Hu Tao wrote:
> > From: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > So that backends can use it.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> 
> OK, this breaks the mingw build because you are moving the
> code to a POSIX-only file and using it unconditionally on all platforms.
> Please set up a mingw build, fix up the pci branch, and send me the fix.

Sure.

Hu

* Re: [Qemu-devel] [PATCH v4 16/29] memory: move preallocation code out of exec.c
  2014-06-18 19:14   ` Michael S. Tsirkin
  2014-06-19  0:43     ` Hu Tao
@ 2014-06-20  3:26     ` Hu Tao
  1 sibling, 0 replies; 92+ messages in thread
From: Hu Tao @ 2014-06-20  3:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yasunori Goto, Igor Mammedov, qemu-devel, Eduardo Habkost, Paolo Bonzini

On Wed, Jun 18, 2014 at 10:14:23PM +0300, Michael S. Tsirkin wrote:
> On Mon, Jun 09, 2014 at 06:25:21PM +0800, Hu Tao wrote:
> > From: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > So that backends can use it.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
> 
> OK, this breaks the mingw build because you are moving the
> code to a POSIX-only file and using it unconditionally on all platforms.
> Please set up a mingw build, fix up the pci branch, and send me the fix.

It builds with today's updates. Is there still a problem with this patch?

Hu

end of thread, 2014-06-20  3:28 UTC

Thread overview: 92+ messages
2014-06-09 10:25 [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 01/29] NUMA: move numa related code to new file numa.c Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 02/29] NUMA: check if the total numa memory size is equal to ram_size Hu Tao
2014-06-09 23:02   ` Eric Blake
2014-06-10  2:29     ` Hu Tao
2014-06-10  2:36       ` Eric Blake
2014-06-10  2:52         ` Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 03/29] NUMA: Add numa_info structure to contain numa nodes info Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 04/29] NUMA: convert -numa option to use OptsVisitor Hu Tao
2014-06-16 14:08   ` Eduardo Habkost
2014-06-16 14:16     ` Paolo Bonzini
2014-06-16 14:23       ` [Qemu-devel] [libvirt] " Eric Blake
2014-06-16 14:32         ` Paolo Bonzini
2014-06-16 15:39           ` Eduardo Habkost
2014-06-16 15:46             ` Paolo Bonzini
2014-06-16 16:05               ` Eric Blake
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 05/29] NUMA: expand MAX_NODES from 64 to 128 Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 06/29] man: improve -numa doc Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 07/29] vl: redo -object parsing Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 08/29] qmp: improve error reporting for -object and object-add Hu Tao
2014-06-09 15:57   ` Igor Mammedov
2014-06-10  2:07     ` Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 09/29] pc: pass MachineState to pc_memory_init Hu Tao
2014-06-09 13:14   ` Igor Mammedov
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 10/29] numa: introduce memory_region_allocate_system_memory Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 11/29] hostmem: separate allocation from UserCreatable complete method Hu Tao
2014-06-09 10:47   ` Igor Mammedov
2014-06-10  1:55     ` Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 12/29] numa: add -numa node,memdev= option Hu Tao
2014-06-09 17:22   ` [Qemu-devel] [PATCH v4 12/29] numa: add -numa node, memdev= option Eric Blake
2014-06-10  2:23     ` Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 13/29] memory: reorganize file-based allocation Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 14/29] memory: move mem_path handling to memory_region_allocate_system_memory Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 15/29] memory: add error propagation to file-based RAM allocation Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 16/29] memory: move preallocation code out of exec.c Hu Tao
2014-06-18 19:14   ` Michael S. Tsirkin
2014-06-19  0:43     ` Hu Tao
2014-06-20  3:26     ` Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 17/29] memory: move RAM_PREALLOC_MASK to exec.c, rename Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 18/29] hostmem: add file-based HostMemoryBackend Hu Tao
2014-06-09 11:32   ` Igor Mammedov
2014-06-09 11:35     ` Michael S. Tsirkin
2014-06-09 12:06       ` Igor Mammedov
2014-06-10  2:00     ` Hu Tao
2014-06-10  5:09       ` Paolo Bonzini
2014-06-10  8:30         ` Hu Tao
2014-06-10  8:56           ` Paolo Bonzini
2014-06-10  9:21             ` Hu Tao
2014-06-10  9:59             ` Michael S. Tsirkin
2014-06-10 11:12               ` Paolo Bonzini
2014-06-10 11:23                 ` Michael S. Tsirkin
2014-06-10 11:29                   ` Paolo Bonzini
2014-06-10 11:35                     ` Michael S. Tsirkin
2014-06-10 11:43                       ` Paolo Bonzini
2014-06-10 11:48                         ` Michael S. Tsirkin
2014-06-10 11:51                           ` Paolo Bonzini
2014-06-10  9:07           ` Igor Mammedov
2014-06-10  9:54             ` Michael S. Tsirkin
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 19/29] hostmem: add merge and dump properties Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 20/29] hostmem: allow preallocation of any memory region Hu Tao
2014-06-09 12:28   ` Igor Mammedov
2014-06-09 12:32     ` Paolo Bonzini
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 21/29] hostmem: add property to map memory with MAP_SHARED Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 22/29] configure: add Linux libnuma detection Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 23/29] hostmem: add properties for NUMA memory policy Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 24/29] Introduce signed range Hu Tao
2014-06-09 10:42   ` Peter Maydell
2014-06-09 10:59     ` Michael S. Tsirkin
2014-06-10  6:51       ` Hu Tao
2014-06-10  9:50         ` Michael S. Tsirkin
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 25/29] qapi: make string input visitor parse int list Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 26/29] qapi: make string output " Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 27/29] qom: introduce object_property_get_enum and object_property_get_uint16List Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 28/29] qmp: add query-memdev Hu Tao
2014-06-09 12:36   ` Igor Mammedov
2014-06-09 12:58     ` Paolo Bonzini
2014-06-09 13:32       ` Igor Mammedov
2014-06-09 13:40         ` Paolo Bonzini
2014-06-09 14:08           ` Igor Mammedov
2014-06-09 17:15           ` Eric Blake
2014-06-09 17:24   ` Eric Blake
2014-06-10  2:25     ` Hu Tao
2014-06-09 10:25 ` [Qemu-devel] [PATCH v4 29/29] hmp: add info memdev Hu Tao
2014-06-09 10:30 ` [Qemu-devel] [PATCH v4 00/29] NUMA series and hostmem improvements Michael S. Tsirkin
2014-06-09 11:40   ` Michael S. Tsirkin
2014-06-10  1:55     ` Hu Tao
2014-06-10  9:51     ` Hu Tao
2014-06-10  9:56       ` Michael S. Tsirkin
2014-06-10  9:57         ` Hu Tao
2014-06-10 10:19         ` Hu Tao
2014-06-10 10:27           ` Michael S. Tsirkin
2014-06-10 11:09             ` Hu Tao
